<?xml version="1.0" standalone="yes"?>
<Paper uid="C80-1074">
  <Title>DECOMPOSITION OF JAPANESE SENTENCES INTO NORMAL FORMS BASED ON HUMAN LINGUISTIC PROCESS</Title>
  <Section position="1" start_page="0" end_page="49" type="abstr">
    <SectionTitle>
DECOMPOSITION OF JAPANESE SENTENCES INTO NORMAL FORMS
BASED ON HUMAN LINGUISTIC PROCESS
</SectionTitle>
    <Paragraph position="0"> The diversity and flexibility of linguistic expression forms pose awkward problems for the machine processing of language, such as translation, indexing and question answering. This paper presents a method of decomposing Japanese sentences that appear in patent documents on "pulse network" into normal forms. First, the linguistic information is analysed and classified on the basis of the human linguistic process.</Paragraph>
    <Paragraph position="1"> Then, predicate functions, phrase functions and operators are introduced as the normal forms.</Paragraph>
    <Paragraph position="2"> Finally, the decomposing procedure and some experimental results are shown.</Paragraph>
    <Paragraph position="3"> Introduction One of the most remarkable features of natural language is the diversity and flexibility of its expression forms. Japanese in particular appears to have a peculiar syntactic structure because it is an agglutinative language. This is an awkward problem for the machine processing of language, such as translation, subject indexing and question answering. One approach to this problem is to transform sentences into normal forms, if such forms exist. Such normalization has been proposed for some time, but few attempts have actually been made [1, 2]. A normal form must retain all the information contained in the original sentence. Let us now consider what information sentences contain. In the human linguistic process, the objects to be expressed are given first, then the corresponding cognitive structure is formed, and finally a language expression based on that cognitive structure is produced. In other words, the immediate basis of language expression is considered to be the human cognitive structure. Therefore, the arrangement of words in a sentence represents not only the relations among objects in the external world but also the cognitions and the relations among them, which are relatively independent of the objects at hand.</Paragraph>
    <Paragraph position="4"> This paper presents a method of decomposing Japanese sentences into normal forms based on this human linguistic process. First of all, the linguistic information necessary for the decomposing process is analysed and classified from the above-mentioned point of view. Then, predicate functions, phrase functions and operators are introduced as the normal forms.</Paragraph>
    <Paragraph position="5"> The two kinds of function describe the syntactic structure of sentences and of phrases, respectively. Operators describe the relationships among functions. Finally, the decomposing procedure and some experimental results are shown. Sample sentences are taken from the claims of Japanese patent documents on "pulse network". Analysis of linguistic information In this section, we analyse and classify the linguistic information necessary for decomposing Japanese sentences into their normal forms.</Paragraph>
    <Paragraph position="6"> Classification of words From the standpoint of the linguistic process, that is, objects, cognitions and expressions, all words are divided into objective expressions W1 and subjective expressions W2. W1 is the set of expressions that reflect external objects, namely conceptual expressions. W2, on the other hand, is the set of cognitive expressions that involve no conceptual process and directly represent affection, judgement, desire, will and so on. The classification of words is summarized in Table 1; some supplementary explanations of Table 1 follow.</Paragraph>
    <Paragraph position="7"> Adjective II denotes the words called stems of adjectival verbs in traditional Japanese grammar. For inflectional words such as AAn, Vn, TBn and JJn, we set n to 1, 2, 3, 4 and 5 (6) according to the inflectional form, namely the negative, declinable-word-modifying, final, noun-modifying and conditional (imperative) forms, respectively.</Paragraph>
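    <Paragraph>The inflection index n can be pictured as a small lookup table. The following Python sketch is only an illustration and not part of the original system: the names InflectionalForm and inflect_tag are hypothetical, and the tag format (class name followed by n, as in V3) is an assumption modelled on the notation Vn.
```python
from enum import IntEnum


class InflectionalForm(IntEnum):
    """Index n attached to inflectional word classes such as Vn or JJn.

    Numbering follows the text: 1 negative, 2 declinable-word modifying,
    3 final, 4 noun modifying, 5 conditional, 6 imperative.
    """
    NEGATIVE = 1
    DECLINABLE_WORD_MODIFYING = 2
    FINAL = 3
    NOUN_MODIFYING = 4
    CONDITIONAL = 5
    IMPERATIVE = 6


# Inflectional classes named in the text (hypothetical string labels).
INFLECTIONAL_CLASSES = {"AA", "V", "TB", "JJ"}


def inflect_tag(word_class: str, form: InflectionalForm) -> str:
    """Build a tag such as 'V3' (verb, final form) for an inflectional word."""
    if word_class not in INFLECTIONAL_CLASSES:
        raise ValueError(f"{word_class!r} is not an inflectional class")
    return f"{word_class}{int(form)}"


print(inflect_tag("V", InflectionalForm.FINAL))         # V3
print(inflect_tag("JJ", InflectionalForm.CONDITIONAL))  # JJ5
```
</Paragraph>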
    <Paragraph position="8"> Analysis of cognitive structure To describe the content of words and the relations among words, we introduce a descriptive scheme M consisting of the following five descriptors:</Paragraph>
    <Paragraph position="10"> (relation)}. O is the cognitive unit formed by conceptually separating and abstracting external objects, and is classified into three large categories, namely substances, attributes and relations. The symbol ~i specifies the variety and the level of abstraction of each unit.</Paragraph>
    <Paragraph position="11"> Thus, O can be regarded as a classification of concepts in the objective world (e.g., pulse network).</Paragraph>
    <Paragraph position="12"> (2) Σ = {σ1, σ2, σ3}. Σ describes the relationships between objects from various viewpoints: σ1 is the relationship between substance and attribute, σ2 the relationship between substance and relation, and σ3 the various connections among objects of the same kind. (3) U represents active cognitions, which are relatively independent of concepts.</Paragraph>
    <Paragraph position="13"> (4) ~ specifies the cognitive behaviors, that is, how the speaker cognizes the objects.</Paragraph>
    <Paragraph position="14"> (5) Λ = {λ1 (tense), λ2 (anaphora)}. Λ represents the relation between the speaker and the objects. Parts of O, Σ, U and ~ are tabulated in Tables 2 to 5, respectively.</Paragraph>
    <Paragraph position="15"> Definition of predicate function In this section and the following two sections, we define the normal forms of Japanese sentences. Generally, a sentence expresses a property of an object or a relationship among objects. The component that indicates such a property or relationship is the predicate of the sentence. We therefore introduce a function whose constants are the predicate and the case postpositions and whose variables are the noun phrases immediately preceding the case postpositions. This function is called a predicate function and is written $X_1 a_1 X_2 a_2 \cdots X_i a_i \cdots X_n a_n P$, where $X_i$, $a_i$ and $P$ denote a noun phrase, a case postposition and the predicate, respectively.</Paragraph>
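    <Paragraph>To make the notation concrete, a predicate function can be modelled as a sequence of (variable, case postposition) slots followed by the predicate constant. This is a hypothetical Python sketch: the classes Slot and PredicateFunction are illustrative names, and the example reuses the sentence (DIODE) GA (HOODEN ZIKAN) WO HAYAMERU quoted later in this paper.
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    """A variable X_i together with its case postposition a_i (a constant)."""
    postposition: str
    filler: Optional[str] = None  # noun phrase substituted for X_i, if any


@dataclass
class PredicateFunction:
    """X1 a1 X2 a2 ... Xn an P: P and the a_i are constants, the X_i variables."""
    predicate: str
    slots: list = field(default_factory=list)

    def render(self) -> str:
        parts = []
        for i, slot in enumerate(self.slots, start=1):
            # An unfilled variable is shown by its name X_i.
            parts.append(f"({slot.filler})" if slot.filler else f"X{i}")
            parts.append(slot.postposition)
        parts.append(self.predicate)
        return " ".join(parts)


f = PredicateFunction("HAYAMERU", [Slot("GA"), Slot("WO")])
print(f.render())  # X1 GA X2 WO HAYAMERU
f.slots[0].filler = "DIODE"
f.slots[1].filler = "HOODEN ZIKAN"
print(f.render())  # (DIODE) GA (HOODEN ZIKAN) WO HAYAMERU
```
</Paragraph>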
    <Paragraph position="17"/>
    <Paragraph position="19"/>
    <Paragraph position="21"> However, a predicate P can take a variety of expression forms in Japanese. For example, a verb is frequently followed by auxiliary verbs (e.g., NAI (negative), TA (past)) or verbal suffixes (e.g., RARERU (passive), SASERU (causative)). We therefore decompose the predicate P into an objective expression Po and a subjective expression Ps.</Paragraph>
    <Paragraph position="22"> Then, we define a basic predicate function as a function whose predicate PoPs is one of the following four kinds.</Paragraph>
    <Paragraph position="23">  (1) Po (final form of a verb) Ps (zero element of the speaker's judgement); (2) Po (final form of adjective I) Ps (zero element of the speaker's judgement); (3) Po (adjective II) Ps (judgement expression "DA (be)"); (4) Po (noun) Ps (judgement expression "DA (be)").</Paragraph>
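    <Paragraph>The four basic forms can be read as a table of permitted (Po, Ps) pairs. The sketch below is hypothetical: the class labels paraphrase the list above, an empty string stands for the zero element of the speaker's judgement, and split_predicate assumes a romanized predicate in which the judgement DA is a separate final token.
```python
# Permitted (class of Po, Ps) pairs for the four basic predicate functions.
# "" stands for the zero element of the speaker's judgement.
BASIC_FORMS = {
    ("verb, final form", ""),
    ("adjective I, final form", ""),
    ("adjective II", "DA"),
    ("noun", "DA"),
}


def split_predicate(tokens: list) -> tuple:
    """Split a predicate into (Po, Ps), treating a final DA as Ps."""
    if tokens and tokens[-1] == "DA":
        return " ".join(tokens[:-1]), "DA"
    return " ".join(tokens), ""


def is_basic(po_class: str, ps: str) -> bool:
    """True if Po of class po_class followed by Ps is one of the basic forms."""
    return (po_class, ps) in BASIC_FORMS


po, ps = split_predicate(["DIODE", "DA"])
print(po, repr(ps), is_basic("noun", ps))  # DIODE 'DA' True
```
</Paragraph>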
    <Paragraph position="24">  Applying the operators presented in the next section inflects the form of Po or Ps. Other predicate functions are defined by applying operators to basic predicate functions. Thus, predicate functions are classified into constant functions (idiomatic expressions), basic predicate functions and derivative functions. A predicate generally represents some attribute concept. Unlike a substance, an attribute does not occur alone; it arises only together with substances. When we cognize an attribute as a concept, there exist some substances that accompany it. The variables corresponding to these substances are called the obligatory variables of the predicate, and their case postpositions the obligatory case postpositions aei. Conversely, a substance usually accompanies various kinds of attribute and is related to other substances through the mediation of these attributes. In the predicate function, the variables corresponding to such attributes and substances are called facultative variables, and their case postpositions the facultative case postpositions aoi. Each variable of a predicate function has its own domain, that is, the class of words that can be substituted into it. We therefore specify the domain of each variable in terms of the descriptor O, and the relationship between the predicate and each variable in terms of the descriptor Σ.</Paragraph>
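    <Paragraph>The distinction between obligatory and facultative variables, and the domain attached to each variable, can be sketched as a check on candidate fillers. Everything here is hypothetical: the concept labels merely stand in for entries of the descriptor O, the case frame assigned to HAYAMERU (including the facultative DE variable) is invented for the example rather than taken from Table 6, and check_fillers is an illustrative name.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    postposition: str   # case postposition attached to the variable
    obligatory: bool    # True for obligatory, False for facultative
    domain: frozenset   # concept labels (standing in for descriptor O)


# Hypothetical frame for HAYAMERU ("advance"); labels are illustrative only.
HAYAMERU = [
    Variable("GA", True, frozenset({"element"})),
    Variable("WO", True, frozenset({"time"})),
    Variable("DE", False, frozenset({"means", "element"})),
]


def check_fillers(frame, fillers):
    """fillers maps a postposition to the concept label of its noun phrase.

    Returns (obligatory postpositions left empty,
             postpositions whose filler lies outside the variable's domain).
    """
    missing, mismatched = [], []
    for var in frame:
        concept = fillers.get(var.postposition)
        if concept is None:
            if var.obligatory:
                missing.append(var.postposition)
        elif concept not in var.domain:
            mismatched.append(var.postposition)
    return missing, mismatched


print(check_fillers(HAYAMERU, {"GA": "element", "WO": "time"}))  # ([], [])
print(check_fillers(HAYAMERU, {"WO": "element"}))  # (['GA'], ['WO'])
```

A missing facultative variable (DE here) is not reported, which mirrors the idea that only obligatory variables must accompany the attribute.</Paragraph>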
    <Paragraph position="25"> These are summarized in Table 6.</Paragraph>
    <Paragraph position="26"> Definition of operators An operator produces a new function from one or two functions. Operators are classified into six groups: modal (F_I), nominalization (F_II), embedding (F_III), connecting (F_IV), elliptical (F_V) and anaphoric (F_VI) operators. Modal operator The modal operators consist of objective expressions F_I1 (e.g., abstract verbs, verbal suffixes and some prefixes) and subjective expressions F_I2 (e.g., auxiliary verbs and adverbial postpositions). F_I1 applies to Po of the predicate and varies the mode of the attribute expressed by the function. F_I2, on the other hand, applies to Ps and varies the mode of the judgement. Examples of F_I1 and F_I2 are shown in Tables 7 and 8, respectively. Nominalization operator The nominalization operators apply to one predicate function and nominalize it in the following ways.</Paragraph>
    <Paragraph position="27">  (1) f_II1: cognizing one of the objects expressed by the predicate function as a substance with an attribute.</Paragraph>
    <Paragraph position="28"> (DIODE) GA (HOODEN ZIKAN) WO HAYAMERU.</Paragraph>
    <Paragraph position="29"> (A diode advances the time of discharge.) → (HOODEN ZIKAN) WO HAYAMERU DIODE (a diode which advances the time of discharge) (2) f_II2: cognizing the concrete event expressed by the predicate function conceptually as a substance.</Paragraph>
    <Paragraph position="30"> (HAKEI) GA NAMARU (The wave form is blunted.) → (HAKEI) GA NAMARU KOTO (or NO) (that the wave form is blunted) (3) f_II3: transforming the predicate function into a clause expressing time, reason, state, effect and so on.</Paragraph>
    <Paragraph position="32"> The clause or noun phrase produced by a nominalization operator is substituted into a variable of another predicate function by the embedding operator f_III.</Paragraph>
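    <Paragraph>Restricted to the romanized examples above, f_II1, f_II2 and the embedding operator f_III can be sketched as string transformations on a clause held as a list of (noun phrase, case postposition) pairs plus a predicate. This is a hypothetical simplification: it assumes that f_II1 removes the chosen variable with its postposition and places the noun after the clause, and that f_II2 appends KOTO. It reproduces the two examples but not every case the method covers; the function names and the placeholder host function X1 GA P are illustrative.
```python
# A clause is ([(noun phrase, case postposition), ...], predicate).


def render(args, predicate):
    """Write a clause in the romanized notation used in the text."""
    return " ".join([f"({np}) {pp}" for np, pp in args] + [predicate])


def f_ii1(clause, postposition):
    """Make the variable marked by postposition the head noun of a relative clause."""
    args, predicate = clause
    head = next(np for np, pp in args if pp == postposition)
    rest = [(np, pp) for np, pp in args if pp != postposition]
    return f"{render(rest, predicate)} {head}"


def f_ii2(clause, nominalizer="KOTO"):
    """Treat the event expressed by the clause as a substance (KOTO or NO)."""
    return f"{render(*clause)} {nominalizer}"


def f_iii(host, postposition, phrase):
    """Embed a nominalized phrase into the variable marked by postposition."""
    args, predicate = host
    args = [(phrase if pp == postposition else np, pp) for np, pp in args]
    return render(args, predicate)


s = ([("DIODE", "GA"), ("HOODEN ZIKAN", "WO")], "HAYAMERU")
print(f_ii1(s, "GA"))  # (HOODEN ZIKAN) WO HAYAMERU DIODE
print(f_ii2(([("HAKEI", "GA")], "NAMARU")))  # (HAKEI) GA NAMARU KOTO
host = ([("X1", "GA")], "P")  # placeholder host function X1 GA P
print(f_iii(host, "GA", f_ii1(s, "GA")))  # ((HOODEN ZIKAN) WO HAYAMERU DIODE) GA P
```
</Paragraph>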
    <Paragraph position="33"> Connecting operator A connecting operator joins one predicate function to another coordinately or subordinately. It generally corresponds to conjunctions and conjunctive postpositions. Some connecting operators are related to modal operators, attribute adverbs or the variety of the predicate. Connecting operators are classified into the following six groups.</Paragraph>
    <Paragraph position="34">  the power source through the resistor.) Generally, more than one connecting operator is applied in actual sentences, so we define the universal connecting formula as follows. Let $f_{\mathrm{II}}$ and $f_{\mathrm{III}}$ be the nominalization and the embedding operator, respectively. An arbitrary predicate function $A_i$ is expressed by $A_i = A_{i1} * f_{\mathrm{IV}1} * A_{i2} * f_{\mathrm{IV}1} * \cdots * f_{\mathrm{IV}1} * A_{ik} * f_{\mathrm{IV}1} * \cdots * f_{\mathrm{IV}1} * A_{im}$,</Paragraph>
    <Paragraph position="35"> where $A_{ik}$ is either (1) $S_u$ or (2) $[A_i * f_{\mathrm{IV}d} * A_j]$ $(d = 2, 3, 4, 5, 6)$.</Paragraph>
    <Paragraph position="36">  $S_u$, called a unit predicate function, is either a basic predicate function or a derivative function produced by applying one or more modal operators. Moreover, the embedding operator is sometimes applied to $S_u$ as follows:</Paragraph>
    <Paragraph position="37"> $S_u(f_{\mathrm{III}}\text{-}A'_1, A'_2, \ldots, A'_i, \ldots, A'_n)$, where $A'_i = f_{\mathrm{II}} A_i$.</Paragraph>
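    <Paragraph>The recursive shape of the universal connecting formula can be sketched as a small tree: top-level terms are joined by f_IV1, and each term is either a unit predicate function S_u or a bracketed term [A_i * f_IVd * A_j] with d from 2 to 6. The class names Unit and Bracket and the function connect are hypothetical; for brevity the operands of a bracket are themselves single terms, and the code only builds and prints formulas, it does not analyse sentences.
```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Unit:
    """A unit predicate function S_u (basic, or basic plus modal operators)."""
    name: str

    def formula(self) -> str:
        return self.name


@dataclass
class Bracket:
    """A bracketed term [A_i * f_IVd * A_j] with d in 2..6."""
    left: "Term"
    d: int
    right: "Term"

    def __post_init__(self):
        if self.d not in range(2, 7):
            raise ValueError("bracketed terms use f_IVd with d = 2, ..., 6")

    def formula(self) -> str:
        return f"[{self.left.formula()} * f_IV{self.d} * {self.right.formula()}]"


Term = Union[Unit, Bracket]


def connect(terms) -> str:
    """Join the top-level terms A_i1, ..., A_im with the operator f_IV1."""
    return " * f_IV1 * ".join(t.formula() for t in terms)


a = connect([Unit("S1"), Bracket(Unit("S2"), 3, Unit("S3")), Unit("S4")])
print(a)  # S1 * f_IV1 * [S2 * f_IV3 * S3] * f_IV1 * S4
```
</Paragraph>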
    <Paragraph position="38"> Other operators When one predicate function is produced by applying a connecting operator to two functions, the elliptical operator omits one of two identical expression forms appearing in the two functions, and the anaphoric operator replaces one of them with a pronoun. Definition of phrase function We introduce the phrase function to describe the structure of noun phrases and compound words. Unlike the predicate function, however, the phrase function is not easy to define on the basis of word classes. We therefore classify phrases according to their content and define the phrase function on the basis of this classification. Examples of phrase functions are listed in Table 9.</Paragraph>
    <Paragraph position="39"> G1 is a phrase connected by relational concepts such as position (r1), reference (r2) and part (r3). G2 is a phrase formed by cognitive behaviors (Y), such as enumeration (@10, @11), cognition of one object from various viewpoints (@9), concrete and abstract cognition of one object (~7), and so on. G3 is a phrase constructed from the relationship (σ1) between substance and attribute and from the various connections (σ3) among objects of the same kind. G4 covers all other phrases.</Paragraph>
    <Section position="1" start_page="49" end_page="49" type="sub_section">
      <SectionTitle>
Decomposition process
</SectionTitle>
      <Paragraph position="0"> New derivative functions can be produced by applying various operators to the basic predicate functions. This means that a sentence with a complex syntactic structure corresponds to a single predicate function. Normalizing a sentence is therefore the decomposition of its predicate function into a set of basic predicate functions, phrase functions and operators. In this section, we describe the decomposing procedure [4].</Paragraph>
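      <Paragraph>Because every derivative function is obtained by applying operators to basic predicate functions, decomposition can be pictured as walking a derivation tree down to its leaves. The sketch below assumes such a tree is already available, which the actual procedure must first build from the sentence; the names Basic, Op and decompose, the operator labels and the placeholder functions X1 GA P1 and X2 WO P2 are hypothetical.
```python
from dataclasses import dataclass


@dataclass
class Basic:
    """A basic predicate function (a leaf of the derivation)."""
    text: str


@dataclass
class Op:
    """An operator such as F_I2 or f_III applied to one or two functions."""
    label: str
    args: tuple


def decompose(fn, basics=None, ops=None):
    """Collect the basic predicate functions and operators of a derivation."""
    basics = [] if basics is None else basics
    ops = [] if ops is None else ops
    if isinstance(fn, Basic):
        basics.append(fn.text)
    else:
        ops.append(fn.label)
        for arg in fn.args:
            decompose(arg, basics, ops)
    return basics, ops


inner = Op("F_I2:TA", (Basic("X2 WO P2"),))           # modal operator (past)
nominalized = Op("f_II1", (inner,))                    # nominalization
tree = Op("f_III", (Basic("X1 GA P1"), nominalized))  # embedding
print(decompose(tree))
# (['X1 GA P1', 'X2 WO P2'], ['f_III', 'f_II1', 'F_I2:TA'])
```
</Paragraph>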
    </Section>
    <Section position="2" start_page="49" end_page="49" type="sub_section">
      <SectionTitle>
Machine dictionary
</SectionTitle>
      <Paragraph position="0"> The machine dictionary consists of three elementary dictionaries: the word dictionary (WD), the predicate function dictionary (PFD) and the related concept dictionary (RCD). WD is used to acquire the basic linguistic information of each word in the input sentences. PFD entries are given to words that are candidates for the predicate, such as verbs and adjectives, and are used to extract the predicate function from sentences and phrases. RCD stores the relations between concepts and is used not only to decide embedded phrases but also to analyse phrases. Table 10 shows an example of each dictionary.</Paragraph>
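      <Paragraph>A minimal sketch of the three elementary dictionaries, assuming plain Python dicts keyed by romanized entry words. The entries, field names and the relation label "part" are hypothetical stand-ins, not the contents of Table 10; only the division of roles follows the text: WD for basic word information, PFD for predicate candidates and RCD for relations between concepts.
```python
# Word dictionary (WD): basic linguistic information for each entry word.
WD = {
    "DIODE": {"class": "noun", "concept": "element"},
    "GA": {"class": "case postposition"},
    "WO": {"class": "case postposition"},
    "HAYAMERU": {"class": "verb", "form": 3},  # 3 = final form
}

# Predicate function dictionary (PFD): entries only for predicate candidates.
PFD = {
    "HAYAMERU": [("GA", "obligatory"), ("WO", "obligatory")],
}

# Related concept dictionary (RCD): (concept, relation) -> related concepts.
RCD = {
    ("pulse network", "part"): {"diode", "resistor"},
}


def lookup(word):
    """Merge what WD and PFD record about a word (empty dict if unknown)."""
    info = dict(WD.get(word, {}))
    if word in PFD:
        info["frame"] = PFD[word]
    return info


def related(concept, relation):
    """Concepts that RCD links to concept by relation (empty set if none)."""
    return RCD.get((concept, relation), set())


print(lookup("HAYAMERU")["frame"])  # [('GA', 'obligatory'), ('WO', 'obligatory')]
print(related("pulse network", "part") == {"diode", "resistor"})  # True
```
</Paragraph>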
      <Paragraph position="1"> Procedural description General flow of decomposition process.</Paragraph>
      <Paragraph position="2"> The general procedural flow and the data flow of the decomposition process are shown in Fig. 1 and Fig. 2, respectively. Input Japanese sentences are written in Roman letters, with words separated by spaces.</Paragraph>
      <Paragraph position="3"> Each word is matched against the entry words of WD, and the word list (WLIST) is constructed from the information in WD. A candidate for the predicate (e.g., a verb or adjective) is found by searching WLIST from its head. Then, the modal operators (F_I11, F_I12 and F_I21), the embedding operator f_III and the connecting operator F_IV are extracted by examining the variety and inflectional form of the predicate or of the words that follow it. The method of extracting these operators is shown in Fig. 3. The extracted information is stored in FLIST1 and CLIST. The variables of the predicate function are extracted by reference to PFD. At the same time, the modal operators F_I23 and F_I24 are extracted, if present. If an obligatory variable of the function is omitted, a word whose concept matches the domain of that variable is searched for in the word strings already extracted from WLIST. This is regarded as the application of the elliptical operator.</Paragraph>
      <Paragraph position="4"> When the embedding operator applies to the predicate, the variety of the nominalization operator and the embedded phrase are decided.</Paragraph>
      <Paragraph position="6"> Table 10 (excerpt). Variable codes in PFD: 0 (obligatory variable), 1 (facultative variable), 2 (special variable due to f_II1 and f_II2), 3 (special variable due to f_II3). (c) Related concept dictionary (RCD), columns: No., Number, Variety, Direction, Level*, Related concept.</Paragraph>
      <Paragraph position="8"> ** The code is stored in the actual dictionary.</Paragraph>
      <Paragraph position="9"> The extracted information is stored in FLIST1, and the word strings of the variables are stored in VLIST. These word strings are decomposed into basic predicate functions, nominalization operators and phrase functions, which are then stored in FLIST2 and GLIST. The above procedure is repeated for the other predicate candidates. Finally, the connecting formula, which indicates the relations among predicate functions, is formed by reference to CLIST.</Paragraph>
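      <Paragraph>The steps from dictionary look-up to the recovery of an omitted obligatory variable can be sketched as a toy pipeline. Everything here is hypothetical: the tiny dictionaries, the names build_wlist and extract, and the search rule for an omitted variable, which simply takes the nearest preceding noun whose concept lies in the variable's domain (the text does not state the search order). Operator extraction, FLIST, CLIST and phrase processing are omitted.
```python
# Hypothetical word dictionary: word -> (word class, concept label or None).
WD = {
    "DIODE": ("noun", "element"),
    "HOODEN": ("noun", "event"),
    "ZIKAN": ("noun", "time"),
    "GA": ("case postposition", None),
    "WO": ("case postposition", None),
    "WA": ("adverbial postposition", None),
    "HAYAMERU": ("verb", None),
}
# Hypothetical PFD: predicate -> {case postposition: (obligatory?, domain)}.
PFD = {"HAYAMERU": {"GA": (True, {"element"}), "WO": (True, {"time"})}}


def build_wlist(sentence):
    """Segment a space-separated romanized sentence and look each word up in WD."""
    return [(w, *WD[w]) for w in sentence.split()]


def extract(sentence):
    """Return (predicate, {case postposition: noun phrase}) for each predicate."""
    wlist = build_wlist(sentence)
    functions = []
    for i, (word, cls, _) in enumerate(wlist):  # search WLIST from its head
        if cls != "verb" or word not in PFD:
            continue
        # Variables: the noun string just in front of each case postposition.
        filled, buffer = {}, []
        for w, c, _ in wlist[:i]:
            if c == "noun":
                buffer.append(w)
                continue
            if c == "case postposition" and buffer:
                filled[w] = " ".join(buffer)
            buffer = []
        # Elliptical operator: fill an omitted obligatory variable with the
        # nearest preceding noun whose concept lies in the variable's domain.
        for pp, (obligatory, domain) in PFD[word].items():
            if obligatory and pp not in filled:
                for w, c, concept in reversed(wlist[:i]):
                    if c == "noun" and concept in domain:
                        filled[pp] = w
                        break
        functions.append((word, filled))
    return functions


print(extract("DIODE GA HOODEN ZIKAN WO HAYAMERU"))
print(extract("DIODE WA HOODEN ZIKAN WO HAYAMERU"))
```

In the second call the topic marker WA stands where GA would be, so the GA variable is not found directly and is recovered from WLIST, which is the role the text assigns to the elliptical operator.</Paragraph>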
      <Paragraph position="10"> Processing of phrases. First, the procedure looks for a candidate for the predicate, such as a dynamic attribute noun, the declinable-word-modifying form of a common verb, a prefix (e.g., "KOO (high)", "TEI (low)", "DAI (large)", etc.) or adjective II, in the word strings stored in VLIST. If a candidate is found, the basic predicate function, the nominalization operator and the embedded word are extracted. If not, phrase functions are extracted. Phrase functions are classified into three types according to the method used to decide them.</Paragraph>
      <Paragraph position="11"> [Type I] Phrase functions extracted by the features of their constants. Examples are g101, g201, g301, and so on, in Table 9.</Paragraph>
      <Paragraph position="12"> Their constants, such as "RYOO (both)", "KAN (between)", "TAHOO (another)", "DAI", "KS", etc., are assigned priorities according to the strength of their connectability to a variable and are stored in a constant list. Phrase functions of this type are extracted in order of priority.</Paragraph>
      <Paragraph position="13"> [Type II] Phrase functions extracted by using RCD. Examples are g105, g308, and so on.</Paragraph>
      <Paragraph position="14"> [Type III] Phrase functions extracted by using the variety or level of word concepts. For example, g203 is extracted by investigating whether the upper concepts of both words agree with each other, and g204 by investigating whether the concept of the second word [Figure: flowchart with steps including "Extract the variables" and "Is the embedding operator applied to the predicate?"]</Paragraph>
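      <Paragraph>Type I extraction, in which constants are tried in order of priority, can be sketched as follows. The priority values are invented for the example (the text does not give them), the constants RYOO, KAN and TAHOO come from the Type I description, N1 is a placeholder noun, and the rule of splitting the phrase at the chosen constant is an assumption.
```python
# Hypothetical constant list: constant -> priority (higher = stronger connectability).
CONSTANT_LIST = {"RYOO": 3, "KAN": 2, "TAHOO": 1}


def extract_type1(phrase):
    """Split a phrase at its highest-priority constant.

    Returns (constant, words before it, words after it), or None when the
    phrase contains no constant from CONSTANT_LIST.
    """
    present = [w for w in phrase if w in CONSTANT_LIST]
    if not present:
        return None
    best = max(present, key=CONSTANT_LIST.get)
    k = phrase.index(best)
    return best, phrase[:k], phrase[k + 1:]


print(extract_type1(["RYOO", "N1", "KAN"]))  # ('RYOO', [], ['N1', 'KAN'])
print(extract_type1(["N1", "KAN"]))          # ('KAN', ['N1'], [])
print(extract_type1(["N1"]))                 # None
```
</Paragraph>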
    </Section>
  </Section>
</Paper>