<?xml version="1.0" standalone="yes"?>
<Paper uid="C80-1003">
  <Title>A SYNTAX PARSER BASED ON THE CASE DEPENDENCY GRAMMAR AND ITS EFFICIENCY</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
A SYNTAX PARSER BASED ON THE CASE DEPENDENCY
GRAMMAR AND ITS EFFICIENCY
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SUMMARY
</SectionTitle>
    <Paragraph position="0"> Augmented transition network grammars (ATNGs) or augmented context-free grammars are generally used in natural language processing systems.</Paragraph>
    <Paragraph position="1"> The advantages of ATNGs may be summarized as 1) efficiency of representation, 2) perspicuity, and 3) generative power; the disadvantage of ATNGs is that it is difficult to obtain an efficient parsing algorithm because of the flexibility of their complicated additional functions.</Paragraph>
    <Paragraph position="2"> In this paper, the syntax of Japanese sentences, based on case dependency relations, is stated first, and then we give a bottom-up, breadth-first parsing algorithm which parses an input sentence using time O(n³) and memory space O(n²), where n is the length of the input sentence. Moreover, it is shown that this parser requires time O(n²) whenever each B-phrase in the input sentence is unambiguous in its grammatical structure. Therefore, the efficiency of this parser is nearly equal to that of Earley's parser, which is the most efficient parsing method for general context-free grammars.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. FUNDAMENTALS OF JAPANESE SENTENCE
</SectionTitle>
    <Paragraph position="0"> The Japanese sentence is ordinarily written in kana (phonetic) letters and kanji (ideographic) characters without leaving a space between words. From the viewpoint of machine processing, however, it is necessary to express clearly the units composing the sentence by leaving a space between every word, as in English. We have no standard way of spacing the units, although the need for one has long been recognized.</Paragraph>
    <Paragraph position="1"> We give some examples in Figure 1.</Paragraph>
    <Paragraph position="2"> The first sentence in the figure is of ordinary written form.</Paragraph>
    <Paragraph position="3"> The second indicates a way of spacing (i.e. putting a space between every word).</Paragraph>
    <Paragraph position="4"> The third indicates another way of spacing (i.e. putting a space between every B-phrase).</Paragraph>
    <Paragraph position="5"> Nowadays, many other spacing methods have been tried in several institutes in Japan.</Paragraph>
    <Paragraph position="6"> In this paper, input sentences are given in colloquial style in which a spacing symbol is placed between two successive B-phrases.</Paragraph>
    <Paragraph position="7"> In Japanese sentences, BUNSETSUs (B-phrases) are the minimal morphological units of case dependency, and the syntax of Japanese sentences consists of (1) the syntax of a B-phrase as a string of words, and (2) the syntax of a sentence as a string of B-phrases.</Paragraph>
    <Paragraph position="8"> A B-phrase, usually pronounced without pausing, consists of two parts: a main part [or equally an independent part, in the conventional school grammatical term] and an annex part, which is postpositioned. We denote the connection of the two parts in a B-phrase by a dot if necessary. A main part, which is a conceptual word [or equally an independent word] (e.g. noun, verb, adjective or adverb), provides mainly the information of the concept. On the other hand, an annex part, a possibly null string of suffix words (e.g. auxiliary verbs or particles), provides the information concerning the kakariuke relation and/or supplementary information (e.g. the speaker's attitude towards the contents of the sentence, tense, etc.). A word w has its spelling W, part of speech H and inflexion K. We call (W,H,K) the word structure of w.</Paragraph>
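The notions above — a word as a (spelling, part of speech, inflexion) triple and a B-phrase as an independent word plus a possibly empty annex of suffix words — can be sketched as plain data types. This is our own illustrative encoding, not from the paper, and the example words are invented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    spelling: str    # W
    pos: str         # H, part of speech
    inflexion: str   # K

    @property
    def structure(self):
        # the word structure (W, H, K)
        return (self.spelling, self.pos, self.inflexion)

@dataclass
class BPhrase:
    main: Word    # independent (conceptual) word
    annex: list   # possibly empty list of suffix words

# Example: a noun followed by a case particle forms one B-phrase.
b = BPhrase(main=Word("hon", "noun", "-"),
            annex=[Word("wo", "particle", "-")])
```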
    <Paragraph position="9"> Suppose that a string b of length n is a B-phrase. Then, there exist an independent word w0 and suffix words w1, w2, ···, wℓ such that b = w0w1w2···wℓ, Cont(Hk,Kk,Hk+1) (0 ≤ k ≤ ℓ−1) ··· (1), and Termi(Hℓ,Kℓ) ··· (2),</Paragraph>
    <Paragraph position="11"> where (Wi,Hi,Ki) is the word structure of wi (0 ≤ i ≤ ℓ), Cont(Hk,Kk,Hk+1) means that a word whose part of speech and inflexion are Hk, Kk respectively can be followed by a word whose part of speech is Hk+1 in B-phrases, and Termi(Hℓ,Kℓ) means that a word whose part of speech and inflexion are Hℓ, Kℓ respectively can be the right-most subword of a B-phrase.</Paragraph>
    <Paragraph position="12"> (1) and (2) are called the rules of B-phrase structure, and</Paragraph>
    <Paragraph position="14"> b = w0·w1w2···wℓ ··· (3) is called the B-phrase structure of b. If (3) satisfies condition (1), w0w1w2···wℓ is called a left partial B-phrase.</Paragraph>
    <Paragraph position="15"> The kakariuke relation is the dependency relation between two B-phrases in a sentence. A B-phrase has the syntactic functions of governor and dependent. The function of governor is mainly represented by the independent word of the B-phrase. The function of dependent is mainly represented by the string of particles which is the right-most substring of the B-phrase and by the word in front of it (the right-most non-particle word).</Paragraph>
    <Paragraph position="16"> Every particle has a syntactic and partially semantic dependent function with its own degree of power. The particle whose power of dependent function is the strongest of all particles appearing in the string of particles is called the representative particle.</Paragraph>
    <Paragraph position="17"> Therefore, the syntactic function of dependent of a B-phrase is mainly represented by the representative particle and by the right-most non-particle word.</Paragraph>
    <Paragraph position="18"> Let (W0,H0,K0), (Wi,Hi,Ki), (Wj,Hj,Kj) be the word structures of the independent word, the right-most non-particle word and the representative particle of a B-phrase, respectively. Then, &lt;W0,H0&gt;g and &lt;Wi,Hi,Hj&gt;d are called the information of governor and the information of dependent of the B-phrase respectively, and the pair (&lt;W0,H0&gt;g,&lt;Wi,Hi,Hj&gt;d) is called the dependency information of the B-phrase.</Paragraph>
    <Paragraph position="19"> There are many types of dependency relation such as agent, patient, instrument, location, time, etc. Let C be the set of all types of dependency relation. The set of all possible dependency relations from a B-phrase b1 to a B-phrase b2 is founded on the information of dependent of b1 and the information of governor of b2. Therefore, there is a function δ which computes the set δ(α,β) of all possible dependency relations between a B-phrase of dependency information α and another B-phrase of dependency information β.</Paragraph>
    <Paragraph position="20"> The function δ is realized by the dependency dictionary, retrieved with the key of two dependency informations. The order of B-phrases is relatively free in a simple sentence, except for one constraint: the predicative B-phrase governing the whole sentence must be in the sentence-final position. Japanese is postpositional in this sense.</Paragraph>
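The function δ as described — a table lookup keyed by the dependent information of the first B-phrase and the governor information of the second — can be sketched as follows. The dictionary contents here are invented placeholders, not entries from the paper's dependency dictionary:

```python
# A toy dependency dictionary: keys pair the dependent information of b1
# with the governor information of b2; values are sets of relation types.
DEP_DICT = {
    (("ga", "particle"), ("yomu", "verb")): {"agent"},
    (("wo", "particle"), ("yomu", "verb")): {"patient"},
}

def delta(dep_info, gov_info):
    """Set of all possible dependency relations from a B-phrase with
    dependent information dep_info to one with governor information gov_info."""
    return DEP_DICT.get((dep_info, gov_info), set())

# A "wo"-marked phrase can stand in the patient relation to "yomu" (read);
# an unknown pair yields the empty set.
assert delta(("wo", "particle"), ("yomu", "verb")) == {"patient"}
assert delta(("ni", "particle"), ("yomu", "verb")) == set()
```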
    <Paragraph position="21"> The pattern of the dependency relations in a sentence has some structural property which is called the rules of dependency structure, and the dependency relations in a sentence are called the dependency structure of the sentence. The dependency structure of a sentence is shown in figure 2, where arrows indicate dependency relations of various types. The rules of dependency structure consist of the following three conditions.</Paragraph>
    <Paragraph position="22"> i) Each B-phrase except the one at the sentence-final position is a dependent of exactly one B-phrase appearing after it.</Paragraph>
    <Paragraph position="23"> ii) A dependency relation between any two B-phrases does not cross any other dependency relation in the sentence.</Paragraph>
    <Paragraph position="24"> iii) No two dependency relations depending on the same governor are of the same type.</Paragraph>
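The three conditions can be checked mechanically. The sketch below is our own helper, not part of the paper's algorithms: it validates a candidate set of (dependent, governor, type) triples, with B-phrases numbered so that the sentence-final phrase is 1 and governors always carry smaller numbers than their dependents:

```python
def is_valid_dependency_structure(n, deps):
    """deps: list of (i, j, c) triples, dependent i, governor j, type c.
    B-phrases are numbered n .. 1, with 1 sentence-final; governors lie
    to the right, i.e. i > j."""
    # (i) every B-phrase except the final one depends on exactly one later phrase
    dependents = [i for (i, j, c) in deps]
    if sorted(dependents) != list(range(2, n + 1)):
        return False
    if any(i <= j for (i, j, c) in deps):
        return False
    # (ii) no two dependency relations cross
    for (i, j, _) in deps:
        for (p, q, _) in deps:
            if i > p > j and j > q:   # arc (p, q) starts inside (i, j), ends outside
                return False
    # (iii) no governor receives two relations of the same type
    seen = set()
    for (_, j, c) in deps:
        if (j, c) in seen:
            return False
        seen.add((j, c))
    return True

# A 3 -> 2 -> 1 chain with distinct relation types is valid:
assert is_valid_dependency_structure(3, [(3, 2, "patient"), (2, 1, "agent")])
# Crossing arcs 4 -> 2 and 3 -> 1 violate condition (ii):
assert not is_valid_dependency_structure(4, [(4, 2, "a"), (3, 1, "b"), (2, 1, "c")])
```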
    <Paragraph position="25"> Let N be the number of B-phrases in an input sentence, and let all B-phrases be numbered from right to left, the sentence-final (right-most) B-phrase being the 1st and the left-most being the N-th (see figure 2). We shall fix an input sentence throughout this chapter. Let DI(i) be the set of all dependency informations of the i-th B-phrase.</Paragraph>
    <Paragraph position="26"> Definition: A dependency file DF of a sentence is a finite set of 5-tuples: (i,j,αi,αj,c) ∈ DF ⟺ N ≥ i &gt; j ≥ 1, αi ∈ DI(i), αj ∈ DI(j) and c ∈ δ(αi,αj).</Paragraph>
    <Paragraph position="27"> Definition: If a subset of DF satisfies the following conditions 1) to 5), it is called a dependency structure from the ℓ-th B-phrase to the m-th B-phrase (N ≥ ℓ &gt; m ≥ 1) and denoted by DS(ℓ,m) or DS'(ℓ,m).</Paragraph>
    <Paragraph position="28"> The set of all dependency structures from the ℓ-th B-phrase to the m-th B-phrase is denoted by Δ(ℓ,m). Any DS(N,1) ∈ Δ(N,1) is called a dependency structure of the input sentence. The dependency information of the j-th B-phrase is unique in DS(ℓ,m), since 2) and 3) hold. Let jDiDS(ℓ,m) and jGDS(ℓ,m) be the dependency information of the j-th B-phrase in DS(ℓ,m) and the set of all the dependency relations that the j-th B-phrase governs in DS(ℓ,m), respectively.</Paragraph>
    <Paragraph position="29"> jGDS(ℓ,m) =def {c | (i,j,αi,αj,c) ∈ DS(ℓ,m)}. Definition: If the k-th B-phrase (ℓ ≥ k ≥ m) in DS(ℓ,m) has the following property, k (the k-th B-phrase) is called a joint of DS(ℓ,m): for any (i,j,αi,αj,c) ∈ DS(ℓ,m), k ≥ i or j ≥ k.</Paragraph>
    <Paragraph position="30"> Let j0(=ℓ) &gt; j1 &gt; j2 &gt; ··· &gt; ju(=m) be the descending sequence of all the joints of DS(ℓ,m) (see figure 3).</Paragraph>
    <Paragraph position="31"> Then, the jk-th B-phrase is called the k-th joint of DS(ℓ,m). There is a dependency relation from the k-th joint (dependent) to the k+1-th joint (governor) in DS(ℓ,m). Let JDS(ℓ,m) be the set of all the joints of DS(ℓ,m). DS(ℓ,m/i,j), a subset of DS(ℓ,m), is defined as follows: DS(ℓ,m/i,j) =def {(p,q,αp,αq,c) | (p,q,αp,αq,c) ∈ DS(ℓ,m), i ≥ p &gt; q ≥ j}.</Paragraph>
    <Paragraph position="32"> Lemma 1. For any positive integers ℓ, i, j, m (N ≥ ℓ ≥ i &gt; j ≥ m), the following propositions hold.</Paragraph>
    <Paragraph position="33"> (1) DS(ℓ,m/ℓ,j) ∈ Δ(ℓ,j), if j is a joint of DS(ℓ,m).</Paragraph>
    <Paragraph position="35"> (k=0,1,2,···), then jk is the k-th joint of DS(j0,m).</Paragraph>
    <Paragraph position="36"> Syntax analysis of a Japanese sentence is defined as giving the B-phrase structures and the dependency structure of the sentence.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. THE PARSING ALGORITHM
AND ITS EFFICIENCY
</SectionTitle>
    <Paragraph position="0"> In this chapter, we shall give a parsing method which parses an input sentence using time O(n³) and space O(n²), where n is the length of the input sentence. Moreover, if the dependency information of each B-phrase is unambiguous, the time bound is quadratic.</Paragraph>
    <Paragraph position="1"> The essence of the parsing algorithm is the construction of the B-phrase parse list BL and the dependency parse list DL, which are constructed essentially by a &quot;dynamic programming&quot; method. The parsing algorithm consists of four minor algorithms: the construction of BL, the obtaining of a B-phrase structure, the construction of DL and the obtaining of a dependency structure.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
B-PHRASE PARSE LIST
</SectionTitle>
    <Paragraph position="0"> Let b be a string of length n and let b(i) denote the i-th character from the left end of it.</Paragraph>
    <Paragraph position="1"> b = b(1)b(2) ··· b(n).</Paragraph>
    <Paragraph position="2"> The B-phrase parse list of b consists of n minor lists BL(1), BL(2), ···, BL(n). The form of items in BL(j) is (i, WS, DI),</Paragraph>
    <Paragraph position="4"> where 1 ≤ i &lt; j ≤ n, WS is a word structure and DI is a dependency information.</Paragraph>
    <Paragraph position="6"> An item (i, WS, DI) is in BL(j) if and only if there exists a sequence of words w0, w1, ···, wℓ satisfying the following two conditions: 1) b(1)b(2)···b(i) = w0w1···wℓ−1, b(i+1)b(i+2)···b(j) = wℓ, and WS is the word structure of wℓ. 2) The string of words w0w1···wℓ is a left partial B-phrase of dependency information DI.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ALGORITHM FOR THE CONSTRUCTION OF BL
</SectionTitle>
    <Paragraph position="0"> Input. An input string b = b(1)b(2) ··· b(n).</Paragraph>
    <Paragraph position="2"> Output. The B-phrase parse list BL(1), BL(2), ... , BL(n).</Paragraph>
    <Paragraph position="3"> Method. Step 1: Find all the independent words which are the left-most subwords of b, using the independent word dictionary, and for each independent word w = b(1)b(2)···b(j), add (0,(W,H,K),α) to BL(j), where (W,H,K) is the word structure of w and α = (&lt;W,H&gt;g, &lt;W,H,-&gt;d). Then, set the control variable i to 1 and repeat Step 2 until i = n. Step 2: For each item (i',(W,H,K),α) in BL(i), find every suffix word w' = b(i+1)b(i+2)···b(j) in the suffix word dictionary whose word structure (W',H',K') satisfies C(H,K,H'), and add (i,(W',H',K'),(W',H')∘α) to BL(j); then increase i by 1. Here (W',H')∘α is a dependency information defined as follows.</Paragraph>
    <Paragraph position="4"> i) If H' is an auxiliary verb, then (W',H')∘α =def (&lt;α&gt;g, &lt;W',H',-&gt;d), where &lt;α&gt;g is the information of governor of α.</Paragraph>
    <Paragraph position="5"> ii) Let &lt;W'',H'',H'''&gt;d be the information of dependent of α. When H' is a particle, (W',H')∘α =def (&lt;α&gt;g, &lt;W'',H'',H'&gt;d) if the power of dependent function of H' is stronger than that of H''', and (W',H')∘α =def α otherwise.</Paragraph>
    <Paragraph position="6"> There exists an upper limit on the length of words, and there exists an upper limit on the number of dependency informations of all left partial B-phrases of b(1)b(2)···b(i). Therefore, there exists an upper limit on the necessary memory size of BL(i), and Theorem 1 follows.</Paragraph>
    <Paragraph position="7"> Theorem 1.</Paragraph>
    <Paragraph position="8"> Algorithm for the construction of BL requires O(n) memory space and O(n) elementary operations.</Paragraph>
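As a concrete (if much simplified) illustration of the BL construction, the sketch below treats both dictionaries as maps from spellings to (part of speech, inflexion) pairs and takes the connectability test C as a callable; the dependency-information bookkeeping (W',H')∘α is omitted for brevity. All function names, parameters and dictionary entries are our own, not the paper's:

```python
def build_BL(b, indep_dict, suffix_dict, cont, max_word_len=8):
    """B-phrase parse list for string b. BL[j] holds (i, word) items,
    meaning b[i:j] is the last word of an analysis of b[:j].
    indep_dict/suffix_dict map spellings to (pos, inflexion) pairs;
    cont(H, K, H2) says an (H, K) word may be followed by an H2 word."""
    n = len(b)
    BL = [set() for _ in range(n + 1)]
    # Step 1: every independent word that is a left-most subword of b.
    for j in range(1, min(n, max_word_len) + 1):
        w = b[:j]
        if w in indep_dict:
            BL[j].add((0, (w,) + indep_dict[w]))
    # Step 2: extend each analysis ending at position i by one suffix word.
    for i in range(1, n):
        for (_, (W, H, K)) in list(BL[i]):
            for j in range(i + 1, min(n, i + max_word_len) + 1):
                w = b[i:j]
                if w in suffix_dict and cont(H, K, suffix_dict[w][0]):
                    BL[j].add((i, (w,) + suffix_dict[w]))
    return BL

# Invented two-word example: independent stem "yom" plus suffix "u".
indep = {"yom": ("verb", "consonant-stem")}
suffix = {"u": ("auxiliary", "-")}
BL = build_BL("yomu", indep, suffix, lambda H, K, H2: True)
assert (3, ("u", "auxiliary", "-")) in BL[4]
```

Each position is processed once and holds boundedly many items under the word-length and ambiguity limits noted above, which is the intuition behind the O(n) bounds of Theorem 1.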
    <Paragraph position="9"> We shall now describe how to find a B-phrase structure of specified dependency information from BL. The method is given as follows.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ALGORITHM FOR OBTAINING A B-PHRASE
STRUCTURE OF AN INPUT STRING
</SectionTitle>
    <Paragraph position="0"> Input. The specified dependency information α and BL.</Paragraph>
    <Paragraph position="1"> Output. A B-phrase structure of dependency information α or the error signal &quot;error&quot;.</Paragraph>
    <Paragraph position="2"> Method. STEP 1: Search any item (i,(W,H,K),α) in BL(n) such that Termi(H,K). If there is no such item, then emit &quot;error&quot; and halt. Otherwise, output the word structure (W,H,K), set the register R to (i,(W,H,K),α) and repeat STEP 2 until i = 0.</Paragraph>
    <Paragraph position="3"> STEP 2: Let R be (i,(W,H,K),α).</Paragraph>
    <Paragraph position="4"> Search any item (i',(W',H',K'),α') in BL(i) such that C(H',K',H) and (W,H)∘α' = α. There exists at least one item which satisfies the above conditions. Output the word structure (W',H',K') and set R to (i',(W',H',K'),α').</Paragraph>
    <Paragraph position="5"> It is easy to see that Theorem 2 holds.</Paragraph>
    <Paragraph position="6"> Theorem 2.</Paragraph>
    <Paragraph position="7"> A B-phrase structure of the specified dependency information is output by the above algorithm if and only if the input string has at least one B-phrase structure of the specified dependency information, and it takes constant memory space and O(n) elementary operations to operate the above algorithm.</Paragraph>
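The readout behind Theorem 2 walks BL from right to left: start from an item in BL(n) that may end a B-phrase, then repeatedly pick an item in BL(i) that the current word can follow. The sketch below uses a simplified item form (i, (W, H, K)) and omits the dependency information and the (W,H)∘α' check; names are our own:

```python
def read_bphrase_structure(BL, n, termi, cont):
    """Read one word sequence back from a B-phrase parse list BL, where
    BL[j] holds (i, (W, H, K)) items: b[i:j] is the word (W, H, K).
    termi(H, K) tests whether (H, K) may end a B-phrase; cont tests
    connectability, as in the construction algorithm."""
    candidates = [it for it in BL[n] if termi(it[1][1], it[1][2])]
    if not candidates:
        return None                      # the "error" case of STEP 1
    i, (W, H, K) = sorted(candidates)[0]
    words = [(W, H, K)]
    while i != 0:
        # some predecessor must exist if BL was built consistently
        i, (W, H, K) = sorted(it for it in BL[i]
                              if cont(it[1][1], it[1][2], H))[0]
        words.append((W, H, K))
    return list(reversed(words))

# Hand-built BL for an invented two-word B-phrase "yom" + "u":
BL = [set(), set(), set(),
      {(0, ("yom", "verb", "consonant-stem"))},
      {(3, ("u", "auxiliary", "-"))}]
ws = read_bphrase_structure(BL, 4, lambda H, K: H == "auxiliary",
                            lambda H, K, H2: True)
assert ws == [("yom", "verb", "consonant-stem"), ("u", "auxiliary", "-")]
```

Each step moves the index i strictly left and inspects a boundedly sized list, which matches the constant-space, O(n)-time claim of Theorem 2.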
    <Paragraph position="8"> The set of all the dependency informations DI of the input string b is obtained from BL(n), since DI = {α | (i,(W,H,K),α) ∈ BL(n), Termi(H,K)}.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
DEPENDENCY PARSE LIST DL
</SectionTitle>
    <Paragraph position="0"> Let s be an input sentence of N B-phrases. The set DI(i) of all the dependency informations of the i-th B-phrase is obtained by operating the algorithm for the construction of BL on the string of the i-th B-phrase.</Paragraph>
    <Paragraph position="1"> The dependency parse list DL of s consists of N−1 minor lists DL(2), DL(3), ···, DL(N).</Paragraph>
    <Paragraph position="2"> \[\] of items in Form DL(i).</Paragraph>
    <Paragraph position="3"> (ai,J,aj,~,P) I (ai*J,aj, ,P) where, N~i &gt; j~l, aie DI(i), ajE DI(j), ce ~, P~ and $ is a specially introduced symbol.</Paragraph>
    <Paragraph position="5"> (αi,j,αj,c,P) ∈ DL(i) if and only if there is a dependency structure DS(i,1) of s, where αi = iDiDS(i,1), αj = jDiDS(i,1), (i,j,αi,αj,c) ∈ DS(i,1) and jGDS(i,1) = P.</Paragraph>
    <Paragraph position="7"> (αi,j,αj,$,P) ∈ DL(i) if and only if there is a dependency structure DS(i,1) of s, where αi = iDiDS(i,1), αj = jDiDS(i,1), j is a joint of DS(i,1) other than the 0-th or the 1st joint, and jGDS(i,1) = P.</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="18" type="metho">
    <SectionTitle>
ALGORITHM FOR THE CONSTRUCTION OF DL
</SectionTitle>
    <Paragraph position="0"> Input. The sequence of the sets of all dependency informations DI(1), DI(2), ···, DI(N).</Paragraph>
    <Paragraph position="1"> Output. The dependency parse list DL(2), DL(3), ···, DL(N).</Paragraph>
    <Paragraph position="2"> Method. STEP 1 (Construction of DL(2)): For each α2 ∈ DI(2), α1 ∈ DI(1) and c ∈ C such that c ∈ δ(α2,α1), add (α2,1,α1,c,{c}) to DL(2), set i to 2 and repeat STEP 2 and STEP 3 until i = N.</Paragraph>
    <Paragraph position="3"> STEP 2 (Registration of items of the form (αi+1,j,αj,c,P)): For any (αi,j,αj,c,P) ∈ DL(i) and αi+1 ∈ DI(i+1), compute δ(αi+1,αi) and add every (αi+1,i,αi,c',{c'}) to DL(i+1) such that c' ∈ δ(αi+1,αi). And, for any (αi,j,αj,A,P) ∈ DL(i) where A ∈ C ∪ {$} and αi+1 ∈ DI(i+1), compute δ(αi+1,αj) and add every (αi+1,j,αj,c',P ∪ {c'}) to DL(i+1) such that c' ∈ δ(αi+1,αj) and c' ∉ P. Go to STEP 3.</Paragraph>
    <Paragraph position="4"> STEP 3 (Registration of items of the form (αi+1,j,αj,$,P)): For any (αi+1,j,αj,c,P) ∈ DL(i+1) and (αj,k,αk,A,P') ∈ DL(j), add (αi+1,k,αk,$,P') to DL(i+1). Then, set i to i+1 and go to STEP 2.</Paragraph>
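Read procedurally, STEPs 1–3 fill DL position by position: each new B-phrase either attaches to its neighbour (opening a new relation) or attaches to an already-open joint (extending that joint's relation set, respecting condition iii), while STEP 3 keeps earlier joints reachable. The sketch below compresses this under the unambiguous-dependency-information assumption of Theorem 3 (one α per B-phrase), drops the αj component of items, and uses our own names throughout; it illustrates the bookkeeping rather than reproducing the paper's algorithm exactly:

```python
def build_DL(N, alpha, delta):
    """alpha[i]: the single dependency information of the i-th B-phrase
    (numbered right to left, 1 = sentence-final); delta(a, b): set of
    relation types from a dependent a to a governor b. DL[i] holds
    (j, A, P): j is a joint of some DS(i,1), P the relations j governs;
    A is the relation attaching i to j, or "$" for farther joints."""
    DL = {i: set() for i in range(2, N + 1)}
    # STEP 1: attach B-phrase 2 to B-phrase 1.
    for c in delta(alpha[2], alpha[1]):
        DL[2].add((1, c, frozenset([c])))
    for i in range(2, N):
        # STEP 2: attach i+1 to its neighbour i, or to an open joint j.
        for c in delta(alpha[i + 1], alpha[i]):
            DL[i + 1].add((i, c, frozenset([c])))
        for (j, A, P) in DL[i]:
            for c in delta(alpha[i + 1], alpha[j]):
                if c not in P:                 # condition iii: no repeated type
                    DL[i + 1].add((j, c, P | {c}))
        # STEP 3: joints to the right of the attachment point stay visible.
        for (j, A, P) in list(DL[i + 1]):
            if A != "$":
                for (k, A2, P2) in DL.get(j, set()):
                    DL[i + 1].add((k, "$", P2))
    return DL

# Invented example: "n-ga n-wo v" — both nouns attach to the final verb.
d = {("n-ga", "v"): {"agent"}, ("n-wo", "v"): {"patient"}}
DL = build_DL(3, {1: "v", 2: "n-wo", 3: "n-ga"},
              lambda a, b: d.get((a, b), set()))
assert (1, "agent", frozenset({"patient", "agent"})) in DL[3]
```

The sentence is accepted when DL(N) contains an item whose joint is the 1st B-phrase, mirroring the semantics of DL items given above.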
    <Paragraph position="5"> Theorem 3.</Paragraph>
    <Paragraph position="6"> If there exists no ambiguity in the dependency information of the B-phrases of the input sentence, then STEP 3 in the above algorithm can be replaced by the following STEP 3'.</Paragraph>
    <Paragraph position="7"> STEP 3': For each j such that some (αi+1,j,αj,A,P) ∈ DL(i+1), add every (αi+1,k,αk,$,P') to DL(i+1) such that (αj,k,αk,A',P') ∈ DL(j).</Paragraph>
    <Paragraph position="9"> Then, set i to i+1 and go to STEP 2.</Paragraph>
    <Paragraph position="10"> The efficiency of each step of the above algorithm is as follows.</Paragraph>
    <Paragraph position="11"> The memory size of DL(i) is O(N).</Paragraph>
    <Paragraph position="12"> STEP 1, STEP 2 and STEP 3 take constant, O(N) and O(N²) elementary operations, respectively.</Paragraph>
    <Paragraph position="1"> STEP 3' takes O(N) elementary operations, since it takes O(N) elementary operations to compute the items of DL(i+1). Therefore, Theorem 4 holds.</Paragraph>
    <Paragraph position="2"> Theorem 4.</Paragraph>
    <Paragraph position="3"> The algorithm for the construction of DL requires O(N²) memory space and O(N³) elementary operations. Moreover, if there exists no ambiguity in the dependency information of each B-phrase, the algorithm requires O(N²) elementary operations by replacing STEP 3 with STEP 3'. We shall now describe how to find a dependency structure of the input sentence from DL. To begin with, we shall explain the items of the partial dependency structure list PDSL.</Paragraph>
    <Paragraph position="4"> Form of items in PDSL: (i,j,αi#,αj#,P#), where N ≥ i &gt; j ≥ 1, αi# ∈ DI(i) ∪ {#}, αj# ∈ DI(j) ∪ {#}, P# is a subset of C or #, and # is a specially introduced symbol. Semantics of (i,j,αi#,αj#,P#): the item (i,j,αi#,αj#,P#) ∈ PDSL means that there is a dependency structure DS(i,j) ∈ Δ(i,j) such that the following conditions 1), 2) and 3) hold.</Paragraph>
    <Paragraph position="5"> 1) If αi# = αi (≠ #), then iDiDS(i,j) = αi. 2) If αj# = αj (≠ #), then jDiDS(i,j) = αj. 3) If P# = P (≠ #), then jGDS(i,j) = P.</Paragraph>
    <Paragraph position="6"> Therefore, (N,1,#,#,#) means that there is a dependency structure of the input sentence.</Paragraph>
  </Section>
</Paper>