<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1002"> <Title>Linear-Time Dependency Analysis for Japanese</Title> <Section position="4" start_page="1" end_page="2" type="metho"> <SectionTitle> 3 Previous Work </SectionTitle> <Paragraph position="0"> We review previous work here, focusing mainly on time complexity. Dependency analysis has been studied for English as well as for Japanese (e.g., (Lafferty et al., 1992; Collins, 1996; Eisner, 1996)). The parsing algorithms in those papers require O(n³) time, where n is the number of words.</Paragraph> <Paragraph position="1"> In dependency analysis of Japanese it is very common to use probabilities of dependencies between each pair of bunsetsus in a sentence. Haruno et al. (1998) used decision trees to estimate the dependency probabilities. Fujio and Matsumoto (1998) applied a modified version of Collins' model (Collins, 1996) to Japanese dependency analysis. Both Haruno et al. and Fujio and Matsumoto used the CYK algorithm, which requires O(n³) time, where n is the sentence length, i.e., the number of bunsetsus. Sekine et al. (2000) used Maximum Entropy (ME) modeling for dependency probabilities and proposed a backward beam search to find the best parse. This beam search algorithm requires O(n²) time. Kudo and Matsumoto (2000) also used the same backward beam search, together with SVMs rather than ME.</Paragraph> <Paragraph position="2"> There are few statistical methods that do not use dependency probabilities between each pair of bunsetsus. Nivre (2003) proposed a deterministic algorithm for projective dependency parsing whose running time is linear; the algorithm has been evaluated on Swedish text. Sekine (2000) observed that 98.7% of head locations are covered by five candidates in a sentence. Maruyama and Ogino (1992) observed similar phenomena. Based on this observation, Sekine (2000) proposed an efficient analysis algorithm using deterministic finite state transducers. This algorithm, which considers only a limited number of bunsetsus in order to avoid an exhaustive search, takes O(n) time. However, his parser achieved an accuracy of 77.97% on the Kyoto University Corpus, which is considerably lower than the state-of-the-art accuracy of around 89%.</Paragraph> <Paragraph position="3"> Another interesting method that does not use dependency probabilities between each pair of bunsetsus is the cascaded chunking model of Kudo and Matsumoto (2002), based on the ideas in (Abney, 1991; Ratnaparkhi, 1997). They used the model with SVMs and achieved an accuracy of 89.29%, which is the best result on the Kyoto University Corpus. Although the number of dependencies estimated during parsing is significantly smaller than in either CYK or the backward beam search, the upper bound of the time complexity is still O(n²).</Paragraph> <Paragraph position="4"> Thus, it has remained an open question how to analyze dependencies for Japanese in linear time with state-of-the-art accuracy.
The algorithm described below gives an answer to this question.</Paragraph> </Section> <Section position="8" start_page="2" end_page="3" type="metho"> <SectionTitle> 4 Algorithm </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 4.1 Algorithm to Parse a Sentence </SectionTitle> <Paragraph position="0"> The pseudo code of our dependency analysis algorithm is shown in Figure 1. The algorithm can be used with any estimator that decides whether one bunsetsu modifies another. A trainable classifier, such as an SVM or a decision tree, is a typical choice for the estimator. We assume that we have some classifier that estimates the dependency between two bunsetsus in a sentence and that the time complexity of the classifier is not affected by the sentence length.</Paragraph> <Paragraph position="1"> Apart from the estimator, the only variables used for parsing are two data structures, one for input and one for output. The former is a stack that keeps the IDs of the modifier bunsetsus to be checked. The latter is an array of integers that stores the head IDs that have already been analyzed.</Paragraph> <Paragraph position="2"> Figure 1:

// Input:  N: the number of bunsetsus in a sentence.
//         w[]: an array that keeps a sequence of bunsetsus in the sentence.
// Output: outdep[]: an integer array that stores the analysis result, i.e., dependencies between
//         the bunsetsus. For example, the head of w[j] is outdep[j].
//
// stack: a stack that holds IDs of modifier bunsetsus in the sentence. If it is empty, the pop
//        method returns EMPTY (-1).
// function estimate_dependency(j, i, w[]):
//        a function that returns non-zero when the j-th bunsetsu should
//        modify the i-th bunsetsu. Otherwise returns zero.

function analyze(w[], N, outdep[])
  stack.push(0);
  for (int i = 1; i < N; i++) {    // Variable i for a head and j for a modifier.
    int j = stack.pop();           // Pop a value off the stack.
    // i == N - 1 means the i-th bunsetsu is the last one, i.e., the
    // rightmost one in the sentence; every remaining bunsetsu modifies it.
    while (j != EMPTY && (i == N - 1 || estimate_dependency(j, i, w))) {
      outdep[j] = i;
      j = stack.pop();
    }
    if (j != EMPTY) stack.push(j);
    stack.push(i);
  }
</Paragraph> <Paragraph position="3"> Figure 2:

// indep[]: an integer array that holds the correct dependencies given in a training corpus.
//
// function estimate_dependency(j, i, w[], indep[]):
//        a function that returns non-zero if indep[j] == i, otherwise returns zero.
//        It also prints a feature vector (i.e., an encoded example) with a label which is decided to be
//        1 (modify) or -1 (not modify) depending on whether the j-th bunsetsu modifies the i-th.

function generate_examples(w[], N, indep[])
  stack.push(0);
  for (int i = 1; i < N; i++) {
    int j = stack.pop();
    while (j != EMPTY && (i == N - 1 || estimate_dependency(j, i, w, indep))) {
      j = stack.pop();
    }
    if (j != EMPTY) stack.push(j);
    stack.push(i);
  }
</Paragraph>
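<Paragraph> To make Figure 1 concrete before walking through an example, here is a minimal executable sketch in Python. It is our illustration rather than the paper's code: the bunsetsu array is treated as an opaque list, EMPTY follows the figure's convention, and estimate_dependency is any caller-supplied callable that returns a truthy value when the j-th bunsetsu should modify the i-th.

# Minimal Python sketch of analyze() from Figure 1 (illustrative only).
EMPTY = -1

def analyze(w, n, estimate_dependency):
    outdep = [EMPTY] * n           # outdep[j] will hold the head ID of w[j].
    stack = [0]                    # Push the first bunsetsu ID before the loop.
    for i in range(1, n):          # i: candidate head, j: candidate modifier.
        j = stack.pop() if stack else EMPTY
        # At i == n - 1 the classifier is skipped: the last bunsetsu is the
        # head of everything still on the stack.
        while j != EMPTY and (i == n - 1 or estimate_dependency(j, i, w)):
            outdep[j] = i                       # Record the dependency j -> i.
            j = stack.pop() if stack else EMPTY
        if j != EMPTY:
            stack.append(j)        # j's head lies further to the right; keep it.
        stack.append(i)
    return outdep

# Oracle run on the Figure 3 sentence "Ken-ga kanojo-ni ano hon-wo age-ta"
# (gold heads 4, 4, 3, 4; the last bunsetsu has no head):
gold = {0: 4, 1: 4, 2: 3, 3: 4}
print(analyze(["Ken-ga", "kanojo-ni", "ano", "hon-wo", "age-ta"], 5,
              lambda j, i, w: gold.get(j) == i))   # -> [4, 4, 3, 4, -1]

The oracle run reproduces exactly the walkthrough that follows.</Paragraph>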
<Paragraph position="4"> Following the presented algorithm, let us parse the sample sentence in Figure 3. For explanation, we assume here that we have a perfect classifier as estimate_dependency() in Figure 1, which returns the correct decision for the sample sentence.</Paragraph> <Paragraph position="5"> First, we push 0 (Ken-ga) on the stack for the bunsetsu ID at the top of the sentence. After this initialization, let us see how the analysis proceeds at each iteration of the for loop. At the first iteration we check the dependency between the zero-th bunsetsu and the 1st (kanojo-ni). We push 0 and 1 because the zero-th bunsetsu does not modify the 1st. Note that the bottom of the stack is 0 rather than 1: smaller IDs are always stored at lower levels of the stack. Because of this, we never break the non-crossing constraint (C3 in Section 2.1).</Paragraph> <Paragraph position="6"> At the second iteration we pop 1 off the stack and check the dependency between the 1st bunsetsu and the 2nd (ano). Since the 1st does not modify the 2nd, we again push 1 and 2.</Paragraph> <Paragraph position="7"> At the third iteration we pop 2 off the stack and check the dependency between the 2nd and the 3rd (hon-wo). Since the 2nd modifies the 3rd, the dependency is stored in outdep[]. The value of outdep[j] represents the head of the j-th bunsetsu; for example, outdep[2] = 3 means that the head of the 2nd bunsetsu is the 3rd. Then we pop 1 off the stack and check the dependency between the 1st and the 3rd. We push 1 again since the 1st does not modify the 3rd. After that, we push 3 on the stack. The stack now holds 3, 1, and 0 in top-to-bottom order.</Paragraph> <Paragraph position="8"> At the fourth iteration we pop 3 off the stack. We do not have to check the dependency between the 3rd and the 4th (age-ta) because the 4th bunsetsu is the last bunsetsu in the sentence, so we simply set outdep[3] = 4. Next, we pop 1 off the stack. In this case, too, we do not have to check the dependency between the 1st and the 4th. Similarly, the zero-th bunsetsu modifies the 4th. As a result we set outdep[1] = 4 and outdep[0] = 4. Now the stack is empty and we leave the analysis function. Finally, we have obtained the dependency structure through the array outdep[].</Paragraph> </Section> <Section position="2" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 4.2 Time Complexity </SectionTitle> <Paragraph position="0"> At first glance, the upper bound of the time complexity of this algorithm seems to be O(n²) because it involves a double loop; however, it is not. We show that the upper bound is O(n) by considering how many times the condition part of the while loop in Figure 1 is executed. The condition part of the while loop fails N - 2 times because the outer for loop is executed from 1 to N - 1. On the other hand, the same condition part succeeds N - 1 times because outdep[j] = i is executed N - 1 times. For each bunsetsu ID j, outdep[j] = i is executed exactly once: once j = stack.pop() has removed j and outdep[j] has been set, j is never pushed onto the stack again. That is, the body of the while loop is executed at most N - 1 times, which is equal to the number of bunsetsus other than the last one. Therefore the total number of executions of the condition part of the while loop is 2N - 3, obtained by summing N - 2 and N - 1. This means that the upper bound of the time complexity is O(n).</Paragraph> </Section> <Section position="3" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 4.3 Algorithm to Generate Training Examples </SectionTitle> <Paragraph position="0"> When we prepare training examples for the trainable classifier used with this algorithm, we use the algorithm shown in Figure 2. It is almost the same as the algorithm for analysis in Figure 1. The differences are that we give the correct dependencies to estimate_dependency() through indep[], and that we obviously do not have to store the head IDs in outdep[].</Paragraph>
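<Paragraph> A matching Python sketch of Figure 2, under the same assumptions as the analyze() sketch above; emit is a hypothetical callback standing in for whatever routine writes out the encoded feature vector with its label.

EMPTY = -1

def generate_examples(w, n, indep, emit):
    # Replay the parse using the gold heads in indep[] and emit one training
    # example per classifier decision, labeled +1 (modify) or -1 (not modify).
    stack = [0]
    for i in range(1, n):
        j = stack.pop() if stack else EMPTY
        while j != EMPTY and (i == n - 1 or indep[j] == i):
            if i != n - 1:
                emit(+1, j, i, w)   # Gold dependency: j modifies i.
            j = stack.pop() if stack else EMPTY
        if j != EMPTY:
            emit(-1, j, i, w)       # Gold says j does not modify i here.
            stack.append(j)
        stack.append(i)

Since emit fires exactly once per simulated classifier call, wrapping it with a counter is a quick empirical check of the 2N - 3 bound from Section 4.2.</Paragraph>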
</Section> <Section position="4" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 4.4 Summary and Theoretical Comparison with Related Work </SectionTitle> <Paragraph position="0"> The algorithm presented here has the following features: F1. It is independent of specific machine learning methodologies; any trainable classifier can be used.</Paragraph> <Paragraph position="1"> F2. It scans a sentence just once, in a left-to-right manner.</Paragraph> <Paragraph position="2"> F3. The upper bound of the time complexity is O(n). The number of classifier calls, which are the most time-consuming part, is at most 2N - 3.</Paragraph> <Paragraph position="3"> F4. The flow and the data structures used are very simple, so the algorithm is easy to implement.</Paragraph> <Paragraph position="4"> One of the most closely related models is the cascaded chunking model of Kudo and Matsumoto (2002). Their model and our algorithm share many features, including F1. The big difference between theirs and ours is how many times the input sentence has to be scanned (F2). With their model the sentence has to be scanned several times, which leads to some computational inefficiency: in the worst case O(n²) computation is required. Our strict left-to-right parsing is also more suitable for practical applications such as real-time speech recognition. In addition, the flow and the data structures are much simpler (F4) than those of the cascaded chunking model, where an array of chunk tags is used and must be updated while scanning the sentence several times.</Paragraph> <Paragraph position="5"> Our parsing method can be considered one of the simplest forms of shift-reduce parsing. The difference from typical uses of shift-reduce parsing is that we do not need several types of actions and that only the top of the stack is inspected. The reason for this simplicity is that Japanese obeys the C2 constraint (Sec. 2.1) and the target task is dependency analysis rather than CFG parsing.</Paragraph> </Section> </Section> <Section position="9" start_page="3" end_page="4" type="metho"> <SectionTitle> 5 Models for Estimating Dependency </SectionTitle> <Paragraph position="0"> In order to evaluate the proposed algorithm empirically, we use SVMs (Vapnik, 1995) to estimate dependencies between two bunsetsus because of their excellent properties. One of these is that combinations of features in an example are automatically considered with polynomial kernels. Excellent performance has been reported for many classification tasks. See (Vapnik, 1995) for a formal description of SVMs.</Paragraph> <Paragraph position="1"> Kudo and Matsumoto (2002) give a more comprehensive comparison with the probabilistic models used in (Uchimoto et al., 1999).</Paragraph> <Paragraph position="2"> At estimate_dependency() in Figure 1, we encode an example with the features described below. Then we give it to the SVM and receive the estimated decision as to whether one bunsetsu modifies the other.</Paragraph>
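<Paragraph> As an illustration of this interface, one could wrap a trained SVM as the estimate_dependency callback roughly as follows. The sketch uses scikit-learn's SVC, which is our choice for illustration and not the paper's tooling (the paper predates it), together with a hypothetical encode(j, i, w) that builds the feature vector of Sections 5.1-5.4.

from sklearn.svm import SVC

# Polynomial kernel so that feature combinations are considered
# automatically, as described above; the degree is chosen arbitrarily here.
svm = SVC(kernel="poly", degree=3)
# X: list of encoded feature vectors, y: labels in {+1, -1}, both produced
# by generate_examples() over a training corpus.
# svm.fit(X, y)

def make_estimator(svm, encode):
    def estimate_dependency(j, i, w):
        x = encode(j, i, w)               # Encode the candidate pair (j, i).
        return svm.predict([x])[0] == 1   # Non-zero iff "modify" is predicted.
    return estimate_dependency
</Paragraph>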
<Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 5.1 Standard Features </SectionTitle> <Paragraph position="0"> By the &quot;standard features&quot; here we mean the feature set commonly used in (Uchimoto et al., 1999; Sekine et al., 2000; Kudo and Matsumoto, 2000; Kudo and Matsumoto, 2002). We employ the features below for each bunsetsu:

1. Rightmost content word: major POS, minor POS, conjugation type, conjugation form, surface form (lexicalized form)
2. Rightmost function word: major POS, minor POS, conjugation type, conjugation form, surface form (lexicalized form)
3. Punctuation (periods and commas)
4. Open and close parentheses
5. Location: at the beginning or at the end of the sentence

In addition, features describing the gap between two bunsetsus are also used. They include distance, particles, parentheses, and punctuation.</Paragraph> </Section> <Section position="2" start_page="3" end_page="4" type="sub_section"> <SectionTitle> 5.2 Local Contexts of the Current Bunsetsus </SectionTitle> <Paragraph position="0"> The local contexts of a modifier and its possible head would be useful because they may represent fixed expressions, case frames, or other collocational relations. Assume that the j-th bunsetsu is a modifier and the i-th one is a possible head. We consider three bunsetsus in the local contexts of the j-th and the i-th: the (j - 1)-th bunsetsu if it modifies the j-th, the (i - 1)-th one, and the (i + 1)-th one. Note that in our algorithm the (i - 1)-th bunsetsu always modifies the i-th when we check the dependency between the j-th bunsetsu and the i-th, where j < i - 1. In order to keep the data structures of the proposed algorithm simple, we did not consider bunsetsus more distant from the j-th and the i-th. It is easy to check through outdep[] whether the (j - 1)-th bunsetsu modifies the j-th one. Note that this use of local contexts is similar to the dynamic features in (Kudo and Matsumoto, 2002). Their model extracts three types of dynamic features: from modifiers of the j-th bunsetsu (Type B), from modifiers of the i-th bunsetsu (Type A), and from heads of the i-th bunsetsu (Type C). Since in our proposed algorithm the analysis proceeds in a left-to-right manner, we would have to use stacking (Wolpert, 1992) or other techniques to employ the Type C features.</Paragraph> </Section> <Section position="3" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 5.3 Richer Features Inside a Bunsetsu </SectionTitle> <Paragraph position="0"> With the standard features we will miss some case particles if a bunsetsu has two or more function words. Suppose that a bunsetsu has a topic marker as well as a case particle. In this case the case particle is followed by the topic marker, and we miss the case particle because the standard features use only the rightmost function word. In order to capture this information, we also use as features all the particles in each bunsetsu.</Paragraph> <Paragraph position="1"> Other important features missing from the standard set are those of the leftmost word of a possible head bunsetsu, which often has a strong association, e.g., in an idiomatic fixed expression, with the rightmost word of its modifier. Furthermore, we use as a feature the surface form of the leftmost word of the bunsetsu that follows a possible head. This feature is used together with those in Section 5.2.</Paragraph>
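<Paragraph> To ground the feature descriptions of Sections 5.1-5.3, here is a hypothetical fragment of such an encoder in Python; the Word and Bunsetsu containers, the feature names, and the distance buckets are our assumptions for illustration, not the paper's specification.

from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    surface: str
    pos: str                   # Major POS; minor POS etc. omitted for brevity.
    is_particle: bool = False

@dataclass
class Bunsetsu:
    words: List[Word]

def pair_features(j, i, w, outdep):
    # Sparse binary features for candidate modifier j and candidate head i;
    # w is a list of Bunsetsu, outdep the partial analysis built so far.
    mod, head = w[j], w[i]
    feats = set()
    # 5.1 (abridged): rightmost word of the modifier bunsetsu.
    feats.add("mod_rightmost=" + mod.words[-1].surface)
    # 5.3: the leftmost word of the possible head, which may form an
    # idiomatic pair with the modifier's rightmost word.
    feats.add("head_leftmost=" + head.words[0].surface)
    # 5.3: all particles in the modifier, so a case particle followed by a
    # topic marker is not lost.
    for word in mod.words:
        if word.is_particle:
            feats.add("mod_particle=" + word.surface)
    # 5.2: local context read off the partial analysis in outdep[],
    # e.g. whether the (j - 1)-th bunsetsu modifies the j-th.
    if j > 0 and outdep[j - 1] == j:
        feats.add("left_of_mod_modifies_mod")
    # Gap feature (5.1): bucketed distance between the two bunsetsus.
    d = i - j
    feats.add("dist=" + ("1" if d == 1 else "2-5" if d <= 5 else "6+"))
    return feats
</Paragraph>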
</Section> <Section position="4" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 5.4 Features for Conjunctive Structures </SectionTitle> <Paragraph position="0"> Detecting conjunctive structures is one of the hard tasks in parsing long sentences correctly. Kurohashi and Nagao (1994) proposed a method that detects conjunctive structures by calculating similarity scores between two sequences of bunsetsus.</Paragraph> <Paragraph position="1"> So far, few attempts have been made to explore features for detecting conjunctive structures. As a first step, we tried two preliminary features for conjunctive structures, both triggered when the current modifier bunsetsu is a distinctive key bunsetsu (Kurohashi and Nagao, 1994, page 510). One is a feature that is activated whenever a modifier bunsetsu is a distinctive key bunsetsu. The other is a feature that is activated when a modifier is a distinctive key bunsetsu and the content words of the modifier and its possible head are equal to each other. For simplicity, we limit the POS of these content words to nouns.</Paragraph> </Section> </Section> </Paper>