<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1040"> <Title>A Deterministic Word Dependency Analyzer Enhanced With Preference Learning</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Methodology </SectionTitle> <Paragraph position="0"> Instead of building a word dependency corpus from scratch, we use the standard data set for comparison.</Paragraph> <Paragraph position="1"> That is, we use Penn Treebank's Wall Street Journal data (Marcus et al., 1993). Sections 02 through 21 are used as training data (about 40,000 sentences) and section 23 is used as test data (2,416 sentences). We converted them to word dependency data by using Collins' head rules (Collins, 1999).</Paragraph> <Paragraph position="2"> The proposed method uses the following procedures. null A base NP chunker: We implemented an SVM-based base NP chunker, which is a simplified version of Kudo's method (Kudo and Matsumoto, 2001). We use the 'one vs. all others' backward parsing method based on an 'IOB2' chunking scheme. By the chunking, each word is tagged as - B: Beginning of a base NP, - I: Other elements of a base NP.</Paragraph> <Paragraph position="3"> - O: Otherwise.</Paragraph> <Paragraph position="4"> Please see Kudo's paper for more details. A Root-Node Finder (RNF): We will describe this later.</Paragraph> <Paragraph position="5"> A Dependency Analyzer: It works just like Yamada's Dependency Analyzer.</Paragraph> <Paragraph position="6"> A PP-Attatchment Resolver (PPAR): This resolver improves the dependency accuracy of prepositions whose part-of-speech tags are IN or TO.</Paragraph> <Paragraph position="7"> The above procedures require a part-of-speech tagger. Here, we extract part-of-speech tags from the Collins parser's output (Collins, 1997) for section 23 instead of reinventing a tagger. According to the document, it is the output of Ratnaparkhi's tagger (Ratnaparkhi, 1996). Figure 2 shows the architecture of the system. PPAR's output is used to rewrite the output of the Dependency Analyzer.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Finding root nodes </SectionTitle> <Paragraph position="0"> When we use SVM, we regard root-node finding as a classification task: Root nodes are positive examples and other words are negative examples.</Paragraph> <Paragraph position="1"> For this classification, each word wi in a tagged sentence T = (w1=p1;::: ;wi=pi;::: ;wN=pN) is characterized by a set of features. Since the given POS tags are sometimes too specific, we introduce a rough part-of-speech qi defined as follows.</Paragraph> <Paragraph position="3"> Then, each word is characterized by the following features, and is encoded by a set of boolean variables. null The word itself wi, its POS tags pi and qi, and its base NP tag bi = B;I;O.</Paragraph> <Paragraph position="4"> We introduce boolean variables such as current word is John and current rough POS is J for each of these features.</Paragraph> <Paragraph position="5"> Previous word wi 1 and its tags, pi 1, qi 1, and bi 1.</Paragraph> <Paragraph position="6"> Next word wi+1 and its tags, pi+1, qi+1, and bi+1.</Paragraph> <Paragraph position="7"> The set of left words fw0;::: ;wi 1g, and their tags, fp0;::: ;pi 1g, fq0;::: ;qi 1g, and fb0;::: ;bi 1g. 
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.2 Dependency analysis </SectionTitle>
<Paragraph position="0"> Our Dependency Analyzer is similar to Yamada's analyzer (Yamada and Matsumoto, 2003). While scanning a tagged sentence T = (w1/p1, ..., wN/pN) backward from the end of the sentence, each word wi is classified into one of three categories: Left, Right, and Shift.</Paragraph>
<Paragraph position="2"> Right: Right means that wi directly modifies the right word wi+1 and that no word in T modifies wi. If wi is classified as Right, the analyzer removes wi from T and wi is registered as a left child of wi+1.</Paragraph>
<Paragraph position="3"> Left: Left means that wi directly modifies the left word wi-1 and that no word in T modifies wi. If wi is classified as Left, the analyzer removes wi from T and wi is registered as a right child of wi-1.</Paragraph>
<Paragraph position="4"> Shift: Shift means that wi is not next to its modificand or is modified by another word in T. If wi is classified as Shift, the analyzer does nothing for wi and moves to the left word wi-1.</Paragraph>
<Paragraph position="5"> This process is repeated until T is reduced to a single word (= the root node). Since this is a three-class problem, we use the 'one vs. rest' method. First, we train an SVM classifier for each class. Then, for each word in T, we compare the three values fLeft(x), fRight(x), and fShift(x). If fLeft(x) is the largest, the word is classified as Left.</Paragraph>
<Paragraph position="6"> However, Yamada's algorithm stops when all words in T are classified as Shift, even when T has two or more words. In such cases, the analyzer cannot generate complete dependency trees.</Paragraph>
<Paragraph position="7"> Here, we resolve this problem by reclassifying a word in T as Left or Right. This word is selected in terms of the differences between the SVM outputs:
Left(x) = fShift(x) - fLeft(x), Right(x) = fShift(x) - fRight(x).</Paragraph>
<Paragraph position="9"> These values are non-negative because fShift(x) was selected. For instance, Left(x) ≈ 0 means that fLeft(x) is almost equal to fShift(x). If Left(xk) gives the smallest value of these differences, the word corresponding to xk is reclassified as Left. If Right(xk) gives the smallest value, the word corresponding to xk is reclassified as Right. Then, we can resume the analysis.</Paragraph>
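To make the control flow concrete, here is a minimal Python sketch (not the paper's code) of the backward scan together with the deadlock-resolution step described above. The scoring functions f_left, f_right, and f_shift stand in for the three one-vs-rest SVM decision values and are assumed to be given; feature extraction is omitted.

# A minimal sketch of the backward Left/Right/Shift scan with deadlock
# resolution. f_left, f_right, f_shift are assumed callables that return
# SVM decision values for the word at position pos in the current T.

def analyze(words, f_left, f_right, f_shift):
    """Return (heads, root): heads maps a word index to its head's index."""
    heads = {}
    t = list(range(len(words)))                 # indices of words still in T
    while len(t) > 1:
        reduced = False
        scores = {}
        for pos in range(len(t) - 1, -1, -1):   # scan backward from the end
            s = {"left": f_left(words, t, pos),
                 "right": f_right(words, t, pos),
                 "shift": f_shift(words, t, pos)}
            scores[pos] = s
            action = max(s, key=s.get)
            i = t[pos]
            if action == "right" and pos + 1 < len(t):
                heads[i] = t[pos + 1]           # wi becomes a left child of wi+1
            elif action == "left" and pos > 0:
                heads[i] = t[pos - 1]           # wi becomes a right child of wi-1
            else:
                continue                        # Shift: leave wi in T
            t.pop(pos)
            reduced = True
        if not reduced:
            # Every word was classified as Shift: reclassify the word whose
            # Left or Right score is closest to its Shift score, then resume.
            candidates = [(p, "left") for p in scores if p > 0] + \
                         [(p, "right") for p in scores if p + 1 < len(t)]
            pos, side = min(candidates,
                            key=lambda c: scores[c[0]]["shift"] - scores[c[0]][c[1]])
            i = t[pos]
            heads[i] = t[pos - 1] if side == "left" else t[pos + 1]
            t.pop(pos)
    return heads, t[0]                          # the last remaining word is the root

Because the forced reclassification removes at least one word whenever a pass makes no progress, every iteration shrinks T, so the loop always terminates with a single remaining word, the root.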
<Paragraph position="11"> We use the following basic features for each word in a sentence.</Paragraph>
<Paragraph position="12"> The word itself wi and its tags pi, qi, and bi.
Whether wi is on the left of the root node or on the right (or is the root node itself). The root node is determined by the Root-Node Finder.</Paragraph>
<Paragraph position="13"> Whether wi is inside a quotation.</Paragraph>
<Paragraph position="14"> Whether wi is inside a pair of parentheses.</Paragraph>
<Paragraph position="15"> wi's left children {wi1, ..., wik}, which were removed by the Dependency Analyzer beforehand because they were classified as 'Right.' We use boolean variables such as 'one of the left children is Mary'.</Paragraph>
<Paragraph position="16"> Symmetrically, wi's right children {wi1, ..., wik} are also used.</Paragraph>
<Paragraph position="17"> However, the above features cover only local information. If wi is next to a very long base NP or a sequence of base NPs, wi cannot get information beyond those NPs. Therefore, we add the following features.</Paragraph>
<Paragraph position="18"> Li, Ri: Li is available when wi immediately follows a base NP sequence. Li is the word before the sequence. That is, the sentence looks like: ... Li <a base NP> wi ...</Paragraph>
<Paragraph position="19"> Ri is defined symmetrically.</Paragraph>
<Paragraph position="20"> The following features of neighbors are also used as wi's features.</Paragraph>
<Paragraph position="21"> The left words wi-3, ..., wi-1 and their basic features.
The right words wi+1, ..., wi+3 and their basic features.</Paragraph>
<Paragraph position="22"> The analyzer's outputs (Left/Right/Shift) for wi+1, ..., wi+3. (The analyzer runs backward from the end of T.)
If we train the SVM by using the whole data set at once, training takes too long. Therefore, we split the data into six groups: nouns, verbs, adjectives, prepositions, punctuation marks, and others.</Paragraph>
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.3 PP attachment </SectionTitle>
<Paragraph position="0"> Since we do not have phrase labels, we use all prepositions (except root nodes) as training data.</Paragraph>
<Paragraph position="1"> We use the following features for resolving PP attachment.
The preposition itself: wi.</Paragraph>
<Paragraph position="2"> The candidate modificand wj and its POS tag.</Paragraph>
<Paragraph position="3"> The left words (wi-2, wi-1) and their POS tags.</Paragraph>
<Paragraph position="4"> The right words (wi+1, wi+2) and their POS tags.</Paragraph>
<Paragraph position="5"> The previous preposition.</Paragraph>
<Paragraph position="6"> The last word of the following base NP and its POS tag (if any).</Paragraph>
<Paragraph position="7"> i - j, i.e., the number of words between wi and wj.</Paragraph>
<Paragraph position="8"> The number of commas between wi and wj.</Paragraph>
<Paragraph position="9"> The number of verbs between wi and wj.</Paragraph>
<Paragraph position="10"> The number of prepositions between wi and wj. The number of base NPs between wi and wj.</Paragraph>
<Paragraph position="11"> The number of conjunctions (CCs) between wi and wj.</Paragraph>
<Paragraph position="12"> The difference of quotation depths between wi and wj. If wi is not inside a quotation, its quotation depth is zero; if wj is inside a quotation, its quotation depth is one. Hence, their difference is one.</Paragraph>
<Paragraph position="13"> The difference of parenthesis depths between wi and wj.</Paragraph>
<Paragraph position="14"> For each preposition, we make a set of triplets {(yi, xi,1, xi,2)}, where yi is always +1, xi,1 corresponds to the correct word that is modified by the preposition, and xi,2 corresponds to another word in the sentence.</Paragraph>
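As with root-node finding, PPAR is trained with Preference Learning. The following Python sketch (hypothetical helper names, only a few of the pairwise features listed above) illustrates how features and preference triplets could be assembled for one preposition.

# Minimal sketch of PPAR training data for a single preposition.
# Helper names are hypothetical; only a few of the pairwise features
# listed above (distance, commas/verbs/prepositions in between) appear.

def pp_features(words, pos_tags, i, j):
    """Features describing preposition wi and candidate modificand wj."""
    lo, hi = sorted((i, j))
    between_tags = pos_tags[lo + 1:hi]
    return {
        f"prep={words[i]}",
        f"cand={words[j]}",
        f"cand_pos={pos_tags[j]}",
        f"distance={i - j}",
        f"commas_between={sum(w == ',' for w in words[lo + 1:hi])}",
        f"verbs_between={sum(t.startswith('VB') for t in between_tags)}",
        f"preps_between={sum(t in ('IN', 'TO') for t in between_tags)}",
    }

def pp_triplets(words, pos_tags, prep_index, correct_head):
    """Preference triplets (+1, x_correct, x_other) for one preposition."""
    x_correct = pp_features(words, pos_tags, prep_index, correct_head)
    return [(1, x_correct, pp_features(words, pos_tags, prep_index, j))
            for j in range(len(words)) if j not in (prep_index, correct_head)]

At test time, the candidate wj scoring highest under the learned preference function would be chosen for each preposition, and that choice would rewrite the Dependency Analyzer's output, as described at the beginning of Section 2.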
</Section>
</Section>
</Paper>