<?xml version="1.0" standalone="yes"?> <Paper uid="I05-2044"> <Title>Two-Phase Shift-Reduce Deterministic Dependency Parser of Chinese</Title> <Section position="3" start_page="0" end_page="256" type="metho"> <SectionTitle> 2 Overview of Related Works </SectionTitle> <Paragraph position="0"> Most natural language grammars tend to assign many possible syntactic structures to the same input utterance. A parser should output a single analysis for each sentence. The task of selecting a single analysis for a given sentence is known as disambiguation.</Paragraph> <Paragraph position="1"> Some parsing strategies first produce all possible trees for a sentence; the disambiguation work is then done at the end by searching the parse forest for the most probable tree. Statistical parsers employ probability as a disambiguation measure and output the tree with the highest probability[4,5]. However, in the work of Collins [6], 42% of the correct parse trees were not in the candidate pool of ~30-best parses. Disambiguation by searching the whole parse forest therefore has limitations. The alternative is to disambiguate at each parsing step and output the parsing result deterministically. Nivre[2] and Yamada[3] suggest a shift-reduce-like dependency parsing strategy. In section 3.1 we give a detailed analysis of their approach.</Paragraph> <Paragraph position="2"> There are several approaches to dependency parsing of Chinese text; Ma[5] and Cheng[18] are examples. The training and test sets Ma[5] used are not sufficient to prove the reliability of the approach. In the framework of parsing Chinese with CFGs, there are several approaches that apply original English parsing strategies to Chinese [7,8,9]. The underlying purpose of these works is to take advantage of state-of-the-art English parsing strategies and to find a way to apply them to Chinese text. 
Due to the differences between Chinese and English, the performance of these systems on Chinese is about 10% lower than that of the original systems.</Paragraph> </Section> <Section position="4" start_page="256" end_page="258" type="metho"> <SectionTitle> 3 Two-Phase Dependency Parsing </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="256" end_page="257" type="sub_section"> <SectionTitle> 3.1 Review of Previous Shift-Reduce Dependency Parsers </SectionTitle> <Paragraph position="0"> Nivre[3] presented a shift-reduce dependency parsing algorithm which can parse in linear time.</Paragraph> <Paragraph position="1"> Nivre's parser is represented by a triple <S, I, A>, where S is a stack, I is a list of (remaining) input tokens, and A is the set of determined dependency relations. Nivre defined four transitions: Left-Arc, Right-Arc, Reduce, and Shift. If there is a dependency relation between the top word of the stack and the input word, the transition is either Left-Arc or Right-Arc, according to the direction of the dependency arc. Otherwise, the transition is either Shift or Reduce: if the head of the top word of the stack is already determined, the transition is Reduce; otherwise it is Shift. The action of each transition is shown in Fig.1. For details, please refer to Nivre[3,10].</Paragraph> <Paragraph position="2"> Fig.2 gives an example of parsing a Chinese sentence using Nivre's algorithm.</Paragraph> <Paragraph position="3"> Nivre's[3,10] approach has several advantages. First, the dependency structure produced by the algorithm is projective and acyclic[3].</Paragraph> <Paragraph position="4"> Second, the algorithm performs very well in deciding short-distance dependencies. Third, at each parsing step, all of the dependency relations on the left side of the input word are determined. 
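Concretely, the four transitions just summarized can be sketched as follows (a minimal illustrative Python rendering, not part of the original paper; the real parser chooses transitions with a trained guide, so a gold transition sequence is supplied here purely for demonstration):

```python
# Minimal sketch of Nivre's transition system over <S, I, A>:
# S = stack of word indices, I = remaining input, A = set of (head, dependent) arcs.
def parse(words, transitions):
    stack, buf, arcs = [], list(range(len(words))), set()
    head = {}                          # dependent index -> head index
    for t in transitions:
        if t == "LA":                  # Left-Arc: input is the head of the stack top
            dep = stack.pop()
            head[dep] = buf[0]
            arcs.add((buf[0], dep))
        elif t == "RA":                # Right-Arc: stack top is the head of the input;
            dep = buf[0]               # the input word is then pushed onto the stack
            head[dep] = stack[-1]
            arcs.add((stack[-1], dep))
            stack.append(buf.pop(0))
        elif t == "RE":                # Reduce: pop the stack top (its head must be set)
            assert stack[-1] in head, "Reduce requires an attached stack top"
            stack.pop()
        elif t == "SH":                # Shift: push the input word onto the stack
            stack.append(buf.pop(0))
    return arcs
```

For a three-word sequence A B C, the transition sequence Shift, Left-Arc, Shift, Right-Arc yields the arcs (B, A) and (B, C), i.e., B heads both neighbours.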
Also, as the author emphasizes, the time complexity is linear.</Paragraph> <Paragraph position="5"> However, a wrong Reduce decision, such as an early Reduce, causes the word at the top of the stack to lose the chance to become the head of following words. As a result, the dependents of such a word will be assigned a wrong head or no head at all.</Paragraph> <Paragraph position="6"> The parsing steps of a Chinese sentence using Nivre's[3] algorithm are given in Fig.2. At step 5 of Fig.2, after Reduce, the top of the stack was popped. The algorithm gives no further chance for the word Kuo Da to become the head of other words.</Paragraph> <Paragraph position="7"> Therefore, the word 'Yin Zi ' cannot have the word 'Kuo Da ' as its head. In the final dependency tree of example-1 in Fig.2, the arc from Ji Hua to Yin Zi is wrong. Fig.3 gives the correct dependency tree.</Paragraph> <Paragraph position="8"> Here, Kuo Da is the head of the word Yin Zi .</Paragraph> <Paragraph position="9"> All the example sentences are from CTB.</Paragraph> <Paragraph position="10"> [Fig.1 excerpt: if there is a dependency relation between top.stack and input, and the relation is Left-arc, insert the (input, top.stack) pair into set A; configurations are shown as columns stack, input, relation set A. Fig.2 example sentence: Zhe Ge Sheng Ji Hua Kuo Da Zhao Shang Yin Zi (gloss: this province plan extend attract-merchants attract-investments; 'The province plans to expand attracting merchants and investments.')] As the final dependency tree in Fig.4 shows, there is no head for the word Xiao Xi . After step 5, the top of the stack is the word Gei and the input word is [?] . There is no dependency relation between these two words. Since the head of the word Gei was already determined in step 2, the next transition is R(educe). As a result, the word Gei loses the chance to be the head of the word Xiao Xi . So, there is no head assigned to the word Xiao Xi in Fig.4. 
Therefore, Nivre's algorithm introduces errors in determining right-side dependents.</Paragraph> <Paragraph position="11"> Yamada's algorithm defines three actions: left, right, and shift, which are similar to Nivre's. Yamada parses a sentence by scanning it word by word from left to right, deciding a left, right, or shift action at each point. Short dependencies are handled easily; for long dependencies, Yamada tries to find the heads by increasing the number of scanning iterations over the sentence. As Yamada pointed out, the 'shift' transition is executed for two kinds of structure, which may cause wrong transition decisions. Yamada tried to resolve this by looking ahead for more information on the right side of the target word.</Paragraph> <Paragraph position="12"> [Fig.4 example sentence: Zhuan Da Gei Jiao Shi Men [?] Jian Ling Ren Xin Xi De Xiao Xi (gloss: declare to teachers a piece exciting of news).</Paragraph> <Paragraph position="13"> Translation: 'Declare a piece of exciting news to teachers.']</Paragraph> <Paragraph position="15"> [Fig.5 example sentence: Bao Gao Liao Er Bai Ge Yin Jin Wai Guo Tou Zi De Ji Hua (gloss: report _ 200 attract foreign-country investment of plan; translation: 'Report 200 plans for attracting foreign investment.')]</Paragraph> <Paragraph position="16"> [Fig.5, step-i : RA < Bao Gao , Yin Jin Wai Guo Tou Zi De Ji Hua , {( Bao Gao , Liao )} >] When applied to Chinese parsing, the determination of the dependency relation between two verbs is not effective. In example-3 of Fig.5, at step-i the parser decides whether the dependency relation between Bao Gao and Yin Jin is Left-arc or Right-arc. The actual head of the verb Yin Jin is De , which is distant. By looking only two or three words ahead on the right, deciding the dependency relation between these verbs at this moment is not reliable. 
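The iterative-scanning idea can be sketched as follows (a simplified, hypothetical Python rendering, not Yamada's actual formulation: the SVM action classifier is replaced by an oracle that consults a gold head table, and a word is only attached once all of its own dependents have been collected):

```python
# Schematic sketch of parsing by repeated left-to-right scans over
# adjacent word pairs, in the spirit of Yamada's left/right/shift actions.
def parse_by_scanning(heads):
    """heads[i] = gold head index of word i (-1 for the root). Returns the arcs found."""
    nodes = list(range(len(heads)))                      # words still unattached
    need = {i: sum(1 for h in heads if h == i) for i in nodes}  # unattached children
    arcs = set()
    changed = True
    while changed and len(nodes) > 1:                    # extra passes catch long arcs
        changed = False
        i = 0
        while i + 1 < len(nodes):
            l, r = nodes[i], nodes[i + 1]
            if heads[r] == l and need[r] == 0:           # r depends on l, r is complete
                arcs.add((l, r)); need[l] -= 1
                del nodes[i + 1]; changed = True
            elif heads[l] == r and need[l] == 0:         # l depends on r, l is complete
                arcs.add((r, l)); need[r] -= 1
                del nodes[i]; changed = True
            else:
                i += 1                                   # shift: move the focus right
    return arcs
```

For heads = [2, 2, -1, 2], the arc from word 2 to word 0 is only found on the second pass, which illustrates why longer dependencies require additional scanning iterations.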
Yamada's algorithm is not a clear solution for determining right-side dependents either.</Paragraph> </Section> <Section position="2" start_page="257" end_page="258" type="sub_section"> <SectionTitle> 3.2 Two-Phase Dependency Parsing </SectionTitle> <Paragraph position="0"> For head-final languages like Korean or Japanese, Nivre's[3] and Yamada's[4] approaches are effective. However, when applied to Chinese text, the existing methods cannot correctly detect the various kinds of right-side dependents involving verbs. Wrong Reduce decisions mainly occur when the right dependent of a verb is itself a verb, which may have right-side dependents of its own.</Paragraph> <Paragraph position="1"> For the correct detection of right-side dependents, we divide the parsing procedure into two phases. Phase I detects the left-side dependents and the right-side nominal dependents.</Paragraph> <Paragraph position="2"> Although some nominal dependents are on the right side, they have no dependents of their own on the right side, and so cause none of the ambiguities related to right-side dependents. In Phase II, the detection of right-side verbal dependents is performed. For Phase I, we define three transitions: Shift, Left-Arc, and Right-Arc. The actions of the Shift and Left-Arc transitions are the same as Nivre[3] defines. However, in our method, the Right-Arc transition does not push the input token onto the stack. The original purpose of pushing the input onto the stack after Right-Arc is to give the input a chance to be a potential head of following words. In Chinese, only verbs and prepositions have right-side dependents. For other POS categories, pushing onto the stack is pointless. When the input word is a preposition, the ambiguities we describe do not arise. 
Only words belonging to the verbal categories may cause problems.</Paragraph> <Paragraph position="3"> The method that we use is as follows. When the top word of the stack and the next input word are both verbs, with tags like VV, VE, VC or VA [11], the detection of the dependency relation between these two verbs is delayed by a shift transition. To differentiate this shift from the original shift, we call it verbal-shift. The determination of the dependency relation between the two verbs is postponed until Phase II. The transitions are summarized in Fig.6.</Paragraph> <Paragraph position="4"> If there is no more input, Phase I terminates. The output of Phase I is a stack that contains verbs in reverse order of their original appearance in the sentence. (VV, VE, VC and VA are Penn Chinese Treebank POS categories related to verbs; for details, please refer to [11].) Each verb in the stack may have its partial dependents, which are determined in Phase I.</Paragraph> <Paragraph position="5"> If the action is Verbal-shift: push the input onto the stack.
Else if the action is Shift: push the input onto the stack.
Else if the action is Left-arc: set the dependency relation for the two words; pop the top of the stack.
Else if the action is Right-arc: set the dependency relation for the two words.
Fig. 6. Types of transitions in Phase I. The type of transition is determined by the top word of the stack, the input word, and their context. Most of the previous parsing models[4,12,13] use lexical words as features. Compared to the Penn English Treebank, the size of the Penn Chinese Treebank (version 4.0, abbreviated as CTB) is rather small. Considering the data sparseness problem, we use POS tags instead of the lexical words themselves. As Fig.7 shows, the window for feature extraction covers the top word of the stack, the input word, the previous word of the top of the stack, and the next word of the input. The nearest left-side dependent of each of these is also taken into consideration. 
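Putting the Phase I transitions together, the loop can be sketched as follows (an illustrative Python rendering, not the paper's implementation: the classifier over the feature window is replaced by an oracle over a gold head table, and VERB_TAGS lists the CTB verbal categories named above):

```python
# Sketch of the Phase I loop: Right-Arc does NOT push the input onto the
# stack, and adjacent verb-verb pairs trigger verbal-shift, postponing
# the decision to Phase II.
VERB_TAGS = {"VV", "VE", "VC", "VA"}

def phase1(tags, heads):
    """Return (remaining stack, arcs found). heads[i] = gold head, -1 = root."""
    stack, arcs = [], set()
    buf = list(range(len(tags)))
    while buf:
        w = buf[0]
        if stack and tags[stack[-1]] in VERB_TAGS and tags[w] in VERB_TAGS:
            stack.append(buf.pop(0))        # verbal-shift: delay verb-verb decision
        elif stack and heads[stack[-1]] == w:
            arcs.add((w, stack.pop()))      # Left-Arc: input heads the stack top
        elif stack and heads[w] == stack[-1]:
            arcs.add((stack[-1], w))        # Right-Arc: unlike Nivre, do not push
            buf.pop(0)
        else:
            stack.append(buf.pop(0))        # Shift
    return stack, arcs
```

When two adjacent verbs meet, only verbal-shift applies, so after Phase I the stack holds the verbs in reverse order with the verb-verb attachments still undecided, which is exactly the input Phase II expects.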
In addition, we use two more features: if_adjoin and punc. The feature vector for Phase I is shown in Fig.7.</Paragraph> <Paragraph position="6"> After Phase I, only verbs remain in the stack. In Phase II, we determine the right-side verbal dependents. We take the output stack of Phase I as input. Some words in the stack will have right-side dependents, as shown in Fig.8. For Phase II, we also define three transitions: shift, left-arc, and right-arc. The operations of these three transitions are the same as in Phase I, but there is no verbal-shift. Fig.9 shows the output of Phase I and the parsing at Phase II of the example given in Fig.8.</Paragraph> <Paragraph position="7"> The window for feature extraction is the same as that of Phase I. The nearest right-side dependent is newly added as a feature for Phase II. The feature vector for Phase II is shown in Fig.10.</Paragraph> <Paragraph position="8"> The two-phase parsing outputs a projective, acyclic and connected dependency structure. Nivre[10] states that his parser performs at most twice as many transitions as the size of the sentence; our algorithm performs at most four times as many, so the time complexity of our parser is still linear in the size of the sentence.</Paragraph> <Paragraph position="9"> Windows for feature extraction:
t.stack: top word of the stack
p.stack: previous word of the top of the stack
input: input word
n.input: next word of the input word
x.pos: POS tag of word x
x.left.child: the nearest left-side dependent of word x
punc: the surface form of the punctuation between the top word of the stack and the input word, if there is any
if_adjoin: a binary indicator showing whether the top word of the stack and the input word are adjacent
The feature vector for Phase I is shown in Fig.7. [Fig.8 example, partial gloss: '... continuously improve the investment environment and attract more capital from overseas, along with advanced techniques and administrative experience.'] The contents of the stack after Phase I: <Yin Jin ,Gai Shan ,Feng Xing ,Shuo >. 
(attract, improve, pursue, said). [Fig.9: the dependents of each verb in the stack.] The feature vector for Phase II is:
<p.stack.pos t.stack.pos input.pos n.input.pos p.stack.left.child.pos t.stack.left.child.pos input.left.child.pos p.stack.right.child.pos t.stack.right.child.pos input.right.child.pos n.input.right.child.pos punc if_adjoin>
Fig. 10. Feature vector for Phase II.</Paragraph> </Section> </Section> </Paper>