<?xml version="1.0" standalone="yes"?> <Paper uid="I05-2044"> <Title>Two-Phase Shift-Reduce Deterministic Dependency Parser of Chinese</Title> <Section position="5" start_page="258" end_page="260" type="evalu"> <SectionTitle> 4 Experiments and Evaluation </SectionTitle> <Paragraph position="0"> Our parsing procedure is performed sequentially from left to right. The feature vectors for Phase I and Phase II are used as input to the parsing model, which outputs a parsing action: left-arc, right-arc, or shift. We use SVM as the model to obtain a parsing action, and use CTB to train and test the model.</Paragraph> <Section position="1" start_page="259" end_page="259" type="sub_section"> <SectionTitle> 4.1 Conversion of Penn Chinese Treebank to Dependency Trees </SectionTitle> <Paragraph position="0"> Annotating a treebank is a tedious task. To take advantage of CTB, we devised heuristic rules to convert it into a dependency treebank. Similar conversion tasks have been carried out on the English Penn Treebank [14,10,4]. We use the dependency formalism defined by Zhou [15].</Paragraph> <Paragraph position="1"> CTB contains 15,162 newswire sentences (including titles, fragments, and headlines). The contents of CTB come from the Xinhua news agency of mainland China, the Information Services Department of HKSAR, and the Sinorama magazine of Taiwan. For our experiments, 12,142 sentences were extracted, excluding all titles, headlines, and fragments.</Paragraph> <Paragraph position="2"> For the conversion task, we devised heuristic rules. CTB defines a total of 23 syntactic phrase types and verb compounds [11]. A phrase is composed of several words accompanying a head word. The head word of each phrase serves as an important resource for PCFG parsing [12,13]. According to the position of the head word with respect to the other words, a phrase can be categorized into the head-final, head-initial, or head-middle set. 
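This position-based conversion can be sketched as follows. A minimal illustration only: the category-to-position assignments in the two sets below are placeholder assumptions, not the paper's actual Table 1 groups, and the head-middle fallback is likewise illustrative.

```python
# Illustrative head-position sets (placeholders, NOT the paper's Table 1).
HEAD_INITIAL = {"VP", "PP"}    # assumed: head is the leftmost child
HEAD_FINAL = {"NP", "ADJP"}    # assumed: head is the rightmost child


def find_head_index(phrase_label, children):
    """Return the index of the head child under a simple positional rule."""
    if phrase_label in HEAD_INITIAL:
        return 0
    if phrase_label in HEAD_FINAL:
        return len(children) - 1
    return len(children) // 2  # head-middle fallback (assumption)


def phrase_to_arcs(phrase_label, children):
    """Attach every non-head child to the head child, yielding (dependent, head) arcs."""
    h = find_head_index(phrase_label, children)
    return [(c, children[h]) for i, c in enumerate(children) if i != h]
```

Applied bottom-up over a constituency tree, this rule reproduces the conversion described above: once the head word of a phrase is found, all other words in the phrase depend on it.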
Table 1 shows the head-initial, head-final, and head-middle groups.</Paragraph> <Paragraph position="3"> VP, IP, and CP phrases have a verb as their head word, so we find the main verb and regard it as the head word of the phrase. Once the head word of each phrase is determined, the other words composing the phrase simply take that head word as their head. In the case of BA/LB, we take a different view from CTB: Zhou [15] regards BA/LB as the dependent of the following verb, and we follow Zhou's [15] analysis. Sentences containing BA/LB were converted into dependency trees manually. With the above heuristics, we converted the original CTB into a dependency treebank. We use the phrase labels as defined by CTB, excluding FRAG, LST, and PRN. For the definition of each phrase type, please refer to [11].</Paragraph> <Paragraph position="4"> BA and LB are two POS categories of CTB. For details, see [11].</Paragraph> </Section> <Section position="2" start_page="259" end_page="260" type="sub_section"> <SectionTitle> 4.2 Experiments </SectionTitle> <Paragraph position="0"> SVM is a binary classifier based on the maximum-margin strategy introduced by Vapnik [16]. SVM has been used for various NLP tasks and gives reasonable results. For the experiments reported in this paper, we used the software package SVMlight [17].</Paragraph> <Paragraph position="1"> For evaluation metrics, we use the Dependency Accuracy and Root Accuracy defined by Yamada [4]. An additional evaluation measure, None Head, is defined as follows.</Paragraph> <Paragraph position="2"> None Head: the proportion of words whose head is not determined.</Paragraph> <Paragraph position="3"> We construct two SVM binary classifiers, Dep vs. N_Dep and LA vs. RA, to output the transition actions Left-arc, Right-arc, or Shift. The Dep vs. N_Dep classifier determines whether two words have a dependency relation. If two words have no dependency relation, the transition action is simply Shift. 
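The cascade of the two binary classifiers can be sketched as follows. This is a minimal sketch: the +1/-1 sign conventions for the two SVM outputs are our assumption, not something the paper specifies.

```python
def transition_action(dep_score, direction_score):
    """
    Map the two binary SVM decisions to a parser transition action.

    dep_score:       output of the Dep vs. N_Dep classifier
                     (assumed: > 0 means a dependency relation exists)
    direction_score: output of the LA vs. RA classifier
                     (assumed: > 0 means Left-arc); only consulted
                     when a dependency relation exists.
    """
    if dep_score <= 0:
        return "shift"       # no dependency relation between the two words
    if direction_score > 0:
        return "left-arc"    # left word depends on right word
    return "right-arc"       # right word depends on left word
```

The first classifier thus acts as a gate: the direction classifier is only ever invoked for word pairs that were judged to stand in a dependency relation.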
If there is a dependency relation, the second classifier decides its direction, and the transition action is either Left-arc or Right-arc.</Paragraph> <Paragraph position="4"> We first train a model following the algorithm of Nivre [10]. The training and test sentences are randomly selected. Table 2 shows that 1.53% of the words cannot find their head after parsing. This means that Nivre's original algorithm cannot guarantee a connected dependency structure.</Paragraph> <Paragraph position="5"> With our two-phase parsing algorithm, no word is left without a head. The dependency accuracy and root accuracy are increased by 10.08% and</Paragraph> </Section> <Section position="3" start_page="260" end_page="260" type="sub_section"> <SectionTitle> 4.3 Comparison with Related Works </SectionTitle> <Paragraph position="0"> Compared with the original works of Nivre [10] and Yamada [4], the performance of our system is lower. We attribute this to the difference in target language.</Paragraph> <Paragraph position="1"> The average sentence length in our test set is 34 words, which is much longer than that in Ma [5] and Cheng [18]. The performance of our system is still better than that of Ma [5] and lower than that of Cheng [18].</Paragraph> </Section> </Section> </Paper>