<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2931">
  <Title>Dependency Parsing Based on Dynamic Local Optimization</Title>
  <Section position="6" start_page="213" end_page="214" type="evalu">
    <SectionTitle>
4 Experiments and analysis
</SectionTitle>
    <Paragraph position="0"> Our parsing results and the average results are listed in Table 1. The attachment scores vary greatly across languages. This section presents a general analysis followed by a specific analysis.</Paragraph>
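    As a minimal sketch of how attachment scores of this kind can be computed, the function below compares gold and system (head, relation) pairs token by token. The function name and the input layout are illustrative assumptions, and the sketch scores every token, so it does not reproduce the scoring/non-scoring token distinction of the official evaluation.

```python
def attachment_scores(gold, system):
    """Return (unlabeled, labeled) attachment scores.

    gold and system are equal-length lists of (head, relation) pairs,
    one pair per token. This layout is an assumption for illustration.
    """
    if len(gold) != len(system):
        raise ValueError("gold and system must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    # Unlabeled score: only the head index has to match.
    head_hits = sum(1 for g, s in zip(gold, system) if g[0] == s[0])
    # Labeled score: both head index and relation label have to match.
    label_hits = sum(1 for g, s in zip(gold, system) if g == s)
    total = len(gold)
    return head_hits / total, label_hits / total
```

    Returning both scores together makes it easy to see how many correct heads carry a wrong relation label.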
    <Section position="1" start_page="213" end_page="213" type="sub_section">
      <SectionTitle>
4.1 General analysis
</SectionTitle>
      <Paragraph position="0"> We try to identify the properties that account for the differences in parsing results across languages. The properties of all the training data can be found in (Buchholz et al., 2006). Intuitively, the size of the training data and the average sentence length should influence dependency parsing. The relation between these properties and the scores is shown in Figures 4 and 5.</Paragraph>
      <Paragraph position="1"> From the charts we cannot confidently identify any property that is proportional to the score. Neither Czech, with the largest training set, nor Chinese, with the shortest average sentence length, achieves the best results. It seems that no single factor determines the parsing results; rather, all the properties jointly influence dependency parsing.</Paragraph>
      <Paragraph position="2"> Another factor that may explain the score differences in multi-lingual parsing is the characteristics of each language. For example, in some languages the number of tokens with HEAD=0 in a sentence is not always one. Table 1 shows the range of the governing degree of the head. These statistics differ somewhat from those of the organizers because we do not distinguish scoring tokens from non-scoring tokens.</Paragraph>
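      The HEAD=0 statistic can be gathered with a short script. The sketch below assumes the ten-column, tab-separated CoNLL-X layout, with HEAD in the seventh column and blank lines between sentences; the function name is hypothetical, and all tokens are counted without separating scoring from non-scoring ones.

```python
def root_counts(conll_text):
    """Return the number of HEAD=0 tokens in each sentence.

    Assumes CoNLL-X style input: one token per line, tab-separated
    columns, HEAD in column 7, blank line between sentences.
    """
    counts = []
    current = 0
    seen_token = False
    for line in conll_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if seen_token:
                counts.append(current)
            current, seen_token = 0, False
            continue
        columns = line.split("\t")
        seen_token = True
        if columns[6] == "0":
            current += 1
    # The last sentence may not be followed by a blank line.
    if seen_token:
        counts.append(current)
    return counts
```

      Taking the minimum and maximum of the returned list gives the per-language range of HEAD=0 tokens per sentence.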
      <Paragraph position="3"> Another characteristic is the directionality of dependency relations. As Table 1 shows, many relations in the treebanks are bi-directional, which effectively increases the number of relations. Furthermore, the flexibility of some grammatical structures poses difficulties for the language model. For instance, in some treebanks the subject can appear on either side of the predicate, which tends to cause confusion with the object (Kromann, 2003; Afonso et al., 2002; Civit Torruella and Martí Antonín, 2002; Oflazer et al., 2003; Atalay et al., 2003).</Paragraph>
      <Paragraph position="4"> Our parsing results are lower than the average results for all languages except Danish. This can be explained by the following aspects: (1) Our parser uses a projective parsing algorithm and cannot handle non-projective tokens, which exist in all the languages except Chinese.</Paragraph>
      <Paragraph position="5"> (2) The information provided by the training data is not fully exploited. Only POS and lemma are used; the morphological and syntactic features might also help parsing.</Paragraph>
      <Paragraph position="6"> (3) We have not explored syntactic structures in depth for multi-lingual parsing, and more structural features need to be used in the Check procedure.</Paragraph>
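      To illustrate point (1), the sketch below tests whether a dependency tree is projective by looking for crossing arcs, which is one common formulation of projectivity. It is not the parser described in this paper; the function name and the head-list encoding (position i holds the head of token i+1, with 0 for the artificial root) are assumptions made for the example.

```python
def is_projective(heads):
    """Return True if no two dependency arcs cross.

    heads[i] is the head of token i+1; 0 denotes the artificial root,
    which is treated as position 0 to the left of the sentence.
    """
    # Store every arc as (left end, right end), ignoring direction.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a_left, a_right in arcs:
        for b_left, b_right in arcs:
            # Arc b starts strictly inside arc a and ends strictly outside it.
            if b_left > a_left and a_right > b_left and b_right > a_right:
                return False
    return True


# Token 2 depends on token 4 across token 3, the root of the sentence.
print(is_projective([3, 4, 0, 3]))
```

      The pairwise comparison is quadratic in sentence length, which is acceptable for collecting treebank statistics but is not meant as an efficient implementation.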
    </Section>
    <Section position="2" start_page="213" end_page="214" type="sub_section">
      <SectionTitle>
4.2 Specific analysis
</SectionTitle>
      <Paragraph position="0"> Specifically, we carry out an error analysis for Chinese and Turkish. In the Chinese results we found that many errors occur near the auxiliary word DE. We call noun phrases containing DE the DE Structure. DE appears 355 times among the 4970 dependencies of the test data. In Table 2, the second row shows the frequencies of &quot;DE&quot; as the parent of a dependency, and the third row shows its frequencies as the child. The corresponding error rates in our results are 33.1% and 43.4%, respectively. Furthermore, each head error results in more than one error, so the errors from DE Structures amount to nearly 9% in our results.</Paragraph>
      <Paragraph position="1"> [Table 1 caption, fragment: ... number of tokens with HEAD=0 in a sentence. The last row lists the number of relations / the number of bi-directional relations among them (our statistics differ slightly from those of the organizers).] [Table 2 column headers: gold, system, error, headerr.] The high error rate is due to the flexibility of the DE Structure. The children of DE can be nouns or verbs, so ambiguities arise. For example, the sequence &quot;V N1 DE N2&quot; is a common ambiguous structure in Chinese, and resolving it requires semantic knowledge to some extent. The errors with DE as the child mostly come from noun compounds.</Paragraph>
      <Paragraph position="2"> For example, the string &quot;MF!T!J&quot; results in the error of &quot;DE&quot; being attached as the child of &quot;T&quot;. It would be better to process noun compounds specially.</Paragraph>
      <Paragraph position="3"> Both our results and the average results are lowest on Turkish. We try to find some reasons through the following analysis. Turkish is a typical head-final language, and 81.1% of its dependencies are right-headed. This uniform directionality increases the difficulty of identification. Another difficulty is the diversity of relations within the same pair of word classes. Take nouns and pronouns as examples, which achieve accuracies of only 25% and 28% in our results: there are 14 relations in the noun-verb pairs and 11 relations in the pronoun-verb pairs. Table 3 illustrates the distribution of some common relations in the test data.</Paragraph>
      <Paragraph position="4"> Because these dependencies are so similar, our parser recognizes only 23.3% of the noun-verb structures and 21.8% of the pronoun-verb structures. Syntactic or semantic knowledge may help to distinguish these similar structures.</Paragraph>
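      A directionality figure such as the right-headed proportion reported for Turkish can be measured as sketched below. The sketch assumes that "right-headed" means the head follows its dependent in the sentence and that tokens with HEAD=0 are excluded from the count; both assumptions, like the function name, are ours for the purpose of the example.

```python
def right_headed_ratio(sentences):
    """Return the share of dependencies whose head follows the dependent.

    sentences is a list of head lists; heads[i] is the head of token i+1
    and 0 marks a token attached to the artificial root (skipped here).
    """
    right = 0
    total = 0
    for heads in sentences:
        for dependent, head in enumerate(heads, start=1):
            if head == 0:
                continue  # root attachments have no direction
            total += 1
            if head > dependent:
                right += 1
    return right / total if total else 0.0
```

      Applying the same function to each treebank gives a comparable directionality measure across languages.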
    </Section>
  </Section>
</Paper>