<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2934">
  <Title>Multi-lingual Dependency Parsing with Incremental Integer Linear Programming</Title>
  <Section position="6" start_page="228" end_page="229" type="evalu">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> Our results on the test set are shown in Table 1.</Paragraph>
    <Paragraph position="1"> Our results are well above the average for all languages but Czech. For Chinese we perform significantly better than all other participants (p = 0.00), and we are among the top three entries for Dutch, German, and Danish. Although Dutch and Chinese are languages for which we included additional constraints, our scores are not a result of these constraints. Table 2 compares the results for the languages with additional constraints.</Paragraph>
    <Paragraph position="2"> Adding constraints helps only marginally (in the case of Slovene a bug in our implementation even degraded accuracy). A more detailed explanation of this observation is given in the following section. A possible explanation for our high accuracy on Chinese could be the fact that we were not able to optimise the feature set on the development set (see the previous section); this may have prevented us from overfitting. It should be noted that we did use non-projective parsing for Chinese, although the corpus is fully projective. Our worst results in comparison with other participants are for Czech. We attribute this to the reduced training set we had to use in order to produce a model in time, even when using the original MST algorithm.</Paragraph>
    <Section position="1" start_page="228" end_page="228" type="sub_section">
      <SectionTitle>
4.1 Chinese
</SectionTitle>
      <Paragraph position="0"> For Chinese the parser was augmented with a set of constraints that disallowed more than one argument of the types head, goal, nominal, range, theme, reason, DUMMY, DUMMY1 and DUMMY2.</Paragraph>
      <Paragraph position="1"> By enforcing arity constraints we could either turn wrong labels/heads into right ones and improve accuracy, or turn right labels/heads into wrong ones and degrade accuracy. For the test set the number of improvements (36) was higher than the number of errors (22). However, this margin was outweighed by a few sentences we could not properly process because our inference method timed out. Our overall improvement was thus an unimpressive 7 tokens.</Paragraph>
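The arity constraints above can be sketched as follows: for each head and each constrained label, at most one dependent may attach with that label. A minimal illustration in Python, using brute-force enumeration in place of the authors' actual ILP solver (candidate heads, labels, and integer scores are all invented for the example):

```python
from itertools import product

def best_parse(candidates, constrained_labels):
    """Pick one (head, label, score) per dependent, maximising the total
    score, subject to the arity constraint: each head takes at most one
    dependent carrying a constrained label.  Enumeration stands in for
    the ILP solver used in the paper."""
    best, best_score = None, float("-inf")
    for assignment in product(*candidates):
        counts = {}
        for head, label, _ in assignment:
            if label in constrained_labels:
                counts[(head, label)] = counts.get((head, label), 0) + 1
        if any(c > 1 for c in counts.values()):
            continue  # violates an arity constraint, skip
        score = sum(s for _, _, s in assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Two dependents; both prefer a "goal" dependency on head 1, so the
# constraint forces the second dependent onto its second-best candidate.
cands = [
    [(1, "goal", 9), (2, "obj", 5)],
    [(1, "goal", 8), (1, "obj", 7)],
]
parse, score = best_parse(cands, {"goal"})
```

Without the constraint the total score would be 17 with two "goal" dependents on head 1; with it, the second dependent falls back to its "obj" candidate.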
      <Paragraph position="2"> In the context of duplicate &amp;quot;head&amp;quot; dependencies (that is, dependencies labelled &amp;quot;head&amp;quot;), the number of sentences where accuracy dropped far outweighed the number of sentences where improvements could be gained. Removing the arity constraints on &amp;quot;head&amp;quot; labels should therefore improve our results.</Paragraph>
      <Paragraph position="3"> This shows the importance of good second-best dependencies. If the dependency with the second-highest score is the actual gold dependency and its score is close to the highest score, we are likely to pick this dependency in the presence of additional constraints. On the other hand, if the dependency with the second-highest score is not the gold one and its score is too high, we will probably include this dependency in order to fulfil the constraints.</Paragraph>
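Both cases can be illustrated with invented scores (labels and numbers are hypothetical, not taken from the data):

```python
def pick(cands, forbidden_first):
    """Return the highest-scoring candidate; if a constraint forbids the
    first choice, fall back to the second best."""
    ranked = sorted(cands, key=lambda c: -c[1])
    return ranked[1] if forbidden_first else ranked[0]

# Case 1: the gold label ("subj") is a close second best, so enforcing
# the constraint repairs the parse.
helpful = pick([("obj", 0.51), ("subj", 0.49)], forbidden_first=True)

# Case 2: a wrong label ("obj") is second best, so enforcing the
# constraint introduces a new error.
harmful = pick([("subj", 0.60), ("obj", 0.55)], forbidden_first=True)
```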
      <Paragraph position="4"> There may be some further improvement to be gained if we train our model using k-best MIRA with k &gt; 1 since it optimises weights with respect to the k best parses.</Paragraph>
    </Section>
    <Section position="2" start_page="228" end_page="229" type="sub_section">
      <SectionTitle>
4.2 Turkish
</SectionTitle>
      <Paragraph position="0"> There is a considerable gap between the unlabelled and labelled results for Turkish. In terms of labels, the POS type Noun gives the worst performance, because a subject was often classified as an object or vice versa.</Paragraph>
      <Paragraph position="1"> Case information in Turkish assigns argument roles to nouns by marking different semantic roles. Many errors in the Turkish data might have been caused by the fact that this information was not adequately used. Instead of fine-tuning our feature set to Turkish, we used the feature cross product as described in Section 3.</Paragraph>
      <Paragraph position="2"> Some of the rather meaningless combinations might have neutralised the effect of sensible ones. We believe that using morphological case information in a sound way would improve both the unlabelled and the labelled dependencies. However, we have not performed a separate experiment to test whether using case information alone would improve the system. This could be the focus of future work.</Paragraph>
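The feature cross product mentioned in the text can be sketched as conjoining every feature from one template with every feature from another; sensible and meaningless combinations arise alike. A minimal sketch (the feature names are hypothetical):

```python
from itertools import product

def cross(feats_a, feats_b):
    """Conjoin two feature templates into all pairwise combinations."""
    return [f"{a}&{b}" for a, b in product(feats_a, feats_b)]

word_feats = ["w=kitap", "w=okudu"]  # hypothetical lexical features
pos_feats = ["p=Noun", "p=Verb"]     # hypothetical POS features
combined = cross(word_feats, pos_feats)
```

Here two templates of two features each yield four conjoined features; useful pairings such as a noun's case with its POS would appear alongside uninformative ones.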
    </Section>
  </Section>
</Paper>