<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2925">
  <Title>Projective Dependency Parsing with Perceptron</Title>
  <Section position="6" start_page="182" end_page="183" type="evalu">
    <SectionTitle>
4 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> We experimented on the 13 languages proposed in the CoNLL-X Shared Task (HajiVc et al., 2004; Simov et al., 2005; Simov and Osenova, 2003; Chen et al., 2003; B&amp;quot;ohmov'a et al., 2003; Kromann, 2003; van der Beek et al., 2002; Brants et al., 2002; Kawata and Bartels, 2000; Afonso et al., 2002; DVzeroski et al., 2006; Civit and Mart'i, 2002; Nilsson et al., 2005; Oflazer et al., 2003; Atalay et al., 2003). Our approach to deal with many different languages was totally blind: we did not inspect the data to motivate language-specific features or processes.</Paragraph>
    <Paragraph position="1">  We did feature filtering based on frequency counts. Our feature extraction patterns, that exploit both lexicalization and combination, generate millions of feature dimensions, even with small datasets. Our criterion was to use at most 500,000 different dimensions in each label weight vector. For each language, we generated all possible features, and then filtered out most of them according to the counts. Depending on the number of training sentences, our counts cut-offs vary from 3 to 15.</Paragraph>
    <Paragraph position="2"> For each language, we held out from training data a portion of sentences (300, 500 or 1000 depending on the total number of sentences) and trained a model for up to 20 epochs in the rest of the data. We evaluated each model on the held out data for different number of training epochs, and selected the optimum point. Then, we retrained each model on the whole training set for the selected number of epochs.</Paragraph>
    <Paragraph position="3"> Table 5 shows the attachment scores obtained by our system, both unlabeled (UAS) and labeled (LAS). The first column (GOLD) presents the LAS obtained with a perfect scoring function: the loss in accuracy is related to the projectivity assumption of our parsing algorithm. Dutch turns out to be the most non-projective language, with a loss in accuracy of 5.44%. In our opinion, the loss in other languages is relatively small, and is not a major limitation to achieve a high performance in the task. Our system achieves an overall LAS of 74.72%, with substantial variation from one language to another.</Paragraph>
    <Paragraph position="4"> Turkish, Arabic, Dutch, Slovene and Czech turn out to be the most difficult languages for our system, with accuracies below 70%. The easiest language is clearly Japanese, with a LAS of 88.13%, followed by Chinese, Portuguese, Bulgarian and German, all with LAS above 80%.</Paragraph>
    <Paragraph position="5"> Table 6 shows the contribution of base feature extraction functions. For four languages, we trained models that increasingly incorporate base functions.</Paragraph>
    <Paragraph position="6"> It can be shown that all functions contribute to a better score. Contextual features (ph3) bring the system to the final order of performance, while distance (ph4) and runtime (ph) features still yield substantial improvements. null</Paragraph>
  </Section>
class="xml-element"></Paper>