<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2014">
  <Title>Soft Syntactic Constraints for Word Alignment through Discriminative Training</Title>
  <Section position="7" start_page="109" end_page="111" type="evalu">
    <SectionTitle>
5 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> We conduct two experiments. The first tests the dependency-augmented ITG described in Section 3.2 as an aligner with hard cohesion constraints. The second tests our discriminative ITG with soft cohesion constraints against two strong baselines.</Paragraph>
    <Section position="1" start_page="109" end_page="110" type="sub_section">
      <SectionTitle>
5.1 Experimental setup
</SectionTitle>
      <Paragraph position="0"> We conduct our experiments using French-English Hansard data. Our ph2 scores, link probabilities and word frequency counts are determined using a sentence-aligned bitext consisting of 50K sentence pairs. Our training set for the discriminative aligners is the first 100 sentence pairs from the French-English gold standard provided for the 2003 WPT workshop (Mihalcea and Pedersen, 2003). For evaluation we compare to the remaining 347 gold standard pairs using the alignment evaluation metrics: precision, recall and alignment error rate or AER (Och and Ney, 2003). SVM learning parameters are tuned using the 37-pair development set provided with this data. English dependency trees are provided by Minipar (Lin, 1994).</Paragraph>
    </Section>
    <Section position="2" start_page="110" end_page="110" type="sub_section">
      <SectionTitle>
5.2 Hard Constraint Performance
</SectionTitle>
      <Paragraph position="0"> The goal of this experiment is to empirically confirm that the English spans marked invalid by Section 3.2's dependency-augmented ITG provide useful guidance to an aligner. To do so, we compare an ITG with hard cohesion constraints, an unconstrained ITG, and a weighted maximum matching aligner. All aligners use the same simple objective function. They maximize summed link values v(l), where v(l) is defined as follows for an l = (Ej,Fk):  scores, breaking ties in favor of closer pairs. This allows us to evaluate the hard constraints outside the context of supervised learning.</Paragraph>
      <Paragraph position="1"> Table 1 shows the results of this experiment.</Paragraph>
      <Paragraph position="2"> We can see that switching the search method from weighted maximum matching to a cohesion-constrained ITG (D-ITG) has produced a 34% relative reduction in alignment error rate. The bulk of this improvement results from a substantial increase in precision, though recall has also gone up. This indicates that these cohesion constraints are a strong alignment feature. The ITG row shows that the weaker ITG constraints are also valuable, but the cohesion constraint still improves on them.</Paragraph>
    </Section>
    <Section position="3" start_page="110" end_page="111" type="sub_section">
      <SectionTitle>
5.3 Soft Constraint Performance
</SectionTitle>
      <Paragraph position="0"> We now test the performance of our SVM ITG with soft cohesion constraint, or SD-ITG, which is described in Section 4.2.2. We will test against two strong baselines. The first baseline, matching is the matching SVM described in Section 4.2.1, which is a re-implementation of the state-of-the-art work in (Taskar et al., 2005)3. The second baseline, D-ITG is an ITG aligner with hard cohesion constraints, but which uses the weights 3Though it is arguably lacking one of its strongest features: the output of GIZA++ (Och and Ney, 2003)  trained by the matching SVM to assign link values. This is the most straight-forward way to combine discriminative training with the hard syntactic constraints.</Paragraph>
      <Paragraph position="1"> The results are shown in Table 2. The first thing to note is that our Matching baseline is achieving scores in line with (Taskar et al., 2005), which reports an AER of 0.107 using similar features and the same training and test sets.</Paragraph>
      <Paragraph position="2"> The effect of the hard cohesion constraint has been greatly diminished after discriminative training. Matching and D-ITG correspond to the the entries of the same name in Table 1, only with a much stronger, learned value function v(l). However, in place of a 34% relative error reduction, the hard constraints in the D-ITG produce only a 9% reduction from 0.110 to 0.100. Also note that this time the hard constraints result in a reduction in recall. This indicates that the hard cohesion constraint is providing little guidance not provided by other features, and that it is actually eliminating more sure links than it is helping to find.</Paragraph>
      <Paragraph position="3"> The soft-constrained SD-ITG, which has access to the D-ITG's invalid spans as a feature during SVM training, is fairing substantially better. Its AER of 0.086 represents a 22% relative error reduction compared to the matching system. The improved error rate is caused by gains in both precision and recall. This indicates that the invalid span feature is doing more than just ruling out links; perhaps it is de-emphasizing another, less accurate feature's role. The SD-ITG overrides the cohesion constraint in only 41 of the 347 test sentences, so we can see that it is indeed a soft constraint: it is obeyed nearly all the time, but it can be broken when necessary. The SD-ITG achieves by far the strongest ITG alignment result reported on this French-English set; surpassing the 0.16 AER reported in (Zhang and Gildea, 2004).</Paragraph>
      <Paragraph position="4"> Training times for this system are quite low; unsupervised statistics can be collected quickly over a large set, while only the 100-sentence training  set needs to be iteratively aligned. Our matching SVM trains in minutes on a single-processor machine, while the SD-ITG trains in roughly one hour. The ITG is the bottleneck, so training time could be improved by optimizing the parser.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>