<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0813">
  <Title>Symmetric Probabilistic Alignment</Title>
  <Section position="3" start_page="88" end_page="88" type="metho">
    <SectionTitle>
2 Experimental Design
</SectionTitle>
    <Paragraph position="0"> In previous work (Kim et al., 2005), we tested our alignment method on a set of French-English sentence pairs taken from the Canadian Hansard corpus and on a set of English-Chinese sentence pairs, and compared the results to human alignments. For the present workshop, we chose to use the Romanian-English data which had been made available.</Paragraph>
    <Paragraph position="1"> Due to a lack of time prior to the period of the shared task, we merely re-used the parameters which had been tuned for French-English, rather than tuning the alignment parameters specifically for the development data.</Paragraph>
    <Paragraph position="2"> SPA was run under three experimental conditions.</Paragraph>
    <Paragraph position="3"> In the first, labeled &quot;SPA (c)&quot; in Tables 1 and 2, SPA was instructed to examine only contiguous target phrases as potential alignments for a given source phrase. In the second, labeled &quot;SPA (n)&quot;, a noncontiguous target alignment consisting of two contiguous segments with a gap between them was permitted in addition to contiguous target alignments. The third condition (&quot;SPA (h)&quot;) examined the impact of a small amount of manual alignment information on the selection of contiguous alignments. Unlike the first two conditions, SPA (h) uses data beyond the training corpus, which places it in the Unlimited Resources track.</Paragraph>
    <Paragraph position="4"> We had a native Romanian speaker hand-align 204 sentence pairs from the training corpus, and extracted 732 distinct translation pairs from those alignments, of which 450 were already present in the automatically generated dictionaries. The new translation pairs were added to the dictionaries for the SPA (h) condition, and the translation probabilities for the existing pairs were increased to reflect the increased confidence in their correctness. Had more time been available, we would have investigated more sophisticated means of integrating the human knowledge into the translation dictionaries.</Paragraph>
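    <Paragraph position="5"> The dictionary update for the manual-alignment condition can be sketched as follows. Of the 732 extracted pairs, 450 were already in the dictionaries, so 282 were new. The text does not say how probabilities were raised or what value new pairs received, so this is only an illustration under stated assumptions: the function name merge_manual_pairs, the multiplicative boost, the default probability for new pairs, and the toy dictionary entries are all hypothetical and not taken from the SPA implementation.

```python
def merge_manual_pairs(dictionary, manual_pairs, boost=2.0, new_prob=0.5):
    """Return an updated copy of a translation dictionary.

    dictionary maps (source_word, target_word) tuples to probabilities.
    boost and new_prob are assumed values, not figures from the paper.
    """
    merged = dict(dictionary)  # leave the caller's dictionary untouched
    added = 0
    boosted = 0
    for pair in manual_pairs:
        if pair in merged:
            # Pair already known: raise its probability, capped at 1.0.
            merged[pair] = min(1.0, merged[pair] * boost)
            boosted += 1
        else:
            # Pair seen only in the hand alignments: add it.
            merged[pair] = new_prob
            added += 1
    return merged, added, boosted


# Toy data (hypothetical entries, for illustration only).
auto_dict = {("casa", "house"): 0.4, ("carte", "book"): 0.7}
manual = {("casa", "house"), ("pisica", "cat")}
merged, added, boosted = merge_manual_pairs(auto_dict, manual)
```

In this toy run one pair is added and one is boosted. The sketch does not renormalize the probabilities after boosting; whether SPA did so is not stated.</Paragraph>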
  </Section>
</Paper>