<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2405">
  <Title>Identifying idiomatic expressions using automatic word-alignment</Title>
  <Section position="7" start_page="38" end_page="39" type="concl">
    <SectionTitle>
6 Conclusion and future work
</SectionTitle>
    <Paragraph position="0"> In this paper we have shown that assessing automatic word alignment can help to identify idiomatic multi-word expressions (MWEs). We ranked candidates according to their link variability, measured by translational entropy, and their link consistency with regard to default alignments. For our experiments we used a set of 200 Dutch MWE candidates and word-aligned parallel corpora from Dutch to English, Spanish and German. The MWE candidates were extracted using standard association measures and a head dependence heuristic.</Paragraph>
    <Paragraph position="1"> The word alignment was produced using standard models derived from statistical machine translation. Two measures were tested to re-rank the candidates. Translational entropy measures the predictability of the translation of an expression by looking at the links of its components to a target language. Ranking our 200 MWE candidates by entropy on Dutch to German word alignments improved uninterpolated average precision (uap) from the baseline of 75.5% to 93.2%. The proportion of default alignments (pda) among the links found for MWE components is the other score we explored for ranking our MWE candidates. Here the result is similar, 91.7%, when using the results of a directional alignment model from Dutch to Spanish. In general, we obtain slightly better results when using word alignment from Dutch to German and Spanish than from Dutch to English.</Paragraph>
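As a rough illustration of the two quantities named above, the following Python sketch computes a translational entropy and an uninterpolated average precision. It rests on assumptions not stated in this section: entropy is taken as Shannon entropy (in bits) over the target words linked to each component, averaged over the components of an expression, and uap is taken as the mean of the precision values at each rank where a true idiom occurs. All function names are hypothetical.

```python
import math
from collections import Counter


def translational_entropy(links):
    """Shannon entropy (bits) of the target words linked to one source word.

    `links` is a list of target-side tokens, one per alignment link.
    Assumption: plain maximum-likelihood estimates, no smoothing.
    """
    counts = Counter(links)
    total = sum(counts.values())
    # p * log2(1/p) summed over all observed target words
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def expression_entropy(component_links):
    """Average the per-component entropies of one MWE candidate (assumed)."""
    entropies = [translational_entropy(links) for links in component_links]
    return sum(entropies) / len(entropies)


def uninterpolated_average_precision(ranked_labels):
    """Mean of precision-at-rank over the ranks holding a true positive.

    `ranked_labels` is the ranked candidate list as booleans
    (True = judged idiomatic), best-ranked first.
    """
    hits = 0
    precisions = []
    for rank, is_idiom in enumerate(ranked_labels, start=1):
        if is_idiom:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

Under these assumptions, a component that is always linked to the same target word has entropy 0, while a component whose links are spread over many target words scores higher, so candidates would be ranked by descending entropy before computing uap.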
    <Paragraph position="2"> Several extensions of this work remain that we wish to address in the future. Alignment types and scoring metrics need to be tested on larger lists of randomly selected MWE candidates to see whether the results hold. We also want to apply a weighting scheme based on the number of NO LINKS per expression. Our assumption is that an expression with many NO LINKS is harder to translate compositionally, and is probably an idiomatic or ambiguous expression. Conversely, an expression with no NO LINKS is very predictable, and thus a literal expression. Finally, another possible improvement is combining several language pairs. There might be cases where idiomatic expressions are conceptualized in a similar way in two languages. For example, a Dutch idiomatic expression with a cognate expression in German might be conceptualized in a different way in Spanish. By combining the entropy or pda scores for NL-EN, NL-DE and NL-ES the accuracy might improve.</Paragraph>
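The combination of language pairs proposed above is left open in the text. The Python sketch below shows one possible reading, under loudly stated assumptions: pda is computed as the fraction of an expression's links that match the default alignment, the per-pair scores are combined by a plain arithmetic mean, and a lower combined pda is taken to indicate a more idiomatic candidate. The function names, the mean, and the ranking direction are illustrative choices, not the authors' method.

```python
def proportion_default_alignments(link_is_default):
    """Fraction of links that match the default alignment.

    `link_is_default` holds one boolean per link found for the
    components of an MWE candidate. Returns 0.0 when no links exist
    (an assumed convention).
    """
    if not link_is_default:
        return 0.0
    return sum(link_is_default) / len(link_is_default)


def combine_scores(scores_by_pair):
    """Arithmetic mean of per-language-pair scores (assumed rule).

    `scores_by_pair` maps a pair label such as "NL-DE" to a score.
    """
    return sum(scores_by_pair.values()) / len(scores_by_pair)


def rank_candidates(candidates):
    """Order candidate names by combined pda, lowest first.

    `candidates` maps a candidate name to its per-pair pda scores.
    Assumption: fewer default links suggests a more idiomatic expression.
    """
    return sorted(candidates, key=lambda name: combine_scores(candidates[name]))
```

Other combination rules, such as a minimum over pairs or a weighted mean that takes the number of NO LINKS into account, would fit the same interface.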
  </Section>
</Paper>