File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/w06-2405_intro.xml

Size: 1,347 bytes

Last Modified: 2025-10-06 14:04:07

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2405">
  <Title>Identifying idiomatic expressions using automatic word-alignment</Title>
  <Section position="3" start_page="33" end_page="34" type="intro">
    <SectionTitle>
1.3 Related work
</SectionTitle>
    <Paragraph position="0"> Melamed (1997b) measures the semantic entropy of words using bitexts. He computes the translational distribution T of a source-language word s and uses it to measure the word's translational entropy, H(T|s). This entropy approximates the word's semantic entropy, which can be interpreted either as (a) semantic ambiguity or (b) the inverse of reliability. Thus, a word with high semantic entropy is potentially very ambiguous, and its translations are therefore less reliable (or highly context-dependent). We also use entropy to approximate meaning predictability. Melamed (1997a) investigates various techniques for identifying non-compositional compounds in parallel data. Non-compositional compounds are sequences of two or more words (adjacent or separate) that show a conventionalized meaning. Using English-French parallel corpora, Melamed's method induces and compares pairs of translation models. Models that take non-compositional compounds into account are highly accurate in the identification task.</Paragraph>
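    <Paragraph position="1"> As a rough illustration of the translational entropy H(T|s), the sketch below computes the Shannon entropy of an empirical translation distribution for a single source word. It assumes the plain maximum-likelihood form, the negative sum over translations t of P(t|s) log2 P(t|s); the exact estimator in Melamed (1997b) may differ. The function name translational_entropy and the toy translation lists are hypothetical and are not taken from the cited work.

```python
import math
from collections import Counter


def translational_entropy(translations):
    """Entropy in bits of the empirical translation distribution of one source word.

    `translations` lists the target words aligned to each occurrence of the
    source word (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(translations)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total  # maximum-likelihood estimate of P(t|s)
        entropy -= p * math.log2(p)
    return entropy


# Toy alignments, invented for illustration only.
consistent = ["chat", "chat", "chat", "chat"]
varied = ["banque", "rive", "banc", "bord"]

print(translational_entropy(consistent))
print(translational_entropy(varied))
```

Under these assumptions, a word that is always translated the same way receives zero entropy, while a word whose occurrences are spread evenly over four translations receives two bits, which matches the reading of high entropy as high ambiguity.</Paragraph>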
  </Section>
</Paper>