<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2405"> <Title>Identifying idiomatic expressions using automatic word-alignment</Title> <Section position="3" start_page="33" end_page="34" type="intro"> <SectionTitle> 1.3 Related work </SectionTitle> <Paragraph position="0"> Melamed (1997b) measures the semantic entropy of words using bitexts. Melamed computes the translational distribution T of a word s in a source language and uses it to measure the translational entropy of the word, H(T|s). This entropy approximates the semantic entropy of the word, which can be interpreted either as (a) its semantic ambiguity or (b) the inverse of its reliability. Thus, a word with high semantic entropy is potentially very ambiguous, and its translations are therefore less reliable (or highly context-dependent). We also use entropy to approximate meaning predictability. Melamed (1997a) investigates various techniques to identify non-compositional compounds in parallel data. Non-compositional compounds are sequences of two or more words (adjacent or separate) that show a conventionalized meaning. From English-French parallel corpora, Melamed's method induces and compares pairs of translation models. Models that take non-compositional compounds into account are highly accurate in the identification task.</Paragraph> </Section> </Paper>