File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/99/p99-1041_relat.xml

Size: 2,910 bytes

Last Modified: 2025-10-06 14:16:10

<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1041">
  <Title>Automatic Identification of Non-compositional Phrases</Title>
  <Section position="9" start_page="320" end_page="320" type="relat">
    <SectionTitle>
7 Related Work
</SectionTitle>
    <Paragraph position="0"> There has been much previous research on extracting collocations from corpora, e.g., (Choueka, 1988) and (Smadja, 1993). These approaches do not, however, distinguish between compositional and non-compositional collocations. Mutual information has often been used to separate systematic associations from accidental ones. It has also been used to compute the distributional similarity between words (Hindle, 1990; Lin, 1998). A method to determine the compositionality of verb-object pairs is proposed in (Tapanainen et al., 1998). The basic idea there is that &quot;if an object appears only with one verb (or few verbs) in a large corpus we expect that it has an idiomatic nature&quot; (Tapanainen et al., 1998, p.1290). For each object noun o, (Tapanainen et al., 1998) computes the distributed frequency DF(o) and ranks the non-compositionality of o according to this value.</Paragraph>
    <Paragraph position="1"> Using the notation introduced in Section 3, DF(o) is computed as follows:</Paragraph>
    <Paragraph position="3"> where {v1, v2, ..., vn} are the verbs in the corpus that took o as their object, and where a and b are constants.</Paragraph>
    <Paragraph position="4"> The first column in Table 5 lists the top 40 verb-object pairs in (Tapanainen et al., 1998). The &quot;mi&quot; column shows the result of our mutual information filter. The '+' sign means that the verb-object pair is also considered to be non-compositional according to our mutual information filter (3). The '-' sign means that the verb-object pair is present in our dependency database, but does not satisfy condition (3). For each pair marked '-', the &quot;similar collocation&quot; column provides a similar collocation with a similar mutual information value (i.e., the reason why the pair is not considered to be non-compositional). The pairs marked '&lt;&gt;' are not found in our collocation database for various reasons. For example, &quot;finish seventh&quot; is not found because &quot;seventh&quot; is normalized as &quot;_NUM&quot;, &quot;have a go&quot; is not found because &quot;a go&quot; is not an entry in our lexicon, and &quot;take advantage&quot; is not found because &quot;take advantage of&quot; is treated as a single lexical item by our parser. The √ marks in the &quot;ntc&quot; column in Table 5 indicate that the corresponding verb-object pair is an idiom in (Spears and Kirkpatrick, 1993). It can be seen that none of the verb-object pairs in Table 5 that are filtered out by condition (3) is listed as an idiom in NTC-EID.</Paragraph>
  </Section>
</Paper>