<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1099">
  <Title>You Can't Beat Frequency (Unless You Use Linguistic Knowledge) - A Qualitative Evaluation of Association Measures for Collocation and Term Extraction</Title>
  <Section position="4" start_page="0" end_page="785" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Although a fair amount of work has applied linguistically sophisticated analysis to candidate items (e.g., on CE by Lin (1998) and Lin (1999), and on ATR by Daille (1996), Jacquemin (1999), and Jacquemin (2001)), these approaches are limited either by the difficulty of porting grammatical specifications to other domains (in the case of ATR) or by the error-proneness of full general-language parsers (in the case of CE).</Paragraph>
    <Paragraph position="1"> Therefore, most recent approaches in both areas have backed off to shallower linguistic filtering techniques, such as POS tagging and phrase chunking (e.g., Frantzi et al. (2000), Krenn and Evert (2001), Nenadić et al. (2004), Wermter and Hahn (2005)).</Paragraph>
    <Paragraph position="2"> After linguistic filtering, various measures are employed in the literature for grading the termhood or collocativity of the collected candidates. Among the most widespread ones, both for ATR and CE, are statistical and information-theoretic measures, such as the t-test, log-likelihood, entropy, and mutual information. Their prominence is also reflected by the fact that a whole chapter of a widely used textbook on statistical NLP (viz. Chapter 5, "Collocations", in Manning and Schütze (1999)) is devoted to them.</Paragraph>
    <Paragraph position="3"> In addition, the C-value (Frantzi et al., 2000), basically a frequency-based approach, has been another widely used measure for multi-word ATR. Recently, more linguistically informed algorithms have been introduced both for CE (Wermter and Hahn, 2004) and for ATR (Wermter and Hahn, 2005), and these have been shown to outperform several of the statistics-only metrics.</Paragraph>
  </Section>
</Paper>