<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2406">
  <Title>Collocation Extraction: Needs, Feeds and Results of an Extraction System for German</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Since Firth first described collocations as habitual word combinations in the 1950s (cf. Firth, 1968), a number of papers on collocation extraction have been published (see the overviews in (Evert, 2004; Bartsch, 2004)). Most studies concentrate on extraction from English. However, the procedures proposed in these studies cannot necessarily be applied to other languages, as English stands out, e.g. with respect to configurationality. They rely on the fact that the syntax of English (and of all configurational languages) provides positional clues to the grammatical function of noun phrases, and they exploit this property by means of window-based, adjacency-based or pattern-based extraction, combined with association measures that identify co-occurrences which are more frequent than statistically expected. What these procedures do not cover are semantically oriented definitions like (a) and (b).</Paragraph>
    <Paragraph position="1"> a. A collocation is a combination of a free ('autosemantic') element (the base) and a lexically determined ('synsemantic') element (the collocate, which may lose (some of) its meaning in a collocation) (adapted from (Hausmann, 1979; Hausmann, 1989; Hausmann, 2003)).</Paragraph>
    <Paragraph position="2"> b. A collocation is a word combination whose semantic and/or syntactic properties cannot be fully predicted from those of its components, and which therefore has to be listed in a lexicon (Evert, 2004).</Paragraph>
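    <Paragraph> As an illustration of the window-based procedures mentioned at the start of this section, the following minimal Python sketch counts ordered word pairs within a fixed window and scores them with pointwise mutual information (PMI). PMI is only one possible association measure and is chosen here as an assumption for illustration; the function names window_pairs and pmi_scores, the window size and the frequency threshold are likewise hypothetical and are not taken from any system discussed in this paper.

```python
import math
from collections import Counter


def window_pairs(tokens, window=3):
    """Yield ordered (word, neighbour) pairs where the neighbour
    follows the word within `window` tokens (hypothetical helper)."""
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            yield (left, right)


def pmi_scores(tokens, window=3, min_count=2):
    """Score each pair seen at least `min_count` times with PMI.

    PMI = log2(observed / expected), where the expected frequency is
    derived from the marginal counts of the pair table, i.e. under the
    assumption that the two words occur independently of each other.
    """
    pair_counts = Counter(window_pairs(tokens, window))
    left_counts = Counter()
    right_counts = Counter()
    for (left, right), n in pair_counts.items():
        left_counts[left] += n
        right_counts[right] += n
    total = sum(pair_counts.values())

    scores = {}
    for (left, right), n in pair_counts.items():
        if n >= min_count:
            expected = left_counts[left] * right_counts[right] / total
            scores[(left, right)] = math.log2(n / expected)
    return scores


# Toy input, assumed for illustration only.
tokens = "he makes a decision and she makes a decision too".split()
for pair, score in sorted(pmi_scores(tokens).items()):
    print(pair, round(score, 3))
```

    A positive score marks a pair that co-occurs more often than expected under independence. Because the sketch relies purely on linear distance between tokens, it shares the positional assumption of the procedures described above.</Paragraph>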
    <Paragraph position="3"> We argue that linguistic knowledge can not only improve results (Krenn, 2000b; Smadja, 1993) but is essential when extracting collocations from certain languages: this knowledge provides other applications (or a lexicon user, respectively) with a fine-grained description of how the extracted collocations are to be used in context.</Paragraph>
    <Paragraph position="4"> Additional requirements resulting from the needs of dictionary users are described in (Hausmann, 2003; Heid and Gouws, 2005). They are of interest not only in lexicography but can also be transferred to the field of natural language generation. These requirements influence the development of collocation extraction systems, which motivates this paper.</Paragraph>
    <Paragraph position="5"> The paper is structured as follows: Section 2 presents the requirements, which depend on factors such as the targeted language. We then discuss and suggest methods to meet these needs.</Paragraph>
    <Paragraph position="6"> Section 3 documents ongoing work on the extraction of noun + verb collocations from German texts. Section 4 concludes and gives an outlook on work still to be done.</Paragraph>
  </Section>
</Paper>