<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2406"> <Title>Collocation Extraction: Needs, Feeds and Results of an Extraction System for German</Title> <Section position="5" start_page="45" end_page="47" type="concl"> <SectionTitle> 4 Conclusion and Outlook </SectionTitle> <Paragraph position="0"> We presented a system for collocation extraction that takes into account the behaviour or use of collocations in context. Profiting from linguistic information (PoS tagging, chunking), the tool reaches a precision of 66% on the top 323 candidates by frequency. On the same data, a window-based approach relying only on PoS information reached a precision of 41%.</Paragraph> <Paragraph position="1"> As the extracted word combinations as well as their context parameters (including the original evidence from the corpus) are stored in a database, the tool also supports explorative research in lexicography. However, some enhancements are worth making. Especially when dealing with low frequencies, relative frequencies lack reliability. Therefore, we suggest computing a confidence interval as proposed in (Evert, 2004b; Heid and Ritz, 2005; Ritz, 2005).</Paragraph> <Paragraph position="2"> As indicated in figure 1, several postprocessing steps can be added to the system, e.g. enabling a sorting of collocation candidates with compound nouns by the morphological heads of their base.</Paragraph> <Paragraph position="3"> In order to get more data, the extraction from verb-first and verb-second constructions is also possible. To complete the tool, extraction patterns for collocations of different syntactic relations (cf. table 1) could be designed.</Paragraph> <Paragraph position="5"/> </Section></Paper>