<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2105"> <Title>Robust German Noun Chunking With a Probabilistic Context-Free Grammar</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A noun chunker marks the noun chunks in a sentence, as in the following example: (Wirtschaftsbosse) mit (zweifelhaftem Ruf) sind an (der in (Engpässen) angewandten Führung) (des Landes) beteiligt.</Paragraph> <Paragraph position="1"> economy chiefs with doubtable reputation are in the in bottlenecks applied guidance of the country involved.</Paragraph> <Paragraph position="2"> 'Leading economists with doubtable reputations are involved in guiding the country in times of bottlenecks.' A tool which identifies noun chunks is useful for term extraction (most technical terms are nouns or complex noun groups), for lexicographic purposes (see Tapanainen and Järvinen (1998) on syntactically organised concordancing), and for extracting index terms for information retrieval. Chunkers may also mark other types of chunks, such as verb groups, adverbial phrases or adjectival phrases.</Paragraph> <Paragraph position="3"> Several methods have been developed for noun chunking. Church's noun phrase tagger (Church, 1988), one of the first noun chunkers, was based on a Hidden Markov Model (HMM) similar to those used for part-of-speech tagging. Another HMM-based approach has been developed by Mats Rooth (Rooth, 1992). It integrates two HMMs: one models the internal structure of noun chunks, the other models their context. Abney's cascaded finite-state parser (Abney, 1996) also contains a processing step which recognises noun chunks and other types of chunks. Ramshaw and Marcus (1995) successfully applied Eric Brill's transformation-based learning method to the chunking problem.
Voutilainen's NPtool (Voutilainen, 1993) is based on his constraint-grammar system. * Thanks to Mats Rooth and Uli Heid for many helpful comments.</Paragraph> <Paragraph position="4"> Finally, Brants (1999) described a German chunker implemented with cascaded Markov Models.</Paragraph> <Paragraph position="5"> In this paper, a probabilistic context-free parser is applied to the noun chunking task. The German grammar used in the experiments was semi-automatically extended with robustness rules so that it could process arbitrary input. The grammar parameters were trained on unlabelled data. A novel algorithm, which maximises the probability of the chunk set, is used for noun chunk extraction.</Paragraph> <Paragraph position="6"> Section 2 introduces the grammar framework, section 3 describes the chunking algorithm, and section 4 presents the experiments and their evaluation.</Paragraph> </Section> </Paper>