<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1004"> <Title>Combining unsupervised and supervised methods for PP attachment disambiguation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Recently, numerous statistical methods for prepositional phrase (PP) attachment disambiguation have been proposed. They can broadly be divided into unsupervised and supervised methods. In the unsupervised methods the attachment decision is based on information derived from large corpora of raw text. The text may be automatically processed (e.g. by shallow parsing) but not manually disambiguated. The most prominent unsupervised methods are the Lexical Association score by Hindle and Rooth (1993) and the cooccurrence values by Ratnaparkhi (1998). They resulted in up to 82% correct attachments for a set of around 3000 test cases from the Penn treebank. Pantel and Lin (2000) increased the training corpus, added a collocation database and a thesaurus which improved the accuracy to 84%.</Paragraph> <Paragraph position="1"> In contrast, the supervised methods are based on information that the program learns from manually disambiguated cases.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Science Foundation under grant 12-54106.98. </SectionTitle> <Paragraph position="0"> These cases are usually extracted from a treebank. Supervised methods are as varied as the Back-off approach by Collins and Brooks (1995) and the Transformation-based approach by Brill and Resnik (1994). Back-off scored 84% correct attachments and outperformed the Transformation-based approach (80%). Even better results were reported by Stetina and Nagao (1997) who used the WordNet thesaurus with a supervised learner and achieved 88% accuracy. All these accuracy figures were reported for English.
We have evaluated both unsupervised and supervised methods for PP attachment disambiguation in German. This work was constrained by the availability of only a small German treebank (10,000 sentences). Under this constraint we found that an intertwined combination of using information from unsupervised and supervised learning leads to the best results. We believe that our results are relevant to many languages for which only small treebanks are available.</Paragraph> </Section> </Section> </Paper>