<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2410">
  <Title>Thesauruses for Prepositional Phrase Attachment</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Prepositional phrases are an interesting example of syntactic ambiguity and a challenge for automatic parsers.</Paragraph>
    <Paragraph position="1"> The ambiguity arises whenever a prepositional phrase can modify a preceding verb or noun, as in the canonical example I saw the man with the telescope. In syntactic terms, the prepositional phrase attaches either to the noun phrase or the verb phrase. Many kinds of syntactic ambiguity can be resolved using structural information alone (Briscoe and Carroll, 1995; Lin, 1998a; Klein and Manning, 2003), but in this case both candidate structures are perfectly grammatical and roughly equally likely. Therefore, ambiguous prepositional phrases require some kind of additional context to disambiguate correctly. In some cases a small amount of lexical knowledge is sufficient: for example, of almost always modifies the noun. Other cases, such as the telescope example, are potentially much harder since discourse or world knowledge might be required.</Paragraph>
    <Paragraph position="2"> Fortunately it is possible to do well at this task just by considering the lexical preferences of the words making up the PP. Lexical preferences describe the tendency for certain words to occur together or only in specific constructions. For example, saw and telescope are more likely to occur together than man and telescope, so we can infer that the correct attachment is likely to be verbal. The most useful lexical preferences are captured by the quadruple (v,n1,p,n2) where v is the verb, n1 is the head of the direct object, p is the preposition and n2 is the head of the prepositional phrase. A benchmark dataset of 27,937 such quadruples was extracted from the Wall Street Journal corpus by Ratnaparkhi et al. (1994) and has been the basis of many subsequent studies comparing machine learning algorithms and lexical resources. This paper examines the effect of particular smoothing algorithms on the performance of an existing statistical PP model.</Paragraph>
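A minimal sketch of the maximum likelihood decision over such quadruples; the counts and the function name are our own invention, not the paper's data or implementation:

```python
from collections import Counter

# Invented attachment counts for (v, n1, p, n2) quadruples; in the
# Ratnaparkhi et al. (1994) data each tuple is labelled with its gold
# attachment, noun ("N") or verb ("V").
counts = Counter({
    (("see", "man", "with", "telescope"), "V"): 5,
    (("see", "man", "with", "telescope"), "N"): 1,
})

def mle_attachment(quad):
    """Maximum likelihood decision: pick the label observed more often
    with this exact quadruple; None when the quadruple is unseen."""
    v = counts[(quad, "V")]
    n = counts[(quad, "N")]
    if v + n == 0:
        return None  # sparse data: no estimate at all
    return "V" if n < v else "N"
```

The None branch is the sparse-data failure mode discussed next: an exact-match lookup gives no estimate at all for unseen quadruples.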
    <Paragraph position="3"> A major problem faced by any statistical attachment algorithm is sparse data, which occurs when plausible PPs are not well-represented in the training data. For example, if the observed frequency of a PP in the training data is zero then the maximum likelihood estimate is also zero.</Paragraph>
    <Paragraph position="4"> Since the training corpus only represents a fraction of all possible PPs, this is probably an underestimate of the true probability. An appealing course of action when faced with an unknown PP is to consider similar known examples instead. For example, we may not have any data for eat pizza with fork, but if we have seen eat pasta with fork or even drink beer with straw then it seems reasonable to base our decision on these instead.</Paragraph>
    <Paragraph position="5"> Similarity is a rather nebulous concept but for our purposes we can define it to be distributional similarity, where two words are considered similar if they occur in similar contexts. For example, pizza and pasta are similar since they both often occur as the direct object of eat. A thesaurus collects together lists of such similar words. The first step in constructing a thesaurus is to collect co-occurrence statistics from some large corpus of text. Each word is assigned a probability distribution describing the probability of it occurring with all other words, and by comparing distributions we can arrive at a similarity score. The corpus, co-occurrence relationships and distributional similarity metric all affect the nature of the final thesaurus.</Paragraph>
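The comparison of co-occurrence distributions can be sketched as follows; the counts are invented toy figures and cosine is just one of several possible similarity metrics, not necessarily the one used here:

```python
from collections import Counter
from math import sqrt

# Toy co-occurrence counts: verbs observed taking each noun as direct
# object. These figures are illustrative, not drawn from any corpus.
cooccurrences = {
    "pizza":     Counter({"eat": 8, "order": 3, "slice": 2}),
    "pasta":     Counter({"eat": 7, "cook": 4, "order": 2}),
    "telescope": Counter({"buy": 2, "point": 3}),
}

def cosine(a, b):
    """Cosine similarity between two sparse co-occurrence vectors."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0

print(cosine(cooccurrences["pizza"], cooccurrences["pasta"]))      # high
print(cosine(cooccurrences["pizza"], cooccurrences["telescope"]))  # 0.0
```

Ranking each word's neighbours by such a score yields the thesaurus entries described above.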
    <Paragraph position="6"> There has been a considerable amount of research comparing corpora, co-occurrence relations and similarity measures for general-purpose thesauruses, and these thesauruses are often compared against wide-coverage, general-purpose semantic resources such as WordNet. In this paper we examine whether it is useful to tailor the thesaurus to the task. General-purpose thesauruses list words that tend to occur together in free text; we want to find words that behave in similar ways specifically within prepositional phrases. To this end we create a PP thesaurus using existing similarity metrics but using a corpus consisting of automatically extracted prepositional phrases.</Paragraph>
    <Paragraph position="7"> A thesaurus alone is not sufficient to solve the PP attachment problem; we also need a model of the lexical preferences of prepositional phrases. Here we use the back-off model described by Collins and Brooks (1995), but with maximum likelihood estimates smoothed using similar PPs discovered using a thesaurus. Such similarity-based smoothing methods have been successfully used in other NLP applications, but our use of them here is novel. A key difference is that smoothing is done not over individual words but over entire prepositional phrases. Similar PPs are generated by replacing each component word with a distributionally similar word, and we define a similarity function for comparing PPs. We find that using a score based on the rank of a word in the similarity list is more accurate than the actual similarity scores provided by the thesaurus, which tend to weight less similar words too highly.</Paragraph>
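The generation of similar PPs with rank-based weighting might be sketched as below; the thesaurus entries, the 1/(rank+1) weighting and all function names are our own illustrative assumptions, not the paper's exact formulation:

```python
from itertools import product

# Hypothetical thesaurus: each word maps to a ranked list of neighbours,
# most similar first. Real entries would come from the thesaurus itself.
thesaurus = {
    "eat":   ["drink", "consume"],
    "pizza": ["pasta", "beer"],
    "fork":  ["spoon", "straw"],
}

def rank_weight(word, substitute):
    """Score a substitution by its rank in the similarity list rather
    than by the raw similarity value: the word itself scores 1, the
    rank-k neighbour scores 1/(k+1)."""
    if substitute == word:
        return 1.0
    neighbours = thesaurus.get(word, [])
    if substitute in neighbours:
        return 1.0 / (neighbours.index(substitute) + 2)
    return 0.0

def similar_pps(v, n1, p, n2):
    """Yield variant (v, n1, p, n2) tuples with a combined weight,
    replacing each slot by itself or a thesaurus neighbour; the
    preposition is kept fixed."""
    for vv, nn1, nn2 in product([v] + thesaurus.get(v, []),
                                [n1] + thesaurus.get(n1, []),
                                [n2] + thesaurus.get(n2, [])):
        weight = rank_weight(v, vv) * rank_weight(n1, nn1) * rank_weight(n2, nn2)
        yield (vv, nn1, p, nn2), weight
```

Counts for each variant PP, discounted by its weight, can then be folded into the back-off model's estimates in place of the bare zero counts.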
    <Paragraph position="8"> In Section 2 we cover related work in PP attachment and smoothing techniques, with a brief comparison between similarity-based smoothing and the more common (for PP attachment) class-based smoothing. Section 3 describes Collins' PP attachment model and our thesaurus-based smoothing extensions. Section 4 discusses the thesauruses used in our experiment and describes how the specialist thesaurus is constructed. Experimental results are given in Section 5, where we show statistically significant improvements over the baseline model using generic thesauruses. Contrary to our hypothesis, the specialist thesaurus does not lead to significant improvements, and we discuss possible reasons why it underperforms on this task.</Paragraph>
  </Section>
</Paper>