
<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2201">
  <Title>A Connectionist Approach to Prepositional Phrase Attachment for Real World Texts</Title>
  <Section position="4" start_page="1233" end_page="1233" type="metho">
    <SectionTitle>
2 Using Words
</SectionTitle>
    <Paragraph position="0"> The attachment probability p(verb attach | verb n1 prep n2) should be computed. Because it relies on word co-occurrence, this approach runs into the serious problem of data sparseness: the same 4-tuple (v n1 prep n2) is hardly ever repeated across the corpus, even when the corpus is very large. Collins and Brooks (1995) showed how serious this problem can be: almost 95% of the 3097 4-tuples in their test set do not appear among their 20801 training-set 4-tuples. To reduce data sparseness, Hindle and Rooth (1993) simplified the context by considering only verb-preposition (p(prep | verb)) and n1-preposition (p(prep | n1)) co-occurrences; n2 was ignored despite the fact that it may play an important role. At test time, attachment to the verb was chosen if p(prep | verb) &gt; p(prep | n1); otherwise attachment to n1 was chosen. Despite these limitations, 80% of PPs were correctly assigned.</Paragraph>
    <Paragraph position="1"> Another method for reducing data sparseness was introduced more recently by Collins and Brooks (1995). These authors showed that the problem of PP attachment ambiguity is analogous to the n-gram language-modelling problem in speech recognition, and that one of the most common language-modelling methods, the backed-off estimate, is applicable here as well. Using this method they obtained 84.5% accuracy on WSJ data.</Paragraph>
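A backed-off estimate of this kind can be sketched as below: if the full 4-tuple is unseen, fall back to triples, then pairs, then the bare preposition, always keeping the preposition in the context. The tuples and counts are hypothetical, and the tier structure is a sketch of the back-off idea rather than Collins and Brooks' exact formulation:

```python
from collections import Counter

# Tiny illustrative training sample of (verb, n1, prep, n2, attachment)
# tuples; hypothetical data, not the WSJ corpus used in the paper.
data = [
    ("join", "board", "as", "director", "verb"),
    ("join", "board", "as", "member", "verb"),
    ("buy", "shares", "of", "company", "noun"),
]

counts = Counter()       # counts of each context sub-tuple
verb_counts = Counter()  # same sub-tuples, restricted to verb attachments

for v, n1, p, n2, label in data:
    # Contexts from most to least specific; each one keeps the preposition.
    contexts = [
        (v, n1, p, n2),
        (v, n1, p), (v, p, n2), (n1, p, n2),
        (v, p), (n1, p), (p, n2),
        (p,),
    ]
    for c in contexts:
        counts[c] += 1
        if label == "verb":
            verb_counts[c] += 1

def backed_off_p_verb(v, n1, p, n2):
    """Back off from the 4-tuple to triples, pairs, then the preposition."""
    tiers = [
        [(v, n1, p, n2)],
        [(v, n1, p), (v, p, n2), (n1, p, n2)],
        [(v, p), (n1, p), (p, n2)],
        [(p,)],
    ]
    for tier in tiers:
        denom = sum(counts[c] for c in tier)
        if denom > 0:
            return sum(verb_counts[c] for c in tier) / denom
    return 0.5  # default when even the preposition is unseen

# Unseen 4-tuple: backs off until the (verb, prep) pair matches.
print(backed_off_p_verb("join", "committee", "as", "chair"))  # 1.0
```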
  </Section>
  <Section position="5" start_page="1233" end_page="1234" type="metho">
    <SectionTitle>
3 Using Classes
</SectionTitle>
    <Paragraph position="0"> Working with words implies generating huge parameter spaces for which a vast amount of memory is required. NNs (probably like people) cannot deal with such spaces: NNs are able to approximate very complex functions, but they cannot memorize huge probability look-up tables. The use of semantic classes has been suggested as an alternative to word co-occurrence. If we accept the idea that all the words included in a given class must have similar (attachment) behaviour, and that there are fewer semantic classes than there are words, the problems of data sparseness and memory space can be considerably reduced.</Paragraph>
    <Paragraph position="1"> Some of the class-based methods have used Word-Net (Miller et al., 1993) to extract word classes. WordNet is a semantic net in which each node stands for a set of synonyms (synset), and domination stands for set inclusion (IS-A links). Each synset represents an underlying concept. Table 1 shows three of the senses for the noun bank. Table 2 shows the accuracy of the results reported in previous work. The worst results were obtained when only classes were used. It is reasonable to assume that a major source of knowledge humans use to make attachment decisions is the semantic class of the words involved, and consequently there must be a class-based method that provides better results. One possible reason for the low performance using classes is that WordNet is not an adequate hierarchy, since it is hand-crafted. Ratnaparkhi et al. (1994), instead of using hand-crafted semantic classes, use word classes obtained via Mutual Information Clustering (MIC) on a training corpus. Table 2 shows that, again, worse results are obtained with classes.</Paragraph>
    <Paragraph position="2"> A complementary explanation for the poor results using classes would be that current methods do not use class information very effectively for several reasons: 1.-In WordNet, a particular sense belongs to several classes (a word belongs to a class if it falls within the IS-A tree below that class), and so determining an adequate level of abstraction is difficult. 2.- Most words have more than one sense. As a result, before deciding attachment, it is first necessary to determine the correct sense for each word.</Paragraph>
    <Paragraph position="3"> 3.- None of the preceding methods used classes for verbs. 4.- For reasons of complexity, the complete 4-tuple has not been considered simultaneously except in Ratnaparkhi et al. (1994). 5.- Classes of a given sense and classes of different senses of different words can have complex interactions, and the preceding methods cannot take such interactions into account.</Paragraph>
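Points 1 and 2 above (one sense falling under several classes, and one word having several senses) can be illustrated with a hand-made miniature IS-A hierarchy. The senses and hypernym chains below are hypothetical stand-ins for WordNet entries:

```python
# A toy IS-A hierarchy (hypothetical, standing in for WordNet): each
# sense of a word lists its hypernym chain from specific to general.
senses = {
    "bank": [
        ["bank#1", "financial_institution", "institution", "organization", "entity"],
        ["bank#2", "slope", "geological_formation", "object", "entity"],
        ["bank#3", "container", "instrumentality", "artifact", "entity"],
    ],
}

def classes_at_level(word, level):
    """All classes a word can fall under at a given abstraction level,
    one candidate per sense (chains are clipped at their last element)."""
    return sorted({chain[min(level, len(chain) - 1)] for chain in senses[word]})

# The same word maps to several classes, and the abstraction level matters:
print(classes_at_level("bank", 1))  # ['container', 'financial_institution', 'slope']
print(classes_at_level("bank", 4))  # ['entity']
```

At a specific level the classes are informative but sense-ambiguous; at the most general level the ambiguity vanishes but so does the information, which is the abstraction-level difficulty the text describes.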
  </Section>
</Paper>