<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1014">
  <Title>An Unsupervised Approach to Prepositional Phrase Attachment using Contextually Similar Words</Title>
  <Section position="11" start_page="0" end_page="0" type="concl">
    <SectionTitle>
7.3 Results
</SectionTitle>
    <Paragraph position="0"> We used the 3097-example testing corpus described in Section 7.1. Table 8 presents the precision and recall of our algorithm and Table 9 presents a performance comparison between our system and previous supervised and unsupervised approaches using the same test data. We describe the different classifiers below: cl base : the baseline described in Section 7.2  Our classifier outperforms all previous unsupervised techniques and approaches the performance of supervised algorithm.</Paragraph>
    <Paragraph position="1"> We reconstructed the two earlier unsupervised classifiers cl HR and cl R2 . Table 10 presents the accuracy of our reconstructed classifiers. The originally reported accuracy for cl R 2 is within the 95% confidence interval of our reconstruction. Our reconstruction of cl HR achieved slightly higher accuracy than the original report.</Paragraph>
    <Paragraph position="2"> 5 The accuracy is reported in (Collins and Brooks, 1995). 6 The accuracy was obtained on a smaller test set but, from the same source as our test data.</Paragraph>
    <Paragraph position="3"> Our classifier used a mixture of the two training data sets described in Section 3. In Table 11 , we compare the performance of our system on the following training data sets: UNAMB : the data set of unambiguous examples described in Section 3.2 EM0 : the data set of Section 3.1 afte r frequency table initialization</Paragraph>
    <Paragraph position="5"> 1/8-EM1 : one eighth of the data in EM1 MIX : The concatenation of UNAMB and EM1 Table 11 illustrates a slight but consistent increase in performance when using contextually similar words. However, since the confidence intervals overlap, we cannot claim with certainty  that the contextually similar words improve performance.</Paragraph>
    <Paragraph position="6"> In Section 7.1, we mentioned some testing examples contained N 1 = the or N 2 = the . For supervised algorithms, the is represented in the training set as any other noun. Consequently, these algorithms collect training data for the and performance is not affected. However, unsupervised methods break down on such examples. In Table 12 , we illustrate the performance increase of our system when removing these erroneous examples.</Paragraph>
    <Paragraph position="7"> Conclusion and Future Work The algorithms presented in this paper advance the state of the art for unsupervised approaches to prepositional phrase attachment and draws near the performance of supervised methods.</Paragraph>
    <Paragraph position="8"> Currently, we are exploring different functions for combining contextually similar word approximations with the attachment scores. A promising approach considers the mutual information between the prepositional relationship of candidate attachments and N 2 . As the mutual information decreases, our confidence in the attachment score decreases and the contextually similar word approximation is weighted higher. Also, improving the construction algorithm for contextually similar words would possibly improve the accuracy of the system. One approach first clusters the similar words. Then, dependency relationships are used to select the most representative clusters as the contextually similar words. The assumption is that more representative similar words produce better approximations.</Paragraph>
  </Section>
class="xml-element"></Paper>