<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0109"> <Title>Corpus Based PP Attachment Ambiguity Resolution with a Semantic Dictionary</Title> <Section position="9" start_page="74" end_page="78" type="evalu"> <SectionTitle> 5. EVALUATION AND EXPERIMENTAL RESULTS 5.1 WORD SENSE DISAMBIGUATION </SectionTitle> <Paragraph position="0"> Because the induction of the decision tree for PP attachment is based on supervised learning from sense-tagged examples, it was necessary to sense-disambiguate the entire training set. This was done by the iterative algorithm described in Chapter 2.</Paragraph> <Paragraph position="1"> 10 In the WordNet hierarchy.</Paragraph> <Paragraph position="2"> 11 We would like to thank Michael Collins for supplying the data.</Paragraph> <Paragraph position="3"> To form an approximate evaluation of the quality of this disambiguation, we randomly selected 500 words, manually assigned sets of possible senses to them (sets, because without a full sentential context a full disambiguation is not always possible), and compared these with the automatic disambiguation. If the automatically chosen sense was present in the manually assigned set, the disambiguation was considered correct. Out of these 500 words, 362 could be considered correctly disambiguated, which represents slightly over 72%.</Paragraph> <Paragraph position="4"> One can argue that the insufficient disambiguation context, the sparse-data problem and the empirically set iteration step in the disambiguating algorithm lead to unreliable disambiguation. However, it is necessary to keep in mind that it is the PP attachment, rather than the sense disambiguation, that is our primary goal. Additionally, because the words of the input sentences for the PP attachment are to be assigned senses in the same manner, the sense disambiguation error is concealed. 
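The sense-set evaluation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the data shapes and names are our assumptions:

```python
# Hypothetical sketch of the sense-set evaluation: an automatically chosen
# sense counts as correct if it falls within the manually assigned sense set.

def sense_accuracy(manual_senses, auto_senses):
    """manual_senses: dict mapping word occurrence id -> set of acceptable senses.
    auto_senses: dict mapping word occurrence id -> automatically chosen sense."""
    correct = sum(1 for wid, senses in manual_senses.items()
                  if auto_senses.get(wid) in senses)
    return correct / len(manual_senses)

# With 362 of 500 choices accepted, the accuracy is 362/500 = 72.4%.
```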
Although the disambiguation of the training set is computationally the most expensive part of the system, it is done only once. The disambiguation of unseen (testing) examples is done by the same algorithm, modified to exclude the SDT iteration cycles. It is therefore reasonably fast even for real-life applications.</Paragraph> <Section position="1" start_page="75" end_page="78" type="sub_section"> <SectionTitle> 5.2 PP-ATTACHMENT </SectionTitle> <Paragraph position="0"> The PP attachment using the decision tree is extremely efficient and reliable. We have induced the decision tree separately for each preposition in the training corpus, covering the 51 most common prepositions. The induced decision trees are relatively shallow and the classification of unseen sentences is rapid. As shown in the following table, our algorithm appears to compare favourably with previous methods. The fact that many words in both the training and the testing sets were not found in WordNet caused a reduction in accuracy. This is because training examples with an error or with a word not found in WordNet could not fully participate in the decision tree induction. This reduced the original training set of 20801 quadruples to 17577. In the case of the testing set, many of the 3097 testing quadruples were also handicapped by having no entry in WordNet. Attachment of these had to be based on a partial quadruple and was usually assigned at a higher level of the decision tree, which reduced the overall accuracy. In order to conduct a fair comparison, however, we used the same testing set as the methods shown in the above table. If just the examples with full WordNet entries were used, the accuracy rose to 90.8%.</Paragraph> <Paragraph position="1"> Although the algorithm does not provide sufficiently high accuracy from the point of view of word sense disambiguation, it is more important to bear in mind that our main goal is PP attachment ambiguity resolution. 
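Since a separate tree is induced per preposition, classification can be organised as a per-preposition lookup with a fallback for unseen prepositions. A minimal sketch under our own assumed interfaces, not the authors' implementation:

```python
# Hypothetical dispatcher: one classifier per preposition; quadruples whose
# preposition never occurred in training fall back to a default attachment.

def attach(quadruple, classifiers, default="adjectival"):
    """quadruple: (verb, noun1, preposition, noun2);
    classifiers: dict mapping preposition -> function(verb, noun1, noun2)."""
    verb, noun1, prep, noun2 = quadruple
    classify = classifiers.get(prep)
    if classify is None:
        return default  # preposition unseen in the training corpus
    return classify(verb, noun1, noun2)
```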
The relatively low accuracy of the word sense disambiguation is compensated for by the fact that the same sense disambiguation error is present in both the training set and the classified quadruple. The use of the same training set for both the PP attachment and the sense disambiguation provides a positive bias in favour of correct attachment.</Paragraph> <Paragraph position="2"> Until we have a sufficiently big word-sense-tagged corpus, we can only hypothesise about the importance of correct sense disambiguation for the PP attachment. Experiments, however, show that if the positive bias between the word senses of the training set and the testing quadruples is removed, the accuracy of the PP attachment falls substantially. We have conducted an experiment in which the disambiguated senses of the testing set were replaced by the most frequent senses, i.e. the first senses as defined in WordNet. This caused a substantial reduction of accuracy to 76.5%. The fact that our approximate disambiguation (algorithm in Chapter 2) leads to 88.1% correct PP attachment is partly to be attributed to the positive bias of disambiguating the testing examples against the same training set which is also used for the decision tree induction. The disambiguation errors are thus hidden by their replication in both the training and the testing sets.</Paragraph> <Paragraph position="3"> As we have already mentioned, Collins and Brooks \[C&amp;B95\] based their method on matching the testing quadruples against the set of training examples. The decision on the attachment was made according to which attachment type had a higher count in the training corpus. If no match for the given quadruple was found, the algorithm backed off to a combined frequency count of the occurrences of matches on three words only, i.e. on the verb-noun-preposition, verb-preposition-description and noun-preposition-description. 
If no match was found on any of the three-word combinations, the algorithm backed off to a combined match on two words, i.e. one of the content words with the preposition. If there was still no match on two words, the attachment type was assigned according to the prepositional statistics, or, if the preposition was not present in the training corpus, the quadruple was assigned the adjectival default. There was a substantial decrease of accuracy between the triples and the doubles stages. Our algorithm, on the other hand, has substantially reduced the number of classifications based on fewer words. This is because at the top of the decision tree the semantic tops of all the content words of the given quadruple are compared with the semantic generalisations of the training examples represented through the nodes of the decision tree. Only if the homogeneity termination condition is satisfied before all three content words are compared is the decision based on less than a full quadruple. The decision tree therefore represents a very useful mechanism for determining the semantic level at which the decision on the PP attachment is made.</Paragraph> <Paragraph position="4"> Collins and Brooks have also demonstrated the importance of low-count events in training data by an experiment in which all counts less than 5 were set to zero. This effectively made their algorithm ignore low-count events, which resulted in a decrease of accuracy from 84.1% to 81.6%. This important feature is maintained in our approach by small homogeneous leaves at higher levels of the decision tree, which usually accommodate the low-count training examples.</Paragraph> <Paragraph position="5"> Figure 3 shows an interesting aspect of learning the prepositional phrase attachment from a huge corpus. We have selected the five most common prepositions and compared their learning curves. 
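Collins and Brooks' back-off chain, as recapped above, can be sketched as follows. The names and data shapes are ours, not from either paper; counts are assumed to map word tuples to (adjectival, adverbial) frequency pairs:

```python
# Hedged sketch of the Collins & Brooks (1995) back-off: quadruple first,
# then triples containing the preposition, then pairs, then the
# per-preposition statistics, and finally the adjectival default.

def backoff_attach(v, n1, p, n2, counts, prep_stats, default="adjectival"):
    stages = [
        [(v, n1, p, n2)],                          # full quadruple
        [(v, n1, p), (v, p, n2), (n1, p, n2)],     # triples
        [(v, p), (n1, p), (p, n2)],                # pairs
    ]
    for stage in stages:
        adj = sum(counts.get(k, (0, 0))[0] for k in stage)
        adv = sum(counts.get(k, (0, 0))[1] for k in stage)
        if adj + adv > 0:
            return "adjectival" if adj >= adv else "adverbial"
    # No match at any stage: preposition statistics, else the default.
    return prep_stats.get(p, default)
```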
It turned out that for training sets smaller than 1000 examples, learning is rather unreliable and dependent on the quality of the chosen quadruples. For a bigger training set, the accuracy grows with its size until a certain maximum accuracy level is reached. This level is different for different prepositions and we hypothesise that it can be surpassed only when a wider sentential or discourse context is used.</Paragraph> <Paragraph position="6"> Our algorithm also provides a classification certainty based on the heterogeneity of the decision tree leaves. The tree leaves are heterogeneous for two reasons: 1) the tree expansion is terminated when a node contains more than 77% of examples belonging to the same class, or 2) there are examples in the node that cannot be further divided because the tree has reached the bottom of the WordNet hierarchy. Table 2 shows that the incorrect attachments usually occur with a lower certainty than the correct ones, i.e. most of the incorrect attachments are marked as less certain.</Paragraph> <Paragraph position="7"> The prepositional statistics category indicates that no match was found for the given quadruple and the attachment was decided by the statistical frequency of the given preposition. The adjectival default was used in three cases when the preposition was not found in the training set. Certainty between 0.5 and 0.8 accounts mostly for examples whose attachment was made through the decision tree, but where either a small number of examples participated in the creation of the tree branch or the examples were not sufficiently representative (e.g. contradictory examples). Most of the examples in this category probably require a wider sentential context for further improvement of accuracy. 
Certainty greater than 0.8 and smaller than 1.0 accounts for situations where the decision was based on a leaf whose further expansion was terminated by the homogeneity termination condition, or where noisy or incorrectly disambiguated examples were involved in its creation. Examples which did not reach the bottom of the decision tree and were assigned the majority class of the node from which there was no appropriate branch to follow were all classified with certainty between 0.5 and 1.0. A decision with certainty 1.0 is always based on a homogeneous leaf. It does not exhibit the highest accuracy because many of the homogeneous leaves are formed from only very few examples and many of these are erroneous.</Paragraph> <Paragraph position="8"> As Figure 3 shows, each preposition has a different saturation accuracy which cannot be surpassed unless a wider sentential context is used. We believe, however, that a bigger corpus would provide better word sense disambiguation, which in turn would allow us to increase the homogeneity limit for the termination of the tree expansion. Heterogeneous nodes, which force the expansion of the decision tree to an unnecessary extent, are caused by 1) examples with an error in the word sense disambiguation, or by 2) examples that can be both adjectival and adverbial if taken out of context. The second case cannot be eliminated by a bigger training corpus; however, the reduction of noisy examples would contribute to an increase in accuracy mainly in the case of small nodes, which can now contain more noisy examples than correct ones and thus force a wrong attachment. We feel that a bigger corpus would provide us with an increase of accuracy of &quot;certainty 1&quot; attachments, which partly include attachments based on the small leaves. 
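The leaf-based certainty discussed above can be read as the majority-class fraction of the deciding node. A minimal sketch under that reading (our reconstruction, not the authors' code):

```python
# Hypothetical leaf certainty: the fraction of training examples in the
# deciding node that carry the majority attachment class.

from collections import Counter

def leaf_certainty(leaf_labels):
    """leaf_labels: list of attachment labels ('adj' / 'adv') in the node."""
    majority_class, majority_count = Counter(leaf_labels).most_common(1)[0]
    return majority_class, majority_count / len(leaf_labels)

# A fully homogeneous leaf yields certainty 1.0; a node stopped by the 77%
# homogeneity threshold yields a certainty between 0.77 and 1.0.
```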
Also, we believe that a bigger training corpus would increase performance in the case of less frequent prepositions, which do not have enough training examples to allow for the induction of a reliable decision tree.</Paragraph> </Section> </Section> </Paper>