File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/01/w01-0511_concl.xml
Size: 3,767 bytes
Last Modified: 2025-10-06 13:53:06
<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0511">
<Title>Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy</Title>
<Section position="7" start_page="4" end_page="4" type="concl">
<SectionTitle> 6 Conclusions </SectionTitle>
<Paragraph position="0"> We have presented a simple approach to corpus-based assignment of semantic relations for noun compounds. The main idea is to define a set of relations that can hold between the terms and to use standard machine learning techniques together with a lexical hierarchy to generalize from training instances to new examples. The initial results are quite promising.</Paragraph>
<Paragraph position="1"> In this multi-class classification task (with 18 classes) we achieved an accuracy of about 60%. These results can be compared with Vanderwende (1994), who reports an accuracy of 52% with 13 classes, and with Lapata (2000), whose algorithm achieves about 80% accuracy for a much simpler binary classification.</Paragraph>
<Paragraph position="2"> We have shown that a class-based representation performs as well as a lexical-based model despite the reduction of raw information content and despite a somewhat errorful mapping from terms to concepts. We have also shown that representing the nouns of the compound by a very general representation (Model 2) achieves a reasonable performance of about 52% accuracy on average. This is particularly important for larger collections with a much larger number of unique words, for which the lexical-based model is not a viable option. Our results seem to indicate that we do not lose much in terms of accuracy by using the more compact MeSH representation.</Paragraph>
<Paragraph position="3"> We have also shown how MeSH-based models outperform a lexical-based approach when the number of training points is small and when the test set consists of words unseen in the training data. (Note that for unseen words, the baseline lexical-based logistic regression approach, which essentially builds a tabular representation of the log-odds for each class, also reduces to random guessing.) This indicates that the MeSH models can generalize successfully over unseen words. Our approach handles &quot;mixed-class&quot; relations naturally: for the mixed class Defect in Location, the algorithm achieved an accuracy of around 95% for both &quot;Defect&quot; and &quot;Location&quot; simultaneously. Our results also indicate that the second noun (the head) is more important in determining the relationship than the first one.</Paragraph>
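As a rough illustration of the class-based approach summarized above, the sketch below maps each noun of a compound to a MeSH-style tree number truncated at a fixed depth and trains a standard classifier on the resulting concept pairs. The mesh_code lookup table, the placeholder tree numbers, the example compounds and relation labels, the truncation depth, and the use of scikit-learn's DictVectorizer and LogisticRegression are all illustrative assumptions, not the configuration reported in the paper.

# Minimal sketch of a class-based (MeSH) representation for noun-compound
# relation classification. All lexicon entries, labels, and the classifier
# choice below are illustrative placeholders, not the paper's actual setup.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical noun -> MeSH-style tree number lookup (placeholder codes).
mesh_code = {
    "headache": "C23.888.592",
    "migraine": "C10.228.140",
    "aspirin": "D02.241.223",
    "therapy": "E02.760.905",
}

def truncate(code, depth=2):
    """Keep only the first `depth` components of a tree number,
    e.g. 'C23.888.592' -> 'C23.888' for depth 2 (a Model 2-style cut)."""
    return ".".join(code.split(".")[:depth])

def featurize(noun1, noun2, depth=2):
    """Represent a two-noun compound by the truncated codes of its nouns."""
    return {
        "n1": truncate(mesh_code.get(noun1, "UNK"), depth),
        "n2": truncate(mesh_code.get(noun2, "UNK"), depth),
    }

# Toy training data: ((modifier, head), relation label); labels are illustrative.
train = [
    (("migraine", "therapy"), "Purpose"),
    (("aspirin", "therapy"), "Instrument"),
]
X = [featurize(n1, n2) for (n1, n2), _ in train]
y = [label for _, label in train]

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)

# A noun absent from the lexicon falls back to the 'UNK' code; a noun that
# shares a truncated MeSH code with a training noun is generalized over by
# the class-based features.
print(clf.predict([featurize("headache", "therapy")]))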
<Paragraph position="4"> In the future we plan to train the algorithm to allow different levels of the hierarchy for each noun in the compound. We also plan to compare the results to the tree cut algorithm reported in (Li and Abe, 1998), which allows different levels to be identified for different subtrees. We also plan to tackle the problem of noun compounds containing more than two terms.</Paragraph>
</Section>
</Paper>