<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2021"> <Title>Using WordNet to Automatically Deduce Relations between Words in Noun-Noun Compounds</Title> <Section position="9" start_page="165" end_page="166" type="concl"> <SectionTitle> 6 Related Work </SectionTitle> <Paragraph position="0"> Various approaches to noun-noun compound disambiguation in the literature have used the semantic category membership of the constituent words in a compound to determine the relation between those words. Most of these use hand-crafted lexical hierarchies designed for particular semantic domains. We compare our algorithm for compound disambiguation with one recently presented alternative, Rosario, Hearst, and Fillmore's (2002) rule-based system for the disambiguation of noun-noun compounds in the biomedical domain.</Paragraph> <Section position="1" start_page="165" end_page="165" type="sub_section"> <SectionTitle> 6.1 Rule-based disambiguation algorithm </SectionTitle> <Paragraph position="0"> Rosario et al.'s (2002) general approach to noun-noun compound disambiguation is based, as ours is, on the semantic categories of the nouns making up a compound. Rosario et al. make use of the MeSH (Medical Subject Headings) hierarchy, which provides detailed coverage of the biomedical domain they focus on. Their analysis involves automatically extracting a corpus of noun-noun compounds from a large set of titles and abstracts from the MedLine collection of biomedical journal articles, and identifying the MeSH semantic categories under which the modifier and head words of those compounds fall. 
This analysis generates a set of category pairs for each compound (similar to our sense pairs), with each pair consisting of a MeSH category for the modifier word and a MeSH category for the head.</Paragraph> <Paragraph position="1"> The aim of Rosario et al.'s analysis was to produce a set of rules which would link the MeSH category pair for a given compound to the correct semantic relation for that compound. Given such a set of rules, their algorithm for disambiguating noun-noun compounds involves obtaining the MeSH category membership for the constituent words of the compounds to be disambiguated, forming category pairs, and looking up those category pairs in the list of category-pair-relation rules. If a rule is found linking the category pair for a given compound to a particular semantic relation, that relation is returned as the correct relation for the compound in question.</Paragraph> <Paragraph position="2"> To produce a list of category-pair-relation rules, Rosario et al. first selected a set of category pairs occurring in their corpus of compounds. For each category pair, they manually examined 20% of the compounds falling under that category pair, paraphrasing the relation between the nouns in each compound by hand, and checking whether that relation was the same across all compounds under that category pair. If the relation was the same across all selected compounds, that category pair was recorded as a rule linked to the relation produced. If, on the other hand, several different relations were produced for a given category pair, the analysis descended one level in the MeSH hierarchy, splitting that category pair into several subcategories. This was repeated until a rule was produced assigning a relation to every compound examined. 
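As a rough illustration of the lookup step only, and not a reproduction of Rosario et al.'s implementation, the following Python sketch assumes that categories are written as dotted codes in which each extra segment is one level deeper in the hierarchy, and that rules may be recorded at any depth. The category codes, relation names, and function names are all hypothetical.

```python
from itertools import product

# Hypothetical rules mapping (modifier category, head category) to a relation.
# The codes and relation names are invented for illustration only.
RULES = {
    ("A01", "C04"): "located_in",
    ("B02.1", "C04"): "causes",
    ("B02.2", "C04"): "treats",
}


def prefixes(code):
    """Return the code and all of its ancestors, shallowest first."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def lookup_relation(modifier_code, head_code, rules=RULES):
    """Return the relation of the first matching category pair, or None."""
    for pair in product(prefixes(modifier_code), prefixes(head_code)):
        if pair in rules:
            return rules[pair]
    # No rule covers this category pair, so the compound stays undisambiguated.
    return None


print(lookup_relation("A01.5", "C04.2.7"))
print(lookup_relation("B02.1.3", "C04"))
print(lookup_relation("B02.3", "C04"))
```

In this toy rule set the pair under B02 was split one level down, so a modifier under B02.3 matches no rule and the lookup returns None.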
The rules produced by this process were then tested by applying them to a randomly chosen test set of 20% of the compounds falling under each category pair, entirely distinct from the compounds used in rule construction. An evaluator checked each compound to see whether the relation returned for that compound was an acceptable reflection of that compound's meaning. The results ranged from 78.6% to 100% correct across the different category pairs.</Paragraph> </Section> <Section position="2" start_page="165" end_page="166" type="sub_section"> <SectionTitle> 6.2 Comparing the algorithms </SectionTitle> <Paragraph position="0"> In this section we first compare Rosario et al.'s algorithm for compound disambiguation with our own, and then compare the procedures used to assess those algorithms. While both algorithms are based on the association between category pairs (sense pairs) and semantic relations, they differ in that Rosario et al.'s algorithm uses a static list of manually defined rules linking category pairs and semantic relations, while our PRO algorithm automatically and dynamically computes links between sense pairs and relations on the basis of proportional co-occurrence in a corpus of compounds. This gives our algorithm an advantage in terms of coverage: where Rosario et al.'s algorithm can only disambiguate compounds whose constituent words match one of the category-pair-relation rules on their list, our algorithm should be able to apply to any compound whose constituent words are defined in WordNet. It also gives our algorithm an advantage in terms of extensibility: while adding a new compound to the corpus of compounds used by Rosario et al. could potentially require the manual removal or redefinition of a number of category-pair-relation rules, adding a new compound to the annotated corpus used by our PRO algorithm requires no such intervention. 
Of course, the fact that Rosario et al.'s algorithm is based on a static list of rules linking categories and relations, while our algorithm dynamically computes such links, gives Rosario et al.'s algorithm a clear efficiency advantage. Improving the efficiency of the PRO algorithm, perhaps by automatically compiling a tree of associations between word senses and semantic relations and using that tree in compound disambiguation, is an important aim for future research.</Paragraph> <Paragraph position="1"> Our second point of comparison concerns the procedures used to assess the two algorithms. In Rosario et al.'s assessment of their rule-based algorithm, an evaluator checked the relations returned by the algorithm for a set of compounds, and found that those relations were acceptable in a large proportion of cases (up to 100%). A problem with this procedure is that many compounds can fall equally under a number of different acceptable semantic relations. The compound storm damage, for example, is best defined by the relation causes ('damage caused by a storm'), but also falls under the relations makes ('damage made by a storm') and derived from ('damage derived from a storm'): most people would agree that these paraphrases all acceptably describe the meaning of the compound (Devereux & Costello, 2005). This means that, while the relations returned for compounds by Rosario et al.'s algorithm may have been judged acceptable for those compounds by the evaluator, they were not necessarily the most appropriate relations for those compounds: the algorithm could have returned other relations that would have been equally acceptable. In other words, Rosario et al.'s assessment procedure is somewhat weaker than the assessment procedure we used to test the PRO algorithm, in which there was one correct relation identified for each compound and the algorithm was taken to have performed correctly only if it returned that relation. 
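The difference between the two assessment procedures can be made concrete with a small Python sketch. This is an illustration under stated assumptions rather than either evaluation as actually carried out: the function names are hypothetical, the storm damage entries follow the example in the text, and the second compound and all predicted relations are invented.

```python
def strict_accuracy(predicted, best):
    """Fraction of compounds for which the single best relation was returned."""
    hits = sum(1 for c in predicted if predicted[c] == best[c])
    return hits / len(predicted)


def lenient_accuracy(predicted, acceptable):
    """Fraction of compounds whose returned relation is judged acceptable."""
    hits = sum(1 for c in predicted if predicted[c] in acceptable[c])
    return hits / len(predicted)


# Toy data: "olive oil" and every predicted relation are invented examples.
best = {"storm damage": "causes", "olive oil": "derived from"}
acceptable = {
    "storm damage": {"causes", "makes", "derived from"},
    "olive oil": {"derived from"},
}
predicted = {"storm damage": "makes", "olive oil": "derived from"}

print(strict_accuracy(predicted, best))         # 0.5
print(lenient_accuracy(predicted, acceptable))  # 1.0
```

The same output scores 0.5 under the strict criterion and 1.0 under the lenient one, which is why figures obtained under the two procedures are not directly comparable.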
One aim for future work is to apply the assessment procedure used by Rosario et al. to the PRO algorithm's output, asking an evaluator to assess the acceptability of the relations returned rather than simply counting the cases where the best relation was returned. This would provide a clearer basis for comparison between the algorithms.</Paragraph> </Section> <Section position="3" start_page="166" end_page="166" type="sub_section"> <SectionTitle> 6.3 Conclusions </SectionTitle> <Paragraph position="0"> In this paper we've described an algorithm for noun-noun compound disambiguation which automatically identifies the semantic relations and relation senses used in such compounds. We've given evidence showing that, coupled with a corpus of noun-noun compounds annotated with WordNet senses and semantic relations, this algorithm can identify the correct semantic relations for compounds with high precision. Unlike other approaches to automatic compound disambiguation, which typically apply to specific domains, our algorithm is not domain specific and can identify relations for a random sample of noun-noun compounds drawn from the WordNet dictionary. Further, our algorithm is fully automatic: unlike other approaches, it does not require the manual construction of relation rules to produce successful compound disambiguation. In future work we hope to develop a more efficient implementation of this algorithm, and also to apply the algorithm in areas such as the machine translation of noun-noun compounds, where the identification of semantic relations in compounds is a crucial step in the translation process.</Paragraph> </Section> </Section> </Paper>