<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1401"> <Title>Disambiguating Noun Compounds with Latent Semantic Indexing</Title> <Section position="6" start_page="0" end_page="55" type="concl"> <SectionTitle> 5 Conclusions and Future Research </SectionTitle> <Paragraph position="0"> In this study, we extended LSI beyond its usual remit by adopting it as a measure of conceptual association for noun compound disambiguation.</Paragraph> <Paragraph position="1"> The results reported here are encouraging: the highest accuracy, 84% on the AmiPro collection, indicates the potential of our approach.</Paragraph> <Paragraph position="2"> However, poorer performance was obtained for the other collections, indicating that there is much room for improvement. We therefore intend to pursue our investigation of the utility of applying vector-based measures of semantic similarity to the problem of syntactic disambiguation. An attractive feature of this approach for the processing of terminology is that it requires no manually constructed knowledge sources, meaning that it does not suffer from the same coverage limitations as the methods of Lauer (1995) and Resnik (1993). In principle, our approach can be applied to any domain. [Figure caption: percentage disambiguation accuracy of the adjacency and dependency models for a range of SVD factors. The percentage of left-branching compounds in each test set, which served as the baseline in our study, is also shown for comparison.]</Paragraph> <Paragraph position="3"> Another attractive feature is that it does not rely on counts of unambiguous subconstituents in training. This means that it can be applied to novel compounds for which no subcompounds exist in training, something which would not be possible for the statistical techniques outlined in Section 2. 
Our next step will thus be to investigate the efficacy of our approach on novel compounds.</Paragraph> <Paragraph position="4"> We are currently examining the use of other techniques for deriving vector-based measures of conceptual association; preliminary investigations using a &quot;sliding window&quot; method (Burgess and Lund, 1999; Levy and Bullinaria, 2001) to disambiguate compounds from the AmiPro corpus show results even better than those reported here. Present work involves varying parameters (e.g., window size, similarity metric, weighting method) to study their effect on performance. We are continuing to test both the adjacency and dependency algorithms on this corpus, and have consistently found better performance using the former.</Paragraph> <Paragraph position="5"> Future work will involve continuing to test the technique in other domains; we also intend to train on larger and more diverse corpora.</Paragraph> <Paragraph position="6"> Furthermore, we plan to investigate other examples of syntactic ambiguity, such as prepositional phrase attachment. Such structures pose many problems for traditional NLP systems, but may prove amenable to the techniques discussed in this paper.</Paragraph> </Section></Paper>