File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/w04-1806_concl.xml
Size: 3,042 bytes
Last Modified: 2025-10-06 13:54:20
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1806"> <Title>Automatically Inducing Ontologies from Corpora</Title> <Section position="7" start_page="74" end_page="74" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> The evidence combination described above is based on transitivity and union. Since the evaluations above, we have been experimenting with an ad hoc weighted evidence combination scheme in which each knowledge source expresses a strength for a posited relation. In future, we will also investigate using an initial seed ontology to provide a better 'backbone' for induction, and then using a spreading activation method to activate nodes that existing knowledge sources relate to the seed nodes. Corpus statistics can be used to weight the links. For example, following (Caraballo 1999), each parent of a leaf node could be viewed as a cluster label for its children, with the weight of a parent-child link determined by how strongly the child is associated with the cluster.</Paragraph> <Paragraph position="1"> The mean distance in H between terms that are distance 1 apart in M is 5.17, with a standard deviation of 2.12. The mean distance in M between terms that are distance 1 apart in H is 3.85, with a standard deviation of 1.69.</Paragraph> <!-- CompuTerm 2004 - 3rd International Workshop on Computational Terminology 53 --> <Paragraph position="2"> The ontology induction methods described here can save considerable time in constructing ontologies. The evaluations we have carried out are suggestive, but many issues remain open. There are many unanswered questions about human-created reference ontologies, including the lack of inter-annotator agreement studies. 
Indeed, experience shows that without guidelines for ontology construction, humans are prone to come up with very different ontologies for a domain.</Paragraph> <Paragraph position="3"> Comparing a machine-induced ontology against an ideal human reference ontology, were one to be available, is also fraught with problems. Our experience with an implementation of the (Daude et al. 2001) constraint relaxation algorithm for ontology comparison suggests that much work is needed on distance metrics that are not oversensitive to small differences in structure.</Paragraph> <Paragraph position="4"> Our interest is therefore focused more on extrinsic evaluation. PRONTO, which is due to be released in 2004, offers the opportunity to measure the costs of ontology induction and post-editing on a large-scale problem of value to the biology community. We also plan to measure the effectiveness of PRONTO in query expansion for information access to MEDLINE and protein databases. Finally, we will investigate more sophisticated evidence combination methods, and compare against other automatic methods for ontology induction.</Paragraph> <Paragraph position="5"> The ontology induction tools are available for free distribution for research purposes.</Paragraph> </Section> </Paper>
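<!--
The cross-ontology distance figures reported in the conclusion (the mean distance in one ontology between terms that are distance 1 apart in the other) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: H and M are taken to be two ontologies over a shared vocabulary, distance is taken to be the number of links on the shortest path with links treated as undirected, and the population standard deviation is used. The function names and the toy edge lists are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def undirected(edges):
    """Build an adjacency map, treating every ontology link as undirected."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hop_distances(graph, source):
    """Breadth-first search: number of links from source to each reachable term."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def cross_distance_stats(edges_a, edges_b):
    """Mean and standard deviation of the distance in B between terms
    that are directly linked (distance 1) in A.

    Pairs that are missing from B or not connected in B are skipped.
    Returns (mean, population standard deviation, number of pairs used).
    """
    graph_b = undirected(edges_b)
    distances = []
    for a, b in edges_a:
        if a in graph_b:
            d = hop_distances(graph_b, a).get(b)
            if d is not None:
                distances.append(d)
    if not distances:
        raise ValueError("no term pair from A is connected in B")
    return mean(distances), pstdev(distances), len(distances)


# Toy data, invented for illustration only (not taken from the paper).
M_EDGES = [("protein", "enzyme"), ("enzyme", "kinase"), ("protein", "receptor")]
H_EDGES = [("protein", "enzyme"), ("protein", "kinase"), ("protein", "receptor")]

mean_h, sd_h, n_h = cross_distance_stats(M_EDGES, H_EDGES)
print(f"distance in H for pairs adjacent in M: mean={mean_h:.2f} sd={sd_h:.2f} n={n_h}")
```

Swapping the two arguments gives the statistic in the other direction. A single moved node changes several pairwise distances at once, which is one way to see why such path-based measures can be sensitive to small structural differences.
-->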