<?xml version="1.0" standalone="yes"?> <Paper uid="J04-2002"> <Title>Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites (© 2004 Association for Computational Linguistics)</Title> <Section position="2" start_page="0" end_page="153" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> The importance of domain ontologies is widely recognized, particularly in relation to the expected advent of the Semantic Web (Berners-Lee 1999). The goal of a domain ontology is to reduce (or eliminate) the conceptual and terminological confusion among the members of a virtual community of users (for example, tourist operators, commercial enterprises, medical practitioners) who need to share electronic documents and information of various kinds. This is achieved by identifying and properly defining a set of relevant concepts that characterize a given application domain. An ontology is therefore a shared understanding of some domain of interest (Uschold and Gruninger). Creating ontologies is, however, a difficult and time-consuming process that involves specialists from several fields. Philosophical ontologists and artificial intelligence logicians are usually involved in the task of defining the basic kinds and structures of concepts (objects, properties, relations, and axioms) that are applicable in every possible domain. [Footnote: Dipartimento di Informatica, Università di Roma &quot;La Sapienza,&quot; Via Salaria, 113 - 00198 Roma, Italia. E-mail: {navigli, velardi}@di.uniroma1.it.]</Paragraph> <Paragraph position="1"> Figure 1: The three levels of generality of a domain ontology.</Paragraph> <Paragraph position="2"> The issue of identifying these very few &quot;basic&quot; principles, now often referred to as foundational ontologies (FOs) (or top, or upper ontologies; see Figure 1) (Gangemi et al. 
2002), meets the practical need for a model that is as general as possible, to ensure reusability across different domains (Smith and Welty 2001).</Paragraph> <Paragraph position="3"> Domain modelers and knowledge engineers are involved in the task of identifying the key domain conceptualizations and describing them according to the organizational backbones established by the foundational ontology. The result of this effort is referred to as the core ontology (CO), which usually includes a few hundred application domain concepts. While many ontology projects eventually succeed in defining a core ontology, populating the third level, which we call the specific domain ontology (SDO), is the actual barrier. Very few projects have been able to overcome it (e.g., WordNet [Fellbaum 1995], Cyc [Lenat 1993], and EDR [Yokoi 1993]), and even these pay a price in terms of inconsistencies and limitations. It turns out that, although domain ontologies are recognized as crucial resources for the Semantic Web, in practice they are seldom available and, when available, are rarely used outside specific research environments.</Paragraph> <Paragraph position="4"> So which features are most needed to build usable ontologies? * Coverage: The domain concepts must be there; the SDO must be sufficiently populated for the purposes of the application. Tools are needed to extensively support the task of identifying the relevant concepts and the relations among them.</Paragraph> <Paragraph position="5"> * Consensus: Decision making is a difficult activity for one person, and it gets even harder when a group of people must reach consensus on a given issue and, in addition, the group is geographically dispersed.</Paragraph> <Paragraph position="6"> When a group of enterprises decides to cooperate in a given domain, they must first agree on many basic issues; that is, they must reach a consensus view of the business domain. 
Such a common view must be reflected by the domain ontology.</Paragraph> <Paragraph position="7"> * Accessibility: The ontology must be easily accessible: tools are needed to easily integrate the ontology within an application that can clearly show its decisive contribution, e.g., improving the ability to share and exchange information through the Web.</Paragraph> <Paragraph position="8"> In cooperation with another research institution, we defined a general architecture and a battery of systems to foster the creation of such &quot;usable&quot; ontologies. Consensus is achieved in both an implicit and an explicit way: implicit, since candidate concepts are selected from among the terms that are frequently and consistently employed in the documents produced by the virtual community of users; explicit, through the use of Web-based groupware aimed at consensual construction and maintenance of an ontology. Within this framework, the proposed tools are OntoLearn, for the automatic extraction of domain concepts from thematic Web sites; ConSys, for the validation of the extracted concepts; and SymOntoX, the ontology management system. This ontology-learning architecture has been implemented and is being tested in the context of several European projects aimed at improving interoperability for networked enterprises.</Paragraph> <Paragraph position="9"> In Section 2, we provide an overview of the complete ontology-engineering architecture. In the remaining sections, we describe in more detail OntoLearn, a system that uses text mining techniques and existing linguistic resources, such as WordNet and SemCor, to learn domain concepts and the taxonomic relations among them from available document warehouses and dedicated Web sites. 
OntoLearn automatically builds a specific domain ontology that can be used to create a specialized view of an existing general-purpose ontology, like WordNet, or to populate the lower levels of a core ontology, if available.</Paragraph> </Section> </Paper>