<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1006"> <Title>Semi-Automatic Practical Ontology Construction by Using a Thesaurus, Computational Dictionaries, and Large Corpora</Title> <Section position="2" start_page="0" end_page="1" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> An ontology is a knowledge base with information about the concepts that exist in the world or in a domain, their properties, and how they relate to each other. The principal reasons to use an ontology in machine translation (MT) are to enable source language analyzers and target language generators to share knowledge, to store semantic constraints, and to resolve semantic ambiguities by making inferences using the concept network of the ontology (Mahesh, 1996; Nirenburg et al., 1992). An ontology differs from a thesaurus in that it contains only language-independent information and, in addition to taxonomic relations, many other semantic relations.</Paragraph> <Paragraph position="1"> In general, manual processing is indispensable for building a high-quality semantic knowledge base. Previous ontologies were mostly built manually, or were developed without considering the context of practical use (Mahesh, 1996; Lenat et al., 1990). It is therefore difficult to construct a practical ontology with limited time and manpower. To solve this problem, we propose a semi-automatic ontology construction method that takes full advantage of existing knowledge resources and of practical usage in large corpora. First, we define our ontology representation language (ORL) by modifying the most suitable of the previously developed ORLs, and then design a language-independent and practical (LIP) ontology structure based on the defined ORL.
We then construct a practical ontology using the semi-automatic construction method described below.</Paragraph> <Paragraph position="2"> We extend the existing Kadokawa thesaurus (Ohno & Hamanishi, 1981) by inserting additional semantic relations into the hierarchy of the thesaurus. Uramoto (1996) and Tokunaga (1997) propose thesaurus extension methods for positioning unknown words in an existing thesaurus. Our approach differs in that the objects inserted are not words but semantic relations.</Paragraph> <Paragraph position="3"> The additional semantic relations can be classified as case relations and other semantic relations. The former can be obtained by converting the established valency information in the bilingual dictionaries of the COBALT-J/K (Collocation-Based Language Translator from Japanese to Korean) and COBALT-K/J (Collocation-Based Language Translator from Korean to Japanese) MT systems (Moon & Lee, 2000), as well as from the case frames in the Sejong electronic dictionary. The latter can be acquired from concept co-occurrence information, which is extracted automatically from a corpus (Li et al., 2000).</Paragraph> <Paragraph position="4"> The remainder of this paper is organized as follows. In the next section, we describe the principles of ontology design and the ORL used to represent our LIP ontology. In Section 3, we describe the semi-automatic ontology construction methodology in detail. An ontology-based word sense disambiguation (WSD) algorithm is given in Section 4.</Paragraph> <Paragraph position="5"> Experimental results are presented and analyzed in Section 5. Finally, we conclude and indicate directions for future work in Section 6.</Paragraph> </Section> </Paper>