<?xml version="1.0" standalone="yes"?> <Paper uid="C02-2013"> <Title>SOAT: A Semi-Automatic Domain Ontology Acquisition Tool from Chinese Corpus</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Domain ontologies are important for large-scale natural language application systems such as speech recognition (Flett and Brown 2001), question answering (QA), knowledge management and organization memory (KM/OM), information retrieval, machine translation (Guarino 1998), and grammar checking systems (Bredenkamp 2000). With the help of a domain ontology, software systems can understand natural language better. However, building a domain ontology is laborious and time-consuming.</Paragraph> <Paragraph position="1"> Previous work suggests that ontology acquisition is an iterative process that includes keyword collection as well as structure reorganization. The ontology is revised, refined, and filled in with detail during each iteration (Noy and McGuinness 2001). For example, to find hyponyms of a keyword, a human editor must observe sentences containing the keyword and its hyponyms, and then deduce rules for finding more hyponyms of this keyword (Hearst 1992). As this cycle iterates, the editor refines the rules to obtain higher-quality keyword-hyponym pairs. In this work we try to speed up this labor-intensive approach by designing acquisition rules that can be applied recursively, so that a human editor only has to verify the results of the acquisition.</Paragraph> <Paragraph position="2"> The extraction rules we specify are templates of part-of-speech (POS) tagged phrase structures. 
Parsing a phrase by POS tags (Abney 1991) is a well-known shallow parsing technique that provides natural language processing functionality for various natural language applications, including ontology acquisition (Maedche and Staab 2000).</Paragraph> <Paragraph position="3"> In previous work (Hsu et al. 2001), we constructed a knowledge representation framework, InfoMap, to integrate various kinds of linguistic knowledge, commonsense knowledge, and domain knowledge. InfoMap is designed to perform natural language understanding. It has been applied to many application domains, such as QA systems and KM/OM (Wu et al. 2002), and has obtained encouraging results. An important characteristic of InfoMap is that it extracts events from a sentence by capturing the topic words, usually noun-verb (NV) pairs or noun-noun (NN) pairs, which are defined in the domain ontology. We designed SOAT as a semi-automatic domain ontology acquisition tool that follows the InfoMap ontology framework.</Paragraph> <Paragraph position="4"> We review the InfoMap ontology framework in Section 2. The domain ontology acquisition process and extraction rules are discussed in Section 3. Experimental results are reported in Section 4. We conclude our work in Section 5.</Paragraph> </Section> </Paper>