<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1066"> <Title>Knowledge Acquisition from Texts : Using an Automatic Clustering Method Based on Noun-Modifier Relationship</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Knowledge Acquisition (KA) from technical texts is a growing research area in the Knowledge-Based Systems (KBS) community, since documents containing a large amount of technical knowledge are now available on electronic media.</Paragraph> <Paragraph position="1"> We focus on the methodological aspects of KA from texts. In order to build the model of the subject field, we need to perform a corpus-based semantic analysis. Prior to the semantic analysis, a morpho-syntactic analysis is performed by LEXTER, a terminology extraction tool (Bourigault et al., 1996): LEXTER produces a network of noun phrases that are likely to be terminological units and that are connected by syntactic links. When dealing with medium-sized corpora (a few hundred thousand words), the terminological network is too voluminous to analyze by hand, and it becomes necessary to use data analysis tools to process it. 
The main idea for making KA from medium-sized corpora feasible and efficient is to perform a robust syntactic analysis (using LEXTER, see section 2) followed by a semi-automatic semantic analysis in which automatic clustering techniques are used interactively by the knowledge engineer (see sections 3 and 4).</Paragraph> <Paragraph position="2"> We agree with the differential definition of semantics: the meaning of the morpho-lexical units is not defined by reference to a concept, but rather by contrast with other units (Rastier et al., 1994).</Paragraph> <Paragraph position="3"> In fact, we are considering &quot;word usage rather than word meaning&quot; (Zernik, 1990), following in this the distributional point of view, see (Harris, 1968), (Hindle, 1990).</Paragraph> <Paragraph position="4"> Statistical or probabilistic methods are often used to extract semantic clusters from corpora in order to build lexical resources for ANLP tools (Hindle, 1990), (Zernik, 1990), (Resnik, 1993), or for automatic thesaurus generation (Grefenstette, 1994).</Paragraph> <Paragraph position="5"> We use similar techniques, enriched by a preliminary morpho-syntactic analysis, in order to perform knowledge acquisition and modeling for a specific task (e.g., electrical network planning). Moreover, we are dealing with language-for-specific-purposes texts and not with general texts.</Paragraph> </Section> </Paper>