<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2136"> <Title>Automatic Acquisition of Domain Knowledge for Information Extraction</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 0 Introduction </SectionTitle> <Paragraph position="0"> Information Extraction is the selective extraction of specified types of information from natural language text. The information to be extracted may consist of particular semantic classes of objects (entities), relationships among these entities, and events in which these entities participate. The extraction system places this information into a data base for retrieval and subsequent processing.</Paragraph> <Paragraph position="1"> In this paper we shall be concerned primarily with the extraction of information about events. In the terminology which has evolved from the Message Understanding Conferences (muc, 1995; muc, 1993), we shall use the term subject domain to refer to a broad class of texts, such as business news, and the term scenario to refer to the specification of the particular events to be extracted. For example, the &quot;Management Succession&quot; scenario for MUC-6, which we shall refer to throughout this paper, involves information about corporate executives starting and leaving positions.</Paragraph> <Paragraph position="2"> The fundamental problem we face in porting an extraction system to a new scenario is to identify the many ways in which information about a type of event may be expressed in the text. Typically, there will be a few common forms of expression which will quickly come to mind when a system is being developed. However, the beauty of natural language (and the challenge for computational linguists) is that there are many variants which an imaginative writer can use, and which the system needs to capture. Finding these variants may involve studying very large amounts of text in the subject domain.
This has been a major impediment to the portability and performance of event extraction systems.</Paragraph> <Paragraph position="3"> We present in this paper a new approach to finding these variants automatically from a large corpus, without the need to read or annotate the corpus. This approach has been evaluated on actual event extraction scenarios.</Paragraph> <Paragraph position="4"> In the next section we outline the structure of our extraction system, and describe the discovery task in the context of this system. Sections 2 and 3 describe our algorithm for pattern discovery; section 4 describes our experimental results. This is followed by comparison with prior work and discussion in section 5.</Paragraph> </Section> </Paper>