<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1025"> <Title>Named Entity Recognition: A Maximum Entropy Approach Using Global Information</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A considerable amount of work has been done in recent years on the named entity recognition task, partly due to the Message Understanding Conferences (MUC). A named entity recognizer (NER) is useful in many NLP applications such as information extraction and question answering. On its own, a NER can also provide users who are looking for person or organization names with quick information. In MUC-6 and MUC-7, the named entity task is defined as finding the following classes of names: person, organization, location, date, time, money, and percent (Chinchor, 1998; Sundheim, 1995). Machine learning systems in MUC-6 and MUC-7 achieved accuracy comparable to rule-based systems on the named entity task.</Paragraph> <Paragraph position="1"> Statistical NERs usually find the sequence of tags that maximizes the probability P(C|s), where s is the sequence of words in a sentence, and C is the sequence of named-entity tags assigned to the words in s. Attempts have been made to use global information (e.g., the same named entity occurring in different sentences of the same document), but they usually consist of incorporating an additional classifier, which tries to correct the errors in the output of a first NER (Mikheev et al., 1998; Borthwick, 1999). We propose maximizing P(C|s, D), where C is the sequence of named-entity tags assigned to the words in the sentence s, and D is the information that can be extracted from the whole document containing s. Our system is built on a maximum entropy classifier. By making use of global context, it has achieved excellent results on both MUC-6 and MUC-7 official test data. 
We will refer to our system as MENERGI (Maximum Entropy Named Entity Recognizer using Global Information).</Paragraph> <Paragraph position="2"> As far as we know, no other NERs have used information from the whole document (global) as well as information within the same sentence (local) in one framework. The use of global features has improved the performance on MUC-6 test data from 90.75% to 93.27% (27% reduction in errors), and the performance on MUC-7 test data from 85.22% to 87.24% (14% reduction in errors). These results are achieved by training on the official MUC-6 and MUC-7 training data, which is much less training data than is used by other machine learning systems that worked on the MUC-6 or MUC-7 named entity task (Bikel et al., 1997; Bikel et al., 1999; Borthwick, 1999).</Paragraph> <Paragraph position="3"> We believe it is natural for authors to use abbreviated forms in subsequent mentions of a named entity (e.g., first &quot;President George Bush&quot; then &quot;Bush&quot;). As such, global information from the whole context of a document is important for recognizing named entities more accurately. Although we have not done any experiments on other languages, this way of using global features from a whole document should be applicable to them as well.</Paragraph> </Section></Paper>
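
The core idea of the section, combining sentence-level (local) evidence with document-level (global) evidence in a single classification pass rather than correcting a first NER's output with a second classifier, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the feature names, the helper functions, and the tiny example document are all invented for the sketch, and the resulting feature sets would be fed to a maximum entropy (multinomial logistic regression) classifier.

```python
def local_features(tokens, i):
    """Features visible within the sentence (a conventional NER feature set)."""
    word = tokens[i]
    return {
        f"word={word.lower()}",
        f"init_caps={word[:1].isupper()}",
        f"prev={tokens[i - 1].lower() if i > 0 else '<s>'}",
    }


def global_features(tokens, i, document):
    """Features drawn from the whole document containing the sentence.

    Example of global evidence: elsewhere in the document the same token
    occurs with initial capitals in a non-sentence-initial position, as in
    "President George Bush" ... "Bush" from the paper's motivation.
    """
    word = tokens[i].lower()
    feats = set()
    for sent in document:
        for j, other in enumerate(sent):
            # j > 0 skips sentence-initial positions, where capitalization
            # is uninformative.
            if other.lower() == word and j > 0 and other[:1].isupper():
                feats.add("other_occurrence_init_caps")
    return feats


def features(tokens, i, document):
    # One framework: the classifier sees local and global evidence together,
    # so P(C|s, D) is modeled directly instead of P(C|s).
    return local_features(tokens, i) | global_features(tokens, i, document)
```

In this sketch the global features are just extra entries in the same feature set, which is what lets a single maximum entropy model weigh document-wide evidence alongside sentence-internal evidence.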