<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0420">
<Title>Maximum Entropy Models for Named Entity Recognition</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> In this paper, we present an approach to extracting named entities (NEs) from natural language input that uses the maximum entropy (ME) framework (Berger et al., 1996). The objective can be described as follows.</Paragraph>
<Paragraph position="1"> Given a natural language input sequence $w_1^N = w_1 \ldots w_n \ldots w_N$, we choose the NE tag sequence $c_1^N = c_1 \ldots c_n \ldots c_N$ with the highest probability among all possible tag sequences:</Paragraph>
<Paragraph position="2"> $\hat{c}_1^N = \operatorname*{argmax}_{c_1^N} \left\{ \Pr(c_1^N \mid w_1^N) \right\}$</Paragraph>
<Paragraph position="3"> The argmax operation denotes the search problem, i.e. the generation of the sequence of named entities.</Paragraph>
<Paragraph position="4"> In accordance with the CoNLL-2003 shared task, we concentrate on four types of named entities: persons (PER), locations (LOC), organizations (ORG), and names of miscellaneous entities (MISC) that do not belong to the previous three groups, e.g.</Paragraph>
<Paragraph position="5"> [PER Clinton] 's [ORG Ballybunion] fans invited to [LOC Chicago] .</Paragraph>
<Paragraph position="6"> Additionally, the task requires processing two different languages, of which only English was specified before the submission deadline. Therefore, the system described here avoids language-dependent knowledge and instead uses a set of features that are easily obtainable for almost any language.</Paragraph>
<Paragraph position="7"> The remainder of the paper is organized as follows: in Section 2, we outline the ME framework and specify the features that were used in the experiments. We also describe the training and search procedure of our approach.
Section 3 presents experimental details and reports results obtained on the English and German test sets. Finally, Section 4 closes with a summary and an outlook on future work.</Paragraph>
</Section>
</Paper>