<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1030">
  <Title>Categorizing and standardizing proper nouns for efficient information retrieval, In B. Boguraev and</Title>
  <Section position="3" start_page="0" end_page="202" type="intro">
    <SectionTitle>
1 Proper Name Identification in
Natural Language Processing
</SectionTitle>
    <Paragraph position="0"> Text processing applications, such as machine translation systems, information retrieval systems, or natural-language understanding systems, need to identify multi-word expressions that refer to proper names of people, organizations, places, laws and other entities. When encountering Mrs. Candy Hill in input text, for example, a machine translation system should not attempt to look up the translations of candy and hill, but should translate Mrs. to the appropriate personal title in the target language and preserve the rest of the name intact. Similarly, an information retrieval system should not attempt to expand Candy to all of its morphological variants or suggest synonyms (Wacholder et al. 1994).</Paragraph>
    <Paragraph position="1"> The need to identify proper names has two aspects: the recognition of known names and the discovery of new names. Since obtaining and maintaining a name database requires significant effort, many applications need to operate in the absence of such a resource. Without a database, names need to be discovered in the text and linked to the entities they refer to. Even where name databases exist, text needs to be scanned for new names that are formed when entities, such as countries or commercial companies, are created, or for unknown names that become important when the entities they refer to become topical. This situation is the norm for dynamic applications such as news-providing services or Internet information indexing.</Paragraph>
    <Paragraph position="2"> The next section describes the different types of proper name ambiguities we have observed. Section 3 discusses the role of context and world knowledge in their disambiguation; Section 4 describes the process of name discovery as implemented in Nominator, a module for proper name recognition developed at the IBM T.J. Watson Research Center. Sections 5-7 elaborate on Nominator's disambiguation heuristics.</Paragraph>
  </Section>
</Paper>