<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1002">
  <Title>The Role of Lexical Resources in CJK Natural Language Processing</Title>
  <Section position="4" start_page="0" end_page="9" type="intro">
    <SectionTitle>
2 Named Entity Extraction
</SectionTitle>
    <Paragraph position="0"> Named Entity Recognition (NER) is useful in NLP applications such as question answering, machine translation and information extraction.</Paragraph>
    <Paragraph position="1"> A major difficulty in NER, and a strong motivation for using tools based on probabilistic methods, is that compiling and maintaining large entity databases is time-consuming and expensive. The number of personal names and their variants (e.g. over a hundred ways to spell Mohammed) is probably in the billions. The number of place names is also large, though place names are relatively stable compared with the names of organizations and products, which change frequently.</Paragraph>
    <Paragraph position="2"> A small number of organizations, including The CJK Dictionary Institute (CJKI), maintain databases of millions of proper nouns, but even such comprehensive databases cannot be kept fully up to date because countless new names are created daily. Various techniques have been used to detect entities automatically. One is the use of keywords or syntactic structures that co-occur with proper nouns, which we refer to as named entity contextual clues (NECC).</Paragraph>
    <Section position="1" start_page="9" end_page="9" type="sub_section">
      <SectionTitle>
Headword Reading Example
</SectionTitle>
      <Paragraph position="0"> Table 1 (Headword, Reading, Example): senta, senta, Guo Min Sheng Huo senta; hoteru, hoteru, hoterusiono; Yi, eki, Zhao Xia Yi; Xie Hui, kiyoukai, Ri Ben yunisehu Xie Hui. Table 1 shows NECCs for Japanese proper nouns, which, when used in conjunction with entity lexicons like the one shown in Table 2 below, achieve high precision in entity recognition. Of course, for NER there is no need for such lexicons to be multilingual, though multilinguality is obviously essential for MT.</Paragraph>
      <Paragraph position="1"> Table 2 (excerpt): Yemen, iemen, Ye Men, L, Xie Men, yemen. Note how the lexemic pairs (&quot;L&quot; in the LO column) in Table 2 above are not merely simplified and traditional orthographic (&quot;O&quot;) versions of each other, but independent lexemes, analogous to American truck and British lorry. NER, especially of personal names and place names, is an area in which lexicon-driven methods have a clear advantage over probabilistic methods and in which the role of lexical resources should be a central one.</Paragraph>
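      <Paragraph position="2"> The combination of contextual clues and an entity lexicon described above can be illustrated with a minimal sketch. It is not the method used by CJKI. It assumes pre-tokenized, romanized input, treats the four readings from Table 1 as clues, and uses a hypothetical two-entry lexicon. The names CLUES, LEXICON, candidates and recognize are invented for this illustration. A clue next to a non-clue token proposes a candidate, and the lexicon then confirms or rejects it.

```python
from typing import List, Set, Tuple

# Contextual clues: the readings listed in Table 1 (romanized as in the text).
CLUES: Set[str] = {"senta", "hoteru", "eki", "kiyoukai"}

# Hypothetical toy lexicon of known entity names (not taken from the paper).
LEXICON: Set[str] = {"asaka eki", "hoteru shiono"}


def candidates(tokens: List[str], clues: Set[str]) -> List[str]:
    """Return adjacent token pairs in which exactly one token is a clue."""
    found = []
    for left, right in zip(tokens, tokens[1:]):
        # A clue may precede or follow the name, so check both positions.
        if (left in clues) != (right in clues):
            found.append(f"{left} {right}")
    return found


def recognize(
    tokens: List[str], clues: Set[str], lexicon: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split clue-based candidates into lexicon-confirmed and unconfirmed."""
    confirmed, unconfirmed = [], []
    for span in candidates(tokens, clues):
        (confirmed if span in lexicon else unconfirmed).append(span)
    return confirmed, unconfirmed


if __name__ == "__main__":
    # Hypothetical romanized token sequence, used only for illustration.
    print(recognize(["asaka", "eki", "made", "aruku"], CLUES, LEXICON))
```

In this toy input the clue eki proposes two candidates, and only the one present in the lexicon is confirmed. This is the sense in which clues used together with a lexicon favor precision. </Paragraph>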
    </Section>
  </Section>
</Paper>