<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0712">
  <Title>An Integrated Approach for Arabic-English Named Entity Translation</Title>
  <Section position="2" start_page="0" end_page="87" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Named Entity (NE) translation is crucial for effective cross-language information retrieval (CLIR) and for machine translation (MT). There are many types of NE phrases, such as person names, organization names, location names, temporal expressions, and names of events. In this paper we focus on only three categories of NEs: person names, location names, and organization names, although the approach is, in principle, general enough to accommodate any entity type.</Paragraph>
    <Paragraph position="1"> NE identification has been an area of significant research interest for the last few years. NE translation, however, remains a largely unstudied problem. NEs may be phonetically transliterated (e.g., person names), or they may mix phonetic transliteration with semantic translation, as is the case with location and organization names.</Paragraph>
    <Paragraph position="2"> Three distinct approaches can be applied to NE translation: a transliteration approach, a word-based translation approach, and a phrase-based translation approach.</Paragraph>
    <Paragraph position="3"> The transliteration approach relies on phonetic transliteration and is appropriate only for out-of-vocabulary and completely unknown words. For more frequently used words, transliteration alone does not give good results. A word-based approach relies on traditional statistical machine translation techniques such as IBM Model 1 (Brown et al., 1993) and may not always yield satisfactory results because it cannot handle difficult many-to-many phrase translations.</Paragraph>
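As a point of reference for the word-based approach, the following is a minimal Python sketch of IBM Model 1 training with expectation maximization, in the standard formulation of Brown et al. (1993); it is not the word-based module described in this paper. It estimates lexical translation probabilities t(e | f) from sentence pairs, adding a NULL source token. The function name `ibm_model1`, the toy German-English corpus, and the fixed iteration count are illustrative assumptions.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t(e | f) with IBM Model 1 EM.

    pairs: list of (source_tokens, target_tokens); a NULL source token is
    prepended to every source sentence.
    """
    # Uniform initialisation over the target vocabulary.
    target_vocab = {e for _, tgt in pairs for e in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # t[(e, f)] = P(e | f)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for src, tgt in pairs:
            src = ["NULL"] + src
            for e in tgt:
                # E-step: share one count of e among all source words.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalise the expected counts per source word.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


# Toy corpus for illustration only (not data from the paper).
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(corpus)
print(round(t[("the", "das")], 3))
```

Because each target word is aligned to exactly one source word, the model has no direct way to represent a many-to-many phrase correspondence, which is the limitation noted above.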
    <Paragraph position="4"> A phrase-based approach can provide good translations for frequently used NE phrases, but it is less effective for infrequent words. Each of the three approaches therefore has its own advantages and disadvantages.</Paragraph>
    <Paragraph position="5"> In this paper we introduce an integrated approach that combines phrase-based NE translation, word-based NE translation, and NE transliteration in a single framework. Our approach attempts to harness the advantages of the three approaches while avoiding their pitfalls. We also introduce and evaluate a new approach for aligning NEs across parallel corpora, a process for automatically extracting new NE translation phrases, and a new transliteration approach. As is typical for statistical MT, the system requires a general parallel corpus and NE identifiers for the NEs of interest.</Paragraph>
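One simple way to picture the integrated approach is to pool scored candidates from the three modules and keep the best weighted one. The Python sketch below assumes a hypothetical interface in which each module maps an NE string to (translation, score) pairs; the name `translate_ne`, the module keys, the weights, the stub modules, and the toy romanized inputs are all assumptions, and the paper's actual integration and decoding (Section 4) may differ substantially.

```python
from typing import Callable, Dict, List, Tuple

# A candidate translation paired with a model score (higher is better).
Candidate = Tuple[str, float]
# Hypothetical module interface: maps an NE string to scored candidates.
Module = Callable[[str], List[Candidate]]


def translate_ne(ne: str, modules: Dict[str, Module], weights: Dict[str, float]) -> str:
    """Pool candidates from all modules and return the best weighted one."""
    pooled: Dict[str, float] = {}
    for name, module in modules.items():
        for text, score in module(ne):
            weighted = weights[name] * score
            # If several modules propose the same string, keep its best score.
            pooled[text] = max(pooled.get(text, float("-inf")), weighted)
    if not pooled:
        return ne  # no module produced anything: copy the input through
    return max(pooled, key=pooled.get)


# Hypothetical toy modules and weights, for illustration only.
phrase_table = {"al-qahira": [("Cairo", 0.9)]}
modules: Dict[str, Module] = {
    "phrase": lambda ne: phrase_table.get(ne, []),  # only knows frequent NEs
    "word": lambda ne: [],                          # stub word-based module
    "translit": lambda ne: [(ne.title(), 0.3)],     # naive placeholder "transliteration"
}
weights = {"phrase": 1.0, "word": 0.8, "translit": 0.5}

print(translate_ne("al-qahira", modules, weights))  # Cairo
print(translate_ne("zaynab", modules, weights))     # Zaynab
```

Pooling lets the phrase module win for frequent NEs it has already seen, while the transliteration module still supplies an output for unseen names, mirroring the trade-offs described above.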
    <Paragraph position="6"> Our primary focus in this paper is on translating NEs out of context (i.e., NEs are extracted and translated without any contextual clues). Although this is a more difficult problem than translating NEs in context, we adopt this approach because it is more generally useful for CLIR applications.</Paragraph>
    <Paragraph position="7"> The paper is organized as follows: Section 2 presents related work; Section 3 describes our integrated NE translation approach; Section 4 presents the word-based translation module, the phrase-based module, the transliteration module, and system integration and decoding; Section 5 provides the experimental setup and results; and finally, Section 6 concludes the paper.</Paragraph>
  </Section>
</Paper>