<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1412">
  <Title>A Comparative Study on Translation Units for Bilingual Lexicon Extraction</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Developments in statistical or example-based MT rely largely on bilingual corpora.</Paragraph>
    <Paragraph position="1"> Although bilingual corpora are becoming more widely available, they remain an expensive resource compared with monolingual corpora. Anyone fortunate enough to have bilingual corpora at hand should therefore extract as much linguistic knowledge from them as possible.</Paragraph>
    <Paragraph position="2"> This paper presents ongoing research on the automatic extraction of a bilingual lexicon from English-Japanese parallel corpora. Our approach owes much to recent advances in NLP tools such as part-of-speech taggers, chunkers, and dependency parsers, all of which are trained on corpora using statistical methods or machine learning techniques. The linguistic &quot;clues&quot; these tools provide may contain errors, but much of this partially reliable information is still usable for generating translation units from unannotated bilingual corpora. We compare three N-gram models for generating translation units: Bound-length N-gram, Chunk-bound N-gram, and Dependency-linked N-gram. Our aim is to determine which characteristics of translation units yield both high accuracy and wide coverage, and to identify the limitations of these models.</Paragraph>
    <Paragraph position="3"> In the next section, we describe the three models used to generate translation units. Section 3 explains the algorithm for extracting translation pairs. In Sections 4 and 5, we present our experimental results and analyze the characteristics of each model. Finally, Section 6 concludes the paper.</Paragraph>
  </Section>
</Paper>