File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-2112_intro.xml
Size: 3,260 bytes
Last Modified: 2025-10-06 14:03:48
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2112"> <Title>Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs</Title> <Section position="3" start_page="0" end_page="874" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Word alignment was first proposed as an intermediate result of statistical machine translation (Brown et al., 1993). Many researchers build alignment links with bilingual corpora (Wu, 1997; Och and Ney, 2003; Cherry and Lin, 2003; Zhang and Gildea, 2005). To achieve satisfactory results, all of these methods require a large-scale bilingual corpus for training. When no large-scale bilingual corpus is available, some researchers have acquired class-based alignment rules with existing dictionaries to improve word alignment (Ker and Chang, 1997).</Paragraph> <Paragraph position="1"> Wu et al. (2005) used a large-scale bilingual corpus in the general domain to improve domain-specific word alignment when only a small-scale domain-specific bilingual corpus is available.</Paragraph> <Paragraph position="2"> This paper proposes an approach to improve word alignment for languages with scarce resources using bilingual corpora of other language pairs. To perform word alignment between languages L1 and L2, we introduce a third language L3 as the pivot language. Although only small amounts of bilingual data are available for the desired language pair L1-L2, large-scale bilingual corpora in L1-L3 and L2-L3 are available.</Paragraph> <Paragraph position="3"> Using these two additional bilingual corpora, we train two word alignment models for language pairs L1-L3 and L2-L3, respectively. Then, with L3 as the pivot language, we build a word alignment model for L1 and L2 from these two models. We call this the induced model.
With this induced model, we can perform word alignment between languages L1 and L2 even if no parallel corpus is available for this language pair. In addition, using the small bilingual corpus in L1 and L2, we train another word alignment model for this language pair. We call this the original model. Interpolating the induced model and the original model yields an interpolated model.</Paragraph> <Paragraph position="4"> As a case study, this paper uses English as the pivot language to improve word alignment between Chinese and Japanese. Experimental results show that the induced model performs better than the original model trained on the small Chinese-Japanese corpus. The interpolated model further improves the word alignment results, achieving a relative error rate reduction of 21.30% compared with the original model.</Paragraph> <Paragraph position="5"> The remainder of this paper is organized as follows. Section 2 discusses related work.</Paragraph> <Paragraph position="6"> Section 3 introduces the statistical word alignment models. Section 4 describes the parameter estimation method using bilingual corpora of other language pairs. Section 5 presents the interpolation model. Section 6 reports the experimental results. Finally, Section 7 concludes and presents future work.</Paragraph> </Section> </Paper>
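<!--
Illustrative sketch (not part of the paper text). The introduction describes
an induced model built through a pivot language and an interpolated model that
combines it with the original model, but it does not give formulas here. The
sketch below therefore rests on stated assumptions: the induced lexical
probability is taken to be a sum over pivot words, t(w1 | w2) = sum_p
t(w1 | p) * t(p | w2), and the interpolation is taken to be linear with a
hypothetical weight `lam`. The function names, the nested-dict table layout,
and the toy romanized words are all made up for illustration.

```python
from collections import defaultdict


def induce_table(t_l1_given_l3, t_l3_given_l2):
    """Assumed pivot step: t(w1 | w2) = sum over p of t(w1 | p) * t(p | w2).

    Each table is a nested dict: table[conditioning_word][generated_word].
    """
    induced = defaultdict(dict)
    for w2, pivots in t_l3_given_l2.items():
        for p, prob_p in pivots.items():
            # Pivot words missing from the L1-L3 table contribute nothing.
            for w1, prob_w1 in t_l1_given_l3.get(p, {}).items():
                induced[w2][w1] = induced[w2].get(w1, 0.0) + prob_w1 * prob_p
    return dict(induced)


def interpolate(original, induced, lam=0.5):
    """Assumed linear interpolation: lam * original + (1 - lam) * induced."""
    mixed = {}
    for w2 in set(original) | set(induced):
        row_o = original.get(w2, {})
        row_i = induced.get(w2, {})
        mixed[w2] = {
            w1: lam * row_o.get(w1, 0.0) + (1.0 - lam) * row_i.get(w1, 0.0)
            for w1 in set(row_o) | set(row_i)
        }
    return mixed


# Toy, hypothetical tables: Chinese given English, English given Japanese.
t_zh_given_en = {
    "book": {"shu": 0.9, "ben": 0.1},
    "volume": {"ben": 0.6, "shu": 0.4},
}
t_en_given_ja = {"hon": {"book": 0.75, "volume": 0.25}}

# Toy "original" table, standing in for a model trained on a small corpus.
t_original = {"hon": {"shu": 0.5, "zhi": 0.5}}

t_induced = induce_table(t_zh_given_en, t_en_given_ja)
t_mixed = interpolate(t_original, t_induced, lam=0.5)

for zh_word, prob in sorted(t_mixed["hon"].items()):
    print(zh_word, round(prob, 4))
```

In this toy setup the induced table can propose a translation ("ben") that the
small original table never saw, while the original table keeps a candidate
("zhi") that the pivot path misses. How the weight is chosen is left open here.
-->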