<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2112"> <Title>Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs</Title> <Section position="4" start_page="874" end_page="874" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> A shared task on word alignment was organized as part of the ACL 2005 Workshop on Building and Using Parallel Texts (Martin et al., 2005).</Paragraph> <Paragraph position="1"> The task focused on languages with scarce resources. Two subtasks were defined: Limited resources and Unlimited resources. The former allows participating systems to use only the resources provided. The latter allows participating systems to use any resources in addition to those provided.</Paragraph> <Paragraph position="2"> For the Unlimited resources subtask, Aswani and Gaizauskas (2005) used a multi-feature approach for many-to-many word alignment on English-Hindi parallel corpora. This approach performed local word grouping on Hindi sentences and used other methods such as dictionary lookup, transliteration similarity, expected English words, and nearest aligned neighbors. Martin et al. (2005) reported that this method yielded absolute improvements of up to 20% over using only the limited resources. Tufis et al. (2005) combined two word aligners: one based on the limited resources and the other based on the unlimited resources.</Paragraph> <Paragraph position="3"> The unlimited resource consists of a translation dictionary extracted from the alignment of the Romanian and English WordNets. Lopez and Resnik (2005) extended the HMM model by integrating a tree distortion model based on a dependency parser built on the English side of the parallel corpus. The latter two methods produced results comparable to those of methods using only limited resources.
All three of the above methods use language-dependent resources such as dictionaries, thesauri, and dependency parsers. Moreover, some techniques, such as transliteration similarity, can only be used for very similar language pairs.</Paragraph> <Paragraph position="4"> In this paper, besides the limited resources for the given language pair, we make use of large amounts of resources available for other language pairs to address the alignment problem for languages with scarce resources. Our method does not need language-dependent resources or deep linguistic processing. Thus, it is easy to adapt to any language pair for which a pivot language and corresponding large-scale bilingual corpora are available.</Paragraph> </Section> </Paper>