<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1068"> <Title>Creating Multilingual Translation Lexicons with Regional Variations Using Web Corpora</Title> <Section position="4" start_page="0" end_page="0" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> In this section, we review research on generating translation equivalents for the automatic construction of translation lexicons.</Paragraph> <Paragraph position="1"> Transitive translation: Several transitive translation techniques have been developed to deal with the problem of unreliable direct translation. Borin (2000) used various sources to improve the alignment of word translations and proposed pivot alignment, which combined direct translation and indirect translation via a third language. Gollins et al. (2001) proposed a method that translated terms in parallel across multiple intermediate languages to eliminate errors. In addition, Simard (2000) exploited the transitive properties of translations to improve the quality of multilingual text alignment.</Paragraph> <Paragraph position="2"> Corpus-based translation: To construct translation lexicons automatically, conventional research in MT has generally used statistical techniques to extract translations from domain-specific sentence-aligned parallel bilingual corpora. Kupiec (1993) attempted to find noun phrase correspondences in parallel corpora using part-of-speech tagging and noun phrase recognition methods.</Paragraph> <Paragraph position="3"> Smadja et al. (1996) proposed using the Dice coefficient as a statistical association measure to deal with the problem of collocation translation. Melamed (2000) proposed statistical translation models that improve word alignment by taking advantage of pre-existing knowledge, which was shown to be more effective than a knowledge-free model. 
Although high accuracy of translation extraction can be achieved by these techniques, sufficiently large parallel corpora for various subject domains and language pairs are not always available.</Paragraph> <Paragraph position="5"> Some attention has been devoted to the automatic extraction of term translations from comparable or even unrelated texts. Such methods encounter more difficulties because they lack parallel correspondences aligned between documents or sentence pairs. Rapp (1999) utilized non-parallel corpora based on the assumption that the contexts of a term should be similar to the contexts of its translation in any language pair. Fung et al. (1998) proposed a similar approach that used a vector-space model and took a bilingual lexicon (called seed words) as a feature set to estimate the similarity between a word and its translation candidates.</Paragraph> <Paragraph position="6"> Web-based translation: Collecting parallel texts of different language versions from the Web has recently received much attention (Kilgarriff et al., 2003). Nie et al. (1999) tried to discover parallel Web documents automatically. They assumed that a Web page's parents might contain links to its different language versions and that Web pages with the same content might have similar structures and lengths. Resnik (1999) addressed the issue of language identification for finding Web pages in the languages of interest.</Paragraph> <Paragraph position="7"> Yang et al. (2003) presented an alignment method based on dynamic programming to identify one-to-one Chinese and English title pairs. These methods often require powerful crawlers to gather sufficient Web data, as well as considerable network bandwidth and storage. On the other hand, Cao et al. (2002) used the Web to examine whether an arbitrary combination of translations of a noun phrase was statistically significant. 
</Paragraph> </Section> </Paper>