<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1110"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 875-882, Vancouver, October 2005. ©2005 Association for Computational Linguistics. Inducing a multilingual dictionary from a parallel multitext in related languages</Title> <Section position="4" start_page="875" end_page="875" type="relat"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> A great deal of work has been done on building dictionaries using a variety of techniques; Melamed (2000) provides a good overview. There is work on lexicon induction using string distance or other phonetic/orthographic comparison techniques, such as Mann and Yarowsky (2001), and on semantic comparison using resources such as WordNet (Kondrak, 2001). Such work, however, primarily focuses on finding cognates, whereas we are interested in translations of all words. Moreover, while some techniques (e.g., Mann and Yarowsky (2001)) use multiple languages, they rely on resources such as dictionaries between some of the language pairs.</Paragraph> <Paragraph position="1"> We do not require a dictionary for any language pair.</Paragraph> <Paragraph position="2"> An important element of our work is its focus on more than a pair of languages. There is an active research area devoted to multi-source translation (e.g., Och and Ney (2001)). Our setting is the reverse: we do not use multiple dictionaries in order to translate, but translate (in a very crude way) in order to build multiple dictionaries.</Paragraph> <Paragraph position="3"> Many machine translation techniques require dictionary building as a step of the process and have therefore also attacked this problem. They use a variety of approaches (Koehn and Knight (2001) give a good overview), many of which require advanced tools for both languages, which we are unable to use. 
They also use bilingual (and to some extent monolingual) corpora, which we do have available.</Paragraph> <Paragraph position="4"> These approaches, however, do not focus on related languages and tend to ignore lexical similarity,4 nor are they able to work on more than a pair of languages at a time.</Paragraph> <Paragraph position="5"> It is also worth noting that there has been some MT work on related languages that exploits language similarity in the opposite way: by using dictionaries and tools for both languages and assuming that a near word-for-word approach is reasonable (Hajic et al., 2000).</Paragraph> <Paragraph position="6"> 4 Much of recent MT research focuses on pairs of languages that are not related, such as English-Chinese or English-Arabic.</Paragraph> </Section> </Paper>