<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1003"> <Title>Improved Statistical Machine Translation Using Paraphrases</Title> <Section position="7" start_page="22" end_page="23" type="relat"> <SectionTitle> 6 Related Work </SectionTitle> <Paragraph position="0"> Previous research on overcoming data sparsity in statistical machine translation has largely focused on introducing morphological analysis as a way of reducing the number of types observed in a training text. For example, Niessen and Ney (2004) apply morphological analyzers to English and German and are able to reduce the amount of training data needed to reach a given level of translation quality. Goldwater and McClosky (2005) find that stemming Czech and using lemmas improves the word-to-word correspondences when training Czech-English alignment models. Koehn and Knight (2003) show how monolingual texts and parallel corpora can be used to identify appropriate places to split German compounds.</Paragraph> <Paragraph position="1"> Still other approaches focus on ways of acquiring data. Resnik and Smith (2003) develop a method for gathering parallel corpora from the web. Oard et al. (2003) describe various methods for quickly gathering resources to create a machine translation system for a language with no initial resources.</Paragraph> </Section> </Paper>