File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/06/n06-1003_relat.xml

Size: 1,379 bytes

Last Modified: 2025-10-06 14:15:54

<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1003">
  <Title>Improved Statistical Machine Translation Using Paraphrases</Title>
  <Section position="7" start_page="22" end_page="23" type="relat">
    <SectionTitle>
6 Related Work
</SectionTitle>
    <Paragraph position="0"> Previous research on overcoming data sparsity in statistical machine translation has largely focused on introducing morphological analysis as a way of reducing the number of types observed in a training text. For example, Nießen and Ney (2004) apply morphological analyzers to English and German and are able to reduce the amount of training data needed to reach a certain level of translation quality. Goldwater and McClosky (2005) find that stemming Czech and using lemmas improves the word-to-word correspondences when training Czech-English alignment models. Koehn and Knight (2003) show how monolingual texts and parallel corpora can be used to identify appropriate places to split German compounds.</Paragraph>
    <Paragraph position="1"> Still other approaches focus on ways of acquiring data. Resnik and Smith (2003) develop a method for gathering parallel corpora from the web. Oard et al. (2003) describe various methods employed for quickly gathering resources to create a machine translation system for a language with no initial resources.</Paragraph>
  </Section>
</Paper>