<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1006">
  <Title>Phrase-Based Backoff Models for Machine Translation of Highly Inflected Languages</Title>
  <Section position="3" start_page="41" end_page="41" type="intro">
    <SectionTitle>
2 Morphology in SMT Systems
</SectionTitle>
    <Paragraph position="0"> Previous approaches have used morpho-syntactic knowledge mainly at the low-level stages of a machine translation system, i.e. for preprocessing.</Paragraph>
    <Paragraph position="1"> (Niessen and Ney, 2001a) use morpho-syntactic knowledge for reordering certain syntactic constructions that differ in word order in the source vs. target language (German and English). Reordering is applied before training and after generating the output in the target language. Normalization of English/German inflectional morphology to base forms for the purpose of word alignment is performed in (Corston-Oliver and Gamon, 2004) and (Koehn, 2005), demonstrating that the vocabulary size can be reduced significantly without affecting performance.</Paragraph>
    <Paragraph position="2"> Similar morphological simplifications have been applied to other languages such as Romanian (Fraser and Marcu, 2005) in order to decrease word alignment error rate. In (Niessen and Ney, 2001b), a hierarchical lexicon model is used that represents words as combinations of full forms, base forms, and part-of-speech tags, and that allows the word alignment training procedure to interpolate counts based on the different levels of representation. (Goldwater and McCloskey, 2005) investigate various morphological modifications for Czech-English translations: a subset of the vocabulary was converted to stems, pseudowords consisting of morphological tags were introduced, and combinations of stems and morphological tags were used as new word forms. Small improvements were found in combination with a word-to-word translation model. Most of these techniques have focused on improving word alignment or reducing vocabulary size; however, it is often the case that better word alignment does not improve the overall translation performance of a standard phrase-based SMT system.</Paragraph>
    <Paragraph position="3"> Phrase-based models themselves have not benefited much from additional morpho-syntactic knowledge; e.g. (Lioma and Ounis, 2005) do not report any improvement from integrating part-of-speech information at the phrase level. One successful application of morphological knowledge is (de Gispert et al., 2005), where knowledge-based morphological techniques are used to identify unseen verb forms in the test text and to generate inflected forms in the target language based on annotated POS tags and lemmas. Phrase prediction in the target language is conditioned on the phrase in the source language as well as the corresponding tuple of lemmatized phrases. This technique worked well for translating from a morphologically poor language (English) to a more highly inflected language (Spanish) when applied to unseen verb forms. Treating both known and unknown verbs in this way, however, did not result in additional improvements. Here we extend the notion of treating known and unknown words differently and propose a backoff model for phrase-based translation.</Paragraph>
  </Section>
</Paper>