<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1012">
  <Title>Extensions to HMM-based Statistical Word Alignment Models</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The main task in statistical machine translation is to model the string translation probability Pr(s_1^I | t_1^J), where the string t_1^J in one language is translated into another language as the string s_1^I. We refer to s_1^I as the source language string and t_1^J as the target language string, in accordance with the noisy channel terminology used in the IBM models of (Brown et al., 1993). Word-level translation models assume a pairwise mapping between the words of the source and target strings. This mapping is generated by alignment models. In this paper we present extensions to the HMM alignment model of (Vogel et al., 1996; Och and Ney, 2000b). Some of our extensions are applicable to other alignment models as well and are of general utility. For most language pairs, huge amounts of parallel corpora are not readily available, whereas monolingual resources such as taggers are more often available.</Paragraph>
    <Paragraph position="1"> Little research has gone into exploring the potential of part of speech information to better model translation probabilities and permutation probabilities. Melamed (2000) uses a very broad classification of words (content, function, and several punctuation classes) to estimate class-specific parameters for translation models. Fung and Wu (1995) adapt English tags for Chinese language modeling using Coerced Markov Models: English POS classes serve as the states of a Markov Model that generates Chinese words. In this paper we use POS tag information to incorporate prior knowledge of word translation and to model local word order variation. We show that using this information helps in the translation modeling task.</Paragraph>
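The idea of using POS tags as prior knowledge for word translation can be sketched as a simple interpolation of a sparse word-level translation table with a dense tag-level one. This is only an illustration under assumed names and invented probabilities, not the paper's actual parameterisation:

```python
# Hypothetical translation tables; all probability values are invented.
word_tbl = {("maison", "house"): 0.7}
tag_tbl = {("NOUN", "NOUN"): 0.5, ("NOUN", "VERB"): 0.05}

def trans_prob(s_word, s_tag, t_word, t_tag, lam=0.8):
    """P(s | t), backed off to the POS tag pair when the word pair is unseen.
    lam weights the word-level table against the tag-level prior."""
    return (lam * word_tbl.get((s_word, t_word), 0.0)
            + (1.0 - lam) * tag_tbl.get((s_tag, t_tag), 0.0))
```

For an unseen word pair such as ("chien", "dog"), the estimate falls back entirely on the tag-pair prior, which is the sense in which tags inject prior knowledge of word translation.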
    <Paragraph position="2"> Many alignment models assume a one-to-many mapping from source language words to target language words, such as the IBM Models 1-5 of Brown et al. (1993) and the HMM alignment model of (Vogel et al., 1996). In addition, IBM Models 3, 4 and 5 include a fertility model n(φ | s), where φ is the number of words aligned to a source word s. In HMM-based alignment, word fertilities are not modeled. The alignment positions of target words are the states in an HMM. In a first-order HMM, the alignment probabilities for word t_j depend only on the alignment of the previous word t_{j-1}. Therefore, source words are not awarded or penalized for being aligned to more than one target word. We present an extension to HMM alignment that approximately models word fertility.</Paragraph>
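The first-order dependence described above can be made concrete with the forward algorithm over a toy HMM whose states are source positions and whose emissions are target words. This is a minimal sketch under assumed table names (emit, jump), not the paper's implementation; note that nothing in it tracks how many target words a source position has already generated, which is exactly the missing fertility component:

```python
def hmm_align_prob(src, tgt, emit, jump):
    """Forward algorithm for first-order HMM alignment.
    States are source positions; the total probability of the target
    string sums over all alignment sequences, with the transition
    probability depending only on the jump width a_j - a_{j-1}.
    emit[(t_word, s_word)] and jump[width] are assumed model parameters."""
    n = len(src)
    # Uniform initial alignment distribution over source positions.
    alpha = [emit.get((tgt[0], src[i]), 0.0) / n for i in range(n)]
    for t_word in tgt[1:]:
        alpha = [
            emit.get((t_word, src[i]), 0.0)
            * sum(alpha[k] * jump.get(i - k, 0.0) for k in range(n))
            for i in range(n)
        ]
    return sum(alpha)
```

Because the jump term looks only one step back, a state can be revisited arbitrarily often at no extra cost, which is why source words are neither awarded nor penalized for high fertility in the plain model.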
    <Paragraph position="3"> Another assumption of existing alignment models is that there is a special Null word in the source sentence from which all target words that do not have other correspondences in the source language are generated. Use of such a Null word has proven problematic in many models.</Paragraph>
    <Paragraph position="4"> We also assume the existence of a special Null word in the source language that generates words in the target language.</Paragraph>
    <Paragraph position="5"> However, we define a different model that better constrains and conditions generation from Null. We assume that the generation probability of words by Null depends on other words in the target sentence.</Paragraph>
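A minimal sketch of what it means to condition Null generation on other words in the target sentence: instead of a single global probability of being generated by Null, each target word's Null probability here depends on its preceding target word. The concrete table and conditioning choice below are invented for illustration and need not match the paper's model:

```python
# Hypothetical table: P(t_j generated by Null | t_{j-1}); values invented.
null_given_prev = {"the": 0.02, "to": 0.15}
DEFAULT_P0 = 0.05  # fallback for unseen contexts and sentence-initial words

def p_null(tgt, j):
    """Probability that target word tgt[j] is generated by the source
    Null word, conditioned on the preceding target word."""
    if j == 0:
        return DEFAULT_P0
    return null_given_prev.get(tgt[j - 1], DEFAULT_P0)
```

The point of conditioning on target context is that some environments (e.g. after function words) plausibly produce spurious words far more often than others, which a single constant cannot capture.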
    <Paragraph position="6"> Next we present the general equations for the decomposition of the translation probability using part of speech tags, and later we go into more detail about our extensions.</Paragraph>
  </Section>
</Paper>