<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0309"> <Title>The Duluth Word Alignment System</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Word alignment is a crucial part of any Machine Translation system: it is the process of determining which words in a given source and target language sentence pair are translations of each other. This is a token-level task, meaning that each word (token) in the source text is aligned with its corresponding translation in the target text.</Paragraph>
<Paragraph position="1"> The Duluth Word Alignment System is a Perl implementation of IBM Model 2 (Brown et al., 1993). It learns a probabilistic model from sentence-aligned parallel text that can then be used to align the words in another such text (one that was not part of the training process).</Paragraph>
<Paragraph position="2"> A parallel text consists of a source language text and its translation into some target language. If we have determined which sentences are translations of each other, the text is said to be sentence aligned; we call a source and target language sentence that are translations of each other a sentence pair.</Paragraph>
<Paragraph position="3"> Brown et al. (1993) introduced five statistical translation models (IBM Models 1-5). In general, a statistical machine translation system is composed of three components: a language model, a translation model, and a decoder (Brown et al., 1988).</Paragraph>
<Paragraph position="4"> The language model tells how probable a given sentence is in the source language, the translation model indicates how likely it is that a particular target sentence is a translation of a given source sentence, and the decoder is what actually takes a source sentence as input and produces its translation as output.
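To make the translation-model component concrete, the following is a minimal sketch (not the Duluth system, which is written in Perl) of how IBM Model 2 scores word alignments. The translation table t and the alignment table q below are hypothetical hand-set toy values; in practice both tables are estimated with EM from sentence-aligned parallel text. Because Model 2 factorizes over target positions, the best alignment can be found by an independent argmax at each position.

```python
# Illustrative sketch of Viterbi alignment under IBM Model 2.
# t and q are assumed, hand-set toy parameters, not learned values.

def best_alignment(src, tgt, t, q):
    """For each target position i, choose the source position j that
    maximizes t(f_i | e_j) * q(j | i, len(src), len(tgt)).
    Position 0 stands for the empty (NULL) source word. Model 2
    factorizes over target positions, so this per-position argmax
    yields the highest-probability alignment."""
    l, m = len(src), len(tgt)
    alignment = []
    for i, f in enumerate(tgt, start=1):
        scores = [
            (t.get((f, e), 1e-12) * q.get((j, i, l, m), 1e-12), j)
            for j, e in enumerate(['NULL'] + src)
        ]
        alignment.append(max(scores)[1])
    return alignment

# Toy parameter tables (hypothetical, purely illustrative):
t = {('la', 'the'): 0.7, ('maison', 'house'): 0.8,
     ('la', 'house'): 0.05, ('maison', 'the'): 0.1}
q = {(1, 1, 2, 2): 0.6, (2, 1, 2, 2): 0.3,
     (1, 2, 2, 2): 0.3, (2, 2, 2, 2): 0.6}

src = ['the', 'house']
tgt = ['la', 'maison']
print(best_alignment(src, tgt, t, q))  # -> [1, 2]
```

Note that the q table is what distinguishes Model 2 from Model 1: under Model 1 every source position would be equally likely, and only the t entries would decide the alignment.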
Our focus is on translation models, since that is where word alignment is carried out.</Paragraph>
<Paragraph position="5"> The IBM Models start very simply and grow steadily more complex. IBM Model 1 is based solely on the probability that a given word in the source language translates as a particular word in the target language; thus, a word in the first position of the source sentence is just as likely to translate to the first word of the target sentence as to the last. IBM Model 2 augments these translation probabilities by taking into account how likely it is for words at particular positions in a sentence pair to be aligned with each other.</Paragraph>
<Paragraph position="6"> This paper continues with a more detailed description of IBM Model 2. It then presents the implementation details of the Duluth Word Alignment System. Next we describe the data and the parameters that were used during the training and testing stages of the shared task on word alignment. Finally, we discuss our experimental results and briefly outline our future plans.</Paragraph> </Section> </Paper>