<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0834">
  <Title>Word Graphs for Statistical Machine Translation</Title>
  <Section position="2" start_page="0" end_page="191" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A statistical machine translation system usually produces the single-best translation hypotheses for a source sentence. For some applications, we are also interested in alternative translations. The simplest way to represent these alternatives is a list with the N-best translation candidates. These N-best lists have one major disadvantage: the high redundancy.</Paragraph>
    <Paragraph position="1"> The translation alternatives may differ only by a single word, but still both are listed completely. Usually, the size of the N-best list is in the range of a few hundred up to a few thousand candidate translations per source sentence. If we want to use larger N-best lists the processing time gets very soon infeasible.</Paragraph>
    <Paragraph position="2"> Word graphs are a much more compact representation that avoid these redundancies as much as possible. The number of alternatives in a word graph is usually an order of magnitude larger than in an N-best list. The graph representation avoids the combinatorial explosion that make large N-best lists infeasible. null Word graphs are an important data structure with various applications: + Word Filter.</Paragraph>
    <Paragraph position="3"> The word graph is used as a compact representation of a large number of sentences. The score information is not contained.</Paragraph>
    <Paragraph position="4"> + Rescoring.</Paragraph>
    <Paragraph position="5"> We can use word graphs for rescoring with more sophisticated models, e.g. higher-order language models.</Paragraph>
    <Paragraph position="6"> + Discriminative Training.</Paragraph>
    <Paragraph position="7"> The training of the model scaling factors as described in (Och and Ney, 2002) was done on N-best lists. Using word graphs instead could further improve the results. Also, the phrase translation probabilities could be trained discrimatively, rather than only the scaling factors. + Con dence Measures.</Paragraph>
    <Paragraph position="8"> Word graphs can be used to derive con dence measures, such as the posterior probability (Uef ng and Ney, 2004).</Paragraph>
  </Section>
class="xml-element"></Paper>