<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0834">
<Title>Word Graphs for Statistical Machine Translation</Title>
<Section position="2" start_page="0" end_page="191" type="intro">
<SectionTitle>1 Introduction</SectionTitle>
<Paragraph position="0">A statistical machine translation system usually produces the single best translation hypothesis for a source sentence. For some applications, we are also interested in alternative translations. The simplest way to represent these alternatives is a list of the N best translation candidates. Such N-best lists have one major disadvantage: their high redundancy.</Paragraph>
<Paragraph position="1">Two translation alternatives may differ by only a single word, yet both are listed in full. Usually, the size of an N-best list is in the range of a few hundred up to a few thousand candidate translations per source sentence. For larger N-best lists, the processing time quickly becomes infeasible.</Paragraph>
<Paragraph position="2">Word graphs are a much more compact representation that avoids these redundancies as far as possible. The number of alternatives encoded in a word graph is usually an order of magnitude larger than in an N-best list, yet the graph representation avoids the combinatorial explosion that makes large N-best lists infeasible. Word graphs are an important data structure with various applications (an illustrative sketch follows this section):</Paragraph>
<Paragraph position="3">+ Word Filter. The word graph serves as a compact representation of a large number of sentences; the score information is not retained.</Paragraph>
<Paragraph position="4">+ Rescoring. Word graphs can be rescored with more sophisticated models, e.g. higher-order language models.</Paragraph>
<Paragraph position="5">+ Discriminative Training. The training of the model scaling factors described in (Och and Ney, 2002) was done on N-best lists. Using word graphs instead could further improve the results. Moreover, the phrase translation probabilities, rather than only the scaling factors, could be trained discriminatively.</Paragraph>
<Paragraph position="6">+ Confidence Measures. Word graphs can be used to derive confidence measures, such as the posterior probability (Ueffing and Ney, 2004).</Paragraph>
</Section>
</Paper>
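
The compactness argument and the posterior-probability confidence measure can be made concrete with a small example. The following is a minimal illustrative sketch, not code from the paper: a toy word graph in Python whose edges carry a word and a log-probability, with edge posteriors computed by a forward-backward pass over the DAG, in the spirit of (Ueffing and Ney, 2004). The class and method names (WordGraph, add_edge, edge_posteriors) and the assumption that node ids are integers already in topological order are choices made here for brevity.

import math
from collections import defaultdict

def logadd(a, b):
    """log(exp(a) + exp(b)), robust to -inf operands."""
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

class WordGraph:
    """Toy word graph: a DAG whose edges carry a word and a log-probability.
    Node ids are integers, assumed to be in topological order."""

    def __init__(self, start, final):
        self.start, self.final = start, final
        self.outgoing = defaultdict(list)  # node -> [(successor, word, logp)]
        self.incoming = defaultdict(list)  # node -> [(predecessor, word, logp)]

    def add_edge(self, u, v, word, logp):
        self.outgoing[u].append((v, word, logp))
        self.incoming[v].append((u, word, logp))

    def edge_posteriors(self):
        """Posterior of each edge: the probability mass of all paths through
        that edge, normalized by the total mass (forward-backward)."""
        nodes = sorted({self.start, self.final}
                       | set(self.outgoing) | set(self.incoming))
        fwd = {n: float("-inf") for n in nodes}
        fwd[self.start] = 0.0
        for v in nodes:  # forward pass, topological order
            for u, _, logp in self.incoming[v]:
                fwd[v] = logadd(fwd[v], fwd[u] + logp)
        bwd = {n: float("-inf") for n in nodes}
        bwd[self.final] = 0.0
        for u in reversed(nodes):  # backward pass
            for v, _, logp in self.outgoing[u]:
                bwd[u] = logadd(bwd[u], logp + bwd[v])
        total = fwd[self.final]
        return {(u, v, word): math.exp(fwd[u] + logp + bwd[v] - total)
                for u in nodes for v, word, logp in self.outgoing[u]}

# Two hypotheses, "the small house" and "the little house", share all
# edges except one -- exactly the redundancy an N-best list stores twice.
graph = WordGraph(start=0, final=3)
graph.add_edge(0, 1, "the", math.log(1.0))
graph.add_edge(1, 2, "small", math.log(0.6))
graph.add_edge(1, 2, "little", math.log(0.4))
graph.add_edge(2, 3, "house", math.log(1.0))
for edge, p in sorted(graph.edge_posteriors().items()):
    print(edge, round(p, 2))

In this toy graph the shared edges "the" and "house" receive posterior 1.0, while the alternatives "small" and "little" receive 0.6 and 0.4, which is the word posterior probability used as a confidence measure; in an N-best list the shared words would be listed once per hypothesis.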