<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3113">
  <Title>How Many Bits Are Needed To Store Probabilities for Phrase-Based Translation?</Title>
  <Section position="2" start_page="0" end_page="94" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In several natural language processing tasks, such as automatic speech recognition and machine translation, state-of-the-art systems rely on the statistical approach.</Paragraph>
    <Paragraph position="1"> Statistical machine translation (SMT) is based on parametric models incorporating a large number of observations and probabilities estimated from monolingual and parallel texts. The current state of the art is represented by the so-called phrase-based translation approach (Och and Ney, 2004; Koehn et al., 2003). Its core components are a translation model that contains probabilities of phrase-pairs, and a language model that incorporates probabilities of word n-grams.</Paragraph>
    <Paragraph position="2"> Due to the intrinsic data-sparseness of language corpora, the set of observations increases almost linearly with the size of the training data. Hence, to ef ciently store observations and probabilities in a computer memory the following approaches can be tackled: designing compact data-structures, pruning rare or unreliable observations, and applying data compression.</Paragraph>
    <Paragraph position="3"> In this paper we only focus on the last approach.</Paragraph>
    <Paragraph position="4"> We investigate two different quantization methods to encode probabilities and analyze their impact on translation performance. In particular, we address the following questions: * How does probability quantization impact on the components of the translation system, namely the language model and the translation model? * Which is the optimal trade-off between data compression and translation performance? * How do quantized models perform under different data-sparseness conditions? * Is the impact of quantization consistent across different translation tasks? Experiments were performed with our phrase-based SMT system (Federico and Bertoldi, 2005) on two large-vocabulary tasks: the translation of European Parliament Plenary Sessions from Spanish to  English, and the translation of news agencies from Chinese to English, according to the set up de ned by the 2005 NIST MT Evaluation Workshop.</Paragraph>
    <Paragraph position="5"> The paper is organized as follows. Section 2 reviews previous work addressing ef ciency in speech recognition and information retrieval. Section 3 introduces the two quantization methods considered in this paper, namely the Lloyd's algorithm and the Binning method. Section 4 brie y describes our phrase-based SMT system. Sections 5 reports and discusses experimental results addressing the questions in the introduction. Finally, Section 6 draws some conclusions.</Paragraph>
  </Section>
class="xml-element"></Paper>