<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3227">
  <Title>Phrase Pair Rescoring with Term Weightings for Statistical Machine Translation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Words can be classified as content and functional words. Content words like verbs and proper nouns are more informative than function words like &amp;quot;to'' and &amp;quot;the''. In machine translation, intuitively, the informative content words should be emphasized more for better adequacy of the translation quality. However, the standard statistical translation approach does not take account how informative and thereby, how important a word is, in its translation model. One reason is the difficulty to measure how informative a word is. Another problem is to integrate it naturally into the existing statistical machine translation framework, which typically is built on word alignment models, like the well-known IBM alignment models (Brown et al 1993).</Paragraph>
    <Paragraph position="1"> In recent years there has been a strong tendency to incorporate phrasal translation into statistical machine translation. It directly translates an n-gram from the source language into an m-gram in the target language. The advantages are obvious: It has built-in local context modeling, and provides reliable local word reordering. It has multi-word translations, and models a word's conditional fertility given a local context. It captures idiomatic phrase translations and can be easily enriched with bilingual dictionaries. In addition, it can compensate for the segmentation errors made during preprocessing, i.e. word segmentation errors of Chinese. The advantage of using phrase-based translation in a statistical framework has been shown in many studies such as (Koehn et al. 2003; Vogel et al. 2003; Zens et al. 2002; Marcu and Wong, 2002). However, the phrase translation pairs are typically extracted from a parallel corpus based on the Viterbi alignment of some word alignment models. The leads to the question what probability should be assigned to those phrase translations. Different approaches have been suggested as using relative frequencies (Zens et al. 2002), calculate probabilities based on a statistical word-to-word dictionary (Vogel et al. 2003) or use a linear interpolation of these scores (Koehn et al. 2003).</Paragraph>
    <Paragraph position="2"> In this paper we investigate a different approach with takes the information content of words better into account. Term weighting based vector models are proposed to encode the translation quality. The advantage is that term weights, such as tf.idf, are useful to model the informativeness of words. Highly informative content words usually have high tf.idf scores. In information retrieval this has been successfully applied to capture the relevance of a document to a query, by representing both query and documents as term weight vectors and use for example the cosine distance to calculate the similarity between query vector and document vector. The idea now is to consider the source phrase as a &amp;quot;query&amp;quot;, and the different target phrases extracted from the bilingual corpus as translation candidates as a relevant &amp;quot;documents&amp;quot;. The cosine distance is then a natural choice to model the translation probability.</Paragraph>
    <Paragraph position="3"> Our approach is to apply term weighting schemes to transform source and target phrases into term vectors. Usually content words in both source and target languages will be emphasized by large term weights. Thus, good phrase translation pairs will share similar contours, or, to express it in a different way, will be close to each other in the term weight vector space. A similarity function is then defined to approximate translation probability in the vector space.</Paragraph>
    <Paragraph position="4"> The paper is structured as follows: in Section 2, our phrase-based statistical machine translation system is introduced; in Section 3, a phrase translation score function based on word translation probabilities is explained, as this will be used as a baseline system; in Section 4, a vector model based on tf.idf is proposed together with two similarity functions; in Section 5, length regularization and smoothing schemes are explained briefly; in Section 6, the translation experiments are presented; and Section 7 concludes with a discussion.</Paragraph>
  </Section>
class="xml-element"></Paper>