<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1122">
  <Title>Modelling lexical redundancy for machine translation</Title>
  <Section position="3" start_page="0" end_page="969" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Data-driven machine translation (MT) relies on models that can be efficiently estimated from parallel text. Token-level independence assumptions based on word-alignments can be used to decompose parallel corpora into manageable units for parameter estimation. However, if training data is scarce or language pairs encode significantly different information in the lexicon, such as Czech and English, additional independence assumptions may assist the model estimation process.</Paragraph>
    <Paragraph position="1"> Standard statistical translation models use separate parameters for each pair of source and target types. In these models, distinctions in either lexicon that are redundant to the translation process will result in unwarranted model complexity and make parameter estimation from limited parallel data more difficult. A natural way to eliminate such lexical redundancy is to group types into homogeneous clusters that do not differ significantly in their distributions over types in the other language. Cluster-based translation models capture the corresponding independence assumptions.</Paragraph>
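    <Paragraph position="1.5"> As a minimal illustration of the independence assumption such cluster-based models capture (this is a sketch, not the paper's implementation; the pair counts, cluster map, and function names are hypothetical):

```python
from collections import defaultdict

def cluster_translation_probs(pair_counts, cluster_of):
    """Pool counts over source clusters so every member of a cluster
    shares one distribution over target types: p(t | s) = p(t | c(s)).

    pair_counts: {(source_type, target_type): count}
    cluster_of:  {source_type: cluster_id}
    Returns a function p(s, t) giving the pooled translation probability.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for (s, t), c in pair_counts.items():
        counts[cluster_of[s]][t] += c
    probs = {}
    for cl, target_counts in counts.items():
        total = sum(target_counts.values())
        probs[cl] = {t: c / total for t, c in target_counts.items()}
    return lambda s, t: probs[cluster_of[s]].get(t, 0.0)
```

Grouping, say, inflectional variants such as "run" and "runs" into one cluster pools their sparse counts, so both share a single, better-estimated distribution over target types.</Paragraph>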
    <Paragraph position="2"> Previous work on bilingual clustering has focused on coarse partitions of the lexicon that resemble automatically induced part-of-speech classes. These were used to model generic word-alignment patterns such as noun-adjective re-ordering between English and French (Och, 1998). In contrast, we induce fine-grained partitions of the lexicon, conceptually closer to automatic lemmatisation, optimised specifically to assign translation probabilities. Unlike lemmatisation or stemming, our method specifically quantifies lexical redundancy in a bilingual setting and does not make language-specific assumptions.</Paragraph>
    <Paragraph position="3"> We tackle the problem of redundancy in the translation lexicon via Bayesian model selection over a set of cluster-based translation models. We search for the model, defined by a clustering of the source lexicon, that maximises the marginal likelihood of target tokens in parallel data. In this optimisation, source types are combined into clusters if their distributions over target types are too similar to warrant distinct parameters.</Paragraph>
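    <Paragraph position="3.5"> The merge criterion can be sketched with a Dirichlet-multinomial marginal likelihood: two source clusters are merged when the data are better explained by one shared distribution over target types than by two separate ones. This is an illustrative sketch under a symmetric Dirichlet prior, not the paper's exact model; `alpha`, `vocab`, and the function names are assumptions:

```python
from math import lgamma

def log_marginal(counts, alpha=0.5, vocab=1000):
    """Log marginal likelihood of target-token counts under a symmetric
    Dirichlet(alpha) prior over a target vocabulary of size `vocab`."""
    n = sum(counts.values())
    lp = lgamma(alpha * vocab) - lgamma(alpha * vocab + n)
    for c in counts.values():
        lp += lgamma(alpha + c) - lgamma(alpha)
    return lp

def merge_gain(c1, c2, alpha=0.5, vocab=1000):
    """Change in log marginal likelihood from merging two source clusters
    (dicts of target-type counts). A positive gain means their target
    distributions are too similar to warrant distinct parameters."""
    merged = dict(c1)
    for t, c in c2.items():
        merged[t] = merged.get(t, 0) + c
    return (log_marginal(merged, alpha, vocab)
            - log_marginal(c1, alpha, vocab)
            - log_marginal(c2, alpha, vocab))
```

Two source types with near-identical target distributions yield a positive gain (merging pays the prior's parameter cost once), while types translating to disjoint targets yield a negative gain and stay separate.</Paragraph>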
    <Paragraph position="4"> Redundant distinctions between types may exhibit regularities within a language, for instance, inflexion patterns. These can be used to guide model selection. Here we show that the inclusion of a model 'prior' over the lexicon structure leads to more robust translation models. Although a priori we do not know which monolingual features characterise redundancy for a given language pair, by defining a prior model over the monolingual space of source types and cluster assignments, we can introduce an inductive bias that allows clustering decisions in different parts of the lexicon to influence one another via monolingual features. We use an EM-type algorithm to learn weights for a Markov random field (MRF) parameterisation of this prior over lexicon structure.</Paragraph>
    <Paragraph position="5"> We obtain significant improvements in translation quality, as measured by BLEU, by incorporating these optimised models within a phrase-based SMT system for three different language pairs. The MRF prior improves the results and picks up features that appear to agree with linguistic intuitions of redundancy for the language pairs considered.</Paragraph>
  </Section>
</Paper>