<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2021">
  <Title>Evaluating the Word Sense Disambiguation Performance of Statistical Machine Translation</Title>
  <Section position="2" start_page="0" end_page="120" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Word sense disambiguation (WSD), the task of identifying the correct sense of a word in context, is a central problem for all natural language processing applications, and for machine translation in particular: different senses of a word translate differently into other languages, and resolving sense ambiguity is needed to identify the right translation of a word.</Paragraph>
    <Paragraph position="1"> Much work has been done on building dedicated WSD models. The recent Senseval series of workshops promoted controlled comparison of very different WSD models using common accuracy metrics and common data sets. These efforts yielded steady improvements in WSD accuracy, but only for WSD evaluated as a standalone task. Senseval focuses on the evaluation of standalone, generic WSD models, even though many application-specific systems (machine translation, information retrieval, and so on) perform WSD either explicitly or implicitly.</Paragraph>
    <Paragraph position="2"> Since the Senseval models have been built and optimized specifically to address the WSD problem, they typically use richer disambiguating information than SMT systems. This, however, raises the question of whether such sophisticated WSD models are in fact needed in practice.</Paragraph>
    <Paragraph position="3"> In many machine translation architectures, in particular most current statistical machine translation (SMT) models, the WSD problem is typically not explicitly addressed. However, recent progress in machine translation and the continuous improvement on evaluation metrics such as BLEU (Papineni et al., 2002) suggest that SMT systems are already very good at choosing correct word translations. BLEU scores computed with low-order n-grams can be seen as a measure of translation adequacy, which suggests that as SMT systems achieve higher BLEU scores, their ability to disambiguate word translations improves.</Paragraph>
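The link between low-order n-gram matching and word translation choice can be illustrated with the clipped (modified) n-gram precision that BLEU is built on (Papineni et al., 2002). The following is a minimal sketch, not the scoring code used in this work: the function names and the example sentences are hypothetical, and brevity penalty and corpus-level aggregation are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of one candidate against its references.

    Each candidate n-gram count is clipped to the maximum count of that
    n-gram in any single reference, as in Papineni et al. (2002).
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical example: one ambiguous word rendered with the wrong sense.
reference = "he sat on the river bank".split()
right_sense = "he sat on the river bank".split()
wrong_sense = "he sat on the river lender".split()

print(modified_precision(right_sense, [reference]))  # 1.0
print(modified_precision(wrong_sense, [reference]))  # lower: one unigram fails to match
```

With n=1 the score drops whenever a single word is translated with a sense the reference does not use, which is why unigram precision is a plausible, if indirect, proxy for lexical choice.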
    <Paragraph position="4"> In other work, we have been conducting comparative studies testing whether state-of-the-art WSD models can improve SMT translation quality (Carpuat and Wu, 2005). Using a state-of-the-art Chinese word sense disambiguation model to choose translation candidates for a typical IBM statistical MT system, we found that word sense disambiguation does not yield significantly better translation quality than the statistical machine translation system alone. The surprising difficulty of this challenge might suggest that SMT models are sufficiently strong at word level disambiguation on their own, and has recently encouraged speculation that SMT performs WSD as well as the dedicated WSD models.</Paragraph>
    <Paragraph position="5"> The studies described in this paper are aimed at directly testing this increasingly common speculation.</Paragraph>
    <Paragraph position="6"> How the strengths of SMT and WSD models compare is not obvious; there are strong arguments in support of both the WSD and the SMT models. A controlled empirical comparison is therefore needed to better assess the strengths and weaknesses of each type of model on the WSD task.</Paragraph>
    <Paragraph position="7"> We therefore propose to evaluate statistical machine translation models on a WSD task, in terms of standard WSD accuracy metrics. This addresses the inverse, complementary question to the other study mentioned above (of whether WSD models can help SMT systems in terms of machine translation quality metrics). Senseval provides a good framework for this evaluation, and allows a direct comparison of the performance of the SMT model with state-of-the-art WSD models on a common dataset. We built a Chinese-to-English SMT system using freely available toolkits, and show that it does not perform as well as the WSD models specifically built for this task.</Paragraph>
  </Section>
</Paper>