<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0829">
  <Title>Competitive Grouping in Integrated Phrase Segmentation and Alignment Model</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In recent years, various phrase translation approaches (Marcu and Wong, 2002; Och et al., 1999; Koehn et al., 2003) have been shown to outperform word-to-word translation models (Brown et al., 1993). Many of these phrase alignment strategies rely on a pre-computed word alignment and use different heuristics to extract the phrase pairs from the Viterbi word alignment path. The Integrated Segmentation and Alignment (ISA) model (Zhang et al., 2003) does not require such a word alignment.</Paragraph>
    <Paragraph position="1"> ISA segments the sentence into phrases and finds their alignment simultaneously. ISA is simple and fast. In translation experiments, it has shown performance comparable to that of other phrase alignment strategies that require complicated statistical model training.</Paragraph>
    <Paragraph position="2"> In this paper, we describe the key idea behind this model and connect it with the competitive linking algorithm (Melamed, 1997), which was developed for word-to-word alignment.</Paragraph>
    <Paragraph position="3"> 2 Translation Likelihood as a Statistical Test
Given a bilingual corpus for the language pair F (foreign, source language) and E (English, target language), if we know the word alignment for each sentence pair, we can calculate the co-occurrence frequency C(f,e) for each source/target word pair type, together with the marginal frequencies $C(f) = \sum_{e} C(f,e)$ and $C(e) = \sum_{f} C(f,e)$. We can apply various statistical tests (Manning and Schütze, 1999) to measure how strongly f and e are associated, in other words, how likely they are to be mutual translations. In the following sections, we use the $\chi^2$ statistic to measure the mutual translation likelihood (Church and Hanks, 1990).</Paragraph>
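    <Paragraph position="4"> The counts and the test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word alignment is assumed to be given as a flat list of linked (f, e) pairs, the function names cooccurrence_counts and chi_square are hypothetical, the toy data are invented, and the statistic is assumed to be the standard Pearson chi-square on the 2x2 contingency table of f and e, which may differ in detail from the form used later in the paper.

```python
from collections import Counter


def cooccurrence_counts(aligned_pairs):
    """Return C(f,e), C(f) and C(e) from an iterable of linked (f, e) pairs."""
    c_fe = Counter(aligned_pairs)
    c_f = Counter()
    c_e = Counter()
    for (f, e), n in c_fe.items():
        c_f[f] += n  # C(f) = sum over e of C(f,e)
        c_e[e] += n  # C(e) = sum over f of C(f,e)
    return c_fe, c_f, c_e


def chi_square(f, e, c_fe, c_f, c_e):
    """Pearson chi-square of the 2x2 contingency table for f and e (assumed form)."""
    n = sum(c_fe.values())  # total number of links
    a = c_fe[(f, e)]        # links joining f and e
    b = c_f[f] - a          # links with f but not e
    c = c_e[e] - a          # links with e but not f
    d = n - a - b - c       # links with neither f nor e
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0          # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


# Invented toy alignment links, for illustration only.
links = (
    [("maison", "house")] * 3
    + [("maison", "home")]
    + [("chat", "cat")] * 4
    + [("la", "the")] * 2
)
c_fe, c_f, c_e = cooccurrence_counts(links)
for pair in [("chat", "cat"), ("maison", "house"), ("maison", "home")]:
    print(pair, chi_square(pair[0], pair[1], c_fe, c_f, c_e))
```

Note that a large chi-square value signals dependence between f and e without indicating its direction, so in practice the score is only meaningful as a translation likelihood for pairs that co-occur more often than chance.</Paragraph>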
  </Section>
</Paper>