<?xml version="1.0" standalone="yes"?>
<Paper uid="J97-3002">
  <Title>Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> We introduce a general formalism for modeling of bilingual sentence pairs, known as an inversion transduction grammar, with potential application in a variety of corpus analysis areas. Transduction grammar models, especially of the finite-state family, have long been known. However, the imposition of identical ordering constraints upon both streams severely restricts their applicability, and thus transduction grammars have received relatively little attention in language-modeling research. The inversion transduction grammar formalism skips directly to a context-free, rather than finite-state, base and permits one extra degree of ordering flexibility, while retaining properties necessary for efficient computation, thereby sidestepping the limitations of traditional transduction grammars.</Paragraph>
    <Paragraph position="1"> In tandem with the concept of bilingual language-modeling, we propose the concept of bilingual parsing, where the input is a sentence-pair rather than a sentence.</Paragraph>
    <Paragraph position="2"> Though inversion transduction grammars remain inadequate as full-fledged translation models, bilingual parsing with simple inversion transduction grammars turns out to be very useful for parallel corpus analysis when the true grammar is not fully known. Parallel bilingual corpora have been shown to provide a rich source of constraints for statistical analysis (Brown et al. 1990; Gale and Church 1991; Gale, Church, and Yarowsky 1992; Church 1993; Brown et al. 1993; Dagan, Church, and Gale 1993; Fung and Church 1994; Wu and Xia 1994; Fung and McKeown 1994). The primary purpose of bilingual parsing with inversion transduction grammars is not to flag ungrammatical inputs; rather, the aim is to extract structure from the input data, which is assumed to be grammatical, in keeping with the spirit of robust parsing. The formalism's uniform integration of various types of bracketing and alignment constraints is one of its chief strengths. [Department of Computer Science, University of Science and Technology, Clear Water Bay, Hong Kong. E-mail: dekai@cs.ust.hk. (c) 1997 Association for Computational Linguistics. Computational Linguistics, Volume 23, Number 3.]</Paragraph>
    <Paragraph position="3"> The paper is divided into two main parts. We begin in the first part below by laying out the basic formalism, then show that reduction to a normal form is possible. We then raise several desiderata for the expressiveness of any bilingual language-modeling formalism in terms of its constituent-matching flexibility and discuss how the characteristics of the inversion transduction formalism are particularly suited to address these criteria. Afterwards we introduce a stochastic version and give an algorithm for finding the optimal bilingual parse of a sentence-pair. The formalism is independent of the languages; we give examples and applications using Chinese and English because languages from different families provide a more rigorous testing ground. In the second part, we survey a number of sample applications and extensions of bilingual parsing for segmentation, bracketing, phrasal alignment, and other parsing tasks.</Paragraph>
  </Section>
</Paper>