<?xml version="1.0" standalone="yes"?>
<Paper uid="J97-3002">
  <Title>Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora</Title>
  <Section position="11" start_page="398" end_page="400" type="concl">
    <SectionTitle>
9. Bilingual Constraint Transfer
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="398" end_page="399" type="sub_section">
      <SectionTitle>
9.1 Monolingual Parse Trees
</SectionTitle>
      <Paragraph position="0"> A parse may be available for one of the languages, especially for well-studied languages such as English. Since this eliminates all degrees of freedom in the English sentence structure, the parse of the Chinese sentence must conform with that given for the English. Knowledge of English bracketing is thus used to help parse the Chinese sentence; this method facilitates a kind of transfer of grammatical expertise in one language toward bootstrapping grammar acquisition in another.</Paragraph>
      <Paragraph position="1"> A parsing algorithm for this case can be implemented very efficiently. Note that the English parse tree already determines the split point S for breaking $e_{0..T}$ into two constituent subtrees deriving $e_{0..S}$ and $e_{S..T}$ respectively, as well as the nonterminal labels j and k for each subtree. The same then applies recursively to each subtree.</Paragraph>
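The way a given parse fixes the split point and child labels of every constituent can be illustrated with a minimal Python sketch. It assumes the English parse is already binarized and stored as nested tuples, `(label, left, right)` for internal nodes and `(label, word)` for leaves; this representation and the name `collect_splits` are hypothetical and do not come from the paper.

```python
def collect_splits(tree, start=0, table=None):
    """Walk a binarized parse and return (end, table).

    table maps each internal constituent (s, t) to (S, j, k): its split
    point and the labels of its left and right subtrees. Positions count
    word boundaries, so a sentence of T words spans 0..T.
    """
    if table is None:
        table = {}
    if len(tree) == 2:
        # Leaf (label, word): covers exactly one word, nothing to split.
        return start + 1, table
    label, left, right = tree
    split, _ = collect_splits(left, start, table)
    end, _ = collect_splits(right, split, table)
    table[(start, end)] = (split, left[0], right[0])
    return end, table


# Hypothetical three-word example.
tree = ("S", ("NP", "they"), ("VP", ("V", "read"), ("NP", "books")))
end, splits = collect_splits(tree)
```

Each constituent appears exactly once in the resulting table, so a parser that consults it never has to search over split points or child labels on the English side.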
      <Paragraph position="2"> We indicate this by turning S, j, and k into deterministic functions on the English constituents, writing $S_{st}$, $j_{st}$, and $k_{st}$ to denote the split point and the subtree labels for any constituent $e_{s..t}$. The following simplifications can then be made to the parsing algorithm: Recursion. For all English constituents $e_{s..t}$ and all i, u, v such that $1 \le i \le N$ and $0 \le u \le v \le V$: $$\delta^{[]}_{stuv}(i) = \max_{u \le U \le v} a_{i \rightarrow [j_{st} k_{st}]} \, \delta_{s S_{st} u U}(j_{st}) \, \delta_{S_{st} t U v}(k_{st}) \qquad (19)$$</Paragraph>
      <Paragraph position="4"> Computational Linguistics Volume 23, Number 3</Paragraph>
      <Paragraph position="6"> The time complexity for this constrained version of the algorithm drops from $O(N^3 T^3 V^3)$ to $O(TV^3)$.</Paragraph>
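The constrained recursion for a single English constituent might be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the function name `combine`, the dictionary-based tables, and the two scalar rule probabilities `a_straight` and `a_inverted` (standing in for the production probabilities of one parent label) are all hypothetical. Because the English split and the child labels are fixed, only the Chinese split point U is searched.

```python
def combine(delta_left, delta_right, a_straight, a_inverted, V):
    """Score one English constituent whose split and child labels are fixed.

    delta_left[(u, v)] and delta_right[(u, v)] hold the best probability of
    the left and right English subtree paired with the Chinese span u..v.
    Returns a dict mapping (u, v) to (probability, orientation, U).
    """
    best = {}
    for u in range(V + 1):
        for v in range(u, V + 1):
            cand = (0.0, None, None)
            for U in range(u, v + 1):
                # Straight: left child takes u..U, right child takes U..v.
                p = (a_straight
                     * delta_left.get((u, U), 0.0)
                     * delta_right.get((U, v), 0.0))
                if p > cand[0]:
                    cand = (p, "straight", U)
                # Inverted: left child takes U..v, right child takes u..U.
                q = (a_inverted
                     * delta_left.get((U, v), 0.0)
                     * delta_right.get((u, U), 0.0))
                if q > cand[0]:
                    cand = (q, "inverted", U)
            best[(u, v)] = cand
    return best


# Hypothetical child tables for a two-word Chinese sentence.
left = {(0, 1): 0.5, (1, 2): 0.2}
right = {(0, 1): 0.1, (1, 2): 0.4}
best = combine(left, right, a_straight=0.6, a_inverted=0.4, V=2)
```

The three nested loops over u, v, and U cost on the order of V cubed per constituent, and a binarized parse of T words has on the order of T constituents, which is consistent with the overall bound quoted above.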
    </Section>
    <Section position="2" start_page="399" end_page="400" type="sub_section">
      <SectionTitle>
9.2 Partial Parse Trees
</SectionTitle>
      <Paragraph position="0"> A more realistic in-between scenario occurs when partial parse information is available for one or both of the languages. Special cases of particular interest include applications where bracketing or word alignment constraints may be derived from external sources beforehand. For example, a broad-coverage English bracketer may be available. If such constraints are reliable, it would be wasteful to ignore them.</Paragraph>
      <Paragraph position="1"> A straightforward extension to the original algorithm inhibits hypotheses that are inconsistent with given constraints. Any entries in the dynamic programming table corresponding to illegal subhypotheses--i.e., those that would violate the given bracket-nesting or word alignment conditions--are preassigned negative infinity values during initialization, indicating impossibility. During the recursion phase, computation of these entries is skipped. Because these entries keep their negative infinity values throughout, the illegal subhypotheses never participate in any maximum-likelihood (ML) bibracketing. The reduction in running time in this case depends heavily on the domain constraints.</Paragraph>
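The initialization step described above can be sketched in Python. The sketch is simplified to the spans of one language and assumes log-probabilities, with constraints supplied as a list of known bracket spans; the names `crosses` and `init_table` and the use of `None` for entries not yet computed are hypothetical choices made for illustration.

```python
import math


def crosses(span, bracket):
    """True if the two spans overlap without one containing the other."""
    u, v = span
    b0, b1 = bracket
    return (b1 > v > b0 > u) or (v > b1 > u > b0)


def init_table(V, brackets):
    """Create one slot per span (u, v) of a V-word sentence.

    Spans that cross any given bracket are preassigned negative infinity;
    all other slots start as None and are filled in during recursion.
    """
    table = {}
    for u in range(V + 1):
        for v in range(u, V + 1):
            illegal = any(crosses((u, v), b) for b in brackets)
            table[(u, v)] = -math.inf if illegal else None
    return table


# Hypothetical example: words 1..3 of a four-word sentence form a bracket.
table = init_table(4, [(1, 3)])
blocked = sorted(span for span, val in table.items() if val == -math.inf)
```

A recursion loop would then skip any span whose slot already holds negative infinity, so the saving grows with the number of spans the constraints rule out.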
      <Paragraph position="2"> We have found this strategy to be useful for incorporating punctuation constraints.</Paragraph>
      <Paragraph position="3"> Certain punctuation characters give constituency indications with high reliability; &quot;perfect separators&quot; include colons and Chinese full stops, while &quot;perfect delimiters&quot; include parentheses and quotation marks.</Paragraph>
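As an illustration of how delimiter punctuation could yield bracket constraints, the following Python sketch pairs up opening and closing delimiters with a stack and returns the spans they enclose. The function name `delimiter_brackets`, the decision to include the delimiters themselves in each span, and the restriction to parentheses by default are assumptions of this sketch; quotation marks whose opening and closing forms are identical would need separate handling.

```python
def delimiter_brackets(tokens, pairs=(("(", ")"),)):
    """Return (start, end) spans enclosed by matched delimiter pairs.

    Positions count token boundaries, and each span includes the opening
    and closing delimiter tokens. Unmatched delimiters are ignored.
    """
    closer_of = {opener: closer for opener, closer in pairs}
    stack, spans = [], []
    for pos, tok in enumerate(tokens):
        if tok in closer_of:
            stack.append((tok, pos))
        elif stack and tok == closer_of[stack[-1][0]]:
            _, start = stack.pop()
            spans.append((start, pos + 1))
    return spans


# Hypothetical tokenized sentence.
tokens = ["he", "(", "the", "author", ")", "left"]
spans = delimiter_brackets(tokens)
```

Spans obtained this way could be supplied as bracket-nesting constraints, so that any hypothesis crossing a parenthesized unit is ruled out before the recursion starts.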
      <Paragraph position="4"> 10. Unrestricted-Form Grammars It is possible to construct a parser that accepts unrestricted-form, rather than normal-form, grammars. In this case an Earley-style scheme (Earley 1970), employing an active chart, can be used. The time complexity remains the same as in the normal-form case.</Paragraph>
      <Paragraph position="5"> We have found this to be useful in practice. For bracketing grammars of the type considered in this paper, there is no advantage. However, for more complex, linguistically structured grammars, the more flexible parser does not require the unreasonable numbers of productions that can easily arise from normal-form requirements. For most grammars, we have found its performance to be comparable to or faster than that of the normal-form parser.</Paragraph>
      <Paragraph position="6"> 11. Conclusion The twin concepts of bilingual language modeling and bilingual parsing have been proposed. We have introduced a new formalism, the inversion transduction grammar, and surveyed a variety of its applications to extracting linguistic information from parallel corpora. Its amenability to stochastic formulation, useful flexibility with leaky and minimal grammars, and tractability for practical applications are desirable properties. Various tasks such as segmentation, word alignment, and bracket annotation are naturally incorporated as subproblems, and a high degree of compatibility with conventional monolingual methods is retained. In conjunction with automatic procedures for learning word translation lexicons, SITGs bring relatively underexploited bilingual correlations to bear on the task of extracting linguistic information for languages less studied than English.</Paragraph>
    </Section>
    <Section position="3" start_page="400" end_page="400" type="sub_section">
      <SectionTitle>
Wu Bilingual Parsing
</SectionTitle>
      <Paragraph position="0"/>
      <Paragraph position="1"> We are currently pursuing several directions. We are developing an iterative training method based on expectation-maximization for estimating the probabilities from parallel training corpora. Also, in contrast to the applications discussed here, which deal with analysis and annotation of parallel corpora, we are working on incorporating the SITG model directly into our run-time translation architecture. The initial results indicate excellent performance gains.</Paragraph>
    </Section>
  </Section>
</Paper>