<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0314">
  <Title>Learning Sequence-to-Sequence Correspondences from Parallel Corpora via Sequential Pattern Mining</Title>
  <Section position="5" start_page="32" end_page="32" type="relat">
    <SectionTitle>
4 Related Work
</SectionTitle>
    <Paragraph position="0"> Moore (2001) presents insightful work that is closest to ours. His method first computes initial association scores, hypothesizes occurrences of compounds, fuses each compound into a single token, recomputes association scores as if all translations were one-to-one mappings, and returns the highest-association pairs. For captoids, he also computes the association between an inferred compound and its constituent words, and he uses language-specific features (e.g., capital letters, punctuation symbols) to identify likely compound candidates.</Paragraph>
    <Paragraph position="1"> Our method differs substantially in its treatment of compounds. First, we outsource the step of hypothesizing compounds to language-dependent preprocessors: embedding language-specific features directly would complicate the algorithm. Instead, we provide an abstract interface, namely the projectable predicate in sequential pattern mining, to handle language-specific constraints. Second, we allow items to be counted redundantly and translation pair candidates to overlap. This sharply contrasts with Moore's method of replacing an identified compound with a single token in each sentence pair. In his method, word segmentation ambiguity must be resolved before hypothesizing compounds; ours preserves word segmentation ambiguity and resolves it only when frequently co-occurring sequence-to-sequence pairs are identified.</Paragraph>
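To illustrate the interface idea (this is a simplified sketch, not the paper's implementation), a projectable predicate can be modeled as a boolean callback that a PrefixSpan-style miner consults before extending a pattern; all names below are hypothetical:

```python
# Hypothetical sketch of a projectable predicate hook in sequential
# pattern mining (PrefixSpan-style); names are illustrative only.

def capitalized_only(pattern):
    """Example language-specific constraint: every item is capitalized.
    Any predicate with this signature can be plugged in."""
    return all(tok[0].isupper() for tok in pattern)

def mine(sequences, min_support, projectable):
    """Enumerate frequent (possibly gapped) subsequences, pruning
    extensions that the predicate rejects."""
    results = {}
    def grow(pattern, projected):
        # Count candidate extensions in the projected database,
        # once per sequence.
        counts = {}
        for seq, start in projected:
            seen = set()
            for i in range(start, len(seq)):
                seen.add(seq[i])
            for item in seen:
                counts[item] = counts.get(item, 0) + 1
        for item, sup in counts.items():
            if sup >= min_support:
                new = pattern + (item,)
                if projectable(new):  # language-specific filter
                    results[new] = sup
                    # Project each sequence past the first occurrence.
                    proj = []
                    for seq, start in projected:
                        for i in range(start, len(seq)):
                            if seq[i] == item:
                                proj.append((seq, i + 1))
                                break
                    grow(new, proj)
    grow((), [(s, 0) for s in sequences])
    return results
```

Because the predicate is consulted on every candidate extension, language-specific constraints prune the search space without being hard-coded into the mining algorithm itself.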
    <Paragraph position="2"> Since we compute association scores independently, it is difficult to impose mutually exclusive constraints between translation candidates derived from a paired parallel sentence. Hence, our method tends to suffer from indirect association when the association score is low, as pointed out by Melamed (2001). Although our method relies on the empirical observation that &amp;quot;direct associations are usually stronger than indirect associations&amp;quot;, it seems effective enough for multi-word translation. As far as we know, our method is the first attempt to make an exhaustive enumeration of rigid and gapped translation candidates in both languages possible while avoiding combinatorial explosion. Previous approaches narrow down their search spaces with heuristics: Kupiec (1993) focuses on noun-phrase translations only, Smadja et al. (1996) are limited to finding French translations of English collocations identified by the Xtract system, and Kitamura and Matsumoto (1996) can exhaustively enumerate only rigid word sequences.</Paragraph>
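The observation that direct associations usually outscore indirect ones can be made concrete with any co-occurrence-based score; the Dice coefficient below is one common choice used here purely for illustration (the paper's exact score function may differ):

```python
# Illustrative association score over co-occurrence counts.
# Dice is one common choice; the paper's score may differ.

def dice(cooc, freq_e, freq_f):
    """Association between a source and a target sequence, given
    their co-occurrence count and marginal frequencies."""
    return 2.0 * cooc / (freq_e + freq_f)

# A directly associated pair shares most of its occurrences ...
direct = dice(cooc=90, freq_e=100, freq_f=100)    # 0.9
# ... while an indirect pair co-occurs only via a mediating word.
indirect = dice(cooc=30, freq_e=100, freq_f=100)  # 0.3
assert direct > indirect
```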
    <Paragraph position="3"> Many of the works mentioned above, as well as ours, extract non-probabilistic translation lexicons. However, some research goes beyond word-level translations in statistical machine translation.</Paragraph>
    <Paragraph position="4"> One notable work is that of Marcu and Wong (2002), which is based on a joint probability model for statistical machine translation in which word equivalents and phrase (rigid sequence) equivalents are automatically learned from bilingual corpora.</Paragraph>
    <Paragraph position="5"> Our method does not iterate the extraction process shown in Figure 1. This could be a cause of poor performance on single-word translation pairs, since there is no mechanism for imposing mutual exclusion constraints.</Paragraph>
    <Paragraph position="6"> An interesting question, then, is what kind of iteration should be performed to improve performance. Probabilistic translation lexicon acquisition often uses EM training on Viterbi alignments, e.g., (Marcu and Wong, 2002), while non-probabilistic approaches employ a greedy algorithm that extracts translation pairs whose association scores exceed a predefined threshold, with the threshold monotonically decreasing as the algorithm proceeds, e.g., (Kitamura and Matsumoto, 1996). This issue is left for future work.</Paragraph>
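The greedy scheme attributed above to Kitamura and Matsumoto (1996) can be sketched as follows; this is a simplified reconstruction under stated assumptions, not their code, and the parameter names are invented:

```python
# Simplified sketch of greedy extraction with a monotonically
# decreasing threshold; a reconstruction, not the original algorithm.

def greedy_extract(candidates, score, initial_threshold, floor, decay=0.5):
    """candidates: translation-pair candidates.
    score: maps a candidate to its association score.
    Each round extracts pairs at or above the current threshold;
    the threshold then decays until it drops below the floor."""
    extracted = []
    remaining = list(candidates)
    threshold = initial_threshold
    while threshold >= floor and remaining:
        kept, rest = [], []
        for c in remaining:
            if score(c) >= threshold:
                kept.append(c)
            else:
                rest.append(c)
        extracted.extend(kept)
        remaining = rest
        threshold = threshold * decay
    return extracted
```

The decreasing threshold lets the most reliable pairs be committed first, so weaker candidates are only accepted once stronger competitors have been removed from contention.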
    <Paragraph position="7"> Last but not least, no previous work explicitly addresses the efficient calculation of each cell in a contingency table. Our approach completes this process with a single run of sequential pattern mining. Since speed does not affect accuracy or coverage, its significance is often ignored; however, it becomes important when handling large corpora.</Paragraph>
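To see why a single pass suffices, note that for each candidate pair the four contingency cells follow from three accumulated counts plus the corpus size; the sketch below illustrates this bookkeeping in general and is not the paper's mining algorithm:

```python
# Illustrative single-pass accumulation of the counts needed to fill
# every 2x2 contingency table; not the paper's miner itself.

def contingency_tables(sentence_pairs):
    """sentence_pairs: list of (source_patterns, target_patterns),
    each a set of candidate patterns matched in that sentence pair.
    Returns (a, b, c, d) cell counts per (e, f) candidate pair."""
    n = len(sentence_pairs)
    n_e, n_f, n_ef = {}, {}, {}
    for es, fs in sentence_pairs:
        for e in es:
            n_e[e] = n_e.get(e, 0) + 1
        for f in fs:
            n_f[f] = n_f.get(f, 0) + 1
        for e in es:
            for f in fs:
                n_ef[(e, f)] = n_ef.get((e, f), 0) + 1
    tables = {}
    for (e, f), a in n_ef.items():
        b = n_e[e] - a      # sentences with e but not f
        c = n_f[f] - a      # sentences with f but not e
        d = n - a - b - c   # sentences with neither
        tables[(e, f)] = (a, b, c, d)
    return tables
```

Only the joint and marginal counts are touched during the pass; every cell is recovered afterwards by subtraction, which is what makes filling all tables in one run cheap.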
  </Section>
</Paper>