<?xml version="1.0" standalone="yes"?>
<Paper uid="J03-1002">
  <Title>A Systematic Comparison of Various Statistical Alignment Models</Title>
  <Section position="10" start_page="45" end_page="48" type="concl">
    <SectionTitle>
7. Conclusion
</SectionTitle>
    <Paragraph position="0"> In this article, we have discussed in detail various statistical and heuristic word alignment models and described various modifications and extensions to models known in the literature. We have developed a new statistical alignment model (Model 6) that yielded the best results among all the models we considered in the experiments we conducted. We have presented two methods for including a conventional bilingual dictionary in training and described heuristic symmetrization algorithms that combine the alignments produced in the two possible translation directions between a pair of languages, yielding an alignment with higher precision, higher recall, or an improved alignment error rate.</Paragraph>
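As an illustration, the two simplest symmetrization heuristics can be sketched as follows. This is a minimal sketch, not the article's refined heuristic: the `symmetrize` helper and the example alignment sets are hypothetical, with alignments represented as sets of (source position, target position) pairs.

```python
def symmetrize(src2tgt, tgt2src, method="intersect"):
    """Combine word alignments obtained in the two translation directions.

    src2tgt, tgt2src: sets of (source_pos, target_pos) index pairs,
    with tgt2src already mapped into (source_pos, target_pos) order.
    """
    if method == "intersect":   # fewer links -> higher precision
        return src2tgt & tgt2src
    if method == "union":       # more links -> higher recall
        return src2tgt | tgt2src
    raise ValueError(f"unknown method: {method}")


# hypothetical alignments from the two translation directions
a1 = {(0, 0), (1, 2), (2, 1)}
a2 = {(0, 0), (1, 2), (3, 3)}
print(sorted(symmetrize(a1, a2)))                  # [(0, 0), (1, 2)]
print(sorted(symmetrize(a1, a2, method="union")))
```

The intersection keeps only links proposed by both directions (precision-oriented), while the union keeps every proposed link (recall-oriented); the article's refined heuristic interpolates between the two.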
    <Paragraph position="1"> We have suggested measuring the quality of an alignment model by comparing the quality of its Viterbi alignment against a manually produced reference alignment. This quality measure has the advantage that it can be computed automatically. To produce the reference alignment, we used a refined annotation scheme that reduces the problems and ambiguities associated with the manual construction of a word alignment.</Paragraph>
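The quality measure in question is the alignment error rate (AER), defined in the article over sure reference links S and possible reference links P (with S a subset of P) as AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A minimal sketch, with the function name and example link sets being hypothetical:

```python
def alignment_error_rate(A, S, P):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|),
    where S (sure links) is a subset of P (possible links).
    All arguments are sets of (source_pos, target_pos) pairs."""
    assert S <= P, "sure links must be a subset of possible links"
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))


# hypothetical reference and hypothesis alignments
S = {(0, 0), (1, 1)}
P = {(0, 0), (1, 1), (2, 1)}
A = {(0, 0), (1, 1), (2, 1), (3, 2)}
print(alignment_error_rate(A, S, P))
```

An AER of 0 means the hypothesis contains all sure links and no links outside the possible set; missing sure links or extra spurious links both increase the rate.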
    <Paragraph position="2"> We have performed various experiments to assess the effect of different alignment models, training schemes, and knowledge sources. Promising directions for further improvement are the use of models based on word groups rather than single words (Och, Tillmann, and Ney 1999) and the use of models that explicitly deal with the hierarchical structure of natural language (Wu 1996; Yamada and Knight 2001).</Paragraph>
    <Paragraph position="3"> We plan to develop structured models for the lexicon, alignment, and fertility probabilities using maximum-entropy models. We expect this to allow the easy integration of additional dependencies, such as a second-order alignment model, without the number of alignment parameters becoming unmanageably large. Furthermore, it will be important to verify the applicability of the statistical alignment models examined in this article to less similar language pairs such as Chinese-English and Japanese-English.</Paragraph>
    <Paragraph position="4"> Appendix: Efficient Training of Fertility-Based Alignment Models In this Appendix, we describe some methods for the efficient training of fertility-based alignment models. The core idea is to enumerate only a small subset of good alignments in the E-step of the EM algorithm instead of enumerating all (I + 1)^J alignments. This small subset consists of the neighboring alignments of the best alignment that can be found by a greedy search algorithm. We use two operators to transform alignments: the move operator m_[j, i], which sets a_j := i, and the swap operator s_[j1, j2], which exchanges a_j1 and a_j2.</Paragraph>
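The greedy search over moves and swaps can be sketched as follows. This is a sketch under simplifying assumptions: the `score` function stands in for the model probability of an alignment and is hypothetical, as is the toy diagonal-preferring score in the usage example.

```python
import itertools


def hill_climb(a, I, score):
    """Greedy search for the best alignment reachable by moves and swaps.

    a     : list of length J; a[j] = i aligns target position j to
            source position i (0 stands for the empty word)
    I     : number of source positions
    score : hypothetical function mapping an alignment to its model score
    """
    best, best_s = list(a), score(a)
    improved = True
    while improved:
        improved = False
        # move operator m_[j, i]: set a[j] := i
        for j in range(len(best)):
            for i in range(I + 1):
                if i != best[j]:
                    cand = list(best)
                    cand[j] = i
                    s = score(cand)
                    if s > best_s:
                        best, best_s, improved = cand, s, True
        # swap operator s_[j1, j2]: exchange a[j1] and a[j2]
        for j1, j2 in itertools.combinations(range(len(best)), 2):
            if best[j1] != best[j2]:
                cand = list(best)
                cand[j1], cand[j2] = cand[j2], cand[j1]
                s = score(cand)
                if s > best_s:
                    best, best_s, improved = cand, s, True
    return best, best_s


# toy score (hypothetical): prefer the diagonal alignment a[j] = j + 1
score = lambda a: -sum(abs(a[j] - (j + 1)) for j in range(len(a)))
best, s = hill_climb([1, 1, 1], I=3, score=score)
print(best, s)  # [1, 2, 3] 0
```

The search terminates at a local optimum; the neighborhood of that alignment under the two operators is what the E-step then enumerates.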
    <Paragraph position="6"> This method makes it possible to compute the score of a move or of a swap in a constant number of operations.</Paragraph>
    <Paragraph position="7"> Refined Implementation: Fast Hill Climbing. Analyzing the training program reveals that most of the time is spent computing the costs of moves and swaps. To reduce the number of operations required, these values are cached in two matrices: one matrix M for the scores of a move a_j := i and one matrix S for the scores of a swap. During hill climbing it is sufficient, after making a move or a swap, to update only those rows or columns of the matrices that are affected by it. For example, after performing a move a_j := i, it is necessary to update in matrix M the entries involving position j and the source positions whose fertilities have changed. Similar updates have to be performed after a swap. In the count collection (step 3), it is possible to reuse the matrices obtained in the last hill-climbing step. By restricting in this way the number of matrix entries that need to be updated, it is possible to reduce the number of operations in hill climbing by about one order of magnitude.</Paragraph>
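The caching idea for the move matrix can be sketched as follows. This is a sketch, not the article's implementation: the helper names are hypothetical, the cache stores score gains rather than absolute scores, and a generic `score` function stands in for the model probability. After accepting a move, only the row of the moved position and the entries touching the two source positions whose fertilities changed are refreshed.

```python
def move_deltas(a, I, score):
    """D[j][i] = score(m_[j, i](a)) - score(a): the gain of move a[j] := i."""
    base = score(a)
    return [[score(a[:j] + [i] + a[j + 1:]) - base for i in range(I + 1)]
            for j in range(len(a))]


def accept_move(a, D, j, i, I, score):
    """Accept move a[j] := i and refresh only the cache entries it can
    touch: row j and, in a fertility model, every entry involving the
    source positions whose fertility changed (i and the old a[j])."""
    i_old, a[j] = a[j], i
    base = score(a)
    for jj in range(len(a)):
        for ii in range(I + 1):
            if jj == j or ii in (i, i_old) or a[jj] in (i, i_old):
                D[jj][ii] = score(a[:jj] + [ii] + a[jj + 1:]) - base
    return a, D
```

For a score that decomposes over positions, the untouched entries remain exact; the fertility terms are what couples entries across positions and motivates the selective row/column updates described above.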
    <Paragraph position="8">  The straightforward algorithm given for performing the count collection has the disadvantage of requiring that all alignments in the neighborhood of alignment a be enumerated explicitly. In addition, it is necessary to perform a loop over all target positions and a loop over all source positions to update the lexicon/alignment and the fertility counts. To perform the count collection efficiently, we use the fact that the alignments in the neighborhood N(a) are very similar. This allows many operations in the count collection process to be shared.</Paragraph>
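The straightforward count collection that this refinement improves on can be sketched as follows. The helper is hypothetical, and the `score` function stands in for the (unnormalized) alignment probabilities; each neighborhood alignment is weighted by its normalized score and its links are accumulated into expected counts.

```python
def collect_counts(neighborhood, score, J, I):
    """Straightforward E-step over a neighborhood of alignments:
    weight each alignment by its normalized score and accumulate
    expected alignment-link counts c[j][i]."""
    total = sum(score(a) for a in neighborhood)
    c = [[0.0] * (I + 1) for _ in range(J)]
    for a in neighborhood:
        w = score(a) / total  # posterior weight of this alignment
        for j, i in enumerate(a):
            c[j][i] += w
    return c
```

Because this loops over every alignment in N(a) and, within each, over all target positions, sharing the work across near-identical neighbors is what yields the order-of-magnitude speedup described next.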
    <Paragraph position="9"> To efficiently obtain the alignment and lexicon probability counts, we introduce auxiliary quantities that use the move and swap matrices available after performing the hill climbing described above. The resulting count collection is one order of magnitude faster than the straightforward algorithm described above. In practice, we observe that the resulting training is 10 to 20 times faster.</Paragraph>
  </Section>
</Paper>