Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation

5 Discussion and Future Work

In this paper we presented a MaxEnt-based phrase reordering model for SMT. We used lexical features and collocation features drawn from the boundary words of blocks to predict reorderings of neighbor blocks. Experiments on standard Chinese-English translation tasks from two different domains showed that our method achieves a significant improvement over the distortion/flat reordering models.

Traditional distortion/flat-based SMT systems are good at learning phrase translation pairs, but they learn nothing about phrasal reordering from real-world data. This was our original motivation for designing a new reordering model, one that can learn reorderings from training data just as it learns phrasal translations. The lexicalized reordering model does learn reorderings from training data, but it binds each reordering to an individual concrete phrase, which restricts the model to reorderings of phrases seen in training. In contrast, the MaxEnt-based reordering model is not limited by this constraint, since it is based on features of a phrase rather than the phrase itself. It can easily be generalized to reorder unseen phrases, provided that some features fire on those phrases.

Another advantage of the MaxEnt-based reordering model is that it can bring more features into reordering, even when they are not independent. Tillmann et al. (2005) also use a MaxEnt model to integrate various features. The difference is that they use the MaxEnt model to predict not only orders but also blocks. To do so, their MaxEnt model must incorporate real-valued features such as the block translation probability and the language model probability, and because of the expensive computation this entails, they build a local model. Our MaxEnt model, by contrast, is just one module of the overall log-linear translation model, which uses its score as a real-valued feature. The modularity afforded by this design incurs no computational problems and makes it easier to update one sub-model while leaving the other modules unchanged.

Beyond the MaxEnt-based reordering model, another feature of our system that deserves attention is the CKY-style decoder, which observes the ITG. This differs from the work of Zens et al. (2004), in which translation is generated linearly, word by word and phrase by phrase, in the traditional way with respect to the incorporation of the language model. One can say that their decoder does not violate the ITG constraints, but not that it observes the ITG. The ITG not only greatly reduces the number of possible reorderings but also makes reordering hierarchical. Hierarchical reordering is more meaningful for languages that are organized hierarchically. In this respect, our decoder is similar to that of Chiang (2005).
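To make the decoder discussion concrete, the following is a minimal sketch of an ITG-style CKY combination loop: adjacent chart cells merge in either straight or inverted order, and each merge is scored by a reordering probability over the two neighbor blocks. This is not the paper's decoder: the toy phrase table, the block scores, and the reorder_logprob stub (standing in for the trained MaxEnt reordering model) are all invented for illustration, and language-model integration and pruning are omitted.

```python
# Minimal ITG-style CKY sketch; all scores and entries are toy values.
import math

# Hypothetical phrase table: source span (i, j) -> list of (target, log prob).
PHRASES = {
    (0, 1): [("the", -0.1)],
    (1, 2): [("red", -0.3), ("crimson", -1.2)],
    (2, 3): [("car", -0.2)],
}

def reorder_logprob(left_block, right_block, order):
    """Stub for the MaxEnt reordering model, which in the paper predicts
    P(order | boundary-word features of the two neighbor blocks)."""
    # Toy rule standing in for learned lexical/collocation features:
    # prefer straight order, less strongly for one invented word class.
    straight = 0.6 if right_block.split()[0].endswith("ed") else 0.9
    p = straight if order == "straight" else 1.0 - straight
    return math.log(p)

def decode(n):
    """CKY over source spans [i, j); each cell keeps the best (target, score)."""
    chart = {}
    for span, options in PHRASES.items():
        chart[span] = max(options, key=lambda t: t[1])
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            best = chart.get((i, j))
            for k in range(i + 1, j):  # combine two adjacent sub-blocks
                if (i, k) not in chart or (k, j) not in chart:
                    continue
                (lt, ls), (rt, rs) = chart[(i, k)], chart[(k, j)]
                for order in ("straight", "inverted"):
                    tgt = f"{lt} {rt}" if order == "straight" else f"{rt} {lt}"
                    score = ls + rs + reorder_logprob(lt, rt, order)
                    if best is None or score > best[1]:
                        best = (tgt, score)
            if best is not None:
                chart[(i, j)] = best
    return chart[(0, n)]

print(decode(3))  # e.g. ('the red car', <log score>)
```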
Future work is to investigate other valuable features, e.g. binary features that describe blocks from the syntactic point of view; a hypothetical example is sketched below. We believe there is still room for improvement if more contributing features are used.
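As an illustration of the feature templates discussed in this section, the sketch below enumerates boundary-word lexical and collocation features for a pair of neighbor blocks. The feature names are ours, not the paper's, and the final constituent feature is purely hypothetical, standing in for the kind of syntactic binary feature proposed above as future work.

```python
# Sketch of boundary-word feature templates; names are illustrative only.
def boundary_words(block):
    words = block.split()
    return words[0], words[-1]

def reordering_features(left_block, right_block, is_constituent=False):
    l_first, l_last = boundary_words(left_block)
    r_first, r_last = boundary_words(right_block)
    feats = [
        # Lexical features: one boundary word at a time.
        f"L.last={l_last}",
        f"R.first={r_first}",
        # Collocation features: boundary words from both neighbor blocks.
        f"L.last|R.first={l_last}|{r_first}",
        f"L.first|R.last={l_first}|{r_last}",
    ]
    # Hypothetical binary feature "from the syntactic view": whether the
    # merged span corresponds to a syntactic constituent.
    if is_constituent:
        feats.append("merged_span_is_constituent")
    return feats

print(reordering_features("the red", "car"))
```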