<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2402"> <Title>Grouping Multi-word Expressions According to Part-Of-Speech in Statistical Machine Translation</Title> <Section position="6" start_page="14" end_page="15" type="concl"> <SectionTitle> 5 Conclusions and Further Work </SectionTitle> <Paragraph position="0"> We applied a technique for extracting and using bilingual multi-word expressions (BMWEs) in Statistical Machine Translation. This technique is based on grouping BMWEs before performing statistical alignment. On a large corpus with real-life data, this technique failed to clearly improve alignment quality or translation accuracy.</Paragraph> <Paragraph position="1"> After performing a detailed error analysis, we believe that when the considered MWEs are fixed expressions, grouping them before training helps translate them correctly at test time. However, grouping MWEs that could in fact be translated word for word does not help, and it introduces unnecessary rigidity and data sparseness into the models. The main strength of the n-gram translation model (its history capability) is reduced when tuples become longer. We therefore plan to run this experiment with a phrase-based translation model. Since these models use unigrams, they are more flexible and less sensitive to data sparseness.</Paragraph> <Paragraph position="2"> Some errors were also caused by noise in the automatic generation of BMWEs. Thus filtering techniques should be improved, and different methods for extracting and identifying MWEs must be developed and evaluated. Manually built resources, such as WordNet multi-word expressions, should also be considered.</Paragraph> <Paragraph position="3"> The proposed method considers the bilingual multi-words as units; the use of each side of a BMWE as an independent monolingual multi-word expression must also be considered and evaluated.</Paragraph> </Section> </Paper>