File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/03/w03-0314_concl.xml
Size: 1,389 bytes
Last Modified: 2025-10-06 13:53:42
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0314"> <Title>Learning Sequence-to-Sequence Correspondences from Parallel Corpora via Sequential Pattern Mining</Title> <Section position="6" start_page="32" end_page="32" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> We have proposed an effective method for finding sequence-to-sequence correspondences in parallel corpora by sequential pattern mining. As far as multi-word translation is concerned, our method seems to work well, giving 56-84% accuracy at 19% token coverage and 11% type coverage. In this work, we chose the English-Japanese pair and evaluated our method empirically. However, we believe the method is applicable to any language pair, given appropriate language-specific preprocessing tools. As a by-product of our experiment, we obtained Japanese-English parallel corpora of 150,000 sentences in which the alignments of validated subsequence correspondences are back-annotated.</Paragraph> <Paragraph position="1"> This was accomplished by looking up a Double Array dictionary of sequential patterns constructed by the extraction method. This shows that our method can be useful not only for the semi-automatic development of lexicons for data-driven machine translation, but also for the annotation of corresponding subsequences in translation memory systems.</Paragraph> </Section> </Paper>