<?xml version="1.0" standalone="yes"?> <Paper uid="P04-3006"> <Title>An Automatic Filter for Non-Parallel Texts</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> We have shown that SIMR-cl, a modified version of the SIMR bitext mapping algorithm, can reliably discriminate between parallel and comparable texts. We have demonstrated that SIMR-cl is effective on three language pairs, including two where no bilingual dictionary was available. In addition, we have presented tentative evidence that the parameters of SIMR-cl are not very sensitive to particular language pairs or text genres on this task.</Paragraph> <Paragraph position="1"> Our results suggest several new avenues for future research. First, it would be useful to combine our method for filtering out non-parallel texts with methods for detecting omissions in translations (Melamed, 1996). Some of the translations found on the web today might be made more literal by deleting the untranslated parts. Second, we seem to have discovered the existence of training data for a machine learning approach to translation with summarization. Third, our results suggest that the density of a bitext map is highly correlated with its accuracy, and that this correlation is largely invariant across language pairs and text genres. If this is true, then it should be possible to train bitext mapping algorithms without any hand-aligned training data, by using map density as the objective function instead of RMS error.</Paragraph> </Section> </Paper>