File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/97/p97-1038_abstr.xml

Size: 1,280 bytes

Last Modified: 2025-10-06 13:48:57

<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1038">
  <Title>An Alignment Method for Noisy Parallel Corpora based on Image Processing Techniques</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper presents a new approach to the bitext correspondence problem (BCP) for noisy bilingual corpora, based on image processing (IP) techniques.</Paragraph>
    <Paragraph position="1"> By using one of several ways of estimating the lexical translation probability (LTP) between pairs of source and target words, we can turn a bitext into a discrete gray-level image. We contend that the BCP, when seen in this light, bears a striking resemblance to the line detection problem in IP.</Paragraph>
    <Paragraph position="2"> Therefore, BCPs, including sentence and word alignment, can benefit from a wealth of effective, well-established IP techniques, including convolution-based filters, texture analysis, and the Hough transform. This paper describes a new program, PlotAlign, that produces a word-level bitext map for noisy or non-literal bitext, based on these techniques.</Paragraph>
    <Paragraph position="3"> Keywords: alignment, bilingual corpus, image processing</Paragraph>
  </Section>
</Paper>