File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/05/w05-1204_abstr.xml

Size: 1,531 bytes

Last Modified: 2025-10-06 13:44:41

<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1204">
  <Title>Training Data Modification for SMT Considering Groups of Synonymous Sentences</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Generally speaking, statistical machine translation (SMT) systems attain better performance with larger training sets.</Paragraph>
    <Paragraph position="1"> Unfortunately, well-organized training sets are rarely available in the real world. Consequently, it is necessary to focus on modifying the training set to obtain high accuracy for an SMT system. When an SMT system trains its translation model, a translation pair receives a low probability if a single source sentence has many target-sentence variations.</Paragraph>
    <Paragraph position="2"> If we decrease the number of variations for a translation pair, we can construct a superior translation model. This paper describes the effects of modifying the training corpus with synonymous sentence groups taken into account. We attempt three types of modification: compression of the training set, replacement of both source and target sentences with a sentence selected from the synonymous sentence group, and replacement of the sentence on only one side with the sentence selected from the synonymous sentence group. As a result, we achieve improved performance with the replacement of source-side sentences.</Paragraph>
  </Section>
</Paper>