<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1077">
  <Title>Tree-to-String Alignment Template for Statistical Machine Translation</Title>
  <Section position="8" start_page="614" end_page="614" type="evalu">
    <SectionTitle>
5.5 Results on large data
</SectionTitle>
    <Paragraph position="0"> We also conducted an experiment on a large data set to further examine our design philosophy. The training corpus contains 2.6 million sentence pairs. We used all of the data to extract bilingual phrases and a subset of 800K pairs to obtain TATs. Two trigram language models were used for Lynx: one was trained on the 2.6 million English sentences of the training corpus, and the other was trained on the first third of the Xinhua portion of the Gigaword corpus. We also included rule-based translations of named entities, dates, and numbers. Using these data, Lynx achieves a BLEU score of 0.2830 on the 2005 NIST Chinese-to-English MT evaluation test set, which is a very promising result for linguistically motivated syntax-based models.</Paragraph>
  </Section>
</Paper>