<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1001">
  <Title>Capitalizing Machine Translation</Title>
  <Section position="11" start_page="6" end_page="7" type="concl">
    <SectionTitle>
7 Conclusions
</SectionTitle>
    <Paragraph position="0"> In this paper, we have studied how to exploit bilingual information to improve capitalization performance on machine translation output, and evaluated the improvement over traditional methods that use only monolingual language models.</Paragraph>
    <Paragraph position="1"> We first presented a probabilistic bilingual capitalization model for capitalizing machine translation outputs using conditional random fields. This model exploits bilingual capitalization knowledge as well as monolingual information. We defined a series of feature functions to incorporate capitalization knowledge into the model.</Paragraph>
    <Paragraph position="2"> We then evaluated our CRF-based bilingual capitalization model both on well-formed texts, in terms of capitalization precision, and on possibly ungrammatical end-to-end machine translation outputs, in terms of BLEU scores. Experiments were performed on both French and English target MT systems with large-scale training data. Our experimental results showed that the CRF-based bilingual capitalization model performs significantly better than a strong baseline, a monolingual capitalizer that uses a trigram language model.</Paragraph>
    <Paragraph position="3"> [Truncated caption] ...ing corpus. The LM-based capitalizer refers to the trigram-based one. Results are on the E-F corpus.</Paragraph>
    <Paragraph position="4"> In all experiments carried out at Language Weaver with customer (or domain-specific) data, MT systems trained on lowercased data and coupled with the CRF bilingual capitalizer described in this paper consistently outperformed both MT systems trained on lowercased data and coupled with a strong monolingual capitalizer, and MT systems trained on mixed-cased data.</Paragraph>
  </Section>
</Paper>