<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1504"> <Title>Machine Translation as a testbed for multilingual analysis</Title> <Section position="4" start_page="2" end_page="3" type="intro"> <SectionTitle> 3 MSR-MT </SectionTitle> <Paragraph position="0"> In this section we review the basics of the MSR-MT translation system and its evaluation. The reader is referred to Pinkham et al. (2001) and Richardson et al. (2001) for further details on the French and Spanish versions of the system. The overall architecture and basic component structure LF as described here corresponds to the PAS representation of Campbell and Suzuki (2002).</Paragraph> <Paragraph position="1"> are the same for both the FE and SE versions of the system.</Paragraph> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 3.1 Overview </SectionTitle> <Paragraph position="0"> MSR-MT uses the broad coverage analysis system described in Section 2, a large multi-purpose source-language dictionary, a learned bilingual dictionary, an application independent target-language generation component and a transfer component.</Paragraph> <Paragraph position="1"> The transfer component consists of transfer patterns automatically acquired from sentence-aligned bilingual corpora (described below) using an alignment algorithm described in detail in Menezes and Richardson (2001). Training takes place on aligned sentences which have been analyzed by the source- and target-language analysis systems to yield logical forms. The logical form structures, when aligned, allow the extraction of lexical and structural translation correspondences which are stored for use at runtime in the transfer database. See Figure 4 for an overview of the training process.</Paragraph> <Paragraph position="2"> The transfer database is trained on 350,000 pairs of aligned sentences from computer manuals for SE, and 500,000 pairs of aligned Canadian parliamentary data (the Hansard corpus) for FE.</Paragraph> </Section> <Section position="2" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 3.2 Evaluation of MSR-MT </SectionTitle> <Paragraph position="0"> Seven evaluators are asked to evaluate the same set of sentences. For each sentence, raters are presented with a reference sentence, the original English sentence from which the human French and Spanish translations were derived, and MSR-MT's machine translation.</Paragraph> <Paragraph position="1"> In order to maintain Microsoft manuals are written in English and translated by hand into other languages. We use these translations as input to our system, and translate them back into English.</Paragraph> <Paragraph position="2"> consistency among raters who may have different levels of fluency in the source language, raters are not shown the original French or Spanish sentence (for similar methodologies, see Ringger et al., 2001; White et al., 1993).</Paragraph> <Paragraph position="3"> All the raters enter scores reflecting the absolute quality of the translation as compared to the reference translation given. The overall score of a sentence is the average of the scores given by the seven raters. 
</Section>
<Section position="2" start_page="2" end_page="3" type="sub_section">
<SectionTitle> 3.2 Evaluation of MSR-MT </SectionTitle>
<Paragraph position="0"> Seven evaluators are asked to evaluate the same set of sentences. For each sentence, raters are presented with a reference sentence, the original English sentence from which the human French and Spanish translations were derived, and MSR-MT's machine translation. (Microsoft manuals are written in English and translated by hand into other languages; we use these translations as input to our system and translate them back into English.)</Paragraph>
<Paragraph position="1"> In order to maintain consistency among raters who may have different levels of fluency in the source language, raters are not shown the original French or Spanish sentence (for similar methodologies, see Ringger et al., 2001; White et al., 1993).</Paragraph>
<Paragraph position="2"> All the raters enter scores reflecting the absolute quality of the translation as compared to the reference translation given. The overall score of a sentence is the average of the scores given by the seven raters. Scores range from 1 to 4, with 1 meaning unacceptable (not comprehensible), 2 meaning possibly acceptable (some information is transferred accurately), 3 meaning acceptable (not perfect, but accurate transfer of all important information), and 4 meaning ideal (grammatically correct and all the important information is transferred).</Paragraph>
</Section>
</Section>
</Paper>