<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1606"> <Title>SPMT: Statistical Machine Translation with Syntactified Target Language Phrases</Title> <Section position="4" start_page="0" end_page="44" type="intro"> <SectionTitle> 2 SPMT: Statistical Machine Translation with Syntactified Phrases </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="44" type="sub_section"> <SectionTitle> 2.1 An intuitive introduction to SPMT </SectionTitle> <Paragraph position="0"> After being exposed to 100M+ words of parallel Chinese-English text, current phrase-based statistical machine translation learners induce reasonably reliable phrase-based probabilistic dictionaries. For example, our baseline statistical phrase-based system learns that, with high probability, the Chinese phrases &quot;ASTRO- -NAUTS&quot;, &quot;FRANCE AND RUSSIA&quot; and &quot;COMINGFROM&quot; can be translated into English as &quot;astronauts&quot;/&quot;cosmonauts&quot;, &quot;france and russia&quot;/&quot;france and russian&quot; and &quot;coming from&quot;/&quot;from&quot;, respectively. (To increase readability, in this paper we represent Chinese words using fully capitalized English glosses and English words using lowercase letters.) Unfortunately, when given Chinese sentence 1 as input, our phrase-based system produces the output shown in 2 and not the translation in 3, which correctly orders the phrasal translations into a grammatical sequence.</Paragraph> <Paragraph position="1"> We believe this happens because the distortion/reordering models used by state-of-the-art phrase-based systems, which exploit phrase movement and n-gram target language models (Och and Ney, 2004; Tillman, 2004), are too weak to help a phrase-based decoder reorder the target phrases into grammatical outputs.</Paragraph> </Section> </Section> </Paper>