File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/n06-3003_metho.xml
Size: 5,710 bytes
Last Modified: 2025-10-06 14:10:13
<?xml version="1.0" standalone="yes"?> <Paper uid="N06-3003"> <Title>Can the Internet help improve Machine Translation?</Title> <Section position="4" start_page="219" end_page="219" type="metho"> <SectionTitle> 2 Online Elicitation of MT Errors </SectionTitle> <Paragraph position="0"> The main challenge of the error elicitation part of this work is how to elicit minimal post-editing information from non-expert bilingual speakers. The Translation Correction Tool (TCTool) is a user-friendly online tool that allows users to add, delete and modify words and alignments, as well as to drag words around to change word order. A set of user studies was conducted to discover the right amount of error information that bilingual speakers can detect reliably when using the TCTool. These studies showed that simple error information can be elicited much more reliably (F1 0.89) than error type information (F1 0.72) (Font Llitjos and Carbonell, 2004). Most importantly, it became apparent that for our Rule Refinement purposes, the list of correction action(s) with information about error and correction words is sufficient.</Paragraph> <Paragraph position="1"> Building on the example introduced above, Figure 2 shows the initial state of the TCTool, once the user has decided that the translation produced by the MT system is not correct.</Paragraph> <Paragraph position="2"> Figure 2. TCTool snapshot with initial translation pair. In this case, the bilingual speaker changed 'grande' to 'gran' and dragged 'gran(de)' in front of 'artista', effectively flipping the order of these two words. Figure 3 shows the state of the TCTool [...] source language sentence (SL), the target language sentence (TL) and the initial alignments (AL), as well as all the correction actions performed by the user. It also provides the corrected translation (CTL) and final alignments.</Paragraph> <Paragraph position="3"> The Rule Refinement (RR) module processes one action at a time. 
In this approach, the order in which users correct a sentence therefore affects the order in which refinements apply.</Paragraph> </Section> <Section position="5" start_page="219" end_page="219" type="metho"> <SectionTitle> 4 Lexical Refinements </SectionTitle> <Paragraph position="0"> After having stored all the relevant information from the log file, the Rule Refinement module starts processing the Correction Instance. In the example above, it first goes into the lexicon and, after double-checking that there is no lexical entry for [great → gran], it proceeds to add one by duplicating the lexical entry for [great → grande]. Since these two lexical entries are identical at the feature level, the RR module postulates a new binary feature, say feat1, which serves the purpose of distinguishing between two words that are otherwise identical (according to our lexicon). A more mnemonic name for feat1 would be pre-nominal.</Paragraph> </Section> <Section position="6" start_page="219" end_page="221" type="metho"> <SectionTitle> 5 Rule Refinements </SectionTitle> <Paragraph position="0"> Now the RR module moves on to process the next action in the Correction Instance. The first step is to look at the parse trace output by the MT system, so that the grammar rule responsible for the error can be identified. At this point, the system extracts the relevant rule (NP,8) from the grammar and has two options: either to make the required changes directly to the original rule (REFINE) or to make a copy of the original rule and modify the copy (BIFURCATE). If the system has correctly applied the rule in the past (perhaps because users have evaluated the translation pair &quot;She saw a dangerous man → Ella vio un hombre peligroso&quot; as correct), then the RR module opts for the BIFURCATE operation. 
In this case, the RR module makes a copy of the original rule (NP,8) and then modifies the copy (NP,8') by flipping the order of the noun and adjective constituents, as indicated by the user. This rule needs to unify with 'gran' but not with 'grande', and so the RR module proceeds to add the constraint that the Spanish adjective (now y2) needs to have feat1 with value +. These two refinements result in the MT system generating the desired translation, namely &quot;Gaudi era un gran artista&quot;, and not the previous incorrect translation. But can the system also eliminate other incorrect translations automatically? In addition to generating the correct translation, we would also like the RR module to produce a refined grammar that is as tight as possible, given the data that is available. Since the system already has the information that &quot;un artista gran&quot; is not a correct sequence in Spanish, the grammar can be further refined to also rule out this incorrect translation. This can be done by restricting the application of the general rule (NP,8) to just post-nominal adjectives, like 'grande', which in this example are marked in the lexicon with (feat1 = -).</Paragraph> </Section> <Section position="7" start_page="221" end_page="221" type="metho"> <SectionTitle> 6 Generalization power </SectionTitle> <Paragraph position="0"> The difference between this approach and mere post-editing is that the resulting refinements affect not only the translation instance corrected by the user, but also other similar sentences where the same error would manifest. After the refinements from our example sentence have been applied to the grammar, a sentence like &quot;Irina is a great friend&quot; will now correctly be translated as &quot;Irina es una gran amiga&quot;, instead of &quot;Irina es una amiga grande&quot;.</Paragraph> </Section> </Paper>