<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1078"> <Title>Chart-Based Transfer Rule Application in Machine Translation</Title> <Section position="8" start_page="540" end_page="541" type="evalu"> <SectionTitle> 7 Results </SectionTitle> <Paragraph position="0"> Figure 5 gives our results for both experiments 1 and 2, both with normalization (Norm) and without (No Norm). &quot;Total Translations&quot; refers to the number of sentences which were translated successfully by the system, and &quot;Over Edge Limit&quot; refers to the number of sentences which caused the system to exceed the edge limit, i.e., once the system produces over 10,000 edges, translation failure is assumed. The system currently will only fail to produce some translation for an input if the edge limit is exceeded. (Scoring for special cases is not included in this paper. These cases include rules for conjunctions and rules for words that do not match any transfer rules in a given context; we currently leave the word untranslated.) &quot;Actual Edges&quot; refers to the total number of edges used for attempting to translate every sentence in the corpus. &quot;Minimum Edges&quot; refers to the total minimum number of edges required for successful translations. The &quot;Edge Ratio&quot; is a ratio between: (1) &quot;Total Edges&quot; less the number of edges used in failed translations; and (2) the &quot;Minimum Edges&quot;. This ratio, in combination with the number of &quot;Over Edge Limit&quot; sentences, measures the efficiency of a given system. &quot;Accuracy&quot; is an assessment of translation quality which we will discuss in the next section.</Paragraph> <Paragraph position="1"> Normalization caused significant speed-up for both experiments. If you compare the total number of edges used with and without normalization, speed-up is a factor of 6.2 for Experiment 1 and 5.3 for Experiment 2. If you compare actual edge ratios, speed-up is a factor of 4.5 for Experiment 1 and 3.9 for Experiment 2. In addition, the number of failed parses went down by a factor of 10 for both experiments. As should be expected, accuracy was virtually the same with and without normalization, although normalization did cause a slight improvement. Normalization should produce essentially the same result in less time.</Paragraph> <Paragraph position="2"> These results suggest that we can probably count on a speed-up of at least 4 and a significant decline in failed parses by using normalization. The differences in performance on the two corpora are most likely due to the degree of hand-tuning for Experiment 1.</Paragraph> <Section position="1" start_page="541" end_page="541" type="sub_section"> <SectionTitle> 7.1 Our Accuracy Measure </SectionTitle> <Paragraph position="0"> &quot;Accuracy&quot; in Figure 5 is the average of the following score for each translated sentence:</Paragraph> <Paragraph position="1"> accuracy = |T_NYU ∩ T_MS| / ((|T_NYU| + |T_MS|) / 2) </Paragraph> <Paragraph position="2"> T_NYU is the set of words in NYU's translation and T_MS is the set of words in the original Microsoft translation. If T_NYU = &quot;A B C D E&quot; and T_MS = &quot;A B C F&quot;, then the intersection set &quot;A B C&quot; has length 3 (the numerator) and the average length of T_NYU and T_MS is 4 1/2 (the denominator). The accuracy score equals 3 / 4 1/2 = 2/3. This is a Dice coefficient comparison of our translation with the original.
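A minimal sketch of this per-sentence score, assuming simple whitespace tokenization; the function name and the example call are illustrative, not the paper's actual implementation:

    def accuracy_score(nyu_translation, ms_translation):
        # Word sets for the two translations (whitespace tokenization is an assumption).
        t_nyu = set(nyu_translation.split())
        t_ms = set(ms_translation.split())
        if not t_nyu and not t_ms:
            return 0.0
        # Dice-style comparison: size of the intersection over the average set size.
        overlap = len(t_nyu.intersection(t_ms))
        return overlap / ((len(t_nyu) + len(t_ms)) / 2.0)

    # Worked example from the text: the intersection "A B C" has 3 words and the
    # average length is 4.5, so the score is 3 / 4.5 = 2/3.
    print(accuracy_score("A B C D E", "A B C F"))  # 0.666...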
It is an inexpensive method of measuring the performance of a new version of our system. Improvements in the average accuracy score for our sample set of sentences usually reflect an improvement in overall translation quality. While it is significant that the accuracy scores in Figure 5 did not go down when we normalized, the slight improvement in accuracy should not be given much weight. Our accuracy score is flawed in that it cannot account for the following facts: (1) good paraphrases are perfectly acceptable; (2) some differences in word selection are more significant than others; and (3) errors in syntax are not directly accounted for.</Paragraph> <Paragraph position="3"> NYU's system translates the Spanish sentence &quot;1. Seleccione la celda en la que desea introducir una referencia&quot; as &quot;1. select the cell that you want to enter a reference in&quot;. Microsoft translates this sentence as &quot;1. Select the cell in which you want to enter the reference&quot;.</Paragraph> <Paragraph position="4"> Our system gives NYU's translation an accuracy score of .75 due to the degree of overlap with Microsoft's translation. A human reviewer would probably rate NYU's translation as completely acceptable. In contrast, NYU's system produced the following unacceptable translation which also received a score of .75: the Spanish sentence &quot;Elija la función que desea pegar en la fórmula en el cuadro de diálogo Asistente para funciones&quot; is translated as &quot;Choose the function that wants to paste Function Wizard in the formula in the dialog box&quot;, in contrast with Microsoft's translation &quot;Choose the function you want to paste into the formula from the Function Wizard dialog box&quot;. In fact, some good translations will get worse scores than some bad ones; an acceptable one-word translation can even get a score of 0, e.g., &quot;SUPR&quot; was translated as &quot;DEL&quot; by Microsoft and as &quot;Delete&quot; by NYU. Nevertheless, by averaging this accuracy score over many examples, it has proved a valuable measure for comparing different versions of a particular system: better systems get better results. Similarly, after tweaking the system, a better translation of a particular sentence will usually yield a better score.</Paragraph> </Section> </Section> </Paper>