File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/00/c00-1075_evalu.xml
Size: 1,589 bytes
Last Modified: 2025-10-06 13:58:35
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1075"> <Title>Application of Analogical Modelling to Example Based Machine Translation</Title> <Section position="6" start_page="520" end_page="520" type="evalu"> <SectionTitle> 4. Evaluation </SectionTitle> <Paragraph position="0"> The training set consisted of a bilingual (EN-GR) technical corpus (automotive industry) of 5K sentences, ~20K wordforms in each language. The process resulted in ~550 translation rules and 350 translation units (~50 multi-word ones). The precision estimated through manual evaluation was ~75%. More than 23% of the erroneous rules were due to idiomatic expressions. The rest of the errors were caused by imprecise translation patterns found in the corpus. However, these errors, being rather exceptional, received a very low effectiveness weight at the end of the process. No straightforward approach to measuring the recall of the learning process was devised, since it was not easy to determine a priori the number of rules that should be extracted from the training corpus.</Paragraph> <Paragraph position="1"> However, coverage of the final translation rule set against the corpus was measured and found to be 38%. More specifically, the set of 500 rules could, through an inverse process, generate 38% of the corpus sentences, which translates into a significant gain in terms of storage space. Another obvious benefit is the subsentential alignment information, that is, the source and target translation units learned at the end of the process.</Paragraph> </Section> </Paper>