<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1050">
<Title>Identifying Word Translations in Non-Parallel Texts</Title>
<Section position="5" start_page="320" end_page="321" type="evalu">
<SectionTitle> 4 Results </SectionTitle>
<Paragraph position="0"> The simulation was conducted by randomly permuting the word order of the German matrix and then computing the similarity s to the English matrix.</Paragraph>
<Paragraph position="1"> For each permutation, the number of words c that had been shifted to positions different from those in the original German matrix was determined. The simulation was continued until a set of 1000 similarity values was available for each value of c (see footnote 3). Figure 1 shows, for the three formulas, how the average similarity s between the English and the German matrix depends on the number of non-corresponding word positions c. Each of the curves increases monotonically, with formula 1 having the steepest, i.e. best discriminating, characteristic. The dotted curves in Figure 1 are the minimum and maximum values in each set of 1000 similarity values for formula 1.</Paragraph>
<Paragraph position="2"> 1 The logarithm has been removed from the mutual information measure since it is not defined for zero co-occurrences. 2 Normalization was conducted in such a way that the sum of all matrix entries equals the number of fields in the matrix.</Paragraph>
<Paragraph position="3"> 3 c = 1 is not possible and was not taken into account. Figure 1: Relationship between the average similarity s of the English and the German matrix and the number of non-corresponding word positions c for the three formulas. The dotted lines are the minimum and maximum values of each sample of 1000 for formula 1.</Paragraph>
</Section>
</Paper>
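<!--
The permutation simulation described in the Results section can be sketched
roughly as follows. This is a minimal illustration under stated assumptions,
not the authors' implementation: the function names are hypothetical, the
similarity s is assumed to be the sum of absolute differences between
corresponding matrix entries (the section does not define it), and a fixed
number of random trials stands in for the paper's stopping rule of 1000
values per value of c.

```python
import random


def count_shifted(perm):
    """Return c, the number of positions whose word differs from the original order."""
    return sum(1 for i, p in enumerate(perm) if p != i)


def permute_matrix(matrix, perm):
    """Apply the same permutation to rows and columns, i.e. reorder the words."""
    return [[matrix[pi][pj] for pj in perm] for pi in perm]


def dissimilarity(a, b):
    """Assumed measure: sum of absolute entry differences (0 means identical)."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def simulate(english, german, trials, seed=0):
    """Group the values of s by c over a fixed number of random permutations."""
    rng = random.Random(seed)
    n = len(german)
    by_c = {}
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        c = count_shifted(perm)
        s = dissimilarity(english, permute_matrix(german, perm))
        by_c.setdefault(c, []).append(s)
    return by_c
```

Averaging each list in the returned dictionary gives one point per value of c,
and the minimum and maximum of each list correspond to the dotted curves. The
key c = 1 never appears, because a single word cannot be displaced without
displacing at least one other word.
-->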