<?xml version="1.0" standalone="yes"?>
<Paper uid="J00-2003">
  <Title>A Multistrategy Approach to Improving Pronunciation by Analogy</Title>
  <Section position="7" start_page="208" end_page="209" type="evalu">
    <SectionTitle>
7. Results
</SectionTitle>
    <Paragraph position="0"> In this section, we first detail some characteristics of the shortest paths through the pronunciation lattices (since these affect the attainable performance) before demonstrating that the combination strategy produces statistically significant improvements.</Paragraph>
    <Paragraph position="1"> It is largely immaterial whether we use the sum or the product rule. Finally, the distribution of errors for the best-performing combination of scores is analyzed in order to set priorities for future research in improving PbA.</Paragraph>
    <Section position="1" start_page="208" end_page="209" type="sub_section">
      <SectionTitle>
7.1 Characteristics of the Shortest Paths
</SectionTitle>
      <Paragraph position="0"> Since we are focusing in this work on D&amp;N's second heuristic (disambiguating tied shortest paths), it makes sense to investigate the limits set by tacit acceptance of D&amp;N's first heuristic--which gives primacy to the shortest path. Table 5 shows some statistics related to the shortest paths for the three different conversion problems studied.</Paragraph>
      <Paragraph position="1"> The minimal percentage indicates the lower bound on words-correct performance that obtains when the second heuristic is irrelevant, i.e., when all the shortest paths through the lattice give the identical, correct pronunciation. On the other hand, the maximal percentage indicates the upper bound that obtains when the second heuristic always chooses the correct candidate, i.e., there is at least one correct pronunciation among the shortest paths. Overall, these statistics indicate that there is considerable scope to improve the second heuristic, since the upper bound of 85.1% words correct for letter-to-phoneme conversion, for instance, is vastly superior to our previous best value of 61.9% and to the figure of 64.0% obtained by Yvon (1996) on the same lexicon using multiple unbounded overlapping chunks as the nodes of the pronunciation lattice. They also suggest (in line with our intuitions) that letter-to-phoneme conversion is harder than phoneme-to-letter conversion, and that lexical stress assignment is harder still.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="209" end_page="213" type="evalu">
    <SectionTitle>
7.2 Results for Different Combinations of Strategy
</SectionTitle>
    <Paragraph position="0"> We have obtained results for all possible combinations of the strategies, for each of the three mapping problems. Since there are five strategies, the number of combinations is $(2^5 - 1) = 31$. The various combinations are denoted as a five-bit code where a 1 at position $i$ indicates that strategy $s_i$ was included in the combination (i.e., $\delta_{s_i} = 1$ in Equations (4) and (5)) and a 0 indicates that it was not. Thus, as an example, the code 00100 indicates that Strategy 3 (FSP) was used alone.</Paragraph>
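    <Paragraph> The five-bit coding can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and reading position i from the left of the code is an assumption (the example code 00100 is symmetric, so the text does not settle the direction).

```python
from itertools import product

STRATEGY_COUNT = 5  # five scoring strategies, as stated in the text


def combination_codes(n=STRATEGY_COUNT):
    """Return every non-empty n-bit inclusion code as a string, e.g. '00100'."""
    codes = ["".join(bits) for bits in product("01", repeat=n)]
    # Drop the all-zero code: a combination must include at least one strategy.
    return [code for code in codes if "1" in code]


def included_strategies(code):
    """Return the 1-based strategy indices switched on in a code.

    Assumption: position 1 is the leftmost character of the code.
    """
    return [i for i, bit in enumerate(code, start=1) if bit == "1"]


codes = combination_codes()
print(len(codes))                    # 31
print(included_strategies("00100"))  # [3]
print(included_strategies("11111"))  # [1, 2, 3, 4, 5]
```

    The count of 31 follows from excluding the single all-zero code from the 32 possible five-bit strings.</Paragraph>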
    <Paragraph position="1"> Table 6 gives an example of using the combination 11010 with the product rule to derive a pronunciation for the word longevity.</Paragraph>
    <Paragraph position="2"> The points that contribute to the final score are shown in bold. Note that the winner (Candidate 4) gives the correct pronunciation in this case. When the sum rule is used, the correct pronunciations (Candidates 4 and 6) tie with a final score of 13.5.</Paragraph>
    <Paragraph position="3"> Table 7 shows the results of letter-to-phoneme conversion for the 31 possible combinations of scoring strategy using both the product and sum rules. The average across the 31 combinations was 62.42% words correct for the product rule compared to 62.30% for the sum rule. This difference is not significant (on the basis of Equation (6) below).</Paragraph>
    <Paragraph position="4"> Since the product rule gave numerically higher values, however, we continue to use it for the remainder of this paper.</Paragraph>
    <Paragraph position="5"> Tables 8, 9, and 10 show the results obtained with all possible combinations of strategies for the three conversion problems using the product rule. Consider first the results for letter-to-phoneme conversion (Table 8). The last two columns show the rank according to the number of strategies included in the final score (Rank(C)) and the rank according to word accuracy (Rank(W)). Let us hypothesize that these two ranks are not positively correlated. That is, our null hypothesis is that performance (in terms of word accuracy) does not increase as more scoring strategies $s_i$ for candidate pronunciation $C_j$ are included in the final score $FS(C_j)$. However, the Spearman rank correlation coefficient $r_s$ (Siegel 1956, 202-213) is computed here as 0.6657, with degrees of freedom $df = (31 - 2) = 29$. For $df \geq 10$, the significance of this result can be tested as: $$t = r_s \sqrt{\frac{df}{1 - r_s^2}} = 4.8041$$ This value is very highly significant ($p \ll 0.0005$, one-tailed test). Hence, we reject the null hypothesis and conclude that performance improves as more scores are included in the combination. Note that this test is nonparametric, and makes a minimum of assumptions about the data--only that they are ordinal and so can be meaningfully ranked.</Paragraph>
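    <Paragraph> The significance test for the rank correlation can be checked numerically. The sketch below assumes the usual large-sample form, t equal to the coefficient times the square root of df divided by one minus the squared coefficient, with df equal to the number of ranked combinations minus 2; the function name is hypothetical. The three coefficients are those reported in this section for the three conversion problems.

```python
import math


def spearman_t(r_s, n):
    """t statistic for a Spearman coefficient r_s computed over n ranked items."""
    df = n - 2  # degrees of freedom
    return r_s * math.sqrt(df / (1.0 - r_s ** 2))


# Coefficients reported in the text; each is computed over 31 combinations.
reported = [
    ("letter-to-phoneme", 0.6657),
    ("phoneme-to-letter", 0.6375),
    ("letter-to-stress", 0.7411),
]

for label, r_s in reported:
    print(label, round(spearman_t(r_s, 31), 3))
```

    Under these assumptions the computed statistics agree with the t values quoted in the text (about 4.80, 4.46, and 5.94, respectively).</Paragraph>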
    <Paragraph position="6">  Computational Linguistics Volume 26, Number 2. Table 8: Results of letter-to-phoneme conversion for the 31 possible combinations of scoring strategy using the product rule. Rank(C) is the rank of the result according to the number of strategies (in the range 1 to 5) included in the final score. Rank(W) is the rank of the result according to word accuracy. The Spearman rank correlation coefficient rs is 0.6657, which is very highly significant.</Paragraph>
    <Paragraph position="7">  Having shown that there is a very highly significant positive correlation between the number of strategies deployed and the obtained word accuracy, we next ask if the obtained improvement is significant. (This is to take account of the possibility that the difference between two combination strategies ranked at positions i and (i + k) is not significant.) To answer this, we note that only two outcomes are possible for the translation of each word: either the pronunciation is correct or it is not. Thus, the sampling distribution of the word accuracies listed in the second column of Table 8 is binomial and, hence, we can use a binomial test (Siegel 1956, 36-42) to determine the significance of differences between them. Since the number of trials (i.e., word translations) is very large (~20,000), we can use the normal approximation to the binomial distribution.</Paragraph>
    <Paragraph position="8"> Let us first ask if the best letter-to-phoneme conversion result here (65.5% word accuracy for combination 11111) is significantly better than the previous best, preliminary</Paragraph>
    <Section position="1" start_page="212" end_page="212" type="sub_section">
      <SectionTitle>
Marchand and Damper Improving Pronunciation by Analogy
</SectionTitle>
      <Paragraph position="0"> value of 61.7%. The appropriate statistic is (Siegel 1956, 41): $$z = \frac{(x \pm 0.5) - NP}{\sqrt{NPQ}}$$</Paragraph>
      <Paragraph position="2"> where N = 19,594 words, P = 0.617, Q = (1 - P) = 0.383, x = 19,594 × 0.655, and the ±0.5 term (correction for the fact that the binomial distribution is discrete while the normal distribution is continuous) can be ignored, giving z = 10.9. The (one-tailed) probability that this value could have been obtained by chance is effectively zero (untabulated in Siegel's Table A [p. 247]). In fact, the critical value for the 1% significance level is z = 2.33, which equates to a word accuracy of approximately 64.7%. It follows that even the best single-strategy result (63.0% for combination 00100 using Strategy 3 only) is significantly poorer than the multistrategy result using all five scoring strategies. Actually, given its simplicity, it is remarkable that Strategy 3 (frequency of the same pronunciation, FSP) used alone performs as well as it does. It was included partly to test the effect of including what were felt to be oversimplistic strategies! Yet it is superior to the previous best result of 61.7% using the weighted TP score, and the superiority is very highly significant (z = 3.7, p = 0.00011). In fact, for all three mapping problems, Strategies 1 (PF) and 3 are always implicated in results of rank less than 3, indicating their importance in obtaining high performance.</Paragraph>
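      <Paragraph> The figures in this test can be reproduced with a short calculation. The sketch below uses the normal approximation to the binomial with the continuity correction dropped, as in the text. The function names are hypothetical, and the reading of the critical accuracy as the value lying 2.33 standard errors below the best result is an assumption, chosen because it reproduces the quoted 64.7%.

```python
import math

N = 19594  # number of words, as given in the text


def binomial_z(p_new, p_ref, n=N):
    """Normal-approximation z for accuracy p_new against reference p_ref.

    The continuity correction of 0.5 is ignored, which is safe for large n.
    """
    x = n * p_new  # observed number of correct words
    return (x - n * p_ref) / math.sqrt(n * p_ref * (1.0 - p_ref))


def critical_accuracy(p_best, z_crit=2.33, n=N):
    """Accuracy lying z_crit standard errors below p_best (assumed reading)."""
    return p_best - z_crit * math.sqrt(p_best * (1.0 - p_best) / n)


print(round(binomial_z(0.655, 0.617), 1))        # all five strategies vs. 61.7%
print(round(binomial_z(0.630, 0.617), 1))        # Strategy 3 alone vs. 61.7%
print(round(100 * critical_accuracy(0.655), 1))  # 1% critical accuracy, in percent
```

      With these inputs the sketch gives z values of about 10.9 and 3.7 and a critical accuracy of about 64.7%, in line with the values stated above.</Paragraph>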
      <Paragraph position="3"> Turning to phoneme-to-letter conversion (Table 9), the Spearman rank correlation coefficient was 0.6375, which again is very highly significant (t = 4.456, $p \ll 0.0005$, df = 29, one-tailed test). Hence, as before, performance improves as more scoring strategies are deployed. The critical z value of 2.33 for the 1% significance level equates to a word accuracy of 74.7% relative to the best obtained word accuracy of 75.4% for combination 10101. Hence, the best result is significantly better than either the previous best value (74.4%) or the best single-strategy result (73.5% for combination 10000).</Paragraph>
      <Paragraph position="4"> For letter-to-stress conversion (Table 10), the Spearman rank correlation coefficient was 0.7411 (t = 5.944, $p \ll 0.0005$, df = 29, one-tailed test) so that, once again, performance improves as more scoring strategies are deployed. The critical z value of 2.33 for the 1% significance level equates to a word accuracy of 58.0% relative to the best obtained word accuracy of 58.8% for combination 11100. Hence, the best result is significantly better than either the previous best value (54.6%) or the best single-strategy result (53.4% for combination 00100).</Paragraph>
      <Paragraph position="5"> Finally, the percentage of words in which both pronunciation and stress are correct increases from 41.8% to 46.3%.</Paragraph>
    </Section>
    <Section position="2" start_page="212" end_page="213" type="sub_section">
      <SectionTitle>
7.3 Analysis of Errors
</SectionTitle>
      <Paragraph position="0"> Table 11 identifies the main sources of error for letter-to-phoneme conversion using the 11111 combination strategy. Table 11(a) indicates, in rank order, the 10 letters in the input that were most often mapped to an incorrect phoneme. The commonest problem is mispronunciation of letter e, which produces 21.2% of the total errors. To some extent, this is a natural consequence of the high frequency of this letter in English: as indicated in the Proportion column, letter e accounts for 11.0% of the total corpus.</Paragraph>
      <Paragraph position="1"> Even so, the ratio of errors to occurrences is almost 2, while it actually exceeds 2 for letters a and o. It is clear, as other workers have found, that the vowel letters are vastly more difficult to translate than are the consonant letters.</Paragraph>
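      <Paragraph> The ratio quoted for letter e follows from simple arithmetic. The sketch below assumes that the ratio of errors to occurrences means a letter's share of total errors divided by its share of the corpus; the function name is hypothetical.

```python
def error_ratio(error_share_pct, corpus_share_pct):
    """Share of total errors divided by share of the corpus, both in percent."""
    return error_share_pct / corpus_share_pct


# Letter e: 21.2% of the total errors, 11.0% of the total corpus.
print(round(error_ratio(21.2, 11.0), 2))  # 1.93
```

      A ratio of 1 would mean that a letter attracts errors exactly in proportion to its frequency, so a value near 2 marks it as roughly twice as error-prone as its frequency alone would predict.</Paragraph>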
      <Paragraph position="2"> Table 11(b) ranks the 10 commonest incorrect phonemes in the system's output.</Paragraph>
      <Paragraph position="3"> The schwa vowel accounts for 20.8% of errors in this case. Again, this partially reflects the extremely common occurrence of this phoneme.</Paragraph>
      <Paragraph position="4">  Table 9: Results of phoneme-to-letter conversion for the 31 possible combinations of scoring strategy using the product rule. Rank(C) is the rank of the result according to the number of strategies (in the range 1 to 5) included in the final score. Rank(W) is the rank of the result according to word accuracy. The Spearman rank correlation coefficient rs is 0.6375, which is very highly significant.</Paragraph>
      <Paragraph position="5">  Finally, Table 11(c) shows the 10 phonemes in the correct pronunciation that most often received a wrong translation. These are the same 10 phonemes as for Table 11(b), but in slightly different rank order. Once more, it is clear that vowel errors vastly outnumber consonant errors overall. The null phoneme is also problematic.</Paragraph>
      <Paragraph position="6"> This pattern of errors is very close to that for the preliminary results. We have exactly the same 10 main letters/phonemes responsible for errors in each column with only minor changes in their rank order. These similarities suggest that these particular errors are persistent--even structural--and will cause problems for other translation schemes as well as PbA.</Paragraph>
    </Section>
  </Section>
</Paper>