<?xml version="1.0" standalone="yes"?> <Paper uid="J96-3003"> <Title>Efficient Multilingual Phoneme-to-Grapheme Conversion Based on HMM</Title> <Section position="4" start_page="359" end_page="371" type="intro"> <SectionTitle> 4. Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="359" end_page="361" type="sub_section"> <SectionTitle> 4.1 Explanation of Tables and Charts </SectionTitle> <Paragraph position="0"> In all tables and charts, symbols are used to designate the different parameters of the experiments. More precisely, Exp n (n = 1, 2, 3) designates the experiment type as follows: Exp 1 uses a first-order HMM with a correct phonemic representation of the input; Exp 2 is like Exp 1 but uses a second-order HMM; Exp 3 is like Exp 2 but with a corrupted phonemic representation simulating the output of a speech recognizer. The letter combinations E1, E2, and NE show the domain of the experiment: E1 experiments use the office environment corpora for training and assessment, E2 the law corpora, and NE the newspaper corpora. For the name corpus (Table 8), N1 marks experiments using a corpus of surnames and OD experiments using a corpus of street names. It must be noted that in all experiments the testing material was not included in the training of the model, although it may belong to the same domain.</Paragraph> <Paragraph position="1"> Tables 1 through 8 present the model parameters for all the models created for the experiments mentioned above. The columns show the density (i.e., the number of nonzero elements) of the respective model parameters (initial hidden-state probability vector π, initial hidden-state pair probability vector p, observation-symbol probability matrix B, and hidden-state transition probability matrix A). The values are percentages. 
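The density figures in Tables 1 through 8 can be reproduced in a few lines. The following snippet is not from the paper; the array shapes and values are invented toy examples, and `density` is a hypothetical helper that simply computes the percentage of nonzero entries in a parameter array:

```python
import numpy as np

def density(param: np.ndarray) -> float:
    """Percentage of nonzero elements in an HMM parameter array."""
    return 100.0 * np.count_nonzero(param) / param.size

# Toy parameters with invented sizes and values.
pi = np.array([0.5, 0.5, 0.0, 0.0])   # initial state probability vector
A = np.array([[0.9, 0.1, 0.0],        # state transition probability matrix
              [0.0, 0.8, 0.2],
              [0.3, 0.0, 0.7]])

print(density(pi))  # 50.0  (2 of 4 entries are nonzero)
print(density(A))   # about 66.7 (6 of 9 entries are nonzero)
```

A sparse (low-density) matrix suggests the model has seen too few transition types, i.e., it is still strongly tied to its training material.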
The matrix density is a way of measuring the saturation of the model, that is, whether the model is sufficiently objective or is too dependent on the nature of the training material. One can see from these tables the important differences between the languages on which the experiments were performed. In Tables 9 to 17, a summary of the conversion results is presented for the three sets of experiments carried out. The columns have the following meaning: d: ls/lw x 100%, where lw is the size of a word in error (in characters), ls is the number of incorrect characters in the word, and ls/lw x 100% is the mean value estimated over all wrong words. This number is a measure of the similarity of wrong words to the corresponding correct words (percentage); a small percentage indicates a high similarity.</Paragraph> <Paragraph position="2"> l(s): symbol conversion success rate for the first position (percentage). l(w): word conversion success rate for the first position (percentage).</Paragraph> <Paragraph position="3"> 1-2, etc.: word conversion success rate accounting for all the referenced positions (percentage).</Paragraph> <Paragraph position="4"> Figures 2 through 10 give an analytic overview of the results in each language. The legends of these figures have the form cc/n, where cc is a two-letter code for the corpus domain (E1/E2/NE/N1/OD, as described at the beginning of this section) and n is either 1 for a first-order model or 2 for a second-order model. For example, the legend E1/1 means that text of the domain E1 (office environment) was used with a first-order HMM for the experiment.</Paragraph> <Paragraph position="5"> In Figures 11 to 13, a summary for each type of experiment is shown in order to compare the performance between the languages. In Figure 14, the average number of times an output position is occupied is given for all the languages. Finally, in Figure 15, the degradation of performance as a function of the corrupted input words is shown. 
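As an illustration of the column definitions above, the metrics can be sketched as follows. All names and the toy data here are invented, and the character-mismatch count is a crude positional comparison (the paper does not specify how wrong and correct words are aligned):

```python
def char_error_pct(hyp: str, ref: str) -> float:
    """ls/lw x 100: percentage of incorrect characters in a wrong word."""
    lw = max(len(hyp), len(ref))
    ls = sum(1 for a, b in zip(hyp, ref) if a != b) + abs(len(hyp) - len(ref))
    return 100.0 * ls / lw

def summarize(results):
    """results: list of (ranked_candidates, correct_word) pairs."""
    n = len(results)
    first = sum(1 for cands, ref in results if cands[0] == ref)   # l(w)
    top2 = sum(1 for cands, ref in results if ref in cands[:2])   # positions 1-2
    wrong = [char_error_pct(c[0], r) for c, r in results if c[0] != r]
    d = sum(wrong) / len(wrong) if wrong else 0.0                 # mean over wrong words
    return {"l(w)": 100.0 * first / n, "1-2": 100.0 * top2 / n, "d": d}

demo = [(["color", "colour"], "colour"),
        (["fonetic", "phonetic"], "phonetic"),
        (["table"], "table")]
print(summarize(demo))
```

With this toy data, only one word is correct in the first position, but every correct word appears within the first two positions, mirroring the paper's observation that most correct suggestions occupy the first two output positions.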
The differences in performance between the languages and the types of domains and models used are discussed in the following section.</Paragraph> </Section> <Section position="2" start_page="361" end_page="371" type="sub_section"> <SectionTitle> 4.2 Comments on the Performance of the Proposed System </SectionTitle> <Paragraph position="0"> One can initially observe the number of times the algorithm produced a word in each position 1 to 4 (Figure 14). This number decreases very quickly from the first to the last position for most of the languages, which shows that the system does not produce extreme spellings of the input words (even though these may be allowed by the language). The second very interesting feature revealed in Tables 9 to 17 is that the improvement in the system's performance decreases rapidly from the first to the last position of the output, which means that the majority of correct suggestions are included in the first two positions. Column l(s) shows that the percentage of erroneous symbols is very small indeed, while column d shows that even though a word may be incorrect, only a small percentage of its symbols may be wrong (about 15% on average in Exp 2), which means that the output of the algorithm remains easily human-readable even when it contains errors.</Paragraph> <Paragraph position="1"> The performance of the algorithm varied widely, depending on the language being tested. This is due to the differences in spelling in each language and, consequently, to the training the model required. As described in Section 3.1, the available training material consisted of 300k-word corpora for all languages. This amount was sufficient for some languages (Dutch, German, Italian, Greek, and Spanish) but insufficient for others (English and French). A more detailed presentation of the algorithm's behavior in the languages tested follows.</Paragraph> <Paragraph position="3"/> <Paragraph position="5"/> <Paragraph position="7"> [Figure caption: Overview of results for Greek.]</Paragraph> <Paragraph position="8"> [Figure captions: Overview of results for Spanish. Overview of results for names.]</Paragraph> <Paragraph position="9"> For Dutch, the model gives relatively good results (97.6% for four output candidates). Spelling in Dutch is rather straightforward for etymologically Dutch words, but words of foreign origin are usually spelled as in the language of their origin. These words are responsible for most of the errors encountered. The model performed worse for English than for the other languages, mainly because the relationship between pronunciation and spelling is less regular. This resulted in fewer grapheme transitions in the training corpus and meant that the standard training period was insufficient. Another problem is that compound words usually keep the initial pronunciation of their components (e.g., in words such as &quot;whatsoever&quot;, &quot;therefore&quot;, etc.); this leads to many errors for an algorithm like the one proposed here, which has no information about the origin and etymology of each word. Similar work (Parfitt and Sharman 1991) shows the same problems in a slightly different context. Of course, more training of the model would improve performance. With French, there is a special problem that does not occur with the other languages: many homophones are distinguished only by the presence or absence of various mute letters at the ends of the words. This feature significantly increases the number of states that have to be defined. Consequently, the available training material was inadequate for the creation of a correct model, which led to poor performance. The model performed well with German. The only drawback was the decision about the type of the first letter (uppercase or lowercase); nouns always start with a capital letter, while other words do not. This is the primary cause of the errors introduced in the experiments with German. 
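The effect of the German capitalization ambiguity on scoring can be illustrated with a small evaluation helper that optionally case-folds only the first letter before comparing. This is a hypothetical sketch (the helper name and the toy word pairs are invented, not the paper's code):

```python
def word_correct(hyp: str, ref: str, ignore_initial_case: bool = False) -> bool:
    """Exact match, optionally ignoring the case of the first letter only."""
    if ignore_initial_case and hyp and ref:
        hyp = hyp[0].lower() + hyp[1:]
        ref = ref[0].lower() + ref[1:]
    return hyp == ref

# Toy (hypothesis, reference) pairs; "haus" fails strict scoring only
# because German nouns are capitalized.
pairs = [("haus", "Haus"), ("Tisch", "Tisch"), ("grün", "grün")]
strict = sum(word_correct(h, r) for h, r in pairs)
relaxed = sum(word_correct(h, r, ignore_initial_case=True) for h, r in pairs)
print(strict, relaxed)  # 2 3
```

Scoring the same outputs both ways separates genuine spelling errors from errors caused solely by the uppercase/lowercase decision.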
Experiments ignoring this ambiguity significantly improved the German results, as can be seen from a comparison of Figure 5 and Figure 6.</Paragraph> <Paragraph position="10"> [Figure caption: Degradation of output vs. input corruption.]</Paragraph> <Paragraph position="11"> With Greek, the model behaved quite well, reaching more than 99% success for the second-order HMM experiments with up to four output candidates. Figure 7 illustrates the difference between the performance of Exp 1 and Exp 2 (order of model) in the first output position. These results are the consequence of two contradictory features of the Greek language:</Paragraph> <Paragraph position="12"> a. every grapheme is usually pronounced in the same way (i.e., corresponds to one phoneme), and</Paragraph> <Paragraph position="13"> b. every phoneme usually has more than one possible spelling, regardless of its neighboring phonemes.</Paragraph> <Paragraph position="14"> As an example, the phoneme /i/ can be transcribed as ι, η, υ, ει, and οι in almost any context (Petrounias 1984; Setatos 1974). Other problems arise from the consonants, which can be either single or double without any change in the pronunciation. Finally, the model gave extremely good results with Italian and Spanish, reaching more than 99% success for the second-order model with up to two or three output candidates for known and unknown text experiments, respectively. This is because there is usually a one-to-one correspondence between phonemes and graphemes in these languages.</Paragraph> <Paragraph position="15"> Another dimension of the analysis of the results is the domain of the experiment. The model behaved best in experiments that used the newspaper corpora, which are more casual in style and richer in vocabulary than the other domains. These corpora usually contain more grapheme transitions, which give greater detail about the spelling mechanism of the language and provide the most efficient training possible. 
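The conversion step that makes such domain and language differences visible — choosing the most probable grapheme sequence for a phoneme sequence — can be sketched with a standard first-order Viterbi decoder over an HMM whose hidden states are graphemes and whose observations are phonemes. This is an illustrative sketch only: the paper also uses second-order models and four ranked outputs, and all probabilities below are invented toy values for the Greek /i/ ambiguity:

```python
import math

NEG_INF = float("-inf")

def lg(p):
    """Safe log: log(0) -> -inf instead of a ValueError."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable grapheme path for a phoneme sequence (log domain)."""
    prev = {s: (lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(obs[0], 0.0)), [s])
            for s in states}
    for o in obs[1:]:
        cur = {}
        for s in states:
            q = max(prev, key=lambda q: prev[q][0] + lg(trans_p[q].get(s, 0.0)))
            score = prev[q][0] + lg(trans_p[q].get(s, 0.0)) + lg(emit_p[s].get(o, 0.0))
            cur[s] = (score, prev[q][1] + [s])
        prev = cur
    best = max(prev, key=lambda s: prev[s][0])
    return "".join(prev[best][1])

# Invented toy model: /i/ can surface as either ι or η; context decides.
states = ["η", "ι", "ν"]
start_p = {"η": 0.6, "ι": 0.3, "ν": 0.1}
emit_p = {"η": {"i": 1.0}, "ι": {"i": 1.0}, "ν": {"n": 1.0}}
trans_p = {"η": {"ν": 0.7, "ι": 0.2, "η": 0.1},
           "ι": {"ν": 0.6, "ι": 0.2, "η": 0.2},
           "ν": {"ι": 0.8, "η": 0.2}}

# Under this toy model, word-initial /i/ prefers η and /i/ after /n/ prefers ι.
print(viterbi(["i", "n", "i"], states, start_p, trans_p, emit_p))
```

The same phoneme /i/ receives two different spellings in one word, which is exactly the kind of context-dependent disambiguation the trained transition probabilities have to capture.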
The experiments on the Name corpora resulted in lower scores than the corresponding experiments on the Hellenic general corpora, reaching 92.3% for Exp 1, 98.3% for Exp 2, and 93.4% for Exp 3 for four output candidates. The main difference in the success rate (Table 17) is due to the size of the training corpora (the training, especially with the street names, was inadequate) and to the fact that names are usually spelled or pronounced in a more arbitrary way than other words.</Paragraph> <Paragraph position="16"> Finally, as expected, the model performed worse in experiments using as input a simulation of a speech recognizer output (distorted speech) than in the corresponding experiments using a correct phonemic representation of the words. However, by measuring the ambiguity introduced by the speech recognizer output, it can be seen that the PTGC system in fact improved the performance of the overall system (recognizer simulator and PTGC). This was also expected, since in the training phase the model is trained using the correct graphemic form of the words, which is later reproduced in the conversion experiments. Evidently, the performance of the algorithm depends on the amount of distortion introduced in the input phonemic string. Figure 15 shows the degradation of the success rate of the algorithm as a function of the corruption of the input stream. The dashed lines refer to a first-order HMM experiment, while the solid lines refer to a second-order HMM experiment. The input degradation does not affect the overall system performance very much (in any of the four output positions), even when more than 85% of the input words have at least one incorrect phoneme. It must be noted that, in this case, about 30% of the input symbols (unit phonemes) have been replaced by erroneous ones, but the score of the first four positions still remains above 98%.</Paragraph> </Section> </Section> </Paper>