<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0406"> <Title>Dealing with Multilinguality in a Spoken Language Query Translator</Title> <Section position="4" start_page="40" end_page="40" type="metho"> <SectionTitle> 2 Does accent affect speech </SectionTitle> <Paragraph position="0"> recognizer performance? We performed a set of experiments to compare the effect of different accents. We trained two sets of models: an English model using native American English speakers as reference and a Cantonese model using native Cantonese speakers as reference. Word models of 34 (17 English and 17 Cantonese) simple commands were trained using 6 utterances of each command per speaker. The models were evaluated using a separate set of native Cantonese and native American English speakers. The recognition results are shown in Figure 2.</Paragraph> <Paragraph position="1"> Our experimental results support the claim that recognition accuracy degrades in the presence of an unmodelled accent. To bring the recognizer performance for non-native speakers up to that of native speakers, we need to improve the models in the recognizer. An obvious solution is to train the models on different accents. However, it is a daunting task to train every language with every type of accent. One approximation is to train the system with a mixture of separate languages so that the model parameters capture the spectral characteristics of more than one language. A mechanism for gradual accent adaptation might potentially increase the recognition accuracies of the speech recognizers of both the source and target languages. 3 How to deal with mixed language recognition? Consider two possible ways to implement a mixed language recognizer: (1) use two pure monolingual recognizers to recognize the different parts of the mixed language separately; (2) use a single mixed language model whose word network allows words in both languages.
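The two designs above can be contrasted with a minimal sketch. The vocabularies, romanizations, and scoring function below are hypothetical illustrations, not the paper's HMM-based recognizers; the point is only how the candidate set differs between the two designs.

```python
# Design (1): two monolingual recognizers gated by a language identifier.
# Design (2): one recognizer searching the union of both vocabularies.
# All word lists and scores here are made up for illustration.

EN_WORDS = {"open", "close", "save"}     # hypothetical English commands
YUE_WORDS = {"hoi2", "saan1", "cyun4"}   # hypothetical romanized Cantonese commands

def recognize(candidates, scores):
    """Pick the best-scoring word among the allowed candidates."""
    return max(candidates, key=lambda w: scores.get(w, float("-inf")))

def concat_recognize(word_scores, lang_id):
    # Design (1): the language identifier selects which dictionary to search.
    vocab = EN_WORDS if lang_id == "en" else YUE_WORDS
    return recognize(vocab, word_scores)

def mixed_recognize(word_scores):
    # Design (2): search the merged dictionary; more candidates means
    # more opportunities for cross-language confusion.
    return recognize(EN_WORDS | YUE_WORDS, word_scores)
```

With acoustic scores in which a Cantonese word narrowly outscores the correct English word, design (1) still recovers the intended word while design (2) picks the cross-language competitor, mirroring the degradation discussed below.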
Method (1) requires some sort of language identification to switch between the two recognizers, whereas method (2) seems more flexible and efficient.</Paragraph> <Paragraph position="2"> We compared the recognition accuracy of a pure language recognizer with that of a mixed language recognizer. In the pure language recognizer, the word candidates all come from a single-language dictionary, whereas the mixed language dictionary contains words from two dictionaries. See Figure 3. In the concatenation model, we assume a priori knowledge (possibly from a language identifier) of the language ID of each word. The expected recognition rate of the concatenation model is the product of the accuracies of the pure language models.</Paragraph> <Paragraph position="3"> From this preliminary experiment, we discovered that although a mixed language model offers greater flexibility to the speaker, its performance is considerably lower than that of the concatenation of two pure language models. The reason for this degradation is not difficult to deduce: the dictionary of a mixed model has more candidates, so the search result is less accurate. If the recognizer knew a priori which dictionary (English or Chinese) it should search for a particular word, it would make fewer errors.</Paragraph> <Paragraph position="4"> This raises a potentially interesting question: should we incorporate a language identifier in parallel with the recognizers, or should we accept the loss in recognition rate in exchange for the flexibility of a mixed language recognizer? We will implement a language identifier and carry out more experiments to compare the outputs of the recognizers.</Paragraph> <Paragraph position="5"> 4 Can the source and target languages share the same recognition engine? One important issue for multilinguality in a spoken language translator is the complexity of implementing more than one recognizer in the system.
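The concatenation-model estimate mentioned earlier, the product of the pure-language accuracies, can be made concrete with illustrative numbers (these are not figures from the paper):

```python
# Under the concatenation model, a perfect language identifier routes each
# word to its own monolingual recognizer. If errors on the two language
# segments are independent, a mixed utterance is fully correct only when
# both parts are, so the expected rate is the product of the per-language
# accuracies. The values below are hypothetical, not the paper's results.
acc_en = 0.95   # assumed English word accuracy
acc_yue = 0.90  # assumed Cantonese word accuracy

expected_concat = acc_en * acc_yue
print(f"{expected_concat:.3f}")  # prints 0.855
```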
An efficient approach is to use recognizers that are identical except for their parameter values. Will this yield robust recognizers? The word-based HMM recognizers for English and Cantonese use identical features (nine MFCCs and nine delta MFCCs). The same microphone was used to record both languages, and the same initialization procedure was used to initialize the recognizer for both languages. For English, the number of HMM states is deduced from spectrograms; for Cantonese, it is deduced from the number of phonemes in each word.</Paragraph> <Paragraph position="6"> The recognizers were evaluated using native English and Cantonese speakers who were not in the training set.</Paragraph> <Paragraph position="7"> In general, the English recognizer is more robust than our Cantonese recognizer, even though identical parameter sets and training and testing mechanisms are used. Rather than jumping to the conclusion that a different feature set is needed for Cantonese, we would like to find out what other factors could cause the lower performance of the Cantonese recognizer. For example, we would like to run experiments with a larger number of speakers to determine whether a mismatch between training and test speakers caused this performance degradation.</Paragraph> </Section> </Paper>