File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/c00-2158_abstr.xml
Size: 11,018 bytes
Last Modified: 2025-10-06 13:41:41
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2158"> <Title>MT and Topic-Based Techniques to Enhance Speech Recognition Systems for Professional Translators</Title> <Section position="1" start_page="0" end_page="1063" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Our principal objective was to reduce the error rate of speech recognition systems used by professional translators. Our work concentrated on Spanish-to-English translation. In a baseline study we estimated the error rate of an off-the-shelf recognizer to be 9.98%. In this paper we describe two independent methods of improving speech recognizers: a machine translation (MT) method and a topic-based one. An evaluation of the MT method suggests that the vocabulary used for recognition cannot be completely restricted to the set of translations produced by the MT system and that a more sophisticated constraint system must be used. An evaluation of the topic-based method showed a significant error rate reduction, to 5.07%.</Paragraph> <Paragraph position="1"> Introduction Our goal is to improve the throughput of professional translators by using speech recognition. The problem with current off-the-shelf speech recognition systems is that they have high error rates for tasks of this kind. If the task is simply to recognize the speech of a person reading out loud, the error rate is relatively low; the error rate of large-vocabulary research systems (20,000-60,000 word vocabularies) performing such a task is, at best, around 10% (see, for example, Robinson and Christie 1998, Renals and Hochberg 1996, Hochberg et al. 1995 and Siegler and Stern 1995). The popular press has reported slightly poorer results for commercial systems. For example, PC Magazine (Poor 1998) compared Dragon's NaturallySpeaking and IBM's ViaVoice (both continuous speech recognition systems with approximately 20,000 word vocabularies). They evaluated these systems by having five speakers read a 350-word text at a slow pace (1.2 words/second) after completing a half-hour training session with each system. The average recognition error rate was 11.5% (about 40 errors in the 350-word text). An evaluation of the same two systems without training resulted in a recognition error rate of 34% (Keizer 1998). If the task is more difficult than recognizing the speech of a person reading, the error rate increases dramatically. For example, Ringger (1995) reports an average error rate of 30% for recognizing careful, spontaneous speech on a specific topic. However, the error rate of paced speech can be as low as 5% if the vocabulary is severely limited or if the text is highly predictable and the system is tuned to that particular genre. Unfortunately, the speech of expert translators producing spoken translations does not fall into any of the &quot;easy to recognize&quot; categories.</Paragraph> <Paragraph position="2"> In many translation tasks the source document is in electronic form, and the obvious question to ask is whether an analysis of the source document could lead to a reduction of the speech recognition error rate. For example, suppose we have a robust machine translation system and use it to generate all the possible translations of a given source text. We could then use this set of translations to help predict what the translator is saying. We describe this approach in §1 below. A simpler approach is to identify the topic of the source text and use that topic to aid in speech recognition. Such an approach is described in §2. Both methods were tested in a Spanish-to-English translation task.</Paragraph>
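As a rough sketch of the first idea (illustrative only and not from the paper: translate_word stands in for the MT subsystem described in Section 1.1, and the notion of an externally restrictable recognizer vocabulary is an assumption), the candidate translations of the source sentence define a word set against which the active recognition vocabulary could be filtered:

# Minimal sketch of the MT-based constraint idea, assuming a recognizer
# whose active vocabulary can be set externally (a hypothetical interface).

def translation_word_set(source_sentence, translate_word):
    # Collect every candidate English rendering of every Spanish word;
    # translate_word stands in for the MT subsystem of Section 1.1.
    candidates = set()
    for word in source_sentence.lower().split():
        candidates.update(translate_word(word))  # e.g. 'cuestion' -> {'question', 'dispute', ...}
    return candidates

def restrict_vocabulary(full_vocabulary, source_sentence, translate_word):
    # Hard restriction: only words the MT subsystem predicts remain active.
    allowed = translation_word_set(source_sentence, translate_word)
    return [w for w in full_vocabulary if w in allowed]

Section 1.2 evaluates whether such a hard restriction is safe, that is, whether the predicted set actually covers the words translators use.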
<Paragraph position="3"> This research rests on two crucial ideas. The first is that lexical and translation knowledge extracted from source documents by automated natural language processing can be utilized in a large-vocabulary, continuous speech recognizer to achieve low word-error rates. The second idea is that the translator should be able to dictate a translation and correct the resulting transcription in much less time than if they had to type the translation themselves or rely on a transcriber/typist.</Paragraph> <Paragraph position="4"> 1. Using machine translation The difference between a typical speech dictation system and the situation described above is that the translator is viewing the source text on a computer; that is, the text is available online. This source text can be analyzed using a machine translation (MT) component. The hope is that this analysis will reduce recognition perplexity by having the recognizer make choices only from the set of possible target-language renderings of the source-language words. In this section we describe the MT subsystem in detail.</Paragraph> <Paragraph position="5"> The function of this subsystem is to take Spanish sentences as input and produce a set of English words that are likely to occur in translations of these sentences. For example, if the Spanish text is (1) Butros Ghali propone vía diplomática para solucionar crisis haitiana, we would expect the translation set to include (among others) the words {Boutros, Ghali, proposes, diplomatic, route, to, settle, Haitian, crisis}. The hope is that this translation set will be a good predictor of what the translator actually says.</Paragraph> <Section position="1" start_page="1061" end_page="1062" type="sub_section"> <SectionTitle> 1.1 The MT subsystem </SectionTitle> <Paragraph position="0"> The MT subsystem consists of four components: the Spanish morphological analyzer, the dictionary lookup component, the lexical transfer component, and the English morphological generator. These components are briefly described in this section.</Paragraph> <Paragraph position="1"> 1.1.1 Spanish morphological analyzer The morphological analyzer takes Spanish words as input and outputs a set of possible morphological analyses for those words. Each analysis consists of the root word and a set of feature structures representing the information obtained from inflectional morphology.</Paragraph> <Paragraph position="2"> Examples are given below.</Paragraph> <Paragraph position="3"> 1.1.2 The dictionary lookup component The dictionary lookup component takes a feature structure produced by the morphological analyzer, looks up the root-word/part-of-speech pair in the dictionary, and adds information to the existing feature structure. The words in the dictionary were derived from a corpus analysis of a set of 20 Spanish test documents. All the unique words in this corpus, including proper nouns, were included in the dictionary (approximately 1,500 words). A few examples are shown below: actividad ((root actividad) (cat n) (trans activity energy) (gender 1)); comenzar ((root comenzar) (cat v) (trans begin start) (verbtype irregular 129)); cuestion ((root cuestion) (cat n) (trans question dispute problem issue) (gender 1)).</Paragraph>
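As a sketch of how this lookup step could be realized (illustrative Python, not the paper's implementation; the dictionary contents below simply mirror the entries shown above, and the analyzer output format is an assumption):

# Sketch of the dictionary lookup step, using Python dicts as feature structures.

DICTIONARY = {
    ("actividad", "n"): {"trans": ["activity", "energy"], "gender": "1"},
    ("comenzar", "v"): {"trans": ["begin", "start"], "verbtype": "irregular 129"},
    ("cuestion", "n"): {"trans": ["question", "dispute", "problem", "issue"], "gender": "1"},
}

def dictionary_lookup(analysis):
    # analysis is the feature structure from the morphological analyzer,
    # e.g. {"root": "actividad", "cat": "n", "num": "plural"}.
    entry = DICTIONARY.get((analysis["root"], analysis["cat"]), {})
    merged = dict(analysis)
    merged.update(entry)  # add trans, gender, verbtype, ... to the existing features
    return merged

# dictionary_lookup({"root": "actividad", "cat": "n", "num": "plural"})
# returns {'root': 'actividad', 'cat': 'n', 'num': 'plural',
#          'trans': ['activity', 'energy'], 'gender': '1'}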
<Paragraph position="4"> 1.1.3 The lexical transfer component At the end of the dictionary lookup phase, for each word in the Spanish sentence we have a feature structure containing the information in the dictionary entry along with the parameter values that were gained from morphological analysis. One feature, trans, contains the possible English translations of that Spanish word. The lexical transfer component converts this Spanish feature structure to one or more English feature structures; one feature structure is created for each value in the trans field. For example, the feature structure associated with an instance of actividad encountered in some text will be 'transferred' to two English feature structures: one for activity and one for energy. Similarly, encountering cuestion in some text will result in the creation of four feature structures: those representing the English words question, dispute, problem, and issue. In addition, the transfer component converts other features in the Spanish feature structure to features recognizable to the English morphological generator.</Paragraph> <Paragraph position="5"> 1.1.4 The English morphological generator We used an English morphological generator developed at the Computing Research Laboratory at New Mexico State University by Steve Beale. The morphological generator takes feature structures as input and produces correctly inflected English words. Examples of the feature structures used as input and their associated output are illustrated below: ((root run) (cat v) ...) -> running</Paragraph> </Section> <Section position="2" start_page="1062" end_page="1063" type="sub_section"> <SectionTitle> 1.2 Evaluation </SectionTitle> <Paragraph position="0"> Suppose we wish to have a user dictate an English translation of a Spanish sentence that appears on a computer screen. This Spanish sentence is input to the MT system and the output is a set of English words. In the ideal case, the words in the English sentence the translator dictates are contained in this set. If one could offer a sort of guarantee that the words of any reasonable translation of the Spanish sentence are contained within this set, then incorporating the MT subsystem into a speech recognition system would be relatively straightforward; the vocabulary at any given moment would be restricted to this word set. If, on the other hand, such a guarantee cannot be made, then this approach will not work. The evaluation of the natural language subsystem is designed to test whether reasonable translations are contained within this set of words.</Paragraph> <Paragraph position="1"> The test material consisted of 10 Spanish newspaper articles. The articles were translated into English by two independent translators. The following table shows that roughly 1/3 of the words in the translations the professional translators produced are not in the set of words produced by the natural language subsystem (T1 and T2 are the two different English translations). The next experiment augmented the word set constructed by the approach described above with the 800 most frequent words in a 2 million word corpus of English. The results are illustrated in the following table.</Paragraph>
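The coverage measure reported in these tables can be sketched as follows (an illustrative Python sketch, not the paper's code; the tokenization and the corpus interface are assumptions based only on the description above):

# Sketch of the coverage evaluation: how much of a human translation the
# MT-derived word set misses, with and without the 800-word augmentation.

import re
from collections import Counter

def tokens(text):
    # Lowercased word tokens; this tokenization is an assumption.
    return re.findall(r"[a-z']+", text.lower())

def out_of_set_rate(reference_translation, predicted_words):
    # Fraction of words in a human translation that are not in the predicted set.
    ref = tokens(reference_translation)
    missed = sum(1 for w in ref if w not in predicted_words)
    return missed / len(ref) if ref else 0.0

def top_frequent_words(corpus_text, n=800):
    # The n most frequent words of a large English corpus, used to augment the set.
    counts = Counter(tokens(corpus_text))
    return {w for w, _ in counts.most_common(n)}

# augmented = predicted_words | top_frequent_words(english_corpus_text)
# out_of_set_rate(t1_translation, augmented)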
<Paragraph position="2"> The reason this combined method was tested is that English open-class lexical items are often added in translation. For example, in one document the phrase solucionar crisis haitiana is translated as &quot;resolution of Haitian crisis&quot;, and the English of does not have a direct correlate in the Spanish phrase. While this combined method appears to work moderately well, it still does not have sufficient coverage to function as a method for generating the complete recognition vocabulary. That is, it cannot guarantee that the words of any reasonable translation of a Spanish sentence will be contained in the set of English words generated from that sentence. Since we cannot use an MT system to constrain the recognition vocabulary, we evaluated a different method, one that uses topic recognition.</Paragraph> </Section> </Section> </Paper>