<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0707">
  <Title>Koji TOCHINAI Graduate school of Business Administration</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Speech is the most common means of communication for us because the information contained in speech is sufficient to play a fundamental role in conversation. Thus, it is much better for processing to deal with speech directly. However, conventional approaches to speech translation require a text result, obtained by speech recognition, for machine translation, even though the result may contain errors or unrecognized portions.</Paragraph>
    <Paragraph position="3"> A text is translated through morphological analysis, syntactic analysis, and generation of a sentence in the target language. Finally, the speech synthesis stage produces speech output in the target language.</Paragraph>
    <Paragraph position="4"> Figure 1(A) shows the whole procedure of a traditional speech translation approach.</Paragraph>
    <Paragraph position="5"> The procedure has several complicated processes that do not give satisfying results. Therefore, the lack of accuracy in each stage culminates in a poor final result. For example, character strings obtained by speech recognition may represent different information than the original speech.</Paragraph>
    <Paragraph position="6"> Proceedings of the Workshop on Speech-to-Speech Translation: Algorithms and Systems, Philadelphia, July 2002, pp. 45-52. Association for Computational Linguistics.</Paragraph>
    <Paragraph position="9"> Murakami et al. (1997) attempted to recognize several vowels and consonants using Neural Networks whose structures differed from TDNN (ATR Lab., 1995); however, they could not obtain a high recognition accuracy. They confirmed that distinguishing the boundaries of words, syllables, or phonemes is a task of great difficulty. They therefore focused on the speech waveform itself, rather than on character strings obtained by speech recognition, to realize speech translation. Murakami et al. decided to deal with the correspondence of acoustic characteristics of the speech waveform, instead of character strings, between two utterances.</Paragraph>
    <Paragraph position="10"> Our approach handles the acoustic characteristics of speech without lexical expression, through a much simpler structure than those reported by Takizawa et al. (1998), Müller et al. (1999), or Lavie et al. (1997), because we believe that simplification of the system prevents inaccuracies in the translation. Figure 1(B) shows the processing stages of our approach. If speech translation can be realized by analyzing the correspondence of character strings obtained by speech recognition, then speech translation can also be built by dealing with the correspondence of acoustic characteristics. In our method, we extract acoustically common parts and different parts by comparing the acoustic characteristics of two translation pairs within the same language. We then generate translation rules and register them in a translation dictionary.</Paragraph>
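The common/different-part extraction described above can be sketched as follows. This is a minimal illustration only: the frame quantization, the use of subsequence matching as the similarity criterion, and the span representation are assumptions for the sketch, not the authors' actual implementation.

```python
# Sketch: compare two utterances' acoustic feature sequences and split
# one of them into common and different parts. Quantization step and
# matching strategy are illustrative assumptions.
from difflib import SequenceMatcher

def quantize(frames, step=0.5):
    """Map real-valued feature frames to coarse symbols so that
    acoustically similar frames compare equal (assumed discretization)."""
    return tuple(round(f / step) for f in frames)

def extract_parts(utt_a, utt_b):
    """Return (common, different) spans between two quantized utterances,
    each span given as (start, length) on utt_a's time axis."""
    sm = SequenceMatcher(a=utt_a, b=utt_b, autojunk=False)
    common, different = [], []
    pos = 0
    for block in sm.get_matching_blocks():
        if block.a > pos:                      # gap before this match = different part
            different.append((pos, block.a - pos))
        if block.size:
            common.append((block.a, block.size))
        pos = block.a + block.size
    return common, different

# Toy example: two utterances of the same language sharing a middle stretch.
utt1 = quantize([1.0, 1.0, 2.0, 3.0, 3.0])
utt2 = quantize([4.0, 1.0, 2.0, 3.0, 5.0])
common, different = extract_parts(utt1, utt2)
```

The extracted spans would then be paired with the corresponding spans of the translation pair and registered as rules carrying their time-domain locations.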
    <Paragraph position="11"> The rules also have the location information of acquired parts for speech synthesis on time-domain.</Paragraph>
    <Paragraph position="12"> The translation rules are acquired not only by comparing speech utterances but also by using the Inductive Learning Method (K. Araki et al., 2001), while still keeping acoustic information within the rules. The only condition required to realize our method is that the correspondence of meaning between the two languages be decided.</Paragraph>
    <Paragraph position="13"> In the translation phase, when an unknown utterance in the source language is to be translated, the system compares it with the acoustic information of all rules for the source language.</Paragraph>
    <Paragraph position="14"> The matched rules are then used to look up their corresponding parts in the target language.</Paragraph>
    <Paragraph position="15"> Finally, we obtain roughly synthesized target speech by simply concatenating the suitable parts of the rules in the target language according to their location information. Figure 2 shows an overview of the processing structure of our method.</Paragraph>
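The translation phase just described can be sketched as follows. The rule structure, the contiguous-subsequence matching criterion, and the toy dictionary are all illustrative assumptions standing in for real acoustic matching; they are not the authors' implementation.

```python
# Sketch of the translation phase: match an unknown source utterance
# against the source side of every stored rule, then concatenate the
# matched rules' target-language parts by their recorded locations.

def matches(utterance, pattern):
    """Assumed matcher: a rule fires if its source-side acoustic
    pattern occurs as a contiguous subsequence of the utterance."""
    n, m = len(utterance), len(pattern)
    return any(utterance[i:i + m] == pattern for i in range(n - m + 1))

def translate(utterance, rules):
    """Collect target parts of all matching rules and concatenate them
    in order of their stored time-domain location."""
    hits = [r for r in rules if matches(utterance, r["src_pattern"])]
    hits.sort(key=lambda r: r["tgt_location"])   # time-domain ordering
    target = []
    for r in hits:
        target.extend(r["tgt_part"])             # rough concatenation
    return target

# Toy dictionary: integer tuples stand in for acoustic feature patterns.
rules = [
    {"src_pattern": (2, 4), "tgt_part": ["T1"], "tgt_location": 0},
    {"src_pattern": (6, 6), "tgt_part": ["T2"], "tgt_location": 1},
    {"src_pattern": (9, 9), "tgt_part": ["T3"], "tgt_location": 2},  # no match
]
output = translate((2, 4, 6, 6), rules)
```

In the real system the concatenated parts are speech waveform segments, so the output is rough synthesized speech rather than symbols.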
    <Paragraph position="16"> Our method has several advantages over other approaches. First, the performance of the translation is not affected by a lack of accuracy in speech recognition, because we do not need to segment speech into words, syllables, or phonemes. Second, our method can be applied to any language without changes to the machine translation stage, because no processing depends on a specific language. With conventional methods, several processes in the machine translation stage must be altered if the target language is changed, because morphological analysis and syntactic analysis depend completely on the individual characteristics of each language.</Paragraph>
    <Paragraph position="17"> Fundamentally, differences between languages have no effect on the ability of the proposed method, because we focus on the acoustic characteristics of speech, not on the character strings of languages.</Paragraph>
    <Paragraph position="18"> It is very important to approach speech translation with a new methodology that is independent of the individual characteristics of any language.</Paragraph>
    <Paragraph position="19"> We also expect that our approach can be utilized in speech recuperation systems for people with speech impediments, because our method is able to deal with various types of speech that cannot be handled by conventional speech recognition systems designed for normal voices.</Paragraph>
    <Paragraph position="20"> Murakami et al. (2002) successfully obtained several samples of translation by applying our method to locally recorded speech data and spontaneous conversational speech.</Paragraph>
    <Paragraph position="21"> In this paper, we apply the proposed method to speech data from travel conversations. We evaluate the performance of the method through experiments and discuss the behavior of the system.</Paragraph>
  </Section>
</Paper>