<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0404"> <Title>Correct parts extraction from speech recognition results using semantic distance calculation, and its application to speech translation</Title> <Section position="6" start_page="27" end_page="29" type="evalu"> <SectionTitle> 4 Discussions </SectionTitle> <Paragraph position="0"> Some deletion errors of function words are resolved by TDMT even without CPE. This is because the translation model is trained on a large amount of spontaneous speech in which identical function words were deleted. On the other hand, CPE is effective for many erroneous sentences. Important misrecognition characteristics effectively handled by CPE are as follows: (a) Some insertion errors between words (b) Errors at the tail parts of sentences (c) Strange expressions consisting of over N words (d) Expressions not similar to examples (e) Input too complicated to parse (but not erroneous) In contrast, characteristics not effectively handled by CPE are as follows: (f) Errors in final parts causing ambiguity, e.g., of a person, of a situation, whether a sentence is negative or positive, or whether a sentence is interrogative or affirmative. In these cases, the translation results are incorrect even if CPE is used.</Paragraph> <Paragraph position="1"> Table 3 - Table 7 show examples of each of these characteristics. The top sentence of each table is the input sentence and the second sentence is the recognition result; the final word sequences are the parts extracted from the recognition result. All of the words are Japanese words expressed in Roman characters, and the words or sentences in brackets are their English translations.</Paragraph> <Section position="1" start_page="27" end_page="28" type="sub_section"> <SectionTitle> 4.1 Insertion errors </SectionTitle> <Paragraph position="0"> Filled-pauses, e.g., &quot;umm&quot; or &quot;well&quot;, are often spoken in spontaneous speech. 
Many speech recognition systems treat filled-pauses as recognized words. Many Japanese filled-pauses consist of only one phoneme, e.g., &quot;e&quot;, &quot;q&quot;, or &quot;'n&quot;, and it is easy for mismatches with parts of other words to occur. Furthermore, filled-pauses have no strong relation to any other words, and it is difficult to constrain them within an N-gram framework. These are the reasons why insertion errors of filled-pauses are often found in misrecognized results.</Paragraph> <Paragraph position="1"> Table 3 is an example of insertion errors caused by filled-pauses. For this example, a structural analysis of the whole sentence failed. However, the parts before and after the filled-pauses, /deNwa(telephone) baNgou(number) wa/ and /go(five) ni(two) nana(seven)/, could be extracted as correct parts. The two words /kyuu(nine)/ and /desu(is)/ could not be extracted because the part /kyuu desu/ included only two words.</Paragraph> </Section> <Section position="2" start_page="28" end_page="28" type="sub_section"> <SectionTitle> 4.2 Errors at the tail parts of sentences </SectionTitle> <Paragraph position="0"> For an indirect expression or an honorific expression, several function words are often spoken successively at the final part of the sentence. Misrecognition often occurs in this part. When the words necessary for understanding an utterance have been spoken before the final part, it is possible to translate into an understandable sentence by extracting only the beginning parts. Table 4 shows an example of an error occurring at the final part /N desu keredomo/. The part /N desu keredomo/ is part of an honorific expression and all of the words in this part are function words. The proposed extraction selects only the beginning part /heya no yoyaku wo onegai sitai(would like to reserve a room)/. 
The translation result is a little strange, but it can be understood and almost has the correct meaning.</Paragraph> <Paragraph position="1"> Actually, only /I/ could not be translated, because the misrecognized part /N desu keredomo/ included a keyword that determines the person.</Paragraph> </Section> <Section position="3" start_page="28" end_page="28" type="sub_section"> <SectionTitle> 4.3 Strange expressions consisting of over N words </SectionTitle> <Paragraph position="0"> Table 5 shows an example of a strange expression consisting of over N words. In this example, no word pair is strange, because all word pairs have already been constrained by bi-gram modeling. But the expression consisting of three words, i.e., /oyako(parent and child) no gokibou(preference)/, is strange. The part /oyako no/ can be said to be an erroneous part because it can be connected to other parts and consists of only two words.</Paragraph> </Section> <Section position="4" start_page="28" end_page="29" type="sub_section"> <SectionTitle> 4.4 Expressions not similar to examples </SectionTitle> <Paragraph position="0"> An important merit of the example-based approach is that structural or semantic ambiguity can be reduced by considering the similarity to examples. In the recognition result shown in Table 6, the part /ii(am)/ was misrecognized as /i(stay)/. But the misrecognized result /Suzuki Naoko to i masu (I am staying with Suzuki Naoko)/ is very natural in general. It therefore seems that a CFG can parse this erroneous sentence without any problem, and the sentence can be understood, although with a different meaning (/I am staying with Suzuki Naoko/, which differs from the correct meaning /I am Suzuki Naoko/). However, this expression is rare in a travel arrangement corpus, and the semantic distance value of the whole sentence exceeds the threshold. 
As a result of CPE, only /Suzuki Naoko/ can be extracted and translated to /Naoko Suzuki/.</Paragraph> </Section> <Section position="5" start_page="29" end_page="29" type="sub_section"> <SectionTitle> 4.5 An utterance including several sentences </SectionTitle> <Paragraph position="0"> Even if a recognition result is correct, when one utterance includes several sentences, TDMT without CPE sometimes fails because the boundary between the sentences cannot be identified, for example, /waka ri masi ta (I see). doumo arigatou (Thank you)/. Though the translation fails without CPE, CPE can extract each sentence one by one, and the translation result after CPE is correct.</Paragraph> </Section> <Section position="6" start_page="29" end_page="29" type="sub_section"> <SectionTitle> 4.6 Expressions on which CPE has a bad effect </SectionTitle> <Paragraph position="0"> The keywords that determine whether a sentence is negative or positive, or whether a sentence is interrogative or affirmative, are often spoken at the final part of the sentence. When these keywords are misrecognized, the translation result is quite different from the correct one. The input sentence in Table 7 is a negative sentence. The keyword determining that the sentence is negative is /naku/, but it is misrecognized. As a result, the translation after CPE produces a positive sentence whose meaning is opposite to the intended one.</Paragraph> </Section> </Section> </Paper>
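The discussion above refers informally to CPE's two extraction constraints: a part is kept only if its semantic distance to stored examples stays under a threshold, and only if it contains enough words (e.g., /kyuu desu/ was dropped for having only two words). The following is a minimal, hypothetical sketch of that idea, not the authors' implementation; `toy_distance` (an out-of-vocabulary ratio over a made-up example lexicon) merely stands in for the paper's semantic distance calculation, and the greedy left-to-right windowing is an assumption.

```python
# Hedged sketch of Correct Parts Extraction (CPE). Assumptions (not from the
# paper): a pluggable distance() scorer, greedy left-to-right growth of parts,
# and toy_distance as a stand-in for the real semantic distance calculation.
from typing import Callable, List


def extract_correct_parts(
    words: List[str],
    distance: Callable[[List[str]], float],
    threshold: float = 0.2,
    min_words: int = 3,
) -> List[List[str]]:
    """Grow a contiguous part word by word; close it when adding the next
    word would push its distance to or above the threshold, and keep only
    closed parts with at least min_words words."""
    parts: List[List[str]] = []
    current: List[str] = []
    for w in words:
        candidate = current + [w]
        if distance(candidate) < threshold:
            current = candidate
        else:
            if len(current) >= min_words:
                parts.append(current)
            # restart from w alone if w itself is acceptable, else skip it
            current = [w] if distance([w]) < threshold else []
    if len(current) >= min_words:
        parts.append(current)
    return parts


def toy_distance(part: List[str]) -> float:
    """Fraction of words outside a tiny invented example lexicon."""
    known = {"deNwa", "baNgou", "wa", "go", "ni", "nana"}
    return sum(1 for w in part if w not in known) / len(part)


# Mimicking the Table 3 situation: a filled-pause /e/ inserted between
# /deNwa baNgou wa/ and /go ni nana/ splits the input into two kept parts.
words = ["deNwa", "baNgou", "wa", "e", "go", "ni", "nana"]
print(extract_correct_parts(words, toy_distance))
# -> [['deNwa', 'baNgou', 'wa'], ['go', 'ni', 'nana']]
```

With `min_words=4` the same input yields no parts at all, which mirrors how short but correct fragments such as /kyuu desu/ are discarded by the length constraint.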