<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0404">
  <Title>Correct parts extraction from speech recognition results using semantic distance calculation, and its application to speech translation</Title>
  <Section position="3" start_page="0" end_page="24" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In continuous speech recognition, N-grams have been widely used as effective linguistic constraints for spontaneous speech \[1\]. To reduce the search effort, N of a high-order can be quite powerful; but making the large corpus necessary to calculate a reliable high-order N is unrealistic. For a realistic linguistic constraint, almost all speech recognition systems use a low-order N-gram, like a bi-gram or tri-gram, which can be constrainted only to the local parts. However this is one of the reasons why many mis-recognized sentences using N-grams are strange on long parts spanning over N words. During *Now working at Toyo Information Systems Co., Ltd the recognition process, several candidates have to be pruned if the beam width is too small, and the pruning cannot but use only those local parts already recognized. Even if we could get a large enough corpus to train a high-order N-gram, it would be impossible to determine the best recognition candidate in consideration of the whole sentence. To put a speech dialogue system or a speech translation system into practical use, it is necessary to develop a mechanism that can parse the misrecognized results using global linguistic constraints.</Paragraph>
    <Paragraph position="1"> Several methods have already been proposed to parse ill-formed sentences or phrases using global linguistic constraints based on a contextfree-grammar (CFG) framework, and their effectiveness against some misrecognized speech sentences have been confirmed \[2, 3\]. Also these parsings are used for translation ( see for example the use of the GLR parser in Janus\[4\] ). In these studies, even if the parsing was unsuccessful for erroneous parts, the parsing could be continued by deleting or recovering the erroneous parts. The parsing was done on the assumption that every input sentence is well-formed after all erroneous parts are recovered. In reality, however spontaneous speech contains a lot of ill-formed sentences and it is difficult to analyze every spontaneous sentence by the CFG framework. Concerning the CFG framework, syntactic rules written by subtrees are proposed \[5\]. Even if a whole sentence can not be analyzed by CFG, the sentence can be expressed by combining several subtrees. The subtrees are effective in parsing spontaneous speech parts. Still, because the subtrees can deal only with local parts like in N-gram modeling basically, parsing is not sufficient for parsing misrecognized sentences. Furthermore, the subtrees are not sufficient in extracting suitable meaningful candidate structures, because that these linguistic constraints are based on the grammatical constraint without semantics. null  To parse misrecognized sentences of spontaneous speech, we propose a correct parts extraction (CPE) method that uses global linguistic and semantic c0nstraints by an example-based approach.</Paragraph>
    <Paragraph position="2"> In the next section, we describe the CPE method. In the following section, we show evaluation results of CPE applied to Japanese-to-English speech translation experiments.</Paragraph>
  </Section>
</Paper>