File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/88/c88-1040_intro.xml
Size: 2,934 bytes
Last Modified: 2025-10-06 14:04:39
<?xml version="1.0" standalone="yes"?> <Paper uid="C88-1040"> <Title>Robust parsing of severely corrupted spoken utterances</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The problem addressed by this paper is how to make a speech understanding system deal with sentences for which some types of words are not recognized.</Paragraph> <Paragraph position="1"> The continuous speech understanding system under development at CSELT laboratories [Fissore 88] is part of a question-answering system that allows information to be extracted from a database using voice messages with high syntactic freedom. The system is composed of a recognition stage [Laface 87] followed in cascade by an understanding stage. The recognition stage analyzes speech using acoustic-phonetic knowledge. Since utterances are spoken without pauses between words, it is not possible to uniquely locate words without using syntactic and semantic constraints. Thus the actual output of the recognition stage is a set of word hypotheses, usually called a lattice in the literature. A word hypothesis is characterized by its begin and end times, corresponding to the portion of the utterance in which it has been located, by a score representing its degree of belief, and by the lexeme itself. The understanding stage has the task of analyzing the word lattice using linguistic knowledge and producing a representation of the meaning of the most likely consistent word sequence.</Paragraph> <Paragraph position="2"> A two-stage approach to speech understanding offers several advantages and is the most widely followed in current research. A serious difficulty, however, lies in the fact that some short words that were actually uttered are often not detected by the recognition level and hence are missing from the lattice. 
To cope with this problem the understanding stage must adopt a language representation and a parsing strategy that 1) whenever possible, do not rely on such words to understand a sentence, and 2) keep parsing efficiency comparable to the case in which no word is missing. This paper describes a technique for obtaining such results. The remainder of the paper is divided into four sections. The next one focuses on the various implications of word undetection for linguistic processing. Then the linguistic knowledge bases of the understanding system and the parsing strategy are outlined (assuming that all words are present in the lattice). Next the technique for coping with missing words is introduced. Finally, experimental results are discussed, showing that the proposed technique greatly increases the proportion of corrupted sentences that can be correctly understood without appreciably decreasing parsing efficiency. A discussion is also provided relating our results to other work addressing similar problems.</Paragraph> </Section> </Paper>