<?xml version="1.0" standalone="yes"?> <Paper uid="H89-2004"> <Title>Distinguishing Questions by Contour in Speech Recognition Tasks</Title> <Section position="2" start_page="22" end_page="25" type="abstr"> <SectionTitle> 2 Can We Use Intonational Information to Aid Speech Recognition? </SectionTitle> <Paragraph position="0"> Interest in using higher-level intonational information such as pitch contour, intonational phrasing, and pitch accent placement to aid speech recognition has been intermittent [Lea79, Pie83, Wai88]. Progress in this area has been hindered by a) the difficulty of reliably extracting higher-level intonational characteristics automatically; b) the lack of representations of the features to be extracted that would allow this information to be incorporated into the recognition process; and c) an imperfect understanding of the particular constraints that syntax, semantics, and discourse features impose on a speaker's choice of intonational features. Thus, practical problems of feature detection have gone hand in hand with more theoretical issues of representation and interpretation. However, there has been some progress in developing algorithms to extract and identify at least partial information about higher-level intonational features, such as differentiation of stressed and unstressed syllables and distinction of rising from falling contours.</Paragraph> <Paragraph position="1"> At this stage, it seems likely that particular recognition tasks and particular domains will find some higher-level intonational cues more useful than others. For testing the utility of predicting the syntactic 'type' of an utterance from its intonational contour, for example, domains in which there are broad classes of utterances that can be reliably partitioned according to both intonational and syntactic category appear promising. 
Database query tasks seem well suited to such an experiment: they offer a reasonable balance between inverted yes-no questions 2, which are commonly uttered with final rising intonation, and wh-questions 3 or imperatives 4, which are both commonly uttered with final fall, and they carry relatively little likelihood of speech act ambiguity. The DARPA Resource Management (RM) task thus seemed a good place to look for such distinctions.</Paragraph> <Paragraph position="2"> In domains such as this, we might expect that distinguishing likely yes-no questions from other sentences would be a useful augmentation to traditional recognition methodologies, acting as a filter on matches proposed by the recognizer or even providing an initial state in a regular grammar partitioned by broad syntactic 'type'. 5 The utility of adding such information is supported by certain classes of recognition errors, such as those illustrated in (2). 6 These errors represent instances in which the ability to distinguish yes-no questions intonationally from wh-questions, imperatives, and other sentence 'types' typically uttered with falling intonation might serve as an aid to recognition. (In each case, REF is the test sentence and HYP is the recognizer's hypothesis.)
(2) a. REF: IS kennedy+s arrival hour in pearl harbor AFTER ** fifteen hundred hours
       HYP: GIVE kennedy+s arrival hour in pearl harbor HAVE TO fifteen hundred hours
    b. REF: WHAT IS the total fuel aboard THE mars
       HYP: WAS ** the total fuel aboard *** mars
    c. REF: IS shasta within six kilometers of thirteen north forty east
       HYP: THE shasta within six kilometers of thirteen north forty east
    d. REF: WHEN+LL enterprise next be in home PORT
       HYP: WILL enterprise next be in home PORTS
    e. REF: *** FIND speeds available for england and fox
       HYP: ARE THE speeds available for england and fox
That is, the test sentence represents a sentence type likely to be uttered with an intonational contour different from that typical of the sentence type hypothesized by the recognizer. Among these errors, distinguishing between 'when'll' and 'will' and between 'what is' and 'was' would appear to be particularly difficult tasks for a recognizer on acoustic grounds. In fact, about 8% of sentence errors made in this test were due at least in part to one of these two confusions. Table 1 shows all sentence errors in the test run in which yes-no questions were confused with wh-questions, imperatives, or declarative sentences. 7 (Column 2 shows the category of the actual utterance; column 3 shows the category of the utterance recognized (yes-no question (ynq), wh-question (whq), imperative (imp), or declarative (decl)); and column 4 shows the lexical items confused.)
        actual  recognized  items confused
    104 ynq     decl        is → the
    241 whq     ynq         what is → was
    242 whq     ynq         when'll → will
    247 imp     ynq         find → are
    267 whq     ynq         what is → was
    272 whq     ynq         what is → was
    287 whq     ynq         what is → was
    292 whq     ynq         what is → was
    total sentences incorrect: 128
    total sentence type errors: 16
7 Note, of course, that some of the mistaken hypotheses were not in fact grammatical, such as (2c) and (2d) above, so the assignment of sentence 'type' was based upon possible completions of the longest initial grammatical string. So, 'WAS ** the date and hour of arrival in port r arkansas' was considered structurally a yes-no question.</Paragraph> <Paragraph position="3"> Of the 16 errors that contour type might have been able to prevent (on the assumption that yes-no questions should have been produced with rising intonation and other utterance types with falling intonation), 15 of the misrecognized utterances were in fact spoken with the 'likely' contour for their syntactic type. 
That is, in fifteen cases either a yes-no question uttered with rising intonation was misrecognized as a syntactic type (wh-question or imperative) unlikely to be uttered with rising intonation, or a non-yes-no question uttered with falling intonation was misrecognized as a yes-no question.</Paragraph> <Paragraph position="4"> However, while these errors might thus have been filtered by this simple association between contour and sentence type, it is not at all clear how well this solution would generalize, even to other sentences within the same domain. While yes-no questions are typically uttered with rising intonation in natural speech, and wh-questions and imperatives with utterance-final fall, it is not clear whether such distinctions appear with the same likelihood in sentences read in isolation, the kind of data on which most recognizers are trained and tested. To investigate the possibility of predicting structural distinctions from intonational ones, then, it is useful to examine the prosody of the training and test data itself.</Paragraph> </Section> </Paper>