File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/94/h94-1041_evalu.xml
Size: 10,504 bytes
Last Modified: 2025-10-06 14:00:13
<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1041"> <Title>PREDICTING AND MANAGING SPOKEN DISFLUENCIES DURING HUMAN-COMPUTER INTERACTION*</Title> <Section position="5" start_page="222" end_page="224" type="evalu"> <SectionTitle> 2.2. Results </SectionTitle> <Paragraph position="0"> Figure 1 summarizes the percentage of all spoken and written disfluencies representing different categories during communication of verbal-temporal content (i.e., studies 1 and 2). However, when people communicated digits (i.e., study 3), disfluencies representing the different categories were distributed differently. Filled pauses dropped from 46% to 15.5% of all observed disfluencies. In contrast, content corrections of digits increased from 25% to 34%, repetitions increased from 21% to 31.5%, and false starts increased from 8% to 19% of all disfluencies. This drop in filled pauses and increase in other types of disfluency is most likely related to the much briefer utterance lengths observed during the computational-numeric tasks. Clearly, the relative distribution of different types of disfluency fluctuates with the content and structure of the information presented.</Paragraph> <Paragraph position="1"> The overall baseline rate of spontaneous disfluencies and self-corrections was 1.33 per 100 words in the verbal-temporal simulations, or a total of 1.51 disfluencies per task set. The rate per condition ranged from an average of 0.78 per 100 words when speaking to a form, through 1.17 when writing to a form and 1.61 during unconstrained writing, to a high of 1.74 during unconstrained speech. Figure 2 illustrates this rate of disfluencies as a function of mode and format.</Paragraph> <Paragraph position="2"> Wilcoxon Signed Ranks tests revealed no significant modality difference in the rate of disfluent input, which averaged 1.26 per 100 words for speech and 1.39 for writing, T+ = 75 (N = 17), z < 1. 
However, the rate of disfluencies was 1.68 per 100 words in the unconstrained format, in comparison with a reduced 0.98 per 100 words during form-based interactions. Follow-up analyses revealed no significant difference in the disfluency rate between formats when people wrote, T+ = 64.5 (N = 14), p > .20. However, significantly increased disfluencies were evident in the unconstrained format compared to the form-based one when people spoke, T+ = 88 (N = 14), p < .015, one-tailed. This significant elevation was replicated for unconstrained speech that occurred during the free choice condition, T+ = 87 (N = 14), p < .015, one-tailed, which simulated a multimodal spoken exchange rather than a unimodal one.</Paragraph> <Paragraph position="3"> A very similar pattern of disfluency rates per condition emerged when people communicated digits. In study 3, the baseline rate of spontaneous disfluencies averaged 1.37 per 100 words, with 0.87 when speaking to a form, 1.10 when writing to a form, 1.42 during unconstrained writing, and a high of 1.87 during unconstrained speech. Likewise, Wilcoxon Signed Ranks tests revealed no significant difference in the disfluency rate between formats when people wrote, T+ = 36.5 (N = 11), p > .20, although significantly increased disfluencies again were apparent in the unconstrained format compared to the form-based one when people spoke, T+ = 77 (N = 13), p < .015, one-tailed.</Paragraph> <Paragraph position="4"> [Figure caption fragment: "... words as a function of structure in presentation format."]</Paragraph> <Paragraph position="5"> For studies 1 and 2, disfluency rates were examined further for specific utterances that were graduated in length from 1 to 18 words. First, these analyses indicated that the average rate of disfluencies per 100 words increased as a function of utterance length for spoken disfluencies, although not for written ones. 
When the rate of spoken disfluencies was compared for short (1-6 words), medium (7-12 words), and long utterances (13-18 words), it increased from 0.66, to 2.14, to 3.80 disfluencies per 100 words, respectively. Statistical comparisons confirmed that these rates represented significant increases from short to medium sentences, t = 3.09 (df = 10), p < .006, one-tailed, and also from medium to long ones, t = 2.06 (df = 8), p < .04, one-tailed.</Paragraph> <Paragraph position="6"> A regression analysis indicated that the strength of predictive association between utterance length and disfluency rate was $\rho^2_{XY} = .77$ (N = 16). That is, 77% of the variance in the rate of spoken disfluencies was predictable simply by knowing an utterance's specific length. The following simple linear model, illustrated in the scatterplot in Figure 3, summarizes this relation: $Y_{ij} = \mu_Y + \beta_{Y \cdot X}(X_{ij} - \mu_X) + e_{ij}$, with a Y-axis constant coefficient of -0.32, and an X-axis beta coefficient representing utterance length of +0.26. These data indicate that the demands associated with planning and generating longer constructions lead to substantial elevations in the rate of disfluent speech.</Paragraph> <Paragraph position="7"> To assess whether presentation format had an additional influence on spoken disfluency rates beyond that of utterance length, comparisons were made of disfluency rates occurring in unconstrained and form-based utterances matched for length. These analyses revealed that the rate of spoken disfluencies also was significantly higher in the unconstrained format than in form-based speech, even with utterance length controlled, t (paired) = 2.42 (df = 5), p < .03, one-tailed. That is, independent of utterance length, lack of structure in the presentation format also was associated with elevated disfluency rates.</Paragraph> <Paragraph position="8"> From a pragmatic viewpoint, it also is informative to compare the total number of disfluencies that would require processing during an application. 
Different design alternatives can be compared with respect to effective reduction of total disfluencies, which then would require neither processing nor repair. In studies 1 and 2, a comparison of the total number of spoken disfluencies revealed that people averaged 3.33 per task set when using the unconstrained format, which reduced to an average of 1.00 per task set when speaking to a form. That is, 70% of all disfluencies were eliminated by using a more structured form. Likewise, in study 3, the average number of disfluencies per subject per task set dropped from 1.75 in the unconstrained format to 0.72 in the structured one. In this simulation, a more structured presentation format successfully eliminated 59% of people's disfluencies as they spoke digits, in comparison with the same people completing the same tasks via an unconstrained format.</Paragraph> <Paragraph position="9"> During post-experimental interviews, people reported their preference to interact with the two different presentation formats. Results indicated that approximately two-thirds of the subjects preferred using the more structured format. This 2-to-1 preference for the structured format replicated across both the verbal and numeric simulations.</Paragraph> </Section> <Section position="6" start_page="224" end_page="224" type="evalu"> <SectionTitle> 3. EXPERIMENTS ON HUMAN-HUMAN SPEECH </SectionTitle> <Paragraph position="0"> This section reports on data that were analyzed to explore the degree of variability in disfluency rates among different types of human-human and human-computer spoken interaction, and to determine whether these two classes differ systematically.</Paragraph> <Section position="1" start_page="224" end_page="224" type="sub_section"> <SectionTitle> 3.1. 
Method </SectionTitle> <Paragraph position="0"> Data originally collected by the author and colleagues during two previous studies were reanalyzed to provide comparative information on human-human disfluency rates for the present research [1, 6, 7]. One study focused on telephone speech, providing data on both: (1) two-person telephone conversations, and (2) three-person interpreted telephone conversations, with a professional telephone interpreter intermediating. Methodological details of this study are provided elsewhere [7]. Essentially, within-subject data were collected from 12 native speakers while they participated in task-oriented dialogues about conference registration and travel arrangements. In the second study, also outlined elsewhere [1, 6], speech data were collected on task-oriented dialogues conducted in each of five different communication modalities.</Paragraph> <Paragraph position="1"> For the present comparison, data from two of these modalities were reanalyzed: (1) two-party face-to-face dialogues, and (2) single-party monologues into an audiotape machine.</Paragraph> <Paragraph position="2"> A between-subject design was used, in which 10 subjects described how to assemble a water pump. All four types of speech were reanalyzed from tape-recordings for the same categories of disfluency and self-correction as those coded during the simulation studies, and a rate of spoken disfluencies per 100 words was calculated.</Paragraph> </Section> <Section position="2" start_page="224" end_page="224" type="sub_section"> <SectionTitle> 3.2. Comparative Results </SectionTitle> <Paragraph position="0"> Table 1 summarizes the average speech disfluency rates for the four types of human-human and two types of human-computer interaction that were studied. 
Disfluency rates for each of the two types of human-computer speech are listed in Table 1 for verbal-temporal and computational-numeric content, respectively, and are corrected for number of syllables per word. All samples of human-human speech reflected substantially higher disfluency rates than human-computer speech, with the average rates for these categories confirmed to be significantly different, t = 5.59 (df = 38), p < .0001, one-tailed. Comparison of the average disfluency rate for human-computer speech with human monologues, the least discrepant of the human-human categories, also replicated this difference, t = 2.65 (df = 21), p < .008, one-tailed. The magnitude of this disparity ranged from 2-to-11-times higher disfluency rates for human-human as opposed to human-computer speech, depending on the categories compared.</Paragraph> <Paragraph position="1"> Further analyses indicated that the average disfluency rate was significantly higher during telephone speech than the other categories of human-human speech, t = 2.12 (df = 20), p < .05, two-tailed.</Paragraph> </Section> </Section> </Paper>