<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2315"> <Title>Speech Graffiti habitability: What do users really say?</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 3 Results </SectionTitle> <Paragraph position="0"> 82% (2987) of the utterances from the cleaned set were fully Speech Graffiti-grammatical. For individual users, grammaticality ranged from 41.1% to 98.6%, with a mean of 80.5% and a median of 87.4%. These averages are quite high, indicating that most users were able to learn and use Speech Graffiti reasonably well.</Paragraph> <Paragraph position="1"> The lowest individual grammaticality scores belonged to four of the six participants who preferred the natural language MovieLine interface to the Speech Graffiti one, which suggests that proficiency with the language is very important for its acceptance. Indeed, we found a moderate, significant correlation between grammaticality and user satisfaction, as shown in Fig. 1 (r = 0.60, p < 0.01). We found no similar correlation for the natural language interface, using a strict definition of grammaticality.</Paragraph> <Paragraph position="2"> Users' grammaticality tended to increase over time.</Paragraph> <Paragraph position="3"> For each subject, we compared the grammaticality of utterances from the first half of their session with that of utterances in the second half. All but four participants increased their grammaticality in the second half of their Speech Graffiti session, with an average relative improvement of 12.4%. A REML analysis showed this difference to be significant, F = 7.54, p < 0.02. Only one of the users who exhibited a decrease in grammaticality over time was from the group that preferred the natural language interface. 
However, although members of that group did tend to increase their grammaticality later in their interactions, none of their second-half grammaticality scores were above 80%.</Paragraph> <Paragraph position="4"> No significant effects on Speech Graffiti-grammaticality were found due to differences in CSE background, programming experience, training supervision or training domain. This last point suggests that it may not be necessary to design in-domain Speech Graffiti tutorials; instead, a single training application could be developed.</Paragraph> <Paragraph position="5"> The six users who preferred the natural language MovieLine generated 45.4% of the ungrammaticalities, further supporting the idea of language proficiency as a major factor for system acceptance.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Deviations from grammar </SectionTitle> <Paragraph position="0"> To help determine how users can be encouraged to speak within the grammar, we analyzed the ways in which they deviated from it in this experiment. We identified 14 general types of deviations from the grammar; Fig. 2 shows the distribution of each type. Four trivial deviation types (lighter bars in Fig. 2) that resulted from unintentional holes in our grammar coverage accounted for about 20% of the ungrammaticalities.</Paragraph> <Paragraph position="1"> When these trivial deviations are counted as grammatical, mean grammaticality rises to 85.5% and the median to 91.3%. However, we have not removed the trivial ungrammaticalities from our overall analysis since they are likely to have resulted in errors that may have affected user satisfaction. 
Each of the ten other deviation types is discussed in further detail below.</Paragraph> <Paragraph position="2"> General natural language syntax, 20.6%: Speech Graffiti requires input to have a "slot is value" phrase syntax for specifying and querying information. The most common type of deviation in the Speech Graffiti utterances involved a natural language (NL) deviation from this standard phrase syntax. For example, a correctly constructed Speech Graffiti query to find movie times at a theater might be "theater is Galleria, title is Sweet Home Alabama, what are the show times?" For errors in this category, users would instead make more NL-style queries, like "when is Austin Powers playing at Showcase West?"</Paragraph> <Paragraph position="3"> Slot only, 14.6%: In these cases, users stated a slot name without an accompanying value or query words. For example, a user might attempt to ask about a slot without using "what", as in "title is Abandon, show times". In about a third of slot-only instances, the ungrammatical input appeared to be an artifact of the options function, which lists slots that users can talk about at any given point; users would just repeat back a slot name without adding a value, confirming Brennan's (1996) findings of lexical entrainment.</Paragraph> <Paragraph position="4"> Out-of-vocabulary word, 14.0%: These were often movie titles that were not included in the database or synonyms for Speech Graffiti-grammatical concepts (e.g. "category" instead of "genre").</Paragraph> <Paragraph position="5"> Keyword problem, 8.1%: Participants used a keyword that was not part of the system (e.g. "clear") or they used an existing keyword incorrectly.</Paragraph> <Paragraph position="6"> Value only, 6.7%: Users specified a value (e.g. "comedy") without an accompanying slot name.</Paragraph> <Paragraph position="8"> Slot-value mismatch, 5.1%: Users paired slots and values that did not belong together. 
This often occurred when participants were working on tasks involving locating movies in a certain neighborhood. For instance, instead of stating "area is Monroeville", users would say "theater is Monroeville". Since the input is actually in the correct "slot is value" format, this type of ungrammaticality could perhaps be considered more of a disfluency than a true habitability problem.</Paragraph> <Paragraph position="9"> Disfluency, 4.3%: This category includes utterances where the parser failed because of disfluent speech, usually repeated words. 81% of the utterances in this category were indeed grammatical when stripped of their disfluencies, but we prefer to leave this category as a component of the non-trivial deviations in order to account for the unpredictable disfluencies that will always occur in interactions.</Paragraph> <Paragraph position="10"> More syntax, 4.0%: This is a special case of a keyword problem in which participants misused the keyword "more" by pairing it with a slot name (e.g. "theater, more") rather than using it to navigate through a list.</Paragraph> <Paragraph position="11"> Time syntax, 1.3%: In this special case of natural language syntax ungrammaticality, users created time queries that were initially well-formed but which had time modifiers appended to the end, as in "what are show times after seven o'clock?" Value + options, 1.1%: In grammatical usage, the keyword "options" can be used either independently (to get a list of all available slots) or paired with a slot (to get a list of all appropriate values for that slot). In a few cases, users instead used "options" with a value, as in "Squirrel Hill options".</Paragraph> </Section> </Section> </Paper>