<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1045">
<Title>Data-driven Generation of Emphatic Facial Displays</Title>
<Section position="9" start_page="358" end_page="359" type="concl">
<SectionTitle> 8 Conclusions and Future Work </SectionTitle>
<Paragraph position="0"> In this paper, we have demonstrated that there are patterns in the facial displays that this speaker used when giving different types of object descriptions in the COMIC system. The findings from the corpus analysis are compatible with previous findings on emphatic facial displays in general, and also provide a fine-grained analysis of the individual displays used by this speaker. Basing the recording scripts on the output of the presentation planner allowed full contextual information to be included in the annotated corpus; indeed, all of the contextual factors were found to influence the speaker's use of facial displays. We have also shown that a generation system that captures and reproduces the corpus patterns for a synthetic head can produce successful output. The results of the evaluation also demonstrate that female subjects are more receptive than male subjects to variation in facial displays; in combination with other related results, this indicates that expressive conversational agents are more likely to be successful with female users, regardless of the gender of the agent. Finally, we have shown the potential drawbacks of using a corpus to evaluate the output of a generation system.</Paragraph>
<Paragraph position="1"> There are three directions in which the work described here can be extended: improved corpus annotation, more sophisticated implementations, and further evaluations. First, the annotation on the corpus that was used here was done by a single annotator, in the context of a specific generation task.</Paragraph>
<Paragraph position="2"> The findings from the corpus analysis generally agree with those of previous studies (e.g., the predicted pitch accent was correlated with nodding and eyebrow raises), and the corpus as it stands has proved useful for the task for which it was created. However, to get a more definitive picture of the patterns in the corpus, it should be re-annotated by multiple coders, and the inter-annotator agreement should be assessed. Possible extensions to the annotation scheme include timing information for the words and facial displays, and actual (as opposed to predicted) prosodic contours.</Paragraph>
<Paragraph position="3"> In the implementation described here, we built simple models based directly on the corpus counts and used them to select facial displays to accompany previously-generated text; both of these aspects of the implementation can be extended in future. If we build more sophisticated n-gram-based models of the facial displays, using a full language-modelling toolkit, we can take into account contextual information from words other than those in a single segment, and back off smoothly through different amounts of context.</Paragraph>
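<Paragraph> To make this direction concrete, the Python sketch below is an illustrative aside rather than the implementation used in COMIC or the output of any particular language-modelling toolkit. It shows one way a count-based display model with simple back-off could be structured: a facial display is chosen from the counts observed with the longest available context of preceding words, falling back to shorter contexts and finally to a unigram distribution when the data is sparse. The corpus format, display labels, and default label are hypothetical. </Paragraph>

    # Minimal sketch (assumed data format): predict a facial display for the
    # current word from counts over preceding-word contexts, backing off from
    # the longest observed context down to a unigram distribution.
    from collections import defaultdict, Counter

    class BackoffDisplayModel:
        def __init__(self, max_order=3):
            self.max_order = max_order
            # counts[n] maps a tuple of n preceding words to a Counter over
            # the facial-display labels observed with that context.
            self.counts = {n: defaultdict(Counter) for n in range(max_order)}

        def train(self, corpus):
            """corpus: list of sentences, each a list of (word, display) pairs."""
            for sentence in corpus:
                words = [w for w, _ in sentence]
                for i, (_, display) in enumerate(sentence):
                    for n in range(self.max_order):
                        context = tuple(words[max(0, i - n):i])
                        if len(context) == n:  # only full-length contexts
                            self.counts[n][context][display] += 1

        def predict(self, preceding_words):
            """Back off from the longest available context to the unigram model."""
            for n in range(self.max_order - 1, -1, -1):
                context = tuple(preceding_words[-n:]) if n else ()
                dist = self.counts[n].get(context)
                if dist:
                    return dist.most_common(1)[0][0]
            return "neutral"  # hypothetical default when nothing has been seen

    # Hypothetical usage with made-up display labels such as "nod" and "brow-raise".
    model = BackoffDisplayModel()
    model.train([[("this", "neutral"), ("design", "nod"), ("is", "neutral"),
                  ("very", "brow-raise"), ("modern", "nod")]])
    print(model.predict(["is", "very"]))  # prints "nod" for this toy corpus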
<Paragraph position="4"> Such models can also be integrated directly into the OpenCCG surface realiser (White, 2005), which is already used as part of the COMIC output-generation process and which uses n-grams to guide its search for a good realisation.</Paragraph>
<Paragraph position="5"> This will allow the system to choose the text and facial displays in parallel rather than sequentially.</Paragraph>
<Paragraph position="6"> Such an integrated implementation has a better chance of capturing the complex interactions between the two output channels.</Paragraph>
<Paragraph position="7"> Future evaluations should address several questions. First, we should gather users' opinions of the behaviours annotated in the corpus: it may be that subjects actually prefer the generated facial displays to the displays in the corpus, as was found by Belz and Reiter (2006). In addition, further studies should look in more detail at the exact nature of the gender effect on user preferences, for instance by systematically varying the motion on different dimensions individually to see exactly which types of facial displays are liked and disliked by different demographic groups. Finally, if the extended n-gram-based model mentioned above is implemented, its performance should be measured and compared to that of the models described here, through both cross-validation and user studies.</Paragraph>
</Section>
</Paper>