<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0701">
<Title>Combining Semantic and Temporal Constraints for Multimodal Integration in Conversation Systems</Title>
<Section position="4" start_page="1" end_page="1" type="ackno">
<SectionTitle>3 Discussion</SectionTitle>
<Paragraph position="0"> During the study, we collected 156 inputs. The system assigned a time stamp to each recognized word in the utterance and to each gesture. Figure 3 shows an example input consisting of two gestures and the speech utterance &quot;compare this house with this house&quot;. The first two lines represent the two gestures; each line records when the gesture started and ended, along with the selected objects and their probabilities.</Paragraph>
<Paragraph position="1"> These data showed how the speech and gestures were aligned (to millisecond accuracy), and they will help us further validate the temporal compatibility function used in the matching process.</Paragraph>
<Paragraph position="2"> We described an approach that uses a graph matching algorithm to combine semantic and temporal constraints for reference resolution. The study showed that this approach worked well (93% accuracy) when the referring expressions were correctly recognized by the ASR. In the future, we plan to incorporate spatial constraints.</Paragraph>
</Section>
</Paper>
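The temporal compatibility function itself is not defined in this excerpt. As a rough illustration only (not the authors' actual function), one plausible form scores a speech span and a gesture span as 1.0 when they overlap in time and decays with the gap between them otherwise; the names `TimedEvent` and `temporal_compatibility`, and the exponential decay, are assumptions for this sketch:

```python
import math
from dataclasses import dataclass


@dataclass
class TimedEvent:
    """A time-stamped input span (e.g., a recognized word or a gesture), in seconds."""
    start: float
    end: float


def temporal_compatibility(speech: TimedEvent, gesture: TimedEvent,
                           decay: float = 1.0) -> float:
    """Hypothetical compatibility score in (0, 1].

    Overlapping spans score 1.0; otherwise the score decays
    exponentially with the gap between the two spans.
    """
    # Gap is zero when the intervals overlap, else the distance between them.
    gap = max(0.0, max(speech.start, gesture.start) - min(speech.end, gesture.end))
    return math.exp(-decay * gap)
```

A score like this could then serve as an edge weight in the graph matching step, favoring pairings of referring expressions with temporally close gestures.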