<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1085"> <Title>Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Corpus </SectionTitle>
<Paragraph position="0"> The ICSI Meeting corpus (Janin et al., 2003) is a collection of 75 meetings recorded at the International Computer Science Institute (ICSI), one of a growing number of corpora of multi-party human-to-human conversation. These are naturally occurring, regular weekly meetings of various ICSI research teams. Meetings generally run just under an hour each and have an average of 6.5 participants.</Paragraph>
<Paragraph position="1"> These meetings have been labeled with adjacency pairs (APs), which provide information about speaker interaction. Adjacency pairs reflect the structure of conversation as paired utterances such as question-answer and offer-acceptance, and we use their labeling to determine the addressees of agreements and disagreements. The annotation of the corpus with adjacency pairs is described in (Shriberg et al., 2004; Dhillon et al., 2004).</Paragraph>
<Paragraph position="2"> Seven of those meetings were segmented into spurts, defined as periods of speech containing no pause greater than 0.5 seconds, and each spurt was labeled with one of four categories: agreement, disagreement, backchannel, and other.1 We used spurt segmentation rather than sentence segmentation as our unit of analysis because our ultimate goal is a fully automated system, and in that respect spurt segmentation is easy to obtain. Backchannels (e.g. &quot;uhhuh&quot; and &quot;okay&quot;) were treated as a separate category, since listeners generally use them to indicate that they are following along, without necessarily indicating agreement.</Paragraph>
<Paragraph position="3"> The class proportions are as follows: 11.9% agreements, 6.8% disagreements, 23.2% backchannels, and 58.1% other. Inter-labeler reliability, estimated on 500 spurts with two labelers, was considered quite acceptable: the kappa coefficient was 0.63 (Cohen, 1960).</Paragraph> </Section> </Paper>
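The spurt definition above (stretches of speech with no internal pause longer than 0.5 seconds) can be sketched as a simple grouping pass over timed word intervals. This is an illustrative sketch, not the authors' implementation; the `(start, end, token)` input format and the function name are assumptions.

```python
# Hypothetical sketch of spurt segmentation: group a speaker's timed
# words into spurts, splitting wherever the silence between consecutive
# words exceeds the paper's 0.5-second pause threshold.

PAUSE_THRESHOLD = 0.5  # seconds, per the paper's spurt definition


def segment_spurts(words):
    """words: list of (start, end, token) tuples sorted by start time.

    Returns a list of spurts, each a list of tokens.
    """
    spurts = []
    current = []
    prev_end = None
    for start, end, token in words:
        # A gap longer than the threshold closes the current spurt.
        if prev_end is not None and start - prev_end > PAUSE_THRESHOLD:
            spurts.append(current)
            current = []
        current.append(token)
        prev_end = end
    if current:
        spurts.append(current)
    return spurts


# Example: the 0.8 s pause after "yeah" starts a new spurt.
words = [(0.0, 0.3, "yeah"), (1.1, 1.4, "i"), (1.5, 1.9, "agree")]
print(segment_spurts(words))  # [['yeah'], ['i', 'agree']]
```

Because the split decision depends only on inter-word silence, not on syntax, this kind of segmentation is straightforward to automate, which is the motivation the paper gives for preferring spurts over sentences.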
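The reported reliability figure is Cohen's kappa (Cohen, 1960), which corrects raw agreement between two labelers for agreement expected by chance. A minimal sketch of the statistic (the function name and toy labels are illustrative, not from the paper):

```python
# Sketch of Cohen's kappa for two labelers:
#   kappa = (p_o - p_e) / (1 - p_e)
# where p_o is observed agreement and p_e is chance agreement
# computed from each labeler's marginal label frequencies.
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    # Observed agreement: fraction of items both labelers tag identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: sum over classes of the product of marginals.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Toy example: two labelers tagging six spurts with the paper's classes.
a = ["agree", "agree", "other", "backchannel", "other", "disagree"]
b = ["agree", "other", "other", "backchannel", "other", "disagree"]
print(round(cohens_kappa(a, b), 2))  # 0.77
```

A kappa of 0.63, as reported for the 500 doubly labeled spurts, indicates substantial agreement beyond chance, which is why the labeling was considered acceptable.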