
<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-2012">
  <Title>DETECTION OF AGREEMENT vs. DISAGREEMENT IN MEETINGS: TRAINING WITH UNLABELED DATA</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
3 Results and Discussion
</SectionTitle>
    <Paragraph position="0"> Hand-labeled data from one meeting is held out as test data, and the hand-labeled subsets of three other meetings are used for training decision trees. Unlabeled spurts taken from six meetings, all different from the test meeting, are used for unsupervised training. Performance is measured in terms of overall 3-way classification accuracy, merging the backchannel and agreement classes. The overall accuracy results can be compared to the &quot;chance&quot; rate of 50%, since testing is on 4-way upsampled data.</Paragraph>
    <Paragraph position="1"> In addition, we report the confusion rate between agreements and disagreements and their recovery (recall) rate, since these two classes are the most important for our application. Results are presented in Table 1 for models using only word-based cues. The simple keyword indicators used in a decision tree give the best performance on hand-transcribed speech, but performance degrades dramatically on ASR output (with WER &gt; 45%). For all other training conditions, the degradation in performance for the system based on ASR transcripts is not as large, though still significant. The system using unsupervised training clearly outperforms the system trained only on a small amount of hand-labeled data. Interestingly, when the keywords are used in combination with the language model, they do provide some benefit in the case where the system uses ASR transcripts.</Paragraph>
    <Paragraph position="2"> The results in Table 2 correspond to models using only prosodic cues. When these models are trained on only a small amount of hand-labeled data, the overall accuracy is similar to that of the keyword-based system operating on the ASR transcript. Performance is somewhat better than chance, and the use of hand vs. ASR transcripts (and associated word alignments) has little impact. There is a small gain in accuracy, but a large gain in agree/disagree recovery, from using the data labeled via the unsupervised language model clustering technique. Unfortunately, when the prosodic features are combined with the word-based features, there is no performance gain, even in the case of errorful ASR transcripts.</Paragraph>
  </Section>
</Paper>