<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-2007">
  <Title>Active Learning for Classifying Phone Sequences from Unsupervised Phonotactic Models</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Evaluation metrics
</SectionTitle>
      <Paragraph position="0"> We are interested in comparing the performance for a given amount of labeling effort of classifiers trained on random selection of examples with that of classifiers trained on examples chosen according to the confidence-based method described in section 3.</Paragraph>
      <Paragraph position="1"> The basic measurements are: A(e): the classification accuracy at a given labeling effort level e of the classifier trained on actively selected labeling examples.</Paragraph>
      <Paragraph position="2"> R(e): the classification accuracy at a given labeling effort leveleof the classifier trained on randomly selected labeling examples.</Paragraph>
      <Paragraph position="3"> A 1(R(e)): the effort required to achieve the performance of random selection at effort e, using active learning. null Derived from these is the main comparison we are in-</Paragraph>
      <Paragraph position="5"> effort that would be required to achieve the performance of random selection at effort e, actually required using active learning: that is, low is good.</Paragraph>
      <Paragraph position="6"> We use two metrics for labeling effort: the number of utterances to be labeled and the number of phones in those utterances. The number of phones is indicative of the length of the audio file that must be listened to in order to make the class label assignment, so this is relevant to assessing just how much real effort is saved by any active learning technique.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Results
</SectionTitle>
      <Paragraph position="0"> Table 1 gives the results for selected levels of labeling effort in the HMIHY domain, calculated in terms of number of utterances labeled.</Paragraph>
      <Paragraph position="1"> These results suggest that we can achieve the same accuracy as random labeling with around 60% of the effort by active selection of examples according to the confidence-based method described in section 3.</Paragraph>
      <Paragraph position="2"> However, a closer inspection of the chosen examples reveals that, on average, the actively selected utterances are nearly 1.5 times longer than the random selection in terms of number of phones. (This is not suprising given that the classification method performs much worse on longer utterances, and the confidence levels reflect this.) In order to overcome this we introduce as part of the selection criteria a length limit of 50 phones. This allows us to retain appreciable effort savings as shown in table 2.</Paragraph>
      <Paragraph position="3"> The TTSHD application is considerably less complex than HMIHY, and this may be reflected in the greater savings obtained using active learning. Tables 3 and 4 show the corresponding results for this domain.</Paragraph>
      <Paragraph position="4"> There is also a smaller variation in utterance length between actively and randomly selected training examples (more like 110% than the 150% for HMIHY); table 4 shows that defining effort in terms of number of phones still results in appreciable savings for active learning. (In null corporating a length limit gave little additional benefit here.)</Paragraph>
    </Section>
  </Section>
</Paper>