<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2904">
  <Title>Scoring Algorithms for Wordspotting Systems</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
6 Results
</SectionTitle>
    <Paragraph position="0"> The experiments for this algorithm were conducted usingthe Nexidiawordspottingsystemtrained onbroadcast quality North American English speech. The effect of using different scoring algorithms was accomplished using a nine hour subset of the HUB-4 1996 North American Englishbroadcastcorpus. Thisdatawaschosensincethis corpus is widely available and is disjoint from the training data used for the wordspotter. From this corpus, 8500 searchtermswererandomlyselectedfromthetranscripts.</Paragraph>
    <Paragraph position="1"> These queries were equally distributed in length from 4 to 20 phonemes, and then split into a testing and training set. For each search term, results ranging from the top score down to the 90th false alarm were collected. The results from the training terms were then used to train the score models using both the EM algorithm and a Gibbs sampler.</Paragraph>
    <Paragraph position="2"> These trained models were then then used to generate both FB and FC for all of the test queries. In addition, the &amp;quot;Standard&amp;quot; scores were generated. These scores are what the Nexidia wordspotting product reveals to the users, and are calculated by scaling the raw scores by the number of phonemes and mapping these from zero to one.</Paragraph>
    <Paragraph position="3"> The resulting scores from these tests are listed in Table 1. As expected, the CFAR based score performed well on the KS metric, while the Bayesian score was more accurate on the B measure. Both of these methods performed much better than the previous ad-hoc &amp;quot;Standard&amp;quot; method. However, performance improvements on one measure resulted in very poor scores on the other.</Paragraph>
    <Paragraph position="4"> This is due to the fact that the objective of each measure is very different. In addition, the estimation scheme had little effect on the overall scores. Since the EM algorithm requires a small fraction of the computation that the Gibbs sampler requires, this method is preferable.</Paragraph>
    <Paragraph position="5">  To illustrate the differences between the three scoring algorithms, the hits and misses were also collected and plotted in Figure 6. In each subplot, there are histograms of the hits and misses. In all three cases, most of the hits tend to have scores close to one. However, the misses in the standard scoring scheme are concentrated from 0.5 to 0.8. When the Bayes scoring method is used, half of the hits are very close to 1.0, while half of the misses are very close to 0.0. The other half of the scores are distributed along the score range. Finally, the misses from the CFAR scoring algorithm are distributed evenly along entire range of scores. Because the normal score assumptiondoesnotstrictlyhold,thisdistributionisnotperfectly null flat at the start and the end, but it is fairly close.</Paragraph>
  </Section>
class="xml-element"></Paper>