<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1057">
  <Title>A Dynamic Language Model for Speech Recognition</Title>
  <Section position="3" start_page="293" end_page="294" type="metho">
    <SectionTitle>
CACHE LANGUAGE MODEL
</SectionTitle>
    <Paragraph position="0"> Using a window of the n most recent words, we can estimate a unigram frequency distribution f,(w,+l), a bigram frequency distribution, fn(w,/l I w,), and a trigram frequency distribution, f,(w,+l } w,,w,-1).</Paragraph>
    <Paragraph position="1"> The resulting 3 dynamic estimators are linearly smoothed together to obtain a dynamic trigram model denoted by pc,(w,+l I w,., w,-a). The dynamic trigram model assigns a non-zero probability for the words that have occurred in the window of the previous n words. Since the next word may not be in the cache and since the cache contains very few trigrams, we interpolate linearly the dynamic model with the the static trigram language model:</Paragraph>
    <Paragraph position="3"> where p,(...) is the usual static trigram language model.</Paragraph>
    <Paragraph position="4"> We use the forward-backward algorithm to estimate the interpolation parameter ),c \[1\]. This parameter varies between 0.07 and 0.28 depending on the particular static trigram language model (we used tri-gram language models estimated from different size corpora) and the cache size (varying from 200 to 1000 words.) We have evaluated this cache language model by computing the perplexity on three test sets: * Test sets A and B are each about 100k words of text that were excised from a corpus of documents from an insurance company that was used for building the static trigram language model for a 20,000-word vocabulary.</Paragraph>
    <Paragraph position="5"> * Test set C which consists of 7 documents (about 4000 words) that were dictated in a field trial in the same insurance company on TANGORA (the 20,000-word isolated word recognizer developed at IBM.) Table \] shows the perplexity of the static and dynamic language models for the three test sets. The cache size was 1000 words and was updated word synchronously. The static language model was estimated from about 1.2 million words of insurance documents. The dynamic language model yields from 8% to 23% reduction in perplexity, with the larger reduction occurring with the test sets with larger perplexity. The interpolation weight Ac was estimated using set B when testing on sets A and C and set. A when testing on set B. Table 2 shows the effect of cache size on perplexity where it appears that a larger cache is more useful. These results were on test set C. On test set C, the rate that the next word is in the cache ranges from 75% for a cache window of 200 words to 83% for a window of 1000. Table 3 compares a cache with unigrams only with a full trigram cache (for the trigram cache, the weights for the unigram, bigram, and trigram frequencies were 0.25, 0.25, 0.5 respectively and were selected by hand.) A second set of weights (0.25,0.5,0.25) produced a perplexity of 190 for the trigram cache. In all the above experiments, the cache was not flushed between documents. In the next section, we compare the different models in an isolated speech recognition experiment.</Paragraph>
    <Paragraph position="6"> We have tried using a fancier interpolation scheme where the reliance on the cache depends on the cur- null rent word wn with the expectation that some words will tend to be followed by bursty words whereas other words will tend to be followed by non-bursty words.</Paragraph>
    <Paragraph position="7"> We typically used about 50 buckets (or weighting parameters). However, we have found that the perplexity on independent data to be no better than the single parameter interpolation.</Paragraph>
  </Section>
  <Section position="4" start_page="294" end_page="294" type="metho">
    <SectionTitle>
ISOLATED SPEECH RECOGNITION
</SectionTitle>
    <Paragraph position="0"> We incorporated the cache language model into the TANGORA isolated speech recognition system.</Paragraph>
    <Paragraph position="1"> We evaluated two cache update strategies. In the first one, the cache is updated at the end of every utterance, i.e., when the speaker turns off the microphone. An utterance may be a partial sentence or a complete sentence or several sentences depending on how the speaker dictated the document. In the second strategy, the cache is updated as soon as the recognizer makes a decision about what was spoken. This typically corresponds to a delay of about 3 words. The cache is updated with the correct text which requires that the speaker correct any errors that may occur.</Paragraph>
    <Paragraph position="2"> This may be unduly difficult with the second update strategy. But in the context of our experiments, we have found that using the simpler (and more realistic) update strategy, i.e., after an utterance is completed, to be as effective as the more elaborate update strategy. null The TANGORA system uses a 20,000-word office correspondence vocabulary with a trigram language model estimated from a few hundred million words from several sources. The cache language model was tested on a set of 14 documents dictated by 5 speakers with an internal telephone system (private branch exchange.) The speakers were form the speech group typically dictating electronic mail messages or internal memoranda. The size of a document ranged from about 120 words to 800 words. The total test corpus was about. 5000 words. The maximum cache size (4000 words) was \]a.rger than any of the documents.</Paragraph>
    <Paragraph position="3"> In these tests, the cache is flushed at the beginning of each document.</Paragraph>
    <Paragraph position="4"> In these experiments, the weights for interpolating the dynamic unigram, bigram, and trigram hequencies were 0.4, 0.5, and 0.1, respectively. The weight of the cache probability, At, relative to the static trigram probability was 0.2. Small changes in this weight does not seem to affect recognition performance. The potential benefit of a cache depends on the amount of text that has been observed. Table 4 shows the percentage reduction in error rate as a function of the length of the observed text. We divided the documents into 100-word bins and computed the error rate in each bin. For the static language model, the error rate should be constant except for statistical fluctuations, whereas one expects that the error rate of the cache to decrease with longer documents. As can be seen from Table 4, the cache reduces the error rate by about 5% for shorter documents and up to 24% for longer documents. The trigram cache results in an average reduction in error rate of 10% for these documents whose average size is about 360 words. The trigram cache is very slightly better than a unigram cache eventhough the earlier results using perplexity as a measure of performance indicated a bigger difference between the two caches.</Paragraph>
  </Section>
class="xml-element"></Paper>