<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-1012">
  <Title>THE BBN SPOKEN LANGUAGE SYSTEM</Title>
  <Section position="16" start_page="109" end_page="109" type="concl">
    <SectionTitle>
INTEGRATED SYSTEM PERFORMANCE
</SectionTitle>
    <Paragraph position="0"> In this section, we present results for HARC on the standard DARPA 1000-Word Resource Management speech database (Price, et al. (1988)), with 600 sentences (about 30 minutes) of training speech used to train the acoustic models for each speaker. For these experiments, speech was sampled at 20 kHz, and 14 Mel-frequency cepstral coefficients (MFCC), their derivatives (DMFCC), plus power (R0) and the derivative of power (DR0) were computed every 10 ms, using a 20 ms analysis window. Three separate 8-bit codebooks were created, one for each of the three sets of parameters, using K-means vector quantization (VQ).</Paragraph>
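The K-means codebook training step can be sketched as follows. This is a minimal illustration, not BBN's implementation; the codebook size is shrunk from 256 (8-bit) to 4 centroids for brevity, and the random "frames" stand in for real cepstral vectors.

```python
import numpy as np

def kmeans_codebook(frames, k=4, iters=20, seed=0):
    """Train a VQ codebook with plain K-means (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen frames.
    codebook = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned frames.
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = frames[labels == j].mean(axis=0)
    return codebook

def quantize(frames, codebook):
    """Map each frame to the index of its nearest codeword."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

In the setup above, this training would be run three times, once per parameter set (MFCC, DMFCC, and the power terms), yielding the three separate codebooks.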
    <Paragraph position="1"> The experiments were conducted using the multi-codebook paradigm in the HMM models, where the output of the vector quantizer, a vector of 3 VQ codewords per 10 ms frame, is used as the input observation sequence to the HMM.</Paragraph>
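In the multi-codebook paradigm, each HMM state typically holds one discrete output distribution per codebook, and a frame is scored by multiplying the per-codebook probabilities (treating the three streams as independent). The sketch below assumes a hypothetical table layout; it is not BBN's code.

```python
import math

# Hypothetical output tables for one HMM state:
# output_probs[codebook][codeword] = P(codeword | state).
output_probs = [
    {0: 0.7, 1: 0.2, 2: 0.1},   # MFCC codebook
    {0: 0.5, 1: 0.5},           # DMFCC codebook
    {0: 0.9, 1: 0.1},           # power / delta-power codebook
]

def frame_log_prob(observation, tables, floor=1e-6):
    """Log output probability of one frame, where `observation` is a
    tuple of one VQ codeword per codebook. Each stream is scored from
    its own table and the stream probabilities are multiplied
    (i.e., their logs are summed). Unseen codewords get a small floor."""
    logp = 0.0
    for codeword, table in zip(observation, tables):
        logp += math.log(table.get(codeword, floor))
    return logp
```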
    <Paragraph position="2"> To make the computation tractable, we applied the lattice pruning techniques described above to a full word lattice, reducing the average lattice size from over 2000 word theories to about 60 (60 word theories correspond to about 4000 acoustic scores). At this lattice size, the probability of the correct word sequence being in the lattice is about 98%, which places an upper bound on subsequent system performance using the language models.</Paragraph>
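The pruning idea, keeping only word theories whose scores are competitive with the best one, can be sketched as a simple beam over a flat list of scored theories. The data layout here is hypothetical; the paper's actual lattice-pruning techniques are described in an earlier section.

```python
def prune_theories(theories, beam=10.0):
    """Keep only word theories within `beam` log-probability of the best.

    `theories` is a list of (word, start_frame, end_frame, log_score)
    tuples; anything scoring more than `beam` below the best theory
    is discarded. A tighter beam shrinks the lattice but raises the
    risk of pruning away the correct word sequence."""
    if not theories:
        return []
    best = max(t[3] for t in theories)
    return [t for t in theories if t[3] >= best - beam]
```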
    <Paragraph position="3"> Results were obtained for the test speakers, using a total of 109 utterances, under 4 grammar conditions. As shown, the grammars tested include: 1) no grammar: all word sequences are possible; 2) the word-pair grammar, containing all pairs of words occurring in the set of sentences that was used to define the database; 3) the syntactic grammar alone; and 4) semantic interpretation used for a posteriori filtering of the output of lattice parsing. Note that the error rate using the syntactic language model is 7.5%. At a perplexity of 700, one might expect its performance to be closer to the no-grammar case, which has a perplexity of 1000 and an error rate of about 15%. We hypothesize that perplexity alone is not adequate to predict the quality of a language model. To be more precise, one needs to look at acoustic perplexity: a measure of how well a language model can selectively and appropriately limit acoustic confusability. A linguistically motivated language model seems to do just that--at least in this limited experiment. Also, surprisingly, using semantics gave an insignificant improvement in overall performance. One possible explanation is that semantics gets to filter only a small number of the sentences accepted by syntax. Of the sentences that received semantic interpretations, syntax alone determined the correct sentence better than 60 percent of the time, leaving only about 20 sentences in which semantics had a chance to correct an error.</Paragraph>
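The perplexity figures used above to compare the grammars (700 for the syntactic grammar versus 1000 with no grammar) are the geometric-mean branching factor of the language model on test text: the exponential of the negative average per-word log probability. A minimal sketch:

```python
import math

def perplexity(word_log_probs):
    """Perplexity = exp(-mean per-word natural-log probability).

    `word_log_probs` holds the language model's log probability of
    each successive word in a test sequence. A uniform model over a
    1000-word vocabulary assigns log(1/1000) to every word, giving
    a perplexity of exactly 1000 (the no-grammar case above)."""
    avg = sum(word_log_probs) / len(word_log_probs)
    return math.exp(-avg)
```

As the paragraph argues, this measure says nothing about *which* word confusions the grammar permits, which is why two models of similar perplexity can differ widely in recognition error rate.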
    <Paragraph position="4"> Unfortunately, most of these erroneous answers were themselves semantically meaningful, although there were some exceptions. Pragmatic information may serve as a higher-level knowledge source to constrain the possible word sequences, and therefore improve performance.</Paragraph>
  </Section>
</Paper>