A One Pass Decoder Design For Large Vocabulary Recognition

4. EXPERIMENTAL RESULTS

Experiments have been performed on both 5k and 20k Wall Street Journal tasks. The WSJ systems used training data from the SI-84 and SI-284 training sets, together with pronunciations from the Dragon Wall Street Journal Pronunciation Lexicon Version 2.0 and the standard bigram and trigram language models supplied by MIT Lincoln Labs. Some locally generated additions and corrections to the dictionary were used, and the stress markings were ignored, resulting in 44 phones plus silence. Data preparation used the HTK Hidden Markov Model Toolkit [13]. All speech models had three emitting states in a left-to-right topology and used continuous density mixture Gaussian output probability distributions tied at the state level using phonetic decision trees [14]; a schematic sketch of this topology is given at the end of the section. The decoder enforced silence at the start and end of each sentence and allowed optional silence between words.

These systems achieved the lowest error rates reported for the November 1993 WSJ evaluations on the H1-C2, H2-C1 and H2-P0 tasks, and the second lowest error rate on H1-P0. Further details about these systems can be found in [11].

Table 1 gives details of decoder performance for the various tasks. All figures quoted are for the beam widths used in the evaluation tests. The required computation scales with the number of active models per frame (and the number of frames in the test set); on an HP735, decoding the 5k gender dependent cross-word systems required approximately 10 minutes per sentence, whilst the 20k systems took about 15 minutes per sentence on average. As the table shows, the computation required does not depend on the potential network size, since the load for the trigram case is generally less than for the corresponding bigram case. This shows that applying knowledge early constrains the search enough to offset the computational cost of using that knowledge. In the bigram case, no reliance is placed on the back-off structure of the language model, and so the computational load will not change when the size of the language model is increased.
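As an illustration of the model topology described at the start of this section, the following minimal NumPy sketch (our own, not taken from the paper or from HTK itself) builds the transition matrix of a three-emitting-state left-to-right HMM with the non-emitting entry and exit states used by HTK-style models; the probability values are placeholders.

    import numpy as np

    # Hypothetical sketch of the topology described in the text: a 5-state
    # HTK-style model with non-emitting entry/exit states (0 and 4) and
    # three emitting states (1-3) in a strict left-to-right arrangement.
    N = 5
    A = np.zeros((N, N))
    A[0, 1] = 1.0                # entry state passes directly to state 1
    for s in (1, 2, 3):
        A[s, s] = 0.6            # self-loop (placeholder probability)
        A[s, s + 1] = 0.4        # advance to the next state
    # Row 4 (the exit state) is empty: control passes to the next model in
    # the recognition network. Each emitting state would carry a mixture
    # Gaussian output distribution, tied across models at the state level
    # by phonetic decision trees.
    assert np.allclose(A[:4].sum(axis=1), 1.0)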
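The dependence of computation on the number of active models per frame can be made concrete with a generic beam pruning sketch; the function name (prune), the model labels and the scores are illustrative assumptions, not the paper's decoder.

    # Hypothetical per-frame beam pruning: the work per frame is
    # proportional to the number of models surviving the beam, not to
    # the size of the potential network.
    def prune(active_scores, beam_width):
        """Keep hypotheses within beam_width of the best log score."""
        best = max(active_scores.values())
        threshold = best - beam_width
        return {model: score
                for model, score in active_scores.items()
                if score >= threshold}

    frame_scores = {"m1": -10.0, "m2": -12.5, "m3": -40.0}
    print(prune(frame_scores, beam_width=15.0))  # "m3" falls outside the beam

On this view, applying the trigram constraints early sharpens the hypothesis scores, so fewer models survive the beam; this is consistent with the observation above that the trigram load is generally below the corresponding bigram load.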
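Finally, the remark about the back-off structure of the language model can be illustrated with a toy lookup (the words, probabilities and function name are all invented for illustration): whether or not an explicit bigram is stored, each word transition costs a constant number of table lookups, so enlarging the language model leaves the per-transition load unchanged.

    import math

    # Hypothetical back-off bigram: O(1) lookup per word transition,
    # independent of how many bigrams the model stores.
    bigrams  = {("the", "market"): math.log(0.01)}
    unigrams = {"market": math.log(0.001)}
    backoff  = {"the": math.log(0.4)}

    def bigram_logprob(w1, w2):
        if (w1, w2) in bigrams:
            return bigrams[(w1, w2)]                # explicit bigram stored
        return backoff.get(w1, 0.0) + unigrams[w2]  # backed-off estimate

    print(bigram_logprob("the", "market"))
    print(bigram_logprob("a", "market"))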