<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1080">
<Title>Applying SPHINX-II to the DARPA Wall Street Journal CSR Task</Title>
<Section position="4" start_page="396" end_page="396" type="evalu">
<SectionTitle> 4.4 Results </SectionTitle>
<Paragraph position="0"> The official NIST results are given in the following table.</Paragraph>
<Paragraph position="1"> Each line of the table gives results for a particular test from the si_evl test suite. The test sets are 5 (5000-word closed), 20 (20000-word closed), sp (spontaneous), and rs (read spontaneous). These four test sets are further subdivided into vp and nvp conditions. The final condition for each test is the language model used. For these tests only two models, 5c (5000-word closed) and 5o (5000-word open), were used. For further details on the testing datasets see [14].</Paragraph>
<Paragraph position="2"> The table is largely self-explanatory apart from the column labeled 2σ. This column is simply two times the standard deviation of the average word error rate, computed from word error rates on a sentence-by-sentence basis. As expected, the vp tests outperform the nvp tests, and the open language model outperforms the closed language model when the test data set contains words from outside the language model's lexicon. It should be noted, however, that the vp portion of the test is probably the more difficult set, since when we remove the highly reliable punctuation words from the scoring, the error rate for the remaining words is actually higher than the one obtained in the nvp case. We attribute this to the increased number of disfluencies caused by verbalized punctuation and to its detrimental effect on the bigram language model.</Paragraph>
</Section>
</Paper>
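<!-- A minimal sketch of the 2σ quantity described above, reading "the standard
deviation of the average word error rate" as the standard error of the mean over
per-sentence error rates. The symbols e_i (word error rate of sentence i), n
(number of scored sentences), and \bar{e} (their mean) are illustrative and do
not appear in the paper:

    \bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i
    \sigma_{\bar{e}} = \sqrt{ \frac{1}{n(n-1)} \sum_{i=1}^{n} (e_i - \bar{e})^2 }
    2\sigma = 2 \, \sigma_{\bar{e}}
-->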