<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-2003">
  <Title>Getting More Mileage from Web Text Sources for Conversational Speech Language Modeling using Class-Dependent Mixtures</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> In summary, we have shown that filtered web text can be used successfully to train language models for conversational speech, outperforming other out-of-domain (BN) and small domain-specific (Meetings) data sources. We have also found that combining LMs from different domains with class-dependent interpolation (particularly when each of the top 100 words forms its own class) yields a lower WER than the standard approach, in which mixture weights depend only on the data source. Recognition experiments show a significant WER reduction (1.3-2.3% absolute) from the additional training data and class-based interpolation.</Paragraph>
  </Section>
</Paper>
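The class-dependent interpolation described in the conclusion can be illustrated with a minimal sketch. This is not the paper's implementation: the toy vocabularies, the hand-picked mixture weights, and the single-filler word class are all hypothetical, and real systems would use n-gram models and weights estimated on held-out data. The point is only the form of the mixture, P(w) = sum over sources s of lambda_{c(w),s} * P_s(w), where the weight depends on the class c(w) of the predicted word, not just on the source.

```python
# Sketch of class-dependent LM interpolation (illustrative, not the paper's code).
from collections import Counter

def train_unigram(tokens, vocab):
    # Laplace-smoothed unigram probabilities over a fixed vocabulary.
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def interpolate(models, weights_by_class, word_class):
    # P(w) = sum_s lambda_{c(w),s} * P_s(w): the mixture weight is chosen
    # per word class, instead of one global weight per data source.
    # (Class-dependent weights require renormalization in a real system;
    # that step is omitted here for clarity.)
    def prob(w):
        lam = weights_by_class[word_class(w)]
        return sum(l * m[w] for l, m in zip(lam, models))
    return prob

# Hypothetical sources: "web" text vs. in-domain "meeting" transcripts.
vocab = {"uh", "meeting", "schedule", "data"}
web = train_unigram("data schedule data meeting".split(), vocab)
mtg = train_unigram("uh meeting uh schedule".split(), vocab)

# A conversational filler gets its own class, weighted toward the
# in-domain model; all remaining words share one class.
word_class = lambda w: w if w == "uh" else "OTHER"
weights = {"uh": (0.1, 0.9), "OTHER": (0.6, 0.4)}

p = interpolate([web, mtg], weights, word_class)
print(p("uh"), p("data"))
```

In the paper's setup the analogous move is letting each of the top 100 words form its own class, so frequent conversational words can draw more heavily on in-domain data while content words lean on the much larger web corpus.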