<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1073">
  <Title>Distribution-Based Pruning of Backoff Language Models</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> In this paper, we proposed a novel approach to pruning n-gram backoff models: keep the n-grams that are more likely to occur in a new document. We then developed a criterion for pruning parameters from n-gram models, based on the n-gram distribution, i.e., the probability that an n-gram occurs in a document; all n-grams whose probability falls below a threshold are removed. Experimental results show that the distribution-based pruning method reduced word perplexity by 7-9% compared with conventional cutoff methods. Furthermore, when the n-gram distribution is modelled on document clusters created according to domain, style, or time, the pruning method yields a more general n-gram backoff model, in spite of the domain, style, or temporal bias of the training data.</Paragraph>
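    To make the pruning criterion concrete, here is a minimal Python sketch that keeps only those n-grams whose estimated probability of occurring in a new document meets a threshold. It substitutes a simple maximum-likelihood estimate (document frequency divided by the number of documents) for the paper's distribution model, so the function names, parameters, and threshold value are illustrative assumptions, not the authors' implementation.

        from collections import Counter

        def ngrams(tokens, n):
            # All contiguous n-token sequences from a token list.
            return zip(*(tokens[i:] for i in range(n)))

        def prune_by_document_distribution(documents, n=3, threshold=0.01):
            # Count each n-gram at most once per document (document frequency).
            doc_freq = Counter()
            for doc in documents:
                doc_freq.update(set(ngrams(doc, n)))
            num_docs = len(documents)
            # Stand-in estimate of P(n-gram occurs in a new document);
            # the paper fits a distribution model rather than this ML ratio.
            return {g for g, df in doc_freq.items() if df / num_docs >= threshold}

        # Example: keep bigrams that occur in at least half of the documents.
        docs = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
        kept = prune_by_document_distribution(docs, n=2, threshold=0.5)
        # kept == {("the", "cat")}

    Modelling the distribution over document clusters, as the paper does, would amount to computing doc_freq per cluster rather than over the whole collection.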
  </Section>
</Paper>