<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1073">
  <Title>Distribution-Based Pruning of Backoff Language Models</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>5 Conclusions</SectionTitle>
    <Paragraph position="0">In this paper, we proposed a novel approach to pruning n-gram backoff models: keep the n-grams that are most likely to occur in a new document. We then developed a criterion for pruning parameters from n-gram models based on the n-gram distribution, i.e., the probability that an n-gram occurs in a document; all n-grams whose probability falls below a threshold are removed. Experimental results show that the distribution-based pruning method performed 7-9% better (measured by word perplexity reduction) than conventional cutoff methods. Furthermore, when the n-gram distribution is modelled on document clusters created according to domain, style, or time, the pruning method yields a more general n-gram backoff model, in spite of the domain, style, or temporal bias of the training data.</Paragraph>
  </Section>
</Paper>
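The pruning criterion summarized above lends itself to a short illustration. The Python sketch below estimates the probability that an n-gram occurs in a new document simply as its empirical document frequency over the training collection and prunes n-grams below a threshold. This is a deliberate simplification: the paper fits a model to the n-gram distribution rather than using raw document frequencies, and the function names and threshold value here are illustrative, not from the paper.

    from collections import defaultdict

    def ngram_document_probability(documents, n):
        # Estimate P(n-gram occurs in a new document) as the fraction of
        # training documents that contain the n-gram at least once.
        doc_freq = defaultdict(int)
        for doc in documents:
            tokens = doc.split()
            seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
            for ngram in seen:
                doc_freq[ngram] += 1
        total = len(documents)
        return {ngram: df / total for ngram, df in doc_freq.items()}

    def distribution_based_prune(model_ngrams, documents, n, threshold):
        # Keep only n-grams whose estimated probability of occurring in a
        # new document is at least the threshold; prune everything else.
        p_doc = ngram_document_probability(documents, n)
        return {g for g in model_ngrams if p_doc.get(g, 0.0) >= threshold}

    # Toy usage: with two documents, ("sat", "on", "the") occurs in both
    # (P = 1.0) and survives, while ("the", "cat", "sat") occurs in only
    # one (P = 0.5) and is pruned at threshold 0.6.
    docs = ["the cat sat on the mat", "the dog sat on the log"]
    trigrams = {("the", "cat", "sat"), ("sat", "on", "the")}
    kept = distribution_based_prune(trigrams, docs, n=3, threshold=0.6)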