<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1126">
  <Title>Discriminative Pruning of Language Models for Chinese Word Segmentation</Title>
  <Section position="4" start_page="1001" end_page="1001" type="relat">
    <SectionTitle>2 Related Work</SectionTitle>
    <Paragraph position="0">A simple way to reduce the size of an n-gram language model is to exclude n-grams that occur infrequently in the training corpus. This is known as the count cut-off method (Jelinek, 1990).</Paragraph>
    <Paragraph position="1">Because counts are always integers, the size of the model can only be reduced in discrete steps.</Paragraph>
    <Paragraph position="2">Gao and Lee (2000) proposed a distribution-based pruning method. Instead of pruning n-grams that are infrequent in the training data, they prune n-grams that are likely to be infrequent in a new document. Experimental results show that this approach outperforms the traditional count cut-off method.</Paragraph>
    <Paragraph position="3">Seymore and Rosenfeld (1996) proposed a method that measures the difference between the models before and after each n-gram is pruned. The difference is computed as
      $N(h_i w_i)\,[\log p(w_i \mid h_i) - \log p'(w_i \mid h_i)]$,
      where $p(w_i \mid h_i)$ denotes the conditional probability assigned by the original model, $p'(w_i \mid h_i)$ denotes the probability in the pruned model, and $N(h_i w_i)$ is the discounted frequency of the n-gram event $h_i w_i$. Seymore and Rosenfeld (1996) showed that this method is more effective than the traditional cut-off method.</Paragraph>
    <Paragraph position="12">Stolcke (1998) presented a more sound criterion for computing the difference between the models before and after each n-gram is pruned, called relative entropy or Kullback-Leibler distance. It is computed as
      $D(p \parallel p') = \sum_{h_j, w_i} p(h_j, w_i)\,[\log p(w_i \mid h_j) - \log p'(w_i \mid h_j)]$.
      This criterion removes some of the approximations employed by Seymore and Rosenfeld (1996). In addition, Stolcke (1998) presented a method for efficiently computing the Kullback-Leibler distance of each n-gram.</Paragraph>
    <Paragraph position="15">Gao and Zhang (2002) studied three measures for language model pruning: probability, rank, and entropy.</Paragraph>
    <Paragraph position="16">Among them, the probability measure is very similar to that proposed by Seymore and Rosenfeld (1996). Gao and Zhang (2002) also presented a method for combining two criteria, and showed that the combination of rank and entropy achieved the smallest models.</Paragraph>
  </Section>
</Paper>
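To make the cut-off criteria above concrete, here is a minimal Python sketch (not the implementation of any of the cited papers) of count cut-off pruning and of a Seymore and Rosenfeld (1996) style weighted-difference score on a toy bigram model. The names counts, cond_prob, and backoff_prob, and the absolute-discount value 0.75, are illustrative assumptions rather than details from the paper.

import math

def count_cutoff(counts, threshold):
    # Keep only n-grams whose training count exceeds the threshold.
    return {ngram: c for ngram, c in counts.items() if c > threshold}

def weighted_difference(counts, cond_prob, backoff_prob, discount=0.75):
    # Seymore & Rosenfeld (1996) style score:
    #   N(h w) * [log p(w|h) - log p'(w|h)],
    # where p' is the probability the pruned (backed-off) model would assign
    # and N(h w) is a discounted frequency (absolute discounting assumed here).
    scores = {}
    for (h, w), c in counts.items():
        n_discounted = max(c - discount, 0.0)
        scores[(h, w)] = n_discounted * (math.log(cond_prob[(h, w)]) -
                                         math.log(backoff_prob[(h, w)]))
    return scores

# Toy bigram counts and pre-computed probabilities (made-up numbers).
counts = {("the", "cat"): 3, ("the", "dog"): 1, ("a", "cat"): 1}
cond_prob = {("the", "cat"): 0.6, ("the", "dog"): 0.2, ("a", "cat"): 0.5}
backoff_prob = {("the", "cat"): 0.3, ("the", "dog"): 0.15, ("a", "cat"): 0.4}

print(count_cutoff(counts, threshold=1))                     # drops the singleton bigrams
print(weighted_difference(counts, cond_prob, backoff_prob))  # larger score = more harmful to prune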
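Similarly, the relative-entropy criterion attributed to Stolcke (1998) can be written down directly from its definition. The sketch below uses assumed helper names and toy numbers and makes no attempt at Stolcke's efficient computation; it simply sums p(h, w) * [log p(w|h) - log p'(w|h)] over the affected n-grams.

import math

def kl_increase(joint, p_orig, p_pruned):
    # D(p || p') = sum over (h, w) of p(h, w) * [log p(w|h) - log p'(w|h)]
    return sum(joint[(h, w)] * (math.log(p_orig[(h, w)]) - math.log(p_pruned[(h, w)]))
               for (h, w) in joint)

# Toy numbers: joint probabilities and the conditional models before/after pruning.
joint    = {("the", "cat"): 0.04, ("the", "dog"): 0.01}
p_orig   = {("the", "cat"): 0.6,  ("the", "dog"): 0.2}
p_pruned = {("the", "cat"): 0.3,  ("the", "dog"): 0.15}
print(kl_increase(joint, p_orig, p_pruned))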