<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1126">
<Title>Discriminative Pruning of Language Models for Chinese Word Segmentation</Title>
<Section position="8" start_page="1006" end_page="1007" type="evalu">
<SectionTitle>5 Conclusions and Future Work</SectionTitle>
<Paragraph position="0"> A discriminative pruning criterion for n-gram language models in Chinese word segmentation was proposed in this paper, and a step-by-step growing algorithm was suggested to generate a model of the desired size from a full-bigram model and a base model. Experimental results showed that the discriminative pruning method achieves significant improvements over the baseline KLD-based method: at the same F-measure, the number of bigrams can be reduced by up to 90%. By combining the saturated model with the baseline KLD-based method, we achieved better performance at every model size. Analysis shows that the correlation between perplexity and performance is strong when the models come from the same pruning method, and weak otherwise.</Paragraph>
<Paragraph position="1"> The pruning methods discussed in this paper focus on bigram pruning, keeping unigram probabilities unchanged. Future work will attempt to prune bigrams and unigrams simultaneously, according to the same discriminative pruning criterion, and to improve the efficiency of the step-by-step growing algorithm. In addition, the method described in this paper can be extended to other applications, such as IME and speech recognition, where language models are applied in a similar way.</Paragraph>
</Section>
</Paper>