<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1124">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. A Hierarchical Bayesian Language Model based on Pitman-Yor Processes</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>Abstract</SectionTitle>
    <Paragraph position="0">We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions, called Pitman-Yor processes, which produce power-law distributions that more closely resemble those found in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models.</Paragraph>
    <Paragraph position="1">Experiments verify that our model gives cross-entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.</Paragraph>
  </Section>
</Paper>