File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/p06-1124_abstr.xml

Size: 1,027 bytes

Last Modified: 2025-10-06 13:45:07

<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1124">
  <Title>A Hierarchical Bayesian Language Model based on Pitman-Yor Processes</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models.</Paragraph>
    <Paragraph position="1"> Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.</Paragraph>
  </Section>
class="xml-element"></Paper>