<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1031"> <Title>Bayesian Grammar Induction for Language Modeling</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In applications such as speech recognition, handwriting recognition, and spelling correction, performance is limited by the quality of the language model utilized (7; 7; 7; 7). However, static language modeling performance has remained basically unchanged since the advent of n-gram language models forty years ago (7). Yet, n-gram language models can only capture dependencies within an n-word window, where currently the largest practical n for natural language is three, and many dependencies in natural language occur beyond a three-word window. In addition, n-gram models are extremely large, thus making them difficult to implement efficiently in memory-constrained applications.</Paragraph> <Paragraph position="1"> An appealing alternative is grammar-based language models. Language models expressed as a probabilistic grammar tend to be more compact than n-gram language models, and have the ability to model long-distance dependencies (7; 7; 7). However, to date there has been little success in constructing grammar-based language models competitive with n-gram models in problems of any magnitude.</Paragraph> <Paragraph position="2"> In this paper, we describe a corpus-based induction algorithm for probabilistic context-free grammars that outperforms n-gram models and the Inside-Outside algorithm (7) in medium-sized domains. This result marks the first time a grammar-based language model has surpassed n-gram modeling in a task of at least moderate size. The algorithm employs a greedy heuristic search within a Bayesian framework, and a post-pass using the Inside-Outside algorithm.</Paragraph> </Section></Paper>