<?xml version="1.0" standalone="yes"?>
<Paper uid="N01-1029">
  <Title>References</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Recent contributions to statistical language modeling for speech recognition have shown that probabilistically parsing a partial word sequence aids the prediction of the next word, leading to &amp;quot;structured&amp;quot; language models that have the potential to outperform n-grams. Existing approaches to structured language modeling construct nodes in the partial parse tree after all of the underlying words have been predicted. This paper presents a different approach, based on probabilistic left-corner grammar (PLCG) parsing, that extends a partial parse both from the bottom up and from the top down, leading to a more focused and more accurate, though somewhat less robust, search of the parse space. At the core of our new structured language model is a fast context-sensitive and lexicalized PLCG parsing algorithm that uses dynamic programming. Preliminary perplexity and word-accuracy results appear to be competitive with previous ones, while speed is increased.</Paragraph>
    <Paragraph position="1"> 1 Structured language modeling In its current incarnation, (unconstrained) speech recognition relies on a left-to-right language model L, which estimates the occurrence of a next word wj given a sequence of preceding words c j Dw j 10 (the context):1 L.wjjc j/DOp.wjjc j/: L is called a language model (LM).</Paragraph>
    <Paragraph position="2"> Obviously the context space is huge and even in very large training corpora most contexts never occur, which prohibits a reliable probability estimation. Therefore the context space needs to be mapped to a much smaller space, such that only the essential information is retained. In spite of its 1As a shorthand, wba denotes a sequence wawaC1 :::wb if b a, else it is the empty sequence.</Paragraph>
    <Paragraph position="3"> simplicity the trigram LM, that reduces c j to w j 1j 2, is hard to improve on and still the main language model component in state-of-the-art speech recognition systems. It is therefore commonly used as a baseline in the evaluation of other models, including the one described in this paper.</Paragraph>
    <Paragraph position="4"> Structured language models (SLM) introduce parsing into language modeling by alternating between predicting the next word using features of partial parses of the context and extending the partial parses to cover the next word. Following this approach, Chelba and Jelinek (2000) obtained a SLM that slightly improves on a trigram model both in perplexity and recognition performance.</Paragraph>
    <Paragraph position="5"> The Chelba-Jelinek SLM is, to our knowledge, the first left-to-right LM using parsing techniques that is successfully applied to large vocabulary speech recognition. It is built on top of a lexicalized probabilistic shift-reduce parser that predicts the next word from the headwords (&amp;quot;exposed&amp;quot; heads) and categories of the last two predicted isolated constituents of the context. Then the predicted word becomes the last isolated constituent and the last two constituents are repeatedly recombined until the parser decides to stop.</Paragraph>
    <Paragraph position="6"> A dynamic programming (DP) version of Chelba's parser, inspired on the CYK chart parser, was proposed in (Jelinek and Chelba, 1999). Our implementation is roughly quadratic in the length of the sentence, but not significantly faster than Chelba's non-DP parser. It scored somewhat lower in perplexity before reestimation (presumably by avoiding search errors), but remained roughly at the same level after full inside-outside reestimation (Van Aelten and Hogenhout, 2000).</Paragraph>
    <Paragraph position="7"> An obvious weakness of the Chelba-Jelinek SLM is the bottom-up behavior of the parser: it creates isolated constituents and only afterwards is it able to check whether a constituent fits into a higher structure. Van Uytsel (2000) developed a top-down alternative along similar lines but based on a lexicalized and context-sensitive DP version of an efficient Earley parser (Stolcke, 1995; Jelinek and Lafferty, 1991). The Earley-based SLM performed worse than the Chelba-Jelinek SLM, mostly due to the fact that the rule production probabilities cannot be conditioned on the underlying lexical information, thus producing a lot of wrong parses.</Paragraph>
    <Paragraph position="8"> The weaknesses of our Earley SLM have led us to consider probabilistic left-corner grammar (PLCG) parsing (Manning and Carpenter, 1997), which follows a mixed bottom-up and top-down approach. Its potential to enhance parsing efficiency has been recognized by Roark and Johnson (2000), who simulated a left-corner parser with a top-down best-first parser applying a left-corner-transformed PCFG grammar. For the language model described in this paper, however, we implemented a DP version of a native left-corner parser using a left-corner treebank grammar (containing projection rules instead of production rules). The efficiency of our implementation further allowed to enrich the history annotation of the parser states and to apply a lexicalized grammar.</Paragraph>
    <Paragraph position="9"> The following section contains a brief review of Manning's PLCG parser. Section 3 describes how it was adapted to our SLM framework: we introduce lexicalization and context-sensitivity, present a DP algorithm using a chart of parser states and finally we define a language model based on the adapted PLCG parser. At the end of the same section we explain how the initial language model can be trained on additional plain text through a variant of inside-outside reestimation. In section 4 we evaluate a few PLCG-based SLMs obtained from the Penn Tree-bank and BLLIP WSJ Corpus. We present test set perplexity measurements and word accuracy after n-best list rescoring to assess their viability for speech recognition.</Paragraph>
  </Section>
class="xml-element"></Paper>