<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1058">
  <Title>On Combining Language Models : Oracle Approach</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Statistical language models (LMs) are essential in speech recognition and understanding systems for high word and semantic accuracy, not to mention robustness and portability. Several language models have been proposed and studied during the past two decades [8]. Although it has turned out to be a rather difficult task to beat the (almost) standard class/word n-grams (typically D2 BPBEor BF), there has been a great deal of interest in grammar based language models [1]. A promising approach for limited domain applications is the use of semantically motivated phrase level stochastic context free grammars (SCFGs) to parse a sentence into a sequence of semantic tags which are further modeled using D2-grams [2, 9, 10, 3].</Paragraph>
    <Paragraph position="1"> The main motivation behind the grammar based LMs is the inability of D2-grams to model longer-distance constraints in a language.</Paragraph>
    <Paragraph position="2"> With the advent of fairly fast computers and efficient parsing and search schemes several researchers have focused on incorporating relatively complex language models into speech recognition and understanding systems at different levels. For example, in [3], we  The work is supported by DARPA through SPAWAR under grant #N66001-00-2-8906.</Paragraph>
    <Paragraph position="3"> report a significant perplexity improvement with a moderate increase in word/semantic accuracy, at C6-best list (rescoring) level, using a dialog-context dependent, semantically motivated grammar based language model.</Paragraph>
    <Paragraph position="4"> Statistical language modeling is a &amp;quot;learning from data&amp;quot; problem. The generic steps to be followed for language modeling are AF preparation of training data AF selection of a model type AF specification of the model structure AF estimation of model parameters The training data should consist of large amounts of text, which is hardly satisfied in new applications. In those cases, complex models fit to the training data. On the other hand, simple models can not capture the actual structure. In the Bayes' (sequence) decision framework of speech recognition/understanding we heavily constrain the model structure to come up with a tractable and practical LM. For instance, in a class/word D2-gram LM the dependency of a word is often restricted to the class that it belongs and the dependency of a class is limited to D2-1 previous classes. The estimation of the model parameters, which are commonly the probabilities, is another important issue in language modeling. Besides data sparseness, the estimation algorithms (e.g. EM algorithm) might be responsible for the estimated probabilities to be far from optimal. The aforementioned problems of learning have different effects on different LM types. Therefore, it is wise to design LMs based on different paradigms and combine them in some optimal sense. The simplest combination method is the so called linear interpolation [4]. Recently, the linear interpolation in the logarithmic domain has been investigated in [6]. Perplexity results on a couple of tasks have shown that the log-linear interpolation is better than the linear interpolation. Theoretically, a far more powerful method for LM combination is the maximum entropy approach [7]. However, it has not been widely used in practice, since it is computationally demanding.</Paragraph>
    <Paragraph position="5"> In this research, we consider two LMs: AF class-based 3-gram LM (baseline).</Paragraph>
    <Paragraph position="6"> AF dialog dependent semantic grammar based 3-gram LM [3].</Paragraph>
    <Paragraph position="7"> After N-best list rescoring experiments with linear and log-linear interpolation, we realized that the performance in terms of word and semantic accuracies fall considerably short of the performance of an oracle. We explain the set-up for the oracle experiment and point out that the oracle is a dynamic LM combiner. To fill the performance gap, we suggest a method that can mimic the oracle.</Paragraph>
    <Paragraph position="8">  The paper is organized as follows. Section 2 presents the language models considered in this study. In Section 3, we briefly explain combining of LMs using linear and log-linear interpolation. Section 4 explains the set up for the oracle experiment. Experimental results are reported in Section 5. The future work and conclusions are given in the last section.</Paragraph>
  </Section>
class="xml-element"></Paper>