<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1047">
  <Title>Learning a syntagmatic and paradigmatic structure from language data with a bi-multigram model</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> There is currently an increasing interest in statistical language models, which in one way or another aim at exploiting word-dependencies spanning over a variable number of words. Though all these models commonly relax the assumption of fixed-length dependency of the conventional ngram model, they cover a wide variety of modeling assumptions and of parameter estimation frameworks. In this paper, we focus on a phrase-based approach, as opposed to a gram-based approach: sentences are structured into phrases and probabilities are assigned to phrases instead of words. Regardless of whether they are gram or phrase-based, models can be either deterministic or stochastic. In the phrase-based framework, non determinism is introduced via an ambiguity on the parse of the sentence into phrases. In practice, it means that even if phrase abe is registered as a phrase, the possibility of parsing the string as, for instance, lab\] \[c\] still remains. By contrast, in a deterministic approach, all co-occurences of a, b and c would be systematically interpreted as an occurence of phrase \[abel.</Paragraph>
    <Paragraph position="1"> Various criteria have been proposed to derive phrases in a purely statistical way 1; data likeli-I i.e. without using grammar rules like in Stochastic Context Free Grammars.</Paragraph>
    <Paragraph position="2"> hood, leaving-one-out likelihood (PLies et al., 1996), mutual information (Suhm and Waibel, 1994), and entropy (Masataki and Sagisaka, 1996). The use of the likelihood criterion in a stochastic framework allows EM principled optimization procedures, but it is prone to overlearning. The other criteria tend to reduce the risk of overlearning, but their optimization relies on heuristic procedures (e.g. word grouping via a greedy algorithm (Matsunaga and Sagayama, 1997)) for which convergence and optimality are not theoretically guaranteed. The work reported in this paper is based on the multigram model, which is a stochastic phrase-based model, the parameters of which are estimated according to a likelihood criterion using an EM procedure. The multigram approach was introduced in (Bimbot et al., 1995), and in (Deligne and Bimbot, 1995) it was used to derive variable-length phrases under the assumption of independence of the phrases. Various ways of theoretically releasing this assumption were given in (Deligne et al., 1996). More recently, experiments with 2-word multigrams embedded in a deterministic variable ngram scheme were reported in (Siu, 1998).</Paragraph>
    <Paragraph position="3"> In section 2 of this paper, we further formulate a model with bigram (more generally ~-gram) dependencies between the phrases, by including a paradigmatic aspect which enables the clustering of variable-length phrases. It results in a stochastic class-phrase model, which can be interpolated with the stochastic phrase model, in a similar way to deterministic approaches. In section 3 and 4, the phrase and class-phrase models are evaluated in terms of perplexity values and model size.</Paragraph>
  </Section>
class="xml-element"></Paper>