<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-2002">
  <Title>Factored Language Models and Generalized Parallel Backoff</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> GPB-FLMs were applied to two corpora and their perplexity was compared with standard optimized vanilla biand trigram language models. In the following, we consider as a &amp;quot;bigram&amp;quot; a language model with a temporal history that includes information from no longer than one previous time-step into the past. Therefore, if factors are deterministically derivable from words, a &amp;quot;bigram&amp;quot; might include both the previous words and previous factors as a history. From a decoding state-space perspective, any such bigram would be relatively cheap.</Paragraph>
    <Paragraph position="1"> In CallHome-Arabic, words are accompanied with deterministically derived factors: morphological class (M),  stems (S), roots (R), and patterns (P). Training data consisted of official training portions of the LDC CallHome ECA corpus plus the CallHome ECA supplement (100 conversations). For testing we used the official 1996 evaluation set. Results are given in Table 1 and show perplexity for: 1) the baseline 3-gram; 2) a FLM 3-gram using morphs and stems; 3) a GPB-FLM 3-gram using morphs, stems and backoff function g1; 4) the baseline 2-gram; 5) an FLM 2-gram using morphs; 6) an FLM 2-gram using morphs and stems; and 7) an GPB-FLM 2-gram using morphs and stems. Backoff path(s) are depicted by listing the parent number(s) in backoff order. As can be seen, the FLM alone might increase perplexity, but the GPB-FLM decreases it. Also, it is possible to obtain a 2-gram with lower perplexity than the optimized baseline 3-gram.</Paragraph>
    <Paragraph position="2"> The Wall Street Journal (WSJ) data is from the Penn Treebank 2 tagged ('88-'89) WSJ collection. Word and POS tag information (Tt) was extracted. The sentence order was randomized to produce 5-fold cross-validation results using (4/5)/(1/5) training/testing sizes. Other factors included the use of a simple deterministic tagger obtained by mapping a word to its most frequent tag (Ft), and word classes obtained using SRILM's ngram-class tool with 50 (Ct) and 500 (Dt) classes.</Paragraph>
    <Paragraph position="3"> Results are given in Table 2. The table shows the baseline 3-gram and 2-gram perplexities, and three GPB-FLMs.</Paragraph>
    <Paragraph position="4"> Model A uses the true by-hand tag information from the Treebank. To simulate conditions during first-pass decoding, Model B shows the results using the most frequent tag, and Model C uses only the two data-driven word classes. As can be seen, the bigram perplexities are significantly reduced relative to the baseline, almost matching that of the baseline trigram. Note that none of these reduced perplexity bigrams were possible without using one of the novel backoff functions.</Paragraph>
  </Section>
</Paper>