<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1014">
  <Title>Modeling of Long Distance Context Dependency</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Ngram models are simple to use in language modeling and have been successfully applied in speech recognition and other tasks. However, they can only capture short-distance context dependency within an n-word window, and the largest practical n for a natural language is currently three, while much of the context dependency in a natural language occurs beyond a three-word window. To incorporate this kind of long-distance context dependency into the ngram model of our Mandarin speech recognition system, this paper proposes a novel MI-Ngram modeling approach. The MI-Ngram model consists of two components: a normal ngram model and a novel MI model. The ngram model captures the short-distance context dependency within an n-word window, while the MI model captures the context dependency between word pairs over a long distance by using the concept of mutual information.</Paragraph>
    <Paragraph position="1"> That is, the MI-Ngram model incorporates word occurrences beyond the scope of the normal ngram model. MI-Ngram modeling is found to perform much better than normal word ngram modeling.</Paragraph>
    <Paragraph position="2"> Experiments show that, compared with the pure word trigram model, about 20% of errors can be corrected by using an MI-Trigram model.</Paragraph>
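    <!--
    As a rough illustration of how mutual information can score word pairs
    that lie outside a trigram window, the Python sketch below estimates
    pointwise mutual information from raw counts. The function name
    long_distance_pmi, the min_gap and max_gap parameters, and the simple
    count-based estimator are assumptions made for illustration only; the
    abstract does not say how the MI model is estimated or how it is
    combined with the ngram model.

```python
import math
from collections import Counter


def long_distance_pmi(sentences, min_gap=3, max_gap=6):
    """Pointwise mutual information for ordered word pairs (a, b) where b
    follows a by min_gap to max_gap positions, i.e. outside a trigram window.

    Hypothetical helper: `sentences` is a list of token lists.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        word_counts.update(words)
        for i, a in enumerate(words):
            # Only visit positions that a trigram model cannot reach from i.
            last = min(i + max_gap, len(words) - 1)
            for j in range(i + min_gap, last + 1):
                pair_counts[(a, words[j])] += 1

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    pmi = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs          # relative frequency of the pair
        p_a = word_counts[a] / total_words  # unigram relative frequencies
        p_b = word_counts[b] / total_words
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi
```

    Calling long_distance_pmi on a tokenized corpus returns a dictionary
    keyed by (earlier word, later word). Pairs with a high score co-occur at
    a distance more often than their individual frequencies would suggest,
    which is the kind of evidence a long-distance component could add to a
    trigram score.
    -->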
  </Section>
</Paper>