<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1032">
  <Title>Exploiting Headword Dependency and Predictive Clustering for Language Modeling</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In spite of its deficiencies, trigram-based language modeling still dominates the statistical language modeling community, and is widely applied to tasks such as speech recognition and Asian language text input (Jelinek, 1990; Gao et al., 2002).</Paragraph>
    <Paragraph position="1"> Word trigram models are deficient because they can only capture local dependency relations, taking no advantage of richer linguistic structure. Many proposals have been made that try to incorporate linguistic structure into language models (LMs), but little improvement has been achieved so far in realistic applications because (1) capturing longer distance word dependency leads to higher-order n-gram models, where the number of parameters is usually too large to estimate; (2) capturing deeper linguistic relations in a LM requires a large amount of annotated training corpus and a decoder that assigns linguistic structure, which are not always available.</Paragraph>
    <Paragraph position="2"> This paper presents several practical ways of incorporating long distance word dependency and linguistic structure into LMs. A headword detector is first applied to detect the headwords in each phrase in a sentence. A permuted headword trigram model (PHTM) is then generated from the annotated corpus. Finally, PHTM is extended to a cluster model (C-PHTM), which clusters similar words in the corpus.</Paragraph>
    <Paragraph position="3"> Our models are motivated by three assumptions about language: (1) Headwords depend on previous headwords, as well as immediately preceding words; (2) The order of headwords in a sentence can freely change in some cases; and (3) Word clusters help us make a more accurate estimate of the probability of word strings. We evaluated the proposed models on the realistic application of Japanese Kana-Kanji conversion, which converts phonetic Kana strings into proper Japanese orthography. Results show that C-PHTM achieves a 15% error rate reduction over the word trigram model. This demonstrates that the use of simple methods can effectively capture long distance word dependency, and substantially outperform the word trigram model. Although the techniques in this paper are described in the context of Japanese Kana-Kanji conversion, we believe that they can be extended to other languages and applications.</Paragraph>
    <Paragraph position="4"> This paper is organized as follows. Sections 2 and 3 describe the techniques of using headword Association for Computational Linguistics.</Paragraph>
    <Paragraph position="5"> Language Processing (EMNLP), Philadelphia, July 2002, pp. 248-256. Proceedings of the Conference on Empirical Methods in Natural dependency and clustering for language modeling.</Paragraph>
    <Paragraph position="6"> Section 4 reviews related work. Section 5 introduces the evaluation methodology, and Section 6 presents the results of our main experiments.</Paragraph>
    <Paragraph position="7"> Section 7 concludes our discussion.</Paragraph>
  </Section>
class="xml-element"></Paper>