<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1010">
  <Title>Improving Language Models by Clustering Training Sentences</Title>
  <Section position="6" start_page="62" end_page="63" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> I have suggested that training corpus clustering can be used both to extend the effectiveness of a very general class of language models, and to provide evidence of whether a particular language model could benefit from extending it by hand to allow it to take better account of context. Clustering can be useful even when there is no reason to believe the training  corpus naturally divides into any particular number of clusters on any extrinsic grounds.</Paragraph>
    <Paragraph position="1"> The experimental results presented show that clustering increases the (absolute) success rate of unigram and bigram language modeling for a particular ATIS task by up to about 12%, and that performance improves steadily as the number of clusters climbs towards 100 (probably a reasonable upper limit, given that there are only a few thousand training sentences). However, clusters do not improve tri-gram modeling at all. This is consistent with experience (Rayner et al, 1994) that, for the ATIS domain, trigrams model inter-word effects much better than bigrams do, but that extending the N-gram model beyond N = 3 is much less beneficial.</Paragraph>
    <Paragraph position="2"> For N-rule modeling, clustering increases the success rate for both N = 1 and N = 2, although only by about half as much as for N-grams. This suggests that conditioning the occurrence of a grammar rule on the identity of its mother (as in the 2-rule case) accounts for some, but not all, of the contextual influences that operate. From this it is sensible to conclude, consistently with the results of Briscoe and Carroll (1993), that a more complex model of grammar rule interaction might yield better results.</Paragraph>
    <Paragraph position="3"> Either conditioning on other parts of the parse tree than the mother node could be included, or a rather different scheme such as Briscoe and Carroll's could be used.</Paragraph>
    <Paragraph position="4"> Neither the observation that trigrams may represent the limit of usefulness for N-gram modeling in ATIS, nor that non-trivial contextual influences exist between occurrences of grammar rules, is very novel or remarkable in its own right. Rather, what is of interest is that the improvement (or otherwise) in particular language models from the application of clustering is consistent with those observations.</Paragraph>
    <Paragraph position="5"> This is important evidence for the main hypothesis of this paper: that enhancing a language model with clustering, which once the software is in place can be done largely automatically, can give us important clues about whether it is worth expending research, programming, data-collection and machine resources on hand-coded improvements to the way in which the language model in question models context, or whether those resources are best devoted to different, additional kinds of language model.</Paragraph>
  </Section>
class="xml-element"></Paper>