<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3242">
  <Title>Random Forests in Language Modeling</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> We have developed a new RF approach for language modeling that can significantly improve upon the KN smoothing in both PPL and WER. The RF approach results in a random history clustering which greatly reduces the number of unseen events compared to the KN smoothing, even though the same training data statistics are used. Therefore, this new approach can generalize well on unseen test data.</Paragraph>
    <Paragraph position="1"> Overall, we can achieve more than 10% PPL reduction and 0.6-1.1% absolute WER reduction over the interpolated KN smoothing, without interpolating with it.</Paragraph>
    <Paragraph position="2"> Based on our experimental results, we think that the RF approach for language modeling is very promising. It will be very interesting to see how our approach performs in a longer history than the trigram. Since our current RF models uses KN smoothing exclusively in lower order probabilities, 3For the a42 -test, we used the standard SCLITE's statistical system comparison program from NIST with the option &amp;quot;mapsswe&amp;quot;, which means the test is the matched pairs sentence segment word error test.</Paragraph>
    <Paragraph position="3"> it may not be adequate when we apply it to higher order a2 -gram models. One possible solution is to use RF models for lower order probabilities as well.</Paragraph>
    <Paragraph position="4"> Higher order RFs will be grown based on lower order RFs which can be recursively grown.</Paragraph>
    <Paragraph position="5"> Another interesting application of our new approach is parser based language models where rich syntactic information is available (Chelba and Jelinek, 2000; Charniak, 2001; Roark, 2001; Xu et al., 2002). When we use RFs for those models, there are potentially many different syntactic questions at each node split. For example, there can be questions such as &amp;quot;Is there a Noun Phrase or Noun among the previous a2 exposed heads?&amp;quot;, etc. Such kinds of questions can be encoded and included in the history. Since the length of the history could be very large, a better smoothing method would be very useful. Composite questions in the form of pylons (Bahl et al., 1989) can also be used.</Paragraph>
    <Paragraph position="6"> As we mentioned at the end of Section 3.2, random samples of the training data can also be used for DT growing and has been proven to be useful for classification problems (Amit and Geman, 1997; Breiman, 2001; Ho, 1998). Randomly sampled data can be used to grow DTs in a deterministic way to construct RFs. We can also construct an RF for each random data sample and then aggregate across RFs.</Paragraph>
    <Paragraph position="7"> Our RF approach was developed for language modeling, but the underlying methodology is quite general. Any a2 -gram type of modeling should be able to take advantage of the power of RFs. For example, RFs could also be useful for POS tagging, parsing, named entity recognition and other tasks in natural language processing.</Paragraph>
  </Section>
class="xml-element"></Paper>