<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1003">
  <Title>An Incremental Decision List Learner</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Decision lists (Rivest, 1987) have been used for a variety of natural language tasks, including accent restoration (Yarowsky, 1994), word sense disambiguation (Yarowsky, 2000), finding the past tense of English verbs (Mooney and Califf, 1995), and several other problems. We show a problem with the standard algorithm for learning probabilistic decision lists, and we introduce an incremental algorithm that consistently works better. While the obvious implementation for this algorithm would be very slow, we also show how to efficiently implement it. The new algorithm produces smaller lists, while simultaneously substantially reducing entropy (by about 40%), and error rates (by about 25% relative.) Decision lists are a very simple, easy to understand formalism. Consider a word sense disambiguation task, such as distinguishing the financial sense of the word &amp;quot;bank&amp;quot; from the river sense. We might want the decision list to be probabilistic (Kearns and Schapire, 1994) so that, for instance, the probabilities can be propagated to an understanding algorithm. The decision list for this task might be: IF &amp;quot;water&amp;quot; occurs nearby, output &amp;quot;river: .95&amp;quot;, &amp;quot;financial: .05&amp;quot; ELSE IF &amp;quot;money&amp;quot; occurs nearby, output &amp;quot;river: .1&amp;quot;, &amp;quot;financial: .9&amp;quot; ELSE IF word before is &amp;quot;left&amp;quot;, output &amp;quot;river: .8&amp;quot;, &amp;quot;financial: .2&amp;quot; ELSE IF &amp;quot;Charles&amp;quot; occcurs nearby, output &amp;quot;river: .6&amp;quot;, &amp;quot;financial: .4&amp;quot; ELSE output &amp;quot;river: .5&amp;quot;, &amp;quot;financial: .5&amp;quot; The conditions of the list are checked in order, and as soon as a matching rule is found, the algorithm outputs the appropriate probability and terminates.</Paragraph>
    <Paragraph position="1"> If no other rule is used, the last rule always triggers, ensuring that some probability is always returned.</Paragraph>
    <Paragraph position="2"> The standard algorithm for learning decision lists (Yarowsky, 1994) is very simple. The goal is to minimize the entropy of the decision list, where entropy represents how uncertain we are about a particular decision. For each rule, we find the expected entropy using that rule, then sort all rules by their entropy, and output the rules in order, lowest entropy first.</Paragraph>
    <Paragraph position="3"> Decision lists are fairly widely used for many reasons. Most importantly, the rule outputs they produce are easily understood by humans. This can make decision lists useful as a data analysis tool: the decision list can be examined to determine which factors are most important. It can also make them useful when the rules must be used by humans, such as when producing guidelines to help doctors determine whether a particular drug should be administered. Decision lists also tend to be relatively small and fast and easy Association for Computational Linguistics.</Paragraph>
    <Paragraph position="4"> Language Processing (EMNLP), Philadelphia, July 2002, pp. 17-24. Proceedings of the Conference on Empirical Methods in Natural to apply in practice.</Paragraph>
    <Paragraph position="5"> Unfortunately, as we will describe, the standard algorithm for learning decision lists has an important flaw: it often chooses a rule order that is suboptimal in important ways. In particular, sometimes the algorithm will use a rule that appears good - has lower average entropy - in place of one that is good - lowers the expected entropy given its location in the list. We will describe a simple incremental algorithm that consistently works better than the basic sorting algorithm. Essentially, the algorithm builds the list in reverse order, and, before adding a rule to the list, computes how much the rule will reduce entropy at that position. This computation is potentially very expensive, but we show how to compute it efficiently so that the algorithm can still run quickly.</Paragraph>
  </Section>
class="xml-element"></Paper>