<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0713">
  <Title>Task DIM GPSM NPSM POSSM PP</Title>
  <Section position="2" start_page="0" end_page="73" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> An extension to memory-based learning is described in which automatically induced rules are used as binary features. These features have an &quot;active&quot; value when the left-hand side of the underlying rule applies to the instance.</Paragraph>
    <Paragraph position="1"> The RIPPER rule induction algorithm is adopted for the selection of the underlying rules. The similarity of a memory instance to a new instance is measured by taking the sum of the weights of the matching rules both instances share. We report on experiments that indicate that (i) the method works as well as or better than RIPPER on various language learning and other benchmark datasets; (ii) the method does not necessarily perform better than default memory-based learning; but (iii) when multi-valued features are combined with the rule-based features, some slight to significant improvements are observed.</Paragraph>
    <Paragraph position="2"> 1 Rules as features A common machine-learning solution to classification problems is rule induction (Clark and Niblett, 1989; Quinlan, 1993; Cohen, 1995).</Paragraph>
    <Paragraph position="3"> The goal of rule induction is generally to induce from data a set of rules that captures all generalisable knowledge within that data and that is, at the same time, as small as possible. Classification in rule-induction classifiers is based on the firing of rules on a new instance, triggered by matching feature values to the left-hand side of the rule. Rules can be of various normal forms, and can furthermore be ordered. The appropriate content and ordering of rules can be hard to find, and at the heart of most rule induction systems are strong search algorithms that attempt to minimise search through the space of possible rule sets and orderings.</Paragraph>
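The firing of ordered rules described above can be sketched as follows. This is a minimal illustration, not the RIPPER implementation: the rule list, the feature names f1 to f3, the function names, and the default class are all hypothetical choices made for the example.

```python
# Hypothetical ordered rule list. Each rule pairs a left-hand side
# (feature name mapped to required value) with a predicted class.
RULES = [
    ({"f1": "A"}, "Z"),
    ({"f1": "B", "f2": "B"}, "Y"),
    ({"f2": "C"}, "Z"),
    ({"f3": "C"}, "Z"),
]


def matches(conditions, instance):
    """True when every condition on the left-hand side holds for the instance."""
    return all(instance.get(feature) == value
               for feature, value in conditions.items())


def classify_first_fire(instance, rules=RULES, default="Z"):
    """Return the class of the first rule that fires.

    The default class used when no rule fires is an assumption of this sketch.
    """
    for conditions, predicted in rules:
        if matches(conditions, instance):
            return predicted
    return default


new_instance = {"f1": "B", "f2": "B", "f3": "C"}
print(classify_first_fire(new_instance))  # prints Y (the second rule fires first)
```

Because the list is ordered, the instance above receives class Y even though the fourth rule also matches it; only the first firing rule decides the class.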
    <Paragraph position="4"> Although rules appear quite different from instances as used in memory-based or instance-based learning (Aha et al., 1991; Daelemans and Van den Bosch, 1992; Daelemans et al., 1997b), there is a continuum between them. Rules can be seen as generalised instances; they represent the set of training instances with the same class that match on the conditions on the left-hand side of the rule. Therefore, classification strategies from memory-based learning can naturally be applied to rules. For example, Domingos (1996) describes the RISE system, in which rules are (carefully) generalised from instances, and in which the k-NN classification rule searches for nearest neighbours within these rules when classifying new instances.</Paragraph>
    <Paragraph position="5"> Often, the sets of instances covered by rules overlap. In other words, seen from the instance perspective, a single instance can match more than one rule. Consider the schematic example displayed in Figure 1. Three instances with three multi-valued features match individually with one or two of the four rules; for example, the first instance matches with rule 1 (if f1 = A then c = Z) and with rule 3 (if f2 = C then c = Z).</Paragraph>
    <Paragraph position="6"> Pursuing this reasoning, it is possible to index instances by the rules that apply to them.</Paragraph>
    <Paragraph position="7"> For example, in Figure 1, the first instance can be indexed by the &quot;active&quot; rule identification numbers 1 and 3. When the left-hand sides of rules are seen as complex features (in which the presence of some combination of feature values is queried) that are strong predictors of a single class, indexing instances by the rules that apply to them is essentially the same as representing instances by a set of complex features.</Paragraph>
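Indexing an instance by the rules that apply to it can be sketched as below. The four rule conditions follow the rules listed for Figure 1; the value of f3 for the first instance and all function names are assumptions made for illustration, since the figure itself is not reproduced in the text.

```python
# Rule left-hand sides keyed by rule identification number (as in Figure 1).
RULES = {
    1: {"f1": "A"},
    2: {"f1": "B", "f2": "B"},
    3: {"f2": "C"},
    4: {"f3": "C"},
}


def active_rules(instance, rules=RULES):
    """Return the set of rule numbers whose left-hand side applies to the instance."""
    return {
        rule_id
        for rule_id, conditions in rules.items()
        if all(instance.get(f) == v for f, v in conditions.items())
    }


def binary_vector(instance, rules=RULES):
    """Recode the instance as one binary feature per rule (1 = active)."""
    active = active_rules(instance, rules)
    return [1 if rule_id in active else 0 for rule_id in sorted(rules)]


# First instance of Figure 1: f1 = A and f2 = C come from the text,
# f3 = B is a hypothetical value chosen so that rule 4 stays inactive.
first = {"f1": "A", "f2": "C", "f3": "B"}
print(sorted(active_rules(first)))  # prints [1, 3]
print(binary_vector(first))         # prints [1, 0, 1, 0]
```

The set form and the binary-vector form carry the same information; the set form is convenient when most rules are inactive for any given instance.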
    <Paragraph position="8"> Note that when a rule matches an instance, this does not guarantee that the class of the instance is identical to the rule's predicted class [...] error. In Figure 1, the third memory instance matches rules 3 and 4, which both predict a Z, while the instance itself has class X.</Paragraph>
    <Paragraph position="9"> Figure 1 (rules shown): (1) if f1=A then c=Z; (2) if f1=B and f2=B then c=Y; (3) if f2=C then c=Z; (4) if f3=C then c=Z. Caption: [...]ing of multi-valued instances via matching rules to rule-indexed instances, characterised by the numbers of the rules that match them. f1, f2, and f3 represent the three features; c represents the class label.</Paragraph>
    <Paragraph position="10"> Now when instances are represented this way, they can be used in k-NN classification. Each complex feature then becomes a binary feature that can also be assigned some weight (e.g., gain-ratio feature weights, chi-square, or equal weights (Daelemans et al., 2000)); when a memory instance and a new test instance share complex features, their similarity becomes the sum of the weights of the matching features. In Figure 1, a new instance (bottom) matches rules 2 and 4, thereby (partially) matching the second and third memory instances. If, for example, rule 4 had a higher overall weight than rule 2, the third memory instance would become the nearest neighbour. The k-NN rule then says that the class of the nearest neighbour transfers to the new instance, which would mean that class X would be copied, a class different from those predicted by either rule 2 or rule 4. This is a marked difference from classification in RIPPER, where the class is assigned directly to the new instance by the rule that fires first. It can be expected that many classifications in this approach would be identical to those made by RIPPER, but it is possible that the k-NN approach has some consistent advantage in the cases where classification diverges. In this paper we investigate some effects of recoding instances by complex features induced by an external rule-induction algorithm, and show that the approach is promising for language learning tasks. We find that the method works as well as or better than RIPPER on various language learning and other benchmark datasets. However, the method does not necessarily perform better than default memory-based learning. Improvements are observed only when the rule-indexing features are added to the original multi-valued features.</Paragraph>
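The nearest-neighbour step described above can be sketched as follows, with k = 1. The rule weights are hypothetical numbers chosen only so that rule 4 outweighs rule 2; the classes of the first two memory instances are likewise assumed, as the text only states the class of the third. Function and variable names are illustrative.

```python
# Hypothetical rule weights; the source mentions gain ratio, chi-square,
# or equal weights, none of which is computed here.
WEIGHTS = {1: 1.0, 2: 0.5, 3: 1.0, 4: 0.8}

# Memory instances as (active rule numbers, class label).
MEMORY = [
    ({1, 3}, "Z"),  # class assumed for illustration
    ({2}, "Y"),     # class assumed for illustration
    ({3, 4}, "X"),  # class X as stated in the text
]


def similarity(active_a, active_b, weights=WEIGHTS):
    """Sum of the weights of the rules that are active in both instances."""
    return sum(weights[rule_id] for rule_id in active_a.intersection(active_b))


def nearest_neighbour_class(new_active, memory=MEMORY, weights=WEIGHTS):
    """1-NN: copy the class of the most similar memory instance.

    Ties go to the earliest memory instance, an arbitrary choice of this sketch.
    """
    best_active, best_class = max(
        memory, key=lambda item: similarity(item[0], new_active, weights)
    )
    return best_class


new_active = {2, 4}  # the new instance of Figure 1 matches rules 2 and 4
print(nearest_neighbour_class(new_active))  # prints X
```

With these weights the third memory instance is nearest (similarity 0.8 against 0.5 for the second), so class X is copied even though rules 2 and 4 predict Y and Z. Giving rule 2 the larger weight instead would make the second memory instance the nearest neighbour.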
  </Section>
</Paper>