<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0713">
  <Title>Task DIM GPSM NPSM POSSM PP</Title>
  <Section position="4" start_page="74" end_page="76" type="intro">
    <SectionTitle>
3 Results
</SectionTitle>
    <Paragraph position="0"> We performed experiments on the following five language data sets; more details on the numbers of features, values per feature, number of classes, and number of instances are displayed in Table 1. Diminutive formation (henceforth DIM): choosing the correct diminutive inflection for Dutch nouns out of five possibilities - je, tje, pje, kje, and etje - on the basis of phonemic word transcriptions, segmented at the level of syllable onset, nucleus, and coda of the final three syllables of the word.</Paragraph>
    <Paragraph position="1"> The data stems from a study described in (Daelemans et al., 1997a).</Paragraph>
    <Paragraph position="2"> Grapheme-phoneme conversion (GPSM): the conversion of a window of nine letters to the phonemic transcription of the middle letter. From the original data set described in (Van den Bosch, 1997) a 10% subset was drawn.</Paragraph>
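    The fixed-width windowing setup described for GPSM can be sketched as follows; the padding symbol and function name are our own illustration, not the original implementation.

```python
def letter_windows(word, size=9, pad="_"):
    """Build fixed-width letter windows: each window is centred on one
    letter of the word, whose phoneme is the classification target
    (a sketch of the GPSM task setup; size=9 as in the paper)."""
    half = size // 2
    padded = pad * half + word + pad * half
    return [padded[i:i + size] for i in range(len(word))]

# One window per letter; the middle position is the letter to transcribe.
for w in letter_windows("book"):
    print(w)
```

    Each instance thus pairs a nine-letter context with the phoneme of its middle letter.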
    <Paragraph position="3"> Base-NP chunking (NPSM): the segmentation of sentences into non-recursive NPs. (Veenstra, 1998) used the Base-NP tag set as presented in (Ramshaw and Marcus, 1995): I for inside a Base-NP, O for outside a Base-NP, and B for the first word in a Base-NP following another Base-NP.</Paragraph>
    <Paragraph position="4"> See (Veenstra, 1998) for more details, and (Daelemans et al., 1999) for a series of experiments on the original data set from which we have used a randomly-extracted 10%.</Paragraph>
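    The I/O/B tag set of Ramshaw and Marcus (1995) can be illustrated with a small sketch; the helper and the chunk-span input format are hypothetical, not the authors' code.

```python
def iob_tags(tokens, chunks):
    """Assign Base-NP tags in the Ramshaw & Marcus (1995) scheme:
    I = inside a Base-NP, O = outside a Base-NP,
    B = first word of a Base-NP immediately following another Base-NP.
    `chunks` is a list of (start, end) token spans, end exclusive."""
    tags = ["O"] * len(tokens)
    prev_end = None
    for start, end in sorted(chunks):
        for i in range(start, end):
            tags[i] = "I"
        if start == prev_end:  # adjacent Base-NPs: mark the boundary word
            tags[start] = "B"
        prev_end = end
    return tags

tokens = ["the", "man", "the", "dog", "bit", "ran"]
# Two adjacent Base-NPs ("the man", "the dog") followed by two verbs.
print(iob_tags(tokens, [(0, 2), (2, 4)]))
```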
    <Paragraph position="5"> Part-of-speech tagging (POSSM): the disambiguation of syntactic classes of words in particular contexts. We assume a tagger architecture that processes a sentence from a disambiguated left to an ambiguous right context, as described in (Daelemans et al., 1996). The original data set for the part-of-speech tagging task, extracted from the LOB corpus, contains 1,046,151 instances; we have used a randomly-extracted 10% of this data.</Paragraph>
    <Paragraph position="8"> Prepositional-phrase attachment (PP): the attachment of a PP in the sequence VP NP PP (VP = verb phrase, NP = noun phrase, PP = prepositional phrase). The data consists of four-tuples of words, extracted from the Wall Street Journal Treebank. From the original data set, used by (Ratnaparkhi et al., 1994), (Collins and Brooks, 1995), and (Zavrel et al., 1997), (Daelemans et al., 1999) took the train and test set together to form the particular data also used here.</Paragraph>
    <Paragraph position="9"> Table 2 lists the average (10-fold cross-validation) accuracies, measured in percentages of correctly classified test instances, of IB1-IG, RIPPER, and RBM on these five tasks. The clearest overall pattern in this table is the high accuracy of IB1-IG, surpassed only twice by RBM, on the DIM and NPSM tasks (significantly, according to one-tailed t-tests, with p &lt; 0.05). On the other three tasks, IB1-IG outperforms RBM.</Paragraph>
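    The significance test used here, a one-tailed t-test over paired per-fold cross-validation accuracies, can be sketched in pure Python; the fold accuracies below are made-up illustration values, not the paper's results.

```python
import math

def paired_t_statistic(a, b):
    """t statistic for a one-tailed paired t-test over the per-fold
    accuracies of two classifiers (df = n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical per-fold accuracies (%) of two systems over 10 folds.
sys_a = [96.2, 96.8, 95.9, 96.5, 96.1, 96.7, 96.0, 96.4, 96.3, 96.6]
sys_b = [95.8, 96.1, 95.5, 96.0, 95.7, 96.2, 95.6, 95.9, 95.8, 96.0]
t = paired_t_statistic(sys_a, sys_b)
# With df = 9, the one-tailed critical value at p = 0.05 is about 1.833.
print(t > 1.833)
```

    The pairing per fold matters: both systems are evaluated on the same ten partitions, so fold-to-fold difficulty cancels out in the differences.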
    <Paragraph position="10"> RIPPER performs significantly more accurately than IB1-IG only on the DIM task. Once again, evidence is collected for the global finding that forgetting parts of the training material, as obviously happens in rule induction, tends to be harmful to generalisation accuracy in language learning (Daelemans et al., 1999).</Paragraph>
    <Paragraph position="11"> A surprising result apparent in Table 2 is that RBM never performs worse than RIPPER; in fact, it performs significantly more accurately than RIPPER on the GPSM, NPSM, and POSSM tasks.</Paragraph>
    <Paragraph position="12"> On these tasks, there appears to be an advantage in the k-NN approach to rule matching and voting over the RIPPER strategy of ordered rule firing.</Paragraph>
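    The contrast drawn here, voting over all matching rules versus RIPPER-style ordered first-match firing, can be sketched with toy rules; the rule format, data, and default class are our own illustration, not either system's actual implementation.

```python
# Each rule: (conditions, class); a condition is (feature_index, value).
rules = [
    (((0, "a"),), "X"),
    (((1, "b"),), "Y"),
    (((0, "a"), (1, "b")), "Y"),
]

def matches(conds, instance):
    return all(instance[i] == v for i, v in conds)

def ordered_firing(rules, instance, default="X"):
    """RIPPER-style: the first matching rule in the ordered list decides."""
    for conds, cls in rules:
        if matches(conds, instance):
            return cls
    return default

def rule_voting(rules, instance, default="X"):
    """Voting-style sketch: all matching rules vote for their class."""
    votes = {}
    for conds, cls in rules:
        if matches(conds, instance):
            votes[cls] = votes.get(cls, 0) + 1
    return max(votes, key=votes.get) if votes else default

inst = ("a", "b")
print(ordered_firing(rules, inst))  # first match wins: X
print(rule_voting(rules, inst))     # majority among all matches: Y
```

    The same instance is classified differently by the two strategies, which is exactly the degree of freedom the paragraph above points at.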
    <Paragraph position="13"> Another advantage, now of RBM as opposed to IB1-IG, is the reduced memory requirements and resulting speed enhancements. As listed in Table 3, the average number of rules in the rule sets induced by RIPPER ranges between 29 and 971. [Table 2 caption: Generalisation accuracies of IB1-IG, RIPPER, and RBM on five language learning tasks. '*' denotes significantly better accuracy of RBM or RIPPER over IB1-IG with p &lt; 0.05; '+' denotes significance in the reverse direction; '&#x221A;' denotes significantly better accuracy of RBM over RIPPER with p &lt; 0.05.]</Paragraph>
    <Paragraph position="14"> Averaged over all tasks, the rules have on average about two to four conditions (feature-value tests). More importantly, as the third column of Table 3 shows, the average number of active rules per instance is below two for all tasks. This means that in most instances of any of the five tasks, only one complex feature (bit) is active.</Paragraph>
    <Paragraph position="15"> Especially with the smaller rule sets (DIM, NPSM, and PP - which all have few classes, cf. Table 1), RBM's classification is very speedy. It reduces, for example, classification of the NPSM test set from 19 seconds to 1 second. Large rule sets (GPSM), however, can have adverse effects - from 8 seconds in IB1-IG to 17 seconds in RBM.</Paragraph>
    <Paragraph position="17"> In sum, we observe two cases (DIM and NPSM) in which RBM attains a significant generalisation accuracy improvement over IB1-IG as well as some interesting classification speedup; for the other tasks, for now unpredictably, generalisation accuracy losses and even a slowdown are observed. [Table 3 caption (partial): ... conditions per rule (c/r), and coded features per instance (f/i); and one-partition timings (s) of classification of test material in IB1-IG and RBM, for five language tasks.]</Paragraph>
    <Paragraph position="18"> The slowdown occurs with GPSM, which has been analysed earlier as being extremely disjunct in class space, and therefore highly sensitive to the &amp;quot;forgetting exceptions is harmful&amp;quot; syndrome (Daelemans et al., 1999; Van den Bosch, 1999a).</Paragraph>
    <Paragraph position="19"> The complex features used in RBM are taken as the only information available; the original information (the feature values) is discarded. This need not be the case: the recoded instances can be merged with their original feature-value vectors. We performed experiments in which we made this fusion; the results are listed in Table 4. Comparing the column labeled &amp;quot;IB1-IG+RBM&amp;quot;, denoting the fusion variant, with the IB1-IG column, it can be seen that the fusion reaches some modest error reduction percentages (rightmost column in Table 4). In fact, on NPSM and POSSM it performs significantly better (again, according to one-tailed t-tests, with p &lt; 0.05) than IB1-IG. On the other hand, adding the (average) 971 complex features to the nine multi-valued features of GPSM causes a slight drop in performance - and a slowdown. [Table 4 caption (partial): ... error reduction, on five language learning tasks. '*' denotes significantly better accuracy of IB1-IG+RBM over IB1-IG with p &lt; 0.05.]</Paragraph>
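    The fusion variant described above, concatenating the original feature values with the complex-feature (rule) bits, can be sketched as follows; the helper names and rule format are hypothetical illustrations, not the paper's implementation.

```python
def active_rule_bits(rules, instance):
    """Recode an instance as a bit vector: bit i is 1 iff rule i matches.
    Rules are (conditions, class) pairs; a condition is (index, value)."""
    def matches(conds):
        return all(instance[i] == v for i, v in conds)
    return [1 if matches(conds) else 0 for conds, _cls in rules]

def fuse(instance, rules):
    """IB1-IG+RBM-style fusion sketch: the original feature values
    concatenated with the complex-feature bits."""
    return list(instance) + active_rule_bits(rules, instance)

rules = [(((0, "a"),), "X"), (((1, "b"),), "Y")]
print(fuse(("a", "c"), rules))
```

    The fused vector keeps both representations available to the memory-based learner, at the cost of a much longer instance, which is the slowdown noted for GPSM.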
  </Section>
</Paper>