<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0102">
  <Title>MBT: A Memory-Based Part of Speech Tagger-Generator</Title>
  <Section position="9" start_page="61" end_page="61" type="concl">
    <SectionTitle>
7 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have shown that a memory-based approach to large-scale tagging is feasible both in terms of accuracy (comparable to other statistical approaches), and also in terms of computational efficiency (time and space requirements) when using IGTree to compress and index the case base. The approach combines some of the best features of learned rule-based and statistical systems (small training corpora needed, incremental learning, understandable and explainable behavior of the system). More specifically, memory-based tagging with IGTrees has the following advantages.</Paragraph>
    <Paragraph position="1"> * Accurate generalization from small tagged corpora. Already at small corpus size (300-400 K tagged words), performance is good. These corpus sizes can be easily handled by our system.</Paragraph>
    <Paragraph position="2"> * Incremental learning. New 'cases' (e.g. interactively corrected output of the tagger) can be incrementally added to the case bases, continually improving the performance of the overall system.</Paragraph>
    <Paragraph position="3"> * Explanation capabilities. To explain the classification behavior of the system, a path in the IGTree (with associated defaults) can be provided as an explanation, as well as nearest neighbors from which the decision was extrapolated.</Paragraph>
    <Paragraph position="4"> * Flexible integration of information sources. The feature weighting method takes care of the optimal fusing of different sources of information (e.g. word form and context), automatically.</Paragraph>
    <Paragraph position="5"> * Automatic selection of optimal context. The IGTree mechanism (when applied to the known words case base) automatically decides on the optimal context size for disambiguation of focus words.</Paragraph>
    <Paragraph position="6"> * Non-parametric estimation. The IGTree formalism provides automatic, nonparametric estimation of classifications for low-frequency contexts (it is similar in this respect to backed-off training), but avoids non-optimal estimation due to false intuitions or non-convergence of the gradient-descent procedure used in some versions of backed-off training.</Paragraph>
    <Paragraph position="7"> * Reasonably good results on unknown words without morphological analysis. On the WSJ corpus, unknown words can be predicted (using context and word form information) for more than 90%.</Paragraph>
    <Paragraph position="8"> * Fast learning and tagging. Due to the favorable complexity properties of IGTrees (lookup time in IGTrees is independent on number of cases), both tagger generation and tagging are extremely fast. Tagging speed in our current implementation is about 1000 words per second.</Paragraph>
    <Paragraph position="9"> We have barely begun to optimise the approach: a more intelligent similarity metric would also take into account the differences in similarity between different values of the same feature. E.g. the similarity between the tags rb-in-nn and rb-in should be bigger than the similarity between rb-in and vb-nn. Apart from linguistic engineering refinements of the similarity metric, we are currently experimenting with statistical measures to compute such more fine-grained similarities (e.g. Stanfill &amp; Waltz, 1986, Cost &amp; Salzberg, 1994).</Paragraph>
  </Section>
class="xml-element"></Paper>