<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1030"> <Title>Mistake-Driven Mixture of Hierarchical Tag Context Trees</Title> <Section position="9" start_page="234" end_page="235" type="relat"> <SectionTitle> 6 Related Work </SectionTitle> <Paragraph position="0"> Although statistical natural language processing has mainly focused on maximum likelihood estimators, (Pereira et al., 1995) proposed a mixture approach to predicting the next word using the Context Tree Weighting (CTW) method (Willems et al., 1995).</Paragraph> <Paragraph position="1"> The CTW method computes probability by mixing the subtrees of a single context tree in a Bayesian fashion.</Paragraph> <Paragraph position="2"> Although the method is very efficient, it cannot be used to construct hierarchical tag context trees.</Paragraph> <Paragraph position="3"> Various kinds of re-sampling techniques have been studied in statistics (Efron, 1979; Efron and Tibshirani, 1993) and machine learning (Breiman, 1996; Hull et al., 1996; Freund and Schapire, 1996a).</Paragraph> <Paragraph position="4"> In particular, the mistake-driven mixture algorithm was directly motivated by AdaBoost (Freund and Schapire, 1996a). The AdaBoost method was designed to construct a high-performance predictor by iteratively calling a weak learning algorithm (one that is only slightly better than random guessing). An empirical study reports that the method greatly improved the performance of decision-tree, k-nearest-neighbor, and other learning methods given relatively simple and sparse data (Freund and Schapire, 1996b). We borrowed the idea of re-sampling to detect exceptional connections, and were the first to show that such a re-sampling method is also effective in a practical application using a large amount of data.</Paragraph> <Paragraph position="5"> The next step is to fill the gap between theory and practice. Most theoretical work on re-sampling assumes i.i.d. (independent and identically distributed) samples. 
This is not a realistic assumption in part-of-speech tagging and other natural language applications. An interesting direction for future research is to construct a theory that handles Markov processes.</Paragraph> </Section> </Paper>
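The AdaBoost procedure referenced above, iteratively calling a weak learner on re-weighted data and up-weighting the examples the previous round got wrong, can be sketched as follows. This is a minimal illustrative implementation and not the paper's mistake-driven mixture algorithm; the one-dimensional decision-stump weak learner and the toy data are assumptions made for demonstration only.

```python
import math

def train_stump(xs, ys, weights):
    """Weak learner: the best single-threshold classifier on 1-D inputs."""
    best = None
    for thresh in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= thresh else -sign for x in xs]
            err = sum(w for p, y, w in zip(preds, ys, weights) if p != y)
            if best is None or err < best[0]:
                best = (err, thresh, sign)
    err, thresh, sign = best
    return err, lambda x: sign if x >= thresh else -sign

def adaboost(xs, ys, rounds=10):
    """Combine weak stumps into a weighted-majority predictor."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # (alpha, stump) pairs
    for _ in range(rounds):
        err, stump = train_stump(xs, ys, weights)
        err = max(err, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Mistake-driven re-weighting: misclassified points gain weight.
        weights = [w * math.exp(-alpha * y * stump(x))
                   for w, x, y in zip(weights, xs, ys)]
        z = sum(weights)
        weights = [w / z for w in weights]

    def predict(x):
        score = sum(a * s(x) for a, s in ensemble)
        return 1 if score >= 0 else -1
    return predict

# Toy usage on linearly separable 1-D data (illustrative assumption).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]
clf = adaboost(xs, ys, rounds=3)
```

The re-weighting step is what makes the method "mistake-driven": each round the sampling distribution concentrates on the examples the current ensemble still gets wrong, which is the property the paper borrows to detect exceptional connections.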