<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0306">
<Title>Mistake-Driven Learning in Text Categorization</Title>
<Section position="3" start_page="0" end_page="55" type="intro">
<SectionTitle>
1 Introduction
</SectionTitle>
<Paragraph position="0"> Learning problems in the natural language and text processing domains are often studied by mapping the text to a space whose dimensions are the measured features of the text, e.g., the words appearing in a document. Three characteristic properties of this domain are (a) very high dimensionality, (b) both the learned concepts and the instances reside very sparsely in the feature space, and, consequently, (c) there is high variation in the number of active features per instance.</Paragraph>
<Paragraph position="1"> Multiplicative weight-updating algorithms such as Winnow (Littlestone, 1988) have been studied extensively in the theoretical learning literature. Theoretical analysis has shown that they behave exceptionally well in domains with these characteristics, and in particular in the presence of irrelevant attributes, noise, and even a target function that changes over time (Littlestone, 1988; Littlestone and Warmuth, 1994; Herbster and Warmuth, 1995), but only recently have people started to use them in applications (Golding and Roth, 1996; Lewis et al., 1996; Cohen and Singer, 1996). We address these theoretical claims empirically in an important application domain for machine learning: text categorization. In particular, we study mistake-driven learning algorithms based on the Winnow family, and investigate ways to apply them in domains with the above characteristics.</Paragraph>
<Paragraph position="2"> The learning algorithms studied here offer a large space of design choices and, correspondingly, may vary widely in performance when applied in specific domains. We concentrate on the text processing domain, with the characteristics mentioned above, and explore this space of choices within it.</Paragraph>
<Paragraph position="3"> In particular, we investigate three variations of on-line prediction algorithms and evaluate them experimentally on large text categorization problems.</Paragraph>
<Paragraph position="4"> The algorithms we study are all learning algorithms for linear functions. They categorize documents by learning, for each category, a linear separator in the feature space. The algorithms differ in whether they allow both positive and negative weights or only positive ones, and in how they update their weights during the training phase.</Paragraph>
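<Paragraph position="5"> To make the mistake-driven, multiplicative updates discussed above concrete, the sketch below shows the standard Positive Winnow rule (Littlestone, 1988) for a single category over sparse binary word features. It is a minimal illustration rather than the exact variants evaluated in this paper, and the parameter values (promotion factor alpha, demotion factor beta, threshold theta) are assumptions chosen for readability.

    # A minimal sketch of Positive Winnow (Littlestone, 1988) for one category.
    # Weights stay positive and are updated multiplicatively, and only when
    # the learner makes a mistake. The hyperparameter values here are
    # illustrative assumptions, not the settings used in the paper.
    class PositiveWinnow:
        def __init__(self, alpha=1.5, beta=0.5, theta=1.0):
            self.alpha = alpha   # promotion factor (> 1)
            self.beta = beta     # demotion factor (0 < beta < 1)
            self.theta = theta   # decision threshold
            self.w = {}          # sparse weights; unseen words default to 1.0

        def score(self, active_words):
            # Sparse dot product: only features active in the document count.
            return sum(self.w.get(word, 1.0) for word in active_words)

        def predict(self, active_words):
            return self.score(active_words) > self.theta

        def update(self, active_words, label):
            # Mistake-driven: weights change only on prediction errors,
            # and only the weights of active features are touched.
            if self.predict(active_words) == label:
                return
            factor = self.alpha if label else self.beta
            for word in active_words:
                self.w[word] = self.w.get(word, 1.0) * factor

Balanced Winnow, which in effect allows negative weights, maintains two such positive weight vectors and scores a document by their difference.</Paragraph>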
<Paragraph position="6"> We find that while a vanilla version of these algorithms performs rather well, a quantum leap in performance is achieved when we modify the algorithms to better address some of the specific characteristics we identify in textual domains. In particular, we address problems such as wide variation in document size, word repetition, and the need to rank documents rather than merely decide whether or not they belong to a category. In some cases we adapt solutions that are well known in the IR literature to the class of algorithms we use; in others we modify known algorithms to better suit the characteristics of the domain. We motivate the modifications to the basic algorithms and justify them experimentally by exhibiting their contribution to improved performance. Overall, the best variation we investigate performs significantly better than any known algorithm tested on this task with a similar set of features.</Paragraph>
<Paragraph position="7"> The rest of the paper is organized as follows: the next section describes the task of text categorization, how we model it as a classification task, and related work. The family of algorithms we use is introduced in Section 3, and the extensions to the basic algorithms, along with their experimental evaluation, are presented in Section 4. In Section 5 we present our final experimental results and compare them to previous work in the literature.</Paragraph>
</Section>
</Paper>