<?xml version="1.0" standalone="yes"?> <Paper uid="P96-1042"> <Title>Minimizing Manual Annotation Cost In Supervised Training From Corpora</Title> <Section position="9" start_page="324" end_page="325" type="concl"> <SectionTitle> 8 Conclusions </SectionTitle> <Paragraph position="0"> Annotating large textual corpora for training natural language models is a costly process. We propose reducing this cost significantly using committee-based sample selection. (The use of a single model is also criticized in (Cohn, Atlas, and Ladner, 1994).)</Paragraph> <Paragraph position="1"> This approach reduces redundant annotation of examples that contribute little new information. The method can be applied in a semi-interactive process, in which the system selects several new examples for annotation at a time and updates its statistics after receiving their labels from the user. The implicit modeling of uncertainty makes the selection system generally applicable and quite simple to implement.</Paragraph> <Paragraph position="2"> Our experimental study of variants of the selection method suggests several practical conclusions. First, we found that the simplest version of the committee-based method, using a two-member committee, yields a reduction in annotation cost comparable to that of the multi-member committee. The two-member version is simpler to implement, has no parameters to tune, and is computationally more efficient. Second, we generalized the selection scheme, giving several alternatives for optimizing the method for a specific task. For bigram tagging, comparative evaluation of the different variants of the method showed similarly large reductions in annotation cost, suggesting the robustness of the committee-based approach. Third, sequential selection, which implicitly models the expected utility of an example relative to the example distribution, generally worked better than batch selection. 
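As an illustration, the two-member sequential selection loop described above might be sketched as follows. This is a minimal sketch with hypothetical names (CommitteeSelector, train_fn, oracle are not from the paper's implementation), and committee members are drawn here by bootstrap-resampling the labeled data as a simple stand-in for sampling model parameters from their posterior distribution:

```python
import random

class CommitteeSelector:
    """Two-member, committee-based sequential sample selection (sketch)."""

    def __init__(self, train_fn, labeled):
        # train_fn builds a classifier (a callable) from labeled pairs.
        # Both names are hypothetical, not taken from the paper.
        self.train_fn = train_fn
        self.labeled = list(labeled)

    def _committee(self):
        # Generate two members by bootstrap-resampling the labeled data,
        # a stand-in for drawing parameters from their posterior.
        members = []
        for _ in range(2):
            sample = [random.choice(self.labeled) for _ in self.labeled]
            members.append(self.train_fn(sample))
        return members

    def select(self, stream, oracle, budget):
        # Sequentially pick examples on which the two members disagree.
        chosen = []
        m1, m2 = self._committee()
        for x in stream:
            if len(chosen) == budget:
                break
            if m1(x) != m2(x):              # disagreement: informative example
                y = oracle(x)               # ask the annotator for the label
                self.labeled.append((x, y))
                chosen.append((x, y))
                m1, m2 = self._committee()  # update statistics and redraw
        return chosen
```

Batch selection, by contrast, would score an entire pool of unlabeled examples at once and hand the top-scoring batch to the annotator before retraining.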
The latter was found to work well only for small batch sizes, where the method mimics sequential selection; increasing the batch size (approaching 'pure' batch selection) reduces both accuracy and efficiency. Finally, we studied the effect of sample selection on the size of the trained model, showing a significant reduction in model size.</Paragraph> <Section position="1" start_page="324" end_page="325" type="sub_section"> <SectionTitle> 8.1 Further research </SectionTitle> <Paragraph position="0"> Our results suggest applying committee-based sample selection to other statistical NLP tasks that rely on estimating probabilistic parameters from an annotated corpus. Statistical methods for these tasks typically assign a probability estimate, or some other statistical score, to each alternative analysis (a word sense, a category label, a parse tree, etc.), and then select the analysis with the highest score.</Paragraph> <Paragraph position="1"> The score is usually computed as a function of the estimates of several 'atomic' parameters, often binomials or multinomials, such as: * In word sense disambiguation (Hearst, 1991; Gale, Church, and Yarowsky, 1993): P(s|f), where s is a specific sense of the ambiguous word in question w, and f is a feature of occurrences of w. Common features are words in the context of w or its morphological attributes.</Paragraph> <Paragraph position="2"> * In prepositional-phrase (PP) attachment (Hindle and Rooth, 1993): P(a|f), where a is a possible attachment, such as an attachment to a head verb or noun, and f is a feature, or a combination of features, of the attachment. 
Common features are the words involved in the attachment, such as the head verb or noun, the preposition, and the head word of the PP.</Paragraph> <Paragraph position="3"> * In statistical parsing (Black et al., 1993): P(r|h), the probability of applying the rule r at a certain stage of the top-down derivation of the parse tree, given the history h of the derivation process.</Paragraph> <Paragraph position="4"> * In text categorization (Lewis and Gale, 1994; Iwayama and Tokunaga, 1994): P(t|C), where t is a term in the document to be categorized, and C is a candidate category label.</Paragraph> <Paragraph position="5"> Applying committee-based selection to supervised training for such tasks can be done analogously to its application in the current paper. Furthermore, committee-based selection may also be attempted for training non-probabilistic classifiers, where explicit modeling of information gain is typically impossible. In such contexts, committee members might be generated by randomly varying some of the decisions made in the learning algorithm.</Paragraph> <Paragraph position="6"> Another important area for future work is developing sample selection methods which are independent of the eventual learning method to be applied. This would be of considerable advantage in developing selectively annotated corpora for general research use. Recent work on heterogeneous uncertainty sampling (Lewis and Catlett, 1994) supports this idea, using one type of model for example selection and a different type for classification.</Paragraph> <Paragraph position="7"> Acknowledgments. We thank Yoav Freund and Yishay Mansour for helpful discussions. The first author gratefully acknowledges the support of the Fulbright Foundation.</Paragraph> </Section> </Section> </Paper>