File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/03/w03-0408_concl.xml
Size: 1,814 bytes
Last Modified: 2025-10-06 13:53:41
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0408"> <Title>Updating an NLP System to Fit New Domains: an empirical study on the sentence segmentation problem</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we studied the problem of updating a statistical system to fit a domain with characteristics different from that of the training data. Without updating, performance will typically deteriorate, perhaps quite drastically. null We used the sentence boundary detection problem to compare a few different updating methods. This provides useful insights into the potential value of various ideas.</Paragraph> <Paragraph position="1"> In particular, we have made the following observations: 1. An NLP system trained on one data set can perform poorly on another because there can be new examples not adequately represented in the old training set; 2. It is possible to estimate the degree of system performance degradation, and to determine whether it is necessary to perform a system update; 3. When updating a classifier to fit a new domain, even a small amount of newly labeled data can significantly improve the performance (also, the right training data characteristics can be more important than the quantity of training data); 4. Combining the old training data with the newly labeled data in an appropriate way (e.g., by balancing or feature augmentation) can be effective.</Paragraph> <Paragraph position="2"> Although the sentence segmentation problem consid- null ered in this paper is relatively simple, we are currently investigating other problems. We anticipate that the observations from this study can be applied to more complicated NLP tasks.</Paragraph> </Section> class="xml-element"></Paper>