<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0904">
  <Title>ESTIMATING THE TRUE PERFORMANCE OF CLASSIFICATION-BASED NLP TECHNOLOGY</Title>
  <Section position="6" start_page="26" end_page="27" type="concl">
    <SectionTitle>CONCLUSIONS AND RECOMMENDATIONS</SectionTitle>
    <Paragraph position="0">The success of a specific classification-based NLP application depends on several factors, including the power of the training method and the size of the training sample. Irrespective of the classification method, the performance of a classification-based NLP system should be evaluated by estimating the accuracy of future predictions, technically known as estimating the true error rate on future cases. This estimate is of fundamental importance both for comparing classifiers on the same samples and for selecting key characteristics of many of the newer classifiers, e.g., neural networks.</Paragraph>
    <Paragraph position="1">It has been shown that, with limited samples, the best techniques for measuring the performance of classification-based NLP systems are resampling methods, which simulate the presentation of new cases by repeatedly holding out some cases for testing. In addition, attention must be paid to the context of the particular NLP application with regard to the costs and risks associated with possible classification errors.</Paragraph>
    <Paragraph position="2">Although statistically valid estimates of the true error rate will not guarantee success in the marketplace for NLP systems, they do give a measure of confidence in the true performance of the system.</Paragraph>
  </Section>
</Paper>
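
The resampling approach recommended in the conclusion corresponds to what is now commonly implemented as k-fold cross-validation. The following minimal sketch is not part of the paper; it illustrates the idea of estimating the true error rate by repeatedly hiding cases from training, assuming scikit-learn and a tiny hypothetical two-class text sample (the texts, labels, and model choice are all illustrative assumptions).

# A minimal sketch (not from the paper): estimating the true error rate of a
# text classifier by resampling, i.e. repeatedly holding out part of the
# sample as unseen test cases (k-fold cross-validation).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical labelled sample (sentence, class); real NLP data would be larger.
texts = [
    "the quarterly earnings rose sharply",
    "the team won the final match",
    "shares fell after the report",
    "the striker scored twice tonight",
    "profits exceeded analyst forecasts",
    "the coach praised the defence",
]
labels = ["finance", "sports", "finance", "sports", "finance", "sports"]

# Bag-of-words features feeding a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())

# 3-fold cross-validation: each case is hidden from training exactly once,
# so the averaged error approximates the true error rate on future cases.
accuracies = cross_val_score(model, texts, labels, cv=3, scoring="accuracy")
estimated_true_error = 1.0 - accuracies.mean()
print(f"estimated true error rate: {estimated_true_error:.3f}")

With limited samples such as this one, the cross-validated estimate is preferable to the error measured on the training data itself, which is optimistically biased; this matches the paper's recommendation to simulate the presentation of new cases rather than reuse the cases the classifier was trained on.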