<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0828"> <Title>TALP System for the English Lexical Sample Task</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Results </SectionTitle> <Paragraph position="0"> Table 2 shows the accuracy obtained on the training set, and Table 3 the results of our system (SE3, TALP), together with the most frequent sense baseline (mfs), the recall of the best system in the task (best), and the median recall across all participant systems (avg).</Paragraph> <Paragraph position="1"> These last three figures were provided by the organizers of the task.</Paragraph> <Paragraph position="2"> OVA(base) in Table 2 stands for the results of the one-vs-all approach on the starting feature set (5-fold cross-validation on the training set). CC(base) refers to the constraint-classification setting on the starting feature set. OVA(best) and CC(best) denote one-vs-all and constraint-classification with their respective selected feature sets. Finally, SE3 stands for the system officially presented at competition time and TALP stands for the complete architecture.</Paragraph> <Paragraph position="3"> It can be observed that the feature selection process consistently improves the accuracy by around 3 points, in both the OVA and CC binarization settings. Constraint-classification is slightly better than the one-vs-all approach when feature selection is performed, though this improvement is neither consistent across all individual words (detailed results omitted) nor statistically significant (z-test at the 0.95 confidence level). 
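The significance check mentioned above can be carried out with a pooled two-proportion z-test on the two accuracies. A minimal sketch (the accuracies and sample size below are hypothetical, not the paper's actual figures):

```python
from math import sqrt

def accuracy_z_test(acc_a, acc_b, n, z_crit=1.96):
    """Pooled two-proportion z-test for two classifier accuracies,
    each measured on n examples; z_crit = 1.96 ~ 0.95 confidence."""
    p = (acc_a + acc_b) / 2             # pooled proportion
    se = sqrt(2 * p * (1 - p) / n)      # std. error of the difference
    z = abs(acc_a - acc_b) / se
    return z, z > z_crit                # (statistic, significant?)

# Hypothetical example: a 3-point accuracy gap on 1,000 instances
z, significant = accuracy_z_test(0.73, 0.70, 1000)
```

With n = 1,000 the 3-point gap falls short of the 1.96 threshold and is judged non-significant; the same gap on a much larger test set would pass, which is why small per-word differences can fail this test.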
Finally, the combination of binarization and feature selection further increases the accuracy by half a point (again, this difference is not statistically significant).</Paragraph> <Paragraph position="4"> However, when testing the complete architecture on the official test set, we obtained an accuracy decrease of more than 4 points. It remains to be analyzed whether this difference is due to overfitting to the training corpus during model selection, or simply to differences between the training and test corpora. Even so, the TALP system achieves a very good performance: there is a difference of only 1.3 points in fine- and coarse-grained recall with respect to the best system in the Senseval-3 English lexical sample task. (At competition time, only 14 words were processed with the full architecture.)</Paragraph> </Section> </Paper>
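As background to the binarization settings compared above, a one-vs-all decomposition trains one binary classifier per sense and predicts the highest-scoring sense. The sketch below uses toy 2-D data and a plain perceptron as the binary learner; the paper's actual learners, features, and data are different, and this is only an illustration of the decomposition itself.

```python
def perceptron(X, y, epochs=1000):
    """Train a binary perceptron; labels in y are +1 / -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            s = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * s <= 0:                       # mistake-driven update
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b

def ova_train(X, y):
    """One-vs-all: one binary classifier per label."""
    return {lab: perceptron(X, [1 if yi == lab else -1 for yi in y])
            for lab in set(y)}

def ova_predict(models, x):
    """Predict the label whose binary classifier scores highest."""
    def score(m):
        w, b = m
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=lambda lab: score(models[lab]))

# Toy data: three well-separated "senses" in a 2-D feature space
X = [(0.0, 0.0), (0.1, 0.1), (5.0, 0.0), (5.1, 0.2), (0.0, 5.0), (0.2, 5.1)]
y = ['a', 'a', 'b', 'b', 'c', 'c']
models = ova_train(X, y)
```

Constraint classification, by contrast, learns all the binary separators jointly under ranking constraints rather than independently, which is the distinction the OVA/CC comparison in the section is probing.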