<?xml version="1.0" standalone="yes"?> <Paper uid="W01-0712"> <Title>Learning Computational Grammars</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 3 Results </SectionTitle> <Paragraph position="0"> This section presents the results of the different systems applied to the three tasks which were central to this project: chunking, NP chunking and NP bracketing.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Chunking </SectionTitle> <Paragraph position="0"> Chunking was the shared task of CoNLL-2000, the workshop on Computational Natural Language Learning, held in Lisbon, Portugal, in 2000 (Tjong Kim Sang and Buchholz, 2000). Six members of the project have performed this task.</Paragraph> <Paragraph position="1"> The results of the six systems (precision, recall and F_{β=1}) can be found in table 1. Belz (2001) used Local Structural Context Grammars for finding chunks (LSCG). Déjean (2000a) applied the theory refinement system ALLiS to the shared task data. Koeling (2000) evaluated a maximum entropy learner while using different feature combinations (ME). Osborne (2000b) used a maximum entropy-based part-of-speech tagger for assigning chunk tags to words (ME Tag). Thollard (2001) identified chunks with Finite State Transducers generated by a probabilistic grammar algorithm (FST). Tjong Kim Sang (2000b) tested different configurations of combined memory-based learners (MBL). The FST and LSCG results are lower than those of the other systems because they were obtained without using lexical information. The best result at the workshop was obtained with Support Vector Machines (Kudoh and Matsumoto, 2000).</Paragraph> <Paragraph position="2"> Table 1: Results of the six systems associated with the project (shared task CoNLL-2000). The baseline results have been obtained by selecting the most frequent chunk tag associated with each part-of-speech tag. The best results at CoNLL-2000 were obtained by Support Vector Machines. A majority vote of the six LCG systems does not perform much worse than this best result. A majority vote of the five best systems outperforms the best result slightly.</Paragraph> <Paragraph position="3"> Because there was no tuning data available for the systems, the only combination technique we could apply to the six project results was majority voting. We applied majority voting to the output of the six systems while using the same approach as Tjong Kim Sang (2000b): combining start and end positions of chunks separately and restoring the chunks from these results. The combined performance (F_{β=1} = 93.33) was close to the best result published at CoNLL-2000 (93.48).</Paragraph>
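<Paragraph> As an illustration of this combination step, the sketch below votes on chunk start and end positions separately and restores chunks from the voted boundaries, in the spirit of the approach of Tjong Kim Sang (2000b) described above. It is a hypothetical reconstruction, not the project's code: the IOB2 input format, the function names and the tie-breaking behaviour are assumptions, and chunk types are ignored for brevity.</Paragraph>

    from collections import Counter

    def vote(label_sequences):
        # Per-position majority vote over equal-length label sequences.
        # Ties are broken by Counter.most_common, i.e. arbitrarily; the
        # paper does not specify a tie-breaking rule (assumption).
        return [Counter(labels).most_common(1)[0][0]
                for labels in zip(*label_sequences)]

    def combine_chunks(system_outputs):
        # system_outputs: one IOB2 tag sequence per system (tags such as
        # "B-NP", "I-NP", "O"), all for the same sentence. Chunk types
        # are ignored here for brevity.
        starts, ends = [], []
        for tags in system_outputs:
            starts.append(["S" if t.startswith("B") else "-" for t in tags])
            ends.append(["E" if t != "O" and
                         (i + 1 == len(tags) or not tags[i + 1].startswith("I"))
                         else "-"
                         for i, t in enumerate(tags)])
        voted_starts = vote(starts)
        voted_ends = vote(ends)
        # Restore chunks from the voted boundaries: open a chunk at a
        # voted start position, close it at the next voted end position.
        chunks, open_at = [], None
        for i, (s, e) in enumerate(zip(voted_starts, voted_ends)):
            if s == "S" and open_at is None:
                open_at = i
            if e == "E" and open_at is not None:
                chunks.append((open_at, i))
                open_at = None
        return chunks

<Paragraph> Applied to the outputs of the six systems, combine_chunks yields, per sentence, the (first token, last token) spans that the majority of systems supports.</Paragraph>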
</Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 NP chunking </SectionTitle> <Paragraph position="0"> The NP chunking task is the specialisation of the chunking task in which only base noun phrases need to be detected. Standard data sets for machine learning approaches to this task were put forward by Ramshaw and Marcus (1995). Six project members have applied a total of seven different systems to this task, most of them in the context of the combination paper Tjong Kim Sang et al. (2000). Daelemans applied the decision tree learner C5.0 to the task. Déjean used the theory refinement system ALLiS for finding noun phrases in the data. Hammerton (2001) predicted NP chunks with connectionist methods based on self-organising maps (SOM). Koeling detected noun phrases with a maximum entropy-based learner (ME). Konstantopoulos (2000) used Inductive Logic Programming (ILP) techniques for finding NP chunks in unseen texts. Tjong Kim Sang applied combinations of IB1-IG systems (MBL) and combinations of IGTREE learners to this task. The results of six of the seven systems can be found in table 2. The results of C5.0 and SOM are lower than the others because neither of these systems used lexical information. For all of the systems except SOM we had tuning data and an extra development data set available. We tested all ten combination methods on the development set and best-3 majority voting came out as the best (F_{β=1} = 93.30; it used the MBL, ME and ALLiS results). When we applied best-3 majority voting to the standard test set, we obtained F_{β=1} = 93.65, which is close to the best result we know for this data set (F_{β=1} = 93.86) (Tjong Kim Sang et al., 2000). The latter result was obtained by a combination of seven learning systems, five of which were operated by members of this project.</Paragraph> <Paragraph position="1"> Table 2: Results of the seven systems associated with the project. The baseline results have been obtained by selecting the most frequent chunk tag associated with each part-of-speech tag. The best results for this task have been obtained with a combination of seven learners, five of which were operated by project members. The combination of these five performances is not far off these best results.</Paragraph> <Paragraph position="2"> The original Ramshaw and Marcus (1995) publication evaluated their NP chunker on two data sets, the second holding a larger amount of training data (Penn Treebank sections 02-21) while using section 00 as test data. Tjong Kim Sang (2000a) has applied a combination of memory-based learners to this data set and obtained F_{β=1} = 94.90, an improvement on Ramshaw and Marcus's 93.3.</Paragraph>
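<Paragraph> All scores in this section are F_{β=1}, computed over exactly matched chunks. A minimal scoring sketch follows; the span-set input format and function names are assumptions, but the formula F_{β} = (β² + 1)PR / (β²P + R) is the standard one used for the CoNLL shared tasks.</Paragraph>

    def f_beta(precision, recall, beta=1.0):
        # F_{beta} = (beta^2 + 1) * P * R / (beta^2 * P + R); with
        # beta = 1, precision and recall are weighted equally, as in
        # the scores reported above.
        if precision == 0.0 and recall == 0.0:
            return 0.0
        return ((beta ** 2 + 1) * precision * recall /
                (beta ** 2 * precision + recall))

    def chunk_scores(gold, predicted):
        # gold, predicted: sets of (start, end) chunk spans (assumed
        # format). A predicted chunk counts as correct only if both of
        # its boundaries match a gold chunk exactly.
        correct = len(gold & predicted)
        precision = correct / len(predicted) if predicted else 0.0
        recall = correct / len(gold) if gold else 0.0
        return precision, recall, f_beta(precision, recall)

<Paragraph> The best-3 majority voting mentioned above can then be read as: score every three-system majority vote on the development data with a scorer of this kind and keep the trio with the highest F_{β=1}.</Paragraph>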
</Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 NP bracketing </SectionTitle> <Paragraph position="0"> Finding arbitrary noun phrases was the shared task of CoNLL-99, held in Bergen, Norway, in 1999. Three project members have performed this task. Belz (2001) extracted noun phrases with Local Structural Context Grammars, a variant of Data-Oriented Parsing (LSCG). Osborne (1999b) used a Definite Clause Grammar learner based on Minimum Description Length for finding noun phrases in samples of Penn Treebank material (MDL). Tjong Kim Sang (2000a) detected noun phrases with a bottom-up cascade of combinations of memory-based classifiers (MBL). The performance of the three systems can be found in table 3. For this task it was not possible to apply system combination to the output of the systems.</Paragraph> <Paragraph position="1"> Table 3: Results of the three systems associated with the project for the NP bracketing task, the shared task at CoNLL-99. The baseline results have been obtained by finding NP chunks in the text with an algorithm which selects the most frequent chunk tag associated with each part-of-speech tag. The best result at CoNLL-99 was obtained with a bottom-up memory-based learner. An improved version of that system (MBL) delivered the best project result. The MDL results have been obtained on a different data set and therefore combination of the three systems was not feasible.</Paragraph> <Paragraph position="2"> The MDL results have been obtained on a different data set, which left us with two remaining systems. A majority vote of the two will not improve on the best system, and since there was no tuning data or development data available, other combination methods could not be applied.</Paragraph> </Section> </Section> </Paper>