<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1207">
<Title>Classifying Particle Semantics in English Verb-Particle Constructions</Title>
<Section position="7" start_page="49" end_page="51" type="evalu">
<SectionTitle> 5 Experimental Results </SectionTitle>
<Paragraph position="0"> We present experimental results for both Ver(ification) and unseen Test data, on each set of features, individually and in combination.</Paragraph>
<Section position="1" start_page="49" end_page="50" type="sub_section">
<SectionTitle> 5.1 Experiments Using the Linguistic Features </SectionTitle>
<Paragraph position="0"> The results for experiments using the features that capture semantic and syntactic properties of verbs and VPCs are summarized in Table 4, and discussed in turn below.</Paragraph>
<Paragraph position="1"> Experiments using the slot features alone test whether features that tap into semantic information about a verb are sufficient to determine the appropriate sense class of a particle when that verb combines with it in a VPC. Although accuracy on the test data is well above the baseline in both the 2-way and 3-way tasks, for the verification data the increase over the baseline is minimal. The class corresponding to sense Refl-up in the 3-way task is relatively small, which means that a small variation in classification on these verbs may lead to a large variation in accuracy. However, we find that the difference in accuracy across the datasets is not due to performance on VPCs in this sense class. Although these features show promise for our task, the variation across the datasets indicates the limitations of our small sample sizes.</Paragraph>
<Paragraph position="2"> We also examine the performance of the particle features on their own, since, to the best of our knowledge, no such features have been used before in investigating VPCs. The results are disappointing, with only the verification data on the 2-way task showing substantially higher accuracy than the baseline. An analysis of errors reveals no consistent explanation, suggesting again that the variation may be due to small sample sizes.</Paragraph>
<Paragraph position="3"> We hypothesize that the combination of the slot features with the particle features will give an increase in performance over either set of linguistic features used individually, given that they tap into differing properties of verbs and VPCs. We find that the combination does indeed give more consistent performance across verification and test data than either feature set used individually. We analyze the errors made using slot and particle features separately, and find that they tend to classify different sets of verbs incorrectly. Therefore, we conclude that these feature sets are at least somewhat complementary. By combining these complementary feature sets, the classifier is better able to generalise across different datasets.</Paragraph>
</Section>
<Section position="2" start_page="50" end_page="50" type="sub_section">
<SectionTitle> 5.2 Experiments Using WCFs </SectionTitle>
<Paragraph position="0"> Our goal was to compare the more knowledge-rich slot and particle features to an alternative feature set, the WCFs, which does not rely on linguistic analysis of the semantics and syntax of verbs and VPCs. Recall that we experiment with both 200 feature words, WCF200, and 500 feature words, WCF500, as shown in Table 5. Most of the experiments using WCFs perform worse than the corresponding experiment using all the linguistic features. It appears that the linguistically motivated features are better suited to our task than simple word context features.</Paragraph>
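<Paragraph position="1"> As an illustrative sketch only (not part of the paper's experiments), WCF vectors of this kind might be built by taking the n most frequent co-occurring words as the feature words (n = 200 for WCF200, n = 500 for WCF500) and representing each target expression by its counts over those words; the window size, preprocessing, and all names below are assumptions.

from collections import Counter

def build_wcf_vectors(contexts, n_feature_words=200):
    # Select the n most frequent co-occurring words across all targets
    # as the feature words (n = 200 for WCF200, n = 500 for WCF500).
    totals = Counter()
    for tokens in contexts.values():
        totals.update(tokens)
    feature_words = [w for w, _ in totals.most_common(n_feature_words)]
    # Represent each target expression by its co-occurrence counts
    # over the chosen feature words.
    vectors = {}
    for target, tokens in contexts.items():
        counts = Counter(tokens)
        vectors[target] = [counts[w] for w in feature_words]
    return feature_words, vectors

# Invented example contexts for two VPCs; a real run would use
# context tokens collected from corpus occurrences.
contexts = {
    "jump up": ["suddenly", "from", "the", "chair", "the"],
    "use up": ["all", "the", "supplies", "the", "quickly"],
}
feature_words, vectors = build_wcf_vectors(contexts, n_feature_words=4)

Such count vectors could then be fed to the same classifier as the linguistic features, either alone or concatenated with them. </Paragraph>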
</Section>
<Section position="3" start_page="50" end_page="50" type="sub_section">
<SectionTitle> 5.3 Linguistic Features and WCFs Combined </SectionTitle>
<Paragraph position="0"> Although the WCFs on their own perform worse than the linguistic features, we find that the linguistic features and WCFs are at least somewhat complementary, since they tend to classify different verbs incorrectly. We hypothesize that, as with the slot and particle features, the different types of information provided by the linguistic features and WCFs may improve performance in combination. We therefore combine the linguistic features with each of the WCF200 and WCF500 features; see Table 6. However, contrary to our hypothesis, for the most part the experiments using the full combination of features give accuracies the same as or below that of the corresponding experiment using just the linguistic features. We surmise that these very different types of features--the linguistic features and WCFs--must be providing conflicting rather than complementary information to the classifier, so that no improvement is attained.</Paragraph>
</Section>
<Section position="4" start_page="50" end_page="51" type="sub_section">
<SectionTitle> 5.4 Discussion of Results </SectionTitle>
<Paragraph position="0"> The best performance across the datasets is attained using all the linguistic features. The linguistically uninformed WCFs perform worse on their own, and do not consistently help (and in some cases hurt) the performance of the linguistic features when combined with them. We conclude, then, that linguistically based features are motivated for this task. Note that the features are still quite simple, and straightforward to extract from a corpus--i.e., linguistically informed does not mean expensive (although the slot features do require access to chunked text).</Paragraph>
<Paragraph position="1"> Interestingly, in determining the semantic nearest neighbor of German particle verbs, Schulte im Walde (2005) found that WCFs restricted to the arguments of the verb outperform simple window-based co-occurrence features. Although her task is quite different from ours, similarly restricting our WCFs may enable them to encode more linguistically relevant information.</Paragraph>
<Paragraph position="2"> The accuracies we achieve with the linguistic features correspond to a 30-31% reduction in error rate over the chance baseline for the 3-way task, and an 18-26% reduction for the 2-way task. Although we expected that the 2-way task might be easier, since it requires less fine-grained distinctions, it is clear that combining senses that have some motivation for being treated separately comes at a price.</Paragraph>
<Paragraph position="3"> The reductions in error rate that we achieve with our best features are quite respectable for a first attempt at addressing this problem, but more work clearly remains. There is relatively high variability in performance across the verification and test sets, indicating that we need a larger number of experimental expressions to be able to draw firmer conclusions. Even if our current results extend to larger datasets, we intend to explore other feature approaches, such as word co-occurrence features for specific syntactic slots as suggested above, in order to improve performance.</Paragraph>
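<Paragraph position="4"> For concreteness (a worked illustration with invented numbers, not figures from the paper): reduction in error rate over a baseline is conventionally the fraction of the baseline's errors that the classifier eliminates, (accuracy - baseline) / (1 - baseline); the actual accuracies and chance baselines are those reported in Tables 4-6.

def error_rate_reduction(accuracy, baseline):
    # Fraction of the baseline's errors removed by the classifier.
    return (accuracy - baseline) / (1.0 - baseline)

# Invented figures: a classifier at 65% accuracy against a 50% chance
# baseline eliminates 30% of the baseline's errors.
print(round(error_rate_reduction(0.65, 0.50), 2))  # 0.3
</Paragraph>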
</Section>
</Section>
</Paper>