<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3220">
  <Title>Verb Sense and Subcategorization: Using Joint Inference to Improve Performance on Complementary Tasks</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Results and Discussion
</SectionTitle>
    <Paragraph position="0"> In Figures 2, 3 and 4 we compare the performance of the independent and joint models on the verb sense disambiguation and verb SCF determination problems, evaluated using both 10-fold cross-validation accuracy and test set accuracy. In Figure 2, we report the performance of a system whose free parameters (such as feature and term weights) were optimized on a per-verb basis. We also provide a baseline computed by guessing the most likely class.</Paragraph>
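As a concrete sketch, the most-likely-class baseline simply returns the most frequent training label for every test instance; the toy sense labels below are invented for illustration, not taken from the study's data:

```python
from collections import Counter

def most_likely_class_baseline(train_labels, test_labels):
    """Accuracy of always guessing the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return correct / len(test_labels)

# Hypothetical toy data: sense labels for one verb.
train = ["2:30:00", "2:30:00", "2:42:04", "2:30:00"]
test = ["2:30:00", "2:42:04"]
print(most_likely_class_baseline(train, test))  # 0.5
```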
    <Paragraph position="1"> Although the parameter optimization of Figure 2 was performed with respect to 10-fold cross-validation on the training sets, its lower performance on the test sets suggests that it suffers from overfitting. To test this hypothesis, we also evaluated on the test sets a version of the system with corpus-wide optimization of free parameters; the results are shown in Figure 3. The smaller gap between training-set cross-validation and test-set performance on the WSD task confirms our overfitting hypothesis. However, note that the gap between training-set cross-validation and test-set performance on the SCF determination task persists (although slightly diminished). We believe this results from significant data drift between the training sections of the WSJ in the Penn Treebank (sections 2 through 21) and all other sections.</Paragraph>
    <Paragraph position="2"> Using corpus-wide optimization, the joint model improves sense disambiguation accuracy by 1.9% over the independent model, bringing our system to 55.9% accuracy on the test set, performance that is comparable to that of the state-of-the-art systems on verbs given in Table 1. The joint model reduces sense disambiguation error by 4.1%. On the verb SCF determination task, the joint model yields a 2.1% improvement in accuracy over the independent model, reducing total error by 5.1%.</Paragraph>
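The relation between the absolute accuracy gain and the relative error reduction quoted above can be checked with a one-line computation (55.9% minus the 1.9% gain gives the independent model's 54.0% WSD accuracy):

```python
def relative_error_reduction(acc_independent, acc_joint):
    """Fraction of the independent model's error removed by the joint model."""
    err_independent = 1.0 - acc_independent
    err_joint = 1.0 - acc_joint
    return (err_independent - err_joint) / err_independent

# WSD: 54.0% -> 55.9% accuracy is a 1.9% absolute gain,
# i.e. roughly a 4.1% relative error reduction.
print(round(relative_error_reduction(0.540, 0.559), 3))  # 0.041
```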
    <Paragraph position="3"> We also report results of the independent and joint systems on each verb individually in Table 4. Not surprisingly, making use of the joint distribution was much more helpful for some verbs than others.</Paragraph>
    <Paragraph position="4"> Figure 2: Performance of the independent and joint systems on the verb sense and SCF tasks, evaluated with 10-fold cross-validation on the training sets and on the test sets. The baseline shown is guessing the most likely class. These systems used per-verb optimization of free parameters.</Paragraph>
    <Paragraph position="5"> Figure 4: Performance of the independent and joint systems on the verb sense and SCF tasks. This system uses no relative-position word feature weighting and no term weighting.</Paragraph>
    <Paragraph position="6"> Table 4: Performance of the independent and joint inference models on the verb sense and SCF tasks, evaluated on the Senseval-2 test set, for each of the 29 verbs in the study. These results were obtained with no per-verb parameter optimization. Note the great variation in problem difficulty and joint-model performance across verbs.</Paragraph>
    <Paragraph position="7"> For example, on the verbs begin, drive, find, keep, leave, and work, the joint model gives a greater than 5% accuracy boost on the WSD task. In contrast, for some other verbs, the joint model showed a slight decrease in accuracy on the test set relative to the independent model.</Paragraph>
    <Paragraph position="8"> We present a few representative examples where the joint model makes better decisions than the independent model. In the sentence . . . prices began weakening last month after Campeau hit a cash crunch.</Paragraph>
    <Paragraph position="9"> the sense model (based on bag-of-words evidence) believes that the sense 2:42:04 is most likely (see Table 2 for senses and joint distribution). However, the SCF model gives high weight to the frames VPto and VPing, which when combined with the joint distribution, give much more probability to the sense 2:30:00. The joint model thus correctly chooses sense 2:30:00. In the sentence . . . before beginning a depressing eight-year slide that continued through last year.</Paragraph>
    <Paragraph position="10"> the sense model again believes that the sense 2:42:04 is most likely. However, the SCF model correctly gives high weight to the NP frame, which when combined with the joint distribution, gives much more probability to the sense 2:30:01. The joint model thus correctly chooses sense 2:30:01.</Paragraph>
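A minimal sketch of this style of combination follows; the probabilities are invented for illustration, and the scoring rule (weighting the two independent models' outputs by the sense-SCF joint distribution and summing over frames) is one simple scheme, not necessarily the paper's exact model:

```python
# Hypothetical probabilities for the verb "begin" (illustrative numbers only).
p_sense = {"2:42:04": 0.5, "2:30:00": 0.3, "2:30:01": 0.2}  # bag-of-words model
p_scf = {"VPto": 0.45, "VPing": 0.40, "NP": 0.15}           # SCF model
# Joint distribution P(sense, scf), as would be estimated from training data.
p_joint = {
    ("2:42:04", "NP"): 0.20, ("2:42:04", "VPto"): 0.02, ("2:42:04", "VPing"): 0.02,
    ("2:30:00", "NP"): 0.05, ("2:30:00", "VPto"): 0.25, ("2:30:00", "VPing"): 0.25,
    ("2:30:01", "NP"): 0.15, ("2:30:01", "VPto"): 0.03, ("2:30:01", "VPing"): 0.03,
}

def joint_decision(p_sense, p_scf, p_joint):
    """Score each sense by summing over frames, reweighting the two
    independent models' scores by the joint distribution."""
    scores = {}
    for (sense, scf), pj in p_joint.items():
        scores[sense] = scores.get(sense, 0.0) + p_sense[sense] * p_scf[scf] * pj
    return max(scores, key=scores.get)

# High weight on VPto/VPing pulls the decision away from the bag-of-words
# favorite 2:42:04 toward 2:30:00, as in the "prices began weakening" example.
print(joint_decision(p_sense, p_scf, p_joint))  # 2:30:00
```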
    <Paragraph position="11"> Given the amount of information contained in the joint distribution it is surprising that the joint model doesn't yield a greater advantage over the independent models. It seems to be the case that the word sense model is able to capture much of the SCF information by itself, without using an explicit syntactic model. This results from the relative position weighting, since many of our SCFs correlate highly with the presence of small sets of words in particular positions (for instance, the infinitival &quot;to&quot;, prepositions, and pronouns). We tested this hypothesis by examining how the addition of SCF information affected performance of a weaker sense model, obtained by removing feature and term weighting.</Paragraph>
    <Paragraph position="12"> The results are shown in Figure 4. Indeed, when using this weaker word sense model, the joint model yields a much larger 4.5% improvement in WSD accuracy.</Paragraph>
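The relative-position weighting discussed above can be sketched as bag-of-words features tagged with their offset from the target verb, so that cue words such as an infinitival "to" immediately after the verb become distinct, separately weighted features. The weighting scheme here (inverse distance) is a hypothetical stand-in for whatever weights the real system learned or tuned:

```python
def position_weighted_features(tokens, verb_index, window=3):
    """Emit bag-of-words features tagged with their position relative to
    the target verb; nearby cue words then act as strong SCF signals."""
    feats = {}
    for i, tok in enumerate(tokens):
        offset = i - verb_index
        if offset == 0 or abs(offset) > window:
            continue
        # Hypothetical weighting: words closer to the verb get larger weights.
        weight = 1.0 / abs(offset)
        feats[f"{tok.lower()}@{offset:+d}"] = weight
    return feats

toks = "prices began weakening last month".split()
print(position_weighted_features(toks, 1))
```

Here "weakening" at offset +1 gets full weight, which is exactly the kind of positional cue that correlates with the VPing frame.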
  </Section>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Future Work
</SectionTitle>
    <Paragraph position="0"> We can imagine several modifications to the basic system that might improve performance. Most importantly, more specific use could be made of SCF information besides modeling its joint distribution with sense, for example by conditioning on the head words of (perceived) arguments, especially particles and prepositions. Second, although we made some attempt at extracting the &quot;underlying&quot; SCF of verbs by analyzing passive constructions separately, similar analysis of other types of movement such as relative clauses may also be useful. Third, we could hope to get some improvement from changing our model structure to address the issue of double generation of words discussed in section 3. One way this could be done would be to use a parser only to estimate the probability of the sequence of word tags (i.e., parts of speech) in the sentence, then to use a sense-specific lexicon to estimate the probability of finding the words under the tags.</Paragraph>
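The proposed factorization can be sketched as scoring the tag sequence once and then generating each word from a sense-specific lexicon, so no word is generated twice; the toy distributions and Penn Treebank tags below are invented purely to make the sketch runnable:

```python
import math

# Hypothetical toy distributions (illustrative numbers only).
def p_tagseq(tags):
    """Stand-in for a parser's probability of the tag sequence."""
    return {("NNS", "VBD", "VBG"): 0.1}.get(tuple(tags), 1e-6)

lexicon = {  # P(word given tag and sense): a sense-specific lexicon
    ("prices", "NNS", "2:30:00"): 0.01,
    ("began", "VBD", "2:30:00"): 0.2,
    ("weakening", "VBG", "2:30:00"): 0.05,
}

def sentence_logprob(tags, words, sense):
    """log P(tags) plus the sum of log P(word_i given tag_i, sense).
    The tag sequence is scored once, and each word is generated once
    from the sense-specific lexicon, avoiding double generation."""
    logp = math.log(p_tagseq(tags))
    for tag, word in zip(tags, words):
        logp += math.log(lexicon[(word, tag, sense)])
    return logp

print(sentence_logprob(["NNS", "VBD", "VBG"],
                       ["prices", "began", "weakening"], "2:30:00"))
```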
    <Paragraph position="1"> Although we chose WSD and SCF determination as a test case, the approach of this paper is applicable to other pairs of tasks. It may also be possible to improve parsing accuracy on verb phrases or other phrases, by simultaneously resolving word sense ambiguities, as attempted unsuccessfully by Bikel (2000). This work is intended to introduce a general methodology for combining disjoint NLP tasks that is of use outside of these specific tasks.</Paragraph>
  </Section>
</Paper>