<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2010">
  <Title>Gesture Improves Coreference Resolution</Title>
  <Section position="4" start_page="37" end_page="38" type="metho">
    <SectionTitle>
3 Evaluation
</SectionTitle>
    <Paragraph position="0"> The results of our experiments are computed using mention-based CEAF scoring (Luo, 2005), and are reported in Table 2. Leave-one-out evaluation was used to form 16 cross-validation folds, one for each document in the corpus. Using a planned, one-tailed pairwise t-test, the gesture features improved performance significantly for the boosted decision trees (t(15) = 2.48, p &lt; .02), though not for the voted perceptron (t(15) = 1.07, p = .15).</Paragraph>
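    <Paragraph> As an illustration of the significance test described above, the following minimal sketch computes a paired t statistic from per-fold score differences. The per-fold scores are hypothetical placeholders, not the values behind Table 2, and the function name paired_t is an assumption made for this example. With 16 folds the statistic has 15 degrees of freedom, which is consistent with the t(15) values reported above.

```python
import math
import statistics


def paired_t(with_feature, without_feature):
    """Return (t, dof) for a paired t-test on per-fold scores."""
    if len(with_feature) != len(without_feature):
        raise ValueError("both systems must be scored on the same folds")
    # Work on per-fold differences, since both systems see the same folds.
    diffs = [a - b for a, b in zip(with_feature, without_feature)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation (n - 1 in the denominator).
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-fold scores for 16 leave-one-out folds (illustration only).
without_gesture = [50.0 + 0.5 * i for i in range(16)]
with_gesture = [s + (1.0 if i % 2 == 0 else 0.0)
                for i, s in enumerate(without_gesture)]

t_stat, dof = paired_t(with_gesture, without_gesture)
print(f"t({dof}) = {t_stat:.2f}")
```

For a one-tailed test, the p-value would then be read from the upper tail of the t distribution with the returned degrees of freedom; that lookup is omitted here to keep the sketch within the standard library.</Paragraph>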
    <Paragraph position="1"> In the &quot;all corefer&quot; baseline, all NPs are grouped into a single cluster; in the &quot;none corefer&quot; baseline, each NP gets its own cluster. In the &quot;EXACT MATCH&quot; baseline, two NPs corefer when their surface forms are identical. All experimental systems outperform all baselines by a statistically significant amount. There are few other reported results for coreference resolution on spontaneous, unconstrained speech; Strube and Müller (2003) similarly find low overall scores for pronoun resolution on the Switchboard Corpus, albeit by a different scoring metric. Unfortunately, they do not compare performance to equivalent baselines.</Paragraph>
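    <Paragraph> The three baselines can be sketched as follows, together with a toy mention-based CEAF scorer. This is a minimal sketch, not the evaluation code used for Table 2: the function names are assumptions, the scorer assumes that key and response partition the same set of mentions (in which case precision, recall and F-measure coincide), and it searches cluster alignments by brute force, whereas Luo (2005) uses the Kuhn-Munkres algorithm.

```python
from itertools import permutations


def all_corefer(mentions):
    """'All corefer' baseline: every NP goes into one cluster."""
    return [set(range(len(mentions)))]


def none_corefer(mentions):
    """'None corefer' baseline: every NP is its own cluster."""
    return [{i} for i in range(len(mentions))]


def exact_match(mentions):
    """'Exact match' baseline: NPs with identical surface forms corefer."""
    clusters = {}
    for i, form in enumerate(mentions):
        clusters.setdefault(form, set()).add(i)
    return list(clusters.values())


def mention_ceaf(key, response):
    """Mention-based CEAF when key and response cover the same mentions.

    Brute-force search over one-to-one cluster alignments; suitable for
    toy inputs only.
    """
    small, large = sorted((key, response), key=len)
    best = 0
    # Try every injective mapping from the smaller clustering to the larger.
    for perm in permutations(range(len(large)), len(small)):
        overlap = sum(len(small[i].intersection(large[j]))
                      for i, j in enumerate(perm))
        best = max(best, overlap)
    total = sum(len(c) for c in key)
    return best / total


# Hypothetical mentions and gold clusters (illustration only).
mentions = ["the ball", "it", "the ball", "the ramp", "it"]
key = [{0, 1, 2}, {3, 4}]

for name, baseline in [("all corefer", all_corefer),
                       ("none corefer", none_corefer),
                       ("exact match", exact_match)]:
    print(name, mention_ceaf(key, baseline(mentions)))
```

Mentions are represented by their index in the list, so two NPs with the same surface form remain distinct mentions.</Paragraph>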
    <Paragraph position="2"> For the AdaBoost method, 50 iterations of boosting were performed on shallow decision trees with a maximum depth of three. For the voted perceptron, 50 training iterations were performed. The performance of the voted perceptron on this task was somewhat unstable, varying with the order in which the documents were presented. This may be because a small change in the weights can lead to a very different partitioning, which in turn affects the setting of the weights in the next perceptron iteration. For these results, the order of presentation of the documents was randomized, and the scores for the voted perceptron are the average of 10 different runs (σ = 0.32% with gestures, 0.40% without).</Paragraph>
    <Paragraph position="3"> Although the AdaBoost method minimizes pairwise error rather than the overall error of the partitioning, its performance was superior to that of the voted perceptron. One possible explanation is that by boosting small decision trees, AdaBoost was able to take advantage of non-linear combinations of features. We tested the voted perceptron using all pairwise combinations of features, but this did not improve performance.</Paragraph>
  </Section>
  <Section position="5" start_page="38" end_page="39" type="metho">
    <SectionTitle>
4 Discussion
</SectionTitle>
    <Paragraph position="0"> If gesture features play a role in coreference resolution, then one might expect the probability of coreference to vary significantly when conditioned on features describing the gesture. As shown in Table 3, the prediction holds: the binned FOCUS DIST gesture feature has the fifth highest χ² value, and the relationship between coreference and all gesture features was significant</Paragraph>
    <Paragraph position="2"> Although FOCUS DIST ranks fifth, three of the features above it are variants of a string-match feature, and so are highly redundant.</Paragraph>
    <Paragraph position="3"> The WHICH HAND feature is less strongly correlated with coreference, but the conditional probabilities do correspond with intuition. If the NPs corefer, then the probability of using the same hand to gesture during both NPs is 59.9%; if not, then the likelihood is 52.8%. The probability of not observing a focus hand is 20.3% when the NPs corefer, 25.1% when they do not; in other words, gesture is more likely for both NPs of a coreferent pair than for the NPs of a non-coreferent pair. The relation between the WHICH HAND feature and coreference is also significantly different from the null hypothesis (χ² = 57.2, dof = 2, p &lt; .01).</Paragraph>
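    <Paragraph> To make the statistic above concrete, the following minimal sketch computes a Pearson chi-squared value and class-conditional probabilities from a contingency table. The counts are hypothetical and are not the counts behind the reported figures. The three row categories (same hand, different hand, no focus hand) are an assumption chosen to be consistent with dof = 2, and the function names are likewise assumptions made for this example.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and dof for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def conditional_probs(table):
    """P(row value | column) for each cell of the table."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[cell / col_totals[j] for j, cell in enumerate(row)]
            for row in table]


# Hypothetical counts (illustration only). Rows: same hand, different hand,
# no focus hand. Columns: coreferent pairs, non-coreferent pairs.
counts = [
    [60, 50],
    [20, 25],
    [20, 25],
]

stat, dof = chi_squared(counts)
probs = conditional_probs(counts)
print(f"chi2 = {stat:.2f}, dof = {dof}")
print(f"P(same hand | corefer) = {probs[0][0]:.2f}")
```

With three feature values and two classes, the degrees of freedom are (3 - 1) * (2 - 1) = 2, and each column of the conditional-probability table sums to one.</Paragraph>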
  </Section>
</Paper>