<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2025">
  <Title>Investigating the features that affect cue usage of non-native speakers of English</Title>
  <Section position="7" start_page="146" end_page="148" type="evalu">
    <SectionTitle>
7 Experiments
</SectionTitle>
    <Paragraph position="0"> We divided the experiments into four sets. Experiment Set 1 was run to identify the best individual feature whose predictive power was better than the baseline. Experiment Sets 2, 3, and 4 were run to classify the placement of because. In Experiment Set 2, we used only sentence features.</Paragraph>
    <Paragraph position="1"> In Experiment Set 3, we used both sentence features and embedding structure features. Experiment Set 4 was run using only embedding structure features.</Paragraph>
    <Section position="1" start_page="146" end_page="147" type="sub_section">
      <SectionTitle>
7.1 Experiment Set 1
</SectionTitle>
      <Paragraph position="0"> First we introduce the concept of a baseline, which is obtained by always choosing the majority class.</Paragraph>
      <Paragraph position="1"> For example, in 71.0% of cases (88/124), because occurs in the second span. That is, if because were always placed in the second span, one would be wrong 29% of the time. So 29% is the error rate of the baseline model used in the experiments.</Paragraph>
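As a minimal sketch (not from the paper), the majority-class baseline can be computed as follows, using the paper's own counts (88 of 124 instances in the second span):

```python
from collections import Counter

def baseline_error_rate(labels):
    # Error rate of a model that always predicts the majority class.
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)

# The paper's example: because occurs in the second span in 88 of 124 cases.
labels = ["second"] * 88 + ["first"] * 36
print(round(baseline_error_rate(labels), 2))  # 0.29
```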
      <Paragraph position="2"> We ran the experiment 14 times, once with each feature mentioned above. Analyzing the results, we found that only feature R had predictive power: the 95% confidence interval of its error rate was 16.2 ± 0.7, whose upper bound (16.9%) was much lower than the baseline (29%). Table 2 shows the results obtained using feature R. When the discourse relation of the embedding structure is &quot;cause&quot;, &quot;contrast&quot;, &quot;example&quot;, or &quot;explanation&quot;, because occurs in the first span. ......</Paragraph>
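The single-feature rule induced from feature R could be sketched as follows. The fall-back to the majority class ("second span") for relations outside Table 2's first-span group is an assumption, since the rest of the table is truncated here, and the relation label "condition" is a hypothetical example:

```python
# Relations for which Table 2 places "because" in the first span.
FIRST_SPAN_RELATIONS = {"cause", "contrast", "example", "explanation"}

def predict_placement(relation):
    # Assumption: default to the majority class ("second" span) for
    # any relation not listed in Table 2's first-span group.
    return "first" if relation in FIRST_SPAN_RELATIONS else "second"

print(predict_placement("cause"))      # first
print(predict_placement("condition"))  # second (hypothetical relation label)
```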
      <Paragraph position="4"> ble 3). In the first experiment, all eight sentence features were used. However, the upper bound of the 95% confidence interval for error rate (34.1%) was higher than the baseline (29%).</Paragraph>
      <Paragraph position="5"> So the learned model was not a good one. We then ran three more experiments using different combinations of sentence features. In subset 2, the features representing span structure (Ns and Ss) were deleted. In subset 3, the features representing span length (Ng and Sg) were deleted. In subset 4, only the features relating to span length (Ng and Sg) and span structure (Ns and Ss) were used. However, no good classification model was obtained.</Paragraph>
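The acceptance criterion applied throughout these experiment sets — a learned model counts as good only when the upper bound of the 95% confidence interval for its error rate falls below the baseline — can be sketched as:

```python
def beats_baseline(ci_upper_bound, baseline_error):
    # A learned model is accepted only when the upper bound of the 95%
    # confidence interval for its error rate is below the baseline error.
    return ci_upper_bound < baseline_error

# Bounds reported in the paper (error rates in percent):
print(beats_baseline(16.9, 29.0))  # True:  feature R alone (Experiment Set 1)
print(beats_baseline(34.1, 29.0))  # False: all eight sentence features (Set 2)
```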
    </Section>
    <Section position="2" start_page="147" end_page="147" type="sub_section">
      <SectionTitle>
7.3 Experiment Set 3
</SectionTitle>
      <Paragraph position="0"> Experiment Set 3 also had four subsets. In the first subset, the experiment was run using all sentence features and all embedding structure features. The experimental results show that the upper bound of the 95% confidence interval for error rate (26%) was lower than the baseline (29%). This means that the embedding structure feature(s) could improve the accuracy of the learned classification models. In the next three experiments, we tried three other feature combinations. In subset 2, the feature set combined the placement of because (P) with the span structure features (Ns and Ss, Bs and Os). The experimental results show that the average error rate was higher than the baseline. In subset 3, two sentence features (Ng and Sg) and two embedding structure features (C and N-S) were added. However, the average error rate of the learned model was still higher than the baseline. This means that these four features cannot help to improve the accuracy of classification models. In subset 4, feature R was added. Though the average error rate was lower than in subsets 2 and 3, the upper bound of its 95% confidence interval for error rate was higher than the baseline. The fourth learned model therefore cannot be regarded as a good one.</Paragraph>
    </Section>
    <Section position="3" start_page="147" end_page="148" type="sub_section">
      <SectionTitle>
7.4 Experiment Set 4
</SectionTitle>
      <Paragraph position="0"> Experiment Set 4 had five subsets. In subset 1, the experiment was run using all six embedding structure features. The upper bound of the 95% confidence interval for the error rate of the learned model was lower than the baseline. In subset 2, we ran the experiment after deleting feature R from subset 1. Its average error rate was higher than that of subset 1, and the upper bound of its 95% confidence interval for error rate was higher than the baseline. This again shows that R is the feature that affects the accuracy of the learned models.</Paragraph>
      <Paragraph position="1"> In subsets 3 and 4, the experiments were run after deleting features C and P respectively. The average error rates were nearly the same as that of subset 1, demonstrating that features C and P do not affect the accuracy of the learned models. In subset 5, features Bs and Os were deleted from subset 1. The experimental result again changed little, so we can infer that span structure does not affect the accuracy of the learned model.</Paragraph>
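The leave-one-feature-out procedure of Experiment Set 4 can be sketched as below. Here `evaluate` is a hypothetical stand-in for a C4.5 cross-validation run, and the toy scoring function (in which only feature R carries signal) is invented purely for illustration:

```python
# The six embedding structure features used in subset 1.
ALL_FEATURES = ["R", "C", "P", "N-S", "Bs", "Os"]

def ablation(evaluate, features):
    # For each feature, re-run the experiment with that feature deleted
    # and report the change in average error rate vs. the full set.
    full_error = evaluate(features)
    return {f: evaluate([g for g in features if g != f]) - full_error
            for f in features}

# Toy stand-in for a C4.5 run: pretend only feature R carries signal.
toy_evaluate = lambda feats: 0.20 if "R" in feats else 0.35
deltas = ablation(toy_evaluate, ALL_FEATURES)
print(round(deltas["R"], 2))  # 0.15: error rises sharply when R is deleted
print(round(deltas["C"], 2))  # 0.0:  deleting C changes nothing in the toy
```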
    </Section>
    <Section position="4" start_page="148" end_page="148" type="sub_section">
      <SectionTitle>
7.5 Discussion
</SectionTitle>
      <Paragraph position="0"> The experimental results show that the machine learning program C4.5 is useful for inducing a classification model of the placement of because for non-native speakers. The results of Experiment Set 1 demonstrate that feature R is the best individual feature whose predictive power is better than the baseline. Experiment Sets 2 and 3 show that a good learned model cannot be obtained using sentence features alone, or the combination of sentence features and embedding structure features. The results of Experiment Set 4 demonstrate that high-performing classification models can be obtained by combining feature R with several other embedding structure features. However, the best learned model has not yet been obtained.</Paragraph>
    </Section>
  </Section>
</Paper>