
<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-2031">
  <Title>Learning with Multiple Stacking for Named Entity Recognition</Title>
  <Section position="3" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
3 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> In this section, the experimental conditions and the results of the proposed method are shown.</Paragraph>
    <Paragraph position="1"> In order to improve the performance of the base system, the tag sequence to be predicted is formatted according to IOB1, even though the sequence Let La18 denote the a19 th level learner and let Ta20a18a22a21a23 denote a19 th level output tags for Wa23 .</Paragraph>
    <Paragraph position="2"> Learning:  1. Train the base learner La24 using the features described in  in the original corpus was formatted according to IOB2 (Tjong Kim Sang and Veenstra, 1999).</Paragraph>
    <Paragraph position="3"> To reduce the computational cost, features appearing fewer than three times are eliminated in the training phase.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Base System
</SectionTitle>
      <Paragraph position="0"> To evaluate the effect of multiple stacking in the next section, the performance of the base system is shown in Figure 2. A performance peak is observed after 10,000 rounds of boosting. Note that a decision stump used in the real AdaBoost.MH takes into account only one feature. Hence the number of features used by real AdaBoost.MH is less than the number of the rounds. In our experiment, because the rounds of boosting are always less than the number of the features (about 40,000), a large proportion of features are not used by the learners.</Paragraph>
      <Paragraph position="1"> If the rounds of boosting in the base system are not enough, stacking effect may be similar to increasing the rounds of boosting. In Figure 2, however, we can see that 10,000 rounds is enough.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Multiple Stacking
</SectionTitle>
      <Paragraph position="0"> We examine the effect of multiple stacking compared to the base system.</Paragraph>
      <Paragraph position="1"> The Fa40a42a41a43a5 score of multiple stacking for the Spanish test set (esp.testa) is shown in Table 2. By stacking learners, the score of each named entity is im- null proved. Compared to the overall Fa40a42a41a49a5 score of La7 , the score of La5 , stacking one learner over the base system, is improved by 4.74 point. Further more, compared to the score of La5 , the score of La50 is higher by 1.67 point. Through five iterations of stacking, the score is continuously increased. The overall scores for the six tests are briefly shown in  for the Spanish tests. However, multiple staking effects greater for the Dutch test, especially for the corpus without part of speech. As discussed in Section 3.1, the improvement of the score is not due to the rounds of boosting. Thus, it is due to multiple stacking.</Paragraph>
      <Paragraph position="2"> In Table 2, stacking effects for MISC and ORG appear greater than those for LOC and PER. It is reasonable to suppose that MISC and ORG entities consist of a relatively long sequence of words, and the surrounding tags can be good clues for the prediction of the current tag. Indeed, in the Spanish training set, the ratios of entities which consist of more than three words are 9.7%, 22.4%, 4.4% and 3.5% for ORG, MISC, LOC and PER respectively.</Paragraph>
      <Paragraph position="3"> Table 4 and 5 show examples of the predicted tags through the stacked level. Let us see how multiple stacking works using the examples in Table 5. Let the word &amp;quot;fin&amp;quot; be the current position. The answer tag is &amp;quot;I-MISC&amp;quot;. When we use the base system  La7 , the predicted tag of the word is &amp;quot;O&amp;quot;. In the next level, La5 uses the surrounding tag features &amp;quot;I-MISC, O, (O,) O, O&amp;quot; and also outputs &amp;quot;O&amp;quot;. In the third level, however, La3 correctly predicts the tag using the surrounding tag features &amp;quot;I-MISC, I-MISC, (O,) O, O&amp;quot;. Note that no other feature changes through the levels. The improvement in the example is clearly caused by multiple stacking. As a result, this MISC entity is allocated tags correctly by La52 . The above effect would not be achieved by two level stacking. This result clearly shows that multiple stacking method has an advantage.</Paragraph>
      <Paragraph position="4"> Next we examine the effect of the learning algorithm to multiple stacking. We use the real AdaBoost.MH for 300, 1,000, 3,000, 10,000, 20,000 rounds. Their Fa40a42a41a43a5 scores in each stacking level are plotted in Figure 3. The score improves by stacking for all algorithms. The highest score is achieved by 10,000 iterations at every stacking level. The shapes of the curves in Figure 3 are similar to each other. This result suggests that the stacking effect is scarcely affected by the performance of the algorithm. null</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>