<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3239"> <Title>A Boosting Algorithm for Classification of Semi-Structured Text</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Experimental Setting </SectionTitle> <Paragraph position="0"> We conducted two experiments in sentence classification. PHS review classification (PHS) The goal of this task is to classify reviews (in Japanese) for PHS (Personal Handyphone System, a cell phone system developed in Japan in 1989) as positive or negative. A total of 5,741 sentences were collected from a Web-based discussion BBS on PHS, in which users are directed to submit positive reviews separately from negative reviews. The unit of classification is a sentence. The categories to be identified are &quot;positive&quot; and &quot;negative&quot;, with 2,679 and 3,062 sentences respectively.</Paragraph> <Paragraph position="1"> Modality identification (MOD) This task is to classify sentences (in Japanese) by modality. A total of 1,710 sentences from a Japanese newspaper were manually annotated according to Tamura's taxonomy (Tamura and Wada, 1996).</Paragraph> <Paragraph position="2"> The unit of classification is a sentence. The categories to be identified are &quot;opinion&quot;, &quot;assertion&quot;, and &quot;description&quot;, with 159, 540, and 1,011 sentences respectively.</Paragraph> <Paragraph position="3"> To apply learning and classification, we must represent a given sentence as a labeled ordered tree. In this paper, we use the following three representation forms.</Paragraph> <Paragraph position="4"> bag-of-words (bow), baseline Ignoring the structural information embedded in the text, we simply represent a text as a set of words. This is exactly the same setting as BoosTexter.
Word boundaries are identified using ChaSen, a Japanese morphological analyzer.</Paragraph> <Paragraph position="5"> Dependency (dep) We represent a text as a word-based dependency tree. We first use CaboCha to obtain a chunk-based dependency tree of the text; a chunk roughly corresponds to a base phrase in English. By identifying the head word in each chunk, the chunk-based dependency tree is converted into a word-based dependency tree.</Paragraph> <Paragraph position="6"> N-gram (ngram) This is a word-based dependency tree in which each word is assumed to simply modify the next word. Any subtree of this structure is a word n-gram.</Paragraph> <Paragraph position="7"> We compared the performance of our Boosting algorithm with that of support vector machines (SVMs) using the bag-of-words kernel and the tree kernel, in terms of F-measure under 5-fold cross-validation. Although extensions of the tree kernel exist (Kashima and Koyanagi, 2002), we use the original tree kernel of Collins (Collins and Duffy, 2002), in which all subtrees of a tree are used as distinct features. This setting yields a fair comparison in terms of feature space. To extend a binary classifier to a multi-class classifier, we use the one-vs-rest method. Hyperparameters, such as the number of iterations K in Boosting and the soft-margin parameter C in SVMs, were selected by cross-validation. We implemented SVMs with the tree kernel by incorporating custom kernels into TinySVM.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Results and Discussion </SectionTitle> <Paragraph position="0"> Table 1 summarizes the results of the PHS and MOD tasks. To examine the statistical significance of the results, we employed McNemar's paired test, a variant of the sign test, on the labeling disagreements.
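As a concrete illustration of the ngram representation described in Section 5.1, the subtrees of a chain tree in which each word modifies the next are exactly the contiguous word n-grams. A minimal sketch (the function name and the optional length cap are our own, not part of the original method):

```python
def ngram_subtrees(words, max_n=None):
    """Enumerate subtrees of the chain tree where each word modifies
    the next word; these are exactly the contiguous word n-grams."""
    n = len(words)
    max_n = max_n or n
    subtrees = []
    for i in range(n):                              # start of the n-gram
        for j in range(i + 1, min(n, i + max_n) + 1):  # exclusive end
            subtrees.append(tuple(words[i:j]))
    return subtrees
```

For a three-word sentence this yields the six subtrees (1-, 2-, and 3-grams), which is the feature candidate space the weak learner searches over in the ngram setting.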
This table also includes the results of the significance tests.</Paragraph> <Paragraph position="1"> In all tasks and categories, our subtree-based Boosting algorithm (dep/ngram) performs better than the baseline method (bow). This result supports our first intuition that structural information within texts is important when classifying a text by opinion or modality rather than by topic. We also find no significant differences in accuracy between dependency and n-gram (in all cases, p > 0.2).</Paragraph> <Paragraph position="2"> When the bag-of-words features are used, no significant differences in accuracy are observed between Boosting and SVMs. When structural information is used in training and classification, Boosting performs slightly better than SVMs with the tree kernel; the difference is significant when we use dependency features in the MOD task. SVMs show worse performance on some tasks and categories (e.g., 24.2 F-measure in the smallest category, &quot;opinion&quot;, in the MOD task).</Paragraph> <Paragraph position="3"> When a convolution kernel is applied to sparse data, kernel dot products between nearly identical instances become much larger than those between different instances, because the number of features shared by similar instances increases exponentially with their size. This sometimes leads to overfitting in training, where a test instance very close to some training instance is classified correctly while all other instances are assigned the default class. This problem can be tackled by several heuristic approaches: i) employing a decay factor to reduce the weights of large sub-structures (Collins and Duffy, 2002; Kashima and Koyanagi, 2002);</Paragraph> <Paragraph position="4"> ii) passing the kernel dot products through a Gaussian function to smooth them (Haussler, 1999).
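The second heuristic can be sketched as follows: the raw dot products K(x,x), K(x,y), K(y,y) induce a squared distance in feature space, and a Gaussian of that distance replaces the raw product. This is our own minimal illustration of the idea, not the implementation used in the experiments:

```python
import math

def gaussian_smoothed_kernel(k_xy, k_xx, k_yy, sigma=1.0):
    """Smooth raw kernel dot products with a Gaussian (RBF) over the
    kernel-induced distance: ||x - y||^2 = K(x,x) - 2 K(x,y) + K(y,y).
    Identical instances map to 1; dissimilar ones decay toward 0."""
    dist_sq = k_xx - 2.0 * k_xy + k_yy
    return math.exp(-dist_sq / (2.0 * sigma ** 2))
```

The smoothing parameter sigma is exactly the hyperparameter whose selection is discussed below.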
These approaches may achieve better accuracy; however, they yield neither the fast classification nor the interpretable feature space targeted in this paper. Moreover, they do not permit a fair comparison in terms of the same feature space. The selection of optimal hyperparameters, such as the decay factor in the first approach and the smoothing parameter in the second, is also left as an open problem. (Table 1 caption: underlined results indicate a significant difference (p < 0.01) against the baseline (bow); where there is a significant difference (p < 0.01) between Boosting and SVMs with the same feature representation (bow / dep / n-gram), the better result is asterisked.) In the previous section, we described the merits of our Boosting algorithm. We experimentally verified these merits using the results of the PHS task.</Paragraph> <Paragraph position="5"> As illustrated in Section 4, our method automatically selects relevant and compact features from a large number of feature candidates. In the PHS task, a total of 1,793 features (rules) were selected, while the numbers of distinct uni-grams, bi-grams, and tri-grams appearing in the data were 4,211, 24,206, and 43,658 respectively. Even though all subtrees are used as feature candidates, Boosting selects a small and highly relevant subset of features. If we explicitly enumerated the subtrees used by the tree kernel, the number of active (non-zero) features might amount to ten thousand or more.</Paragraph> <Paragraph position="6"> Table 2 shows examples of extracted support features (pairs of a feature (tree) t and its weight wt in Eq. 5) in the PHS task.</Paragraph> <Paragraph position="7"> A. Features including the word &quot;tXM&quot; (hard, difficult) In general, &quot;tXM&quot; is an adjective expressing a negative opinion, and most features including it are assigned a negative weight (negative opinion).
However, one feature, &quot;~tXM&quot; (hard to cut off), has a positive weight. This strongly reflects domain knowledge specific to PHS (cell phone reviews).</Paragraph> <Paragraph position="8"> B. Features including the word &quot;O&quot; (use) &quot;O&quot; (use) is a neutral expression for opinion classification. However, its weight varies with the surrounding context: 1) &quot;M hM&quot; (want to use) → positive; 2) &quot;Mb M&quot; (be easy to use) → positive; 3) &quot;Mb Tlh&quot; (was easy to use, past form) → negative; 4) &quot;wOUMbM&quot; (... is easier to use than ...) (comparative) → negative.</Paragraph> <Paragraph position="9"> C. Features including the word &quot;F?&quot; (recharge) Features reflecting domain knowledge are extracted: 1) &quot;F?UyM&quot; (recharging time is short) → positive; 2) &quot;F?U M&quot; (recharging time is long) → negative.</Paragraph> <Paragraph position="10"> These features are interesting because the correct label (positive/negative) cannot be determined from bag-of-words features alone, such as &quot;recharge&quot;, &quot;short&quot;, or &quot;long&quot;. Table 3 illustrates an example of an actual classification. For the input sentence &quot;UGVXo,, _bM&quot; (The LCD is large, beautiful, and easy to see.), the system outputs the features applied to the classification along with their weights wt. This information allows us to analyze how the system classifies the input sentence into a category and what kinds of features are used in the classification. Such analyses are not possible with the tree kernel, since it defines its feature space implicitly.</Paragraph> <Paragraph position="11"> The testing speed of our Boosting algorithm is much higher than that of SVMs with the tree kernel. In the PHS task, the speeds of Boosting and SVMs are 0.531 sec. and 255.42 sec. per 5,741 instances respectively.
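The speed gap is plausible given the form of the final classifier: the Boosting classifier of Eq. 5 only sums the weights of the rules whose subtree matches the input, whereas the tree kernel must evaluate kernel values against many support vectors. A minimal sketch of the rule-sum classification, with subtree matching simplified to set membership (all names here are our own illustration):

```python
def classify(features, rules):
    """Eq. 5 (sketch): the score is the sum of weights w_t over rules
    whose feature t matches the input, here simplified to membership
    in the input's feature set. The sign of the score gives the class."""
    score = sum(w_t for t, w_t in rules if t in features)
    return (1 if score >= 0 else -1), score
```

Because only the 1,793 selected rules are consulted, and each match is a cheap lookup, per-instance cost is small and the contributing rules can be listed directly, which is exactly the interpretability illustrated by Table 3.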
We can say that Boosting is about 480 times faster than SVMs with the tree kernel.</Paragraph> <Paragraph position="12"> Even though the potential search space is huge, the pruning criterion proposed in this paper prunes it effectively: the pruning conditions in Fig. 4 are fulfilled with a probability of more than 90%. The training speed of our method is 1,384 sec. per 5,741 instances when we set K = 60,000 (the number of Boosting iterations). It thus takes only 0.023 (= 1,384/60,000) sec. to invoke the weak learner, Find Optimal Rule.</Paragraph> </Section> </Section> </Paper>