<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1018">
  <Title>Evaluation and Extension of Maximum Entropy Models with Inequality Constraints</Title>
  <Section position="8" start_page="10" end_page="10" type="evalu">
    <SectionTitle>
6.1 Results
</SectionTitle>
    <Paragraph position="0"> We first found the best values for the control parameters of each model (W, s, and the cut-off threshold) using the development set, and we show that the inequality models outperform the other methods on the development set. We then show that these values remain valid for the evaluation set. We used the first half of the test set as the development set and the second half as the evaluation set.</Paragraph>
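    <Paragraph> The selection protocol described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' code: the helper names split_in_half and select_best, the callable dev_score, and the toy scores are all assumptions introduced here.
```python
def split_in_half(documents):
    """Split a list into a development half and an evaluation half."""
    mid = len(documents) // 2
    return documents[:mid], documents[mid:]


def select_best(candidates, dev_score):
    """Return the candidate setting with the highest development-set score.

    dev_score is a hypothetical callable mapping a setting to a score
    (for example, a micro-averaged F-score on the development half).
    """
    return max(candidates, key=dev_score)


# Toy usage with made-up scores (not results from the paper).
dev_half, eval_half = split_in_half(list(range(10)))
toy_scores = {0.1: 0.80, 1.0: 0.86, 10.0: 0.83}
best_w = select_best(toy_scores, toy_scores.get)
print(len(dev_half), len(eval_half), best_w)
```
The setting chosen on the development half is then applied unchanged to the evaluation half, so the evaluation score is not used for tuning.</Paragraph>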
    <Paragraph position="1"> Figure 1 shows the accuracies of the inequality ME models for various width factors. Accuracy is reported as the &quot;micro-averaged&quot; F-score. The horizontal lines show the highest accuracies of the cut-off and gaussian models found by exhaustive search. For cut-off, we varied the cut-off threshold and found the best threshold. For gaussian, we varied s for each cut-off threshold and found the best combination of s and cut-off threshold. With an appropriate value of W, the inequality models outperform both the cut-off method and the Gaussian MAP estimation on both datasets. Although the OHSUMED dataset seems harder than the Reuters dataset, the improvement on the OHSUMED dataset is greater than that on the Reuters dataset. This may be because the OHSUMED dataset is sparser than the Reuters dataset. The 2-norm extension boosts the accuracies, especially for bayes, at moderate values of W (i.e., with moderate numbers of active features). However, we cannot observe a clear advantage of the 2-norm extension in terms of the highest accuracy here.</Paragraph>
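    <Paragraph> For readers unfamiliar with micro-averaging, the following Python sketch shows one common way to compute a micro-averaged F-score: the true positives, false positives, and false negatives of all categories are pooled before precision and recall are combined. It assumes the balanced F1 variant and one binary decision per category and document; the function name micro_f1 and the toy counts are illustrative assumptions, not taken from the paper.
```python
def micro_f1(counts):
    """Micro-averaged F1 from per-category (tp, fp, fn) counts.

    counts: iterable of (tp, fp, fn) tuples, one per category.
    Assumes the balanced F1 variant; the paper only says "F-score".
    """
    counts = list(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    denom = 2 * tp + fp + fn
    # On pooled counts, 2PR / (P + R) simplifies to 2*tp / (2*tp + fp + fn).
    return 2 * tp / denom if denom else 0.0


# Toy counts for two categories (illustrative only, not from the paper).
example = [(8, 2, 1), (1, 0, 3)]
print(micro_f1(example))
```
Because counts are pooled, frequent categories dominate the score, unlike macro-averaging, which weights every category equally.</Paragraph>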
    <Paragraph position="2"> Figure 2 shows the average number of active features of each inequality ME model for various width factors. As expected, the number of active features increases as the widths become small. (Figure caption fragment: for gaussian, the accuracy with the best s found by exhaustive search is shown for each cut-off threshold.)</Paragraph>
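    <Paragraph> As a rough illustration of how active features might be counted, the Python sketch below assumes that each feature carries two non-negative parameters (one per inequality constraint) and that a feature counts as active when either parameter is non-zero. This definition, the names alphas, betas, and tol, and the toy values are assumptions made for illustration; this section does not spell them out.
```python
def count_active(alphas, betas, tol=0.0):
    """Count features whose upper or lower parameter is non-zero.

    alphas, betas: per-feature parameter lists of equal length (assumed layout).
    tol: magnitudes at or below this value are treated as zero.
    """
    return sum(
        1 for a, b in zip(alphas, betas)
        if abs(a) > tol or abs(b) > tol
    )


# Toy parameters for four features (illustrative only).
toy_alphas = [0.0, 0.3, 0.0, 0.0]
toy_betas = [0.0, 0.0, 1.2, 0.0]
print(count_active(toy_alphas, toy_betas))
```
Averaging this count over the per-category models would give a quantity comparable to the one plotted in Figure 2, under the stated assumptions.</Paragraph>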
    <Paragraph position="3"> Figure 3 shows the accuracy of each model as a function of the number of active features. We can see that the inequality ME models achieve the highest accuracy with a fairly small number of active features, removing unnecessary features on their own.</Paragraph>
    <Paragraph position="4"> Moreover, when the number of features is small, they consistently achieve much higher accuracies than the cut-off method and the Gaussian MAP estimation.</Paragraph>
    <Paragraph position="5"> Table 1 summarizes the above results, including the best control parameters for the development set, and shows how well each method performs on the evaluation set with these parameters. We can see that the best parameters are valid for the evaluation sets, and that the inequality ME models outperform the other methods on the evaluation set as well. This suggests that the inequality ME model is generally superior to the cut-off method and the Gaussian MAP estimation. Here the 2-norm extension shows the advantage of being robust, especially for the Reuters dataset: the 2-norm models outperform the normal inequality models on the evaluation set. To see the reason for this, we show the average cross entropy of each inequality model as a function of the width factor in Figure 4. The average cross entropy was calculated as [?], where C is the number of categories. The cross entropy of the 2-norm model is consistently more stable than that of the normal inequality model. Although there is no simple relation between the absolute accuracy and the cross entropy, this consistent difference can be one explanation for the advantage of the 2-norm extension. It is also possible that the effect of the 2-norm extension appears more clearly in the Reuters dataset because robustness matters more there: the development set is rather small and easy to overfit.</Paragraph>
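    <Paragraph> The exact formula for the average cross entropy did not survive in this text, so the Python sketch below shows only one plausible reading: the mean negative log-probability that each per-category classifier assigns to the correct label, averaged over the C categories. The function name, the input layout, and the toy probabilities are assumptions, and the form used in the paper may differ.
```python
import math


def average_cross_entropy(correct_probs_per_category):
    """Average, over categories, of the mean negative log-probability.

    correct_probs_per_category[c][i] is the probability that the classifier
    for category c assigns to the correct label of document i (assumed layout).
    Every probability must be strictly positive.
    """
    per_category = [
        -sum(math.log(p) for p in probs) / len(probs)
        for probs in correct_probs_per_category
    ]
    return sum(per_category) / len(per_category)


# Toy probabilities for C = 2 categories and two documents (illustrative only).
toy = [[0.5, 0.5], [1.0, 1.0]]
print(average_cross_entropy(toy))
```
Under this reading, a lower value means the models place more probability on the correct labels, which is why a more stable curve across width factors would indicate robustness.</Paragraph>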
    <Paragraph position="6"> Lastly, we could not observe an advantage of the bayes method in these experiments. However, since our method is still in development, it is premature to conclude that the idea of using different widths for features according to their unreliability is unsuccessful. It is possible that the uncertainty of ~p(x), which we did not consider, needs to be modeled, or that the Bernoulli trial assumption is inappropriate. These points require further investigation.</Paragraph>
  </Section>
</Paper>