<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1046">
  <Title>Aggregation via Set Partitioning for Natural Language Generation</Title>
  <Section position="7" start_page="364" end_page="364" type="evalu">
    <SectionTitle>
6 Results
</SectionTitle>
    <Paragraph position="0"> Our results are summarized in Table 3. As can be seen, the ILP model outperforms the clustering model by a wide margin (11.9% F-score). The two methods yield comparable recall; however, the clustering model lags considerably behind as far as precision is concerned (the difference is 24.5 %).7 Precision is more important than recall in the context of our aggregation application. Incorrect aggregations may have detrimental effects on the coherence of the generated text. Choosing not to aggregate may result in somewhat repetitive texts; however, the semantic content of the underlying text remains intact. In the case of wrong aggregations, we may group together facts that are not compatible, and even introduce implications that are false.</Paragraph>
    <Paragraph position="1"> We also consider how well our model performs when evaluated on total partition accuracy. Here, we are examining the partitioning as a whole and ask the following question: how many clusters of size 1, 2 . . . n did the algorithm get right? This evaluation measure is stricter than F-score which is com7Unfortunately we cannot apply standard statistical tests such as the t-test on F-scores since they violate assumptions about underlying normal distributions. It is not possible to use an assumptions-free test like kh2 either, since F-score is not a frequency-based measure. We can, however, use kh2 on precision and recall, since these measures are estimated from frequency data. We thus find that the ILP model is significantly better than the clustering model on precision (kh2 = 16.39, p &lt; 0.01); the two models are not significantly different in terms of recall (kh2 = 0.02, p &lt; 0.89).</Paragraph>
    <Paragraph position="2">  ent size puted over pairwise label assignments. The partition accuracy for entry groups of varying size is shown in Figure 2. As can be seen, in all cases the ILP outperforms the clustering baseline. Both models are fairly accurate at identifying singletons, i.e., database entries which are not aggregated. Performance is naturally worse when considering larger clusters. Interestingly, the difference between the two models becomes more pronounced for partition sizes 4 and 5 (see Figure 2). The ILP's accuracy increases by 24% for size 4 and 8% for size 5.</Paragraph>
    <Paragraph position="3"> These results empirically validate the importance of global inference for the partitioning task. Our formulation allows us to incorporate important document-level constraints as well as consistency constraints which cannot be easily represented in a vanilla clustering model.</Paragraph>
  </Section>
class="xml-element"></Paper>