<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1047"> <Title>A Semantic Approach to IE Pattern Induction</Title> <Section position="7" start_page="383" end_page="384" type="evalu"> <SectionTitle> 6 Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="383" end_page="383" type="sub_section"> <SectionTitle> 6.1 Document Filtering </SectionTitle> <Paragraph position="0"> Results for both the document and sentence filtering experiments are reported in Table 2, which lists precision, recall and F-measure for each approach on both evaluations. Results from the document filtering experiment are shown on the left-hand side of the table, and continuous F-measure scores for the same experiment are also presented graphically in Figure 2. While the document-centric approach achieves the highest F-measure of either system (0.83 on the 33rd iteration, compared against 0.81 after 48 iterations of the semantic similarity approach), it outperforms the proposed approach for only a few iterations. In addition, the semantic similarity approach learns more quickly and does not exhibit as large a drop in performance after reaching its best value. Overall, the semantic similarity approach was found to be significantly better than the document-centric approach (p < 0.001, Wilcoxon Signed Ranks Test).</Paragraph> <Paragraph position="1"> Although it is an informative evaluation, the document filtering task is limited for evaluating IE pattern learning. This evaluation indicates whether the set of patterns being learned can identify documents containing descriptions of events, but it does not provide any information about whether those events can be found within the documents. In addition, the seed patterns used for these experiments have high precision and low recall (Table 2). We have found that the distribution of patterns and documents in the corpus means that learning virtually any pattern will help improve the F-measure. 
Consequently, we believe the sentence filtering evaluation to be more useful for this problem.</Paragraph> </Section> <Section position="2" start_page="383" end_page="384" type="sub_section"> <SectionTitle> 6.2 Sentence Filtering </SectionTitle> <Paragraph position="0"> Results from the sentence filtering experiment are shown in tabular format on the right-hand side of Table 2 and graphically in Figure 3. The semantic similarity algorithm can be seen to outperform the document-centric approach. This difference is also significant (p < 0.001, Wilcoxon Signed Ranks Test).</Paragraph> <Paragraph position="1"> The clear difference between these results shows that the semantic similarity approach can indeed identify relevant sentences, while the document-centric method identifies patterns which match relevant documents, although not necessarily relevant sentences.</Paragraph> <Paragraph position="2"> 2 The set of seed patterns achieves a precision of 0.81 for this task. The precision is not 1 since the pattern PERSON+resign matches sentences describing historical events (&quot;Jones resigned last year.&quot;) which were not marked as relevant in this corpus, following MUC guidelines.</Paragraph> <Paragraph position="3"> The precision scores for the sentence filtering task in Table 2 show that the semantic similarity algorithm consistently learns more accurate patterns than the existing approach. At the same time, it learns patterns with high recall much faster than the document-centric approach: by the 120th iteration the pattern set covers almost 95% of relevant sentences, while the document-centric approach covers only 75%.</Paragraph> </Section> </Section> </Paper>
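The precision, recall, and F-measure scores discussed in this section can be sketched as follows. This is a minimal illustration of the standard filtering metrics, not the paper's evaluation code; the sentence ids and counts below are hypothetical placeholders, not the paper's actual data.

```python
def precision_recall_f(retrieved, relevant):
    """Compute precision, recall, and balanced F-measure for a filtering run.

    retrieved: set of item ids (documents or sentences) matched by the pattern set
    relevant:  set of item ids marked relevant in the gold standard
    """
    true_pos = len(retrieved & relevant)
    p = true_pos / len(retrieved) if retrieved else 0.0
    r = true_pos / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

# Hypothetical sentence ids, for illustration only
gold = {1, 2, 3, 4, 5, 6, 7, 8}
matched = {1, 2, 3, 4, 9, 10}

p, r, f = precision_recall_f(matched, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # → P=0.67 R=0.50 F=0.57
```

The paper's per-iteration comparison would apply this function to the pattern set after each learning iteration, yielding the paired score sequences on which the Wilcoxon Signed Ranks Test is computed.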