<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3802"> <Title>Graph Based Semi-Supervised Approach for Information Extraction</Title> <Section position="8" start_page="1411" end_page="1411" type="evalu"> <SectionTitle> 6 Results and Discussion </SectionTitle> <Paragraph position="0"> We train several models like the one described in section 5.2 on different training data sets. In all experiments, we use both the LDC ACE training data and the labeled unsupervised data induced with the graph-based approach we propose. We use the ACE evaluation procedure and the ACE test corpus, provided by LDC, to evaluate all models.</Paragraph> <Paragraph position="1"> We incrementally added labeled unsupervised data to the training data to determine the amount of data beyond which system performance degrades. We sought this degradation point separately for each relation type. Figure 4 shows the effect of adding labeled unsupervised data on the ACE value for each relation separately. Figure 4 and table 1 show that relations with a small number of training instances gained more in performance than relations with a large number of training instances. This suggests that the proposed approach achieves significant improvement when the number of labeled training instances is small but representative.</Paragraph> <Paragraph position="2"> [Figure 4: The effect of adding labeled unsupervised data on the ACE value for each relation. The average number of relations per document is 4.]</Paragraph> <Paragraph position="3"> From figure 4, we determined, for each relation, the number of training instances that yields the maximum boost in performance. We added these training instances for all relations to the supervised training data and trained a new model on the combined set. Figure 5 compares the ACE values for each relation in the baseline model and the final model. The total system ACE value improved by 10% over the supervised baseline system.
All relation types except the DISC relation improved significantly, by 7% to 30% over the baseline supervised system. The DISC relation type degraded slightly; note that it already has a low ACE value with the baseline system. We think this is because the DISC relation has few and inconsistent examples in the supervised data set.</Paragraph> <Paragraph position="4"> To assess the usefulness of the smoothing method employing WordNet distance, we repeated the experiment on the EMP-ORG relation without it.</Paragraph> <Paragraph position="5"> We found that it contributed almost 30% of the total achieved improvement. We also repeated the experiment using hub scores instead of authority scores: we added the examples associated with highly ranked tuples to the training set. Using hub scores yielded very little variation in the ACE value (i.e. 0.1 point for [...]). [Figure 5: Comparison of baseline and final ACE values for each relation.]</Paragraph> <Paragraph position="6"> To evaluate the quality and representativeness of the labeled unsupervised data acquired using the proposed approach, we study the effect of replacing supervised data with unsupervised data while holding the amount of training data fixed. Several systems were built using mixtures of the supervised and the unsupervised data. In Figure 6, the dotted line shows the degradation in system performance when using only a reduced amount of supervised training data, while the solid line shows the effect on system performance of replacing supervised training data with unsupervised labeled data. Figure 6 shows that the unsupervised data could replace more than 50% of the supervised data without any degradation in system performance. This indicates that the induced unsupervised data is good for training. [Figure 6: The effect of reducing the supervised data on the ACE value, and the effect of replacing portions of the supervised data with unsupervised labeled data.]</Paragraph> </Section> </Paper>