<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0507"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. Towards Large-scale Non-taxonomic Relation Extraction: Estimating the Precision of Rote Extractors</Title> <Section position="7" start_page="54" end_page="55" type="concl"> <SectionTitle> 5 Conclusions and future work </SectionTitle> <Paragraph position="0"> We have described a new procedure for estimating the precision of the patterns learnt by a rote extractor that learns from the web. Compared to similar approaches, it has the following improvements: * For each pair (hook, target) in the seed list, a target corpus is also collected (in addition to the hook corpus), and the evaluation is performed using corpora from several relations.</Paragraph> <Paragraph position="1"> This has been observed to improve the estimate of the rule's precision, given that the evaluation pairs are not restricted to the elements in the seed list.</Paragraph> <Paragraph position="2"> * The cardinality of the relations is taken into consideration in the estimation process using the seed list. This is important, for instance, for estimating the precision in n:n relations such as author-work, since we cannot assume that the only books written by someone are those in the seed list.</Paragraph> <Paragraph position="3"> * For those pairs that cannot be evaluated using the seed list, a simple query to the Google search engine is employed.</Paragraph> <Paragraph position="4"> The precisions estimated with this procedure are significantly lower than the precisions obtained with the usual hook corpus approach, especially for ambiguous patterns, and much nearer to the precision estimated when the patterns are evaluated by hand.</Paragraph> <Paragraph position="5"> Concerning future work, we plan to estimate the precision of the patterns using the whole hook and target corpora, rather than using a random sample.
A second objective we have in mind is not to throw away the ambiguous patterns with low precision (e.g. the possessive construction), but to train a model that can disambiguate which relation they convey in each context (Girju et al., 2003).</Paragraph> </Section> </Paper>