<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1067">
  <Title>Toward General-Purpose Learning for Information Extraction</Title>
  <Section position="5" start_page="405" end_page="407" type="evalu">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> The results presented here represent average performances over several separate experiments.</Paragraph>
    <Paragraph position="1"> In each experiment, the 600 documents in the collection were randomly partitioned into two sets of 300 documents each. One of the two subsets was then used to train each of the learners, the other to measure the performance of the learned extractors.</Paragraph>
    <Paragraph position="2"> \Y=e compared four learners: each of the two simple learners, Bayes and Rote, and SRV with two different feature sets, its default feature set, which contains no &amp;quot;sophisticated&amp;quot; features, and the default set augmented with the features derived from the link grammar parser and Wordnet. \Y=e will refer to the latter as 5RV+ling. Results are reported in terms of two metrics closely related to precision and recall, as seen in information retrievah Accuracy, the percentage of documents for which a learner predicted correctly (extracted the field in question) over all documents for which the learner predicted; and coverage, the percentage of documents having the field in question for which a learner made some prediction.</Paragraph>
    <Section position="1" start_page="405" end_page="406" type="sub_section">
      <SectionTitle>
4.1 Performance
</SectionTitle>
      <Paragraph position="0"> Table 1 shows the results of a ten-fold experiment comparing all four learners on all nine fields. Note that accuracy and coverage must be considered together when comparing learners. For example, Rote often achieves reasonable accuracy at very low coverage.</Paragraph>
      <Paragraph position="1"> Table 2 shows the results of a three-fold experiment, comparing all learners at fixed cover- null learners on the acquisitions fields.</Paragraph>
      <Paragraph position="2"> age levels, 20% and 80%, on four fields which we considered representative of tile wide range of behavior we observed. In addition, in order to assess the contribution of each kind of linguistic information (syntactic and lexical) to 5RV's performance, we ran experiments in which its basic feature set was augmented with only one type or the other.</Paragraph>
    </Section>
    <Section position="2" start_page="406" end_page="407" type="sub_section">
      <SectionTitle>
4.2 Discussion
</SectionTitle>
      <Paragraph position="0"> Perhaps surprisingly, but consistent with results we have obtained in other domains, there is no one algorithm which outperforms the others on all fields. Rather than the absolute difficulty of a field, we speak of the suitability of a learner's inductive bias for a field (Mitchell, 1997). Bayes is clearly better than SRV on the seller and sellerabr fields at all points on the accuracy-coverage curve. We suspect this may be due, in part, to the relative infrequency of these fields in the data.</Paragraph>
      <Paragraph position="1"> The one field for which the linguistic features offer benefit at all points along the accuracy-coverage curve is acqabr. 2 We surmise that two factors contribute to this success: a high frequency of occurrence for this field (2.42 times 2The acqabr differences in Table 2 (a 3-split experiment) are not significant at the 95% confidence level. However, the full 10-split averages, with 95% error margins, are: at 20% coverage, 61.5+4.4 for SRV and  A fragment is a acqabr, if: it contains exactly one token; the token (T) is capitalized; T is followed by a lower-case token; T is preceded by a lower-case token; T has a right AN-link to a token (U) with wn_word value &amp;quot;possession&amp;quot;; U is preceded by a token with wn_word value &amp;quot;stock&amp;quot;; and the token two tokens before T is not a two-character token.</Paragraph>
      <Paragraph position="2"> to purchase 4.5 mln~ common shares at acquire another 2.4 mln~-a6~treasury shares  tic features, along with two fragments of matching text. The AN-link connects a noun modifier to the noun it modifies (to &amp;quot;shares&amp;quot; in both examples). null per document on average), and consistent occurrence in a linguistically rich context.</Paragraph>
      <Paragraph position="3"> Figure 2 shows a 5RV+ling rule that is able to exploit both types of linguistic information. The Wordnet synsets for &amp;quot;possession&amp;quot; and &amp;quot;stock&amp;quot; come from the same branch in a hypernym tree--&amp;quot;possession&amp;quot; is a generalization of &amp;quot;stock&amp;quot;3--and both match the collocations &amp;quot;common shares&amp;quot; and &amp;quot;treasury shares.&amp;quot; That the paths \[right_AN\] and \[right_AN prev_tok\] both connect to the same synset indicates the presence of a two-word Wordnet collocation.</Paragraph>
      <Paragraph position="4"> It is natural to ask why SRV+ling does not 3SRV, with its general-to-specific search bias, often employs Wordnet this way--first more general synsets, followed by specializations of the same concept.  outperform SRV more consistently. After all, the features available to SRV+ling are a superset of those available to SRV. As we see it, there are two basic explanations: * Noise. Heuristic choices made in handling syntactically intractable sentences and in disambiguating Wordnet word senses introduced noise into the linguistic features.</Paragraph>
      <Paragraph position="5"> The combination of noisy features and a very flexible learner may have led to over-fitting that offset any advantages the linguistic features provided.</Paragraph>
      <Paragraph position="6"> * Cheap features equally effective. The simple features may have provided most of the necessary information. For example, generalizing &amp;quot;acquired&amp;quot; and &amp;quot;bought&amp;quot; is only useful in the absence of enough data to form rules for each verb separately.</Paragraph>
    </Section>
    <Section position="3" start_page="407" end_page="407" type="sub_section">
      <SectionTitle>
4.3 Conclusion
</SectionTitle>
      <Paragraph position="0"> More than similar systems, SRV satisfies the criteria of generality and retargetability. The separation of domain-specific information from the central algorithm, in the form of an extensible feature set, allows quick porting to novel domains. null Here, we have sketched this porting process.</Paragraph>
      <Paragraph position="1"> Surprisingly, although there is preliminary evidence that general-purpose linguistic information can provide benefit in some cases, most of the extraction performance can be achieved with only the simplest of information.</Paragraph>
      <Paragraph position="2"> Obviously, the learners described here are not intended to solve the information extraction problem outright, but to serve as a source of information for a post-processing component that will reconcile all of the predictions for a document, hopefully filling whole templates more accurately than is possible with any single learner. How this might be accomplished is one theme of our future work in this area.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>