<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2099">
  <Title>ACQUISITION OF SELECTIONAL PATTERNS</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
6 Results
</SectionTitle>
    <Paragraph position="0"> Our first set of experiments were conducted to compare three methods of coping with multiple parses. These methods, as described in section 4, are (1) generating all N parses of a sentence, and weighting each by l/N; (2)selecting the N most likely parses as determined by a stochastic grammar, and weighting those each by 1/N; (3) generating all parses, but assigning weights to alternative parses using a form of the inside-outside procedure. These experiments were conducted using a smaller training set, a set of 727 sentences drawn from 90 articles. We generated a set of triples using each of the three methods and then evaluated them against our hand-classified triples, as described in section 5. We show in Figure 1 the threshold vs. recM1 curves for the three methods; in Figure 2 the recall vs. 1-error rate curves.</Paragraph>
    <Paragraph position="1"> These experiments showed only very small differences between the three methods (the inside-outside method showed slightly better accuracy at some levels of recall). Based on this, we decided to use method 2 (statistical grammar) for subsequent experiments. Other ttfings being equal, method 2 hms the virtue of generating far fewer parses (an average of 1.5 per sentence, vs.</Paragraph>
    <Paragraph position="2"> 37 per sentence when all parses are produced), and hence a far smaller file of regularized parses (about 10 MB for our entire training corpus of 1000 articles, vs. somewhat over 200 MB which would have been required if all parses were generated). Using method 2, therefore, we generated the triples for our 1000-article training corpus.</Paragraph>
    <Paragraph position="3"> Our second series of experiments compared three different ways of accumulating data from the triples:  with multiple parses in pattern collection, using training corpus of 90 articles. Threshokl vs. recall for o = all parses; o = all parses + inside-outside; * = most t)robable parses from stochastic grammar.</Paragraph>
    <Paragraph position="4">  with multiple parses in pattern collection, using training corpus of 90 articles. RecM1 vs. l-er rot rate for o = all parses; o = all parses + inside-outside; * = most probable parses from stochastic grammar.</Paragraph>
    <Paragraph position="5">  generalized heads; o = triples with generalized heads; * = 1)airs.</Paragraph>
    <Paragraph position="6"> 1. generalizing the value in a head-flmctionvalue triple to a word class, but not generalizing tile head 2. generalizing hoth the value and the head 3. ignoring the value field entirely in a  head-function-value triple, ,~n(l accumulating counts of head-fimction pairs (with no generalization applied to the head); a match with the hand-marked triples is therefore recorded if the head and flmction fields match Again, we evaluated the patterns produced by each method against tile hand-marked triples. Figure 3 shows the threshohl vs. recall curves for each method; Figure 4 the recM1 vs. 1-error rate curves. Figure 3 indicates that using pairs yields the highest recall for a given threshold, triples with generalized \]leads an intermediate value, and triples without generalized heads the lowest recall. The error rate vs. recall curves of ligure 4 do not show a great difference between mcdLods, but they do indicate ttlat, over tile range of recalls for which they overlap, using triples without generalized heads l)roduces the lowest error rate.</Paragraph>
    <Paragraph position="7"> Finally, we conducted a series of experiments to compare the effectiveness of the triples in selecting the correct parse, in effect, the selection procedure works as follows, l'br each sentence in the test corpus, the system generates all possible  techniques, using training corpus of 1000 articles. Recall vs. 1-error rate for o = triples without generalized beads; o = triples with generalized heads; * = pairs.</Paragraph>
    <Paragraph position="8"> parses and then generates a set of triples from each parse. Each triple is assigned a score; the score for the parse is the product of the scores of the triples obtained from the parse (the use of products is consistent with the idea that the score for a triple to some degree reflects the probability that this triple is semantically valid). The parse or parses with the highest total score are then selected for evaluation.</Paragraph>
    <Paragraph position="9"> We tested three approaches to assigning a score to a triple: 1. We used the frequency of head-function-value triples relative to the frequency of the head as an estimate of the probability that this head would appear with this functionvalue combination. We used the &amp;quot;expected likelihood estimate&amp;quot; \[8\] in order to assure that triples which do not appear in the training corpus are still assigned non-zero probability; this simple estimator adds 1/2 to each observed frequency: freq. of triple + 0.5 score = freq. of head + 0.5 2. We applied a threshold to our set of collected triples: if a triple appeared with a frequency above the threshold it was assigned one score; if at or below the threshold, a lower score. We selected a threshold of 0.9, so that any triple which appeared unambiguously in at least one sentence of the training corpus was included. For our scores, we used the results of our previous set of experiments. These experiments showed that at a threshold of 0.9, 82% of the triples above the threshold were semantically valid, while 47% of the triples below the threshold were valid3 Thus we used score = 0.82 if freq. of triple &gt; 0.9 0.47 if freq. of triple &lt; 0.9 We expanded on method 2 by using both triples and pairs information. To assign a score to a head-function-value triple, we first ascertain whether this triple appears with frequency &gt; T in the collected patterns; if so, we assign a high score to the triple. If not, we determine whether the head-function pair appears with frequency &gt; T in the collected patterns. If so, we assign an intermediate score to the triple; if not, we assign a low score to the triple.</Paragraph>
    <Paragraph position="10"> Again, we chose a threshold of 0.9 for both triples and pairs. Our earlier experiments indicated that, of those head-function-value triples for which the triple was below the threshold for triples frequency but the head-function pair was above the threshold for pair frequency, 52% were semanticaily valid.</Paragraph>
    <Paragraph position="11"> Of those for which the head-function pair was below the threshold for pair frequency, 40% were semantically valid. Thus we used score = 0.82 if freq. of triple &gt; 0.9, else 0.52 if freq. of pair &gt; 0.9, else 0.40 if freq. of pair &lt; 0.9 Using these three scoring flmctions for selection, we parsed our test set of sentences and then scored the resulting parses against our &amp;quot;standard parses&amp;quot;. As a further comparison, we also parsed the same set using selectional constraints which had been previously manuMly prepared for this domain. The parses were scored against the standard in terms of average recall, precision, and number of crossings; the results are shown in Table 1. 3 A better match to the correct parses  ACTI':S DE COLING-92, NANTES, 23-28 AOUT 1992 6 6 2 PROC. OF COL1NG-92, NAMES, AUG. 23-28, 1992 selection strategy crossings rec',di precision 1. frequency-based 2.00 75.70 71.86 2. triples-threshold 2.17 73.57 70.22 3. triples-and-pairs deg 2-09 74:33 70.943. hand-generated 2.04 &amp;quot;t4.34 i  selection strategies on the quality of parses generated. null is reflected in higher recall and precision and lower number of crossings. These results indicate that the frequency-based scores performed better than either the threshold-ha.qed scores or the manually-prepared selection.</Paragraph>
  </Section>
class="xml-element"></Paper>