<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1008"> <Title>Acquiring Inference Rules with TemporalConstraints by Using Japanese Coordinated Sentences and Noun-Verb Co-occurrences</Title> <Section position="5" start_page="60" end_page="63" type="evalu"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="60" end_page="60" type="sub_section"> <SectionTitle> 4.1 Settings </SectionTitle> <Paragraph position="0"> We parsed 35 years of newspaper articles (Yomiuri 87-01, Mainichi 91-99, Nikkei 90-00, 3.24GB in total) and 92.6GB of HTML documents downloaded from the WWW using an existing parser (Kanayama et al., 2000) to obtain the word (co-occurrence) frequencies. All the probabilities used in our method were estimated by maximum likelihood estimation from these frequencies. We randomly picked 600 nouns as a development set. We prepared three test sets, namely test sets A, B, and C, which consisted of 100 nouns, 250 nouns and 1,000 nouns respectively.</Paragraph> <Paragraph position="1"> Note that all the nouns in the test sets were randomly picked and did not have any common items with the development set. In all the experiments, four human judges checked ifeachproduced rulewasaproper one without knowing how each rule was produced.</Paragraph> </Section> <Section position="2" start_page="60" end_page="62" type="sub_section"> <SectionTitle> 4.2 Effects ofUsing Coordinated Sentences </SectionTitle> <Paragraph position="0"> In the rst series of experiments, we compared a simplied version of our scoring function BasicS with some alternative scores. This was mainly to check if coordinated sentences can improve accuracy. The alternative scores we considered are presented below. Note that we did not test our bias mechanism in this series of experiments.</Paragraph> <Paragraph position="2"> S-VV was obtained by approximating the probabilities of coordinated sentences, as in the case of BasicS. However, we assumed the occurrences of two verbs were independent. The difference between the performance of this score and that of BasicS will indicate the effectiveness of using verb-verb co-occurrences in coordinated sentences.</Paragraph> <Paragraph position="3"> The second alternative, S-NV, simply ignores the noun-verb co-occurrences in BasicS. MI is a score based onmutual information androughly corresponds to the score used in a previous attempt to acquire temporal relations between events (Chklovski and Pantel, 2004). Cond is an approximation of the proba- null bility that the coordinated sentences consisting of n, vcon andvpre are observed given the precondition part consisting of vpre and n. Rand is a random number and generates rules by combining verbs that co-occur with the given n randomly. This was used as a base-line method of our task Theresulting precisions areshown in Figures 1 and 2. The gure captions specify (4 judges), as in Figure 1, when the acceptable rules included only those regarded as proper by all four judges; the captions specify (3 judges), as in Figure 2, when the acceptable rules include those considered proper by at least three of the fourjudges. We used test set A (100 nouns) and produced the top four rule candidates for each noun according to each score. As the nal results, all the produced rules for all the nouns were sorted according to each score, and a precision was obtained for top N rules in the sorted list. 
Notice that BasicS outperformed all the alternatives, though the difference between S-VV and BasicS was rather small. (Footnote 2: Actually, the experiments concerning Rand were conducted considerably after the experiments on the other scores, and only two of the four judges for Rand were included among the judges for the other scores. However, we think that the superiority of our score BasicS over the baseline method was confirmed, since the precision of Rand was drastically lower than that of BasicS.) Another important point is that the precisions obtained with the scores that ignored noun-verb co-occurrences were quite low. These findings suggest that 1) coordinated sentences can be useful clues for obtaining temporally constrained rules and 2) noun-verb co-occurrences are also important clues.</Paragraph> <Paragraph position="4"> In the above experiments, we actually allowed the noun n to appear as argument types other than the syntactic objects of a verb. When we restricted the argument types to objects, in most cases BasicS outperformed the alternatives.</Paragraph> <Paragraph position="5"> Although the number of produced rules was reduced because of this restriction, the precision of all produced rules was improved. Because of this, we decided to restrict the argument type to objects.</Paragraph> <Paragraph position="6"> The kappa statistic for assessing the inter-rater agreement was 0.53, which indicates moderate agreement according to Landis and Koch (1977). The kappa value for only the judgments on rules produced by BasicS rose to 0.59. After we restricted the verb-noun co-occurrences to verb-object co-occurrences, the kappa became 0.49, while that for the rules produced by BasicS was 0.54.</Paragraph> </Section> <Section position="3" start_page="62" end_page="62" type="sub_section"> <SectionTitle> 4.3 Direction of Implications </SectionTitle> <Paragraph position="0"> Next, we examined the directions of implications and the temporal order between events. We produced 1,000 rules for test set B (250 nouns) using the score BasicS, again without restricting the argument types of the given nouns to syntactic objects. When we restricted the argument positions to objects, we obtained 347 rules. Then, from each generated rule, we created a new rule having the opposite direction of implication: we swapped the precondition and the consequence of the rule and reversed its temporal order. For instance, we created "If someone enacts a law, usually someone enforces the law at the same time as or after the enacting of the law" from "If someone enforces a law, usually someone enacts the law at the same time as or before the enforcing of the law."</Paragraph> <Paragraph position="1"> Figure 4 shows the results. Proposed direction refers to the precision of the rules generated by our method. The precision of the rules with the opposite direction is indicated by Reversed. The precision of Reversed was much lower than that of our method, and this justifies our choice of direction. The kappa values for BasicS and Reversed were 0.54 and 0.46, respectively. Both indicate moderate agreement.</Paragraph> </Section>
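To illustrate the reversal operation used above, the following is a small Python sketch that swaps the precondition and consequence of a rule and flips its temporal constraint; the dictionary encoding of rules is a hypothetical representation, not our actual data structure.

TEMPORAL_FLIP = {
    "at the same time as or after": "at the same time as or before",
    "at the same time as or before": "at the same time as or after",
}

def reverse_rule(rule):
    """Return a rule with the opposite direction of implication."""
    return {
        "precondition": rule["consequence"],
        "consequence": rule["precondition"],
        "temporal": TEMPORAL_FLIP[rule["temporal"]],
    }

# The enact/enforce example from the text:
proposed = {
    "precondition": "someone enforces a law",
    "consequence": "someone enacts the law",
    "temporal": "at the same time as or before",
}
opposite = reverse_rule(proposed)
# opposite["precondition"] == "someone enacts the law"
# opposite["temporal"] == "at the same time as or after"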
<Section position="4" start_page="62" end_page="63" type="sub_section"> <SectionTitle> 4.4 Effects of the Bias </SectionTitle> <Paragraph position="0"> Last, we compared Score and BasicS to see the effect of our bias. This time, we used test set C (1,000 nouns). The rules were restricted to those in which the given nouns are the syntactic objects of the two verbs.</Paragraph> <Paragraph position="1"> The evaluation was done for only the top 400 rules for each score. The results are shown in Figures 5 and 6.</Paragraph> <Paragraph position="2"> Score refers to the precision obtained with Score, while BasicS indicates the precision with BasicS.</Paragraph> <Paragraph position="3"> For most data points in both graphs, the Score precision was about 10% higher than the BasicS precision. In Figure 6, the precision reached 70% when the 400 rules were produced. These results indicate the desirable effect of our bias, at least for the top rules.

Figure 7: Examples of acquired inference rules (each prefixed by its rank and the number of judges who judged it proper):
[...] naraba, jikokiroku wo koushinsuru (If someone betters her best record, usually someone breaks her best record.)
21/3 moshi katakuriko wo mabusu naraba, katakuriko wo tsukeru (If someone coats something with potato starch, usually someone covers something with the starch.)
194/4 moshi sasshi wo haifusuru naraba, sasshi wo sakuseisuru (If someone distributes a booklet, usually someone makes the booklet.)
303/4 moshi netsuzou wo kokuhakusuru naraba, netsuzou wo mitomeru (If someone confesses to a fabrication, usually someone admits the fabrication.)
398/3 moshi ifuku wo kikaeru naraba, ifuku wo nugu (If someone changes clothes, usually someone gets out of the clothes.)

The 400 rules generated by Score included 175 distinct nouns and 272 distinct verb pairs. Examples of the inference rules acquired by Score are shown in Figure 7, along with the numbers of judges who judged each rule as being proper. (We omitted the phrase "at the same time as or before" in the examples.) The kappa was 0.57 (moderate agreement).</Paragraph> <Paragraph position="4"> In addition, the graphs compare Score with some other alternatives. This comparison was made to check the effectiveness of our bias more carefully. The 400 rules generated by BasicS were re-ranked using Score and the alternative scores, and the precision for each was computed using the human judgments for the rules generated by BasicS. (We did not evaluate the rules directly generated by the alternatives, to reduce the workload of the judges.) The first alternative was Scorecooc, which was presented in Section 3. Here, reranked by Scorecooc refers to the precision obtained by re-ranking with Scorecooc. This precision was below that obtained by re-ranking with Score (referred to as reranked by Score). As discussed in Section 3, this indicates that the bias P_arg(v_con) in Score works better than the bias P_arg(n, v_con) in Scorecooc.</Paragraph> <Paragraph position="5"> The second alternative was the scoring function obtained by replacing the bias P_arg(v_con) in Score with P_arg'(v_pre), which is roughly the probability that the verb in the precondition will be observed. This score is denoted as PreBias(n, v_con, v_pre, arg, arg') = P_arg'(v_pre) * BasicS(n, v_con, v_pre, arg, arg'). The precision of this score is indicated by reranked by PreBias and is much lower than that of reranked by Score, indicating that only the probability of the verbs in the consequences should be used as a bias. This is consistent with our assumption behind the bias.</Paragraph>
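As a minimal sketch of this re-ranking comparison, the following Python fragment ranks the same candidate set under each bias term; the candidate encoding and field names (basic_s, p_vcon, p_n_vcon, p_vpre) are hypothetical labels for the quantities defined above, not our implementation.

def rerank(candidates, bias_key):
    """Re-rank candidate rules by bias * BasicS for the chosen bias term.

    Each candidate is a hypothetical dict holding a precomputed "basic_s"
    value and bias probabilities: "p_vcon" for P_arg(v_con) (Score),
    "p_n_vcon" for P_arg(n, v_con) (Scorecooc), and "p_vpre" for
    P_arg'(v_pre) (PreBias).
    """
    return sorted(candidates, key=lambda c: c[bias_key] * c["basic_s"], reverse=True)

# Re-rank the same 400 BasicS rules under each bias and compare the
# precision of the resulting top lists, as in Figures 5 and 6:
# rerank(candidates, "p_vcon")    # Score
# rerank(candidates, "p_n_vcon")  # Scorecooc
# rerank(candidates, "p_vpre")    # PreBias

</Section> </Section> </Paper>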