File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/99/p99-1051_metho.xml
Size: 11,073 bytes
Last Modified: 2025-10-06 14:15:28
<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1051"> <Title>Acquiring Lexical Generalizations from Corpora: A Case Study for Diathesis Alternations</Title> <Section position="4" start_page="399" end_page="399" type="metho"> <SectionTitle> 3 Filtering </SectionTitle> <Paragraph position="0"> Filtering assesses how probable it is for a verb to be associated with a wrong frame. Erroneous frames can be the result of tagging errors, parsing mistakes, or errors introduced by the heuristics and procedures we used to guess syntactic structure.</Paragraph> <Paragraph position="1"> We discarded verbs for which we had very little evidence (frame frequency = 1) and applied a relative frequency cutoff: the verb's acquired frame frequency was compared against its overall frequency in the BNC. Verbs whose relative frame frequency was lower than an empirically established threshold were discarded. The threshold values varied from frame to frame but not from verb to verb, and were determined by taking into account, for each frame, its overall frame frequency, which was estimated from the COMLEX subcategorization dictionary (6,000 verbs) (Grishman et al., 1994). This meant that the threshold was higher for less frequent frames (e.g., the double object frame, for which only 79 verbs are listed in COMLEX).</Paragraph> <Paragraph position="2"> We also experimented with a method suggested by Brent (1993), which applies the binomial test to frame frequency data. Both methods yielded comparable results. However, the relative frequency threshold worked slightly better, and the results reported in the following section are based on this method.</Paragraph> </Section> <Section position="5" start_page="399" end_page="401" type="metho"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"> We acquired 162 verbs for the double object frame, 426 verbs for the 'V NP1 to NP2' frame and 962 for the 'V NP1 for NP2' frame. 
Membership in alternations was judged as follows: (a) a verb participates in the dative alternation if it has the double object and 'V NP1 to NP2' frames and (b) a verb participates in the benefactive alternation if it has the double object and 'V NP1 for NP2' frames.</Paragraph> <Section position="1" start_page="399" end_page="399" type="sub_section"> <SectionTitle> Dative Alternation </SectionTitle> <Paragraph position="0"> Alternating: allot, assign, bring, fax, feed, flick, give, grant, guarantee, leave, lend, offer, owe, take, pass, pay, render, repay, sell, show, teach, tell, throw, toss, write, serve, send, award. V NP1 NP2: allocate, bequeath, carry, catapult, cede, concede, drag, drive, extend, ferry, fly, haul, hoist, issue, lease, peddle, pose, preach, push, relay, ship, tug, yield. V NP1 to NP2: ask, chuck, promise, quote, read, shoot, slip.</Paragraph> </Section> <Section position="2" start_page="399" end_page="401" type="sub_section"> <SectionTitle> Benefactive Alternation </SectionTitle> <Paragraph position="0"> Alternating: bake, build, buy, cast, cook, earn, fetch, find, fix, forge, gain, get, keep, knit, leave, make, pour, save, procure, secure, set, toss, win, write. V NP1 NP2: arrange, assemble, carve, choose, compile, design, develop, dig, gather, grind, hire, play, prepare, reserve, run, sew. V NP1 for NP2: boil, call, shoot. Table 5: Verbs common in corpus and Levin. Table 5 shows a comparison of the verbs found in the corpus against Levin's list of verbs; rows 'V NP1 to NP2' and 'V NP1 for NP2' contain verbs listed as alternating in Levin but for which we acquired only one frame. In Levin 115 verbs license the dative and 103 license the benefactive alternation. Of these we acquired 68 for the dative and 43 for the benefactive alternation (in both cases including verbs for which only one frame was acquired).</Paragraph> <Paragraph position="1"> The dative and benefactive alternations were also acquired for 52 verbs not listed in Levin. 
Of these, 10 correctly alternate (cause, deliver, hand, refuse, report and set for the dative alternation and cause, spoil, afford and prescribe for the benefactive), and 12 can appear in either frame but do not alternate (e.g., appoint, fix, proclaim). For 18 verbs two frames were acquired but only one was correct (e.g., swap and forgive, which take only the double object frame), and finally 12 verbs neither alternated nor had the acquired frames. A random sample of the acquired verb frames and their (log-transformed) frequencies is shown in figure 1.</Paragraph> <Paragraph position="2"> Levin defines 10 semantic classes of verbs for which the dative alternation applies (e.g., GIVE verbs, verbs of FUTURE HAVING, SEND verbs), and 5 classes for which the benefactive alternation applies (e.g., BUILD, CREATE, PREPARE verbs), assuming that verbs participating in the same class share certain meaning components. We partitioned our data according to Levin's pre-defined classes. Figure 2 shows for each semantic class the number of verbs acquired from the corpus against the number of verbs listed in Levin. As can be seen in figure 2, Levin and the corpus approximate each other for verbs of FUTURE HAVING (e.g., guarantee), verbs of MESSAGE TRANSFER (e.g., tell) and BRING-TAKE verbs (e.g., bring).</Paragraph> <Paragraph position="3"> The semantic classes of GIVE (e.g., sell), CARRY (e.g., drag), SEND (e.g., ship), GET (e.g., buy) and PREPARE (e.g., bake) verbs are also fairly well represented in the corpus, in contrast to SLIDE verbs (e.g., bounce) for which no instances were found.</Paragraph> <Paragraph position="4"> Note that the corpus and Levin did not agree with respect to the most popular classes licensing the dative and benefactive alternations: THROWING (e.g., toss) and BUILD verbs (e.g., carve) are the biggest classes in Levin allowing the dative and benefactive alternations respectively, in contrast to FUTURE HAVING and GET verbs in the corpus.</Paragraph> <Paragraph position="5"> This can be explained by looking at the average corpus frequency of the verbs belonging to the semantic classes in question: FUTURE HAVING and GET verbs outnumber THROWING and BUILD verbs by 
a factor of two to one.</Paragraph> </Section> </Section> <Section position="6" start_page="401" end_page="402" type="metho"> <SectionTitle> 5 Productivity </SectionTitle> <Paragraph position="0"> The relative productivity of an alternation for a semantic class can be estimated by calculating the ratio of acquired to possible verbs undergoing the alternation (Aronoff, 1976; Briscoe and Copestake, 1996).</Paragraph> <Paragraph position="1"> Productivity for a given class is expressed as f(acquired, class), the number of verbs which were found in the corpus and are members of the class, over f(class), the total number of verbs which are listed in Levin as members of the class (Total). The productivity values (Prod) for both the dative and the benefactive alternation (Alt) are summarized in table 6.</Paragraph> <Paragraph position="2"> Note that productivity is sensitive to class size. The productivity of BRING-TAKE verbs is estimated to be 1 since the class contains only 2 members, both of which were found in the corpus. 
This is intuitively correct, as we would expect the alternation to be more productive for specialized classes.</Paragraph> <Paragraph position="3"> The productivity estimates discussed here are potentially useful for treating lexical rules probabilistically, and for quantifying the degree to which language users are willing to apply a rule in order to produce a novel form (Briscoe and Copestake, 1996).</Paragraph> </Section> <Section position="7" start_page="402" end_page="402" type="metho"> <SectionTitle> 6 Typicality </SectionTitle> <Paragraph position="0"> Estimating the productivity of an alternation for a given class does not incorporate information about the frequency of the verbs undergoing the alternation. We propose to use frequency data to quantify the typicality of a verb or verb class for a given alternation. The underlying assumption is that a verb is typical for an alternation if it is equally frequent for both frames which are characteristic for the alternation. Thus the typicality of a verb can be defined as the conditional probability of the frame given the verb: $P(\mathit{frame}_i \mid \mathit{verb}) = \frac{f(\mathit{frame}_i, \mathit{verb})}{\sum_n f(\mathit{frame}_n, \mathit{verb})}$ (6) We calculate $P(\mathit{frame}_i \mid \mathit{verb})$ by dividing $f(\mathit{frame}_i, \mathit{verb})$, the number of times the verb was attested in the corpus with frame $i$, by $\sum_n f(\mathit{frame}_n, \mathit{verb})$, the overall number of times the verb was attested. In our case a verb has two frames, hence $P(\mathit{frame}_i \mid \mathit{verb})$ is close to 0.5 for typical verbs (i.e., verbs with balanced frequencies) and close to either 0 or 1 for peripheral verbs, depending on their preferred frame. Consider the verb owe as an example (cf. figure 1). We found 648 instances of owe, of which 309 were instances of the double object frame. 
By dividing the latter by the former we can see that owe is highly typical of the dative alternation: its typicality score for the double object frame is 0.48.</Paragraph> <Paragraph position="1"> By taking the average of $P(\mathit{frame}_i \mid \mathit{verb})$ for all verbs which undergo the alternation and belong to the same semantic class, we can estimate how typical this class is for the alternation. Table 6 illustrates the typicality (Typ) of the semantic classes for the two alternations. (The typicality values were computed for the double object frame.) For the dative alternation, the most typical class is GIVE, and the most peripheral is DRIVE (e.g., ferry). For the benefactive alternation, PERFORMANCE (e.g., sing), PREPARE (e.g., bake) and GET (e.g., buy) verbs are the most typical, whereas CREATE verbs (e.g., compose) are peripheral, which seems intuitively correct.</Paragraph> </Section> <Section position="8" start_page="402" end_page="402" type="metho"> <SectionTitle> 7 Future Work </SectionTitle> <Paragraph position="0"> The work reported in this paper relies on frame frequencies acquired from corpora using partial-parsing methods. For instance, frame frequency data was used to estimate whether alternating verbs exhibit different preferences for a given frame (typicality). However, it has been shown that corpus idiosyncrasies can affect subcategorization frequencies (cf. Roland and Jurafsky (1998) for an extensive discussion). This suggests that different corpora may give different results with respect to verb alternations. For instance, the to-PP frame is poorly represented in the syntactically annotated version of the Penn Treebank (Marcus et al., 1993). There are only 26 verbs taking the to-PP frame, of which 20 have a frame frequency of 1. This indicates that only a very small number of verbs undergoing the dative alternation can potentially be acquired from this corpus. 
In future work we plan to investigate the degree to which corpus differences affect the productivity and typicality estimates for verb alternations.</Paragraph> </Section> </Paper>
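<!--
The two ratios described in Sections 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration, assuming per-frame counts for each verb are already available; the function names and the frame labels ("double_object", "to_pp") are hypothetical and do not come from the paper. Only the counts for owe (309 of 648) and the BRING-TAKE class (2 of 2 members found) are taken from the text.

```python
from statistics import mean


def productivity(acquired_in_class: int, class_size: int) -> float:
    """Ratio f(acquired, class) / f(class), as described in Section 5."""
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    return acquired_in_class / class_size


def typicality(frame_counts: dict[str, int], frame: str) -> float:
    """P(frame | verb): the count for one frame over the verb's total count."""
    total = sum(frame_counts.values())
    if total == 0:
        raise ValueError("verb has no attested frames")
    return frame_counts[frame] / total


def class_typicality(verbs: dict[str, dict[str, int]], frame: str) -> float:
    """Average typicality over the verbs of one semantic class."""
    return mean(typicality(counts, frame) for counts in verbs.values())


# Counts for "owe" from Section 6: 648 instances, 309 in the double object frame.
# The frame labels are hypothetical names chosen for this sketch.
owe = {"double_object": 309, "to_pp": 648 - 309}
print(round(typicality(owe, "double_object"), 2))  # 0.48

# BRING-TAKE: both of the 2 members listed in Levin were found in the corpus.
print(productivity(2, 2))  # 1.0
```

A score near 0.5 marks a verb with balanced frame frequencies, while scores near 0 or 1 mark verbs that strongly prefer one frame.
-->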