<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1208"> <Title>Interpretation of Compound Nominalisations using Corpus and Web Statistics</Title> <Section position="5" start_page="54" end_page="55" type="metho"> <SectionTitle> 3 Resources </SectionTitle> <Paragraph position="0"> We make use of a number of lexical resources in our implementation and evaluation. For corpus statistics, we use the written component of the BNC, a balanced 90M token corpus. To find verb argument frequencies, we parse this using RASP (Briscoe and Carroll, 2002), a statistical parser based on a tag sequence grammar. We contrast the corpus statistics with ones collected from the Web, using an implementation of a freely available Google scraper from CPAN, which also provides support for examining the actual text of the returned documents. For a given compound nominalisation, we wish to determine all possible verbal forms of the head. We do so using the combination of the morphological component of CELEX (Burnage, 1990), a lexical database, NOMLEX (Macleod et al., 1998), a nominalisation database, and CATVAR (Habash and Dorr, 2003), an automatically constructed database of clusters of inflected words based on the Porter stemmer (Porter, 1997).</Paragraph> <Paragraph position="1"> Once the verbal forms have been identified, we construct canonical forms of the present participle (+ing) and the past participle (+ed) using the morph lemmatiser (Minnen et al., 2001). We construct canonical forms of the plural head and plural modifier (+s) in the same manner.</Paragraph> <Paragraph position="2"> For evaluation, we have the two-way classified data set used by Lapata (2002), and a three-way classified data set constructed from open text.</Paragraph> <Paragraph position="3"> Lapata automatically extracts candidates from the British National Corpus, and hand-curates a set of 796 compound nominalisations, each interpreted as either a subjective relation SUBJ (e.g. wood appearance: "wood appears") or a (direct) objective relation OBJ (e.g. stress avoidance: "[SO] avoids stress"). We automatically validated this data set for consistency, removing: 1. items that did not occur in the same chunk, according to a chunker based on fnTBL 1.0 (Ngai and Florian, 2001), 2. items whose head did not have a verbal form according to our lexical resources, and 3. items which consisted in part of proper nouns, to end up with 695 consistent compounds. We used the method of Nicholson and Baldwin (2005) to derive a small data set of 129 compound nominalisations, also from the BNC, and instructed three unskilled annotators to classify each as one of subjective (SUB), direct object (DOB), or prepositional object (POB, e.g. side show: "[SO] shows [ST] on the side"). The annotators identified nine prepositional relations: {about, against, for, from, in, into, on, to, with}.</Paragraph> </Section> <Section position="6" start_page="55" end_page="56" type="metho"> <SectionTitle> 4 Proposed Method </SectionTitle> <Section position="1" start_page="55" end_page="55" type="sub_section"> <SectionTitle> 4.1 Paraphrase Tests </SectionTitle> <Paragraph position="0"> To derive preferences for the SUB, DOB, and various POB interpretations for a given compound nominalisation, the most obvious approach is to examine a parsed corpus for instances of the verbal form of the head and the modifier occurring in the corresponding verb argument relation. There are other constructions that can be informative, however.
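As a concrete illustration (a sketch of ours, not part of the original method description), the corpus lookup can be coded as follows, assuming the RASP grammatical-relation output has been pre-extracted into tab-separated (verb, relation, noun-lemma) triples; the file format and function names are our own assumptions:

from collections import Counter

def load_triples(path):
    # Assumed input: one "verb<TAB>relation<TAB>noun" triple per line,
    # pre-extracted from RASP's grammatical-relation output.
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            verb, rel, noun = line.rstrip("\n").split("\t")
            counts[(verb, rel, noun)] += 1
    return counts

def verb_arg_freqs(counts, verbal_heads, modifier):
    # Frequency of the modifier occurring as subject ("ncsubj") or direct
    # object ("dobj", RASP's GR labels) of any verbal form of the head.
    freqs = {"SUB": 0, "DOB": 0}
    for verb in verbal_heads:
        freqs["SUB"] += counts[(verb, "ncsubj", modifier)]
        freqs["DOB"] += counts[(verb, "dobj", modifier)]
    return freqs

# e.g. for language speaker: verb_arg_freqs(counts, {"speak"}, "language")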
We examine two novel paraphrase tests: one prepositional and one participial. The prepositional test is based in part on the work of Leonard (1984) and Lauer (1995): for a given compound, we search for instances of the head and modifier nouns separated by a preposition. For example, for the compound nominalisation leg operation, we might search for operation on the leg, corresponding to the POB relation on. Special cases are by, corresponding to a subjective reading akin to a passive construction (e.g. investor hesitancy: hesitancy by the investor ≈ "the investor hesitates"), and of, corresponding to a direct object reading (e.g. language speaker: speaker of the language ≈ "[SO] speaks the language").</Paragraph> <Paragraph position="1"> The participial test is based on the paraphrasing equivalence of using the present participle of the verbal head as an adjective before the modifier, for the SUB relation (e.g. the hesitating investor ≈ "the investor hesitates"), compared to the past participle for the DOB relation (the spoken language ≈ "[SO] speaks the language"). The corresponding prepositional object construction is unusual in English, but still possible: compare ?the operated-on leg and the lived-in village.</Paragraph> </Section> <Section position="2" start_page="55" end_page="56" type="sub_section"> <SectionTitle> 4.2 The Algorithm </SectionTitle> <Paragraph position="0"> Given a compound nominalisation, we perform a number of steps to arrive at an interpretation. First, we derive a set of verbal forms for the head from the combination of CELEX, NOMLEX, and CATVAR. We find the participial forms of each of the verbal heads, and plurals for the nominal head and modifier, using the morph lemmatiser.</Paragraph> <Paragraph position="1"> Next, we examine the BNC for instances of the modifier and one of the verbal head forms occurring in a verb argument relation, with the aid of the RASP parse. Using these frequencies, we calculate the pairwise z-scores between SUB and DOB, and between SUB and POB: the score given to the SUB interpretation is the greater of the two. We further examine the RASP-parsed data for instances of the prepositional and participial tests for the compound, and calculate the z-scores for these as well.</Paragraph> <Paragraph position="2"> We then collect our Google counts. Because the Web data is unparsed, we cannot look for syntactic structures explicitly. Instead, we query a number of collocations which we expect to be representative of the desired structure.</Paragraph> <Paragraph position="3"> For the prepositional test, the head can be singular or plural, the modifier can be singular or plural, and there may or may not be an article between the preposition and the modifier. For example, for the compound nominalisation product replacement and the preposition of, we search for all of the following (and similarly for the other prepositions):
replacement of product
replacement of the product
replacement of products
replacement of the products
replacements of product
replacements of the product
replacements of products
replacements of the products
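A minimal sketch of how this query set can be generated (the function name is ours; the plural forms come from the morph lemmatiser of Section 3):

def prepositional_queries(head, head_pl, mod, mod_pl, prep):
    # All singular/plural and article/no-article variants of
    # "head PREP (the) modifier": 2 x 2 x 2 = 8 queries.
    return [f"{h} {prep} {art}{m}"
            for h in (head, head_pl)
            for m in (mod, mod_pl)
            for art in ("", "the ")]

# prepositional_queries("replacement", "replacements", "product", "products", "of")
# yields exactly the eight queries listed above.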
For the participial test, the modifier can be singular or plural, and if we are examining a prepositional relation, the head can be either a present or past participle. For product replacement and the preposition about, we search for the following (and similarly for the other prepositions):
the replacing product
the replacing products
the replaced product
the replaced products
the replacing about product
the replacing about products
the replaced about product
the replaced about products
We comment briefly on these tests in Section 6. We choose to use "the" as our canonical article because it is a reliable marker of the left boundary of an NP and is number-neutral; using "a/an" would represent a needless complication.</Paragraph> <Paragraph position="4"> We then calculate the z-scores using the method described in Section 2, where the individual frequency counts are the maximum of the results obtained across the query set.</Paragraph> <Paragraph position="5"> Once the z-scores have been obtained, we choose a classification based on the greatest-valued observed test. We contrast the confidence interval based approach with the maximum likelihood method of choosing the largest of the raw frequencies. We also experiment with a machine learning package, to examine the mutual predictiveness of the separate tests.</Paragraph>
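The z-score itself is defined in Section 2, which is not reproduced in this section. As a hedged sketch, assuming the common binomial formulation in which two competing counts are tested against an expected even split (p = 0.5; this formulation and the function names are our assumptions), the scoring and tie-breaking could look like:

from math import sqrt

def pairwise_z(f1, f2, p=0.5):
    # Binomial z-score of f1 against f2 under an expected even split.
    # Positive values favour the first count; see Section 2 for the
    # paper's actual definition.
    n = f1 + f2
    if n == 0:
        return 0.0
    return (f1 - n * p) / sqrt(n * p * (1 - p))

def choose(scores, baseline):
    # Pick the greatest-valued observed test; fall back to the majority
    # baseline when no single test wins (e.g. all scores are 0).
    best = max(scores, key=scores.get)
    if list(scores.values()).count(scores[best]) > 1:
        return baseline
    return best

# e.g. choose({"SUB": 1.2, "DOB": 2.4, "POB": -0.3}, baseline="DOB") -> "DOB"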
</Section> </Section> <Section position="7" start_page="56" end_page="57" type="metho"> <SectionTitle> 5 Observed Results </SectionTitle> <Paragraph position="0"> First, we found majority-class baselines for each of the data sets. The two-way data set had 258 SUBJ classified items and 437 OBJ classified items, so choosing OBJ each time gives a baseline of 62.9%. The three-way set had 22 SUB items, 63 DOB, and 44 POB, giving a baseline of 48.8%.</Paragraph> <Paragraph position="1"> Contrasting this with human performance on the data sets, Lapata recorded a raw inter-annotator agreement of 89.7% on her test set, which corresponds to a kappa value of κ = 0.78. On the three-way data set, three annotators had an agreement of 98.4% for identification and classification of observed compound nominalisations in open text, and κ = 0.83. For the three-way data set, the annotators were asked to both identify and classify compound nominalisations in free text, and agreement is thus calculated over all words in the test. The high agreement figure is due to the fact that most words could be trivially disregarded (e.g. were not nouns). Kappa corrects this figure for chance agreement, so we conclude that this task was still better-defined than the one posed by Lapata. One possible reason for this is the number of poorly behaved compounds that we removed due to chunk inconsistencies, lack of a verbal form, or proper nouns: it would be difficult for the annotators to agree over compounds where an obvious, well-defined interpretation was not available.</Paragraph> <Section position="1" start_page="56" end_page="57" type="sub_section"> <SectionTitle> 5.1 Comparison Classification </SectionTitle> <Paragraph position="0"> Results for classification over the Lapata two-way data set are given in Table 1, and results over the open-data three-way set are given in Table 2. [Table captions: raw frequencies vs. confidence-based z-scores, for BNC data and Google scrapings.] For these, we selected the greatest raw frequency count for a given test as the intended relation (Raw), or the greatest confidence interval according to the z-score (Z-Score). If a relation could not be selected due to ties (e.g., the scores were all 0), we selected the majority baseline. To deal with the nature of the two-way data set with respect to our three-way selection, we mapped compounds that we would prefer to be POB to OBJ, as there are compounds in the set (e.g. adult provision) that have a prepositional object reading ("provide for adults") but have been classified as direct object OBJ.</Paragraph> <Paragraph position="1"> The verb argument counts obtained from the parsed BNC are significantly better than the baseline for the Lapata data set (χ² = 4.12, p ≤ 0.05), but not significantly better for the open data set. Compare the results reported by Lapata (2002) over her data set using backed-off smoothing, the most closely related method.</Paragraph> <Paragraph position="2"> Neither the prepositional nor participial paraphrases were significantly better than the baseline for either the two-way (χ² = 0.00, p ≤ 1) or the three-way data set (χ² = 3.52, p ≤ 0.10), although the prepositional test did slightly improve on the verb argument results.</Paragraph> </Section> <Section position="2" start_page="57" end_page="57" type="sub_section"> <SectionTitle> 5.2 Machine Learning Classification </SectionTitle> <Paragraph position="0"> Although the results were not impressive, we still believed that there was complementary information within the data, which could be extracted with the aid of a machine learner. For this, we made use of TiMBL (Daelemans et al., 2003), a nearest-neighbour classifier which stores the entire training set and extrapolates to further samples, as a principled method for combining the data. We use TiMBL's in-built cross-validation method: 90% of the data set is used as training data to test the other 10%, for each stratified tenth of the set. The results it achieves are assumed to generalise to new samples, since these are compared against the current training data set.</Paragraph>
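TiMBL is a standalone memory-based learner, so nothing below reproduces its actual interface; purely to illustrate the shape of this combination step, here is a toy 1-nearest-neighbour classifier with leave-one-fold-out evaluation over the per-test scores. The feature layout and squared Euclidean distance are our simplifications of TiMBL's weighted overlap metric:

def nearest_neighbour(train, query):
    # train: list of (feature_vector, label) pairs; returns the label of
    # the vector closest to the query under squared Euclidean distance.
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(train, key=lambda ex: dist(ex[0], query))[1]

def cross_validate(data, folds=10):
    # Hold out each tenth of the data in turn, train on the rest, and
    # report overall accuracy, mirroring TiMBL's in-built CV setup.
    correct = 0
    for i, (vec, label) in enumerate(data):
        train = [ex for j, ex in enumerate(data) if j % folds != i % folds]
        correct += (nearest_neighbour(train, vec) == label)
    return correct / len(data)

# Each item pairs the raw counts and z-scores of the separate paraphrase
# tests with the gold relation, e.g. ([3, 0, 1, 1.7, -0.4, 0.0], "DOB").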
<Paragraph position="1"> The results observed using TiMBL are shown in Table 3. [Table 3 caption: combined paraphrase tests over the two-way and three-way data sets, for corpus and Web frequencies.] These results were from the combination of all of the available paraphrase tests: verb argument, prepositional, and participial for the corpus counts, and just prepositional and participial for the Web counts. The results for the two-way data set derived from Lapata's data set were a good improvement over the simple classification results, significantly so for the Web frequencies. We notice, however, a corresponding decrease in the results for the three-way open data set, which makes these improvements immaterial.</Paragraph> <Paragraph position="2"> Examining the other possible combinations of the tests did indeed lead to varying results, but not in a consistent manner. For example, the best combination for the open data set was using the participial raw scores and z-scores (58.1%), which performed particularly badly in simple comparisons, and comparatively poorly (70.2%) for the two-way set.</Paragraph> </Section> </Section> <Section position="8" start_page="57" end_page="59" type="metho"> <SectionTitle> 6 Discussion </SectionTitle> <Paragraph position="0"> Although the observed results failed to match, or even approach, the benchmarks set by Lapata (2002) (87.3% accuracy) and Grover et al. (2005) (77%) for the subject/object and subject/direct object/prepositional object classification tasks respectively, the presented approach is not without merit. Those benchmark results relied on machine learning approaches incorporating many features independent of corpus counts: namely, context, suffix information, and semantic similarity resources. Our results are an examination of the possible contribution of the lexical information available from high-volume unparsed text.</Paragraph> <Paragraph position="1"> One important concept used in the above benchmarks is statistical smoothing, both class-based and distance-based. The reason for this is the inherent data sparseness of the corpus statistics for these paraphrase tests. Lapata (2002) observes that almost half (47%) of the verb-noun pairs constructed are not attested within the BNC. Grover et al. (2005) also note the sparseness of observed relations. Using the immense data source of the Web allows one to circumvent this problem: of the more than 900 compounds across the two data sets, only one (anarchist prohibition) has no instances of the paraphrases from the scraping. (Interestingly, Google lists only 3 occurrences of this compound anyway, so token relevance is low; further inspection shows that those 3 are not well formed in any case.) This extra information, we surmise, would be beneficial for the smoothing procedures, as the comparative accuracy between the two methods is similar.</Paragraph> <Paragraph position="2"> On the other hand, we also observe that simply alleviating the data sparseness is insufficient to provide a reliable interpretation. These results reinforce the contribution made by the statistical and semantic resources used in arriving at these benchmarks.</Paragraph> <Paragraph position="3"> The approach suggested by Keller and Lapata (2003) for obtaining bigram information from the Web could provide a means of estimating the syntactic verb argument counts for a given compound (dashes in Tables 1 and 2). In spite of the inherent unreliability of approximating long-range dependencies with n-gram information, the results look promising. An examination of the effectiveness of this approach is left as further research. Similarly, various methods of combining corpus counts with the Web counts, including smoothing, backing off, and machine learning, could also lead to interesting performance impacts.</Paragraph> <Paragraph position="4"> Another item of interest is the comparative difficulty of the task presented by the three-way data set extracted from open data, compared to the two-way data set hand-curated by Lapata. The baseline of this set is much lower, even compared to that of the similar (albeit domain-specific) task from Grover et al. (2005), at 58.6%. We posit that the hand-filtering of the data sets in these works contributes to a biased sample. For example, removing prepositional objects for a two-way classification, when they make up about a third of the open data set, renders the task somewhat artificial.</Paragraph> <Paragraph position="5"> Comparison between the maximum likelihood estimates used in earlier work and the more statistically robust confidence intervals was inconclusive as to performance improvement; the z-scores were most effective as a form of feature expansion for the machine learner. The only obvious result is an aesthetic one, in using robust statistics.</Paragraph> <Paragraph position="6"> Finally, the paraphrase tests which we propose are not without drawbacks. In the prepositional test, a paraphrase with of does not strictly contribute to a direct object reading: consider school aim ("the school aims"), for which instances of aim by the school are overwhelmed by aim of the school.</Paragraph> <Paragraph position="7"> We experimented with permutations of the available queries (e.g.
requiring the head and modifier to be of different number, to reflect the pluralisability of the head in such compounds, e.g. aims of the school), without observing substantially different results.</Paragraph> <Paragraph position="8"> Another observation is the inherent bias of the prepositional test toward the prepositional object relation. Apparent prepositional relations can occur in spite of the available verb frames: consider cash limitation, where the most frequent instance is limitation on cash, despite the impossibility of *to limit on cash (for "to place a limit on cash"). Another example is bank agreement: finding instances of agreement with bank does not lead to the (pragmatically absurd) reading "[SO] agrees with the bank".</Paragraph> <Paragraph position="9"> Correspondingly, the participial test has the opposite bias: constructions of the form the lived-in flat ≈ "[SO] lived in the flat" are usually lexicalised in English. As such, only 17% of compounds in the two-way data set and 34% of the three-way data set display non-zero values in the prepositional object relation for the participial test. We hoped that the inherent biases of the two tests might balance each other, but there is little evidence of that in the results.</Paragraph> </Section> </Paper>