<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1327"> <Title>Using Semantically Motivated Estimates to Help Subcategorization Acquisition</Title> <Section position="4" start_page="216" end_page="217" type="metho"> <SectionTitle> 2 Examining SCF Correlation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="216" end_page="217" type="sub_section"> <SectionTitle> between Semantically Similar Verbs </SectionTitle> <Paragraph position="0"> To examine the degree of SCF correlation between semantically similar verbs, we took Levin's verb classification (1993) as a starting point. Levin verb classes are based on the ability of a verb to occur in specific diathesis alternations, i.e. specific pairs of syntactic frames which are assumed to be meaning preserving. The classification provides semantically-motivated sets of syntactic frames associated with individual classes.</Paragraph> <Paragraph position="1"> While Levin's classification shows that there is correlation between the SCFs related to the verb sense, our aim is to examine whether there is also correlation between the SCFs specific to the verb form. Unlike Levin, we are concerned with polysemic SCF distributions involving all senses of verbs. In addition, we are not only interested in the degree of correlation between sets of SCFs, but also in comparing the ranking of SCFs between distributions. Nevertheless, Levin classes provide us with a useful starting point.</Paragraph> <Paragraph position="2"> Focusing on five broad Levin classes - change of possession, assessment, killing, motion, and destroy verbs - we chose four test verbs from each class and examined the degree to which the SCF distribution for these verbs correlates with the SCF distributions for two other verbs from the same Levin class.</Paragraph> <Paragraph position="3"> The latter verbs were chosen so that one of the verbs is a synonym, and the other a hypernym, of a test verb. 
We used WordNet (Miller et al., 1990) for defining and recognising these semantic relations. We defined a hypernym as a test verb's hypernym in WordNet, and a synonym as a verb which, in WordNet, shares this same hypernym with a test verb. We also examined how well the SCF distribution for the different test verbs correlates with the SCF distribution of all English verbs in general and with that of a semantically different verb (i.e.</Paragraph> <Paragraph position="4"> a verb belonging to a different Levin class).</Paragraph> <Paragraph position="5"> We used two methods for obtaining the SCF distributions. The first was to acquire an unfiltered subcategorization lexicon for 20 million words of the British National Corpus (BNC) (Leech, 1992) data using Briscoe and Carroll's (1997) system (introduced in section 3.2). This gives us the observed distribution of SCFs for individual verbs and that for all verbs in the BNC data. The second method was to manually analyse around 300 occurrences of each test verb in the BNC data. This gives us an estimate of the correct SCF distributions for the individual verbs. The estimate for the correct distribution of SCFs over all English verbs was obtained by extracting the number of verbs which are members of each SCF class in the ANLT dictionary (Boguraev et al., 1987).</Paragraph> <Paragraph position="6"> The degree of correlation was examined by calculating the Kullback-Leibler distance (KL) (Cover and Thomas, 1991) and the Spearman rank correlation coefficient (RC) (Spearman, 1904) between the different distributions2.</Paragraph> <Paragraph position="7"> The results given in tables 1 and 2 were obtained by correlating the observed SCF distributions from the BNC data. Table 1 shows an example of correlating the SCF distribution of the motion verb fly against that of (i) its hypernym move, (ii) its synonym sail, (iii) all verbs in general, and (iv) agree, which is not related semantically. 
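As an illustrative sketch only (not code from the paper), the two correlation measures can be computed over SCF probability distributions as follows; the function names and the toy fly/move distributions are assumptions made for this example:

```python
# Hypothetical sketch: KL distance and Spearman rank correlation between
# two SCF distributions, represented as dicts of SCF label -> probability.
# The SCF labels and probability values below are invented toy data.
import math

def kl_distance(p, q, eps=1e-10):
    """Kullback-Leibler distance D(p || q); near 0 denotes strong association."""
    scfs = set(p) | set(q)
    return sum(p.get(s, 0.0) * math.log((p.get(s, 0.0) + eps) / (q.get(s, 0.0) + eps))
               for s in scfs if p.get(s, 0.0) > 0.0)

def spearman_rc(p, q):
    """Spearman rank correlation between the SCF rankings of p and q (no ties)."""
    scfs = sorted(set(p) | set(q))
    def ranks(d):
        order = sorted(scfs, key=lambda s: -d.get(s, 0.0))
        return {s: i for i, s in enumerate(order)}
    rp, rq = ranks(p), ranks(q)
    n = len(scfs)
    d2 = sum((rp[s] - rq[s]) ** 2 for s in scfs)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

fly  = {"NP": 0.45, "PP": 0.30, "INTRANS": 0.20, "NP_PP": 0.05}
move = {"NP": 0.40, "PP": 0.28, "INTRANS": 0.25, "NP_PP": 0.07}
print(kl_distance(fly, move))   # small positive value: strong association
print(spearman_rc(fly, move))   # 1.0 here, since the SCF rankings coincide
```

Note that KL is asymmetric and undefined for zero probabilities in q, hence the small epsilon; the paper does not specify how it handled zeros.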
The results show that the SCF distribution for fly clearly correlates better with the SCF distributions for move and sail than with that for all verbs and agree.2Note that KL >= 0, with KL near to 0 denoting strong association, and -1 <= RC <= 1, with RC near to 0 denoting a low degree of association and RC near to -1 and 1 denoting strong association.</Paragraph> <Paragraph position="8"> The average results for all test verbs given in table 2 indicate that the degree of SCF correlation is best with semantically similar verbs. Hypernym and synonym relations are nearly equally good, the majority of verbs showing slightly better SCF correlation with hypernyms. The SCF correlation between individual verbs and verbs in general is poor, but still better than with semantically unrelated verbs.</Paragraph> <Paragraph position="9"> These findings with the observed SCF distributions hold as well with the correct SCF distributions, as seen in table 3. The results show that in terms of SCF distributions, verbs in all classes examined correlate better with their hypernym verbs than with all verbs in general.</Paragraph> <Paragraph position="10"> As one might expect, the polysemy of the individual verbs affects the degree of SCF correlation between semantically similar verbs.</Paragraph> <Paragraph position="11"> The degree of SCF correlation is higher with those verbs whose predominant3 sense is involved with the Levin class examined. 
For example, the SCF distribution for the killing verb murder correlates better with that for the verb kill than with that for the verb execute, whose predominant sense is not involved with killing verbs.</Paragraph> <Paragraph position="12"> These results show that the verb-sense-specific SCF correlation observed by Levin extends to verb-form-specific SCF correlation and applies to the ranking of SCFs as well.</Paragraph> <Paragraph position="13"> This suggests that we can obtain more accurate back-off estimates for verbal acquisition by basing them on a semantic verb type.3Predominant sense refers here to the most frequent sense of verbs in WordNet.</Paragraph> <Paragraph position="14"> To find out whether such semantically motivated estimates can be used to improve SCF acquisition, we performed an experiment which we describe below.</Paragraph> </Section> </Section> <Section position="5" start_page="217" end_page="219" type="metho"> <SectionTitle> 3 Experiment </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="217" end_page="218" type="sub_section"> <SectionTitle> 3.1 Back-off Estimates for the Data </SectionTitle> <Paragraph position="0"> The test data consisted of a total of 60 verbs from 12 broad Levin classes, listed in table 4.</Paragraph> <Paragraph position="1"> Two of the examined Levin classes were collapsed together with another similar Levin class, making the total number of test classes 10. The verbs were chosen at random, subject to the constraint that they occurred frequently enough in corpus data4 and, when applicable, represented different sub-classes of each examined Levin class. To reduce the problem of polysemy, we required that the predominant sense of each verb corresponds to the Levin class in question. 
This was ensured by manually verifying that the most frequent sense of a verb in WordNet corresponds to the sense involved in the particular Levin class.</Paragraph> <Paragraph position="2"> To obtain the back-off estimates, we chose 4-5 representative verbs from each verb class and obtained correct SCF distributions for these verbs by manually analysing around 300 occurrences of each verb in the BNC data. We merged the resulting set of SCF distributions to construct the unconditional SCF distribution for the verb class. This approach was taken to minimise the sparse data problem and cover SCF variations within verb classes and due to polysemy.4We required at least 300 occurrences for each verb. This was merely to guarantee accurate enough testing, as we evaluated our results against manual analysis of corpus data (see section 4).</Paragraph> <Paragraph position="3"> [Table 4, test verbs by class: place, lay, drop, pour, load, fill; send, ship, carry, bring, transport; pull, push; give, lend, contribute, donate, offer; provide, supply, acquire, buy; analyse, fish, explore, investigate; agree, communicate, struggle, marry, meet, visit; kill, murder, slaughter, strangle; demolish, destroy, ruin, devastate; arise, emerge, disappear, vanish; arrive, depart, march, move, slide, swing; travel, walk, fly, sail, dance; begin, end, start, terminate, complete] The back-off estimates for motion verbs, for example, were obtained by merging the SCF distributions of the verbs march, move, fly, slide and sail. Each verb used in obtaining the estimates was excluded when testing the verb itself. 
For example, when acquiring subcategorization for the verb fly, estimates were obtained using only the verbs march, move, slide and sail.</Paragraph> </Section> <Section position="2" start_page="218" end_page="218" type="sub_section"> <SectionTitle> 3.2 Framework for SCF Acquisition </SectionTitle> <Paragraph position="0"> Briscoe and Carroll's (1997) verbal acquisition system distinguishes 163 SCFs and returns relative frequencies for each SCF found for a given predicate. The SCFs are a superset of classes found in the ANLT and COMLEX (Grishman et al., 1994) dictionaries. They incorporate information about control of predicative arguments, as well as alternations such as extraposition and particle movement. The system employs a shallow parser to obtain the subcategorization information. Potential SCF entries are filtered before the final SCF lexicon is produced. While Briscoe and Carroll (1997) used a statistical filter based on a binomial hypothesis test, we adopted another method, where the conditional SCF distribution from the system is smoothed before filtering the SCFs, using the different techniques introduced in section 3.3. After smoothing, filtering is performed by applying a threshold to the resulting set of probability estimates.</Paragraph> <Paragraph position="1"> We used training data to find an optimal average threshold for each verb class examined.</Paragraph> <Paragraph position="2"> This filtering method allows us to examine the benefits of smoothing without introducing problems caused by the statistical filter.</Paragraph> </Section> <Section position="3" start_page="218" end_page="219" type="sub_section"> <SectionTitle> 3.3 Smoothing 3.3.1 Add One Smoothing </SectionTitle> <Paragraph position="0"> Add one smoothing5 has the effect of giving some of the probability space to the SCFs unseen in the conditional distribution. 
As it assumes a uniform prior on events, it provides a baseline smoothing method against which the more sophisticated methods can be compared.5See (Manning and Schütze, 1999) for detailed information about the smoothing techniques discussed here.</Paragraph> <Paragraph position="1"> Let c(xn) be the frequency of a SCF given a verb, N the total number of SCF tokens for this verb in the conditional distribution, and C the total number of SCF types. The estimated probability of the SCF is:</Paragraph> <Paragraph position="2"> P(xn) = (c(xn) + 1) / (N + C) (1)</Paragraph> <Paragraph position="3"> In Katz backing-off (Katz, 1987), some of the probability space is given to the SCFs unseen or of low frequency in the conditional distribution. This is done by backing-off to an unconditional distribution. Let p(xn) be the probability of a SCF in the conditional distribution, and pu(xn) its probability in the unconditional distribution, obtained by maximum likelihood estimation. The estimated probability of the SCF is calculated as follows:</Paragraph> <Paragraph position="4"> P(xn) = (1 - d) p(xn) if c(xn) > c1, and P(xn) = a pu(xn) otherwise (2)</Paragraph> <Paragraph position="5"> The cut-off frequency c1 is an empirically defined threshold determining whether to back off or not. When counts are lower than c1 they are held too low to give an accurate estimate, and we back off to the unconditional distribution. In this case, we discount p(xn) a certain amount to reserve some of the probability space for unseen and very low frequency SCFs. The discount (d) is defined empirically, and a is a normalization constant which ensures that the probabilities of the resulting distribution sum to 1.</Paragraph> <Paragraph position="6"> While Katz backing-off consults different estimates depending on their specificity, linear interpolation makes a linear combination of them. Linear interpolation is used here for the simple task of combining a conditional distribution with an unconditional one. 
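As an illustration only (not the authors' implementation), the three smoothing schemes described in this section can be sketched on raw SCF counts as follows; the function names and the cutoff, discount, and weight values are assumptions, since the paper sets these empirically on training data:

```python
# Hypothetical sketch of add-one smoothing, Katz back-off, and linear
# interpolation over SCF counts; toy counts and parameter values are invented.

def add_one(counts, num_types):
    """Add-one: P(x) = (c(x) + 1) / (N + C); an unseen SCF would get 1 / (N + C)."""
    n = sum(counts.values())
    return {x: (c + 1) / (n + num_types) for x, c in counts.items()}

def katz_backoff(counts, uncond, c1=5, d=0.2):
    """Keep discounted ML estimates for counts > c1; back off otherwise."""
    n = sum(counts.values())
    high = {x: c for x, c in counts.items() if c > c1}
    low = [x for x in uncond if x not in high]
    kept = sum((1 - d) * c / n for c in high.values())
    alpha = (1 - kept) / sum(uncond[x] for x in low)  # normalisation constant
    p = {x: (1 - d) * c / n for x, c in high.items()}
    p.update({x: alpha * uncond[x] for x in low})
    return p

def interpolate(cond, uncond, lam=0.7):
    """Linear combination of the conditional and unconditional distributions."""
    return {x: lam * cond.get(x, 0.0) + (1 - lam) * uncond[x] for x in uncond}

fly_counts = {"NP": 40, "PP": 8, "INTRANS": 2}                       # toy conditional counts
motion_est = {"NP": 0.5, "PP": 0.3, "INTRANS": 0.15, "NP_PP": 0.05}  # toy class back-off estimates
print(katz_backoff(fly_counts, motion_est))
```

In each scheme the resulting probabilities sum to 1 over the SCF inventory considered, which is what allows a single probability threshold to be applied after smoothing.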
The estimated probability of the SCF is given by</Paragraph> <Paragraph position="7"> P(xn) = λ1 p(xn) + λ2 pu(xn) (3)</Paragraph> <Paragraph position="8"> where the λi denote weights for the different distributions (obtained by optimising the smoothing performance on the training data for all xn) and sum to 1.</Paragraph> </Section> </Section> <Section position="6" start_page="219" end_page="219" type="metho"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="219" end_page="219" type="sub_section"> <SectionTitle> 4.1 Method </SectionTitle> <Paragraph position="0"> To evaluate the approach, we took a sample of 20 million words of the BNC and extracted all sentences containing an occurrence of one of the 60 test verbs, with on average 3000 citations of each. The sentences containing these verbs were processed by the SCF acquisition system, and the smoothing methods were applied before filtering. We also obtained results for a baseline without any smoothing.</Paragraph> <Paragraph position="1"> The results were evaluated against a manual analysis of the corpus data. This was obtained by analysing up to a maximum of 300 occurrences for each test verb in the BNC or LOB (Garside et al., 1987), Susanne and SEC (Taylor and Knowles, 1988) corpora. We calculated type precision (percentage of SCFs acquired which were also exemplified in the manual analysis) and recall (percentage of the SCFs exemplified in the manual analysis which were also acquired automatically), and combined them into a single measure of overall performance using the F measure (Manning and Schütze, 1999).</Paragraph> <Paragraph position="2"> F = 2 · precision · recall / (precision + recall) (4) We estimated the accuracy with which the system ranks true positive classes against the correct ranking. This was computed by calculating the percentage of pairs of SCFs at positions (n, m) such that n < m in the system ranking that occur in the same order in the ranking from the manual analysis. 
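The two evaluation measures just described can be sketched as follows; this is an illustrative reconstruction, not the paper's code, and the SCF labels and rankings are invented toy data:

```python
# Hypothetical sketch: type precision/recall/F over acquired vs. gold SCF
# sets, and pairwise ranking accuracy of the system's SCF ranking against
# the ranking from the manual (gold) analysis.
from itertools import combinations

def prf(acquired, gold):
    """Type precision, recall, and F measure over two sets of SCFs."""
    tp = len(acquired & gold)
    precision = tp / len(acquired)
    recall = tp / len(gold)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

def ranking_accuracy(system_rank, gold_rank):
    """Fraction of true-positive SCF pairs ordered the same way in both rankings."""
    common = [s for s in system_rank if s in gold_rank]
    pairs = list(combinations(common, 2))
    agree = sum(1 for a, b in pairs
                if (system_rank.index(a) < system_rank.index(b))
                == (gold_rank.index(a) < gold_rank.index(b)))
    return agree / len(pairs)

acquired = {"NP", "PP", "NP_PP"}
gold = {"NP", "PP", "INTRANS"}
p, r, f = prf(acquired, gold)            # p = r = f = 2/3 on this toy data
sys_rank = ["NP", "PP", "NP_PP"]
gold_rank = ["NP", "INTRANS", "PP"]
acc = ranking_accuracy(sys_rank, gold_rank)
```

Only SCFs present in both rankings contribute pairs, which matches restricting the measure to true positives.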
This gives us an estimate of the accuracy of the relative frequencies of SCFs output by the system. In addition to the system results, we also calculated KL and RC between the acquired unfiltered SCF distributions and the distributions obtained from the manual analysis.</Paragraph> </Section> </Section> </Paper>