<?xml version="1.0" standalone="yes"?> <Paper uid="E95-1016"> <Title>On Learning more Appropriate Selectional Restrictions</Title> <Section position="2" start_page="0" end_page="113" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In recent years there has been broad agreement in the NLP research community on the importance of having extensive coverage of selectional restrictions (SRs) tuned to the domain at hand. SRs can be seen as semantic type constraints that a word sense imposes on the words with which it combines in the process of semantic interpretation. SRs have several applications in NLP: they may help a parser with Word Sense Selection (WSS, as in (Hirst, 1987)), with preferring certain structures among several grammatical ones (Whittemore et al., 1990), and with deciding the semantic role played by a syntactic complement (Basili et al., 1992). Lexicography is also interested in the acquisition of SRs (both in the defining-in-context approach and in work on lexical semantics (Levin, 1992)).</Paragraph> <Paragraph position="1"> The aim of our work is to explore the feasibility of using a statistical method for extracting SRs from on-line corpora. Resnik (1992) developed a method for automatically extracting class-based SRs from on-line corpora. Ribas (1994a)* performed some experiments using this basic technique and identified some of its limitations from the corresponding results.</Paragraph> * This research has been carried out in the framework of the Acquilex-II Esprit Project (7315), and has been supported by a grant of the Departament d'Ensenyament, Generalitat de Catalunya, 91-DOGC-1491. <Paragraph position="2"> In this paper we describe some substantial modifications to the basic technique and report the corresponding experimental evaluation.</Paragraph> <Paragraph position="3"> The outline of the paper is as follows: in section 2 we summarize the basic methodology used in (Ribas, 1994a) and analyze its limitations; in section 3 we explore some alternative statistical measures for ranking the hypothesized SRs; in section 4 we propose some evaluation measures for the SR-learning problem and use them to test the experimental results obtained with the different techniques; finally, in section 5 we draw the final conclusions and establish future lines of research.</Paragraph> <Section position="1" start_page="0" end_page="112" type="sub_section"> <SectionTitle> 2.1 Description </SectionTitle> <Paragraph position="0"> The functionality of the technique can be summarized as follows: Input The training set, i.e. a list of complement co-occurrence triples, (verb-lemma, syntactic-relationship, noun-lemma), extracted from the corpus.</Paragraph> <Paragraph position="1"> Previous knowledge used A semantic hierarchy (WordNet), in which words are clustered in semantic classes and semantic classes are organized hierarchically. Polysemous words are represented as instances of different classes.</Paragraph> <Paragraph position="2"> Output A set of syntactic SRs, (verb-lemma, syntactic-relationship, semantic-class, weight). The final SRs must be mutually disjoint. SRs are weighted according to the statistical evidence found in the corpus.</Paragraph> <Paragraph position="3"> Learning process Three stages: 1. Creation of the space of candidate classes (a minimal sketch of this stage is given below). 2. Evaluation of the appropriateness of the candidates by means of a statistical measure. 3. Selection of the most appropriate subset of the candidate space to convey the SRs.</Paragraph> [Table 1: SRs acquired for the subject position of the verb seek, giving for each induced class (e.g. <suit>, <group>, <proper_name>, <administration>, <government>, <leadership>) its Assoc score, a manual diagnosis of its appropriateness, and example nouns such as administration, agency, bank, advocate, buyer, carrier, client, concern, leadership, provision, science.]
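The following is a minimal sketch, in Python, of how stage 1 might be implemented; the triple format, the toy hypernym table, and all function names are our own illustrative assumptions rather than the authors' code. Each noun occurrence proposes every WordNet class containing one of its senses, together with the hypernyms of those classes, as candidates for the (verb, syntactic-relationship) pair.

```python
from collections import defaultdict

# Toy stand-in for the WordNet noun hierarchy: class -> parent class.
# (Illustrative assumption; the real technique walks WordNet itself.)
HYPERNYM = {
    "lawsuit": "legal_action", "legal_action": "action",
    "suit_of_clothes": "garment", "garment": "artifact",
    "administration": "group", "government": "group",
}

# Toy sense inventory: noun-lemma -> one WordNet class per sense.
SENSES = {
    "suit": ["lawsuit", "suit_of_clothes"],
    "administration": ["administration"],
}

def ancestors(cls):
    """Yield cls and all of its hypernyms up to the hierarchy root."""
    while cls is not None:
        yield cls
        cls = HYPERNYM.get(cls)

def candidate_space(triples):
    """Stage 1: map each (verb, syntactic-relationship) pair to the
    classes proposed by its observed nouns, with occurrence counts."""
    candidates = defaultdict(lambda: defaultdict(int))
    for verb, rel, noun in triples:
        for sense_class in SENSES.get(noun, []):
            for cls in ancestors(sense_class):
                candidates[(verb, rel)][cls] += 1
    return candidates

triples = [("seek", "subject", "suit"), ("seek", "subject", "administration")]
print(dict(candidate_space(triples)[("seek", "subject")]))
```

Note how a single occurrence of the polysemous noun suit already injects classes from every one of its senses into the candidate space; this is precisely the noise the later stages must cope with.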
<Paragraph position="4"> The appropriateness of a class for expressing SRs (stage 2) is quantified from the strength of co-occurrence of verbs and classes of nouns in the corpus (Resnik, 1992). Given the verb v, the syntactic-relationship s and the candidate class c, the Association Score, Assoc, between v and c in s is defined as: $$\mathrm{Assoc}(v,s,c) = p(c \mid v,s)\, I(v;c \mid s) = p(c \mid v,s) \log \frac{p(c \mid v,s)}{p(c \mid s)}$$ The two terms of Assoc try to capture different properties: 1. The mutual information ratio, I(v;c|s), measures the strength of the statistical association between the given verb v and the candidate class c in the given syntactic position s. It compares the prior distribution, p(c|s), with the posterior distribution, p(c|v,s). 2. p(c|v,s) scales up the strength of the association by the frequency of the relationship.</Paragraph> <Paragraph position="5"> Probabilities are estimated by Maximum Likelihood Estimation (MLE), i.e. by counting the relative frequency of the considered events in the corpus.² However, it is not obvious how to calculate class frequencies when the training corpus is not semantically tagged, as is the case here. Nevertheless, we take a simplistic approach and calculate them in the following manner: $$\hat{p}(v,s,c) = \frac{1}{W} \sum_{n \in c} freq(v,s,n) \qquad (1)$$ where W is a constant factor used to normalize the probabilities:³ $$W = \sum_{v \in V} \sum_{s \in S} \sum_{n \in N} freq(v,s,n)\, |senses(n)| \qquad (2)$$ </Paragraph> ² However, the theoretical soundness of MLE on class-based distributions is dubious; see (Resnik, 1993). ³ Resnik (1992) and Ribas (1994a) used equation (1) without introducing normalization; therefore, the estimated function did not satisfy the probability axioms. Nevertheless, their results should be equivalent (for our purposes) to those obtained with normalization, because normalization does not affect the relative ordering of Assoc among rival candidate classes for the same (v, s). <Paragraph position="6"> When creating the space of candidate classes (learning process, stage 1), we use a thresholding technique to ignore as much as possible of the noise introduced in the training set. Specifically, we consider only those classes whose number of occurrences is higher than the threshold. The selection of the most appropriate classes (stage 3) is based on a global search through the candidates, in such a way that the final classes are mutually disjoint (not related by hyperonymy). Both stages are sketched below.</Paragraph> </Section>
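As a rough illustration of stage 2, the sketch below estimates the probabilities in the spirit of equations (1) and (2) and ranks candidate classes by Assoc. It is a minimal sketch under our own assumptions (a toy sense inventory with hypernym expansion already applied, natural logarithm, no smoothing), not the authors' implementation.

```python
import math
from collections import defaultdict

# Toy sense inventory (an illustrative assumption, not the paper's data):
# noun -> one WordNet class per sense; hypernym expansion (stage 1) is
# assumed to have been done already.
SENSES = {
    "suit": ["legal_action", "garment"],
    "client": ["person"],
    "buyer": ["person"],
}

def assoc_scores(triples):
    """Stage 2: MLE estimates in the spirit of equations (1)-(2) --
    every noun occurrence contributes once per sense, and the totals
    are normalized accordingly -- followed by Assoc(v,s,c)."""
    pair_class = defaultdict(float)  # freq(v,s,c)
    rel_class = defaultdict(float)   # freq(s,c)
    pair_total = defaultdict(float)  # sense-weighted count for (v,s)
    rel_total = defaultdict(float)   # sense-weighted count for s
    for v, s, n in triples:
        senses = SENSES.get(n, [])
        for c in senses:
            pair_class[(v, s, c)] += 1.0
            rel_class[(s, c)] += 1.0
        pair_total[(v, s)] += len(senses)
        rel_total[s] += len(senses)
    scores = {}
    for (v, s, c), f in pair_class.items():
        p_post = f / pair_total[(v, s)]             # p(c|v,s)
        p_prior = rel_class[(s, c)] / rel_total[s]  # p(c|s)
        scores[(v, s, c)] = p_post * math.log(p_post / p_prior)
    return scores

triples = [("seek", "subject", "suit"), ("seek", "subject", "client"),
           ("seek", "subject", "buyer"), ("eat", "subject", "client")]
for key, a in sorted(assoc_scores(triples).items(), key=lambda kv: -kv[1]):
    print(key, round(a, 3))
```

Even on this toy input one can see both Assoc terms at work: classes over-represented for a verb relative to their prior in that syntactic position receive positive scores, and the p(c|v,s) factor favors frequently observed classes.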
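Stage 3 then prunes this ranked candidate space. Below is a minimal sketch under our own assumptions: a greedy best-Assoc-first acceptance rather than whatever global search the authors actually used, a toy hypernym relation, and an illustrative threshold of 2 occurrences. Classes at or below the occurrence threshold are discarded, and a remaining candidate is accepted only if it is not related by hyperonymy to an already selected class.

```python
def is_related(c1, c2, hypernym):
    """True if c1 and c2 lie on the same hypernymy chain."""
    def chain(c):
        seen = set()
        while c is not None:
            seen.add(c)
            c = hypernym.get(c)
        return seen
    return c1 in chain(c2) or c2 in chain(c1)

def select_srs(candidates, hypernym, threshold=2):
    """Stage 3 (greedy variant): keep classes above the occurrence
    threshold, then accept them in decreasing Assoc order provided
    they are mutually disjoint (not related by hyperonymy)."""
    # candidates: list of (class, occurrences, assoc)
    viable = [c for c in candidates if c[1] > threshold]
    viable.sort(key=lambda c: -c[2])
    selected = []
    for cls, _, assoc in viable:
        if all(not is_related(cls, other, hypernym) for other, _ in selected):
            selected.append((cls, assoc))
    return selected

HYPERNYM = {"government": "group", "person": "entity"}
cands = [("group", 8, 0.15), ("government", 5, 0.40), ("person", 9, 0.35)]
print(select_srs(cands, HYPERNYM, threshold=2))  # government and person; group is subsumed
```

The disjointness check is what guarantees the output SRs partition the evidence: once <government> is accepted, its hypernym <group> can no longer be selected for the same (verb, syntactic-relationship) pair.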
<Section position="2" start_page="112" end_page="113" type="sub_section"> <SectionTitle> 2.2 Evaluation </SectionTitle> <Paragraph position="0"> Ribas (1994a) reported experimental results obtained from the application of the above technique to learning SRs. He performed an evaluation of the SRs obtained from a training set of 870,000 words of the Wall Street Journal. In this section we summarize the results and conclusions reached in that paper.</Paragraph> <Paragraph position="1"> For instance, table 1 shows the SRs acquired for the subject position of the verb seek. Type indicates a manual diagnosis of the appropriateness of the class (Ok: correct; ⇑Abs: over-generalization; Senses: due to erroneous senses). Assoc corresponds to the association score (higher values appear first). Most of the induced classes are due to incorrect senses. Thus, although suit was used in the WSJ articles only in the sense of <legal_action>, the algorithm not only considered the other senses as well (<suit, suing>, <suit_of_clothes>, <suit>), but the Assoc score ranked them higher than the appropriate sense.</Paragraph> <Paragraph position="2"> We can also notice that the ⇑Abs class, <group>, seems too general for the example nouns, while one of its daughters, <people>, seems to fit the data much better.</Paragraph> <Paragraph position="3"> Analyzing the results obtained from different experimental evaluation methods, Ribas (1994a) drew some conclusions: a. The technique achieves good coverage.</Paragraph> <Paragraph position="4"> b. Most of the classes acquired result from the accumulation of incorrect senses.</Paragraph> <Paragraph position="5"> c. No clear correlation between Assoc and the manual diagnosis is found.</Paragraph> <Paragraph position="6"> d. A slight tendency to over-generalization exists, due to incorrect senses.</Paragraph> <Paragraph position="7"> Although the performance of the presented technique seems quite good, we think that some of the detected flaws could be addressed. Noise due to the polysemy of the nouns involved seems to be the main obstacle to the practicality of the technique: it makes the association score prefer incorrect classes and jump to over-generalizations. In this paper we are interested in exploring various ways of making the technique more robust to noise, namely (a) experimenting with variations of the association score, and (b) experimenting with thresholding.</Paragraph> </Section> </Section> </Paper>