File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/c92-2099_metho.xml
Size: 11,644 bytes
Last Modified: 2025-10-06 14:13:02
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-2099"> <Title>ACQUISITION OF SELECTIONAL PATTERNS</Title>
<Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> ACQUISITION OF SELECTIONAL PATTERNS RALPH GRISHMAN and JOHN STERLING </SectionTitle> <Paragraph position="0"/> </Section>
<Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 The Problem </SectionTitle>
<Paragraph position="0"> For most natural language analysis systems, one of the major hurdles in porting the system to a new domain is the development of an appropriate set of semantic patterns. Such patterns are typically needed to guide syntactic analysis (as selectional constraints) and to control the translation into a predicate-argument representation.</Paragraph>
<Paragraph position="1"> As systems are ported to more complex domains, the set of patterns grows and the task of accumulating them manually becomes more formidable.</Paragraph>
<Paragraph position="2"> There has therefore been increasing interest in acquiring such patterns automatically from a sample of text in the domain, through an analysis of word co-occurrence patterns either in raw text (word sequences) or in parsed text. We briefly review some of this work later in the article. We have been specifically concerned about the practicality of using such techniques in place of manual encoding to develop the selectional patterns for new domains. In the experiments reported here, we have therefore been particularly concerned with the evaluation of our automatically generated patterns, in terms of their completeness and accuracy and in terms of their efficacy in performing selection during parsing.</Paragraph> </Section>
<Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Patterns and Word Classes </SectionTitle>
<Paragraph position="0"> In principle, the semantic patterns could be stated in terms of individual words - this verb can meaningfully occur with this subject, etc. In practice, however, this would produce an unmanageable number of patterns for even a small domain. We therefore need to define semantic word classes for the domain and state our patterns in terms of these classes.</Paragraph>
<Paragraph position="1"> Ideally, then, a discovery procedure for semantic patterns would acquire both the word classes and the patterns from an analysis of the word co-occurrence patterns. In order to simplify the task, however, while we are exploring different strategies, we have divided it into separate tasks: that of acquiring word classes and that of acquiring semantic patterns (given a set of word classes). We have previously described [1] some experiments in which the principal word classes for a sublanguage were obtained through the clustering of words based on the contexts in which they occurred, and we expect to renew such experiments using the larger corpora now available.</Paragraph>
<Paragraph position="2"> However, the experiments we report below are limited to the acquisition of semantic patterns given a set of manually prepared word classes.</Paragraph> </Section>
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Pattern Acquisition </SectionTitle>
<Paragraph position="0"> The basic mechanism of pattern acquisition is straightforward. A sample of text in a new domain is parsed using a broad-coverage grammar (but without any semantic constraints). The resulting parse trees are then transformed into a regularized syntactic structure (similar to the f-structure of Lexical-Functional Grammar). This regularization in particular reduces the different clausal forms (active, passive, questions, extraposed forms, relative clauses, reduced relatives, etc.) to a uniform structure with the 'logical' subject and object explicitly marked. For example, the sentence "Fred ate fresh cheese from France." would produce the regularized syntactic structure (s eat (subject (np Fred)) (object (np cheese (a-pos fresh) (from (np France))))). We then extract from this regularized structure a series of triples of the form head syntactic-function value, where, if the value is another NP or S, only the head is recorded. For example, for the above sentence we would get the triples: eat subject Fred; eat object cheese; cheese a-pos fresh; cheese from France.</Paragraph>
<Paragraph position="1"> Finally, we generalize these triples by replacing words by word classes. We had previously prepared, by a purely manual analysis of the corpus, a hierarchy of word classes and a set of semantic patterns for the corpus we were using. From this hierarchy we identified the classes which were most frequently referred to in the manually prepared patterns. The generalization process replaces a word by the most specific class to which it belongs (since we have a hierarchy with nested classes, a word will typically belong to several classes). As we explain in the experiment section below, we made some runs generalizing just the value and others generalizing both the head and the value.</Paragraph>
<Paragraph position="2"> As we process the corpus, we keep a count of the frequency of each head-function-value triple. In addition, we keep separate counts of the number of times each word appears as a head, and the number of times each head-function pair appears (independent of value).</Paragraph> </Section>
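As a concrete illustration of the extract-generalize-count pipeline just described, the following minimal sketch (in Python) reproduces the triples of the example above. The nested-list encoding of the regularized structure, the toy word-class hierarchy, and the function names are assumptions made for the illustration, not the authors' implementation.

    # Illustrative sketch only (assumed representations, not the authors' code).
    from collections import defaultdict

    # Regularized structure for "Fred ate fresh cheese from France.",
    # written as nested lists mirroring the bracketing shown above.
    SENTENCE = ['s', 'eat',
                ['subject', ['np', 'Fred']],
                ['object', ['np', 'cheese',
                            ['a-pos', 'fresh'],
                            ['from', ['np', 'France']]]]]

    # A toy fragment of a manually prepared word-class hierarchy,
    # listed from most specific to most general class for each word.
    WORD_CLASSES = {'Fred': ['person', 'entity'],
                    'cheese': ['food', 'entity'],
                    'France': ['country', 'entity']}

    def head_of(node):
        """The head word of an (s ...) or (np ...) node."""
        return node[1]

    def extract_triples(node, triples=None):
        """Collect head / syntactic-function / value triples; when the value
        is itself an NP or S, only its head is recorded."""
        if triples is None:
            triples = []
        head = head_of(node)
        for function, value in node[2:]:
            if isinstance(value, list):
                triples.append((head, function, head_of(value)))
                extract_triples(value, triples)
            else:
                triples.append((head, function, value))
        return triples

    def generalize(word):
        """Replace a word by the most specific class to which it belongs."""
        return WORD_CLASSES.get(word, [word])[0]

    triple_count = defaultdict(float)          # head-function-value counts
    head_count = defaultdict(float)            # counts of each head
    head_function_count = defaultdict(float)   # counts of each head-function pair

    for head, function, value in extract_triples(SENTENCE):
        triple = (head, function, generalize(value))   # here: generalize the value only
        triple_count[triple] += 1
        head_count[head] += 1
        head_function_count[(head, function)] += 1

    print(sorted(triple_count))
    # [('cheese', 'a-pos', 'fresh'), ('cheese', 'from', 'country'),
    #  ('eat', 'object', 'food'), ('eat', 'subject', 'person')]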
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Coping with Multiple Parses </SectionTitle>
<Paragraph position="0"> The procedure described above is sufficient if we are able to obtain the correct parse for each sentence. However, if we are porting to a new domain and have no semantic constraints, we must rely entirely upon syntactic constraints and so will be confronted with a large number of incorrect parses for each sentence, along with (hopefully) the correct one. We have experimented with several approaches to dealing with this problem (the weighting schemes of approaches 1 and 3 are sketched after this list): 1. If a sentence has N parses, we can generate triples from all the parses and then include each triple with a weight of 1/N.</Paragraph>
<Paragraph position="1"> 2. We can generate a stochastic grammar through unsupervised training on a portion of the corpus [2]. We can then parse the corpus with this stochastic grammar and take only the most probable parse for each sentence. For sentences which still generated N > 1 equally probable parses, we would use a 1/N weight as before.</Paragraph>
<Paragraph position="2"> 3. In place of a 1/N weighting, we can refine the weights for the alternative parse trees using an iterative procedure analogous to the inside-outside algorithm [3]. We begin by generating all parses, as in approach 1. Then, based on the counts obtained initially (using 1/N weighting), we can compute the probabilities of the various triples and, from these, the probabilities of the alternative parse trees. We can then repeat the process, recomputing the counts with weightings based on these probabilities.</Paragraph>
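The weighting schemes of approaches 1 and 3 can be sketched as follows. The representations (each sentence as a list of alternative parses, each parse already reduced to its triples) and the rule for turning triple counts into parse weights (a renormalized product of the triples' counts) are assumptions made for the illustration; the exact probabilistic formulation is not reproduced here.

    # Illustrative sketch only; representations and the parse-scoring rule
    # are assumptions, not the authors' implementation.
    from collections import defaultdict

    def count_triples(corpus, parse_weights=None):
        """corpus: list of sentences; each sentence is a list of alternative
        parses; each parse is a list of (head, function, value) triples.
        With no weights given, each of a sentence's N parses contributes 1/N."""
        counts = defaultdict(float)
        for s, parses in enumerate(corpus):
            for p, triples in enumerate(parses):
                w = parse_weights[s][p] if parse_weights else 1.0 / len(parses)
                for t in triples:
                    counts[t] += w
        return counts

    def reweight_parses(corpus, counts):
        """Assign each alternative parse a weight proportional to the product
        of its triples' current counts (a small floor stands in for unseen
        triples), renormalized within the sentence."""
        weights = []
        for parses in corpus:
            scores = []
            for triples in parses:
                score = 1.0
                for t in triples:
                    score *= max(counts[t], 1e-6)
                scores.append(score)
            total = sum(scores)
            weights.append([sc / total for sc in scores])
        return weights

    def refine(corpus, iterations=1):
        """One or more rounds of the iterative refinement of approach 3,
        starting from the uniform 1/N counts of approach 1."""
        counts = count_triples(corpus)
        for _ in range(iterations):
            counts = count_triples(corpus, reweight_parses(corpus, counts))
        return counts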
<Paragraph position="3"> All of these approaches rely on the expectation that correct patterns arising from correct parses will occur repeatedly, while the distribution of incorrect patterns from incorrect parses will be more scattered, and so, over a sufficiently large corpus, we can distinguish correct from incorrect patterns on the basis of frequency.</Paragraph> </Section>
<Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Evaluation Methods </SectionTitle>
<Paragraph position="0"> To gather patterns, we analyzed a series of articles on terrorism which were obtained from the Foreign Broadcast Information Service and used as the development corpus for the Third Message Understanding Conference (held in San Diego, CA, May 1991) [4]. For pattern collection, we used 1000 such articles with a total of 14,196 sentences and 330,769 words. Not all sentences parsed, both because of limitations in our grammar and because we impose a limit on the search which the parser can perform for each sentence. Within these limits, we were able to parse a total of 7,455 sentences. (For these runs we disabled several heuristics in our system which increase the number of sentences that can be parsed, at some cost in the average quality of parses; hence the relatively low percentage of sentences which obtained parses.)</Paragraph>
<Paragraph position="1"> The most clearly definable function of the triples we collect is to act as a selectional constraint: to differentiate between meaningful and meaningless triples in new text, and thus identify the correct analysis.</Paragraph>
<Paragraph position="2"> We used two methods to evaluate the effectiveness of the triples we generated. The first method involved a comparison with manually classified triples. We took 10 articles (not in the training corpus), generated all parses, and produced the triples from each parse. These triples were stated in terms of words, and were not generalized to word classes. We classified each triple as semantically valid or invalid (a triple was counted as valid if we believed that this pair of words could meaningfully occur in this relationship, even if this was not the intended relationship in this particular text). This produced a test set containing a total of 1169 distinct triples, of which 716 were valid and 453 were invalid.</Paragraph>
<Paragraph position="3"> We then established a threshold T for the weighted triple counts in our training set, and defined: v+ = the number of triples in the test set which were classified as valid and which appeared in the training set with count > T; v- = the number of triples in the test set which were classified as valid and which appeared in the training set with count < T; i+ = the number of triples in the test set which were classified as invalid and which appeared in the training set with count > T; i- = the number of triples in the test set which were classified as invalid and which appeared in the training set with count < T. From these counts we then defined recall, precision, and error rate.</Paragraph>
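The sketch below makes the threshold test explicit. The formulas recall = v+/(v+ + v-), precision = v+/(v+ + i+), and error rate = i+/(i+ + i-) are assumed standard definitions, chosen to be consistent with the observation below that a filter accepting every triple in this test set would score 61% precision; they are not quoted from the paper.

    # Minimal sketch of the threshold evaluation.  The formulas
    #   recall     = v+ / (v+ + v-)
    #   precision  = v+ / (v+ + i+)
    #   error rate = i+ / (i+ + i-)
    # are assumptions, consistent with the 61% precision figure quoted below.

    def evaluate(test_triples, training_counts, threshold):
        """test_triples: dict mapping a (head, function, value) triple to True
        if it was judged semantically valid, False otherwise.
        training_counts: weighted triple counts from the training corpus."""
        v_plus = v_minus = i_plus = i_minus = 0
        for triple, valid in test_triples.items():
            passes = training_counts.get(triple, 0.0) > threshold
            if valid and passes:
                v_plus += 1
            elif valid:
                v_minus += 1
            elif passes:
                i_plus += 1
            else:
                i_minus += 1
        recall = v_plus / (v_plus + v_minus)
        precision = v_plus / (v_plus + i_plus) if v_plus + i_plus else 1.0
        error_rate = i_plus / (i_plus + i_minus)
        return recall, precision, error_rate

    # Sweeping the threshold traces out the recall vs. error-rate curves
    # discussed below, e.g.:
    #   for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    #       print(t, evaluate(test_set, counts, t))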
<Paragraph position="4"> By varying the threshold, we can plot graphs of recall vs. precision or recall vs. error rate. These plots can then be compared among different strategies for collecting triples and for generalizing triples. The precision figures are somewhat misleading because of the relatively small number of invalid triples in the test set: since only 39% of the triples are invalid, a filter which accepted all the triples in the test set would still be accounted as having 61% precision. We have therefore used the error rate in the figures below (plotting recall against error rate).</Paragraph>
<Paragraph position="5"> The second evaluation method involves the use of the triples in selection and a comparison of the parses produced against a set of known correct parses. In this case the known correct parses were prepared manually by the University of Pennsylvania as part of their "Tree Bank" project. For this evaluation, we used a set of 317 sentences, again distinct from the training set. In comparing the parser output against the standard trees, we measured the degree to which the tree structures coincide, stated as recall, precision, and number of crossings. These measures have been defined in earlier papers [5,6,7].</Paragraph> </Section> </Paper>