<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1076">
  <Title>Automatic Acquisition of Adjectival Subcategorization from Corpora</Title>
  <Section position="3" start_page="614" end_page="615" type="metho">
    <SectionTitle>
2 Adjectival Subcategorization
</SectionTitle>
    <Paragraph position="0"> Although the number of SCF types for adjectives is smaller than the number reported for verbs (e.g. (Briscoe and Carroll, 1997)), adjectives nevertheless exhibit rich syntactic behaviour. Besides the common attributive and predicative positions there are at least six further positions in which adjectives commonly occur (see figure 1). Adjectives in predicative position can be further classified according to the nature of the arguments with which they combine -- finite and non-finite clauses and noun phrases, phrases with and without complementisers, etc. -- and whether they occur as subject or object. Additional distinctions can be made concerning the mood of clausal complements (mandative, interrogative, etc.), preferences for particular prepositions and whether the subject is extraposed. [Figure 1: Attributive &quot;The young man&quot;; Predicative &quot;He is young&quot;; Postpositive &quot;Anyone [who is] young can do it&quot;; Predeterminer &quot;such a young man&quot;, &quot;so young a man&quot;; Fused modifier-head &quot;the younger of them&quot;, &quot;the young&quot;; Predicative adjunct &quot;he died young&quot;; Supplementive clause &quot;Young, he was plain in appearance&quot;; Contingent clause &quot;When young, he was lonely&quot;.] Even ignoring preposition preference, there are more than 30 distinguishable adjectival SCFs. Some fairly extensive frame sets can be found in large syntax dictionaries, such as COMLEX (31 SCFs) (Wolff et al., 1998) and ANLT (24 SCFs) (Boguraev et al., 1987). While such resources are generally accurate, they are disappointingly incomplete: none of the proposed frame sets in the well-known resources subsumes the others, the coverage of SCF types for individual adjectives is low, and (accurate) information on the relative frequency of SCFs for each adjective is absent.</Paragraph>
    <Paragraph position="1"> The inadequacy of manually-created dictionaries and the difficulty of adequately enhancing and maintaining the information by hand was a central motivation for early research into automatic subcategorization acquisition. The focus heretofore has remained firmly on verb subcategorization, but this is not sufficient, as countless examples show. Knowledge of adjectival subcategorization can yield further improvements in tagging (e.g. distinguishing between &quot;to&quot; as an infinitive marker and as a true preposition), parsing (e.g. distinguishing between PP-arguments and adjuncts), and semantic analysis.</Paragraph>
    <Paragraph position="2"> For example, if John is both easy and eager to please then we know that he is the recipient of pleasure in the first instance and desirous of providing it in the second, but a computational system cannot determine this without knowledge of the subcategorization of the two adjectives. Likewise, a natural language generation system can legitimately apply the extraposition transformation to the first case, but not to the second: It is &quot;easy to please John&quot;, but not &quot;eager&quot; to do so, at least if &quot;it&quot; be expletive. Similar examples abound.</Paragraph>
    <Paragraph position="3"> Many of the difficulties described in the literature on acquiring verb subcategorization also arise in the adjectival case. The most apparent is data sparsity: in the 100M-word British National Corpus (BNC) (Burnard, 1995), the RASP tools find 124,120 distinct adjectives, of which 70,246 occur only once, 106,464 fewer than ten times, and 119,337 fewer than a hundred times. There are fewer than 1,000 adjectives in the corpus which have more than 1,000 occurrences. Both adjective and SCF frequencies have Zipfian distributions; consequently, even the largest corpora may contain only single instances of a particular adjective-SCF combination, which is generally insufficient for classification.</Paragraph>
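The sparsity profile described above can be reproduced in miniature. A minimal sketch, assuming a toy corpus with a heavily skewed frequency distribution (the function name and thresholds are illustrative; the counts are not the BNC figures):

```python
from collections import Counter

def sparsity_profile(tokens):
    """Count distinct types, hapaxes (frequency 1) and types seen 1-9 times."""
    freqs = Counter(tokens)
    n_types = len(freqs)
    hapaxes = sum(1 for c in freqs.values() if c == 1)
    under_10 = sum(1 for c in freqs.values() if c in range(1, 10))
    return n_types, hapaxes, under_10

# Toy corpus: a few frequent types and a long tail, as in the BNC adjective counts.
tokens = ["young"] * 50 + ["easy"] * 8 + ["doubtful"] * 2 + ["unspecified"]
print(sparsity_profile(tokens))  # (4, 1, 3)
```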
  </Section>
  <Section position="4" start_page="615" end_page="617" type="metho">
    <SectionTitle>
3 Description of the System
</SectionTitle>
    <Paragraph position="0"> Besides focusing on adjectives, our approach to SCF acquisition differs from earlier work in a number of ways. A common strategy in existing systems (e.g. (Briscoe and Carroll, 1997)) is to extract SCFs from parse trees, introducing an unnecessary dependence on the details of a particular parser. In our approach the patterns are extracted from GRs -- representations of head-complement relations which are designed to be largely parser-independent -- making the techniques more widely applicable and allowing classification to operate at a higher level.</Paragraph>
    <Paragraph position="1"> Further, most existing systems work by classifying corpus occurrences into individual, mutually independent SCFs. We adopt instead a hierarchical approach, viewing frames that share features as descendants of a common parent frame. The benefits are severalfold: specifying each feature only once makes the system both more efficient and easier to understand and maintain, and the multiple inheritance hierarchy reflects the hierarchy of lexical types found in modern grammars where relationships between similar frames are represented explicitly1.</Paragraph>
    <Paragraph position="2"> Our acquisition process consists of two main steps: 1) extracting GRs from corpus data, and 2) feeding the GRs as input to the classifier, which incrementally matches parts of the GR sets to decide which branches of a decision-tree to follow. The leaves of the tree correspond to SCFs. (Footnote 1: compare the cogent argument for an inheritance-based lexicon in (Flickinger and Nerbonne, 1992), much of which can be applied unchanged to the taxonomy of SCFs.)</Paragraph>
    <Paragraph position="3"> [Figure 2: the GR subsumption hierarchy, with dependent at the root subsuming mod, arg, aux, conj and subj-or-dobj; ncmod, xmod, cmod and detmod under mod; ncsubj, xsubj and csubj under subj; obj and clausal under comp; dobj, obj2 and iobj under obj; xcomp and ccomp under clausal.] The details of these two steps are provided in the subsequent sections, respectively2.</Paragraph>
    <Section position="1" start_page="615" end_page="616" type="sub_section">
      <SectionTitle>
3.1 Obtaining Grammatical Relations
</SectionTitle>
      <Paragraph position="0"> Attempts to acquire verb subcategorization have benefited from increasingly sophisticated parsers.</Paragraph>
      <Paragraph position="1"> We have made use of the RASP toolkit (Briscoe and Carroll, 2002) -- a modular statistical parsing system which includes a tokenizer, tagger, lemmatiser, and a wide-coverage unification-based tag-sequence parser. The parser has several modes of operation; we invoked it in a mode in which GRs with associated probabilities are emitted even when a complete analysis of the sentence could not be found. In this mode there is wide coverage (over 98% of the BNC receives at least a partial analysis (Carroll and Briscoe, 2002)) which is useful in view of the infrequent occurrence of some of the SCFs, although combining the results of competing parses may in some cases result in an inconsistent or misleading combination of GRs.</Paragraph>
      <Paragraph position="2"> The parser uses a scheme of GRs between lemmatised lexical heads (Carroll et al., 1998a; Briscoe et al., 2002). The relations are organized as a multiple-inheritance subsumption hierarchy where each subrelation extends the meaning, and perhaps the argument structure, of its parents (figure 2). For descriptions and examples of each relation, see (Carroll et al., 1998a).</Paragraph>
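The subsumption relation over GRs can be queried with a simple ancestor test. A minimal sketch, assuming a partial, simplified encoding of the hierarchy in figure 2 (the table below is not the exact RASP scheme):

```python
# Hypothetical, partial encoding of the GR hierarchy: each relation maps
# to its parent; subsumption is then an ancestor test.
GR_PARENT = {
    "mod": "dependent", "arg": "dependent", "aux": "dependent",
    "conj": "dependent",
    "ncmod": "mod", "xmod": "mod", "cmod": "mod", "detmod": "mod",
    "subj": "arg", "comp": "arg",
    "ncsubj": "subj", "xsubj": "subj", "csubj": "subj",
    "obj": "comp", "clausal": "comp",
    "dobj": "obj", "obj2": "obj", "iobj": "obj",
    "xcomp": "clausal", "ccomp": "clausal",
}

def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = GR_PARENT.get(specific)
    return False

print(subsumes("arg", "ncsubj"))  # True: ncsubj -> subj -> arg
print(subsumes("mod", "dobj"))    # False: dobj is under comp, not mod
```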
      <Paragraph position="3"> The dependency relationships which the GRs embody correspond closely to the head-complement structure which subcategorization acquisition attempts to recover, which makes GRs ideal input to the SCF classifier. (Footnote 2: in contrast to almost all earlier work, there was no filtering stage involved in SCF acquisition. The classifier was designed to operate with high precision, so filtering was less necessary.)</Paragraph>
      <Paragraph position="5"> Consider the arguments of &quot;easy&quot; in the sentence: &quot;These examples of animal senses are relatively easy for us to comprehend as they are not too far removed from our own experience.&quot; According to the COMLEX classification, this is an example of the frame adj-obj-for-to-inf, shown in figure 3 (using AVM notation in place of COMLEX s-expressions). Part of the output of RASP for this sentence (the full output includes 87 weighted GRs) is shown in figure 4.</Paragraph>
      <Paragraph position="6"> Each instantiated GR in figure 4 corresponds to one or more parts of the feature structure in figure 3. xcomp(be[6] easy[8]) establishes be[6] as the head of the VP in which easy[8] occurs as a complement. The first (PP-)complement is &quot;for us&quot;, as indicated by ncmod(for[9] easy[8] we+[10]), with &quot;for&quot; as PFORM and we+ (&quot;us&quot;) as NP. The second complement is represented by xmod(to[11] be+[6] comprehend[12]): a to-infinitive VP. The NP headed by &quot;examples&quot; is marked as the subject of the frame by ncsubj(be[6] examples[2]), and ncsubj(comprehend[12] we+[10]) corresponds to the coindexation marked by [3]: the subject of the VP is the NP of the PP. (Footnote 3: GR arguments follow the textual format in (Carroll et al., 1998a): each argument that corresponds to a word consists of three parts: the lexeme, the part-of-speech tag, and the position (index) of the word in the sentence.)</Paragraph>
      <Paragraph position="8"> The only part of the feature structure which is not represented by the GRs is the coindexation between the omitted direct object [1] of the VP-complement and the subject of the whole clause.</Paragraph>
    </Section>
    <Section position="2" start_page="616" end_page="617" type="sub_section">
      <SectionTitle>
3.2 SCF Classifier
</SectionTitle>
      <Paragraph position="0"> We used for our classifier a modified version of the fairly extensive COMLEX frameset, comprising 30 SCFs. The COMLEX frameset includes mutually inconsistent frames, such as sentential complement with obligatory complementiser that and sentential complement with optional that. We modified the frameset so that an adjective can legitimately instantiate any combination of frames, which simplifies classification. We also added simple-predicative and attributive SCFs to the set, since these account for a substantial proportion of frame instances. Finally, frames which could only be distinguished by information not retained in the GR scheme of the current version of the shallow parser were merged (e.g. the COMLEX frames adj-subj-to-inf-rs (&quot;She was kind to invite me&quot;) and adj-to-inf (&quot;She was able to climb the mountain&quot;)).</Paragraph>
      <Paragraph position="1">  The classifier operates by attempting to match the set of GRs associated with each sentence against various patterns. The patterns were developed by a combination of knowledge of the GRs and examination of a set of training sentences to determine which relations were actually emitted by the parser for each SCF. The data used during development consisted of the sentences in the BNC in which one of the 23 adjectives given as examples for SCFs in (Macleod et al., 1998) occur. (Footnote 4: the adjectives used for training were: able, anxious, apparent, certain, convenient, curious, desirable, disappointed, easy, happy, helpful, imperative, impractical, insistent, kind, obvious, practical, preferable, probable, ridiculous, unaware, uncertain and unclear.)</Paragraph>
      <Paragraph position="2"/>
      <Paragraph position="3"> In our pattern matching language a pattern is a disjunction of sets of partially instantiated GRs with logic variables (slots) in place of indices, augmented by ordering constraints that restrict the possible instantiations of slots. A match is considered successful if the set of GRs can be unified with any of the disjuncts. Unification of a sentence-relation and a pattern-relation occurs when there is a one-to-one correspondence between sentence elements and pattern elements that includes a mapping from slots to indices (a substitution), and where atomic elements in corresponding positions share a common subtype.</Paragraph>
      <Paragraph position="4"> Figure 5 shows a pattern for matching the SCF adj-obj-for-to-inf. For a match to succeed there must be GRs associated with the sentence that match each part of the pattern. Each argument matches either anything at all (*), the &quot;current&quot; adjective (~), an empty GR argument ( ), a [word; id; part-of-speech] 3-tuple or a numeric id. In a successful match, equal ids in different parts of the pattern must match the same word position, and distinct ids must match different positions. The various patterns are arranged in a tree, where a parent node contains the elements common to all of its children. This kind of once-only representation of particular features, together with the successive refinements provided by child nodes, reflects the organization of inheritance-based lexica. The inheritance structure naturally involves multiple inheritance, since each frame typically includes multiple features (such as the presence of a to-infinitive complement or an expletive subject argument) inherited from abstract parent classes, and each feature is instantiated in several frames.</Paragraph>
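The slot-based matching described above can be sketched as a small unifier. This is a hypothetical simplification, not the paper's implementation: GR arguments are reduced to (lexeme, index) pairs, part-of-speech and ordering constraints are omitted, and the pattern fragment for adj-obj-for-to-inf is invented for illustration:

```python
def match_gr(pattern, gr, subst, adj_idx):
    """Unify one pattern GR with one sentence GR under substitution `subst`.
    Pattern args: '*' matches anything, '~' the current adjective,
    an int is a slot bound to a word index, a str must equal the lexeme."""
    pname, pargs = pattern
    gname, gargs = gr
    if pname != gname or len(pargs) != len(gargs):
        return None
    s = dict(subst)
    for p, (lex, idx) in zip(pargs, gargs):
        if p == "*":
            continue
        if p == "~":
            if idx != adj_idx:
                return None
        elif isinstance(p, int):
            if p in s:
                if s[p] != idx:          # same slot, same position
                    return None
            elif idx in s.values():      # distinct slots, distinct positions
                return None
            else:
                s[p] = idx
        elif p != lex:
            return None
    return s

def match_pattern(pattern_grs, sentence_grs, adj_idx):
    """Depth-first search for a substitution covering all pattern GRs."""
    def go(pats, subst):
        if not pats:
            return subst
        for gr in sentence_grs:
            s2 = match_gr(pats[0], gr, subst, adj_idx)
            if s2 is not None:
                res = go(pats[1:], s2)
                if res is not None:
                    return res
        return None
    return go(pattern_grs, {})

# Simplified GRs for "... easy for us to comprehend ..." (cf. figure 4).
sent = [("xcomp", [("be", 6), ("easy", 8)]),
        ("ncmod", [("for", 9), ("easy", 8), ("we+", 10)]),
        ("xmod",  [("to", 11), ("be", 6), ("comprehend", 12)]),
        ("ncsubj", [("comprehend", 12), ("we+", 10)])]

# Hypothetical pattern fragment: slots 1 and 2 must co-refer across relations.
pat = [("ncmod", ["for", "~", 1]),
       ("ncsubj", [2, 1])]

print(match_pattern(pat, sent, adj_idx=8))  # {1: 10, 2: 12}
```

The substitution returned shows slot 1 bound to &quot;us&quot; and slot 2 to &quot;comprehend&quot;, i.e. the co-reference that distinguishes the frame.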
      <Paragraph position="5"> The tree structure also improves the efficiency of the pattern matching process, which then occurs in stages: at each matching node the classifier attempts to match a set of relations with each child pattern to yield a substitution that subsumes the substitution resulting from the parent match.</Paragraph>
      <Paragraph position="6"> Both the patterns and the pattern language itself underwent successive refinements as investigation of the performance on training data made it increasingly clear what sort of distinctions were useful to express. The initial pattern language had no slots; it was easy to understand and implement, but insufficiently expressive. The final refinement was the addition of ordering constraints between instantiated slots, which are indispensable for detecting, e.g., extraposition. [Table 1: test adjectives and their BNC frequencies -- unspecified 285, improbable 350, unsure 570, doubtful 1147, generous 2052, sure 13591, difficult 18470, clear 19617, important 33303.]</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="617" end_page="618" type="metho">
    <SectionTitle>
4 Experimental Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="617" end_page="617" type="sub_section">
      <SectionTitle>
4.1 Data
</SectionTitle>
      <Paragraph position="0"> In order to evaluate the system we selected a set of 9 adjectives which between them could instantiate all of the frames. The test set was intentionally kept fairly small for these first experiments with adjectival SCF acquisition so that we could carry out a thorough evaluation of all the test instances. We excluded the adjectives used during development and adjectives with fewer than 200 instances in the corpus. The final test set, together with the frequency of each adjective in the tagged version of the BNC, is shown in table 1. For each adjective we extracted 200 sentences (evenly spaced throughout the BNC) which we processed using the SCF acquisition system described in the previous section.</Paragraph>
    </Section>
    <Section position="2" start_page="617" end_page="618" type="sub_section">
      <SectionTitle>
4.2 Method
4.2.1 Annotation Tool and Gold Standard
</SectionTitle>
      <Paragraph position="0"> Our gold standard was human-annotated data.</Paragraph>
      <Paragraph position="1"> Two annotators associated a SCF with each sentence/adjective pair in the test data. To ease the process we developed a program which first uses reliable heuristics to reduce the number of SCF choices and then allows the annotator to select the preferred choice with a single mouse click in a browser window. The heuristics reduced the average number of SCFs presented alongside each sentence from 30 to 9. Through the same browser interface we provided annotators with information and instructions (with links to COMLEX documentation), the ability to inspect and review previous decisions and decision summaries5, and an option to record that particular sentences could not be classified (which is useful for further system development, as discussed in section 5). A screenshot is shown in figure 6. The resulting annotation revealed 19 of the 30 SCFs in the test data.</Paragraph>
      <Paragraph position="2">  We use the standard evaluation metrics: type and token precision, recall and F-measure. Token recall is the proportion of annotated (sentence, frame) pairs that the system recovered correctly. Token precision is the proportion of classified (sentence, frame) pairs that were correct. Type precision and type recall are analogously defined for (adjective, frame) pairs. The F-measure (β = 1) is a weighted combination of precision and recall.</Paragraph>
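The metrics defined above can be stated directly in code. A minimal sketch over sets of pairs (the function name and the toy pairs are illustrative):

```python
def prf(gold_pairs, predicted_pairs):
    """Precision, recall and balanced F-measure over annotated vs. classified pairs."""
    gold, pred = set(gold_pairs), set(predicted_pairs)
    tp = len(gold.intersection(pred))          # correctly recovered pairs
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0  # F-measure with beta = 1
    return p, r, f

gold = {("s1", "adj-obj-for-to-inf"), ("s2", "simple-predicative"), ("s3", "attributive")}
pred = {("s1", "adj-obj-for-to-inf"), ("s2", "attributive")}
p, r, f = prf(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.5 0.33 0.4
```

The same function computes token metrics over (sentence, frame) pairs and type metrics over (adjective, frame) pairs.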
    </Section>
    <Section position="3" start_page="618" end_page="618" type="sub_section">
      <SectionTitle>
4.3 Results
</SectionTitle>
      <Paragraph position="0"> Running the system on the test data yielded the results summarised in table 2. The greater expressiveness of the final pattern language resulted in a classifier that performed better than the &quot;regression&quot; versions which ignored either ordering constraints, or both ordering constraints and slots. As expected, removing features from the classifier translated directly into degraded accuracy. The performance of the best classifier (67.8% F-measure) is quite similar to that of the best current verbal SCF acquisition systems (e.g. (Korhonen, 2002)).</Paragraph>
      <Paragraph position="1"> Results for individual adjectives are given in table 3. The first column shows the number of SCFs acquired for each adjective, ranging from 2 for unspecified to 11 for doubtful. (Footnote 5: ...measurements of inter-annotator agreement, but this was judged less important than the enhanced ease of use arising from the reduced set of choices.)</Paragraph>
      <Paragraph position="2"> [Table 2: results for the classifier and for regression systems with restricted pattern-matching.] Looking at the F-measure, the best performing adjectives are unspecified, difficult and sure (80%) and the worst performing unsure (50%) and improbable (60%).</Paragraph>
      <Paragraph position="3"> There appears to be no obvious connection between performance figures and the number of acquired SCF types; differences are rather due to the difficulty of detecting individual SCF types -- an issue directly related to data sparsity.</Paragraph>
      <Paragraph position="4"> Despite the size of the BNC, 5 SCFs were not seen at all, either for the test adjectives or for any others. Frames involving to-infinitive complements were particularly rare: 4 such SCFs had no examples in the corpus and a further 3 occurred 5 times or fewer in the test data. It is more difficult to develop patterns for SCFs that occur infrequently, and the few instances of such SCFs are unlikely to include a set of GRs that is adequate for classification. The effect on the results was clear: of the 9 SCFs which the classifier did not correctly recognise at all, 4 occurred 5 times or fewer in the test data and a further 2 occurred 5-10 times.</Paragraph>
      <Paragraph position="5"> The most common error made by the classifier was to mistake a complex frame (e.g. adj-obj-for-to-inf or to-inf-wh-adj) for simple-predicative, which subsumes all such frames.</Paragraph>
      <Paragraph position="6"> This occurred whenever the GRs emitted by the parser failed to include any information about the complements of the adjective.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="618" end_page="619" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> Data sparsity is perhaps the greatest hindrance both to recovering adjectival subcategorization and to lexical acquisition in general.</Paragraph>
    <Paragraph position="1"> In the future, we plan to carry out experiments with a larger set of adjectives using more data (possibly from several corpora and the web) to determine how severe this problem is for adjectives. One possible way to address the problem is to smooth the acquired SCF distributions using SCF &quot;back-off&quot; (probability) estimates based on lexical classes of adjectives, in the manner proposed by (Korhonen, 2002). This helps to correct the acquired distributions and to detect low frequency and unseen SCFs.</Paragraph>
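The back-off idea can be sketched as linear interpolation between an adjective's acquired SCF distribution and a distribution estimated over its lexical class. The function name, the mixing weight and the toy distributions below are illustrative assumptions, not values from (Korhonen, 2002):

```python
def smooth_scf_dist(adj_dist, class_dist, lam=0.7):
    """Interpolate an adjective's acquired SCF distribution with a
    back-off distribution estimated over its lexical class."""
    frames = set(adj_dist) | set(class_dist)
    return {f: lam * adj_dist.get(f, 0.0) + (1 - lam) * class_dist.get(f, 0.0)
            for f in frames}

adj = {"simple-predicative": 0.8, "attributive": 0.2}      # acquired, sparse
cls = {"simple-predicative": 0.5, "attributive": 0.3,
       "adj-obj-for-to-inf": 0.2}                          # class back-off
sm = smooth_scf_dist(adj, cls)
# The unseen frame now receives non-zero probability mass:
print(round(sm["adj-obj-for-to-inf"], 2))  # 0.06
```

Because both inputs are proper distributions, the interpolated result still sums to one.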
    <Paragraph position="2"> However, our experiment also revealed other problems which require attention in the future.</Paragraph>
    <Paragraph position="3"> One such is that GRs output by RASP (the version we used in our experiments) do not retain certain distinctions which are essential for distinguishing particular SCFs. For example, a sentential complement of an adjective with a that-complementiser should be annotated with ccomp(that, adjective, verbal-head), but this relation (with that as the type argument) does not occur in the parsed BNC. As a consequence the classifier is unable to distinguish the frame.</Paragraph>
    <Paragraph position="4"> Another problem arises from the fact that our current classifier operates on a predefined set of SCFs. The COMLEX SCFs, from which ours were derived, are extremely incomplete. Almost a quarter (477 of 1931) of sentences were annotated as &quot;undefined&quot;. For example, while there are SCFs for sentential and infinitival complements in subject position with what (adj-subj-what-s: &quot;What he will do is uncertain&quot;; adj-subj-what-to-inf: &quot;What to do was unclear&quot;; together with the extraposed versions extrap-adj-what-s and extrap-adj-what-to-inf), there is no SCF for the case with a what-prefixed complement in object position, where the subject is an NP. The lack is especially perplexing, because COMLEX does include the corresponding SCFs for verbs.</Paragraph>
    <Paragraph position="5"> There is a frame for &quot;He wondered what to do&quot; (what-to-inf), but none for &quot;He was unsure what to do&quot;.</Paragraph>
    <Paragraph position="6"> While we can easily extend the current frameset by looking for further SCF types from dictionaries and from among the corpus occurrences labelled by our annotators as unclassified, we also plan to extend the classifier to automatically induce previously unseen frames from data. A possible approach is to use restricted generalization on sets of GRs to group similar sentences together. Generalization (anti-unification) is an intersection operation on two structures which retains the features common to both; generalization over the sets of GRs associated with the sentences which instantiate a particular frame can produce a pattern such as we used for classification in the experiments described above. This approach also offers the possibility of associating confidence levels with each pattern, corresponding to the degree to which the generalized pattern captures the features common to the members of the associated class. It is possible that frames could be induced by grouping sentences according to the &quot;best&quot; (e.g. most information-preserving) generalizations for various combinations, but it is not clear how this can be implemented with acceptable efficiency.</Paragraph>
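Restricted generalization over GR sets can be sketched as a toy anti-unifier that keeps atoms shared by two sentences and abstracts mismatches to variables, reusing the same variable for the same mismatch so that co-reference survives. This is a hypothetical simplification (GR arguments are bare lexeme tuples, and only same-named relations are compared), far short of a full implementation:

```python
def generalize_args(a, b, varmap):
    """Anti-unify two argument tuples: keep equal atoms, abstract the rest."""
    out = []
    for x, y in zip(a, b):
        if x == y:
            out.append(x)
        else:
            key = (x, y)
            if key not in varmap:
                varmap[key] = f"?{len(varmap)}"
            out.append(varmap[key])  # same mismatch -> same variable
    return tuple(out)

def generalize_gr_sets(s1, s2):
    """Pairwise anti-unification of same-named, same-arity GRs from two
    sentences, keeping only structure common to both (a restricted sketch)."""
    varmap, out = {}, []
    for n1, a1 in s1:
        for n2, a2 in s2:
            if n1 == n2 and len(a1) == len(a2):
                out.append((n1, generalize_args(a1, a2, varmap)))
    return out

# Two toy sentences instantiating the same frame with different adjectives.
s1 = [("ncmod", ("for", "easy", "us")), ("xcomp", ("be", "easy"))]
s2 = [("ncmod", ("for", "hard", "them")), ("xcomp", ("be", "hard"))]
print(generalize_gr_sets(s1, s2))
# [('ncmod', ('for', '?0', '?1')), ('xcomp', ('be', '?0'))]
```

Note that the adjective position is abstracted to the same variable ?0 in both relations, which is exactly the kind of slot co-reference the classification patterns rely on.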
  </Section>
</Paper>