<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0706">
  <Title>Learning Transformation Rules to Find Grammatical Relations*</Title>
  <Section position="1" start_page="0" end_page="49" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Grammatical relationships are an important level of natural language processing. We present a trainable approach to find these relationships through transformation sequences and-error-driven learning. Our approach finds grammatical relationships between core syntax groups and bypasses much of the parsing phase.</Paragraph>
    <Paragraph position="1"> On our training and test set, our procedure achieves 63.6% recall and 77.3% precision (f-score = 69.8).</Paragraph>
    <Paragraph position="2"> Introduction An important level of natural language processing is the finding of grammatical relationships such as subject, object, modifier, etc. Such relationships are the objects of study in relational grammar \[Perlmutter, 1983\]. Many systems (e.g., the KERNEL system \[Palmer et al., 1993\]) use these relationships as an intermediate, form when determining the semantics of syntactically parsed text. In the SPARKLE project \[Carroll et al., 1997@ grammatical relations form the layer above the phrasal-level in a three layer syntax scheme. Grammatical relationships are often stored in some type of structure like the F-structures of lexical-functional grammar \[Kaplan, 1994\].</Paragraph>
    <Paragraph position="3"> Our own interest in grammatical relations is as a semantic basis for information extraction in the Alembic system. The extraction approach we are currently investigating exploits grammatical relations as an intermediary between surface syntactic phrases and propositional semantic interpretations. By directly associating syntactic heads with their arguments and modifiers, we are hoping that these grammatical relations will provide a high degree of generality and reliability to the process of composing semantic representations. This ability to  &amp;quot;parse&amp;quot; into a semantic representation is according to Charniak \[Charniak, 1997, p. 42\], &amp;quot;the most important task to be tackled now.&amp;quot; In this paper, we describe a system to learn rules for finding grammatical relationships when just given a partial parse with entities like names, core noun m~d verb phrases (noun and verb groups) and semi-accurate estimates of the attachments of prepositions and subordinate conjunctions. In our system, the different entities, attachments and relationships are found using rule sequence processors that are cascaded together. Each processor can be thought of as approximating some aspect of the underlying grammar by finite-state transduction. null We present the problem scope of interest to us, as well as the data annotations required to support our investigation. We also present a decision procedure for finding grammatical relationships. In brief, on our training mid test set, our procedure achieves 63.6% recall and 77.3% precision, for an f-score of 69.8.</Paragraph>
    <Section position="1" start_page="0" end_page="49" type="sub_section">
      <SectionTitle>
Phrase Structure and Grammatical
Relations
</SectionTitle>
      <Paragraph position="0"> In standard derivational approaches to syntax, starting as early as 1965 \[Chomsky, 1965\], the notion of grammatical relationship is typically parasitic on that of phrase structure. That is to say, the primm'y vehicles of syntactic analysis are phrase structure trees: grammatical relationships, if they are to be considered at all. are given as a secondary analysis defined in terms of phrase structure. The surface subject of a sentence, for example, is thus no more than the NP attached by the production S -+ NP VP; i.e., it is the left-most NP daughter of an S node.</Paragraph>
      <Paragraph position="1"> The present paper takes an alternate outlook. In our current work, grammatical relationships play a central role, to tile extent even of replacing phrase structure as the descriptive vehicle for many syntactic phenomena. To be specific, our approach to syntax operates at two levels: (1) that of core phrases, which are an-</Paragraph>
      <Paragraph position="3"> alyzed through standard derivational syntax, and (2) that of argument and modifier attachments, which are analyzed through grammatical relations. These two levels roughly correspond to the top and bottom layers of the three layer syntax annotation scheme in the SPARKLE project \[Carroll et al., 1997a\].</Paragraph>
      <Paragraph position="4"> Core syntactic phrases In recent years, a consensus of sorts has emerged that postulates some core level of phrase analysis. By this we mean the kind of non-recursive simplifications of the NP and VP that in the literature go by names such as noun/verb groups \[Appelt et at., 1993\],. chunks \[Abney, 1996\], or base NPs \[Ramshaw and Marcus, 1995\].</Paragraph>
      <Paragraph position="5"> The common thread between these approaches and ours is to approximate full noun phrases or verb phrases by only parsing their non-recursive core, and thus not attaching modifiers or arguments. For English noun phrases, this amounts to roughly the span between the determiner and the head noun; for English verb phrases, the span runs roughly from the auxiliary to the head verb. We call such simplified syntactic categories groups, and consider in particular, noun, verb, adverb, adjective, and IN groups, i An IN group 2 contains a preposition or subordinate conjunction (including wh-words and &amp;quot;that&amp;quot;).</Paragraph>
      <Paragraph position="6"> For example, for &amp;quot;I saw the cat that ran. &amp;quot;, we have the following core phrase analysis: \[I\],,g \[saw\]vg \[the cat\]ng \[that\], 9 \[ran\]rv where \[...\]-9 indicates a noun group, \[.--\]09 a verb group, and (...\],,j an IN group.</Paragraph>
      <Paragraph position="7"> In English and other languages where core phrases (groups) can be analyzed by head-out (island-like) parsing, the group head-words are basically a by-product of the core phrase analysis.</Paragraph>
      <Paragraph position="8"> Distinguishing core syntax groups from traditional syntactic phrases (such as NPs) is of interest because it singles out what is usually thought of as easy to parse, and allows that piece of the parsing problem to be addressed by such comparatively simple means as finite-state machines or transformation sequences. What is then left of the parsing problem is the difficult stuff: namely the attachment of prepositional phrases, relative clanses, and other constructs that serve in modification, adjunctive, or argument-passing roles.</Paragraph>
      <Paragraph position="9"> ZIn addition, for the noun group, our definition encompasses the named entity task, familiar from information extraction \[Def, 1995\]. Named entities include among others the names of people, places, and organizations, as well as dates, expressions of money, and (in an idiosyncratic extension) titles, job descriptions, and honorifics.</Paragraph>
      <Paragraph position="10"> &amp;quot;The name comes from the Penn Treebank part-of:speech label for prepositions and subordinate conjunctions.</Paragraph>
      <Paragraph position="11"> Grammatical relations In the present work, we encode this hard stuff through a small repertoire of grammatical relations. These relations hold directly between constituents, and as such define a graph, with core constituents as nodes in the graph, and relations as labeled arcs. Our previous example, for instance, generates the following grammatical relations graph (head words underlined): SUBJect \[ll \[saw\] \[the cat\] \[that I \[ran___\] t I MODifier Our grammatical relations effectively replace the recursive X analysis of traditional phrase structure ga'ammar. In this respect, the approach bears resemblance to a dependency grammar, in that it has no notion of a spanning S node, or of intermediate constituents corresponding to argument and modifier attachments.</Paragraph>
      <Paragraph position="12"> One major point of departure from dependency grammar, however, is that these grammatical relation graphs can generally not be reduced to labeled trees. This happens as a result of argument passing, as in \[F ed\] fpromise \] \[to help/\[John\] where \[Fred\] is both the subject of \[promised\] and.-\[to help/. This also happens as a result of argumentmodifier cycles, as in /If \[saw\] \[the cat\]/that\]/ran\] where the relationships between \[the cat\] and \[ran\] form a cycle: \[the cat\] has a subject relationship/dependency to \[ran\], and \[ran\] has a modifier dependency to \[the cat\], since \[ran\] helps indicate (modifies) which cat is seen.</Paragraph>
      <Paragraph position="13"> There has been some work at making additions to extract grammatical relationships from a dependency tree structure \[BrSker, 1998, Lai and Huang, 1998\] so that one first produces a surface structure dependency tree with a syntactic parse and then extracts grammatical relationships from that tree. In contrast, we skip trying to find a surface structure tree and just proceed to more directly finding the grammatical relationships, which are the relationships of interest to us.</Paragraph>
      <Paragraph position="14"> A reason for skipping the tree stage is that extracting grammatical relations from a surface structure tree is often a nontrivial task by itself. For instance, the precise relationship holding between two constituents in a surface structure tree cannot be derived unambiguously from their relative attachments. Contrast. for example &amp;quot;the attack on the military base&amp;quot; with &amp;quot;the attack on March 24&amp;quot;. Both of these have the same underlying surface structure (a PP attached to an NP). but the  former encodes the direct object of a verb nominalization, while the latter encodes a time modifier. Also, in a surface structure tree, long-distance dependencies between heads and arguments are not explicitly indicated by attachments between the appropriate parts of the text. For instance in &amp;quot;Fred promised to help John&amp;quot;, no direct attachment exists between the &amp;quot;Fred&amp;quot; in the text and the &amp;quot;help&amp;quot; in the text, despite the fact that the former is the subject of the latter.</Paragraph>
      <Paragraph position="15"> For our purposes, we have delineated approximately a dozen head-to-argument relationships as well as a commensurate number of modification relationships.</Paragraph>
      <Paragraph position="16"> Among the head-to-argument relationships, we have the deep subject and object (SUBJ and OBJ respectively), and also include the surface subject and object of copulas (COP-SUBJ and the various COP-OBJ forms). In addition, we include a number of relationships (e.g., PP-SUBJ, PP-OBJ) for arguments that are mediated by prepositional phrases. An example is in PP-~ OBJect I \[the attack\] \[on\] \[the military base\] where \[the attack\], a noun group with a verb nominalization, has its object \[the military base\]passed to it via the preposition in \[on\]. Among modifier relationships, we designate both generic modification and some specializations like locational and temporal modification. A complete definition of all the grammatical relations is beyond the scope of thiSS paper, but we give a summary of usage in Table 1. An earlier version of the definitions can be found in our annotation guidelines \[Ferro, 1998\]. The appendix shows some examples of grammatical relationship labeling from our experiments.</Paragraph>
      <Paragraph position="17"> Our set of relationships is similar to the set used in the SPARKLE project \[Carroll et al., 1997a\] \[Carroll et al., 1998a I. One difference is that we make many semantically-based distinctions between what SPARKLE calls a modifier, such as time and location modifier% and the various arguments of event nouns.</Paragraph>
      <Paragraph position="18"> Semantic interpretation A major motivation for this approach is that it supports a direct mapping into semantic interpretations. In our framework, semantic interpretations are given in a neo-Davidsonian 'propositional logic. Grammatical relations are thus interpreted in terms of mappings and relationships between the constants and variables of the propositional language. For instance, the deep subject relation (SUB J) maps to the first position of a predicate's argument list, the deep object {OBJ) to the second such position, and so forth.</Paragraph>
      <Paragraph position="19"> Our example sentence, &amp;quot;I saw the cat that ran&amp;quot; thus translates directly to the following:  We do not have an explicit level for clauses between our core phrase and grammatical relations levels. However, we do have a set of implicit clauses in that each verb (event) and its arguments can be (teemed a base level clause. In our example &amp;quot;I saw the cat that ran&amp;quot;. we have two such base level clauses. &amp;quot;saw&amp;quot; and its arguments form the clause &amp;quot;I saw the cat&amp;quot;. &amp;quot;ran&amp;quot; and its argument form the clause &amp;quot;the cat ran&amp;quot;. Each noun with a possible semantic class of &amp;quot;act&amp;quot; or &amp;quot;process&amp;quot; in Wordnet \[Miller, 1990\] (and that noun's arguments) can likewise be deemed a base level clause.</Paragraph>
      <Paragraph position="20"> The Processing Model Our system uses transformation-based error-driven learning to automatically learn rules from training examples \[Brill and Resnik, 1994\].</Paragraph>
      <Paragraph position="21"> One first runs the system on a training set, which starts with no grammatical relations marked. This training run moves in iterations, with each iteration producing the next rule that yields the best bet gain in the training set (number of matching relationships found minus the number of spurious relationships introduced). On ties, rules with less conditions are favored over rules with more conditions. The training run ends when tile next rule found produces a net gain below a given threshold.</Paragraph>
      <Paragraph position="22"> The rules are then run in the same order on the test set to see how well they do.</Paragraph>
      <Paragraph position="23"> The rules are condition~action pairs that are tried on each syntax group. The actions in our system are limited to attaching (or unattaching) a relationship of a particular type from the group under consideration to that group's neighbor a certain number of groups away in a particular direction (left or right). A sample action would be to attach a SUBJ relation from the group under consideration to the group two groups away to the right.</Paragraph>
      <Paragraph position="24"> A rule only applies to a syntax group when that group and its neighbors meet the rule's conditions. Each condition tests the group in a particular position relative to the group under consideration (e.g., two groups away to the left). All tests can be negated. Table 2 shows the possible tests.</Paragraph>
      <Paragraph position="25"> A sample rule is when a noun group n's * immediate group to tile right has some form of the verb &amp;quot;be&amp;quot; as the head-word.</Paragraph>
      <Paragraph position="27"> subject subject of a verb -- link a copula subject and object -- link a state with the item in that state -- link a place with the item moving to or from that place object -- object of a verb object of an adjective -- surface subject in passives -- object of a preposition, not for partitives or subsets -- object of  an adverbial clause complementizer location object -link a movement verb with a place where entities are moving to or from indobj i indirect object empty use instead of &amp;quot;subj&amp;quot; relation when subject is an expletive (existential) &amp;quot;it&amp;quot; or &amp;quot;there&amp;quot; pp-subj genitive functional &amp;quot;of&amp;quot;'s use instead of &amp;quot;subj&amp;quot; relation when the subject is linked via a preposition, links preposition to its head pp-obj nongenitive functional &amp;quot;of&amp;quot;'s use in place of &amp;quot;obj&amp;quot; relation when the object is linked via a preposition, links preposition to its head i pp-io use in place of &amp;quot;indobj&amp;quot; relation when the indirect object is linked via a preposition. links preposition to its head I cop-subj i surface subject for a copula n-cop-obj , surface nominative object for a copula \[promised\] in &amp;quot;I promised to help&amp;quot; \[I\] ~ \[to help\] in &amp;quot;I promised to help&amp;quot; \[the cat\] ---r \[ran\] in &amp;quot;the cat that ran&amp;quot; \[You\] -+ \[happy\] in &amp;quot;You are happy&amp;quot; \[You\] --r \[a runner\] in &amp;quot;You are a runner&amp;quot; \[you\] ---r \[happy\] in &amp;quot;They made you happy&amp;quot; \[I\] -~ \[home\] in &amp;quot;I went home&amp;quot; \[saw\] ~ \[the caq ill &amp;quot;I saw the cat&amp;quot; \[promised\] &lt;--- \[to help\] ill &amp;quot;I promised to help you&amp;quot; \[happy\] &lt;--- \[to help\] in &amp;quot;I was happy to help&amp;quot; \[I\] ~ \[was seen\] in &amp;quot;I was seen 1)y a cat&amp;quot; \[by\] ~ \[the tree\] in &amp;quot;I was by tile tree&amp;quot; \[After\] e-\[left\] in &amp;quot;After I left, I ate&amp;quot; \[wenq +- \[home\] ill &amp;quot;I went home&amp;quot; \[went\] ~ \[in\] ill &amp;quot;I went in the house \[gave\] &lt;-- \[you\] in &amp;quot;I gave you a cake&amp;quot; \[There\] ~ \[trees\] in &amp;quot;There are trees&amp;quot; \[name\] e-- \[of\] in &amp;quot;name of the building&amp;quot; \[was seen\] +-- \[by\] in &amp;quot;I was seen by a cat&amp;quot; \[age\] e- \[o\]\] in &amp;quot;age of 12&amp;quot; \[the attack\] &lt;--- \[on\] in &amp;quot;the attack on the base&amp;quot; \[.qave\] e- \[to\] in &amp;quot;gave a cake to thenf' \[You\] ~ \[are\] in &amp;quot;You are happy&amp;quot; \[is\] e- \[a rock\] in &amp;quot;It is a rock&amp;quot;  |i p-cop-obj I surface predicate object for a copula i \[are\] e-- \[happy\] in &amp;quot;You are happy&amp;quot; subset subset \[five\] --4 \[the kids\] in &amp;quot;five of the kids&amp;quot; i i \[the cat\] ~-- \[ran\] in &amp;quot;the cat that ran&amp;quot; rood generic modifier (use when modifier does not fit in a case below)  Test Ty.p.e Example, Sample Value(s) group type noun, verb verb group property passive, infinitival, unconjugated present participle end group in a sentence first, last pp-attachment Is a preposition or subordinate conjunction attached to the group under consideration? group contains a particular lexeme or part-of-speech between two groups, there is a particular lexeme or part-of-speech group's head (main) word &amp;quot;cat&amp;quot; head word part-of-speech .... 
common plural noun head word within a named entity p.erson, organization head word subcategorization and complement categories intransitive verbs (from Comlex \[Wolff et al., 1995\], over 100 categories) head word semantic classes process, communication (from Wordnet \[Miller, 1990\], 25 noun and 15 verb clas.ses) punctuation or coordinating conjunction exist between two groups? head word in a word list? list of relative pronouns, list of partitive quantities (e.g., &amp;quot;some&amp;quot;)  * immediate group to the left is not an IN group (preposition, wh-word, etc.) and * n's head-word is not an existential &amp;quot;there&amp;quot; make n a SUBJ of the group two groups over to n's right.</Paragraph>
      <Paragraph position="28"> When applied to the group \[The eat\] (head words are underlined) in the sentence \[The ~ \[was\] \[very happy.\].</Paragraph>
      <Paragraph position="29"> this rule makes \[The cat\] a SUBject of \[very happy\]. Searching over the space of possible rules is very computationally expensive. Our system has features to make it easier to perform searching in parallel and to minimize the amount of work that needs to be undone once a rule is selected. With these features, rules that (un)attach different types of relationships or relationships at different distances can be searched independently of each other in parallel.</Paragraph>
      <Paragraph position="30"> One feature is that the action of any rule only affects the applicability of rules with either the exact same or opposite action. For example, selecting and running a rule which attaches a MOD relationship to the group that is two groups to the right only can affect the applicability of other rules that either attach or unattach a MOD relationship to the group that is two groups to the right.</Paragraph>
      <Paragraph position="31"> Another feature is the use of net gain as a prox.v nmasure during training. The actual measure by which we judge the system's performance is called an/-score. This/-score is a type of harmonic mean of the precision (p) and recall (r) and is given by 2pr/(p + r). Unfortunately, this measure is nonlinear, mid the application of a new rule can alter the effects of all other possible rules on the/-score. To enable the described parallel search to take place, we need a measure in which how a rule affects that measure only depends on other rules with either the exact same or opposite action. The net gain measure has this trait, so we use it as a proxy for the/-score during training.</Paragraph>
      <Paragraph position="32"> Another way to increase the learniug speed is to restrict the number of possible combinations of conditions/constraints or actions to search over. Each rule is automatically limited to only considering one type of syntactic group. Then when searching over possible conditions to add to that rule, the system only needs to consider the parts-of-speech, semantic classes, etc. applicable to that type of group.</Paragraph>
      <Paragraph position="33"> Many other restrictions are possible. One can estimate which restrictions to try by making some training and test runs with preliminary data sets and seeing what restrictions seem to have no effect on performance, etc. The restrictions used in our experiments are described below.</Paragraph>
      <Paragraph position="35"> Our data consists of bodies of some elementary school reading comprehension tests. For our purposes, these tests have the advantage of having a fairly predictable size (each body has about 100 relationships and syntax groups) and a consistent style of writing. The tests are also on a wide range of topics, so we avoid a narrow specialized vocabulary. Our training set has 1963 relationships (2153 syntax groups, 3299 words) and our test set has 748 relationships .(830 syntax groups, 1151 words).</Paragraph>
      <Paragraph position="36"> We prepared the data by first manually removing the headers and the questions at the end for each test. We then manually annotated the remainder for named entities, syntax groups and relationships. As the system reads in our data, it automatically breaks the data into lexemes and sentences, tags the lexemes for part-of-speech and estimates the attachments of prepositions and subordinate conjunctions. The part-of-speech tagging uses a high-performance tagger based on \[Brill, 1993\]. The attachment estimation uses a procedure described in \[Yeh and Vilain, 1998\] when multiple left attachment possibilities exist and four simple rules when no or only one left attachment possibility exists. Previous testing indicates that the estimation procedure is about 75% accurate.</Paragraph>
      <Paragraph position="37">  As described earlier, a training run uses many parameter settings. Examples include where to look for relationships and to test conditions, the maximum number of constraints allowed in a rule, etc.</Paragraph>
      <Paragraph position="38"> Based on the observation that 95% of the relationships are to at most three groups away in the training set, we decided to limit the search for relationships to at most three groups in length. To keep the number of possible constraints down, we disallowed the negations of most tests for the presence of a particular lexeme or lexeme stem.</Paragraph>
      <Paragraph position="39"> To help determine man)&amp;quot; of the settings, we made some preliminary runs using different subsets of our final training set as the preliminary training and test sets. This kept the final test set unexamined during development. From these preliminary runs, we decided to limit a rule to at most three constraints 3 in order to keep the training time reasonable. We found a number of limitations that help speed up training and seemed to have no effect on the preliminary test runs. A threshold of four was set to end a training run. So training ends when it can no longer find a rule that produces at least a net  gain of four in the score. Only syntax groups spanned by the relationship being attached or unattached and those groups' immediate neighbors were allowed to be mentioned in a rule's conditions. Each condition testing a head-word had to test a head-word of a different group. Except for the lexemes &amp;quot;of&amp;quot;, &amp;quot;?&amp;quot; and a few deternfiners like &amp;quot;the&amp;quot;, tests for single lexemes were removed. Also disallowed were negations of tests for the presence of a particular part-of-speech anywhere within a syntax group.</Paragraph>
      <Paragraph position="40"> In our preliminary runs, lowering the threshold tended to raise recall and lower precision.</Paragraph>
      <Paragraph position="41"> The Results Training produced a sequence of 95 rules which had 63.6% recall and 77.3% precision for an f-score of 69.8 when run on the test set. In our test set. the key relationships, SUBJ and OBJ, formed the bulk of the relationships (61%). Both recall and precision for both SUBJ and OBJ were above 70%, which pleased us. Because of their relative abundance in the test set, these two relationships also had the most number of errors in absolute terms. Combined, the two accounted for 45% of the recall errors asld 66deg,o of the precision errors. In terms of percentages, recall was low for many of the less common relationships, such as generic, time and loca-tion modification relationships. In addition, the relative precision was low for those modification relationships.</Paragraph>
      <Paragraph position="42"> The appendix shows some examples of our system responding to the test set.</Paragraph>
      <Paragraph position="43"> To see how well the rules, which were trained on reading comprehension test bodies: would carry over to other texts of non-specialized domains, we examined a set of six broadcast news stories. This set had 525 relationships (585 syntax groups, 1129 words). By some measures, this set was fairly similar to our training and test sets. In all three sets, 33-34% of the relationships were OBJ and 26-28% were SUBJ. The broadcast news set did tend to have relationships between groups that were slightly further apart: Percent of Relations with Length Set &lt; 1 &lt; 2 &lt; 3 training ..... 66% 87% 95% test 68% 89% 96% broadcast news 65% 84% 90% This tendency, plus differences in the relative proportions of various modification relationships are probably what produced the drop in results when we tested the rules against this news set: recall at 54.6%, precision at 70.5% (f-score at 61.6%).</Paragraph>
      <Paragraph position="44"> To estimate how fast the results would improve by adding more training data, we had the system learn rules on a new smaller training set and then tested against the regular test set. Recall dropped to 57.8%, precision to 76.2%. The smaller training set had 981 relationships (50% of the original training set). So doubling the training data here (going from the smaller to the regular training set) reduced the smaller training set's recall error of 42.2% by 14% and the precision error of 23.8% by 5%. Using the broadcast news set as a test produced similar error reduction results.</Paragraph>
      <Paragraph position="45"> One complication of our current scoring scheme is that identifying a modification relationship and mistyping it is more harshly penalized than not finding a modification relationship at all. For example, finding a modification relationship, but mistakingly calling it a generic modifier instead of a time modifier produces both a missed key error (not finding a time modifier) and a spurious response error (responding with a generic modifier where none exists). Not finding that modification relationship at all just produces a missed key error (not finding a time modifier). This complication, coupled with the fact that generic, time and location modifiers often have a similar surface appearance (all are often headed by a preposition or a complementizer) may have been responsible for the low recall and precision scores for these types of modifiers. Even the training scores for these types of modifiers were particularly low. To test how well our system finds these three types of modification when one does not care about specifying the sub-type, we reran the original training and test with the three sub-types merged into one sub-type in the annotation. With the merging, recall of these modification relationships jumped from 27.8% to 48.9%. Precision rose from 52.1% to 67.7%.</Paragraph>
      <Paragraph position="46"> Since these modification relationships are only about 20% of all the relationships, the overall improvement is more modest. Recall rises to 67.7%, precision to 78.6% (f-score to 72.6).</Paragraph>
      <Paragraph position="47"> Taking this one step further, the LOC-OBJ and various PP-x arguments also all have both a low recall (below 35%) in the test and a similar surface structure to that of generic, time and location modifiers. When these argument types were merged with the three moditier types into one combined type, their combined recall was 60.4% and precision was 81.1%. The corresponding overall test recall and precision were 70.7% and 80.5%, respectively.</Paragraph>
      <Paragraph position="48"> Comparison with Other Work At one level, computing grammatical relationships can be seen as a parsing task, and the question naturally arises as to how well this approach compares to current state-of-the-art parsers. Direct performance comparisons, however, are elusive, since parsers are evaluated on an incommensurate tree bracketing task. For exampie, the SPARKLE project \[Carroll et al., 1997a\] puts tree bracketing and grammatical relations in two different layers of syntax. Even if we disregard the questionable aspects of comparing tree bracketing apples to grammatical relation oranges, an additional complication is the fact that our approach divides the parsing task into an easy piece (core phrase boundaries) and a hard one (grammatical relations). The results we have presented here are given solely for this harder part, which may explain why at roughly 70 points of f-score, they are lower than those reported for current state-of-the-art parsers (e.g., Collins \[Collins, 1997\]). More comparable to our approach are sonde other grammatical relation finders. Some examples for English include the English parser used in tide SPARKLE project \[Briscoe et al., \] \[Carroll et al., 1997b\] \[Carroll et al., 1998b\] and the finder built with a memory-based approach \[Argamon et aI., 1998\]. These relation finders make use of large almotated training data sets and/or manually generated grammars and rules. Both techniques take much effort and time. At first glance both of these finders perform better than our approach. Except for the object precision score of 77% in \[Argamon et al., 1998\], both finders have grammatical relation recall and precision scores in the 80s. But a closer examination reveals that these results are not quite comparable with ours.</Paragraph>
      <Paragraph position="49"> . Each system is recovering a different variation of grammatical relations. As mentioned earlier, one difference between us and the SPARKLE project is that the latter ignores many of distinctions that we make for different types of modifiers. The system in \[Argamon et al., 1998\] only finds a subset of the surface subjects and objects.</Paragraph>
      <Paragraph position="50"> . In addition, the evaluations of these two finders produced more complications. In an illustration of the time consuming nature of annotating or i'eannotating a large corpus, the SPARKLE project originally did not have time to annotate the English test data for modifier relationships..ks a result, the SPARKLE English parser was originally not evaluated on how well it found modifier relationships \[Carroll et al., 1997b\] \[Carroll et al.: 1998b\]. The reported results as of 1998 only apply to the argument (subject, object, etc.) relationships. Later on, a test corpus with modifier relationship annotation was produced. Testing the parser against this corpus produced generally lower results, with an overall recall, precision and f-score of 75% \[Carroll et al., 1999\].</Paragraph>
      <Paragraph position="51"> This is still better than our f-score of 70%, but not by nearly as much. This comparison ignores the fact that tile results are for different versions of grammat- null ical relationships and for different test corpora.</Paragraph>
      <Paragraph position="52"> The figures given above were the original (1998) results for the system in \[Argamon et al., 1998\], which came from training and testing on data derived from the Penn Treebank corpus \[Marcus et al., 1993\] in which the added null elements (like null subjects) were left in. These null elements, which were given a -NONE- part-of-speech, do not appear in raw text.</Paragraph>
      <Paragraph position="53"> Later (1999 results), the system was re-evaluated on the data with the added null elements removed. The subject results declined a little. The object results declined more, with the precision now lower than ours (73.6% versus 80.3%) and the f-score not much higher (80.6% versus 77.8%). This comparison is also between results with different test corpora and slightly different notions of what an object is.</Paragraph>
      <Paragraph position="54"> Summary, Discussion, and Speculation In this paper, we have presented a system for finding grammatical relationships that operates on easyto-find constructs like noun groups. The approach is guided by a variety of knowledge sources, such as readily available lexica a, and relies to some degree on well-understood computational infrastructure: a p-o-s tagger and an attaPShment procedure for preposition and subordinate conjunctions. In sample text, our system achieves 63.6% recall and 77.3% precision (f-score = 69.8) on our repertory of grammatical relationships.</Paragraph>
      <Paragraph position="55"> This work is admittedly still in relatively early stages. Our training and test corpora, for instance, are lessthan-gargantuan compared to such collections as the Penn Treebank \[Marcus et al., 1993\]. However, the fact that we have obtained an f-score of 70 from such sparse training materials is encouraging. The recent implementation of rapid annotation tools should speed up further annotation of our own native corpus.</Paragraph>
      <Paragraph position="56"> Another task that awaits us is a careful measurement of interannotator agreement on our version the grammatical relationships.</Paragraph>
      <Paragraph position="57"> We are also keenly interested in applying a wider range of learning procedures to the task of identifying these grammatical relations. Indeed, a fine-grained analysis of our development test data has identified some recurring errors related to the rule sequence approach. A hypothesis for further experimentation is that these errors might productively be addressed by revisiting the way we exploit and learn rule sequences, or by some hybrid approach blending rules and statistical computations. In addition, since generic, time and location modifiers, and LOC-OBJ and various PP-x arguments often have a similar surface appearance, one aResources to find a word's possible stem(s), semantic class(es) and subcategorization category(ies).</Paragraph>
      <Paragraph position="58"> might first just try to locate all such entities and then in a later phase try to classiC- them by type.</Paragraph>
      <Paragraph position="59"> Different applications will need to deal with different styles of text (e.g., journalistic text versus narratives) and different standards of grammatical relationships.</Paragraph>
      <Paragraph position="60"> An additional item of experimentation is to use our system to adapt other systems, including earlier versions of our system, to these differing styles and standards.</Paragraph>
      <Paragraph position="61"> Like other Brill transformation rule systems \[Brill and Resnik, 1994\], our system can take in the output of another system and try to improve on it.</Paragraph>
      <Paragraph position="62"> This suggests a relatively low expense method to adapt a hard-to-alter system that performs well on a slightly different style or standard. Our training al)proach accepts as a starting point an initial labeling of the data. So fat', we have used an empty labeling. However, our system could just as easily start from a labeling produced as the output of the hard-to-alter system. The learning would then not be reducing the error between an empty labeling and the key annotations, but between the hard-to-alter system's output and the key annotations. By using our system in this post-processing manner, we could use a relatively small retraining set to adapt, for example, the SPARKLE English parser, to our standard of grammatical relationships without having reengineer that parser. Palmer \[Palmer, 1.997\] used a similar approach to improve on existing word segmenters for Chinese. Trying this suggestion out is also something for us to do.</Paragraph>
      <Paragraph position="63"> This discussion of training set size brings up perhaps the most obvious possible improvement. Namely, enlarging our very small training set. As has been mentioned, we have recently improved our annotation environment and look forward to working with nmre data.</Paragraph>
      <Paragraph position="64"> Clearly we have many experiments ahead of us. But we believe that the results obtained so far are a promising start, and the potential rewards of the al)proach are very significant indeed.</Paragraph>
      <Paragraph position="65"> Appendix: Examples from Test Results Figure 1 shows some example sentences from the test results of our main experiment. '~ :@ marks the relationship that our system missed. * marks the relationship that our system wrongly hypothesized. In these examples, our system handled a number of phenomena correctly, including:  \[A man\] \[named\] \[Noah\], which makes one noun group a name or label for another noun group.</Paragraph>
      <Paragraph position="66"> Our system misses a PP-OBJ relationship, which is a low occurrence relationship. Our system also accidentally make both \[,4 man\] and \[Noah\] subjects of the group \[wrote\] when only the former should be.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>