File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/93/p93-1033_metho.xml
Size: 17,037 bytes
Last Modified: 2025-10-06 14:13:31
<?xml version="1.0" standalone="yes"?> <Paper uid="P93-1033"> <Title>AN EMPIRICAL STUDY ON THEMATIC KNOWLEDGE ACQUISITION BASED ON SYNTACTIC CLUES AND HEURISTICS</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> Keywords: Thematic Knowledge Acquisition, Syntac- </SectionTitle> <Paragraph position="0"/> </Section> <Section position="4" start_page="0" end_page="243" type="metho"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Natural language processing (NLP) systems need various kinds of knowledge, including syntactic, semantic, discourse, and pragmatic knowledge, in different applications. Perhaps due to the relatively well-established syntactic theories and formalisms, there were many syntactic processing systems either manually constructed or automatically extended by various factory representation and acquisition methods of domain-independent semantic, discourse, and pragmatic knowledge are not yet developed or computationally implemented. NLP systems often suffer the dilemma of semantic representation. Sophisticated representation of semantics has better expressive power but imposes difficulties on acquisition in practice. On the other hand, the poor adequacy of naive semantic representation may degrade the performance of NLP systems. Therefore, for plausible acquisition and processing, domain-dependent semantic bias was often employed in many previous acquisition systems (Grishman92b, Lang88, Lu89, and Velardi91).</Paragraph> <Paragraph position="1"> In this paper, we present an implemented system that acquires domain-independent thematic knowledge using available syntactic resources (e.g.</Paragraph> <Paragraph position="2"> syntactic processing systems and syntactically processed corpora). Thematic knowledge can represent semantic or conceptual entities. For correct and efficient parsing, thematic expectation serves as a basis for conflict resolution (Taraban88). 
For natural language understanding and other applications (e.g.</Paragraph> <Paragraph position="3"> machine translation), thematic role recognition is a major step. Thematic relations may serve as the vocabulary shared by the parser, the discourse model, and the world knowledge (Tanenhaus89). More importantly, since thematic structures are perhaps most closely linked to syntactic structures (Jackendoff72), thematic knowledge acquisition may be more feasible when only syntactic resources are available. Taking into account the availability of the resources from which thematic knowledge may be derived improves the practical feasibility of the acquisition method.</Paragraph> <Paragraph position="4"> In general, lexical knowledge of a lexical head should (at least) include 1) the number of arguments of the lexical head, 2) syntactic properties of the arguments, and 3) thematic roles of the arguments (the argument structure). The first two components may be either already constructed in available syntactic processors or acquired by many syntactic acquisition systems. However, the acquisition of the thematic roles of the arguments deserves more exploration. A constituent may have different thematic roles for different verbs in different uses. For example, &quot;John&quot; has different thematic roles in (1.1) - (1.4).</Paragraph> <Paragraph position="5"> (1.1) [Agent John] turned on the light.</Paragraph> <Paragraph position="6"> (1.2) [Goal John] inherited a million dollars.</Paragraph> <Paragraph position="7"> (1.3) The magic wand turned [Theme John] into a frog.</Paragraph> <Paragraph position="8"> (1.4) The letter reached [Goal John] yesterday.</Paragraph> <Paragraph position="9"> To acquire thematic lexical knowledge, the precise thematic roles of arguments in the sentences need to be determined.</Paragraph> <Paragraph position="10"> In the next section, the thematic roles considered in this paper are listed. 
The syntactic properties of the thematic roles are also summarized. The syntactic properties serve as a preliminary filter to reduce the hypothesis space of possible thematic roles of arguments in training sentences. To further resolve the ambiguities, heuristics based on various linguistic phenomena and constraints are introduced in section 3. The heuristics serve as general guidance for the system to collect valuable information to discriminate thematic roles. The current status of the experiment is reported in section 4. In section 5, the method is evaluated and related to previous methodologies. We conclude, in section 6, that by properly collecting discrimination information from available sources, thematic knowledge acquisition may be more feasible in practice.</Paragraph> </Section> <Section position="5" start_page="243" end_page="244" type="metho"> <SectionTitle> 2. THEMATIC ROLES AND SYNTACTIC CLUES </SectionTitle> <Paragraph position="0"> The thematic roles considered in this paper and the syntactic clues for identifying them are presented in Table 1. The syntactic clues include 1) the syntactic constituents of the arguments, 2) whether the arguments are animate or inanimate, 3) grammatical functions (subject or object) of the arguments when they are Noun Phrases (NPs), and 4) prepositions of the prepositional phrase in which the arguments may occur. The syntactic constituents include NP, Proposition (Po), Adverbial Phrase (ADVP), Adjective Phrase (ADJP), and Prepositional Phrase (PP). In addition to common animate nouns (e.g. he, she, and I), proper nouns are treated as animate NPs as well. In Table 1, &quot;y&quot;, &quot;n&quot;, &quot;?&quot;, and &quot;-&quot; denote &quot;yes&quot;, &quot;no&quot;, &quot;don't care&quot;, and &quot;seldom&quot; respectively. For example, an Agent should be an animate NP which may be at the subject (but not object) position, and if it is in a PP, the preposition of the PP should be &quot;by&quot; (e.g. &quot;John&quot; in &quot;the light is turned on by John&quot;). 
We consider the thematic roles to be well known and widely referenced, although slight differences might be found in various works. The intrinsic properties of the thematic roles have been discussed from various perspectives in the previous literature (Jackendoff72 and Gruber76). Grimshaw88 and Levin86 discussed the problems of thematic role marking in so-called light verbs and adjectival passives. More detailed descriptions of the thematic roles may be found in the literature. To illustrate the thematic roles, consider (2.1)-(2.9). (2.1) [Ag The robber] robbed [So the bank] of [Th the money].</Paragraph> <Paragraph position="1"> (2.2) [Th The rock] rolled down [Go the hill].</Paragraph> <Paragraph position="2"> (2.3) [In The key] can open [Th the door].</Paragraph> <Paragraph position="3"> (2.4) [Go Will] inherited [Qu a million dollars].</Paragraph> <Paragraph position="4"> (2.5) [Th The letter] finally reached [Go John].</Paragraph> <Paragraph position="5"> (2.6) [Lo The restaurant] can dine [Th fifty people]. (2.7) [Ca A fire] burned down [Th the house].</Paragraph> <Paragraph position="6"> (2.8) [Ag John] bought [Be Mary] [Th a coat] [Ma reluctantly].</Paragraph> <Paragraph position="7"> (2.9) [Ag John] promised [Go Mary] [Po to marry her]. When a training sentence is entered, arguments of lexical verbs in the sentence need to be extracted before learning. This can be achieved by invoking a syntactic processor.</Paragraph> <Paragraph position="8"> * Volition Heuristic (VH): Purposive constructions (e.g. in order to) and purposive adverbials (e.g. deliberately and intentionally) may occur in sentences with Agent arguments (Gruber76). * Imperative Heuristic (IH): Imperatives are permissible only for Agent subjects (Gruber76). 
* Thematic Hierarchy Heuristic (THH): Given a thematic hierarchy (from higher to lower) &quot;Agent > Location, Source, Goal > Theme&quot;, the passive by-phrases must reside at a higher level than the derived subjects in the hierarchy (i.e. the Thematic Hierarchy Condition in Jackendoff72). In this paper, we set up the hierarchy: Agent > Location, Source, Goal, Instrument, Cause > Theme, Beneficiary, Time, Quantity, Proposition, Manner, Result. Subjects and objects cannot reside at the same level.</Paragraph> <Paragraph position="9"> * Preposition Heuristic (PH): The prepositions of the PPs in which the arguments occur often convey good discrimination information for resolving thematic role ambiguities (see the &quot;Preposition in PP&quot; column in Table 1). * One-Theme Heuristic (OTH): An argument is preferred to be Theme if it is the only possible Theme in the argument structure.</Paragraph> <Paragraph position="10"> * Uniqueness Heuristic (UH): No two arguments may receive the same thematic role (exclusive of conjunctions and anaphora which co-relate two constituents assigned the same thematic role). If the sentence is selected from a syntactically processed corpus (such as the PENN treebank), the arguments may be directly extracted from the corpus. To identify the thematic roles of the arguments, Table 1 is consulted.</Paragraph> <Paragraph position="11"> For example, consider (2.1) as the training sentence. Since &quot;the robber&quot; is an animate NP with the subject grammatical function, it can only qualify for Ag, Go, So, and Th. Similarly, since &quot;the bank&quot; is an inanimate NP with the object grammatical function, it can only satisfy the requirements of Go, So, Th, and Re. Because of the preposition &quot;of&quot;, &quot;the money&quot; can only be Th. 
As a result, after consulting the constraints in Table 1, &quot;the robber&quot;, &quot;the bank&quot;, and &quot;the money&quot; can only be {Ag, Go, So, Th}, {Go, So, Th, Re}, and {Th} respectively. Therefore, although the clues in Table 1 may serve as a filter, many thematic role ambiguities still call for other discrimination information and resolution mechanisms.</Paragraph> </Section> <Section position="6" start_page="244" end_page="245" type="metho"> <SectionTitle> 3. FINDING EXTRA INFORMATION FOR RESOLVING THETA ROLE AMBIGUITIES </SectionTitle> <Paragraph position="0"> The remaining thematic role ambiguities should be resolved by evidence from other sources.</Paragraph> <Paragraph position="1"> Trainers and corpora are the two most commonly available sources of the extra information. Interactive acquisition has been applied in various systems in which the oracle from the trainer may reduce most ambiguities (e.g. Lang88, Liu93, Lu89, and Velardi91). Corpus-based acquisition systems may also converge to a satisfactory performance by collecting evidence from a large corpus (e.g. Brent91, Sekine92, Smadja91, and Zernik89). We are concerned with the kinds of information the available sources may contribute to thematic knowledge acquisition.</Paragraph> <Paragraph position="2"> The heuristics to discriminate thematic roles are proposed in Table 2. The heuristics suggest to the system ways of collecting useful information for resolving ambiguities. Volition Heuristic and Imperative Heuristic are for confirming the Agent role, One-Theme Heuristic is for Theme, while Thematic Hierarchy Heuristic, Preposition Heuristic and Uniqueness Heuristic may be used in a general way.</Paragraph> <Paragraph position="3"> It should be noted that, for the purposes of efficient acquisition, not all of the heuristics were identical to the corresponding original linguistic postulations. 
For example, Thematic Hierarchy Heuristic was motivated by the Thematic Hierarchy Condition (Jackendoff72) but embedded with more constraints to filter out more hypotheses. One-Theme Heuristic was a relaxed version of the statement &quot;every sentence has a theme&quot;, which might be too strong in many cases (Jackendoff87).</Paragraph> <Paragraph position="4"> Because of the space limit, we only use an example to illustrate the idea. Consider (2.1) &quot;The robber robbed the bank of the money&quot; again. As mentioned above, after applying the preliminary syntactic clues, &quot;the robber&quot;, &quot;the bank&quot;, and &quot;the money&quot; may be {Ag, Go, So, Th}, {Go, So, Th, Re}, and {Th} respectively. By applying Uniqueness Heuristic to the Theme role, the argument structure of &quot;rob&quot; in the sentence can only be (AS1) &quot;{Ag, Go, So}, {Go, So, Re}, {Th}&quot;, which means that the external argument is {Ag, Go, So} and the internal arguments are {Go, So, Re} and {Th}. Based on the intermediate result, Volition Heuristic, Imperative Heuristic, Thematic Hierarchy Heuristic, and Preposition Heuristic could be invoked to further resolve ambiguities.</Paragraph> <Paragraph position="5"> Volition Heuristic and Imperative Heuristic ask the learner to verify the validity of sentences such as &quot;John intentionally robbed the bank&quot; (&quot;John&quot; and &quot;the robber&quot; match because they have the same properties considered in Table 1 and Table 2). If the sentence is &quot;accepted&quot;, an Agent is needed for &quot;rob&quot;. Therefore, the argument structure becomes (AS2) &quot;{Ag}, {Go, So, Re}, {Th}&quot;. Thematic Hierarchy Heuristic guides the learner to test the validity of the passive form of (2.1). Similarly, since sentences like &quot;The bank is robbed by Mary&quot; could be valid, &quot;The robber&quot; is higher than &quot;the bank&quot; in the Thematic Hierarchy. 
Therefore, the learner may conclude that either AS3 or AS4 may be the argument structure of &quot;rob&quot;: (AS3) &quot;{Ag}, {Go, So, Re}, {Th}&quot; (AS4) &quot;{Go, So}, {Re}, {Th}&quot;.</Paragraph> <Paragraph position="6"> Preposition Heuristic suggests that the learner resolve ambiguities based on the prepositions of PPs. For example, it may suggest that the system confirm: &quot;The money is from the bank?&quot; If so, &quot;the bank&quot; is recognized as Source. The argument structure becomes (AS5) &quot;{Ag, Go}, {So}, {Th}&quot;.</Paragraph> <Paragraph position="7"> Combining (AS5) with (AS3) or (AS5) with (AS2), the learner may conclude that the argument structure of &quot;rob&quot; is &quot;{Ag}, {So}, {Th}&quot;.</Paragraph> <Paragraph position="8"> In summary, as the arguments of lexical heads are entered into the acquisition system, the clues in Table 1 are consulted first to reduce the hypothesis space. The heuristics in Table 2 are then invoked to further resolve the ambiguities by collecting useful information from other sources. The information that the heuristics direct the system to collect is the thematic validity of sentences that may help to confirm the target thematic roles.</Paragraph> <Paragraph position="9"> The confirmation information required by Volition Heuristic, Imperative Heuristic, and Thematic Hierarchy Heuristic may come from corpora (and of course trainers as well), while Preposition Heuristic sometimes needs information only available from trainers. This is because the derivation of new PPs might generate ungrammatical sentences not available in general corpora. 
For example, (3.1) from (2.3) &quot;The key can open the door&quot; is grammatical, while (3.2) from (2.5) &quot;The letter finally reached John&quot; is ungrammatical.</Paragraph> <Paragraph position="10"> (3.1) The door is opened by the key.</Paragraph> <Paragraph position="11"> (3.2) *The letter finally reached to John.</Paragraph> <Paragraph position="12"> Therefore, simple queries such as those above are preferred in the method.</Paragraph> <Paragraph position="13"> It should also be noted that since these heuristics only serve as guidelines for finding discrimination information, the sequence of their application does not have significant effects on the result of learning. However, the number of queries may be minimized by applying the heuristics in the order: Volition Heuristic and Imperative Heuristic -> Thematic Hierarchy Heuristic -> Preposition Heuristic. One-Theme Heuristic and Uniqueness Heuristic are invoked each time the current hypotheses of thematic roles are changed by the application of the clues, Volition Heuristic, Imperative Heuristic, Thematic Hierarchy Heuristic, or Preposition Heuristic. This is because One-Theme Heuristic and Uniqueness Heuristic are constraint-based. Given a hypothesis of thematic roles, they may be employed to filter out impossible combinations of thematic roles without using any queries. Therefore, as a query is issued by other heuristics and answered by the trainer or the corpus, the two heuristics may be used to &quot;extend&quot; the result by further reducing the hypothesis space.</Paragraph> </Section> </Paper>
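<!--
Editorial illustration, not part of the original paper. The query-free filtering performed by One-Theme Heuristic and Uniqueness Heuristic can be sketched as a small constraint-propagation loop. This is a minimal sketch under stated assumptions: the function names propagate and combine, and the encoding of a hypothesis as a list of Python sets of role abbreviations, are hypothetical choices made here for illustration. Only the role abbreviations and the worked "rob" example come from the text.

```python
def propagate(hypothesis):
    """Apply One-Theme and Uniqueness filtering until nothing changes.

    `hypothesis` is a list of candidate-role sets, one per argument
    (hypothetical encoding; the paper does not specify a data structure).
    An empty set in the result signals a contradictory hypothesis.
    """
    sets = [set(roles) for roles in hypothesis]  # work on copies
    changed = True
    while changed:
        changed = False
        # One-Theme Heuristic: a sole Theme candidate is taken to be the Theme.
        holders = [s for s in sets if "Th" in s]
        if len(holders) == 1 and len(holders[0]) > 1:
            holders[0].intersection_update({"Th"})
            changed = True
        # Uniqueness Heuristic: a role fixed for one argument is removed
        # from the candidate sets of all other arguments.
        for i, fixed in enumerate(sets):
            if len(fixed) != 1:
                continue
            role = next(iter(fixed))
            for j, other in enumerate(sets):
                if j != i and role in other:
                    other.discard(role)
                    changed = True
    return sets


def combine(a, b):
    """Intersect two hypotheses argument by argument, then re-filter."""
    return propagate([set(x) & set(y) for x, y in zip(a, b)])


# Candidate roles for "the robber", "the bank", "the money" after Table 1.
as1 = propagate([{"Ag", "Go", "So", "Th"}, {"Go", "So", "Th", "Re"}, {"Th"}])
# A trainer or corpus accepts "John intentionally robbed the bank".
as2 = propagate([{"Ag"}, as1[1], as1[2]])
# A trainer confirms that the money is from the bank.
as5 = propagate([as1[0], {"So"}, as1[2]])
final = combine(as2, as5)
```

In this sketch, as1, as2, as5, and final correspond to (AS1), (AS2), (AS5), and the concluded argument structure of "rob" in the running example. Answers to queries are modeled simply as narrowing one argument's candidate set before calling propagate again, which mirrors the idea that the two constraint-based heuristics "extend" each answered query.
-->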