<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0309"> <Title>Qualia Structure and the Compositional Interpretation of Compounds</Title> <Section position="5" start_page="79" end_page="80" type="metho"> <SectionTitle> 3 Telic Qualia Modification </SectionTitle> <Paragraph position="0"> In order to illustrate our approach, we will start with examples such as bread knife (1a), in which the modifying noun relates to the purpose of the head noun. The preferred interpretation of this compound is that it is a knife which is used to cut bread. The fact that a knife is an object whose inherent purpose is to cut things is encoded by the predicate cut_act in the TELIC role (see (3) above). The function of the modifier bread is to specify the third argument of the cut_act relation.</Paragraph> <Paragraph position="1"> The feature structure associated with bread knife will be as in (4). The first default argument D-ARG1 has been specialized from physobj to bread and this value is structure-shared with the third argument position in the cut_act predicate.</Paragraph> <Paragraph position="3"> In the GL representation, all of the participants which show up in the predicates in qualia are listed as default argument parameters in the ARGSTR.</Paragraph> <Paragraph position="4"> In order to account for the availability of compound forms in English, we utilize a family of phrase structure schemata. These schemata are essentially the same kind of entity as the Immediate Dominance Schemata employed in Head-driven Phrase Structure Grammar (Pollard and Sag 1994). They license the availability of complex nominals, which we treat as phrasal signs, and are essentially phrase structure rules. Compounds are licensed and interpreted as part of the process of parsing.</Paragraph> <Paragraph position="5"> The combination of words into compound forms could also be captured using lexical rules (Flickinger 1987, Pollard and Sag 1987). 
We have chosen to use phrase structure schemata rather than lexical rules on the basis of storage considerations. Each lexical rule used for compounds will license a great many modifiers for a large number of potential heads. If the lexical rules are used at a pre-compilation stage in order to flesh out the lexicon, allowing lexical rules for compounds will result in a massive increase in the size of the lexicon: for each noun, a huge number of compound forms will be generated. If lexical rules for compounds are instead allowed to apply at runtime during the parsing process, the storage problem is avoided, but the rules are then really no different from phrase structure schemata.</Paragraph> <Paragraph position="6"> We will show the schemata as rules here. They can also be encoded as single feature structures. The basic structure of the schemata licensing the combination of nouns to form noun compounds is as in (5).</Paragraph> </Section> <Section position="6" start_page="80" end_page="80" type="metho"> <SectionTitle> MODIFIER NOUN HEAD </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"> The schemata differ with respect to the constraints placed on the CONTENT values and the way in which the CONTENT values of the head and the modifier are composed to generate the CONTENT for the compound as a whole. The availability of compound forms such as bread knife, where the modifier specifies an argument in the TELIC, is accounted for by the schema in (6).</Paragraph> </Section> <Section position="7" start_page="80" end_page="81" type="metho"> <SectionTitle> MODIFIER NOUN HEAD </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"> In this notation, the structures describing semantic types are the values of an attribute CONTENT, and ORTH specifies the orthographic form. The CONTENT of the resulting compound is inherited from the head noun. 
In order to access the argument in the TELIC, the CONTENT value of the modifier is structure-shared with the first default argument in the CONTENT of the head.</Paragraph> <Paragraph position="3"> The modifying noun must be of semantic type individual and its CONTENT value is structure-shared with the D-ARG1 in the ARGSTR of the resulting compound. The lexical representation of the compound also contains an attribute DTRS containing a HEAD and a MOD value. These are structure-shared with the lexical representations for the head noun and the modifying noun respectively.</Paragraph> <Paragraph position="4"> This schema is one of a number which are used to license this kind of modification of default arguments. There will also be schemata for modification of other default arguments. The fact that the CONTENT of the compound always comes from the head noun is captured by having the compound phrase structure schemata, which are themselves implemented as types, all inherit the constraint specified by the structure-sharing index that identifies the CONTENT of the compound with that of the head.</Paragraph> <Paragraph position="5"> As we saw before, if the modifier specifies an argument in the TELIC qualia role, the preposition in Italian is da. In order to account for the Italian forms, as in the English case, we utilize phrase structure schemata. In this case, the schema (7) specifies that the sequence HEAD NOUN, da, MODIFYING NOUN can be interpreted as having the semantic content of the modifying noun specify one of the arguments within the TELIC role.</Paragraph> </Section> <Section position="8" start_page="81" end_page="81" type="metho"> <SectionTitle> HEAD MODIFIER NOUN </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"> The indeterminacy with respect to which argument in the TELIC is coindexed with the modifier in schema (7) is a shorthand representation. 
A number of phrase structure schemata are in fact used, each specifying the linking to a different argument position in the TELIC.</Paragraph> <Paragraph position="3"> For Italian, the nature of the modification can alternatively be directly encoded in the lexical entry for the preposition. The composition could then be licensed by a more general phrase structure schema which would work with all of the different prepositions.</Paragraph> </Section> <Section position="9" start_page="81" end_page="82" type="metho"> <SectionTitle> 4 Agentive Qualia Modification </SectionTitle> <Paragraph position="0"> Compounds such as bullet hole and lemon juice (1c,d), in which the modifier relates to the origin or bringing about of the object described by the head noun, are treated as modification of the AGENTIVE role. In the case of lemon juice, the head juice will have a squeeze_act as its AGENTIVE and the object squeezed will be listed as a default argument. The function of the modifying noun lemon is to further subtype this argument. This is possible because lemon is a subtype of fruit.</Paragraph> <Paragraph position="1"> These English forms will be accounted for by another schema licensing default argument type specification, like that in (6) above. The resulting representation for lemon juice is as in (8). The corresponding forms in Italian utilize the preposition di. The Italian forms are accounted for by a schema like (7), except that the preposition is di and the linkage is to the AGENTIVE qualia role.</Paragraph> <Paragraph position="3"/> </Section> <Section position="10" start_page="82" end_page="82" type="metho"> <SectionTitle> 5 Constitutive Qualia Modification </SectionTitle> <Paragraph position="0"> Another common function of modifiers in complex nominals is to specify a subpart of the denotation of the head noun or the material of which it is composed. 
Examples of this are given in (1e,f).</Paragraph> <Paragraph position="1"> In our treatment, this involves modification of the CONSTITUTIVE role. The prepositions used in Italian for this sort of modification are a and al. The modifiers glass and silicon denote materials.</Paragraph> <Paragraph position="2"> When composed with nominals such as door and breast they specify elements of the CONSTITUTIVE role. For example, glass door is represented as in (9). These forms are licensed using further phrase structure schemata for English and Italian.</Paragraph> <Paragraph position="4"> The basic pattern established so far is that modification of TELIC, AGENTIVE, and CONSTITUTIVE involves da, di, and a, respectively. This is a useful generalization, but the correspondence between the different qualia roles and different choices of preposition in Italian is not as clear cut as this suggests. In the examples of TELIC qualia modification considered so far (1a,b), the modifying noun was always of type individual. Matters become more complex when compounds in which the modifying noun describes an event are considered. These are addressed in the next section.</Paragraph> </Section> <Section position="11" start_page="82" end_page="84" type="metho"> <SectionTitle> 6 Telic Event Modifiers </SectionTitle> <Paragraph position="0"> In some forms where the modifier describes an event, the appropriate preposition in Italian is da, as in the forms in (10), while in others the preposition is di, as in the forms in (11).</Paragraph> <Paragraph position="1"> (10) a. hunting rifle: fucile da caccia; b. race car: macchina da corsa; c. carving wood: legno da intaglio. (11) a. destruction weapons: armi di distruzione; b. credit card: carta di credito; c. rest home: casa di riposo; d. concentration camp: campo di concentramento; e. divorce procedure: procedura di divorzio. In general, the TELIC use of the preposition di appears to select consistently for modifiers which denote events. 
Even though this does not yet explain the difference between (10) and (11), it already provides us with a restriction on the use of prepositions. In other words, da selects for any type, while di is restricted to events. We assume the Vendlerian distinction between activities, states, accomplishments, and achievements. In addition, we adopt a decompositional view of event structure, as outlined in Pustejovsky (1991), in which the event structure representation of a lexical item makes reference to the configurational properties of subevents and arguments. In this framework, which allows us to make fine-grained distinctions between event types, we can determine the selectional properties of di and da on the basis of the event type of the modifiers. Nominals such as hunting, race, and carving describe activities. Nominals such as destruction, credit, and so on, in (11) above, describe the result of an activity. This distinction arises quite clearly in the glosses of (10) and (11). Compound forms such as hunting rifle or race car in (10) describe, respectively, an instrument which is used when hunting and a vehicle that is driven for the purpose of racing. Conversely, the reading of the compounds in (11) makes explicit the result which is achieved by using a particular object. In particular, (11a) refers to weapons that bring about destruction; (11b) to a card that brings about a credit, and so on.</Paragraph> <Paragraph position="2"> Unlike the operation which derives bread knife by associating the modifier to an argument position in the TELIC role of knife, the compositional operations which involve events produce a more complex structure. We argue that compounds where the modifying noun describes an event, such as those in (10), involve co-composition of the qualia structures of the head and the modifier. The resulting representation has a complex TELIC role with &quot;sub-qualia&quot;. 
In the case of hunting rifle, the TELIC of rifle, which is fire, provides the AGENTIVE within the TELIC of the compound. The modifier hunting is a process nominal and provides hunt as the TELIC within the TELIC of the compound. Through the application of phrase structure schemata which constrain this co-composition, we obtain the representation in (12) for hunting rifle.</Paragraph> <Paragraph position="3"> * hunting rifle</Paragraph> <Paragraph position="5"> The interpretation of the compound form hunting rifle can be glossed as follows: &quot;a rifle which is used in its typical capacity (i.e. firing) for the purpose of performing the activity of hunting.&quot; The assignment of a complex structure to an individual quale is coherent with the general interpretation of qualia structure. Exploiting these recursive properties of event-denoting qualia is not merely an ad hoc move to account for the interpretation of complex nominals but is also motivated by the behavior of agentive nominals and their semantic contribution in context (cf. Busa 1996).</Paragraph> <Paragraph position="6"> The modifying noun in Italian complex nominals with the preposition di describes the result that is achieved by performing the particular function associated with the head noun. The nominal destruction, in (11a), unlike the event nouns hunting and race which denote activities, is the nominalization of the transitional event denoted by the verb destroy. The two subevents, namely the process and the resulting state, in the event structure representation of the verb, are encoded in the nominalized form as separate events in the AGENTIVE and FORMAL roles, and they are related by the relation of temporal precedence &lt;∝. As argued in Pustejovsky (1995), this representation gives rise to the polysemous behavior of the nominal. It alternates between a process and a result interpretation. 
In destruction weapon, the embedded AGENTIVE in the TELIC is again the TELIC of the head weapon, and the embedded TELIC is the resulting state from the semantics of destruction.</Paragraph> <Paragraph position="7"> The resulting TELIC is a process-result-lcp, as shown in (13).</Paragraph> <Paragraph position="9"> The analysis of AGENTIVE modification is also more complex. In addition to di, della is also found for subtyping of arguments in the AGENTIVE. In other cases, such as morte da annegamento (death from drowning) and bruciatura da sole (sunburn), the preposition is da. This preposition da has a different meaning from the one associated with the TELIC. It corresponds to the English preposition from and it is interpreted as introducing an experiencing relation. It is found in cases where the head noun is an event and the modifier introduces the causal factor which brought about that event. We turn now to consider some of the applications of this work in more detail.</Paragraph> </Section> <Section position="12" start_page="84" end_page="86" type="metho"> <SectionTitle> 7 Applications </SectionTitle> <Paragraph position="0"> The analysis of complex nominal constructions presented in this paper has a range of important applications in natural language processing. Complex nominals play an important role in the encapsulation and expression of nominal concepts and are frequent in a wide variety of types of texts. Therefore, the ability to handle complex nominals is essential for parsing and generation systems for either English or Italian. It is important to note that systems utilizing compositional apparatus for the analysis of complex nominals need not treat all compounds compositionally. The optimal arrangement will be to list frequent and idiosyncratic compound forms in the lexicon and use the compositional apparatus for forms which are not listed, or in instances when the listed interpretation is ruled out by context. 
We would also like to point out that we do not expect to develop an analysis which will handle each and every compound form. Our target is to have an account which will handle the majority of productive compounding patterns. Another important use of the compositional apparatus described here is in lexical acquisition of compound forms.</Paragraph> <Paragraph position="1"> This machinery can be used to indicate potential interpretations for compounds. A human editor can then select the appropriate interpretation from the candidate set and have the compound added to the lexicon.</Paragraph> <Paragraph position="2"> Given the range of different semantic relations that can hold between the elements of a complex nominal, complex nominals are frequently ambiguous. English compounds are worse than Italian post-modified forms in this respect, since in Italian the preposition gives at least some indication of the relation involved. The approach described in this paper constrains the interpretation of complex nominals using the type system. For example, the schema in (6), which accounts for bread knife, requires the modifying noun to be typed as individual. This limits the set of potential modifiers to those typed as individual. Since the content of the modifier is structure-shared with an argument position within the TELIC, this set of potential modifiers is further constrained by type constraints imposed by the relation in the TELIC role. The cut_act will require the object cut to be a separable object. It could potentially require the cutter to be significantly harder than the object to be cut. Type constraints of this kind serve to greatly reduce the degree of ambiguity in a given complex nominal, but it will still generally be the case that more than one interpretation is predicted for a given form. For example, a form like bone knife could be interpreted either as a knife used for cutting bone or a knife made of bone. 
The approach described here needs to be integrated with further mechanisms and heuristics in order to determine the best guess for complex nominal interpretation in any given case. One important class of mechanisms consists of those which examine the current sentential and discourse context in order to restrict the range of interpretations. For example, if bone knife appears in a medical text, bone most probably specifies the object to be cut by the knife, while if it shows up in a text concerning prehistoric man, bone most probably refers to the constitution of the knife. One way in which compounds can be further disambiguated is through the incorporation of a statistical model as one of the heuristics employed in determining the appropriate interpretation. In such an approach, one could train on a data set consisting of compounds paired with an indication of the relation holding between the head and the modifier. The resulting model would provide the probability that a given complex nominal involves a particular kind of modification relation. In order to have useful predictive power, it would be best to assign semantic types to the elements of the complex nominal and determine the probability that a complex nominal consisting of words of types A and B involves modification relation C. Given the sparsity of data to support a statistically based approach, we believe that the way forward in this area is to pursue the integration of a rule-based approach with a statistical model. Such integration has already proven effective in the treatment of sense extension phenomena (Copestake and Briscoe 1995). We leave further investigation of this integration for future work.</Paragraph> <Paragraph position="3"> This work also has important consequences for applications in multilingual natural language processing. The most obvious of these is the use of a cross-linguistic approach to complex nominals in machine translation. 
Translation of complex nominals from Italian to English will be more straightforward, since there is a loss of information rather than a gain. It is important to note, however, that not all Italian complex nominals involving post-modification can be translated as noun-noun compounds in English. For example, forms such as coltello da macellaio (literally, knife of butcher), in which the modifier is an agent using the object described by the head, do not translate as butcher knife. In English, the appropriate nominal construction in this case uses the possessive: butcher's knife.</Paragraph> <Paragraph position="4"> Translation from English to Italian is substantially more difficult given the difference in explicitness regarding the semantic relation between the head and modifier. In order to generate the proper output in Italian, it is necessary to determine the relation between the elements in the English compound structure and to determine the appropriate preposition in Italian for expression of that relation. One approach to this task is to use the GL representation language essentially as an interlingua (McDonald 1995). The phrase structure schemata for English are used in order to determine potential interpretations for a given English compound construction. The most likely interpretation from the candidate set is picked on the basis of contextual and statistical models.</Paragraph> <Paragraph position="5"> The CONTENT of the chosen candidate is then matched against the outputs of the various phrase structure schemata used for Italian. When an appropriate schema is identified, it is instantiated with lexical items from the Italian lexicon in order to generate the Italian translation. An important feature of this approach is that it utilizes resources which are independently needed for analysis of the languages involved. Aside from translation, the phrase structure schemata can also be used for multi-lingual generation. 
If a particular concept is encoded in the GL lexical representation language, the language-specific phrase structure schemata can be employed to generate the corresponding complex nominal in each language.</Paragraph> <Paragraph position="6"> In addition to the importance of successful translation of complex nominals for full-text machine translation, this functionality is useful in itself for applications in multi-lingual information retrieval and information extraction. Since complex nominals are so frequently used to coin terms which encapsulate important distinguished concepts within a domain, their successful identification and processing is an essential element of determining the topic of a text, and they provide important hooks for information retrieval. In a multi-lingual setting, such as information retrieval over the World Wide Web, it may be desirable for a search for a complex nominal from one language to yield documents regarding the same concept in other languages. The approach to translation of complex nominals described above enables this functionality. For a given compound form in English it is possible to determine potential realizations of that form in Italian.</Paragraph> </Section> </Paper>