File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/91/e91-1038_metho.xml
Size: 15,121 bytes
Last Modified: 2025-10-06 14:12:38
<?xml version="1.0" standalone="yes"?> <Paper uid="E91-1038"> <Title>The Semantics of Collocational Patterns for Reporting Verbs</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> II Semantic Collocations </SectionTitle> <Paragraph position="0"> Reporting verbs carry a varying amount of information regarding time, manner, factivity, reliability, etc. of the original utterance. The most unmarked reporting verb is say. The only presupposition for say is that there was an original utterance, the assumption being that this utterance is represented as closely as possible. In this sense say is even less marked than report, which in addition specifies an addressee (usually implicit from the context).</Paragraph> <Paragraph position="1"> The other members of the semantic field are set apart through their semantic collocations. Let us consider in depth the case of insist. One usage can be found in the first part of the first sentence in Figure 1, repeated here as (1).</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> (1) </SectionTitle> <Paragraph position="0"> The Bush administration continued to insist yesterday that it is not involved in negotiations over the Western hostages in Lebanon.</Paragraph> <Paragraph position="1"> The lexical definition of insist in the Longman Dictionary of Contemporary English (LDOCE) [Procter78] is insist 1: to declare firmly (when opposed), and in the Merriam-Webster Pocket Dictionary (MWDP) [Woolf74] it is insist: to take a resolute stand: PERSIST. The opposition, mentioned explicitly in LDOCE but only hinted at in MWDP, is an important part of the meaning of insist. In a careful analysis of a 250,000-word text base of TIME magazine articles from 1963 (TIMEcorpus) [Bergler90a], we confirmed that in every sentence containing insist some kind of opposition could be recovered and was supported by some other means (such as emphasis through word order). 
The most common way of expressing the opposition was through negation, as in (1) above. In an automatic analysis of the 7-million-word corpus of Wall Street Journal documents (WSJC) [Bergler90b], we found the distribution of patterns of opposition reported in Figure 2. This analysis shows that of 586 occurrences of insist throughout the WSJC, 109 were instances of the idiom insist on, which does not subcategorize for a clausal complement. Ignoring these occurrences for now, of the remaining 477 occurrences, 428 cooccur with such explicit markers of opposition as but (selecting for two clauses that stand in an opposition), not and n't, and subjunctive markers (indicating an opposition to factivity).</Paragraph> <Paragraph position="3"> [Figure 2: Negative markers with insist in WSJC. Rows: insist, all occurrences throughout the corpus; insist on, occurrences cleaned by hand that are actually instances of the idiom insist on rather than accidental co-occurrences; but, occurrences of both insist and but in the same sentence; negation, includes not and n't; subj., includes would, could, should, and be.]</Paragraph> <Paragraph position="4"> While this is a rough analysis and contains some &quot;noise&quot;, it supports the findings of our careful study on the TIMEcorpus, namely the following: (2) A propositional opposition is implicit in the lexical semantics of insist.</Paragraph> <Paragraph position="5"> This is where our proposal goes beyond traditional collocational information, as argued for recently by Smadja and McKeown [Smadja&McKeown90]. They argue for a flexible lexicon design that can accommodate both single-word entries and collocational patterns of different strength and rigidity. But the collocations considered in their proposal are all based on word cooccurrences and do not take advantage of the even richer layer of semantic collocations used in this proposal. 
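The automatic WSJC count described above can be approximated with simple pattern matching. The Python sketch below is an illustration under stated assumptions, not the authors' tool: the marker lists follow the categories named for Figure 2 (but; not and n't; would, could, should, and be), the function name opposition_profile is hypothetical, it counts sentences rather than individual occurrences, and it sets aside insist on with a regular expression, whereas the text reports that those cases were cleaned by hand.

```python
import re
from collections import Counter

# Opposition markers, following the categories named for Figure 2.
MARKERS = {
    "but": re.compile(r"\bbut\b", re.IGNORECASE),
    "negation": re.compile(r"\bnot\b|n't\b", re.IGNORECASE),
    "subjunctive": re.compile(r"\b(?:would|could|should|be)\b", re.IGNORECASE),
}
INSIST = re.compile(r"\binsist(?:s|ed|ing)?\b", re.IGNORECASE)
# Rough approximation of the insist-on idiom; the paper cleaned these by hand.
INSIST_ON = re.compile(r"\binsist(?:s|ed|ing)?\s+on\b", re.IGNORECASE)


def opposition_profile(sentences: list[str]) -> Counter:
    """Count sentences containing insist and the opposition markers they contain."""
    counts = Counter()
    for sentence in sentences:
        if not INSIST.search(sentence):
            continue
        counts["insist"] += 1
        if INSIST_ON.search(sentence):
            # The idiom takes no clausal complement, so it is set aside.
            counts["insist on"] += 1
            continue
        found = [name for name, pattern in MARKERS.items() if pattern.search(sentence)]
        for name in found:
            counts[name] += 1
        if found:
            counts["any marker"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "The Bush administration continued to insist yesterday that it is not involved.",
        "He insisted on a recount.",
    ]
    print(opposition_profile(sample))
```

Subtracting counts["insist on"] from counts["insist"] gives the analogue of the 477 remaining occurrences, and counts["any marker"] the analogue of the 428 that cooccur with an opposition marker. As the text says of the original analysis, such surface matching is rough and will contain some noise.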
Semantic collocations are harder to extract than cooccurrence patterns; the state of the art does not enable us to find semantic collocations automatically.¹ This paper, however, argues that if we take advantage of the lexical paradigmatic behavior underlying the lexicon, we can at least achieve semi-automatic extraction of semantic collocations (see also Calzolari and Bindi (1990) and Pustejovsky and Anick (1990) for a description of tools for such semi-automatic acquisition of semantic information from a large corpus).</Paragraph> <Paragraph position="6"> ¹ But note the important work by Hindle [Hindle90] on extracting semantically similar nouns based on their substitutability in certain verb contexts. We see his work as very similar in spirit.</Paragraph> <Paragraph position="7"> Using qualia structure as a means for structuring different semantic fields for a word [Pustejovsky89], we can summarize the discussion of the lexical semantics of insist with a preliminary definition, making explicit the underlying opposition to the assumed context (here denoted by C/) and the fact that insist is a reporting verb.</Paragraph> <Paragraph position="8"> In the previous section we argued that certain semantic collocations are part of the lexical semantics of a word. In this section we will show that reporting verbs as a class allow logical metonymy [Pustejovsky91] [Pustejovsky&Anick88]. An example can be found in (1), where the metonymy is found in the subject NP. 
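The logical metonymy in (1) can be illustrated with a small check on qualia roles. This is a minimal sketch that assumes a set-of-features encoding the paper does not specify (Qualia and source_reading are hypothetical names); the values are paraphrased from the lexical definition of administration given as (4), where a reporting verb needs a +human source that administration supplies only through its constitutive role.

```python
from dataclasses import dataclass, field


@dataclass
class Qualia:
    """Hypothetical encoding of the four qualia roles as feature sets and strings."""
    formal: set[str] = field(default_factory=set)
    constitutive: set[str] = field(default_factory=set)
    telic: str = ""
    agentive: str = ""


# Values paraphrased from the lexical definition of "administration" in (4).
ADMINISTRATION = Qualia(
    formal={"+plural", "part_of:institution"},
    constitutive={"+human", "executives", "officials"},
    telic="execute(x, orders(y))",
    agentive="appoint(y, x)",
)
PERSON = Qualia(formal={"+human"})


def source_reading(np_type: Qualia) -> str:
    """Decide how an NP can fill the source (subject) slot of a reporting verb.

    The utterer must be +human. If the formal role supplies that feature, the
    reading is literal; if only the constitutive role does, the NP is licensed
    metonymically; otherwise it is rejected as a source.
    """
    if "+human" in np_type.formal:
        return "literal"
    if "+human" in np_type.constitutive:
        return "metonymic"
    return "rejected"
```

An NP whose qualia contain no +human feature in either role, such as a food item, is rejected, which is in line with the observation that metonymy in newspaper text is far more restricted than the ham sandwich example suggests.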
The Bush administration is a compositional object of type administration, which is defined roughly as in (4).</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> (4) (Lexical Definition) </SectionTitle> <Paragraph position="0"> administration [Form: +plural, part of: institution] [Telic: execute(x, orders(y)), where y is a high official in the specific institution] [Constitutive: +human executives, officials, ...] [Agentive: appoint(y, x)] In its formal role, at least, an administration does not fulfill the requirements for making an utterance; only in its constitutive role is there the attribute [+human], allowing for the metonymic use. Although metonymy is a general device, in that it can appear in almost any context and make use of associations never considered before,² a closer look at the data reveals that metonymy as used in newspaper articles is much more restricted and systematic, corresponding very closely to logical metonymy [Pustejovsky89].</Paragraph> <Paragraph position="1"> ² As the well-known example &quot;The ham sandwich ordered another coke.&quot; illustrates.</Paragraph> <Paragraph position="2"> Not all reporting verbs use the same kind of metonymy, however. Different reporting verbs select for different semantic features in their source NPs. More precisely, they seem to distinguish between a single person, a group of persons, and an institution. We confirmed this preference on the TIMEcorpus, automatically extracting all the sentences containing one of seven reporting verbs and analyzing these data by hand. While the number of occurrences of each reporting verb was much too small to deduce the verb's lexical semantics, the data nevertheless exhibited interesting tendencies.</Paragraph> <Paragraph position="3"> Figure 3 shows the distribution of the degree of animacy. The numbers indicate the percentage of the verb's total occurrences, i.e., 
in 100 sentences that contain insist as a reporting verb, 57 have a single person as their source.</Paragraph> <Paragraph position="4"> [Figure 3: Degree of animacy of the source NP for seven reporting verbs in the TIMEcorpus, in percent; columns: person, group, institution, other.] The significance of the results in Figure 3 is that semantically related words have very similar distributions and that this distribution differs from the distribution of less related words. Admit, deny, and insist then fall into one category that we call here informally [-inst], say and tell fall into [+person], and claim and announce fall into a not yet clearly marked category [other]. We are currently implementing statistical methods to perform similar analyses on WSJC. We hope that the imprecision of an automated analysis using statistical methods will be counterbalanced by very clear results.</Paragraph> <Paragraph position="5"> The TIMEcorpus also exhibited a preference for one particular metonymy, which is of special interest for reporting verbs, namely where the name of a country, of a country's citizens, of a capital, or even of the building in which the government resides stands for the government itself. Examples are Great Britain / The British / London / Buckingham Palace announced .... Figure 4 shows the preference of the reporting verbs for this metonymy in subject position. Again the numbers are too small to say anything about each lexical entry, but the difference in preference is strong enough to suggest that it is not only due to the specific style of the magazine, but that some metonymies form strong collocations that should be reflected in the lexicon. Such results in addition provide interesting data for preference-driven semantic [...] [Figure 4: Country, countrymen, or capital standing for the government in subject position of 7 reporting verbs.] IV A Source NP Grammar The analysis of the subject NPs of all occurrences of the 7 verbs listed in Figure 3 displayed great regularity in the TIMEcorpus. 
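The percentages in Figure 3, and the claim that related verbs have similar distributions, can be made concrete as follows. This is a minimal sketch assuming each occurrence of a verb has already been hand-labeled with one of the four source categories; the comparison measure (half the L1 distance between percentage profiles) is an illustrative choice of ours and not the statistical method the authors mention.

```python
from collections import Counter

CATEGORIES = ("person", "group", "institution", "other")


def animacy_profile(labels: list[str]) -> dict[str, float]:
    """Percentage of a verb's occurrences whose source NP falls into each category.

    Labels outside CATEGORIES still count toward the total but get no column.
    """
    if not labels:
        raise ValueError("no occurrences to profile")
    counts = Counter(labels)
    return {c: 100.0 * counts[c] / len(labels) for c in CATEGORIES}


def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two profiles: 0 if identical, 100 if disjoint."""
    return sum(abs(p[c] - q[c]) for c in CATEGORIES) / 2
```

Under this measure, verbs grouped in the same informal class, such as admit, deny, and insist, would be expected to show small pairwise distances, and verbs from different classes larger ones.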
Not only was the logical metonymy discussed in the previous section pervasive, but a fairly rigid semantic grammar for the source NPs also emerged. Two rules of this semantic grammar are listed in Figure 5.</Paragraph> <Paragraph position="6"> [Figure 5: source → [quant] [mod] descriptor [&quot;,&quot; name &quot;,&quot;] | [descriptor | ((a | the) mod)] [mod] name | [inst's | name's] descriptor [name] | ...] The grammar exemplified in Figure 5 is partial: it only captures the regularities found in the TIMEcorpus. Source NPs, like all NPs, can be adorned with modifiers, temporal adjuncts, appositions, and relative clauses of any shape. The important observation is that these cases are very rare in the corpus data and must be dealt with by general (i.e. syntactic) principles.</Paragraph> <Paragraph position="7"> The value of a specialized semantic grammar for source NPs is that it provides a powerful interface between lexical semantics, syntax, and compositional semantics. Our source NP grammar compiles different kinds of knowledge. It spells out explicitly that logical metonymy is to be expected in the context of reporting verbs. Moreover, it restricts possible metonymies: the ham sandwich is not a typical source with reporting verbs. The source grammar also gives a likely ordering of pertinent information, roughly</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> COUNTRY/LOCATION ALLEGIANCE INSTITUTION POSITION NAME. </SectionTitle> <Paragraph position="0"> This information essentially defines the schema for the representation of the source in the knowledge extraction domain.</Paragraph> <Paragraph position="1"> We are currently applying this grammar to the data in WSJC in order to see whether it is specific to the TIMEcorpus. 
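The source rule of Figure 5 can be checked mechanically once the words of a subject NP have been mapped to categories. The sketch below assumes such a mapping already exists (the tag names, including det for a or the and inst_poss and name_poss for the possessive forms, are hypothetical), encodes only the three alternatives that are legible in Figure 5, and tests them with Python regular expressions over the tag sequence.

```python
import re
from typing import Optional

# Hypothetical tag set: quant, mod, descriptor, name, det (for "a"/"the"),
# comma, inst_poss (an institution's) and name_poss (a name's).
SOURCE_RULES = [
    # [quant] [mod] descriptor ["," name ","]
    re.compile(r"(quant )?(mod )?descriptor( comma name comma)?"),
    # [descriptor | ((a | the) mod)] [mod] name
    re.compile(r"(descriptor |det mod )?(mod )?name"),
    # [inst's | name's] descriptor [name]
    re.compile(r"((inst_poss|name_poss) )?descriptor( name)?"),
]


def match_source(tags: list[str]) -> Optional[int]:
    """Return the number (1-3) of the first alternative covering all tags, else None."""
    sequence = " ".join(tags)
    for number, rule in enumerate(SOURCE_RULES, start=1):
        if rule.fullmatch(sequence):
            return number
    return None
```

For example, the tag sequence descriptor name (a title followed by a personal name) matches the second alternative. NPs with relative clauses or other rare adornments fall outside the rule and, as the text notes, would have to be handled by general syntactic principles.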
Preliminary results on WSJC were encouraging: the adjustments needed so far consisted only of small enhancements, such as adding locative PPs at the end of a descriptor.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> V LCPs: Lexical Conceptual Paradigms </SectionTitle> <Paragraph position="0"> The data that led to our source NP grammar were essentially collocational material: we extracted the subject NPs for a set of verbs, analyzed the lexicalization of the source, and generalized the findings. In this section we will justify why we think that the results can properly be generalized and what impact this has on the representation in the lexicon.</Paragraph> <Paragraph position="1"> It has been noted that dictionary definitions form a (usually shallow) hierarchy [Amsler80]. Unfortunately, explicitness is often traded for conciseness in dictionaries, and conceptual hierarchies cannot be extracted automatically from dictionaries alone. Yet for a computational lexicon, explicit dependencies in the form of lexical inheritance are crucial [Briscoe&al.90] [Pustejovsky&Boguraev91]. Following Anick and Pustejovsky (1990), we argue that lexical items having related, paradigmatic syntactic behavior enter into the same lexical conceptual paradigm (LCP). This states that items within an LCP will have a set of syntactic realization patterns for how the word and its conceptual space (e.g. presuppositions) are realized in a text. For example, reporting verbs form such a paradigm.</Paragraph> <Paragraph position="3"> In fact, the definition of an individual word often stresses the difference between it and its closest synonym rather than giving a constructive (decompositional) definition (see LDOCE).⁴ Given these assumptions, we will revise our definition of insist in (3). We introduce an LCP (i.e. a semantic type), REPORTING VERB, which spells out the core semantics of reporting verbs. 
The LCP also makes explicit reference to the source NP grammar discussed in Section IV as the default grammar for the subject NP (in active voice). This general template allows us to define the individual lexical entry concisely, in a form close to normal dictionary definitions: deviations and enhancements, as well as restrictions of the general pattern, are expressed for the individual entry, so that a comparison between two entries focuses on the differences in entailments.</Paragraph> <Paragraph position="4"> [...] to the assumed proposition in the context, tb: insist only specifies an opposition, whereas deny actually negates that proposition. The entries also reflect their common preference not to participate in the metonymy that allows institutions to appear in subject position. Note that opposed and negate are not assumed to be primitives; these predicates are themselves decomposed further in the lexicon.</Paragraph> <Paragraph position="5"> ⁴ The notion of LCPs is of course related to the idea of semantic fields [Trier31].</Paragraph> <Paragraph position="6"> Insist (and other reporting verbs) &quot;inherit&quot; much structural information from their semantic type, i.e. the LCP REPORTING VERB. It is the semantic type that actually provides the constructive definition, whereas the individual entries only define refinements on the type. This follows standard inheritance mechanisms for inheritance hierarchies [Pustejovsky&Boguraev91] [Evans&Gazdar90].</Paragraph> <Paragraph position="7"> Among other things, the LCP REPORTING VERB specifies our specialized semantic grammar for one of its constituents, namely the subject NP in nonpassive usage. 
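The division of labor between the LCP and the individual entries can be sketched as default inheritance with overrides. In this Python sketch, LexicalEntry, refine, and differences are hypothetical names, the field values are simplified paraphrases, and the default set of source types for the LCP is an assumption; insist and deny override only their source types and entailments, so comparing them surfaces just the difference in entailments described in the text.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class LexicalEntry:
    """Simplified entry; an individual verb copies the LCP and overrides fields."""
    lemma: str
    core: str = "report(source, utterance)"
    subject_grammar: str = "source_np"  # default grammar for the active-voice subject
    source_types: frozenset = frozenset({"person", "group", "institution"})
    entailments: tuple = ()


# The LCP REPORTING VERB is the template that supplies the defaults.
REPORTING_VERB = LexicalEntry(lemma="REPORTING VERB")


def refine(lcp: LexicalEntry, lemma: str, **changes) -> LexicalEntry:
    """Derive an individual entry, overriding only what differs from the LCP."""
    return replace(lcp, lemma=lemma, **changes)


def differences(a: LexicalEntry, b: LexicalEntry) -> dict:
    """Fields (other than the lemma) in which two entries differ."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if f.name != "lemma" and getattr(a, f.name) != getattr(b, f.name)
    }


# Both verbs restrict the LCP's metonymy: no institutions in subject position.
NO_INSTITUTIONS = REPORTING_VERB.source_types - {"institution"}
insist = refine(REPORTING_VERB, "insist", source_types=NO_INSTITUTIONS,
                entailments=("opposed(utterance, context_proposition)",))
deny = refine(REPORTING_VERB, "deny", source_types=NO_INSTITUTIONS,
              entailments=("negate(utterance, context_proposition)",))
```

Because the entries are frozen copies of the template, anything an entry does not override, such as the subject grammar, is taken from the LCP unchanged.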
Attaching this grammar to the LCP not only enhances the tools available to a parser by providing semantic constraints useful for delimiting constituents, but also provides an elegant way to state explicitly which logical metonymies are common with a given class of words.</Paragraph> </Section> </Section> </Paper>