<?xml version="1.0" standalone="yes"?>
<Paper uid="P89-1023">
  <Title>COMPUTER AIDED INTERPRETATION OF LEXICAL COOCCURRENCES</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
COMPUTER AIDED INTERPRETATION OF LEXICAL COOCCURRENCES
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="191" type="abstr">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> This paper addresses the problem of developing a large semantic lexicon for natural language processing. The increas~g availability of machine readable documents offers an opportunity to the field of lexieal semantics, by providing experimental evidence of word uses (on-line texts) and word definitions (on-line dictionaries).</Paragraph>
    <Paragraph position="1"> The system presented hereafter, PETRARCA, detects word e.occurrences from a large sample of press agency releases on finance and economics, and uses these associations to build a ease-based semantic lexicon. Syntactically valid cooccurenees including a new word W are detected by a high-coverage morphosyntactic analyzer. Syntactic relations are interpreted e,g. replaced by case relations, using a a catalogue of patterns/interpretation pairs, a concept type hierarchy, and a set of selectional restriction rules on semantic interpretation types.</Paragraph>
    <Paragraph position="2"> Introduction Semantic knowledge codification for language processing requires two important issues to be  considered: 1. Meaning representation. Each word is a world: how can we conveniently circumscribe the semantic information associated to a lexic,;d entry? 2. Acquisition. For a language processor, to implement a useful application, several thousands of terms must have an entry in the semantic lexicon: how do we cope with one such a prohibitive task?  The problem of meaning representation is one which preoccupied scientists of different disciplines since the early history of human culture. We will not attempt an overall survey of the field of semantics, that provided material for many fascinating books; rather, we will concentrate On the computer science perspective, i.e. how do we go about representing language expressions on a computer, in a way that can be useful for natural language processing applications, e.g. machine translation, information retrieval, user-friendly interfaces.</Paragraph>
    <Paragraph position="3"> In the field of computational linguistics, several approaches were followed for representing semantic knowledge. We are not concerned here with semantic languages, which are relatively well developed; the diversity lies in the meaning representation principles. We will classify the methods of meaning representations in two categories: conceptual (or deep) and coilocative (or surface). The terms &amp;quot;conceptual&amp;quot; and &amp;quot;collocative&amp;quot; have been introduced in \[81; we decided to adopt an existing terminology, even though our interpretation of the above two categories is broader than for their inventor.</Paragraph>
    <Paragraph position="4">  1. Conceptual Meaning Conceptual meaning is the cognitive content of words; it can be expressed  by features or by primitives. Conceptual meaning is &amp;quot;deep&amp;quot; in that it expresses phenomena that are deeply embedded in language.</Paragraph>
    <Paragraph position="5">  2. Collocatlve meaning. What is communicated  through associations between words or word classes. Coilocative meaning is &amp;quot;superficial&amp;quot; in that does not seek for &amp;quot;the deep sense&amp;quot; of a word, but rather it &amp;quot;describes&amp;quot; its uses in everyday language, or in some sub-w, rid language (economy, computers, etc.). It provides more than a simple analysis of cooccurr~aces, because it attempts an explanation of word associations in terms of conceptual relations between a lexical item and other items or classes.</Paragraph>
    <Paragraph position="6"> Both conceptual and collocative meaning representations are based on some subjective, human-produced set of primitives (features, conceptual dependencies, relations, type hierarchies etc.) on which there is no shared agreement at the current state of the art. As far as conceptual meaning is concerned, the quality and quantity of phenomena to be shown in a representation is subjective as well. On the contrary, surface meaning can rely on the solid evidence represented by word associations; the interpretation of an association is subjective, but valid associations arc an observable, even though vast, phenomenon. To confu'm this, one can notice that different implementations of lexicons based on surface meaning are surprisingly similar, whereas conceptual lexicons arc very dishomogeneous.</Paragraph>
    <Paragraph position="7"> In principle, the inferential power of collocative, or surface \[18\] meaning representation is lower than for conceptual meaning. In our previous work on semantic knowledge representation, however, \[10l \[18\] \[12\] we showed that a semantic dictionary in the style of surface meaning is a useful basis for semantic interpretation.</Paragraph>
    <Paragraph position="8"> The knowledge power provided by the semantic lexicon (limited to about I000 manually entered defmitions) was measured by the capability of the language processor DANTE \[2\] \[18\] \[11\] to answer a variety of questions concerning previously analyzed sentences (press agency releases on finance and economics). It was found that, even though the system was unable to perform complex inferences, it could successfully answer more than 90% of the questions \[12\]L In other terms, surface semantics seems to capture what, at first glance, a human reader understands of a piece of text.</Paragraph>
    <Paragraph position="9"> In\[26\] , the usefulness of this meaning representation method is demonstrated for TRANSALTOR, a system used for machine translation in the field of computers.</Paragraph>
    <Paragraph position="10"> An important advantage of surface meaning is that makes it easier the acquisition of the semantic lexicon. This issue is examined in the next section. Acquisition of Lexical Semantic Knowledge.</Paragraph>
    <Paragraph position="11"> Acquiring semantic knowledge on a systematic basis is quite a complex task. One needs not to look at metaphors or idioms to fred this; even the interpretation of apparently simple sentences is riddled with such difficulties that makes it hard even cutting out a piece of the problem. A manual codification of the lexicon is a prohibitive task, regardless of the framework adopted for semantic knowledge representation; even when a large team of knowledge enters is available, consistency and completeness are a major problem. We believe -that automatic, or semi-automatic acquisition of the lexicon is a critical factor in determining how widespread the use of natural language processors will be in the next few years. ' Recently a few methods were presented for computer aided semantic knowledge acquisition. A widely used approach is accessing on-line dictionary defmitions to solve ambiguity problems \[3\] or to derive type hierarchies and semantic features \[24\]. The information presented in a standard dictionary has in our view some intrinsic limitation: s definitions are often circular e.g. the definition of a term A may refer to a term B that in turn points to A; * definitions are not homogeneous as far as the quality and quantity of provided information: they can be very sketchy, or give detailed structural information, or list examples of use-types, or attempt some conceptual meaning definition; * a dictionary is the result of a conceptualization effort performed by some human specialist(s); this effort may not be consistent with, or The test was performed over a 6 month period on about S0 occasional visitors and staff members of the IBM Rome scientific center, unaware of the system capabilities and structure. The user would look at 60 different releases, previously analyzed by the system (or re-analyzed during the demo), and freely asks questions about the content of these texts. In the last few months, the test was extended to a different domain, e.g. the Italian Constitution, without significant performance changes. See the referenced papers for examples of sentences and of (answered and not answered) query types (in general wh-questions).  exl (from \[8\]):</Paragraph>
    <Paragraph position="13"> suitable for, the objectives of an application for which a language processor is built.</Paragraph>
    <Paragraph position="14"> Examples of conceptual meaning representation in the literature A second approach is using corpora rather than human-oriented dictionary entries. Corpora provide an experimental evidence of word uses, word associations, and language phenomena as metaphors, idioms; and metonymies.</Paragraph>
    <Paragraph position="15"> The problem and at the same time the advantage of corpora is that they are raw texts whereas dictionary entries use some formal notation that facilitates the task of linguistic data processing. No computer program may ever be able to derive formatted data from a completely unformatted source. Hence the ability of extracting lexical semantic information form a corpus depends upon a powerful set of mapping rules between phrasal patterns and human-produced semantic primitives and relations. We do not believe that a semantic representation framework is &amp;quot;good&amp;quot; if it mimics a human cognitive model; more realistically, we believe that a set of primitives, relations and mapping rules is &amp;quot;fair', when its coverage over a language subworld is suitable for the purpose of some useful language processing activity. Corpora represent an 'objective&amp;quot; description of that subworld, against which it is possible to evaluate the power of a representation scheme; and they are particularly suitable for the acquisition of a colloeative meaning based semantic lexicon.</Paragraph>
    <Paragraph position="16"> Besides our work \[19\], the only knowledge acquisition system based on corpora (as far as we know) is described in \[7\]. In this work, when an unknown word is encountered, the system uses pre-existing knowledge on the context in which the word occurred to derive its conceptual category.</Paragraph>
    <Paragraph position="17">  The context is provided by on line texts in the economic domain. For example, the unknown word merger in &amp;quot;another merger offer&amp;quot; is categorized as merger-transaction using semantic knowledge on the word offer and on pre-analyzed sentences referring to a previous offer event, as suggested by the word another. This method is interesting but reties upon a pre-existing semantic lexicon and contextual knowledge; in our work, the only pre-existing knowledge is the set of conceptual relations and primitives.</Paragraph>
    <Paragraph position="18"> PETRARCA: a method for the acquisition and interpretation of cooccurrences PETRARCA detects cooccurrences using a powerful morphologic and syntactic anal~er \[141 I11; cooccurences are interpreted by a set of phrasal-patterns/ semantic-interpretation mapping rules. The semantic language is Conceptual Graphs \[17\]; the adopted type hierarchy and conceptual relations are described in \[10l. The following is a summary description of the algorithm:  step 3 might produce more than one interpretation for a single word pattern, due to the low selectivity of some semantic rule.</Paragraph>
    <Paragraph position="19"> step 3 might fail to produce an interpretation for metonymies and idioms, which violate semantic constraints. Strong syntactic evidence (unambiguous syntactic rules) is used to &amp;quot;signal&amp;quot; the user this type of failure. Ex: Knowledge sources used by PETRARCA  IAGREEMENT}- * (PARTICIPANT)- * ICOMPANYi.</Paragraph>
    <Paragraph position="20"> 4. (A) Generalize the interpretations.</Paragraph>
    <Paragraph position="21"> Ex: Given the following examples: \[AGREEMENT l- * (PARTICIPANT)- &gt; ICOMPANYI.</Paragraph>
    <Paragraph position="22"> \[AGREEMENT\]- &gt; (PARTICIPANT)- * \[COUNTRY.ORGANIZATIONI. \[AGREEMENT}- * (PARTICIPANT)- * \[PRESIDENT I.</Paragraph>
    <Paragraph position="23"> derive the most general constraint: \[AGREEMENT\]- * (PARTICIPANT)- &gt; IHUMAN.ENTITYI. The above is a new case description added to the definition of AGREEMENT 5. (M) Check the newly derived entry.</Paragraph>
    <Paragraph position="24"> To perform its analysis, PETRARCA uses five knowledge sources: I. an on line natural corpus (press agency releases) to select a variety of language expressions including a new word W; 2. a high coverage morphosyntactic analyzer, to derive phrasal patterns centered around W; 3. a catalogue of patterns/interpretation pairs, called Syntax-to-Semantic (SS rules); 4. a set of rules expressing selectional restriction on conceptual relation uses (CR rules); 5. a hierarchy of conceptual classes and a  catalogue associating to words concept types.</Paragraph>
    <Paragraph position="25"> Steps marked (A) are automatic; steps marked (M) axe manual. The only manual step is the last one: this step is however necessary because of the following: The natural corpus and the parser are used in steps 1 and 2 of the above algorithm; SS rules, CR rules and the word/concept catalogue are used in step 3; the type hierarchy is used in steps 3 and 4  The parser used by PETRARCA is a high coverage morphosyntactic analyzer developed in the context of the DANTE system. The lexical parser is based on a Context Free grammar, the complete set of Italian prefixes and suffixes, and a lexicon of 7000 elementary lernmata (stems without affixes). At present, the morphologic component has an 100% coverage over the analyzed corpus (100,000 words) 1141 1131.</Paragraph>
    <Paragraph position="26"> The syntactic analysis determines syntactic attachment between words by verifying grammar rules and forms agreement; the system is based on an Attribute Grammar, augmented with lookahead sets I1\]; the coverage is about 80%; when compiled, the parsing time is around 1-2 see. of CPU time for a sentence with 3-4 prepositional phrases; the CPU is an IBM mainframe.</Paragraph>
    <Paragraph position="27"> The syntactic relations detected by the parser are associated to possible semantic interpretations using SS rules. An excerpt of SS rules is given below for the phrasal pattern:</Paragraph>
    <Paragraph position="29"> /'rul~ito del leonl (the rmlr of the lions)'/ NP_PP(&amp;quot;~mrdl,dl,'wottl '2) &lt;- reI(CHARACTERISTIC.d.I,'word2.'wordl). /'rintelllgenza delrtlomo (the intelligence of the man)'/ Overall, we adopted about 50 conceptual relations to describe the set of semantic relations commonly found in language; see \[10\] for a complete list. The catalogue of SS rules includes about 200 pairs.</Paragraph>
    <Paragraph position="30"> Given a phrasal pattern produced by the syntactic parser, SS rules select a first set of conceptual relations that are candidate interpretations for the pattern.</Paragraph>
    <Paragraph position="31"> Selectional restriction rules on conceptual relations are used to select a unique interpretation, when possible. Writing CR rules was a very complex task, that required a process of progressive refinement based on the observation of the results.</Paragraph>
    <Paragraph position="32"> The following is an example of CR rule for the conceptual relation PARTICIPANT: participant -- null has..participant: meeting, agreement, fly, sail is.participant: human_entity Examples of phrasal patterns interpreted by the participant relation are: John flies (to New York); the meeting among parties; the march of the pacifists,&amp;quot; a contract between Fiat and A lfa; the assembly of the administrators, etc.</Paragraph>
    <Paragraph position="33"> An interesting result of the above algorithm is the following: in general, syntax will also accept semantically invalid cooccurrences. In addition, in step 3, ambiguous words can be replaced by the &amp;quot;wrong&amp;quot; concept names. Despite this, selectional restrictions are able to interpret only valid associations and reject the others. For example, consider the sentence: &amp;quot;The party decided a new strategy&amp;quot;. The syntax detects the association SUBJ(DECIDE, PARTY). Now, the word &amp;quot;party&amp;quot; has two concept names associated with it: POL PARTY, and FEAST, hence in step 3 both interpretations are examined. I lowever, no conceptual relation is found to interpret the pattern &amp;quot;FEAST DECIDE&amp;quot;. This association is hence rejected.</Paragraph>
    <Paragraph position="34"> Simalirily, in the sentence: &amp;quot;An agreement is reached among the companies, the syntactic analyzer will submit to the semantic interpreter two associations: NP_PP(A GREEMENT, AMONG, COMPA N Y) and VP_PP(REACIt, AMONG,COMPANY) Now, the preposition among in the SS rules, points to such conceptual relations as PARTICIPANT, SUBSET (e.g. &amp;quot;two among all us&amp;quot;), and LOCATION (e.g. &amp;quot;a pine among the trees'% but none of the above relates a MOVE ACT with a IIUMAN ORGANIZATION. The association is m hence rejected.</Paragraph>
    <Paragraph position="35"> Future experimentation issues This section highlights the current limitations and experimentation issues with PETRARCA.</Paragraph>
    <Paragraph position="36"> Definition of type hierarchies PETRARCA gets as input not only the word W, but a list of concept labels CWi, corresponding to the possible senses of W. For each of these CWi, the supertype in the hierarchy must be provided. Notice .however that the system knows nothing about conceptual classes; the hierarchy is only an ordered set of labels.</Paragraph>
    <Paragraph position="37"> In order to assign a supertype to a concept, three methods are currently being investigated. First, a program may &amp;quot;guide&amp;quot; the user towards the choice of the appropriate supertype, visiting top down the hierarchy. This approach is similar to the one described in I261.</Paragraph>
    <Paragraph position="38"> Alternatively, the user may give a fist of synonymous or near synonymous words. If one of these was already included in the hierarchy, the same supertype is proposed to the user.</Paragraph>
    <Paragraph position="39"> A third method lets the system propose the supertype. The system assumes CW=W and proceeds through steps 1, 2 and 3 of the case descriptions derivation procedure. As the supertype of CW is unknown, CR rules are less effective at determining a unique interpretation of syntactic patterns. If in some of these patterns the partner word is already defined in the dictionary, its case descriptions can be used to restrict the analysis. For example, suppose that the word president is unknown in: The president nominated etc.</Paragraph>
    <Paragraph position="40"> Pertini was a good president' the knowledge on possible AGENTs for NOMINATE let us infer PRESIDENT &lt; HUMANENTITY; from the second sentence, it is possible to further restrict to: PRESIDENT&lt; HUMAN ROLE. The third m method is interesting because it is automatic, however it has some drawbacks. For example, it is slow as compared 1:o methods 1 and 2; a trained user would rather use his experience to decide a supertype. Secondly, if the word is found with different meanings in the sample sentences, the system might never get to a consistent solution. Finally, if the database includes very few or vague examples, the answer may be useless (e.g. ACT, or TOP). It should also be considered that the effort required to assign a supertype to, say, 10.000 words is comparable with the encoding of the morphologic lexicon. This latter required about one month of data entry by 5-6 part-time researchers, plus about 2-3 months for an extensive testing. The complexity of hierarchically organizing concepts however, is not circumscribed to the time consumed in associating a type label to some thousand words. All NLP researchers experimented the difficulty of associating concept  types to words in a consistent way. Despite the efforts, no commonly accepted hierarchies have been proposed so far. In our view, there is no evidence in humans of primitive conceptual categories, except for a few categories as animacy, time, etc. We should perhaps accept the very fact that type hierarchies are a computer method to be used in NLP systems for representing semantic knowledge in a more compact form. Accordingly, we are starting a research on semi-automatic word clustering (in some given language subworld described by a natural corpus), based on fuzzy set and conceptual clustering theories.</Paragraph>
    <Paragraph position="41"> Interpretation of idiomatic expressions In the current version of PETRARCA, in case of idiomatic expressions the user must provide the correct interpretation. In case of metaphors, syntactic evidence is used to detect a metaphor, under the hypothesis that input sentences to the system are syntactically and semantically correct. At the current state of implementation, the system does not provide automatic interpretation of metaphors. However, an interesting method was proposed in 1201. According to this method, when for example a pattern such as &amp;quot;car drinks&amp;quot; is detected, the system uses knowledge of canonical definitions of the concepts &amp;quot;DRINK&amp;quot; and &amp;quot;CAR&amp;quot; to establish whether ~CAR&amp;quot; is used metaplaorically as a HUMANENTITY, or &amp;quot;DRINK&amp;quot; is used metaphorically as 1&amp;quot;O BE FEDBY&amp;quot;. An interesting user aided computer program for idiomatic expressions analysis is also described in 1231.</Paragraph>
    <Paragraph position="42"> Generalization of case descriptions In PERTRARCA, phrasal patterns are first mapped into 'low level&amp;quot; case description; in step 4, &amp;quot;similar&amp;quot; patterns are merged into &amp;quot;high level' case descriptions. In a first implementation, two or three low level case descriptions had to be derived before creating a more general semantic rule. This approach is biased by the availability of example sentences. A word often occurs in dozens of different contexts, and only occasionally two phrasal patterns reflect the same semantic relation. For example, consider the sentences: The company signs a contract for newfimding The ACE stipulates a contract to increase its influence Restricting ourselves to the word &amp;quot;contract', we get the following semantic interpretations of syntactic patterns:</Paragraph>
    <Paragraph position="44"> In patterns 1 and 3 &amp;quot;sign&amp;quot; and &amp;quot;stipulate&amp;quot; belong to the same supertype, i.e.</Paragraph>
    <Paragraph position="45"> INFORMATIONEXCHANGE; hence a new case description can be tentatively created for CONTRACT: ICOl,C/rr~cl+.l. * (TI'llIMI~. &gt; IlI,+F'ORMA'rioI,,I+BXO.IA I~F. ! Indeed, one can tell, talk about, describe etc. a contract.</Paragraph>
    <Paragraph position="46"> Conversely, patterns 3 and 4 have no common supertype; hence two &amp;quot;low level&amp;quot; case descriptions are added to the definition of CONTRACT.</Paragraph>
    <Paragraph position="47"> lCONTRAC'rl. * (PURPOSE)- ~ ILmlJNDINGI ICOiCTRACI&amp;quot;I- &gt; (PURPOSE)- * lll'~'ll, ltt.,~IIl Even with a large number of input sentences, the system createsmany of these specific patterns; a human user must review the results and provide for case descriptions generalization when he/she feels this being reasonable.</Paragraph>
    <Paragraph position="48"> A second approach is to generalize on the basis of a single example, and then retract (split) the rule if a counterexample is found. Currently, we axe ~a'udying different policies and comparing the results; one interesting issue is the exploitation of counterexamples.</Paragraph>
    <Paragraph position="49"> Concluding remarks Even though PETRARCA is still an experiment and has many unsolved issues, it is, to our knowledge, the first reported system for extensive semantic knowledge acquisition. There is room for many improvements; for example, PETRARCA only detects, but does not interpret idioms; neither it knows what to do with errors; if a wrong interpretation of a phrasal pattern is derived, error correction and refinement of the knowledge base is performed by the programmer. However PETRARCA is able to process automatically raw language expressions and to perform a first  classification and encoding of these data. The rich linguistic material produced by PETRARCA provides a basis for future analysis and refinements. Despite its limitations, we believe this method being a first, useful step towards a more complete system of language learning.</Paragraph>
  </Section>
class="xml-element"></Paper>