File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/c92-3152_metho.xml
Size: 11,340 bytes
Last Modified: 2025-10-06 14:13:02
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-3152"> <Title>CONCEPT-ORIENTED PARSING OF DEFINITIONS</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. Definitions and Meaning Types </SectionTitle> <Paragraph position="0"> Our starting-point is the fact that words, with regard to their meaning, can be classified into meaning types. Words can have meanings that are predominantly conceptual, collocational, grammatical, figurative, connotative, stylistic or contextual/discursive. A word such as the geological term magma typically has a conceptual meaning only; another one, such as bloody (as in 'you bloody fool'), typically combines collocational meaning (intensification) with stylistic meaning aspects (very informal), whereas the same word, bloody, e.g. in a sentence like 'I got my bloody foot caught in the bloody chair' (example taken from LDOCE), mainly gets a discursive, contextual meaning (functioning as an (emotional) stopword). Different kinds of lexical meaning types require different descriptive treatments. Thus terms, which show conceptual meaning 'par excellence', will require first and foremost conceptual meaning descriptions, i.e. concept-oriented definitions. In what follows we will therefore concentrate on terms and their meaning as expressed in definitions, the typical locus for conceptual meaning information. Accordingly, the parser we will present is concept-oriented.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Concept-oriented parsing of terms </SectionTitle> <Paragraph position="0"> The parser under discussion is set up to analyze definitions of medical terms in English. As such it is but one of the components of a system which at the moment consists of a preprocessor, a segmentor, a lexicon, a set of conceptual relations and a parser proper. 
In order to better understand the approach under discussion we will first give a general overview of the overall algorithm and thereafter comment globally upon those aspects which are most relevant from a computational linguistic point of view (as it is impossible, given the amount of space and time, to give a full and detailed picture of the whole project). [ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 988 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992]</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Overall algorithm </SectionTitle> <Paragraph position="0"> The basic algorithm can be roughly characterized as consisting of the following steps: a. read definition b. segment definition c. look for head of definition d. check clues e. look for subhead(s) of definition f. fill frame subhead(s) taking into account</Paragraph> <Paragraph position="2"> g. fill frame head h. write sense frame A typical input reads like this: &quot;rheumatoid arthritis: a chronic disease of the musculo-skeletal system, characterized by inflammation and swelling of the joints, muscle weakness, and fatigue&quot; (taken from Collins Dictionary of the English Language, 2nd ed., 1986). The corresponding output looks like this: rheumatoid arthritis: [disease g_affects musc_skel_syst] [disease has_qual chronic] [disease has_symptom fatigue] [disease has_symptom weakness] [disease has_symptom inflammation] [disease has_symptom swelling] [weakness g_affects muscle] [swelling g_affects joints] [inflammation g_affects joints] In what follows we will try to make clear the main features that (a system leading to) such a result implies.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Basic features </SectionTitle> <Paragraph position="0"> Up till now we have only dealt with definitions for diseases (terms for nosology concepts).</Paragraph> <Paragraph position="1"> These definitions can be 
taken from all kinds of sources, e.g. from termbanks or from (terminological) dictionaries. The example given above should make clear that we work with analytical definitions exhibiting all kinds of difficulties in both lexis and syntax (such as structural ambiguities, cf. 'inflammation' vs.</Paragraph> <Paragraph position="2"> 'inflammation of the joints').</Paragraph> <Paragraph position="3"> 2.2.2 Lemmatizer-tagger as Front-end It goes without saying that a lemmatizer-tagger is a basic requirement for the efficient operation of the parser. This way text words (i.e. word forms occurring in the definitional text) can be linked up with the items occurring in the lexicon (see below). For that purpose we use an adapted version of Dilemma (see Martin et al. 1988 and Paulussen-Martin 1992).</Paragraph> <Paragraph position="4"> After having been lemmatized and tagged, the definition is split up into smaller parts (segments) by the segmentor. This module is a minimal syntactic processor which, on the basis of categorial information (such as Boolean values for NP compatibility and NP delimitation), delimits word groups in the input string. Unlike other approaches (such as Alshawi 1989), which make use of syntactic pattern matching techniques, syntax is kept to a strict minimum, as one of our claims is that much of what is done (by others) syntactically can be left out when one disposes of more powerful, in this case conceptual, knowledge. 
As a result our input definition now looks as follows ( | indicating delimiters, || indicating boundaries): a chronic disease | of the musculo-skeletal system |, (characterized) | by inflammation | and swelling | of the joint(s) |, muscle weakness |, and fatigue ||.</Paragraph> <Paragraph position="5"> The knowledge banks which form the core of the system are the lexicon and the set of conceptual relations.</Paragraph> <Paragraph position="6"> A lexical entry, e.g. aids, is a three-place predicate consisting of the actual lexeme, its concept type and its word category. So: (aids, concept (nosology-concept, aids, [u, u, u, u, u, u]), n).</Paragraph> <Paragraph position="7"> As one will observe, the second argument, the concept type, contains a sextuple, i.e. six unspecified slots. The parsing of definitions is aimed precisely at filling or specifying these slots.</Paragraph> <Paragraph position="8"> It is the set of conceptual relations that a concept type may have that determines this specification. At the moment such a set for diseases (nosology concepts), somewhat simplified, looks as follows:</Paragraph> <Paragraph position="10"> [...] has_qual (nos, qual) [...] micro, funct [...] In our approach the universe of discourse is split up into 22 interrelated concept types, which, as a rule, form homogeneous subsets. At the center of it one finds nosology concepts, which show relations with other concept types, which in their turn may show relations with still other concept types, etc.</Paragraph> <Paragraph position="11"> At this point it is important to see that implicit concepts such as nosology concepts (and so, for example, the conceptual meaning of the lexeme aids) can, among other things, be defined/specified by concepts taken from the domain of macro- and micro-anatomy and that, in the given case, the relation between both arguments will be established. 
In this respect it is crucial for the parser to find the head concept of the definitional phrase. It does so by setting up a syntax-based hypothesis (taking the rightmost noun occurring in front of the first delimiter) and checking it against conceptual knowledge. In the case of a definition of aids as &quot;a group of diseases secondary to a defect in cell-mediated immunity associated with a single newly discovered virus&quot; (taken from Eurodicautom), group will in the first instance be taken up as head. Afterwards it will be rejected on conceptual grounds, among other things because group is not considered a medical concept. In other cases head shifting will take place because the head candidate cannot be conceptually specified by its subheads (conceptual incompatibility between the assumed head and its subhead(s)). In the same vein, when confronted with &quot;classes of phenomena that present great difficulties for all syntactic formalisms (...) [One of] the most important of these being conjunction (...)&quot; (Winograd 1983, 257-258), the parser will again solve (or try to solve) these cases by making use of conceptual information. That, in the case of rheumatoid arthritis, it does not yield parses such as 'swelling of muscle (weakness)' and that it manages to combine 'joints' with both 'swelling' and 'inflammation' proves it to be fairly successful in this respect. Other examples of conceptual calculation imply the establishment of new concept types out of old ones. 
'Throat', for example, being a macro-anatomical concept, becomes a finding concept when combined with a qual concept such as 'sore'; this way 'sore throat' can 'fill' a symptom relation with a nosology concept.</Paragraph> <Paragraph position="12"> This example also shows that the lexicon is conceived as an atomistic one: concepts are thought of as atoms from which more complex structures can be derived; if the latter are compositional and can be computed, however, they are not taken up as such. Other examples of conceptual parsing include the application of rules for PP-attachment. Compare: &quot;a disease characterized by a sense of constriction in the chest&quot; vs. &quot;a disease characterized by a sense of constriction in children&quot;. In the former case the PP 'in the chest' will be attached to the preceding concept 'constriction'; in the latter the PP 'in children' will be attached to the head concept 'disease'. Local attachment of PPs (other than those introduced by 'of') only prevails over global attachment (to the head) if certain conceptual conditions are met, such as the nature of the concept types in the PP following a finding concept such as 'constriction'.</Paragraph> <Paragraph position="13"> Given a definition of which the head or conceptual type has been established, the parser tries to fill its conceptual template or frame as much as possible. It does so by looking recursively for pre- and postmodifiers (the latter are called subheads) which 'fit' the head (or its modifiers). Fitting here means that the concept type of the governed lexeme corresponds with the concept type of one of the arguments of the template of the governing lexeme. In the 'rheumatoid arthritis' example above, for instance, the functional concept type of which 'musculo-skeletal system' is an instantiation 'fits' or 'fills' the first argument or slot of the concept type to which rheumatoid arthritis belongs.</Paragraph> <Paragraph position="14"> Mutatis mutandis, 
the same can be said for all the other slot-fillers.</Paragraph> <Paragraph position="15"> From the above it will have become clear that for the representation of conceptual meaning we have opted for a frame-based system (see e.g. Habel 1985): concept types are defined by frames, i.e. sets of conceptual slots, attributes or features.</Paragraph> </Section> </Section> </Paper>