<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0715"> <Title>A Formal Model of Text Summarization Based on Condensation Operators of a Terminological Logic</Title> <Section position="3" start_page="97" end_page="98" type="metho"> <SectionTitle> 2 The Terminological Knowledge </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="97" end_page="97" type="sub_section"> <SectionTitle> Representation Model </SectionTitle> <Paragraph position="0"> In the following, we describe a subset of a terminological logic (for an introduction to its underlying basic notational conventions, cf. (Woods &amp; Schmolze 92)). Section 2.1 considers the terminological component, while Section 2.2 deals with appropriate extensions for representing text-specific knowledge.</Paragraph> </Section> <Section position="2" start_page="97" end_page="97" type="sub_section"> <SectionTitle> 2.1 The Basic Terminological Component </SectionTitle> <Paragraph position="0"> We distinguish two kinds of relations, namely properties and conceptual relationships. A property denotes a relation between individuals and string or integer values. A conceptual relationship denotes a relation between two individuals. The concept description language provides constructs to formulate necessary (and possibly sufficient) conditions on the properties and conceptual relationships every element of a concept class is required to have. The syntax of this language is given in Fig. 1.</Paragraph> <Paragraph position="2"> Figure 1: Syntax of a Terminological Logic. Every constructor in Fig. 1 can be used to define a concept class (cf. Fig. 5). The all-p constructor introduces the class of individuals all of which have a certain property (whose value can vary from individual to individual). For example, (all-p price [$200,$5000]) denotes the class of individuals that have a property called 'price' with a value ranging between $200 and $5000. An individual can only have one value for each of its properties (cf. Fig. 2). The all-r
constructor introduces a class of individuals that all participate in a certain kind of relationship to individuals from one of the concept classes given in the constructor. For example, (all-r equipped-with OperatingSystem ApplicationSoftware) denotes the class of individuals that are in a relationship called 'equipped-with' only to individuals of the class 'OperatingSystem' or the class 'ApplicationSoftware'. The distinction between the constructs all-p and all-r is uncommon in the domain of terminological logics (Woods &amp; Schmolze 92), because primitive types like string and integer are usually considered to be concept classes as well. As we will see in Section 3, the terminological reasoning underlying the text condensation process exploits this distinction between properties and relationships. The exist-v constructor introduces the class of individuals that all have a certain property value. For example, (exist-v weight 6.5lbs) denotes the class of individuals that have a property called 'weight' with the value '6.5lbs'. The exist-c constructor defines the class of individuals that have a conceptual relationship to at least one individual of a specific concept class. For example, (exist-c has-part Cpu) denotes the class of individuals that are in a relationship called 'has-part' to at least one individual of the class 'Cpu'. With the and constructor several class descriptions can be combined into one (cf. Fig. 5). The model-theoretic semantics of the terminological language we use is depicted in Fig. 2.</Paragraph> </Section> <Section position="3" start_page="97" end_page="98" type="sub_section"> <SectionTitle> 2.2 Representing Text Knowledge </SectionTitle> <Paragraph position="0"> TOPIC's text parser heavily relies on terminological knowledge about the domain the texts deal with (Hahn 89).
In the course of text analysis, the parser extends this domain knowledge incrementally by new concept definitions.</Paragraph> <Paragraph position="1"> In order to distinguish between prior domain knowledge and newly acquired text knowledge, we extend our basic terminological language with the constructs specified in Fig. 3. The operator $\leq_T$ indicates a primitive concept originating from the text analysis. Only a limited number of constructs can be used for such a concept definition; they correspond to the kinds of knowledge the parser can extract from a text (see Fig. 5): * A new concept can only be acquired when the text makes a reference to a superordinate concept already known in the domain knowledge. Thus, the concept expression on the right-hand side of the $\leq_T$ construct must comprise a reference to a superordinate concept, as expressed by the syntax.</Paragraph> <Paragraph position="3"> * Properties of a new concept can be learned (exist-v construct). * Relationships to other concepts can be learned (exist-c construct) in case the relationship range is already defined by a corresponding all-r construct. The text-knowledge-specific versions of the exist-v and exist-c constructs have an additional argument which serves as a flag that is set whenever one of these constructs is added to a concept description (i.e., when the associated property or relationship has been learned). The text condensation component of TOPIC makes use of this flag in order
to determine those facts which have been learned since a certain reference point (where all flags were set to 0). Besides acquiring new domain knowledge from a text, the parser performs book-keeping activities in order to record how often a concept, a property of a concept, or a relationship to another concept is explicitly or implicitly mentioned in the text. For this purpose, we provide the constructs ccount, pcount, and rcount for concept descriptions. These constructs belong to the text knowledge and can be applied to concept descriptions derived from the text as well as to concepts of the domain knowledge. The ccount (pcount) construct indicates how often (a property of) a concept has been mentioned, whereas (rcount rel conc aweight) indicates how often the relationship rel to a concept conc has been referred to. We call the numbers introduced by the count operators activation weights. An (rcount rel conc aweight) construct can only occur as part of a text concept description when that description also contains a construct (all-r rel c1 ... cn) where conc is subsumed by one of the ci. If this is not the case, rcount refers to a concept being related via a relationship rel which is not in the range of this relationship;
thus, the rcount statement would make no sense. Since none of the count constructs (and the flags) make an assertion about the meaning of the concepts involved, they have no influence on the concepts' extension (cf. Fig. 4). Fig. 5 illustrates the application of multiple knowledge base operations resulting in the text knowledge representation for the newly learned concept 'Notebooster' as a specialization of 'Notebook'.</Paragraph> </Section> </Section> <Section position="4" start_page="98" end_page="101" type="metho"> <SectionTitle> 3 Text Knowledge Condensation </SectionTitle> <Paragraph position="0"> The text condensation process examines the text knowledge base generated by the parser to determine certain distributions of activation weights, patterns of property and relationship assignments to concept descriptions, and particular connectivity patterns of active concepts in the concept hierarchy.</Paragraph> <Paragraph position="1"> These constitute the basis for the construction of thematic descriptions as the result of text condensation. Only the
most significant concepts, relationships and properties (hereafter called salient) are considered as part of a topic description (cf. Section 3.1). Thus, text condensation (or, equally, text summarization) can be considered an abstraction process on (text) knowledge bases. A topic description is a combination of salient concepts, relationships and properties of a formal text unit. The computation of these concepts is started only in certain well-defined intervals. In the sub-language domain of expository texts, at least, topic shifts occur predominantly at paragraph boundaries. Therefore, text condensation is started at the end of every paragraph so that thematic overlaps as well as topic breaks between adjacent paragraphs can be detected and the extension of a topic be exactly delimited. The condensation process yields a set of topic descriptions, each one characterizing one or more adjacent paragraphs of the text (cf. Section 3.2). Finally, the entire collection of topic descriptions of a single text can be generalized in terms of a hierarchical text graph (cf. Section 3.3), the representation form of a text summary.</Paragraph> <Section position="1" start_page="99" end_page="99" type="sub_section"> <SectionTitle> 3.1 Condensation Operators </SectionTitle> <Paragraph position="0"> We apply several operators to text knowledge bases to determine which concepts, properties, and relationships play a dominant role in the corresponding texts and thus should become part of their topic description. All of these operators are grounded in the semantics of the underlying terminological logic. Some of the operators make additional use of cut-off values which are heuristically motivated and have been evaluated empirically.</Paragraph> </Section> <Section position="2" start_page="99" end_page="101" type="sub_section"> <SectionTitle> Salient Concepts: </SectionTitle> <Paragraph position="0"> There are several criteria to determine salient concepts. The simplest, least &quot;knowledgeable&quot; criterion considers
all those concepts salient whose activation weight exceeds the average activation weight of all active concepts.[1] A second criterion renders a concept salient if the total sum of references made to properties of it and to relationships to other concepts is greater than it is, on the average, the case for all other active concepts. (SC1) exploits the structure of the aggregation hierarchy and evaluates it by the associated activation weights (for the definitions of sets and functions we use below, cf. Table 1).</Paragraph> <Paragraph position="2"> While (SC1) checks the total number of references made to any property or relationship, (SC2) is concerned with the number of different properties and relationships mentioned. [1] Throughout the paper, we call a concept c an active</Paragraph> <Paragraph position="4"> Table 1 (function definitions, partly lost in extraction): rcount(c, rel, c') = n if $c \leq_T$ (and ... (rcount rel c' n) ...), and 0 otherwise. pcount(c, prop) = n if $c \leq$ (and ... (pcount prop n) ...) or if $c \leq_T$ (and ... (pcount prop n) ...), and 0 otherwise. rpactive(c, rp) = 1 if rpcount(c, rp) > 0, and 0 otherwise. existc(c, rel, c') = 1 if $c \leq_T$ (and ... (exist-c rel c' f) ...) $\wedge f \neq 0$, and 0 otherwise. A further case reading "existc(c, rp, c') if $rp \in R$" and the definition beginning "existv(c, prop," are truncated in the source.</Paragraph> <Paragraph position="6"> The following two criteria exploit the inherent specialization structure of concept hierarchies (cf. also (Lin 95) for a similar perspective on using semantic generalization relations for the computation of concept salience). They thus resemble criteria as used for the definition of macro rules to achieve summaries of texts (Correira 80, Dijk 80, Fum et al. 85). These criteria also incorporate some notion of graph connectivity that has previously been considered by (Lehnert 81) for text summarization purposes. (SC3) determines an active concept c as being salient iff a significant amount of subordinates of c are active, too. (SC4) is similar, but it marks all non-active (!) concepts as being salient which are related to a significant number of active subordinates. Thus, concepts can be included in the topic description which have never been
mentioned explicitly in a text. (SC4) only yields the most specific concepts, i.e., it excludes concepts for which the main criterion is fulfilled, but which are superordinate to another concept that also fulfills the criterion. Lastly, (SC4) has a more stringent cut-off criterion. This is necessary because it makes non-active concepts salient; accordingly, one has to be careful not to include irrelevant concepts. Therefore, (SC4) requires a quarter of all subordinates (at least 3) to be active, while (SC3) has a relative cut-off value which gives lower percentages for greater numbers of subordinates (the cut-off values have been determined empirically).</Paragraph> <Paragraph position="8"> Salient Relationships and Salient Properties:</Paragraph> <Paragraph position="9"> Just as certain concepts may have been dealt with more extensively in a text than other ones, single features of a concept definition may have been more focused on than other features of the same concept. The following criterion renders a relationship (or property) rp salient if the number of concepts (or property values) to which c has been related via rp is greater than it is, on the average, the case for relationships (or properties) in c. Note that c must be a concept learned during text parsing, as learning new features is only possible for such concepts. (SR1) is evaluated for salient concepts only because we are not interested in salient features of concepts being irrelevant for a topic description.</Paragraph> <Paragraph position="10"> (SR1) A relationship or property rp of a salient concept c is considered salient in the context of c iff rpactive(c, rp) > 3 and it holds that</Paragraph> <Paragraph position="12"/> </Section> <Section position="3" start_page="101" end_page="101" type="sub_section"> <SectionTitle> Related Salient Concepts: </SectionTitle> <Paragraph position="0"> A concept d is considered a related salient concept for the salient concept c if there is a relationship rel from c to d where the sum
of the activation weights of all relationships of type rel from c to d or to subordinates of d is greater than the average activation weight of all active relationships for c. If d is determined as a related salient concept for c, then the associated relationship rel becomes a salient relationship of c. This criterion combines knowledge about conceptual aggregation and concept hierarchies with numerical weights. (SRC1) A relationship rel between a salient concept c and some concept d is considered salient and d is considered a related salient concept iff rpactive(c, rel, d) $\geq$ 3 and the following holds:</Paragraph> <Paragraph position="2"> In the following, (c) denotes a salient concept c, (c r) a salient relationship r of concept c, and (c r d) denotes a related salient concept d for concept c with respect to the relationship r.</Paragraph> </Section> <Section position="4" start_page="101" end_page="101" type="sub_section"> <SectionTitle> 3.2 Paragraph-Level Topic Descriptions </SectionTitle> <Paragraph position="0"> The condensation operators just introduced are applied at the end of every paragraph to the text knowledge base which results from parsing that paragraph. They yield a set of salient concepts, relationships, properties, and related salient concepts. In the next step, these raw data are combined to form a compound topic description for that paragraph. The combination is performed according to the following rules: * A salient concept (c) which is already covered by a salient relationship or property (c rp) or a related salient concept (c r d) is removed. * A salient relationship (c r) already covered by a related salient concept (c r d) is removed. After having determined the topic description td of the previous paragraph, a check is made whether this paragraph deals with the same topic as the immediately preceding paragraph(s), or vice versa. If this is the case, the topic description td of the current paragraph is added to the topic description of the preceding
paragraph(s); otherwise a new current topic description is created and set to td. Formally (cf. also Table 2): Let td be the topic description of the last paragraph and $td_i$ be the topic description of one or more paragraphs immediately preceding td; then $td_i$ is set to $td_i \cup td$ if $td_i \cup td = td_i \lor td_i \cup td = td$; otherwise $td_i$ is not modified and $td_{i+1}$ is set to td. For example, the following two topic descriptions of adjacent paragraphs would be combined into one: {(Notebooster has-part 486SL), (Notepad)}, {(Notebooster has-part)}. Analyzing a text this way yields a set of consecutive topic descriptions $td_1, \ldots, td_n$, each one characterizing the topic of one or more adjacent paragraphs. To every topic description $td_i$ we associate the corresponding text passage and the facts acquired from it. We call the resulting compound structure, in which different media combine, a (hyper)text constituent.</Paragraph> </Section> <Section position="5" start_page="101" end_page="101" type="sub_section"> <SectionTitle> 3.3 The Text Graph </SectionTitle> <Paragraph position="0"> From the topic description contained in a text constituent, more generic constituents can be derived in terms of a hierarchy of topic descriptions, forming a text graph. The construction of a text graph proceeds from the examination of every pair of basic topic descriptions and takes their conceptual commonalities to generate more generic thematic characterizations. Exhaustively applying this procedure (also taking the newly generated topic abstractions into consideration) results in a text graph as a hierarchy of topic descriptions. (A displaced caption fragment at this point notes that \ stands for the set complement operator.) The most specific descriptions (they correspond to the text constituents) form the leaf nodes of the text graph; the generalized topic descriptions constitute its non-leaf nodes. Their hierarchical organization yields different levels of granularity of text summarization (see Fig. 6). It is exactly this emergent generalization property of the text graph that we consider
the source of our scalability arguments. Very brief summaries, only intended to capture the main topics of the text, can be generated from the upper level of the text graph. Continuously deepening the traversal level of the text graph provides access to more and more specific information. Our procedure thus combines the potential for supplying summaries on the indicative as well as informative level of text knowledge abstraction (cf. (Borko &amp; Bernier 75) for the distinction between indicative and informative abstracting).</Paragraph> </Section> </Section> </Paper>