<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1086"> <Title>Inherited Feature-based Similarity Measure Based on Large Semantic Hierarchy and Large Text Corpus</Title> <Section position="3" start_page="0" end_page="508" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Determination of semantic similarity between words is an important component of linguistic tasks ranging from text retrieval and filtering to word sense disambiguation and text matching. In the past five years, this work has evolved in conjunction with the availability of powerful computers and large linguistic resources such as WordNet (Miller,90), the EDR concept dictionary (EDR,93), and large text corpora.</Paragraph> <Paragraph position="1"> Similarity methods can be broadly divided into &quot;relation based&quot; methods, which use relations in an ontology to determine similarity, and &quot;distribution based&quot; methods, which use statistical analysis as the basis of similarity judgements. This article describes a new method of similarity matching, inherited feature-based similarity matching (IFSM), which integrates these two approaches.</Paragraph> <Paragraph position="2"> Relation based methods include both depth based and path based measures of similarity.</Paragraph> <Paragraph position="3"> The Most Specific Common Abstraction (MSCA) method compares two concepts based on the taxonomic depth of their common parent; for example, &quot;dolphin&quot; and &quot;human&quot; are more similar than &quot;oak&quot; and &quot;human&quot; because the common concept &quot;mammal&quot; is deeper in the taxonomy than &quot;living thing&quot;.</Paragraph> <Paragraph position="4"> Path-length similarity methods are based on counting the links between nodes in a semantic network. 
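The two relation-based measures just described can be sketched over a toy is-a taxonomy built from the dolphin/human/oak example; the taxonomy, names, and depth convention below are illustrative assumptions, not the paper's own code.

```python
# Minimal sketch of MSCA depth and path-length similarity over a toy
# is-a taxonomy (child mapped to parent). Illustrative assumption only.

TAXONOMY = {
    "mammal": "living-thing",
    "plant": "living-thing",
    "human": "mammal",
    "dolphin": "mammal",
    "oak": "plant",
}

def ancestors(concept):
    """Return the concept itself plus all hypernyms up to the root."""
    chain = [concept]
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        chain.append(concept)
    return chain

def depth(concept):
    return len(ancestors(concept)) - 1  # root has depth 0

def msca(a, b):
    """Most specific common abstraction: the deepest shared ancestor."""
    common = set(ancestors(a)).intersection(ancestors(b))
    return max(common, key=depth)

def path_length(a, b):
    """Number of is-a links on the path through the MSCA."""
    m = msca(a, b)
    return (depth(a) - depth(m)) + (depth(b) - depth(m))

print(msca("dolphin", "human"))  # mammal: deeper than living-thing
print(msca("oak", "human"))      # living-thing
```

Under the MSCA view, dolphin/human outscore oak/human because their common abstraction is deeper; under the path-length view, the same pair is closer because fewer links separate them.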
(Rada,89) is a widely adopted approach to such matching and (Sussna,93) combines it with WordNet to do semantic disambiguation.</Paragraph> <Paragraph position="5"> The chief problems with relation-based similarity methods lie in their sensitivity to artifacts in the coding of the ontology. For instance, MSCA algorithms are sensitive to the relative depth and detail of different parts of the concept taxonomy. If one conceptual domain (say, plants) is sketchily represented while another conceptual domain (say, animals) is richly represented, similarity comparisons within the two domains will be incommensurable. A similar problem plagues path-length based algorithms, causing nodes in richly structured parts of the ontology to be consistently judged less similar to one another than nodes in shallower or less complete parts of the ontology.</Paragraph> <Paragraph position="6"> Distribution-based methods are based on the idea that the similarity of words can be derived from the similarity of the contexts in which they occur. These methods differ most significantly in the way they characterize contexts and the similarity of contexts. Word Space (Schutze,93) uses letter 4-grams to characterize both words and the contexts in which they appear. Similarity is based on 4-grams in common between the contexts. Church and Hanks ('89) use a word window of set size to characterize the context of a word based on the immediately adjacent words.</Paragraph> <Paragraph position="7"> Other methods include the use of expensive-to-derive features such as subject-verb-object (SVO) relations (Hindle,90) or other grammatical relations (Grefenstette,94). These choices are not simply implementational but imply different similarity judgements. The chief problem with distribution based methods is that they only permit the formation of first-order concepts definable directly in terms of the original text. 
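A word-window characterization in the style of Church and Hanks can be sketched as follows; the toy corpus, window size, and cosine comparison are assumptions for illustration, not their actual procedure.

```python
# Sketch of a distribution-based measure: characterize each word by the
# words adjacent to it, then compare the resulting context vectors.

from collections import Counter
from math import sqrt

def context_vectors(tokens, window=2):
    """Map each word to a bag of words seen within the window around it."""
    vectors = {}
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
        vectors.setdefault(word, Counter()).update(neighbors)
    return vectors

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = (sqrt(sum(x * x for x in u.values()))
            * sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

corpus = ("the lawyer filed a suit . the attorney filed a motion . "
          "the oak shed a leaf").split()
vecs = context_vectors(corpus)
# Words that share contexts ("filed a") come out more similar:
print(cosine(vecs["lawyer"], vecs["attorney"]))
print(cosine(vecs["lawyer"], vecs["oak"]))
```

The design choice (window size, raw counts versus weighted counts) directly shapes the similarity judgements, which is the point made above about these choices not being merely implementational.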
Distribution based methods can acquire concepts based on recurring patterns of words but not on recurring patterns of concepts. For instance, a distributional system could easily identify that an article involves lawyers based on recurring instances of words like &quot;sue&quot; or &quot;court&quot;. But it could not use the occurrence of these concepts as conceptual cues for developing concepts like &quot;litigation&quot; or &quot;pleading&quot; in connection with the &quot;lawyer&quot; concept.</Paragraph> <Paragraph position="8"> One notable integration of relation based and distributional methods is Resnik's annotation of a relational ontology with distributional information (Resnik,95a,95b). Resnik introduces a &quot;class probability&quot; associated with nodes (synsets) in WordNet and uses these to determine similarity.</Paragraph> <Paragraph position="9"> Given these probabilities, he computes the similarity of concepts based on the &quot;information&quot; that would be necessary to distinguish them, measured using information-theoretic calculations.</Paragraph> <Section position="1" start_page="508" end_page="508" type="sub_section"> <SectionTitle> The Feature-based Similarity Measure </SectionTitle> <Paragraph position="0"> The Inherited Feature Similarity Measure (IFSM) is another integrated approach to measuring similarity. It uses a semantic knowledge base where concepts are annotated with distinguishing features and bases similarity on comparing these sets of features. In our experiments, we derived the feature sets by a distributional analysis of a large corpus.</Paragraph> <Paragraph position="1"> Most existing relation-based similarity methods directly use the relation topology of the semantic network to derive similarity, either by strategies like link counting (Rada,89) or the determination of the depth of common abstractions (Kolodner,89). 
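Resnik's class-probability idea can be sketched as scoring a pair by the information content of the most informative class that subsumes both; the taxonomy and the hand-assigned probabilities below are toy assumptions, not corpus-derived estimates.

```python
# Sketch of an information-content similarity in Resnik's style:
# sim(a, b) is the information content (-log2 p) of the most
# informative (lowest-probability) common subsumer.

from math import log2

PARENT = {"mammal": "living-thing", "human": "mammal",
          "dolphin": "mammal", "oak": "living-thing"}

# p(class): probability that a random noun instance falls under the
# class; illustrative values only (the root covers everything).
P = {"living-thing": 1.0, "mammal": 0.25, "human": 0.1,
     "dolphin": 0.01, "oak": 0.05}

def subsumers(c):
    out = {c}
    while c in PARENT:
        c = PARENT[c]
        out.add(c)
    return out

def resnik_sim(a, b):
    common = subsumers(a).intersection(subsumers(b))
    # Most informative = lowest probability = highest -log2 p.
    return max(-log2(P[c]) for c in common)

print(resnik_sim("dolphin", "human"))  # information content of "mammal"
print(resnik_sim("oak", "human"))      # root carries zero information
```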
IFSM, in contrast, uses the topology to derive descriptions whose comparison yields a similarity measure. In particular, it assumes an ontology where: 1. Each concept has a set of features 2. Each concept inherits features from its generalizations (hypernyms) 3. Each concept has one or more &quot;distinctive features&quot; which are not inherited from its hypernyms.</Paragraph> <Paragraph position="2"> Note that we neither claim nor require that the features completely characterize their concepts or that inheritance of features is sound. We only require that there be some set of features we use for similarity judgements. For instance, a similarity judgement between a penguin and a robin will be partially based on the feature &quot;can-fly&quot; assigned to the concept bird, even though it does not apply individually to penguins.</Paragraph> <Paragraph position="3"> Fig 1 shows a simple example of a fragment of a conceptual taxonomy with associated features, e.g. parent(have-child), father(male, have-child), mother(female, have-child). [Fig. 1 Fragment of conceptual taxonomy] Inherited features are in italic while distinctive features are in bold. In our model, features have a weight based on the importance of the feature to the concept. We have chosen to automatically generate features distributionally by analyzing a large corpus. 
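The three ontology assumptions above can be sketched as a small inheritance procedure; the bird/penguin/robin concepts, feature names, and weights are illustrative assumptions built from the example in the text.

```python
# Sketch of IFSM's assumed ontology: each concept has distinctive
# features and inherits the rest from its hypernyms. Inheritance is not
# assumed sound: a penguin still receives "can-fly" from bird.

PARENT = {"bird": "animal", "penguin": "bird", "robin": "bird"}
DISTINCTIVE = {
    "animal": {"animate": 1.0},
    "bird": {"has-wings": 0.9, "can-fly": 0.8},
    "penguin": {"swims": 0.9},
    "robin": {"red-breast": 0.7},
}

def inherited_features(concept):
    """Collect the concept's distinctive features plus all features
    inherited from its hypernym chain, keeping each feature's weight."""
    feats = {}
    while concept is not None:
        for f, w in DISTINCTIVE.get(concept, {}).items():
            feats.setdefault(f, w)  # nearer concepts take precedence
        concept = PARENT.get(concept)
    return feats

print(sorted(inherited_features("penguin")))
```

The derived feature sets, rather than the raw link structure, are then what the similarity comparison operates on.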
We describe this generation process below, but we will first turn to the evaluation of similarity based on featural analysis.</Paragraph> </Section> <Section position="2" start_page="508" end_page="508" type="sub_section"> <SectionTitle> 2.1 Approaches to Feature Matching </SectionTitle> <Paragraph position="0"> There are a variety of similarity measures available for sets of features, but all make their comparisons based on some combination of shared features, distinct features, and shared absent features (e.g., neither X nor Y is red). For example, Tversky ('77) proposes a model (based on human similarity judgements) where similarity is a linear combination of shared and distinct features, with each feature weighted based on its importance. Tversky's experiment showed the highest correlation with human subjects' judgements when weighted shared and distinct features are both taken into consideration. SEXTANT (Grefenstette,94) introduced the Weighted Jaccard Measure, which combines the Jaccard Measure with weights derived from an information theoretic analysis of feature occurrences. The weight of a feature is computed from a global weight (based on the number of global occurrences of the word or concept) and a local weight (based on the frequency of the features attached to the word).</Paragraph> <Paragraph position="1"> In our current work, we have adopted the Weighted Jaccard Measure for preliminary evaluation of our approach. The distinctive feature of our approach is the use of the ontology to derive features rather than assuming atomic sets of features.</Paragraph> </Section> <Section position="3" start_page="508" end_page="508" type="sub_section"> <SectionTitle> 2.2 Properties of IFSM </SectionTitle> <Paragraph position="0"> In this section we compare IFSM's similarity judgements to those generated by other methods. 
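Before turning to the comparison, the Weighted Jaccard comparison adopted in Section 2.1 can be sketched as follows; the min/max form used here is one standard weighted Jaccard, and the feature sets and weights are toy assumptions rather than SEXTANT's actual global/local weighting.

```python
# Sketch of a weighted Jaccard measure over feature-to-weight dicts:
# ratio of summed minimum weights (shared mass) to summed maximum
# weights (total mass) across the union of features.

def weighted_jaccard(u, v):
    features = set(u) | set(v)
    shared = sum(min(u.get(f, 0.0), v.get(f, 0.0)) for f in features)
    total = sum(max(u.get(f, 0.0), v.get(f, 0.0)) for f in features)
    return shared / total if total else 0.0

penguin = {"animate": 1.0, "has-wings": 0.9, "swims": 0.9}
robin = {"animate": 1.0, "has-wings": 0.9, "can-fly": 0.8}
oak = {"has-bark": 0.8, "has-leaves": 0.6}

print(weighted_jaccard(penguin, robin))  # high: most weight is shared
print(weighted_jaccard(penguin, oak))    # zero: no feature in common
```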
In our discussion, we will consider the simple network of Fig 2. We will use the expression sim(Ci, Cj) to denote the similarity of concepts Ci and Cj.</Paragraph> <Paragraph position="1"> Given the situation of Fig 2, both MSCA and Resnik's MISM (Most Informative Subsumer Method) judge the two similarities to be the same.</Paragraph> <Paragraph position="3"> MSCA makes the similarity the same because they have the same (nearest) common abstraction C0.</Paragraph> <Paragraph position="4"> MISM holds the similarity to be the same because in higher/lower levels of the hierarchy the assertion of C2 adds no information given the assertion of C3. Path-length methods, in contrast, assert sim(C1, C2) &lt; sim(C2, C3) since the number of links between the concepts is quite different. Because IFSM depends on the features derived from the network rather than on the network itself, judgements of similarity depend on the exact features assigned to C1, C2, and C3. Because IFSM assumes that some distinctive features exist for C3, sim(C1, C2) and sim(C1, C3) are unlikely to be identical. In fact, unless the distinctive features of C3 significantly overlap the distinctive features of C1, it will be the case that sim(C1, C2) &lt; sim(C2, C3). IFSM differs from the path length model because it is sensitive to depth. If we assume a relatively uniform distribution of features, the total number of features increases with depth in the hierarchy. This means that sim(C0, C1), located in a higher part of the hierarchy, is expected to be less than sim(C2, C3), located in a lower part of the hierarchy.</Paragraph> </Section> </Section> </Paper>