<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1002">
  <Title>Learning Word Clusters from Data Types</Title>
  <Section position="3" start_page="0" end_page="10" type="metho">
    <SectionTitle>
2 The approach
</SectionTitle>
    <Paragraph position="0"> Here we illustrate a different approach to acquiring lexico-semantic classes from syntactically local contexts. Like the family of stochastic methods of section 1, we make use of a similarity metric based on substitutability in (verb, noun, function) triples. We also share the assumption that lexico-semantic classes are inherently multidimensional, as they heavily depend on the existence of a perspectivizing factor.</Paragraph>
    <Paragraph position="1"> Yet, we depart from other assumptions. Classification of verbs and nouns is asymmetric: two nouns are similar if they collocate with as many semantically diverse verbs as possible in as many different syntactic contexts as possible.</Paragraph>
    <Paragraph position="2"> The converse applies to verbs. In other words, semantic similarity of nouns is not conditional on the similarity of their accompanying verbs, and vice versa. In a sense, classification breaks the symmetry: maximization of the similarity of nouns (verbs) may cause minimization of the similarity of their accompanying verbs (nouns).</Paragraph>
    <Paragraph position="3"> A class where a maximum of noun similarity correlates with a maximum of verb similarity can be uninformative, as exemplified above by the case of poorly selective verbs.</Paragraph>
    <Paragraph position="4"> Secondly, we assume (following Fodor 1998) that the number of perspectivizing factors governing lexical selection may have the order of magnitude of the lexicon itself. The use of global semantic dimensions may smooth out lexical preferences, which is hardly what we need in order to semantically annotate them. A more conservative approach to the problem, inducing local semantic classes, can combine applicability to real language processing problems with the further bonus of exploring a relatively uncharted territory.</Paragraph>
    <Paragraph position="5"> Thirdly, p(vi, nj) appears to be too sensitive to changes in text genre, topic and domain to be expected to converge reliably. We prefer to ground a similarity metric on measuring the correlation among verb-noun types rather than tokens, for two basic reasons: i) verb-noun types are discrete, and less prone to random variation in a (parsed) corpus; ii) verb-noun types can reliably be acquired from highly informative but hardly redundant knowledge sources such as lexica and encyclopaedias.</Paragraph>
    <Paragraph position="6"> Finally, our information unit for measuring word similarity is not a couple of context-sharing pairs (e.g. (set, standard, obj) and (set, record, obj)) but a quadruple of such contexts, formed by combining two verbs with two nouns, such that they enter an analogical proportion.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 The analogical proportion
</SectionTitle>
      <Paragraph position="0"> In the present context, an analogical proportion (hereafter AP) is a quadruple of functionally annotated pairs resulting from the combination of any two nouns ni and nj with any two verbs vk and vl such that (2) holds:</Paragraph>
      <Paragraph position="2"> where terms along the two diagonals can swap place in the proportion, and identity of subscript indicates identity of values. Three aspects of (2) are worth emphasizing in this context.</Paragraph>
      <Paragraph position="3"> First, it does not require that the same syntactic function hold between all pairs, but only that functions be pairwise identical. Moreover, (2) does not cover all possible syntactic contexts where ni, nj, vk and vl may combine, but only those where verb and function values co-vary.</Paragraph>
      <Paragraph position="4"> (set, standard, obj) : (set, record, obj) = (meet, standard, obj) : (meet, record, x) (3) We call this constraint the "same-verb-same-function" principle. As we will see in section 2.3, the principle has important consequences on the sort of similarity induced by (2). Finally, if one uses subscripts as formal constraints on type identity, then any term can be derived from (2) if the values of all other terms are known. For example, given the partially instantiated proportion in (3), the last term is filled in unambiguously by substituting x = fn = obj.</Paragraph>
      <Paragraph position="5"> AP is an important generalization of the inter-substitutability assumption, as it extends the assumption to cases of functionally heterogeneous verb-noun pairs. Intuitively, an AP says that, for two nouns to be taken as systematically similar, one has to be ready to use them interchangeably in at least two different local contexts. This is where the inferential and the classificatory perspectives meet.</Paragraph>
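To make the "same-verb-same-function" principle concrete, here is a minimal sketch (hypothetical illustration, not the authors' code) that tests whether two verbs and two nouns enter an AP over a set of attested (verb, noun, function) triples: each verb must combine with both nouns under one and the same syntactic function, while the function may differ across the two verbs.

```python
# Hypothetical sketch of the AP test in (2): the quadruple counts as an AP
# when v1 pairs with both nouns under a single function f1, and v2 pairs
# with both nouns under a single (possibly different) function f2.

def forms_ap(triples, v1, v2, n1, n2):
    """Return a pair of functions (f1, f2) licensing the AP, or None."""
    def funcs(v, n):
        # Syntactic functions under which verb v is attested with noun n.
        return set(f for (vv, nn, f) in triples if vv == v and nn == n)
    shared1 = funcs(v1, n1).intersection(funcs(v1, n2))
    shared2 = funcs(v2, n1).intersection(funcs(v2, n2))
    if shared1 and shared2:
        return (sorted(shared1)[0], sorted(shared2)[0])
    return None

# Example (3) from the text, with the unambiguous completion x = obj:
kb = {("set", "standard", "obj"), ("set", "record", "obj"),
      ("meet", "standard", "obj"), ("meet", "record", "obj")}
print(forms_ap(kb, "set", "meet", "standard", "record"))  # ('obj', 'obj')
```

Note that the two verbs need not share a function: the same test licenses the functionally heterogeneous quadruples discussed in section 2.3.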
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Mathematical background
</SectionTitle>
      <Paragraph position="0"> We gave reasons for defining the similarity metric as a function of verb-noun type correlation rather than verb-noun token correlation. In this section we sketch the mathematical framework underlying this assumption, to show that, for a set of verb-noun pairs with a unique syntactic function, AP is the smallest C that satisfies eq.(1).</Paragraph>
      <Paragraph position="1"> Eq.(1) says that vi and nj are conditionally independent given C, meaning that their correlation only depends on the probability of their belonging to C, as formally restated in eq.(4).</Paragraph>
      <Paragraph position="2"> p(n, v|C) = p(n|C) p(v|C) (4) In passing from token to type frequency, we assume that a projection operator simply assigns a uniform type probability to each event (pair) with a nonzero token probability in the training corpus. From a learning perspective, this corresponds to the assumption that an infinite memory filters out events already seen during training. The type probability pT(n,v) is defined as in eq.(5), where Np is the number of different pairs attested in the training corpus.</Paragraph>
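The token-to-type projection described above can be sketched as follows (an illustrative reading with invented toy counts, not the authors' implementation): every pair attested at least once receives the same type probability 1/Np, so token frequencies are deliberately discarded.

```python
# Sketch of the projection behind eq.(5): each (verb, noun) pair with a
# nonzero token count gets the uniform type probability 1/Np, where Np is
# the number of distinct attested pairs. The toy counts are invented.

def type_probability(token_counts):
    """Map raw (verb, noun) token counts to uniform type probabilities."""
    attested = [pair for pair, c in token_counts.items() if c > 0]
    np_ = len(attested)  # Np: number of different attested pairs
    return {pair: 1.0 / np_ for pair in attested}

token_counts = {("causare", "problema"): 17,
                ("causare", "ritardo"): 3,
                ("incappare", "problema"): 1}
pT = type_probability(token_counts)
# A pair seen 17 times and a pair seen once both get probability 1/3:
print(pT[("causare", "problema")], pT[("incappare", "problema")])
```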
      <Paragraph position="4"> By eq.(4), pT(n, v|C) ≠ 0 if and only if pT(n|C) ≠ 0 and pT(v|C) ≠ 0. This amounts to saying that all verbs in C are freely interchangeable in the context of all nouns in C, and vice versa. We will hereafter refer to C as a substitutability island (SI). AP can accordingly be looked at as the minimal SI.</Paragraph>
      <Paragraph position="5"> The strength of correlation of nouns and verbs in each SI can be measured as a summation over the strength of all APs they enter. Formally, one can define a correlation score σ(v, n) as the probability of v and n being attested in a pair. This can be derived from our definition of pT(v,n), as shown in eq.(6), by substituting pT(n,v) = pT(v) pT(n|v) and pT(n|v) = 1/w(v), where w(a) is the type frequency of a (i.e. the number of different attested pairs containing a).</Paragraph>
      <Paragraph position="7"> By the same token, the correlation function σ(AP) relative to the 4 possible pairs in AP is calculated as</Paragraph>
      <Paragraph position="9"> Eq.(8) captures the intuition that the correlation score between verbs and nouns in AP is an inverse function of their type frequency.</Paragraph>
      <Paragraph position="10"> Nouns and verbs with high type frequency occur in many different pairs: the less selective they are, the smaller their semantic contribution to σ(AP).</Paragraph>
      <Paragraph position="11"> Our preference for σ(AP) over σ(v, n) underlies the definition of the correlation score of an SI given in eq.(9) (see also section 4).</Paragraph>
      <Paragraph position="12"> σ(SI) = Σ_{AP ∈ SI} σ(AP) (9)</Paragraph>
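The exact forms of eqs.(6) and (8) are not reproduced above, so the sketch below assumes, purely for illustration, a per-pair score of 1/(w(v)*w(n)); this assumption is chosen only to match the stated intuition that the score is an inverse function of type frequency. Eq.(9) is then the sum of the AP scores over all APs contained in the SI.

```python
# Illustrative sketch of eq.(9): the score of a substitutability island (SI)
# sums the scores of every AP (every verb pair x noun pair) it contains.
# The per-pair score 1/(w(v)*w(n)) is an assumption, not the paper's eq.(6).
from itertools import combinations

def w(word, pairs):
    """Type frequency: number of different attested pairs containing word."""
    return sum(1 for p in pairs if word in p)

def sigma_si(verbs, nouns, pairs):
    total = 0.0
    for v1, v2 in combinations(verbs, 2):       # each AP in the SI is a
        for n1, n2 in combinations(nouns, 2):   # 2-verb x 2-noun quadruple
            total += sum(1.0 / (w(v, pairs) * w(n, pairs))
                         for v in (v1, v2) for n in (n1, n2))
    return total

pairs = {("causare", "guaio"), ("causare", "problema"),
         ("incappare", "guaio"), ("incappare", "problema")}
print(sigma_si(["causare", "incappare"], ["guaio", "problema"], pairs))
```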
    </Section>
    <Section position="3" start_page="0" end_page="10" type="sub_section">
      <SectionTitle>
2.3 Breaking the symmetry
</SectionTitle>
      <Paragraph position="0"> In section 2.2 we assumed, for the sake of simplicity, that verbs and nouns are possibly related through one syntactic function only. In a proportion like (2), however, the syntactic function is allowed to vary. Nonetheless each related SI contains nouns which always combine with a given verb with one and the same syntactic function. Clearly, the same is not true of verbs. Suppose that an SI contains two verbs vk and vl (say drive and pierce) and two nouns ni and nj (say nail and peg) that are respectively object and subject of vk and vl. The type of similarity in the resulting noun and verb clusters is of a completely different nature: in the case of nouns, we acquire distributionally parallel words (e.g. nail and peg); in the case of verbs, we get distributionally correlated words (say drive and pierce) which are not interchangeable in the same context. Mixing the two types of distributional similarity in the same class makes little sense. Hereafter, we will aim at maximizing the similarity of distributionally parallel nouns. In doing so, we will use functionally heterogeneous contexts as in (2). This breaks classification symmetry, and there is no guarantee that semantically coherent verb clusters be returned.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="10" end_page="12" type="metho">
    <SectionTitle>
3 The method
</SectionTitle>
    <Paragraph position="0"> This section illustrates an application of the principles of section 2 to the task of clustering the set of objects of a verb on the basis of a repository of functionally annotated contexts.</Paragraph>
    <Section position="1" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
3.1 The knowledge base
</SectionTitle>
      <Paragraph position="0"> The training evidence is a Knowledge Base (KB) of functionally annotated verb-noun pairs, instantiating a wide range of syntactic relations:</Paragraph>
      <Paragraph position="2"> c) verb prepositional_complement, e.g. (incappare, problema, in) 'run_into-problem'.</Paragraph>
      <Paragraph position="3"> The KB contains 43,000 pair types, automatically extracted from different knowledge sources: dictionaries, both bilingual and monolingual (Montemagni 1995), and a corpus of financial newspapers (Federici et al. 1998). The two sources reflect two different modes of lexical usage: dictionaries give typical examples of use of a word, and running corpora attest actual usage of words in specific embedding domains. These differences have an impact on the typology of senses which the two sources provide evidence for. General dictionaries testify to all possible senses of a given word; typical word collocates acquired from dictionaries tend to cover the entire range of possible senses of a headword. On the other hand, unrestricted texts reflect actual usage and possibly bear witness to senses which are relevant to a specific domain only.</Paragraph>
    </Section>
    <Section position="2" start_page="10" end_page="11" type="sub_section">
      <SectionTitle>
3.2 The input words
</SectionTitle>
      <Paragraph position="0"> There is abundant psycholinguistic evidence that semantic similarity between words is eminently context-sensitive (Miller and Charles 1991). Moreover, in many language processing tasks, word similarity is typically judged relative to an actual context, as in the cases of syntactic disambiguation (both structural and functional), word sense disambiguation, and selection of the contextually appropriate translation equivalent of a word given its neighbouring words. Finally, close examination of real data shows that different word senses select classes of complements according to different dimensions of semantic similarity. This is so pervasive that it soon becomes impossible to provide an effective account of these dimensions independently of the sense in question.</Paragraph>
      <Paragraph position="1"> Evaluation of both accuracy and usability of any automatic classification of words into semantic clusters cannot artificially elude the basic question "similar in what respect?". Our choice of input words reflects these concerns.</Paragraph>
      <Paragraph position="2"> We automatically clustered the set of objects of a given verb, as they are attested in a test corpus. This yields local lexico-semantic classes, i.e. conditional on the selected verb head, as opposed to global classes, i.e. built once and for all to account for the collocates of any verb.</Paragraph>
      <Paragraph position="3"> Among the practical advantages of local classification we should at least mention the following two. Choice of a verb head as a perspectivizing factor considerably reduces the possibility that the same polysemous object collocate is used in different senses with the same verb.</Paragraph>
      <Paragraph position="4"> Furthermore, the resulting clusters can give information about the senses, or meaning facets, of the verb head.</Paragraph>
      <Paragraph position="5"> 3.3 Identification and ranking of noun clusters For the sake of concreteness, let us consider the following object-collocates of the Italian verb causare 'cause', as they are found in a test corpus: appesantimento 'increase in weight', crescita 'growth', flessione 'decrease', guaio 'trouble', problema 'problem', rialzo 'rise', ridimensionamento 'reduction', ritardo 'delay', turbolenza 'turbulence'.</Paragraph>
      <Paragraph position="6"> Clustering these input words requires preliminary identification of Substitutability Islands (SIs). An example of SI is the quadruple formed by the verb pair causare 'cause' and incappare 'run into' and the noun pair guaio 'trouble' and problema 'problem', where members of the same pair are inter-substitutable in context, given the constraints enforced by the AP type in (2). Note that guaio and problema are objects of causare, and prepositional complements (headed by in 'in') of incappare. This makes it possible to maximize the similarity of trouble and problem across functionally heterogeneous contexts.</Paragraph>
      <Paragraph position="7"> Bigger SIs than the one just shown will form as many APs as there are quadruples of contextually interchangeable nouns and verbs. We consider a lexico-semantic cluster of nouns the projection of an SI onto the set of nouns. Fig.1 illustrates a sample of noun clusters (between curly brackets) projected from a set of SIs, together with a list of the verbs found in the same SIs (the suffix 'S' stands for subject, and 'O' for object). Due to the asymmetry of classification, verbs in SIs are not taken to form part of a lexico-semantic cluster in the same sense as nouns are.</Paragraph>
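The projection of SIs onto noun sets can be sketched as follows (a hypothetical illustration with an invented toy KB, not the system's actual pipeline): the objects of the target verb are grouped by the second verb and function with which they are also attested, and any group of two or more nouns is the noun projection of an SI.

```python
# Hypothetical sketch of noun-cluster projection: given a toy KB of
# (verb, noun, function) triples and a target verb, group its objects into
# sets of nouns that are inter-substitutable with some second verb under
# one and the same function. The KB below is invented for illustration.
from collections import defaultdict

def project_clusters(kb, target_verb):
    objects = {n for (v, n, f) in kb if v == target_verb and f == "obj"}
    clusters = defaultdict(set)
    for (v, n, f) in kb:
        if v != target_verb and n in objects:
            clusters[(v, f)].add(n)  # nouns sharing verb v under function f
    # keep only projections with two or more nouns (a genuine quadruple)
    return {vf: ns for vf, ns in clusters.items() if len(ns) >= 2}

kb = {("causare", "guaio", "obj"), ("causare", "problema", "obj"),
      ("causare", "crescita", "obj"), ("causare", "flessione", "obj"),
      ("incappare", "guaio", "in"), ("incappare", "problema", "in"),
      ("registrare", "crescita", "obj"), ("registrare", "flessione", "obj")}
for (verb, func), nouns in sorted(project_clusters(kb, "causare").items()):
    print(verb, func, sorted(nouns))
```

With this toy KB, guaio/problema cluster via incappare (prepositional complements) and crescita/flessione via the invented verb registrare (objects), mirroring the functionally heterogeneous grouping described above.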
      <Paragraph position="8"> Not all projected noun clusters exhibit the same degree of semantic coherence. Intuitively, the cluster {appesantimento crescita flessione rialzo} 'increase in weight, growth, decrease, rise' is semantically more appealing than the cluster {crescita problema} 'growth, problem' (Fig.1).</Paragraph>
      <Paragraph position="9"> A quantitative measure of the semantic cohesion of a noun cluster CN is given by the correlation score σ(SI) of the SI of which CN is a projection. In Fig.2 noun clusters are ranked by decreasing values of σ(SI), calculated according to eq.(9).</Paragraph>
    </Section>
    <Section position="3" start_page="11" end_page="12" type="sub_section">
      <SectionTitle>
3.4 Centroid identification
</SectionTitle>
      <Paragraph position="0"> Noun clusters of Figs.1 and 2 are admittedly considerably fine-grained. A coarser grain can be attained trivially through set union of intersecting clusters. In fact, what we want to obtain is a set of maximally orthogonal and semantically coherent noun classes, under the assumption that these classes highly correlate with the principal meaning components of the verb head of which input nouns are objects.</Paragraph>
      <Paragraph position="1"> In the algorithm evaluated here this is achieved in two steps: i) first, we select the best possible centroids of the prospective classes among the noun clusters of Fig.2; ii) secondly, we lump outstanding clusters (i.e. clusters which have not been selected in step i)) around the identified centroids. In what follows, we will only focus on step i). Results and evaluation of step ii) are reported in (Allegrini et al. 2000).</Paragraph>
      <Paragraph position="2"> In step i) we assume that centroids are disjunctively defined, maximally coherent classes; hence, there exists no pair of intersecting centroids. The best possible selection of centroids will include non-intersecting clusters with the highest possible cumulative score. In practice, the best centroid corresponds to the cluster with the topmost σ(SI). The second best centroid is the cluster with the second highest σ(SI) and no intersection with the first centroid, and so on (the i-th centroid is the i-th highest cluster with no intersection with the first i-1 centroids) until all clusters in the rank are used up. Clusters selected as centroids in the causare example above are: {GUAIO PROBLEMA}, {RIDIMENSIONAMENTO RITARDO}, {CRESCITA FLESSIONE}.</Paragraph>
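The greedy selection just described can be sketched directly (the scores below are invented; only the ranking logic follows the text): walk down the clusters by decreasing σ(SI) and keep each cluster that does not intersect any centroid already selected.

```python
# Sketch of step i): greedy selection of pairwise disjoint centroids from
# clusters ranked by decreasing score. Scores are invented for illustration.

def select_centroids(ranked_clusters):
    """ranked_clusters: iterable of (score, frozenset_of_nouns)."""
    centroids = []
    for score, cluster in sorted(ranked_clusters, reverse=True):
        if all(cluster.isdisjoint(c) for c in centroids):
            centroids.append(cluster)
    return centroids

ranked = [(0.9, frozenset({"guaio", "problema"})),
          (0.7, frozenset({"crescita", "problema"})),  # overlaps the first
          (0.6, frozenset({"ridimensionamento", "ritardo"})),
          (0.5, frozenset({"crescita", "flessione"}))]
for c in select_centroids(ranked):
    print(sorted(c))
```

On this toy ranking the procedure returns exactly the three causare centroids listed above, skipping {crescita problema} because it intersects the top-ranked cluster.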
      <Paragraph position="3"> Clearly, this is not the only possible strategy for centroid selection, but certainly a suitable one given our assumptions and goals. To sum up, the targeted classification is local, i.e. conditional on a specific verb head, and orthogonal, i.e. it aims at identifying maximally disjunctive classes with high correlation with the principal meaning components of the verb head. This strategy leads to identification of the different senses, or possibly meaning facets, of a verb.</Paragraph>
      <Paragraph position="4"> In their turn, noun clusters may capture subtle semantic distinctions. For instance, a distinction is made between incremental events or results of incremental events, which presuppose a scalar dimension (as in the case of {crescita flessione} 'growth, decrease'), and re-scheduling events, where a change occurs with respect to a previously planned event or object (see the centroid {ridimensionamento ritardo} 'reduction, delay').</Paragraph>
    </Section>
  </Section>
</Paper>