<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2108">
  <Title>Clustering Verbs Semantically According to their Alternation Behaviour</Title>
  <Section position="4" start_page="747" end_page="749" type="metho">
    <SectionTitle>
2 Automatic Acquisition of
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="747" end_page="749" type="sub_section">
      <SectionTitle>
Semantic Verb Classes
</SectionTitle>
      <Paragraph position="0"> The first step was the induction of purely syntactic subcategorisation frames for verbs from the heterogeneous British National Corpus (BNC). I used the robust statistical head-entity parser described in (Carroll and Rooth, 1998), which utilises an English context-free grammar and a lexicalised probability model to produce parse forests, and extracted the maximum probability (Viterbi) parses for a total of 5.5 million sentences. The trees were mapped to subcategorisation frame tokens consisting of a main verb and its arguments. Each syntactic category was accompanied by its lexical head, the prepositional phrase by the lexical prepositional head plus the head noun of the subordinated noun phrase. Proper names were accompanied by the identifier pn. The head information in the frames was lemmatised. For example, the sentence Sammut handled the plaudits during the awards ceremony would be represented by the frame token handle subj*pn*sammut obj*plaudit pp*during*ceremony.</Paragraph>
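The mapping from a parsed sentence to a frame token can be sketched as follows (a minimal illustration; the function name and input shapes are my assumptions, not the original implementation):

```python
# Hypothetical sketch of frame-token construction; the real system worked
# on Viterbi parses from the head-entity parser, not on pre-split tuples.
def frame_token(verb, args):
    """Build a frame token from a lemmatised verb and its argument slots.

    args: list of tuples, each a syntactic category followed by its
    lexical head(s), e.g. ("subj", "pn", "sammut") for a proper-name subject.
    """
    return " ".join([verb] + ["*".join(a) for a in args])

token = frame_token("handle", [("subj", "pn", "sammut"),
                               ("obj", "plaudit"),
                               ("pp", "during", "ceremony")])
# token == "handle subj*pn*sammut obj*plaudit pp*during*ceremony"
```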
      <Paragraph position="1"> To generalise over the verbs' usage of subcategorisation frames, I defined 88 frame types: the most frequent frames, which appeared at least 2,000 times in total in the BNC sentence parses, disregarding the lexical head information. On the basis of these frame types I collected information about the joint frequencies of the verbs in the BNC and the subcategorisation frame types they appeared with. These frequency counts then represented the syntactic description of the verbs.</Paragraph>
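Collecting the verb-frame joint frequencies can be sketched like this (data shapes are assumptions; the `subj:obj`-style frame-type notation follows the paper's examples):

```python
from collections import Counter

def joint_frequencies(frame_tokens):
    """Count verb/frame-type pairs, discarding the lexical head information."""
    counts = Counter()
    for token in frame_tokens:
        verb, *slots = token.split()
        # keep only the syntactic category of each slot, e.g. subj*pn*sammut -> subj
        frame_type = ":".join(slot.split("*")[0] for slot in slots)
        counts[(verb, frame_type)] += 1
    return counts

counts = joint_frequencies([
    "handle subj*pn*sammut obj*plaudit pp*during*ceremony",
    "handle subj*pn*smith obj*ball",
])
# counts[("handle", "subj:obj:pp")] == 1 and counts[("handle", "subj:obj")] == 1
```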
      <Paragraph position="2"> The next step was to refine the subcategorisation frame types by a preferential ordering on conceptual classes for the argument slots in the frames. The basis I could use for the selectional preferences was provided by the lexical heads in the frame tokens.</Paragraph>
      <Paragraph position="3"> For example, the nouns appearing in the direct object slot of the transitive frame for the verb drink included coffee, milk, beer, indicating a conceptual class like beverage for this argument slot.</Paragraph>
      <Paragraph position="4"> I followed (Resnik, 1993)/(Resnik, 1997), who defined selectional preference as the amount of information a verb provides about its semantic argument classes. He utilised the WordNet taxonomy (Beckwith et al., 1991) for a probabilistic model capturing the co-occurrence behaviour of verbs and conceptual classes, where the conceptual classes were identified by WordNet synsets, sets of synonymous nouns within a semantic hierarchy. Referring to the above example, the three nouns coffee, milk, beer are in three different synsets (since they are not synonyms), but are all subordinated to the synset {beverage, drink, potable}. The goal in this example would therefore be to determine the relevant synset as the most selectionally preferred synset for the direct object slot of the verb drink.</Paragraph>
      <Paragraph position="5"> Redefined for my usage, the selectional preference of a verb v for a certain semantic class c within a subcategorisation frame slot s was determined by the association ass between verb and semantic class: ass(v_s, c_s) =def p(c_s|v_s) * log( p(c_s|v_s) / p(c_s) ) (5), with the probabilities estimated by maximum likelihood: p(c_s|v_s) = f(v_s, c_s) / f(v_s) (6) and p(c_s) = f(c_s) / sum_{c'_s in classes} f(c'_s) (7)</Paragraph>
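The association measure of equation (5) can be computed directly from four counts (a sketch with invented toy counts; the variable names are mine):

```python
import math

def association(f_vc, f_v, f_c, f_s):
    """ass(v_s, c_s) = p(c|v) * log(p(c|v) / p(c)), estimated from counts:
    f_vc = f(v_s, c_s), f_v = f(v_s), f_c = f(c_s), f_s = sum over classes f(c'_s).
    """
    p_c_given_v = f_vc / f_v
    p_c = f_c / f_s
    return p_c_given_v * math.log(p_c_given_v / p_c)

# Toy counts: the class fills 8 of the verb's 10 slot occurrences, but only
# 20 of the slot's 100 occurrences overall, so the association is positive.
a = association(8, 10, 20, 100)   # 0.8 * log(0.8 / 0.2)
```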
      <Paragraph position="7"> and the following interpretation: 1. f(v_s, c_s): number of times a semantic class appeared in a frame slot of a verb's frame type; 2. f(v_s): frequency of a verb regarding a specific frame type, i.e. the joint frequency of verb and frame type; 3. f(c_s): number of times a semantic class appeared in a frame slot of a frame type, disregarding the verb; 4. sum_{c'_s in classes} f(c'_s) equals f(s), the frequency of the argument slot within a certain frame type, since summing over all possible classes within a subcategorisation frame slot equals the number of times the slot appeared; 5. f(s): number of times the frame type appeared, since the frequency of a frame type equals the frequency of that frame with a certain slot marked. The frequencies of a semantic class concerning an argument slot of a frame type (dependent or independent of a verb) were calculated by an approach slightly different to Resnik's, originally proposed by (Ribas, 1994)/(Ribas, 1995). For each noun appearing in a certain argument position, its frequency was divided by the number of senses the noun was assigned by the WordNet hierarchy,(1) to take account of the uncertainty about the sense of the noun. The fraction was allocated to each conceptual class in the hierarchy to which the noun belonged and accumulated upwards until a top node was reached. The result was a numerical distribution over the WordNet classes: f(c_s) = sum_{noun in c_s} f(noun) / |senses(noun)| (8). (1) For example, without further information from its context, we do not know whether we are talking about the beverage coffee, the plant coffee or the coffee bean. Therefore, a third of the frequency of the noun was assigned to each of the three classes.</Paragraph>
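The Ribas-style class frequencies of equation (8) can be sketched with a toy taxonomy (the hypernym links and sense inventory below are invented for illustration, not actual WordNet data):

```python
# noun -> its synsets, and class -> parent class (None marks a top node).
SENSES = {"coffee": ["beverage", "plant"]}
HYPERNYMS = {"beverage": "substance", "plant": "entity",
             "substance": "entity", "entity": None}

def class_frequencies(noun_counts):
    """Split each noun's frequency evenly over its senses and accumulate
    the shares upwards through the hierarchy until a top node is reached."""
    freq = {}
    for noun, f in noun_counts.items():
        share = f / len(SENSES[noun])
        for cls in SENSES[noun]:
            while cls is not None:
                freq[cls] = freq.get(cls, 0.0) + share
                cls = HYPERNYMS[cls]
    return freq

freq = class_frequencies({"coffee": 6})
# beverage and plant each receive 3.0; entity accumulates both shares: 6.0
```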
      <Paragraph position="8"> I restricted the possible conceptual classes within the frames' argument slots to 23 WordNet nodes,(2) to facilitate generalisation and comparison of the verbs' selectional preference behaviour.</Paragraph>
      <Paragraph position="9"> On the basis of the information about subcategorisation frame types and their arguments' conceptual classes I clustered 153 verbs from Levin's classification. I chose (i) some polysemous verbs, to investigate how this phenomenon could be handled by the clustering algorithms, and (ii) high- and low-frequency verbs, to see the influence of frequency on the algorithms: the 153 verbs had 226 verb senses which belonged to 30 different semantic classes. Four of the verbs were low-frequency verbs with a total corpus frequency below 100.</Paragraph>
      <Paragraph position="10"> To cluster the verbs I applied two different algorithms, and each algorithm clustered the verbs both (A) according to only the syntactic information about the subcategorisation frames, and (B) according to the information about the subcategorisation frames including their selectional preferences. 1. Iterative clustering based on a definition by (Hughes, 1994): In the beginning, each verb represented a singleton cluster. Iteratively, the distances between the clusters were measured and the closest clusters merged together.</Paragraph>
      <Paragraph position="11"> For the representation of the verbs, each verb v was assigned a distribution over the different types of subcategorisation frames t, according to the maximum likelihood estimate of (A) the verb appearing with the frame type: p(t|v) = f(v,t) / f(v) (9), with f(v,t) the joint frequency of verb and frame type, and f(v) the frequency of the verb, and (B) the verb appearing with the frame type and a selectionally preferred class combination C for the argument positions s in t: p(t, C|v) =def p(t|v) * p(C|v, t) (10), with p(t|v) defined as in equation (9), and p(C|v, t) =def prod_{s in t} ass(v_s, c_s) / sum_{C' in classes} prod_{s in t} ass(v_s, c'_s) (11), which intuitively estimates the probability of a certain class combination by comparing its association value with the sum over all possible class combinations, concerning the respective verb and frame.</Paragraph>
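The normalisation over class combinations in equation (11) can be sketched as follows (the association values are toy numbers, not estimates from the paper; in practice only positive associations make sense as weights here):

```python
from itertools import product

def p_C_given_vt(ass, slots, classes, target):
    """p(C|v,t): product of per-slot association values for the class
    combination C, normalised over all possible class combinations."""
    def score(comb):
        p = 1.0
        for slot, cls in zip(slots, comb):
            p *= ass[(slot, cls)]
        return p
    total = sum(score(comb) for comb in product(classes, repeat=len(slots)))
    return score(target) / total

# Invented association values for a transitive subj:obj frame.
ass_toy = {("subj", "LifeForm"): 0.6, ("subj", "Object"): 0.4,
           ("obj", "LifeForm"): 0.2, ("obj", "Object"): 0.8}
p = p_C_given_vt(ass_toy, ("subj", "obj"),
                 ("LifeForm", "Object"), ("LifeForm", "Object"))
```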
      <Paragraph position="12"> (2) I chose the 11 top level nodes of the 11 WordNet hierarchies as conceptual classes. The top level node Entity seemed too general as a conceptual class, so it was replaced by its 13 subordinated synsets.</Paragraph>
      <Paragraph position="13"> Starting out with each verb representing a singleton cluster, I iteratively determined the two closest clusters by applying the information-theoretic measure relative entropy(3) (Kullback and Leibler, 1951) to compare the distributions. The nearest clusters were merged into one cluster, and their distributions were merged by calculating a weighted average. Based on test runs I defined heuristics about how many clustering iterations were performed. In addition, I limited the maximum number of verbs within one cluster to four elements, because otherwise the verbs showed the tendency to cluster together in a few large clusters only.</Paragraph>
      <Paragraph position="14"> After the overall clustering process was finished, each cluster with more than four members therefore initialised a further clustering pass on itself.</Paragraph>
      <Paragraph position="15"> 2. Unsupervised latent class analysis as described in (Rooth, 1998), based on the expectation-maximisation algorithm: The algorithm identified categorical types among indirectly observed multinomial distributions by applying the EM-algorithm (Dempster et al., 1977) to maximise the joint probability of (A) the verb and frame type: p(v, t), and (B) the verb and frame type considering the selectional preferences: p(v, t, C).</Paragraph>
      <Paragraph position="16"> Input to the algorithm were the absolute frequencies of the verbs appearing with the subcategorisation frames. Test runs showed that 80 clusters modelled the semantic verb classes best. To be able to compare the analysis with the iterative clustering approach, I also limited the number of verbs within a cluster to four; considering that generally all verbs appear within each cluster when using this approach, the verbs with the highest probabilities were chosen.</Paragraph>
      <Paragraph position="17"> For version (A) the frequencies were provided by the joint frequencies of verbs and frame types; for version (B) I used the association values of the verbs with the frame types considering selectional preferences, as described by equation (10).</Paragraph>
      <Paragraph position="18"> The unsupervised algorithm then classified joint events of verbs and subcategorisation frames with 200 iterations of the EM-algorithm into 80 clusters, based on the iteratively estimated probabilities p(v,t) = sum_c p(c) * p(v|c) * p(t|c) and p(v,t,C) = sum_c p(c) * p(v|c) * p(t,C|c), for versions (A) and (B), respectively.</Paragraph>
      <Paragraph position="20"> (3) Concerning the two typical problems one has with this measure, (i) zero frequencies were smoothed by adding 0.5 to all frequencies, and (ii) since the measure is not symmetric, the respective smaller value was used as distance.</Paragraph>
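A minimal EM sketch for Rooth's latent-class model over verb-frame counts (initialisation, names, and the tiny data set are my assumptions, not the original implementation):

```python
import random

def em(freq, verbs, frames, n_classes, iters, seed=0):
    """Estimate p(c), p(v|c), p(t|c) from verb-frame counts via EM,
    maximising the likelihood of p(v,t) = sum_c p(c) * p(v|c) * p(t|c)."""
    rng = random.Random(seed)
    norm = lambda d: {k: x / sum(d.values()) for k, x in d.items()}
    p_c = [1.0 / n_classes] * n_classes
    p_vc = [norm({v: rng.random() for v in verbs}) for _ in range(n_classes)]
    p_tc = [norm({t: rng.random() for t in frames}) for _ in range(n_classes)]
    for _ in range(iters):
        n_c = [0.0] * n_classes
        n_v = [dict.fromkeys(verbs, 0.0) for _ in range(n_classes)]
        n_t = [dict.fromkeys(frames, 0.0) for _ in range(n_classes)]
        for (v, t), f in freq.items():
            post = [p_c[c] * p_vc[c][v] * p_tc[c][t] for c in range(n_classes)]
            z = sum(post)
            for c in range(n_classes):          # E-step: expected counts
                r = f * post[c] / z
                n_c[c] += r; n_v[c][v] += r; n_t[c][t] += r
        total = sum(n_c)                        # M-step: re-estimate
        p_c = [x / total for x in n_c]
        p_vc = [{v: n_v[c][v] / n_c[c] for v in verbs} for c in range(n_classes)]
        p_tc = [{t: n_t[c][t] / n_c[c] for t in frames} for c in range(n_classes)]
    return p_c, p_vc, p_tc

freq = {("walk", "subj"): 10, ("see", "subj:obj"): 10, ("run", "subj"): 5}
p_c, p_vc, p_tc = em(freq, ["walk", "see", "run"], ["subj", "subj:obj"], 2, 20)
```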
    </Section>
  </Section>
  <Section position="5" start_page="749" end_page="749" type="metho">
    <SectionTitle>
3 Evaluation
</SectionTitle>
    <Paragraph position="0"> The evaluation of the resulting clusters was based on Levin's classification. Figures 1 and 2 present the success of the two clustering algorithms, considering the two different informational versions (A) and (B).</Paragraph>
    <Paragraph position="1"> They contain the total number of clusters the algorithms had formed (clusters containing between two and four verbs in the iterative algorithm, and the fixed number of 80 clusters in the latent class analysis), the proportion of correct clusters (non-singleton clusters which were subsets of a Levin class; for example, the cluster containing the verbs need, like, want, desire is a subset of the Levin class Desire), and the number of verbs within those clusters. In figure 2 the number of verbs in brackets refers to the respective number of their senses, since a verb could be clustered several times according to its senses.</Paragraph>
    <Paragraph position="2"> For example, the verb want could be a member of the classes Desire and Declaration.</Paragraph>
    <Paragraph position="3"> Recall was defined by the percentage of verbs (verb senses) within the correct clusters compared to the total number of verbs (verb senses) to be clustered: recall = (number of verbs in correct clusters) / (total number of verbs to be clustered)</Paragraph>
    <Paragraph position="5"> and precision was defined by the percentage of verbs (verb senses) appearing in the correct clusters compared to the number of verbs (verb senses) appearing in any cluster: precision = (number of verbs in correct clusters) / (number of verbs appearing in any cluster)</Paragraph>
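The two measures can be sketched directly from the cluster analysis (the clusters and Levin classes below are toy data for illustration):

```python
def evaluate(clusters, levin_classes):
    """A cluster is correct if it is non-singleton and a subset of one
    Levin class; precision and recall count the verbs in such clusters."""
    correct = [c for c in clusters
               if len(c) >= 2 and any(c <= g for g in levin_classes)]
    n_correct = sum(len(c) for c in correct)
    n_clustered = sum(len(c) for c in clusters)
    n_total = len(set().union(*levin_classes))
    return n_correct / n_clustered, n_correct / n_total   # precision, recall

levin = [{"need", "like", "want", "desire"}, {"begin", "start"}]
clusters = [{"need", "like"}, {"want", "begin"}]   # second one spans two classes
precision, recall = evaluate(clusters, levin)
```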
    <Paragraph position="7"> Concerning precision, the assignment of verbs into semantic classes was most successful when using the iterative distance clustering method; 61% of all verbs were clustered into correct classes. Clustering the verbs into latent classes was, with 54%, less successful. With both clustering methods the results became worse when adding information about the selectional preferences for the arguments in the subcategorisation frames.</Paragraph>
    <Paragraph position="8"> A baseline experiment was performed in order to determine how hard the task of verb clustering was: each verb was randomly assigned another verb as &quot;closest neighbour&quot;, which resulted in only 5% of the verbs being paired with a verb from the same Levin class. Performing the same experiment by assigning the closest neighbour on the basis of measuring the relative entropy between two verbs' distributions over subcategorisation frames resulted in 61% of the verbs pointing to a verb from the same Levin class.</Paragraph>
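The nearest-neighbour baseline can be sketched like this (toy distributions and Levin labels; the KL-based neighbour uses the same smaller-direction heuristic as the clustering):

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_neighbour(v, dists):
    """Closest other verb by the smaller direction of relative entropy."""
    return min((u for u in dists if u != v),
               key=lambda u: min(kl(dists[v], dists[u]),
                                 kl(dists[u], dists[v])))

def same_class_rate(dists, levin_class, neighbour):
    """Fraction of verbs whose chosen neighbour shares their Levin class."""
    hits = sum(levin_class[v] == levin_class[neighbour(v, dists)]
               for v in dists)
    return hits / len(dists)

dists = {"like": [0.9, 0.1], "want": [0.8, 0.2], "walk": [0.1, 0.9]}
levin = {"like": "Desire", "want": "Desire", "walk": "Motion"}
rate = same_class_rate(dists, levin, kl_neighbour)   # 2/3: walk has no partner
```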
  </Section>
  <Section position="6" start_page="749" end_page="751" type="metho">
    <SectionTitle>
4 Discussion
</SectionTitle>
    <Paragraph position="0"> The classifications of both clustering approaches illustrate the close relationship between alternation behaviour and semantic classes. For example, the common preferences of verbs (see the five most probable frames) in the iteratively created Desire class were towards a subject followed by an infinitival phrase (subj:to). Alternatively a transitive subj:obj frame was used, partly followed by an additional infinitival phrase indicated by to.(5) (4) For a more detailed discussion see the original work (Schulte im Walde, 1998).</Paragraph>
    <Paragraph position="1"> (5) Note that the (wrongly chosen) intransitive frame is listed as well. This is due to underlying sentences containing an NP ellipsis, parsing mistakes, and errors in frame extraction.</Paragraph>
    <Paragraph position="3"> Adding information about the selectional preferences of the verbs' arguments helps to get a deeper idea about their lexical semantics. For example, manner of Motion verbs preferably appeared with a subject only, sometimes with a following adverb.</Paragraph>
    <Paragraph position="4"> The subject was an inanimate object; for move it might also be a part (such as a body part like finger) or a group. roll and fly alternatively used the transitive frame type subj:obj, preferably with a living entity as subject, followed by an inanimate object. Parallel examples created by the latent class analysis present the clusters with the most probable verbs and frames, according to cluster membership (first column). The dot indicates whether a verb-frame combination was seen in the data; the number next to the verb frame gives the probability of the verb-frame combination.</Paragraph>
    <Paragraph position="5"> Some verbs of Telling were clustered mainly according to their similar transitive use combined with an infinitival phrase.</Paragraph>
    <Paragraph position="7"> The verbs of Aspect alternate between a subject only, realised by an action; an inanimate subject followed by an infinitival phrase; and a living subject followed by a gerund. Both approaches established a relationship between alternation behaviour and semantic class by only considering information about the syntactic usage of the subcategorisation frames. The refinement by the frames' selectional preferences allowed further demarcations by identifying conceptual restrictions on the use of the frames.</Paragraph>
    <Paragraph position="8"> Since the latent class analysis is a soft clustering method, it additionally distinguishes between the different verbs' senses and the respective uses of subcategorisation frames. For example, the verb play was clustered with meet because of the common strong tendency towards a transitive frame illustrating a general meeting, and it was clustered with fight because of their common preference for an intransitive frame together with a prepositional phrase headed by against, illustrating a more aggressive meeting like a fight. An extensive investigation of the linguistic reliability of the clustered verbs and frames showed that the characterising usages could be underlined by corpus data; for example, the above cited transitive use of the verb fly concerning the subj:obj frame type with a living subject and an inanimate object can be illustrated by the BNC sentence In March the manufacturer's test pilot flew the aircraft for its annual inspection check flight. The clusters were therefore created on a reliable linguistic basis representing (a selective part of) the verbs' properties.</Paragraph>
    <Paragraph position="9"> Comparing the two informational versions, however, showed that refining the frames with selectional preferences points to a problem caused by data sparseness in the verb description. Investigating the automatically created distribution of the verbs over the enriched frame types revealed that, for example, even the highly frequent, alternating verb move contains 97% (smoothed) zeroes within its distribution. In accordance with this finding, even subtle similarities, e.g. the sole fact that two verbs have non-zero values for certain frame types, highly correlate the two verbs. For example, a semantic cluster contained the two verbs promise and love, because both have non-zero attribute values for the subj:to frame, demanding an agent for the subject slot; in their alternation behaviour (including selectional preferences) the two verbs differ, however, so they should not be packed into one cluster. A possible suggestion to handle the problem of data sparseness could be to formulate the conceptual class types in a way which ensures an increased data potential for each type.</Paragraph>
    <Paragraph position="10"> Concerning the polysemy of verbs, the (hard) iterative distance clustering failed to model verb senses; a polysemous verb was either not assigned to any cluster at all, or assigned to a cluster describing one of the verb's senses. The (soft) latent class analysis was able to filter the multiple senses and assign them to distinct clusters, but tended to split senses. Low-frequency verbs presented another problem, because the verbs' distributions contained mostly zeroes; they were assigned to clusters nearly randomly. An investigation of selected WordNet conceptual classes revealed that the selectional preferences within the subcategorisation frames were dominated by a few WordNet classes, mainly LifeForm and Agent. The demarcation between these two concepts was not obvious when referring to the nouns actually appearing within the frames, since both contain a large number of common subordinated nouns. In contrast, some WordNet classes were not chosen at all, e.g. Unit or Anticipation. Since the WordNet hierarchy in general had turned out to define intuitively correct selectional preferences, the conceptual classification should be refined with finer synsets, i.e. one should consider using a different cut through the WordNet hierarchy.</Paragraph>
  </Section>
</Paper>