<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1029">
  <Title>Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Syntactic Descriptors for Verb Frames
</SectionTitle>
    <Paragraph position="0"> The syntactic subcategorisation frames for German verbs were obtained by unsupervised learning in a statistical grammar framework (Schulte im Walde et al., 2001): a German context-free grammar containing frame-predicting grammar rules and information about lexical heads was trained on 25 million words of a large German newspaper corpus. The lexicalised version of the probabilistic grammar served as source for syntactic descriptors for verb frames (Schulte im Walde, 2002b).</Paragraph>
    <Paragraph position="1"> The verb frame types contain at most three arguments. Possible arguments in the frames are nominative (n), dative (d) and accusative (a) noun phrases, reflexive pronouns (r), prepositional phrases (p), expletive es (x), non-finite clauses (i), finite clauses (s-2 for verb second clauses, s-dass for dass-clauses, s-ob for ob-clauses, s-w for indirect wh-questions), and copula constructions (k). For example, subcategorising a direct (accusative case) object and a non-finite clause would be represented by nai. We defined a total of 38 subcategorisation frame types, according to the verb subcategorisation potential in the German grammar (Helbig and Buscha, 1998), with few further restrictions on argument combination.</Paragraph>
    <Paragraph position="2"> We extracted verb-frame distributions from the trained lexicalised grammar. Table 1 shows an example distribution for the verb glauben 'to  We also created a more delicate version of subcategorisation frames that discriminates between different kinds of pp-arguments. This was done by distributing the frequency mass of prepositional phrase frame types (np, nap, ndp, npr, xp) over the prepositional phrases, according to their frequencies in the corpus. Prepositional phrases are referred to by case and preposition, such as 'Dat.mit', 'Akk.fur'.</Paragraph>
    <Paragraph position="3"> The resulting lexical subcategorisation for reden and the frame type np whose total joint probability is 0.35820, is displayed in Table 2 (for probability values a0 1%).</Paragraph>
    <Paragraph position="4">  The subcategorisation frame descriptions were formally evaluated by comparing the automatically generated verb frames against manual definitions in the German dictionary Duden - Das Stilworterbuch (Dudenredaktion, 2001). The F-score was 65.30% with and 72.05% without prepositional phrase information: the automatically generated data is both easy to produce in large quantities and reliable enough to serve as proxy for human judgement (Schulte im Walde, 2002a).</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 German Semantic Verb Classes
</SectionTitle>
    <Paragraph position="0"> Semantic verb classes have been defined for several languages, with dominant examples concerning English (Levin, 1993) and Spanish (Vazquez et al., 2000). The basic linguistic hypothesis underlying the construction of the semantic classes is that verbs in the same class share both meaning components and syntactic behaviour, since the meaning of a verb is supposed to influence its behaviour in the sentence, especially with regard to the choice of its arguments.</Paragraph>
    <Paragraph position="1"> We hand-constructed a concise classification with 14 semantic verb classes for 57 German verbs before we initiated any clustering experiments. We have on hand a larger set of verbs and a more elaborate classification, but choose to work on the smaller set for the moment, since an important component of our research program is an informative post-hoc analysis which becomes infeasible with larger datasets. The semantic aspects and majority of verbs are closely related to Levin's English classes. They are consistent with the German verb classification in (Schumacher, 1986) as far as the relevant verbs appear in his less extensive semantic 'fields'.</Paragraph>
    <Paragraph position="2">  1. Aspect: anfangen, aufhoren, beenden, beginnen, enden 2. Propositional Attitude: ahnen, denken, glauben, vermuten, wissen 3. Transfer of Possession (Obtaining): bekommen, erhalten, erlangen, kriegen 4. Transfer of Possession (Supply): bringen, liefern, schicken, vermitteln, zustellen 5. Manner of Motion: fahren, fliegen, rudern, segeln 6. Emotion: argern, freuen 7. Announcement: ankundigen, bekanntgeben, eroffnen, verkunden 8. Description: beschreiben, charakterisieren, darstellen, interpretieren 9. Insistence: beharren, bestehen, insistieren, pochen 10. Position: liegen, sitzen, stehen 11. Support: dienen, folgen, helfen, unterstutzen 12. Opening: offnen, schliessen 13. Consumption: essen, konsumieren, lesen, saufen, trinken 14. Weather: blitzen, donnern, dammern, nieseln,  regnen, schneien The class size is between 2 and 6, no verb appears in more than one class. For some verbs this is something of an oversimplification; for example, the verb bestehen is assigned to verbs of insistence, but it also has a salient sense more related to existence. Similarly, schliessen is recorded under open/close, in spite of the fact it also has a meaning related to inference and the formation of conclusions. The classes include both high and low frequency verbs, because we wanted to make sure that our clustering technology was exercised in both data-rich and data-poor situations. The corpus frequencies range from 8 to 31,710.</Paragraph>
    <Paragraph position="3"> Our target classification is based on semantic intuitions, not on our knowledge of the syntactic behaviour. As an extreme example, the semantic class Support contains the verb unterstutzen, which syntactically requires a direct object, together with the three verbs dienen, folgen, helfen which dominantly subcategorise an indirect object. In what follows we will show that the semantic classification is largely recoverable from the patterns of verb-frame occurrence. null</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Clustering Methodology
</SectionTitle>
    <Paragraph position="0"> Clustering is a standard procedure in multivariate data analysis. It is designed to uncover an inherent natural structure of the data objects, and the equivalence classes induced by the clusters provide a means for generalising over these objects. In our case, clustering is realised on verbs: the data objects are represented by verbs, and the data features for describing the objects are realised by a probability distribution over syntactic verb frame descriptions.</Paragraph>
    <Paragraph position="1"> Clustering is applicable to a variety of areas in Natural Language Processing, e.g. by utilising class type descriptions such as in machine translation (Dorr, 1997), word sense disambiguation (Dorr and Jones, 1996), and document classification (Klavans and Kan, 1998), or by applying clusters for smoothing such as in machine translation (Prescher et al., 2000), or probabilistic grammars (Riezler et al., 2000).</Paragraph>
    <Paragraph position="2"> We performed clustering by the k-Means algorithm as proposed by (Forgy, 1965), which is an unsupervised hard clustering method assigning a0 data objects to exactly a1 clusters. Initial verb clusters are iteratively re-organised by assigning each verb to its closest cluster (centroid) and re-calculating cluster centroids until no further changes take place.</Paragraph>
    <Paragraph position="3"> One parameter of the clustering process is the distance measure used. Standard choices include the cosine, Euclidean distance, Manhattan metric, and variants of the Kullback-Leibler (KL) divergence. We concentrated on two variants of KL in Equation (1): information radius, cf. Equation (2), and skew divergence, recently shown as an effective measure for distributional similarity (Lee, 2001), cf.</Paragraph>
    <Paragraph position="4"> Equation (3).</Paragraph>
    <Paragraph position="6"> Measures (2) and (3) can tolerate zero values in the probability distribution, because they work with a weighted average of the two distributions compared.</Paragraph>
    <Paragraph position="7"> For the skew-divergence, we set the weight a47 to 0.9, as was done by Lee.</Paragraph>
    <Paragraph position="8"> Furthermore, because the k-Means algorithm is sensitive to its starting clusters, we explored the option of initialising the cluster centres based on other clustering algorithms. We performed agglomerative hierarchical clustering on the verbs which first assigns each verb to its own cluster and then iteratively determines the two closest clusters and merges them, until the specified number of clusters is left. We tried several amalgamation methods: single-linkage, complete-linkage, average verb distance, distance between cluster centroids, and Ward's method.</Paragraph>
    <Paragraph position="9"> The clustering was performed as follows: the 57 verbs were associated with probability distributions over frame types1 (in condition 1 there were 38 frame types, while in the more delicate condition 2 there were 171, with a concomitant increase in data sparseness), and assigned to starting clusters (randomly or by hierarchical clustering). The k-Means algorithm was then allowed to run for as many iterations as it takes to reach a fixed point, and the resulting clusters were interpreted and evaluated against the manual classes.</Paragraph>
    <Paragraph position="10"> Related work on English verb classification or clustering utilised supervised learning by decision trees (Stevenson and Merlo, 1999), or a method related to hierarchical clustering (Schulte im Walde, 2000).</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Clustering Evaluation
</SectionTitle>
    <Paragraph position="0"> The task of evaluating the result of a cluster analysis against the known gold standard of hand-constructed verb classes requires us to assess the similarity between two sets of equivalence relations. As noted by (Strehl et al., 2000), it is useful to have an evaluation measure that does not depend on the choice of similarity measure or on the original dimensionality of the input data, since that allows meaningful comparison of results for which these parameters vary. This is similar to the perspective of (Vilain et al., 1995), who present, in the context of the MUC co-reference evaluation scheme, a model-theoretic measure of the similarity between equivalence classes.</Paragraph>
    <Paragraph position="1"> Strehl et al. consider a clustering a0 that partitions  probabilities, such as frequencies and binarisation, but none proved as effective as the probabilities.</Paragraph>
    <Paragraph position="2"> We call the cluster result a22 and the desired gold-standard a23 . For measuring the quality of an individual cluster, the cluster purity of each cluster a22  that are projected into the same class a23 a25 .</Paragraph>
    <Paragraph position="3"> The measure is biased towards small clusters, with the extreme case of singleton clusters, which is an undesired property for our (linguistic) needs.</Paragraph>
    <Paragraph position="4"> To capture the quality of a whole clustering, Strehl et al. combine the mutual information between a22 and a23 (based on the shared verb membership a22a26a23 a30 a12 ) with a scaling factor corresponding to the numbers of verbs in the respective clusters, a22a22</Paragraph>
    <Paragraph position="6"> This manipulation is designed to remove the bias towards small clusters:2 using the 57 verbs from our study we generated 50 random clusters for each cluster size between 1 and 57, and evaluated the results against the gold standard, returning the best result for each replication. We found that even using the scaling factor the measure favours smaller clusters. But this bias is strongest at the extremes of the range, and does not appear to impact too heavily on our results.</Paragraph>
    <Paragraph position="7"> Unfortunately none of Strehl et al's measures have all the properties which we intuitively require from a measure of linguistic cluster quality. For example, if we restrict attention to the case in which all verbs in an inferred cluster are drawn from the same actual class, we would like it to be the case that the evaluation measure is a monotonically increasing function of the size of the inferred cluster. We therefore introduced an additional, more suitable measure for the evaluation of individual clusters, based on the representation of equivalence classes as sets of pairs.</Paragraph>
    <Paragraph position="8"> It turns out that pairwise precision and recall have some of the counter-intuitive properties that we objected to in Strehl et al's measures, so we adjust pair-wise precision with a scaling factor based on the size 2In the absence of the penalty, mutual information would attain its maximum (which is the entropy of a39 ) not only when A is correct but also when a40 contains only singleton clusters. of the hypothesised cluster.</Paragraph>
    <Paragraph position="10"> We call this measure a22a3a0a4a0 , for adjusted pairwise precision. As with any other measure of individual cluster quality we can associate a quality value with a clustering a0 which assigns each of the items a1a9a12 to a cluster a0a15a14a1 a12a17a16 by taking a weighted average over the qualities of the individual clusters.</Paragraph>
    <Paragraph position="12"> Figures 1 and 2 summarise the two evaluation measures for overall cluster quality, showing the variation with the KL-based distance measures and with different strategies for seeding the initial cluster centres in the k-Means algorithm. Figure 1 displays quality scores referring to the coarse condition 1 subcategorisation frame types, Figure 2 refers to the clustering results obtained by verb descriptions based on the more delicate condition 2 subcategorisation frame types including PP information. Base-line values are 0.017 (APP) and 0.229 (MI), calculated as average on the evaluation of 10 random clusters. Optimum values, as calculated on the manual classification, are 0.291 (APP) and 0.493 (MI). The evaluation function is extremely non-linear, which leads to a severe loss of quality with the first few clustering mistakes, but does not penalise later mistakes to the same extent.</Paragraph>
    <Paragraph position="13"> From the methodological point of view, the clustering evaluation gave interesting insights into k-Means' behaviour on the syntactic frame data. The more delicate verb-frame classification, i.e. the refinement of the syntactic verb frame descriptions by prepositional phrase specification, improved the clustering results. This does not go without saying: there was potential for a sparse data problem, since even frequent verbs can only be expected to inhabit a few frames. For example, the verb anfangen with a corpus frequency of 2,554 has zero counts for 138 of the 171 frames. Whether the improvement really matters in an application task is left to further research. null We found that randomised starting clusters usually give better results than initialisation from a hierarchical clustering. Hierarchies imposing a strong structure on the clustering (such as single-linkage: the output clusterings contain few very large and many singleton clusters) are hardly improved by k-Means. Their evaluation results are noticeably below those for random clusters. But initialisation using Ward's method, which produces tighter clusters and a narrower range of cluster sizes does outperform random cluster initialisation. Presumably the issue is that the other hierarchical clustering methods place k-Means in a local minimum from which it cannot escape, and that uniformly shaped cluster initialisation gives k-Means a better chance of avoiding local minima, even with a high degree of perturbation. null</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Linguistic Investigation
</SectionTitle>
    <Paragraph position="0"> The clustering setup, proceeding and results provide a basis for a linguistic investigation concerning the German verbs, their syntactic properties and semantic classification.</Paragraph>
    <Paragraph position="1"> The following clustering result is an intuitively plausible semantic verb classification, accompanied by the cluster quality scores a22a3a0a2a0 , and class labels illustrating the majority vote of the verbs in the cluster.3 The cluster analysis was obtained by running k-Means on a random cluster initialisation, with information radius as distance measure; the verb description contained condition 2 subcategorisation frame types with PP information.</Paragraph>
    <Paragraph position="2"> a) ahnen, vermuten, wissen (0.75) Propositional Attitude b) denken, glauben (0.33) Propositional Attitude c) anfangen, aufhoren, beginnen, beharren, enden, insistieren, rudern (0.88) Aspect d) liegen, sitzen, stehen (0.75) Position e) dienen, folgen, helfen (0.75) Support f) nieseln, regnen, schneien (0.75) Weather g) dammern (0.00) Weather h) blitzen, donnern, segeln (0.25) Weather i) bestehen, fahren, fliegen, pochen (0.4) Insisting or Manner of Motion j) freuen, argern (0.33) Emotion k) essen, konsumieren, saufen, trinken, verkunden (1.00) Consumption l) bringen, eroffnen, lesen, liefern, schicken, schliessen, vermitteln, offnen (0.78) Supply 3Verbs that are part of the majority are shown in bold face, others in plain text. Where there is no clear majority, both class labels are given.</Paragraph>
    <Paragraph position="3"> k-Means cluster centre initialisation distance evaluation random hierarchical single complete average centroid ward  m) ankundigen, beenden, bekanntgeben, bekommen, beschreiben, charakterisieren, darstellen, erhalten, erlangen, interpretieren, kriegen, unterstutzen (1.00) Description and Obtaining n) zustellen (0.00) Supply We compared the clustering to the gold standard and examined the underlying verb frame distributions. We undertook a series of post-hoc cluster analyses to explore the influence of specific frames and frame groups on the formation of verb classes, such as: what is the difference in the clustering result (on the same starting clusters) if we deleted all frame types containing an expletive es (frame types including x)? Space limitations allow us only a few insights.</Paragraph>
    <Paragraph position="4"> a0 Clusters (a) and (b) are pure sub-classes of the semantic verb class Propositional Attitude. The verbs agree in their syntactic subcategorisation of a direct object (na) and finite clauses (ns-2, ns-dass); denken and glauben are assigned to a different cluster, because they also appear as intransitives, subcategorise the prepositional phrase Akk.an, and show especially strong probabilities for ns-2. Deleting na or frames containing s from the verb description destroys the coherent clusters.</Paragraph>
    <Paragraph position="5"> a0 Cluster (c) contains two sub-classes from Aspect and Insistence, polluted by the verb rudern 'to row'. All Aspect verbs show a 50% preference for an intransitive usage, and a minor 20% preference for the subcategorisation of non-finite clauses. By mistake, the infrequent verb rudern (corpus frequency 49) shows a similar preference for ni in its frame distribution and therefore appears within the same cluster as the Aspect verbs. The frame confusion has been caused by parsing mistakes for the infrequent verb; niis not among the frames possibly subcategorised by rudern.</Paragraph>
    <Paragraph position="6"> Even though the verbs beharren and insistieren have characteristic frames np:Dat.auf and ns-2, they share an affinity for n with the aspect verbs. When eliminating n from the feature description of the verbs, the cluster is reduced to those verbs using ni.</Paragraph>
    <Paragraph position="7"> a0 Cluster (d) is correct: Position. The syntactic usage of the three verbs is rather individual with strong probabilities for n, np:Dat.auf and np:Dat.in. Even the elimination of any of the three frame features does not cause a separation of the verbs in the clustering.</Paragraph>
    <Paragraph position="8"> a0 Cluster (j) represents the semantic class Emotion which, in German, has a highly characteristic signature in its strong association with reflexive frames; the cluster evaporates if we remove the distinctions made in the r feature group.</Paragraph>
    <Paragraph position="9"> a0 zustellen in cluster (n) represents a singleton because of its extraordinarily strong preference ( a0 50%) for the ditransitive usage. Eliminating the frame from the verb description assigns zustellen to the same cluster as the other verbs of Transfer of Possession (Supply).</Paragraph>
    <Paragraph position="10"> Recall that we used two different sets of syntactic frames, the second of which makes more delicate distinctions in the area of prepositional phrases. As pointed out in Section 5, refining the syntactic verb information by PPs was helpful for the semantic clustering. But, contrary to our original intuitions, the detailed prepositional phrase information is less useful in the clustering of verbs with obligatory PP arguments than in the clustering of verbs where the PPs are optional; we performed a first test on the role of PP information: eliminating all PP information from the verb descriptions (not only the delicate PP information in condition 2, but also PP argument information in the coarse condition 1 frames) produced obvious deficiencies in most of the semantic classes, among them Weather and Support, whose verbs do not require PPs as arguments. A second test confirmed the finding: we augmented our coarse-grained verb frame repertoire with a much reduced set of PPs, those commonly assumed as argument PPs. This provides some but not all of the PP information in condition 2. The clustering result is deficient mainly in its classification of the verbs of Propositional Attitude, Support, Opening, and few of these subcategorise for PPs.</Paragraph>
    <Paragraph position="11"> Clusters such as (k) to (l) suggest directions in which it might be desirable to subdivide the verb frames, for example by adding a limited amount of information about selectional preferences. Previous work has shown that sparse data issues preclude across the board incorporation of selectional information (Schulte im Walde, 2000), but a rough distinction such as physical object vs. abstraction on the direct object slot could, for example, help to split verkunden from the other verbs in cluster (k).</Paragraph>
    <Paragraph position="12"> The linguistic investigation gives some insight into the reasons for the success of our (rather simple) clustering technique. We successfully exploited the connection between the syntactic behaviour of a verb and its meaning components. The clustering result shows a good match to the manually defined semantic verb classes, and in many cases it is clear which of and how the frames are influential in the creation of which clusters. We showed that we acquired implicit components of meaning through a syntactic extraction from a corpus, since the semantic verb classes are strongly related to the patterns in the syntactic descriptors. Everything in this study suggests that the move to larger datasets is an appropriate next move.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML