File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/02/w02-1016_concl.xml
Size: 2,220 bytes
Last Modified: 2025-10-06 13:53:24
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1016"> <Title>Spectral Clustering for German Verbs</Title> <Section position="8" start_page="2" end_page="2" type="concl"> <SectionTitle> 7 Conclusions </SectionTitle> <Paragraph position="0"> We have described the application to natural language data of a spectral clustering technique (Ng et al., 2002) closely related to kernel PCA (Christianini et al., 2002). We have presented evidence that the dimensionality reduction involved in the clustering technique can give k-Means a robustness that it does not display in direct use. The solutions found by the spectral clustering are always at least as well-aligned with the distance measure as is the gold standard measure produced by human intuition, but this does not hold when k-Means is used directly on the untransformed data.</Paragraph> <Paragraph position="1"> Since we work in a transformed space of low dimensionality, we gain efficiency, and we no longer have to sum and average data points in the original space associated with the verb frame data. In principle, this gives us the freedom to use, as is standardly done with SVMs (Christianini and Shawe-Taylor, 2000), extremely high dimensional representations for which it would not be convenient to use k-Means directly. We could for instance use features which are derived not from the counts of a single frame but of two or more. This is linguistically desirable, since Levin's verb classes are defined primarily in terms of alternations rather than in terms of single frames. We plan to explore this possibility in future work.</Paragraph> <Paragraph position="2"> It is also clearly against the spirit of (Levin, 1993) to insist that verbs should belong to only one cluster, since, for example, both the German &quot;dämmern&quot; and the English &quot;dawn&quot; are clearly related both to verbs associated with weather and natural phenomena (because of &quot;Day dawns.&quot;) and to verbs of cognition (because of &quot;It dawned on Kim that ...&quot;). In order to accommodate this, we are exploring the consequences of replacing the k-Means step of our algorithm with an appropriate soft clustering technique.</Paragraph> </Section> </Paper>