<?xml version="1.0" standalone="yes"?> <Paper uid="N04-3008"> <Title>SenseClusters - Finding Clusters that Represent Word Senses</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Most words in natural language have multiple possible meanings that can only be determined by considering the context in which they occur. Given instances of a target word used in a number of different contexts, word sense discrimination is the process of grouping these instances into clusters that refer to the same word meaning. Approaches to this problem are often based on the strong contextual hypothesis of Miller and Charles (1991), which states that two words are semantically related to the extent that their contextual representations are similar. Hence the problem of word sense discrimination reduces to that of determining which contexts of a given target word are related or similar.</Paragraph> <Paragraph position="1"> SenseClusters creates clusters made up of the contexts in which a given target word occurs. All the instances in a cluster are contextually similar to each other, making it more likely that the given target word has been used with the same meaning in all of those instances. Each instance normally includes two or three sentences, one of which contains the given occurrence of the target word.</Paragraph> <Paragraph position="2"> SenseClusters was originally intended to discriminate among word senses. However, the methodology of clustering contextually (and hence semantically) similar instances of text can be used in a variety of natural language processing tasks such as synonymy identification, text summarization and document classification. SenseClusters has also been used for applications such as email sorting and automatic ontology construction.</Paragraph> <Paragraph position="3"> In the sections that follow, we describe the basic functionality supported by SenseClusters. 
In general, processing starts by selecting features from a corpus of text. These features are then used to create an appropriate representation of the contexts that are to be clustered. Thereafter the actual clustering takes place, followed by an optional evaluation stage that compares the discovered clusters to an existing gold standard (if available).</Paragraph> </Section> </Paper>