<?xml version="1.0" standalone="yes"?> <Paper uid="E99-1028"> <Title>Word Sense Disambiguation in Untagged Text based on Term Weight Learning</Title> <Section position="3" start_page="0" end_page="209" type="intro"> <SectionTitle> 2 Polysemy in Context </SectionTitle> <Paragraph position="0"> Most previous corpus-based WSD algorithms rest on the observation that semantically similar words appear in similar contexts. Semantically similar verbs, for example, co-occur with the same nouns. The following sentences from the Wall Street Journal show polysemous usages of take.</Paragraph> <Paragraph position="1"> (s1) Coke has typically taken a minority stake in such ventures.</Paragraph> <Paragraph position="2"> (s1') Guber and Peters tried to buy a stake in MGM in 1988.</Paragraph> <Paragraph position="3"> (s2) That process of sorting out specifics is likely to take time.</Paragraph> <Paragraph position="4"> (s2') We spent a lot of time and money in building our group of stations.</Paragraph> <Paragraph position="5"> Let us consider a two-dimensional Euclidean space spanned by two axes, one associated with stake and the other with time, in which take is assigned a vector whose i-th component is the value of Mu between the verb and the noun assigned to the i-th axis. Take co-occurs with both nouns, while buy and spend each co-occur with only one of them. Therefore, the distances between take and these two verbs are large. In order to capture the synonymy of take with the two verbs correctly, one has to decompose the vector assigned to take into two component vectors, take1 and take2, each of which corresponds to one of the two distinct usages of take (Figure 1). We call them hypothetical verbs in the following. The decomposition of a vector into a set of its component vectors requires a proper decomposition of the context in which the word occurs. 
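The two-dimensional example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Mu values are made-up numbers, the helper names distance and decompose are hypothetical, and the axis-wise split is the simplest possible decomposition, not the procedure proposed in this paper.

```python
import math

# Axes of the 2-D space: index 0 is "stake", index 1 is "time".
# All Mu values are invented for illustration; none come from the paper.
take = (3.0, 2.5)   # take co-occurs with both nouns
buy = (2.8, 0.0)    # buy co-occurs only with "stake"
spend = (0.0, 2.7)  # spend co-occurs only with "time"


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def decompose(vec):
    """Return one component vector per axis (assumed simplest-case split)."""
    return [
        tuple(x if i == j else 0.0 for j, x in enumerate(vec))
        for i in range(len(vec))
    ]


# Hypothetical verbs: take1 keeps the "stake" axis, take2 the "time" axis.
take1, take2 = decompose(take)

print("take  vs buy  :", distance(take, buy))
print("take  vs spend:", distance(take, spend))
print("take1 vs buy  :", distance(take1, buy))
print("take2 vs spend:", distance(take2, spend))
```

With these assumed values, the undivided vector for take is far from both buy and spend, while each component vector lies close to exactly one of them, which is the effect the decomposition is meant to achieve.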
Furthermore, in a general situation, a polysemous verb co-occurs with a large group of nouns, and one has to divide that group into a set of subgroups, each of which correctly characterises the context for a specific sense of the polysemous word. Therefore, the algorithm has to be able to determine when the context of a word should be divided and how.</Paragraph> <Paragraph position="6"> The approach proposed in this paper explicitly introduces new entities, i.e. hypothetical verbs, when an entity is judged polysemous, and associates them with contexts which are sub-contexts of the context of the original entity. Our algorithm has two basic operations, splitting and lumping. Splitting means dividing a polysemous verb into two hypothetical verbs, and lumping means combining two hypothetical verbs into one verb (Fukumoto and Tsujii, 1994).</Paragraph> </Section> </Paper>