<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1009">
  <Title>Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity</Title>
  <Section position="4" start_page="65" end_page="66" type="metho">
    <SectionTitle>
3 The Approach
</SectionTitle>
    <Paragraph position="0"> The polysemous words in the input text are disambiguated in the following steps: Step A. Parse the input text and extract local contexts of each word. Let LCw denote the set of local contexts of all occurrences of w in the input text.</Paragraph>
    <Paragraph position="1"> Step B. Search the local context database and find words that appeared in an identical local context as w. They are called selectors of w:</Paragraph>
    <Paragraph position="3"> Step C. Select a sense s of w that maximizes the similarity between w and Selectors~.</Paragraph>
    <Paragraph position="4"> Step D. The sense s is assigned to all occurrences of w in the input text. This implements the &amp;quot;one sense per discourse&amp;quot; heuristic advocated in (Gale, Church, and Yarowsky, 1992).</Paragraph>
    <Paragraph position="5"> Step C. needs further explanation. In the next subsection, we define the similarity between two word senses (or concepts). We then explain how the similarity between a word and its selectors is maximized.</Paragraph>
    <Section position="1" start_page="65" end_page="66" type="sub_section">
      <SectionTitle>
3.1 Similarity between Two Concepts
</SectionTitle>
      <Paragraph position="0"> There have been several proposed measures for similarity between two concepts (Lee, Kim, and Lee, 1989; Kada et al., 1989; Resnik, 1995b; Wu and Palmer, 1994). All of those similarity measures are defined directly by a formula. We use instead an information-theoretic definition of similarity that can be derived from the following assumptions:  Assumption 1: The commonality between A and B is measured by I(common(A, B)) where common(A, B) is a proposition that states the commonalities between A and B; I(s) is the amount of information contained in the proposition s. Assumption 2: The differences between A and B is measured by I ( describe( A, B) ) - I ( common( A, B ) ) where describe(A, B) is a proposition that describes what A and B are.</Paragraph>
      <Paragraph position="1"> Assumption 3: The similarity between A and B, sire(A, B), is a function of their commonality and differences. That is, sire(A, B) = f(I(common(d, B)), I(describe(A, B))) Whedomainof f(x,y) is {(x,y)lx &gt; O,y &gt; O,y &gt; x}. Assumption 4: Similarity is independent of the unit used in the information measure.</Paragraph>
      <Paragraph position="2">  According to Information Theory (Cover and Thomas, 1991), I(s) = -logbP(S), where P(s) is the probability of s and b is the unit. When b = 2, I(s) is the number of bits needed to encode s. Since log~,, Assumption 4 means that the func- logbx = logb, b , tion f must satisfy the following condition: Vc &gt; O, f(x, y) = f(cz, cy) Assumption 5: Similarity is additive with respect to commonality.</Paragraph>
      <Paragraph position="3"> If common(A,B) consists of two independent parts, then the sim(A,B) is the sum of the similarities computed when each part of the commonality is considered. In other words: f(xl + x2,y) = f(xl,y) + f(x2,y).</Paragraph>
      <Paragraph position="4"> A corollary of Assumption 5 is that Vy, f(0, y) = f(x + O,y) -f(x,y) = O, which means that when there is no commonality between A and B, their similarity is 0, no matter how different they are. For example, the similarity between &amp;quot;depth-first search&amp;quot; and &amp;quot;leather sofa&amp;quot; is neither higher nor lower than the similarity between &amp;quot;rectangle&amp;quot; and &amp;quot;interest rate&amp;quot;.</Paragraph>
      <Paragraph position="5">  Assumption 6: The similarity between a pair of identical objects is 1.</Paragraph>
      <Paragraph position="6"> When A and B are identical, knowning their commonalities means knowing what they are, i.e., I ( comrnon(.4, B ) ) = I ( describe( A. B ) ) . Therefore, the function f must have the following property: vz,/(z, z) = 1.</Paragraph>
      <Paragraph position="7"> Assumption 7: The function f(x,y) is continuous. null Similarity Theorem: The similarity between A and B is measured by the ratio between the amount of information neededto state the commonality of A and B and the information needed to fully describe what A and B are: sirn( A. B) = logP(common( A, B) ) logP( describe(.4, B) )  Proof.&amp;quot; To prove the theorem, we need to show f(z,y) = ~. Since f(z,V) = f(~,l) (due to Assumption 4), we only need to show that when ~ is a rational number f(z, y) = -~. The result can be gen- y eralized to all real numbers because f is continuous and for any real number, there are rational numbers that are infinitely close to it.</Paragraph>
      <Paragraph position="8"> Suppose m and n are positive integers.</Paragraph>
      <Paragraph position="9"> f(nz, y) = f((n - 1)z, V) + f(z, V) = nf(z, V) (due to Assumption 5). Thus. f(z, y) = 1/4f(nx, y). Substituting ~ for x in this equation: f(z,v) Since z is rational, there exist m and n such that</Paragraph>
      <Paragraph position="11"> For example. Figure 1 is a fragment of the Word-Net. The nodes are concepts (or synsets as they are called in the WordNet). The links represent IS-A relationships. The number attached to a node C is the probability P(C) that a randomly selected noun refers to an instance of C. The probabilities are estimated by the frequency of concepts in SemCor (Miller et al., 1994), a sense-tagged subset of the Brown corpus.</Paragraph>
      <Paragraph position="12"> If x is a Hill and y is a Coast, the commonality between x and y is that &amp;quot;z is a GeoForm and y is a GeoForm&amp;quot;. The information contained in this  statement is -2 x logP(GeoForm). The similarity between the concepts Hill and Coast is:</Paragraph>
      <Paragraph position="14"> where P(fqi Ci) is the probability of that an object belongs to all the maximally specific super classes (Cis) of both C and C'.</Paragraph>
    </Section>
    <Section position="2" start_page="66" end_page="66" type="sub_section">
      <SectionTitle>
3.2 Disambiguation by Maximizing
Similarity
</SectionTitle>
      <Paragraph position="0"> We now provide the details of Step C in our algorithm. The input to this step consists of a polysemous word W0 and its selectors {l,I,'l, I, V2 ..... IVy}. The word Wi has ni senses: {sa,..., sin, }.</Paragraph>
      <Paragraph position="1"> Step C.I: Construct a similarity matrix (8). The rows and columns represent word senses. The matrix is divided into (k + 1) x (k + 1) blocks.</Paragraph>
      <Paragraph position="2"> The blocks on the diagonal are all 0s. The elements in block Sij are the similarity measures between the senses of Wi and the senses of II~.</Paragraph>
      <Paragraph position="3"> Similarity measures lower than a threshold 0 are considered to be noise and are ignored. In our experiments, 0 = 0.2 was used.</Paragraph>
      <Paragraph position="4"> sire(sit. Sjm) if i C/ j and</Paragraph>
      <Paragraph position="6"> Step C.4: The sense of Wi~,,~ is chosen to be 8i~.~lm,a,. Remove Wi,.,,,, from A.</Paragraph>
      <Paragraph position="7"> A ( A- {W/.,., } Step C.5: Modify the similarity matrix to remove the similarity values between other senses of W/~, and senses of other words. For all l, j, m, such that l E \[1,ni.~.,\] and l ~ lmaz and j # imax and m E \[1, nj\]: Si.~o~j (/, m) e---- 0 Step C.6: Repeat from Step C.3 unless im,~z = O.</Paragraph>
    </Section>
    <Section position="3" start_page="66" end_page="66" type="sub_section">
      <SectionTitle>
3.3 Walk Through Examples
</SectionTitle>
      <Paragraph position="0"> Let's consider again the word &amp;quot;facility&amp;quot; in (3). It has two local contexts: subject of &amp;quot;employ&amp;quot; (subj employ head) and modifiee of &amp;quot;new&amp;quot; (adjn new rood). Table 1 lists words that appeared in the first local context. Table 2 lists words that appeared in the second local context. Only words with top-20 likelihood ratio were used in our experiments.</Paragraph>
      <Paragraph position="1"> The two groups of words are merged and used as the selectors of &amp;quot;facility&amp;quot;. The words &amp;quot;facility&amp;quot; has  1. something created to provide a particular service; null 2. proficiency, technique; 3. adeptness, deftness, quickness; 4. readiness, effortlessness; 5. toilet, lavatory.</Paragraph>
      <Paragraph position="2">  Senses 1 and 5 are subclasses of artifact. Senses 2 and 3 are kinds of state. Sense 4 is a kind of abstraction. Many of the selectors in Tables 1 and Table 2 have artifact senses, such as &amp;quot;post&amp;quot;, &amp;quot;product&amp;quot;, &amp;quot;system&amp;quot;, &amp;quot;unit&amp;quot;, &amp;quot;memory device&amp;quot;, &amp;quot;machine&amp;quot;, &amp;quot;plant&amp;quot;, &amp;quot;model&amp;quot;, &amp;quot;program&amp;quot;, etc. Therefore, Senses 1 and 5 of &amp;quot;facility&amp;quot; received much more support, 5.37 and 2.42 respectively, than other senses. Sense 1 is selected.</Paragraph>
      <Paragraph position="3"> Consider another example that involves an unknown proper name: (9) DreamLand employed 20 programmers.</Paragraph>
      <Paragraph position="4"> We treat unknown proper nouns as a polysemous word which could refer to a person, an organization, or a location. Since &amp;quot;DreamLand&amp;quot; is the subject of &amp;quot;employed&amp;quot;, its meaning is determined by maximizing the similarity between one of {person, organization, locaton} and the words in Table 1. Since Table 1 contains many &amp;quot;organization&amp;quot; words, the support for the &amp;quot;organization&amp;quot; sense is nmch higher than the others.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="66" end_page="69" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> We used a subset of the SemCor (Miller et al., 1994) to evaluate our algorithm.</Paragraph>
    <Section position="1" start_page="68" end_page="69" type="sub_section">
      <SectionTitle>
4.1 Evaluation Criteria
</SectionTitle>
      <Paragraph position="0"> General-purpose lexical resources, such as Word-Net, Longman Dictionary of Contemporary English (LDOCE), and Roget's Thesaurus, strive to achieve completeness. They often make subtle distinctions between word senses. As a result, when the WSD task is defined as choosing a sense out of a list of senses in a general-purpose lexical resource, even humans may frequently disagree with one another on what the correct sense should be.</Paragraph>
      <Paragraph position="1"> The subtle distinctions between different word senses are often unnecessary. Therefore, we relaxed the correctness criterion. A selected sense 8answer is correct if it is &amp;quot;similar enough&amp;quot; to the sense tag skeu in SemCor. We experimented with three interpretations of &amp;quot;similar enough&amp;quot;. The strictest interpretation is sim(sanswer,Ske~)=l, which is true only when 8answer~Skey. The most relaxed interpretation is sim(s~nsw~, Skey) &gt;0, which is true if 8answer and 8key are the descendents of the same top-level concepts in WordNet (e.g., entity, group, location, etc.). A compromise between these two is sim(Sans~er, Skew) &gt;_ 0.27, where 0.27 is the average similarity of 50,000 randomly generated pairs (w, w') in which w and w ~ belong to the same Roget's category. null We use three words &amp;quot;duty&amp;quot;, &amp;quot;interest&amp;quot; and &amp;quot;line&amp;quot; as examples to provide a rough idea about what sirn( s~nswer, Skew) &gt;_ 0.27 means.</Paragraph>
      <Paragraph position="2"> The word &amp;quot;duty&amp;quot; has three senses in WordNet 1.5. The similarity between the three senses are all below 0.27, although the similarity between Senses 1 (responsibility) and 2 (assignment, chore) is very close (0.26) to the threshold.</Paragraph>
      <Paragraph position="3"> The word &amp;quot;interest&amp;quot; has 8 senses. Senses 1 (sake, benefit) and 7 (interestingness) are merged. 2 Senses 3 (fixed charge for borrowing money), 4 (a right or legal share of something), and 5 (financial interest in something) are merged. The word &amp;quot;interest&amp;quot; is reduced to a 5-way ambiguous word. The other three senses are 2 (curiosity), 6 (interest group) and 8 (pastime, hobby).</Paragraph>
      <Paragraph position="4"> The word &amp;quot;line&amp;quot; has 27 senses. The similarity threshold 0.27 reduces the number of senses to 14.</Paragraph>
      <Paragraph position="5"> The reduced senses are * Senses 1, 5, 17 and 24: something that is communicated between people or groups.</Paragraph>
      <Paragraph position="6"> 1: a mark that is long relative to its width 5: a linear string of words expressing some idea ')The similarities between senses of the same word are computed during scoring. We do not actually change the WordNet hierarchy 17: a mark indicating positions or bounds of the playing area 24: as in &amp;quot;drop me a line when you get there&amp;quot;  * Senses 2, 3, 9, 14, 18: group 2: a formation of people or things beside one another 3: a formation of people or things one after another 9: a connected series of events or actions or developments 14: the descendants of one individual 18: common carrier * Sense 4: a single frequency (or very narrow band) of radiation in a spectrum * Senses 6 and 25: cognitive process 6: line of reasoning 25: a conceptual separation or demarcation * Senses 7, 15, and 26: instrumentation 7: electrical cable 15: telephone line 26: assembly line * Senses 8 and 10: shape 8: a length (straight or curved) without breadth or thickness 10: wrinkle, furrow, crease, crinkle, seam, line * Senses 11 and 16: any road or path affording passage from one place to another; 11: pipeline 16: railway * Sense 12: location, a spatial location defined by a real or imaginary unidimensional extent; * Senses 13 and 27: human action 13: acting in conformity 27: occupation, line of work; * Sense 19: something long and thin and flexible * Sense 20: product line, line of products * Sense 21: space for one line of print (one column wide and 1/14 inch deep) used to measure advertising * Sense 22: credit line, line of credit * Sense 23: a succession of notes forming a distinctived sequence  where each group is a reduced sense and the numbers are original WordNet sense numbers.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="69" end_page="69" type="metho">
    <SectionTitle>
4.2 Results
</SectionTitle>
    <Paragraph position="0"> We used a 25-million-word Wall Street Journal corpus (part of LDC/DCI 3 CDROM) to construct the local context database. The text was parsed in 126 hours on a SPARC-Ultra 1/140 with 96MB of memory. We then extracted from the parse trees 8,665,362 dependency relationships in which the head or the modifier is a noun. We then filtered out (lc, word) pairs with a likelihood ratio lower than 5 (an arbitrary threshold). The resulting database contains 354,670 local contexts with a total of 1,067,451 words in them (Table 1 is counted as one local context with 20 words in it).</Paragraph>
    <Paragraph position="1"> Since the local context database is constructed from WSJ corpus which are mostly business news, we only used the &amp;quot;press reportage&amp;quot; part of SemCor which consists of 7 files with about 2000 words each. Furthermore, we only applied our algorithm to nouns. Table 3 shows the results on 2,832 polysemous nouns in SemCor. This number also includes proper nouns that do not contain simple markers (e.g., Mr., Inc.) to indicate its category. Such a proper noun is treated as a 3-way ambiguous word: person, organization, or location. We also showed as a baseline the performance of the simple strategy of always choosing the first sense of a word in the WordNet. Since the WordNet senses are ordered according to their frequency in SemCor, choosing the first sense is roughly the same as choosing the sense with highest prior probability, except that we are not using all the files in SemCor.</Paragraph>
    <Paragraph position="2"> It can be seen from Table 3 that our algorithm performed slightly worse than the baseline when the strictest correctness criterion is used. However, when the condition is relaxed, its performance gain is much lager than the baseline. This means that when the algorithm makes mistakes, the mistakes tend to be close to the correct answer.</Paragraph>
  </Section>
  <Section position="7" start_page="69" end_page="69" type="metho">
    <SectionTitle>
5 Discussion
5.1 Related Work
</SectionTitle>
    <Paragraph position="0"> The Step C in Section 3.2 is similar to Resnik's noun group disambiguation (Resnik, 1995a), although he did not address the question of the creation of noun groups.</Paragraph>
    <Paragraph position="1"> The earlier work on WSD that is most similar to ours is (Li, Szpakowicz, and Matwin, 1995). They proposed a set of heuristic rules that are based on the idea that objects of the same or similar verbs are similar.</Paragraph>
    <Section position="1" start_page="69" end_page="69" type="sub_section">
      <SectionTitle>
5.2 Weak Contexts
</SectionTitle>
      <Paragraph position="0"> Our algorithm treats all local contexts equally in its decision-making. However, some local contexts hardly provide any constraint on the meaning of a word. For example, the object of &amp;quot;get&amp;quot; can practically be anything. This type of contexts should be filtered out or discounted in decision-making.</Paragraph>
    </Section>
    <Section position="2" start_page="69" end_page="69" type="sub_section">
      <SectionTitle>
5.3 Idiomatic Usages
</SectionTitle>
      <Paragraph position="0"> Our assumption that similar words appear in identical context does not always hold. For example, (10) ... the condition in which the heart beats between 150 and 200 beats a minute The most frequent subjects of &amp;quot;beat&amp;quot; (according to our local context database) are the following: (11) PER, badge, bidder, bunch, challenger, democrat, Dewey, grass, mummification, pimp, police, return, semi. and soldier.</Paragraph>
      <Paragraph position="1"> where PER refers to proper names recognized as persons. None of these is similar to the &amp;quot;body part&amp;quot; meaning of &amp;quot;heart&amp;quot;. In fact, &amp;quot;heart&amp;quot; is the only body part that beats.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>