<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3814">
  <Title>Evaluating and optimizing the parameters of an unsupervised graph-based WSD algorithm</Title>
  <Section position="4" start_page="89" end_page="90" type="intro">
    <SectionTitle>
2 HyperLex
</SectionTitle>
    <Paragraph position="0"> Before presenting the HyperLex algorithm itself, we briefly introduce small-world graphs.</Paragraph>
    <Section position="1" start_page="89" end_page="89" type="sub_section">
      <SectionTitle>
2.1 Small world graphs
</SectionTitle>
      <Paragraph position="0"> The small-world nature of a graph can be explained in terms of its clustering coefficient and characteristic path length. The clustering coefficient of a graph shows the extent to which nodes tend to form connected groups that have many edges connecting each other in the group, and few edges leading out of the group. On the other side, the characteristic path length represents &amp;quot;closeness&amp;quot; in a graph. See (Watts and Strogatz, 1998) for further details on these characteristics. null Randomly built graphs exhibit low clustering coefficients and are believed to represent something very close to the minimal possible average path length, at least in expectation. Perfectly ordered graphs, on the other side, show high clustering coefficients but also high average path length. According to Watts and Strogatz (1998), small-world graphs lie between these two extremes: they exhibit high clustering coefficients, but short average path lengths.</Paragraph>
      <Paragraph position="1"> Barabasi and Albert (1999) use the term &amp;quot;scalefree&amp;quot; to graphs whose degree probability follow a power-law2. Specifically, scale free graphs follow the property that the probability P(k) that a vertex in the graph interacts with k other vertices decays as a power-law, following P(k) [?] k[?]a. It turns out that in this kind of graphs there exist nodes centrally located and highly connected, called hubs.</Paragraph>
    </Section>
    <Section position="2" start_page="89" end_page="90" type="sub_section">
      <SectionTitle>
2.2 The HyperLex algorithm for WSD
</SectionTitle>
      <Paragraph position="0"> The HyperLex algorithm builds a cooccurrence graph for all pairs of words cooccurring in the context of the target word. V'eronis shows that this kind of graph fulfills the properties of small world graphs, and thus possess highly connected components in the graph. The centers or prototypes of these components, called hubs, eventually identify the main word uses (senses) of the target word.</Paragraph>
      <Paragraph position="1"> We will briefly introduce the algorithm here, check (V'eronis, 2004) for further details. For each word to be disambiguated, a text corpus is collected, consisting of the paragraphs where the word occurs.</Paragraph>
      <Paragraph position="2"> From this corpus, a cooccurrence graph for the target word is built. Nodes in the graph correspond to the words3 in the text (except the target word itself).</Paragraph>
      <Paragraph position="3"> Two words appearing in the same paragraph are said to cooccur, and are connected with edges. Each edge is assigned with a weight which measures the relative frequency of the two words cooccurring. Specifically, let wij be the weight of the edge4 connecting  nodes i and j, then</Paragraph>
      <Paragraph position="5"> The weight of an edge measures how tightly connected the two words are. Words which always occur together receive a weight of 0. Words rarely cooccurring receive weights close to 1.</Paragraph>
      <Paragraph position="6"> Once the cooccurrence graph is built, a simple iterative algorithm is executed to obtain its hubs. At each step, the algorithm finds the vertex with highest relative frequency5 in the graph, and, if it meets some criteria, it is selected as a hub. These criteria are determined by a set of heuristic parameters, that will be explained later in Section 4. After a vertex is selected to be a hub, its neighbors are no longer eligible as hub candidates. At any time, if the next vertex candidate has a relative frequency below a certain threshold, the algorithm stops.</Paragraph>
      <Paragraph position="7"> Once the hubs are selected, each of them is linked to the target word with edges weighting 0, and the Minimum Spanning Tree (MST) of the whole graph is calculated and stored.</Paragraph>
      <Paragraph position="8"> The MST is then used to perform word sense disambiguation, in the following way. For every instance of the target word, the words surrounding it are examined and confronted with the MST. By construction of the MST, words in it are placed under exactly one hub. Each word in the context receives a set of scores s, with one score per hub, where all scores are 0 except the one corresponding to the hub where it is placed. If the scores are organized in a score vector, all values are 0, except, say, the i-th component, which receives a score d(hi,v), which is the distance between the hub hi and the node representing the word v. Thus, d(hi,v) assigns a score of 1 to hubs and the score decreases as the nodes move away from the hub in the tree.</Paragraph>
      <Paragraph position="9"> For a given occurrence of the target word, the score vectors of all the words in the context are added, and the hub that receives the maximum score is chosen.</Paragraph>
      <Paragraph position="10"> 5In cooccurrence graphs, the relative frequency of a vertex and its degree are linearly related, and it is therefore possible to avoid the costly computation of the degree.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>