
<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1320">
  <Title>A Statistical Model for Parsing and Word-Sense Disambiguation</Title>
  <Section position="3" start_page="0" end_page="155" type="intro">
    <SectionTitle>
2 Motivation for the Approach
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Motivation from examples
</SectionTitle>
      <Paragraph position="0"> Consider the following examples:  1. IBM bought Lotus for $200 million. 2. Sony widened its product line with personal computers.</Paragraph>
      <Paragraph position="1"> 3. The bank issued a check for $100,000. 4. Apple is expecting \[NP strong results\]. 5. IBM expected \[SBAa each employee to  wear a shirt and tie\].</Paragraph>
      <Paragraph position="2"> With Example 1, the reading \[IBM bought \[Lotus for $200 million\]\] is nearly impossible, for the simple reason that a monetary amount is a likely instrument for buying and not for describing a company. Similarly, there is a reasonably strong preference in Example 2 for \[pp with personal computers\] to attach to widened, because personal computers are products with which a product line could be widened. As pointed out by (Stetina and Nagao, 1997), word sense information can be a proxy for the semantic- and world-knowledge we as humans bring to bear on attachment decisions such as these. This proxy effect is due to the &amp;quot;lightweight semantics&amp;quot; that word senses--in particular WordNet word senses-convey. null Conversely, both the syntactic and semantic context in Example 3 let us know that bank is not a river bank and that check is not a restaurant bill. In Examples 4 and 5, knowing that the complement of expect is an NP or an SBAR provides information as to whether the sense is &amp;quot;await&amp;quot; or &amp;quot;require&amp;quot;. Thus, Examples 3-5 illustrate how the syntactic context of a word can help determine its meaning.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="155" type="sub_section">
      <SectionTitle>
2.2 Motivation from previous work
2.2.1 Parsing
</SectionTitle>
      <Paragraph position="0"> In recent years, the success of statistical parsing techniques can be attributed to several factors, such as the increasing size of computing machinery to accommodate larger models, the availability of resources such as the Penn Treebank (Marcus et al., 1993) and the success of machine learning techniques for lower-level NLP problems, such as part-of-speech tagging (Church, 1988; Brill, 1995), and PP-attachment (Brill and Resnik, 1994; Collins and Brooks, 1995). However, perhaps even more significant has been the lexicalization of the grammar formalisms being probabilistically modeled: crucially, all the recent, successful statistical parsers have in some way made use of bilexical dependencies. This includes both the parsers that attach probabilities to parser moves (Magerman, 1995; Ratnaparkhi, 1997), but also those of the lexicalized PCFG variety (Collins, 1997; Charniak, 1997).</Paragraph>
      <Paragraph position="1">  Even more crucially, the bilexical dependencies involve head-modifier relations (hereafter referred to simply as &amp;quot;head relations&amp;quot;). The intuition behind the lexicalization of a grammar formalism is to capture lexical items' idiosyncratic parsing preferences. The intuition behind using heads as the members of the bilexical relations is twofold. First, many linguistic theories tell us that the head of a phrase projects the skeleton of that phrase, to be filled in by specifiers, complements and adjuncts; such a notion is captured quite directly by a formalism such as LTAG (Joshi and Schabes, 1997). Second, the head of a phrase usually conveys some large component of the semantics of that phraseJ In this way, using head-relation statistics encodes a bit of the predicate-argument structure in the syntactic model. While there are cases such as John was believed to have been shot by Bill where structural preference virtually eliminates one of the two semantically plausible analyses, it is quite clear that semantics--and, in particular, lexical head semantics--play a very important role in reducing parsing ambiguity.</Paragraph>
      <Paragraph position="2"> (See (Collins, 1999), pp. 207ff., for an excellent discussion of structural vs. semantic parsing preferences, including the above John was believed.., example.) Another motivation for incorporating word senses into a statistical parsing model has been to ameliorate the sparse data problem. Inspired by the PP-attachment work of (Stetina and Nagao, 1997), we use Word-Net vl.6 (Miller et al., 1990) as our semantic dictionary, where the hypernym structure provides the basis for semantically-motivated soft clusters. 2 We discuss this benefit of word senses and the details of our implementation further in Section 4.</Paragraph>
      <Paragraph position="3">  While there has been much work in this area, let us examine the features used in recent 1Heads originated this way, but it has become necessary to distinguish &amp;quot;semantic&amp;quot; heads, such as nouns and verbs, that correspond roughly to predicates and arguments, from &amp;quot;functional&amp;quot; heads, such as determiners, INFL's and complemeutizers, that correspond roughly to logical operators or are purely syntactic elements. In this paper, we almost always intend &amp;quot;head&amp;quot; to mean &amp;quot;semantic head&amp;quot;.</Paragraph>
      <Paragraph position="4"> 2Soft clusters are sets where the elements have weights indicating the strength of their membership in the set, which in this case allows for a probability distribution to be defined over a word's membership in all the clusters.</Paragraph>
      <Paragraph position="5"> statistical approaches. (Yarowsky, 1992) uses wide &amp;quot;bag-of-words&amp;quot; contexts with a naive Bayes classifier. (Yarowsky, 1995) also uses wide context, but incorporates the one-sense-per-discourse and one-sense-per-collocation constraints, using an unsupervised learning technique. The supervised technique in (Yarowsky, 1994) has a more specific notion of context, employing not just words that can appear within a window of Ik, but crucially words that abut and fall in the ~2 window of the target word. More recently, (Lin, 1997) has shown how syntactic context, and dependency structures in particular, can be successfully employed for word sense disambiguation. (Stetina and Nagao, 1997) have shown that by employing a fairly simple and somewhat ad-hoc unsupervised method of WSD using a WordNet-based similarity heuristic, they could enhance PP-attachment performance to a significantly higher level than systems that made no use of lexical semantics (88.1% accuracy). Most recently, in (Stetina et al., 1998), the authors made use of head-driven bilexical dependencies with syntactic relations to attack the problem of generalized word-sense disambiguation, precisely one of the two problems we are dealing with here.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>