<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1050">
  <Title>A Corpus-Based Approach to Deriving Lexical Mappings</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Dictionaries are now commonly used resources in NLP systems. However, different lexical resources are not uniform: they contain different types of information and do not assign words the same number of senses. One way to tackle this problem is to produce mappings between the senses of different resources, the &amp;quot;dictionary mapping problem&amp;quot;. However, this is a non-trivial problem, as examination of existing lexical resources demonstrates. Lexicographers have been divided between &amp;quot;lumpers&amp;quot;, who prefer a few general senses, and &amp;quot;splitters&amp;quot;, who create a larger number of more specific senses, so there is no guarantee that a word will have the same number of senses in different resources.</Paragraph>
    <Paragraph position="1"> Previous attempts to create lexical mappings have concentrated on aligning the senses in pairs of lexical resources and based the mapping decision on information in the entries. For example, Knight and Luk (1994) merged WordNet and LDOCE using information in the hierarchies and textual definitions of each resource.</Paragraph>
    <Paragraph position="2"> Thus far we have mentioned only mappings between dictionary senses. However, it is possible to create mappings between any pair of linguistic annotation tag-sets, for example part-of-speech tags. We dub this more general class lexical mappings: mappings between two sets of lexical annotations. One example which we shall consider further is that of mappings between part-of-speech tag-sets.</Paragraph>
    <Paragraph position="3"> This paper proposes a method for producing lexical mappings based on corpus evidence. It relies on the existence of large-scale lexical annotation tools such as part-of-speech taggers and sense taggers, several of which have now been developed, for example (Brill, 1994; Stevenson and Wilks, 1999). The availability of such taggers brings the possibility of automatically annotating large bodies of text. Our proposal, briefly, is to use a pair of taggers, each assigning annotations from one of the lexical tag-sets we are interested in mapping. These taggers can then be applied to the same large body of text and a mapping derived from the distributions of the pair of tag-sets in the corpus.</Paragraph>
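    <Paragraph position="4"> The paper leaves the derivation step abstract at this point. As a minimal illustrative sketch (not the authors' algorithm), one simple instantiation counts how often each tag from the first tag-set co-occurs with each tag from the second on the same token, then maps each tag to its most frequent partner; the tag names below are hypothetical:

```python
from collections import Counter, defaultdict

def derive_mapping(tags_a, tags_b):
    """Derive a mapping from tag-set A to tag-set B by counting, over a
    shared corpus, how often each A-tag co-occurs with each B-tag on the
    same token, then taking the most frequent B-tag for each A-tag."""
    counts = defaultdict(Counter)
    for tag_a, tag_b in zip(tags_a, tags_b):
        counts[tag_a][tag_b] += 1
    return {tag_a: c.most_common(1)[0][0] for tag_a, c in counts.items()}

# Toy corpus: the same five tokens annotated by two hypothetical taggers.
tags_a = ["NN", "VB", "NN", "JJ", "NN"]
tags_b = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN"]
print(derive_mapping(tags_a, tags_b))
# {'NN': 'NOUN', 'VB': 'VERB', 'JJ': 'ADJ'}
```

A majority-vote mapping of this kind discards distributional information that a fuller treatment would retain (e.g. the relative frequencies of competing target tags), but it shows the basic use of paired corpus annotations.</Paragraph>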
  </Section>
</Paper>