<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-2166">
  <Title>Machine-Readable Dictionary&quot; in The Uses of Large Text Databases, Proceedings of the Third</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. The Computational Meta-Lexicon
</SectionTitle>
    <Paragraph position="0"> There is growing awareness among computational linguists that much of the information needed for lexical entries across systems is basically shared or &quot;identical&quot; information /Ingria 1986, Zaenen 1986/. Examples for verbs are subcategorization information (transitive, intransitive, takes a that-complement) and selectional features (takes a human object, selects for an inanimate subject); an example for nouns is gender (female, male). It should be possible to collect much of this shared information into a large &quot;polytheoretical&quot; data base for use by individual systems. This lexicon (sometimes called a &quot;meta-lexicon&quot;) would consist of the overlapping set of the various attributes, features, characteristics, etc., that are needed by all or most NLP systems. Each system could then consult the repository of information stored in the central lexicon and extract the information it needs. The extracted information could be enhanced by theory-specific and application-specific information. Thus, instead of each system duplicating efforts, the computational &quot;meta-lexicon&quot; gathers together lexical information for use by programs, in the same way that traditional dictionaries contain information for use by people.</Paragraph>
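The consult-extract-enhance flow described above can be illustrated with a minimal Python sketch. It is not the COMPLEX design: the names SharedEntry, MetaLexicon, extract, enhance and parser_rule_id, the field layout, and the two sample entries are all hypothetical, chosen only to mirror the attribute types mentioned in the text (subcategorization, selectional features, gender).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedEntry:
    """Theory-neutral information for one word (hypothetical schema)."""
    lemma: str
    category: str                               # e.g. "verb" or "noun"
    subcategorization: frozenset = frozenset()  # e.g. {"transitive"}
    selectional: frozenset = frozenset()        # e.g. {"human-object"}
    gender: str = ""                            # nouns only; empty if unused


class MetaLexicon:
    """Central repository that individual NLP systems consult."""

    def __init__(self, entries):
        # Index entries by (lemma, category) for direct lookup.
        self._entries = {(e.lemma, e.category): e for e in entries}

    def extract(self, lemma, category, wanted):
        """Return only the shared attributes a client system asks for."""
        entry = self._entries[(lemma, category)]
        return {name: getattr(entry, name) for name in wanted}


def enhance(shared, extra):
    """Layer theory- or application-specific data over extracted data."""
    merged = dict(shared)
    merged.update(extra)
    return merged


# Illustrative entries only; not taken from any real dictionary source.
lexicon = MetaLexicon([
    SharedEntry("believe", "verb",
                subcategorization=frozenset({"transitive", "that-complement"})),
    SharedEntry("actress", "noun", gender="female"),
])

# A hypothetical parser extracts what it needs, then adds its own field.
parser_view = lexicon.extract("believe", "verb", ["subcategorization"])
parser_entry = enhance(parser_view, {"parser_rule_id": "VP-17"})
```

Keeping the shared entries immutable and copying them in enhance means that one system's theory-specific additions never leak back into the central repository, which is one plausible reading of the separation the text describes.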
    <Paragraph position="1"> One of the goals of the Lexical Systems project at IBM is to design and build such a lexicon. We have called the system COMPLEX (for COMPutational LEXicon). Although this is an ambitious goal, we believe that careful lexicographic, linguistic, and computational research will permit us to represent whatever information is common to most NLP systems in a neutral representation and in a uniform data structure so as to be compatible with a range of requirements of natural language systems.</Paragraph>
    <Paragraph position="2"> Corollary to the goal of designing and building a data structure containing information for different NLP systems is the goal of broad coverage. Indeed, until recently, the lexicon was not the primary focus of most natural language processing (NLP) projects. The result (with a few exceptions) has been a proliferation of descriptively rich syntactic and semantic analyzers with impoverished lexical coverage. Many NLP systems have small hand-built lexicons, hand-tailored to the idiosyncrasies of formatting and processing required by the system. Our aim is to extract information automatically or semi-automatically using machine-readable sources, and in this way to achieve broad coverage. Currently, our primary resources are machine-readable dictionaries, although we plan to expand to text corpora in the near future.</Paragraph>
    <Paragraph position="3"> Initially, we restrict our attention to building English lexicons, but there is good evidence that some information may be transferable to computational lexicons for other languages via bilingual dictionaries.</Paragraph>
  </Section>
</Paper>