<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1043">
  <Title>Word Knowledge Acquisition, Lexicon Construction and Dictionary Compilation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The acquisition and representation of lexical knowledge from machine-readable dictionaries and text corpora have increasingly become major concerns in Computational Lexicography/Lexicology. While this trend was essentially set by the need to maximize cost-effectiveness in building large-scale Lexical Knowledge Bases (LKBs) for NLP, there is a clear sense in which the construction of such knowledge bases also caters to the demand for better dictionaries. Currently available dictionaries and thesauri provide an undoubtedly rich source of lexical information, but often omit or neglect to make explicit salient syntactic and semantic properties of word entries. For example, it is well known that the same verb sense can appear in a variety of subcategorization frames which can be related to one another through valency alternations (diatheses). Some dictionaries provide subcategorization information by means of grammar codes, as shown below for the &quot;sail&quot; sense of the verb dock in LDOCE, Longman's Dictionary of Contemporary English (Procter, 1978).</Paragraph>
    <Paragraph position="1"> (1) dock [T1;I0: (at)] ...</Paragraph>
    <Paragraph position="2"> The codes [T1;I0: (at)] indicate that the verb can be either transitive or intransitive, with the possible addition of an oblique complement introduced by the preposition at: (2) a. [T1 (at)]: Kim docked his ship (at Glasgow) b. [I0 (at)]: The ship docked (at Glasgow) Unfortunately, an indication of the diatheses which relate the various occurrences of the verb to one another is rarely provided. Consequently, if we were to use the grammar code information found in LDOCE to create verb entries in an LKB by automatic conversion, we would construct four seemingly unrelated entries for the verb dock (see §3). *The research reported in this paper was carried out within the ACQUILEX project. I am indebted to Ted Briscoe, Ann Copestake and Pete Whitelock for helpful comments.</Paragraph>
    <Paragraph position="3"> Inadequacies of this kind may be redressed through semi-automatic techniques which make it possible to supply information concerning amenability to diathesis alternations, so as to avoid expanding distinct entries for related uses of the same verb. This practice would allow us to develop an LKB from dictionary databases which offers a more complete and linguistically refined repository of lexical information than the source databases. Such an LKB would be used to generate lexical components for NLP systems, and could also be integrated into a lexicographer's workstation to guide word classification.</Paragraph>
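    <Paragraph position="4"> The expansion problem described above can be illustrated with a minimal sketch. It is not the procedure used in the paper: the function names (expand_codes, naive_entries, linked_entry) and the dictionary layout are hypothetical, and the parser assumes only the simple code shape seen in example (1), namely codes separated by semicolons followed by an optional parenthesized preposition.

```python
import re


def expand_codes(code_string):
    """Expand a code string such as 'T1;I0: (at)' into (code, preposition) frames.

    Assumed format (an illustration only): grammar codes separated by ';',
    optionally followed by ':' and one optional complement in parentheses.
    """
    codes_part, _, complement_part = code_string.partition(":")
    codes = [c.strip() for c in codes_part.split(";") if c.strip()]
    # A parenthesized preposition marks an optional oblique complement.
    match = re.search(r"\(([a-z]+)\)", complement_part)
    preps = [None] if match is None else [None, match.group(1)]
    return [(code, prep) for code in codes for prep in preps]


def naive_entries(verb, code_string):
    """Automatic conversion: one unrelated entry per frame."""
    return [
        {"verb": verb, "code": code, "prep": prep}
        for code, prep in expand_codes(code_string)
    ]


def linked_entry(verb, code_string):
    """Alternative: a single entry that keeps all frames together."""
    return {"verb": verb, "frames": expand_codes(code_string)}


if __name__ == "__main__":
    for entry in naive_entries("dock", "T1;I0: (at)"):
        print(entry)
    print(linked_entry("dock", "T1;I0: (at)"))
```

    Under these assumptions the naive conversion yields one entry for each combination of code and optional complement, whereas the linked representation keeps the frames in a single entry to which information about diathesis alternations could later be attached.</Paragraph>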
  </Section>
</Paper>