<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1043">
  <Title>Word Knowledge Acquisition, Lexicon Construction and Dictionary Compilation</Title>
  <Section position="4" start_page="0" end_page="223" type="metho">
    <SectionTitle>
2 The ACQUILEX Lexicon Development
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="223" type="sub_section">
      <SectionTitle>
Environment
</SectionTitle>
      <Paragraph position="0"> Our points of departure are the tools for lexical acquisition and knowledge representation developed as part of the ACQUILEX project ('The Acquisition of Lexical Knowledge for NLP Systems').</Paragraph>
      <Paragraph position="1"> The ACQUILEX Lexicon Development Environment uses typed graph unification with inheritance as its lexical representation language (for details, see Copestake (1992), Sanfilippo &amp; Poznański (1992), and papers by Copestake, de Paiva and Sanfilippo in Briscoe et al. (1993)). It allows the user to define an inheritance hierarchy of types with associated restrictions expressed in terms of attribute-value pairs as shown in Fig 1, and to create lexicons where such types are used to create lexical templates which encode word-sense specific information extracted from MRDs such as the one in Fig 2. (Bold lowercase is used for types, caps for attributes, and boxes enclosing types indicate total omission of attribute-value pairs. Details concerning the encoding of verb syntax and semantics can be found in Sanfilippo (1993).) Feature Structure (FS) descriptions of word senses such as the one in Fig 2 are created semiautomatically through a program which converts syntactic and</Paragraph>
      <Paragraph position="7"> Figure 1: Type Hierarchy &amp; Constraints (fragment). [Figure not recoverable from the extracted text; the types legible in the residue are sign, rule, lex-sign, lexical-rule, verb-sign and strict-intrans-sign.]</Paragraph>
      <Paragraph position="9"> semantic specifications encoded in MRDs into LKB types. For example, the choice of LKB types used in the characterization of the verb swim above was induced from the syntactic and semantic codes found in LDOCE and the Longman Lexicon of Contemporary English (LLOCE, McArthur 1980). In LDOCE, the first sense of the verb swim is marked as a strict intransitive verb ([I0]) whose subject is animate ((box .... 0)); in LLOCE, the same verb sense is semantically classified as a movement verb with manner of motion specified (M19):</Paragraph>
      <Paragraph position="11"> The MRD-to-LKB equivalences induced by the conversion algorithm are as shown in (4), where agt-cause-move-manner indicates that the subject participant relation implies self-induced movement with manner specified.</Paragraph>
      <Paragraph position="12"> (4) [I0] → strict-intrans-sign</Paragraph>
      <Paragraph position="14"/>
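      <Paragraph> The code-to-type equivalences in (4) can be pictured as a simple lookup. The following is a minimal Python sketch under stated assumptions, not the ACQUILEX conversion program: only the [I0] mapping survives in (4) as given here, the pairing of the LLOCE code M19 with agt-cause-move-manner is inferred from the surrounding description, and all table and function names are hypothetical.

```python
# Hypothetical lookup tables; the names are illustrative, not ACQUILEX APIs.
SYNTACTIC_CODE_TO_TYPE = {
    "I0": "strict-intrans-sign",  # LDOCE grammar code, as listed in (4)
}
SEMANTIC_CODE_TO_TYPE = {
    "M19": "agt-cause-move-manner",  # assumed pairing for the LLOCE code
}


def lkb_types_for(grammar_code, semantic_code):
    """Return the (syntactic type, semantic type) pair for one verb sense.

    Unknown codes map to None so that they can be flagged for manual review.
    """
    return (
        SYNTACTIC_CODE_TO_TYPE.get(grammar_code),
        SEMANTIC_CODE_TO_TYPE.get(semantic_code),
    )


swim_types = lkb_types_for("I0", "M19")
```
</Paragraph>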
    </Section>
  </Section>
  <Section position="5" start_page="223" end_page="274" type="metho">
    <SectionTitle>
3 Verbal Diatheses and Lexical
Acquisition
</SectionTitle>
    <Paragraph position="0"> In the example discussed above, MRD-to-LKB conversion is relatively straightforward: a single LKB entry is created for swim since a single grammar code is found in the MRD sources used. Where a verb-sense entry gives more than one grammar code, however, the question arises whether or not each grammar code should be mapped into a distinct LKB entry. For example, the codes given in LDOCE for the verb dock (see (1)) could potentially be used to derive four LKB verb entries:
 (5) LKB TYPE                 EXAMPLE
     a. strict-trans-sign     Kim docked the boat
     b. obl-trans-sign        Kim docked the boat at Southampton
     c. strict-intrans-sign   The boat docked
     d. obl-intrans-sign      The boat docked at Southampton</Paragraph>
    <Section position="1" start_page="223" end_page="274" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> Notice, however, that in this case the creation of four distinct LKB entries is unnecessary insofar as the use of the verb exemplified in (5b) contains enough information to derive the remaining uses of the verb through lexical rules which progressively reduce the verb's valency by dropping the subject and/or prepositional argument(s). Such a step would be linguistically motivated in that it establishes a clear link between alternative uses of the same verb sense. Moreover, compact representation of verb use extensions is desirable from an engineering perspective as it reduces the size of the lexicon, allowing verb use expansion to be delayed until parsing time. This practice can be made to facilitate the resolution of lexical ambiguity by enforcing selective application of lexical rules (Copestake &amp; Briscoe, 1994).</Paragraph>
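      <Paragraph> The derivation of the uses in (5) from a single kernel entry can be sketched as follows. This is an illustrative Python sketch only: it treats LKB types as plain strings and lexical rules as lookup tables, whereas the LKB itself applies rules to typed feature structures. The rule labels "caus-inch" and "path" and the function name are assumptions made for the example.

```python
# Each hypothetical rule maps an input LKB type to its reduced-valency output.
LEXICAL_RULES = {
    "caus-inch": {  # relates agentive uses to agentless uses
        "obl-trans-sign": "obl-intrans-sign",
        "strict-trans-sign": "strict-intrans-sign",
    },
    "path": {  # omits the prepositional argument
        "obl-trans-sign": "strict-trans-sign",
        "obl-intrans-sign": "strict-intrans-sign",
    },
}


def expand(kernel_type, licensed_rules):
    """Return every type reachable from the kernel via the licensed rules."""
    reached = {kernel_type}
    frontier = [kernel_type]
    while frontier:
        current = frontier.pop()
        for rule in licensed_rules:
            derived = LEXICAL_RULES[rule].get(current)
            if derived is not None and derived not in reached:
                reached.add(derived)
                frontier.append(derived)
    return reached


# The kernel entry for dock, (5b), with both alternations licensed.
dock_uses = expand("obl-trans-sign", ["caus-inch", "path"])
```

Licensing only a subset of the rules restricts the expansion accordingly, which is one way to picture the selective application of lexical rules mentioned above.
</Paragraph>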
      <Paragraph position="1"> Compact representation of verb use extensions due to valency alternations requires that a note of all applicable lexical rules be made in each kernel entry. In choosing obl-trans-sign as the LKB type for dock, for example, specifications would be added saying that the verb is amenable to the causative-inchoative alternation relating agentive and agentless uses ((5a,b) vs. (5c,d)), and the path alternation pertaining to the omission of the prepositional argument ((5a,c) vs. (5b,d)).</Paragraph>
      <Paragraph position="2"> In addition, the path alternation would have to be specified as to whether it preserves amenability to a telic interpretation (accomplishment or achievement) of the event described by the verb or not. For example, the omission of the goal argument for a verb such as drive, push or carry induces an atelic (process) interpretation as indicated by incompatibility with a terminative adverbial:
 (6) a. John drove his car to London in one hour
     b. John drove his car (*in one hour)
 Within a (partial) decompositional approach to verb semantics (Talmy, 1985; Jackendoff, 1990; Sanfilippo, 1993; Sanfilippo et al., 1992), this contrast can be explained with reference to the meaning component path. In (6a), the goal argument (to London) fixes a final bound for the path along which the driving event takes place. Assuming that the compositional meaning of the sentence involves establishing a homomorphism between the event described by the verb and the path along which such an event takes place (Dowty, 1991; Sanfilippo, 1991), it follows that with an unbounded path (e.g. (6b)) only a process interpretation is possible, whereas with a bounded path (e.g. (6a)) a telic interpretation is more likely. By contrast, the omission of the goal argument with verbs such as deliver, bring, dock and send does not inhibit amenability to a telic interpretation, e.g.</Paragraph>
      <Paragraph position="3">  Our aim, then, was to capture regularities across distinct uses of the same verb sense by relating the subcategorization frames relative to these uses via regular syntactic and semantic changes. To assess the feasibility of this approach, we augmented the MRD-to-LKB conversion code with facilities which make it possible to infer amenability to specific diathesis alternations from occurrence of multiple grammar codes and their associated semantic codes in the MRDs. To improve on the informational content of LDOCE grammar codes, we used an intermediate dictionary semiautomatically derived from LDOCE (LDOCE_Inter) where the subcategorization information inferrable from grammar codes and other orthographic conventions was made more explicit (Boguraev &amp; Briscoe, 1989; Carroll &amp; Grover, 1989). Semantic information about verb classes was obtained by mapping across LDOCE and LLOCE so as to augment LDOCE queries with thesaurus information, i.e. semantic codes (Sanfilippo &amp; Poznański, 1992).</Paragraph>
      <Paragraph position="4"> Syntactic and semantic information relative to verb senses was extracted through special functions which operate on pointers to dictionary entries. The extracted information was used to generate FS representations of word senses. The conversion process was carried out in such a way that whenever multiple subcategorization frames were found in association with a verb sense, only those which could not be derived via diathesis alternation were expanded into LKB entries. For example, the LDOCE_Inter entry for dock gives four subcategorization frames:</Paragraph>
      <Paragraph position="6"> In this case, the four uses of the verb can all be derived from the last one through application of the causative-inchoative and bounded-path alternations mentioned above; all that needs doing is to mark what diatheses are possible in the LKB entry derived, e.g.</Paragraph>
      <Paragraph position="8"> The algorithm which guides this process checks whether information regarding diathesis alternations can be inferred from dictionary entries in the MRD sources or must be manually supplied. In performing this check, subcategorization options relative to a given verb sense which can be inferred from a more informative subcategorization frame are ignored. This technique was successfully employed in semiautomatic derivation of lexicons for 360 movement verbs, yielding over 500 additional possible expansions by application of lexical rules.</Paragraph>
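      <Paragraph> The check described above, in which frames inferable from a more informative frame are ignored, might be approximated as below. This Python sketch is a simplification under stated assumptions and is not the algorithm actually used: frames are reduced to LKB type names, the derivation table is hypothetical, and a frame counts as derivable when another observed frame yields it in one rule application.

```python
# Hypothetical derivation table: (rule, input type) -> output type.
DERIVES = {
    ("caus-inch", "obl-trans-sign"): "obl-intrans-sign",
    ("caus-inch", "strict-trans-sign"): "strict-intrans-sign",
    ("path", "obl-trans-sign"): "strict-trans-sign",
    ("path", "obl-intrans-sign"): "strict-intrans-sign",
}


def compact(observed):
    """Split observed frames into kernel frames and the rules they license.

    A frame is a kernel if no other observed frame derives it in one step;
    a rule is licensed if it links two observed frames.
    """
    derivable = set()
    licensed = set()
    for (rule, source), target in DERIVES.items():
        if source in observed and target in observed:
            derivable.add(target)
            licensed.add(rule)
    kernels = sorted(set(observed) - derivable)
    return kernels, sorted(licensed)


# The four frames found for dock collapse to one kernel entry.
kernels, rules = compact(
    ["strict-trans-sign", "obl-trans-sign",
     "strict-intrans-sign", "obl-intrans-sign"]
)
```

Only the kernel frames would then be expanded into LKB entries, with the licensed rules recorded on each kernel.
</Paragraph>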
    </Section>
  </Section>
  <Section position="6" start_page="274" end_page="276" type="metho">
    <SectionTitle>
4 Verbal Diatheses and Knowledge
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="274" end_page="276" type="sub_section">
      <SectionTitle>
Representation
</SectionTitle>
      <Paragraph position="0"> To encode amenability to verbal diatheses, the feature DIATHESES was introduced as an extension of the morphological features associated with verbs (see (8)). This feature takes as value the type alternations which is in turn defined as having a variety of specialized types according to which diathesis alternations are admissible for each choice of verb type (e.g. intransitive, transitive, ditransitive), as shown in Figure 3 (see next page). The following table provides diatheses referred to in Fig 3.</Paragraph>
      <Paragraph position="1">  Diathesis alternations are enforced by means of lexical rules which, on par with all other information structures in the LKB, are hierarchically arranged, as shown in Fig 4 with reference to the bound and unbound path alternations for intransitive verbs. Lexical rules enforcing diathesis alternations may involve a variety of syntactic, semantic and orthographic changes. For example, the u-path-obl-intrans-alt rule shown in Fig 5 below takes as input an FS of type obl-intrans-sign which represents a verb describing a non-stative eventuality (dyn-eve) whose subject participant (with semantics KI) is implied as moving along a directed path (th-move-dir) the endpoint of which is specified by the oblique argument (pp-sign), e.g. the use of swim in Kim swam across the river. The output is an FS representing a strict intransitive verb (strict-intrans-sign) which describes a process and whose subject participant is like that of the input with the directed path specification removed (th-move instead of th-move-dir), e.g. swim in Kim swam.</Paragraph>
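      <Paragraph> The input and output conditions of the u-path-obl-intrans-alt rule can be illustrated with a small Python sketch. It is a loose approximation: a flat dictionary stands in for a typed feature structure, matching is by equality instead of unification, the attribute names are hypothetical, and the value "process" is a placeholder since the LKB type for process eventualities is not named in the text.

```python
def u_path_obl_intrans_alt(fs):
    """Apply the unbounded-path alternation to an oblique intransitive FS.

    Returns a new feature structure, or None when the input does not match.
    """
    matches = (
        fs.get("type") == "obl-intrans-sign"
        and fs.get("event") == "dyn-eve"
        and fs.get("subject-role") == "th-move-dir"
        and fs.get("oblique") == "pp-sign"
    )
    if not matches:
        return None
    return {
        "type": "strict-intrans-sign",
        "event": "process",  # placeholder name for the process reading
        "subject-role": "th-move",  # directed-path specification removed
        "subject-sem": fs.get("subject-sem"),  # shared with the input
    }


# "Kim swam across the river" as a hypothetical flat feature structure.
swam_across = {
    "type": "obl-intrans-sign",
    "event": "dyn-eve",
    "subject-role": "th-move-dir",
    "oblique": "pp-sign",
    "subject-sem": "kim",
}
swam = u_path_obl_intrans_alt(swam_across)
```
</Paragraph>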
      <Paragraph position="2"> 5 Using the LKB to Guide Dictionary Compilation
 There are at least two ways in which an LKB such as the one developed in ACQUILEX offers the means to facilitate word classification in the compilation of new lexical databases.</Paragraph>
      <Paragraph position="4"> [Figure 3, partially recoverable from the extracted text: specialized diatheses types with the features PRT-ALT and OBL-ALT (both valued prt-or-obl-alt), TRANS-ALT (valued trans-alt) and DAT-MOVT (valued dat-movt). Alternation values listed: trans-alt: caus-inch, middle, indef-obj, def-obj, recip, pass; prt-or-obl-alt: b-path, u-path; dat-movt: to, for.]</Paragraph>
      <Paragraph position="5"> First, the links between LKB types and dictionary entries established in the conversion stage can be used to run consistency checks on the MRD sources and to supply missing information or correct errors. This offers an efficient and cost-effective way of generating improved versions of the same dictionary.</Paragraph>
      <Paragraph position="6"> Second, the types associated with specific word classes can be made to guide lexical acquisition from corpora when creating new dictionaries. It is now widely recognized that corpora are indispensable in the acquisition of lexical information relating to issues of usage such as the range and frequency of different patterns of syntactic realization. The availability of software tools for partial analysis of texts (e.g. morphological and semantic tagging, phrasal parsing etc.) has significantly increased the utility of corpora in lexical acquisition by providing ways to structure the information contained in them (see Briscoe (1991) and references therein). Further advances yet can be made by using LKB types to classify words in text corpora. Suppose, for example, we linked the input and output of lexical rules to semantically tagged subcategorization frames extracted from bracketed corpora (Poznański &amp; Sanfilippo, 1993). As indicated in Fig 6, this would allow us to assess which alternations might be of interest in establishing regular verb sense/usage shifts. Such an assessment would provide an effective way to drive verb categorization from corpora in the domain of valency alternations.</Paragraph>
    </Section>
  </Section>
</Paper>