<?xml version="1.0" standalone="yes"?>
<Paper uid="A92-1011">
  <Title>The Acquisition of Lexical Knowledge from Combined Machine-Readable Dictionary Sources</Title>
  <Section position="4" start_page="81" end_page="82" type="metho">
    <SectionTitle>
LDOCE
</SectionTitle>
    <Paragraph position="0"> Lexical templates such as the one in (1) are generated through a user-definable conversion function (a facility included in the LKB) which makes it possible to establish correspondences between information derived through LDB queries and LKB types. For example, information about selectional restrictions for transitive verbs (e.g. e-human and obj in (1)) is encoded by establishing a correspondence between the values of the individual variables of the subject and object roles in LKB representations and the values retrieved from the relevant LDOCE entry for box codes 5 and 10 (see Figure 1).</Paragraph>
    <Paragraph position="1"> Similarly, the assignment of verb types (e.g. STRICT-TRANS-SIGN) to verb senses is carried out by relating LKB types for English verbs (about 30 in the current implementation; Sanfilippo, forthcoming) to subcategorization patterns retrieved from LDOCE_Inter.</Paragraph>
    <Paragraph position="2"> For example, if a verb sense in LDOCE_Inter were associated with the information in (2), the conversion function would associate the lexical template being generated with the type STRICT-TRANS-SIGN.</Paragraph>
    <Paragraph position="3"> (2) ((Cat V) (Takes NP NP) ...)
Needless to say, the amount of information specified in LKB entries will be directly proportional to the amount of information which can be reliably extracted through LDB queries. With respect to verbs, there are several ways in which the representations derived from templates such as the one in (1) can be enriched. In the simplest case, additional information can be recovered from a single MRD source, either directly or through translation programs which create derived dictionaries in which information implicit in the source MRD is made explicit.
... of roles is computed in terms of entailments of verb meanings which determine the most (p-agt) and least (p-pat) agentive event participants for each choice of predicate; see Figures 4 and 5 for illustrative examples. This approach reproduces the insights of Dowty's and Jackendoff's treatments of thematic information (Dowty, 1991; Jackendoff, 1990) within a neo-Davidsonian approach to verb semantics (Sanfilippo, 1990).</Paragraph>
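The correspondence between pattern (2) and the type STRICT-TRANS-SIGN can be pictured as a table lookup inside the conversion function. The Python sketch below is illustrative only: the record format, the function name `convert_sense` and the second table entry (`STRICT-INTRANS-SIGN`) are hypothetical and are not taken from the LKB or the LDB.

```python
from typing import Optional

# Hypothetical correspondence table: subcategorization pattern -> LKB verb type.
# Only the (NP NP) -> STRICT-TRANS-SIGN pair comes from the text; the other
# entry is an illustrative placeholder.
SUBCAT_TO_LKB_TYPE: dict[tuple[str, ...], str] = {
    ("NP", "NP"): "STRICT-TRANS-SIGN",   # pattern (2)
    ("NP",): "STRICT-INTRANS-SIGN",      # hypothetical type name
}


def convert_sense(ldb_record: dict) -> Optional[dict]:
    """Build a partial lexical template from a parsed LDOCE_Inter-style record."""
    if ldb_record.get("Cat") != "V":
        return None                      # this sketch only handles verbs
    takes = tuple(ldb_record.get("Takes", ()))
    lkb_type = SUBCAT_TO_LKB_TYPE.get(takes)
    if lkb_type is None:
        return None                      # no correspondence has been defined
    return {"type": lkb_type, "subcat": takes}


# Record corresponding to (2): ((Cat V) (Takes NP NP) ...)
template = convert_sense({"Cat": "V", "Takes": ["NP", "NP"]})
print(template)  # {'type': 'STRICT-TRANS-SIGN', 'subcat': ('NP', 'NP')}
```

A table-driven design keeps the conversion user-definable: supporting another verb type means adding a row rather than changing code.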
    <Paragraph position="4"> This technique may, however, be insufficient or inappropriate for recovering certain kinds of information which are necessary in building an adequate verb lexicon. Consider the specification of verb class semantics. This is highly instrumental in establishing subcategorization and regimenting lexically governed grammatical processes (see Levin (1989), Jackendoff (1990) and references therein) and should thus be included in any lexicon intended to supply adequate information about verbs. For example, a verb such as delight should be specified as a member of the class of verbs which express emotion, i.e. psychological verbs. As is well known (Levin, 1989; Jackendoff, 1990), verbs which belong to this semantic class can be classified according to the following parameters:
* affect is positive (admire, delight), neutral (experience, interest) or negative (fear, scare);
* stimulus argument is realized as object and experiencer as subject, e.g. admire, experience, fear;
* stimulus argument is realized as subject and experiencer as object, e.g. delight, interest, scare.
Psychological verbs with experiencer subjects are 'noncausative'; the stimulus of these verbs can be considered to be a 'source' to which the experiencer 'reacts emotively'. By contrast, psychological verbs with stimulus subjects involve 'causation'; the stimulus argument may be considered as a 'causative source' by which the experiencer participant is 'emotively affected'.</Paragraph>
    <Paragraph position="5"> Six subtypes of psychological verbs can thus be distinguished according to semantic properties of the stimulus and experiencer arguments, as shown in (3), where the verb delight is specified as belonging to one of these subtypes.</Paragraph>
    <Paragraph position="6"> Recovering the psychological verb classes in (3) through LDB queries which use a standard dictionary (e.g. LDOCE) as source is a fairly hopeless pursuit. Standard dictionaries are simply not equipped to offer this kind of information with consistency and exhaustiveness. Furthermore, the technique of creating derived dictionaries where the information contained in a main source MRD is made more explicit is unhelpful in this case. For example, one approach would be to derive a dictionary from LDOCE where verbs are organized into a network defined by IS-A links using the general approach to taxonomy formation described by Amsler (1981). Such an approach would involve the formation of chains through verb definitions determined by the genus term of each definition. Unfortunately, the genus of verb definitions is often not specific enough to supply a taxonomic characterization which allows for the identification of semantic verb classes with consistency and exhaustiveness. In LDOCE, for example, the genus of over 20% of verb senses (about 3,500) is one of 8 verbs: cause, make, be, give, put, take, move, have; many of the word senses which share the same genus belong to distinct semantic verb classes. This is not to say that verb taxonomies are of no value, and in the final section we will briefly discuss an important application of verb taxonomies with respect to the assignment of semantic classes to verb senses. Nevertheless, achieving adequate results requires techniques which reclassify entries in the same source MRD(s) rather than making explicit the classification 'implicit' in the lexicographer's choice of genus term. Thesauri provide an alternative, semantically motivated classification of lexical items which is naturally suited to reshaping or augmenting the taxonomic structure which can be inferred from the genus of dictionary definitions.
The LLOCE is a thesaurus which was developed from LDOCE, and there is substantial overlap (although not identity) between the definitions and entries of the two MRDs. We decided to investigate the plausibility of semi-automatic sense correlations between LDOCE and LLOCE and to explore the utility of the thesaurus classification for classifying verbs in a linguistically motivated way.</Paragraph>
  </Section>
  <Section position="5" start_page="82" end_page="83" type="metho">
    <SectionTitle>
3 DCK: A Flexible Tool for Correlating
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="82" end_page="82" type="sub_section">
      <SectionTitle>
Word Senses Across MRDs
</SectionTitle>
      <Paragraph position="0"> Our immediate goal in developing an environment for correlating MRDs was thus to merge word senses, and in particular verb senses, from LDOCE and LLOCE. More generally, our aim was to provide a Dictionary Correlation Kit (DCK) containing a set of flexible tools that can be straightforwardly tailored to an individual user's needs, along with a facility for the interactive matching of dictionary entries. Our DCK is designed to correlate word senses across pairs of MRDs which have been mounted on the LDB (henceforth source-dict and destination-dict) using a list of comparison heuristics.</Paragraph>
      <Paragraph position="1"> Entries from the source-dict and destination-dict are compared to yield a set of correlation structures which describe matches between word senses in the two dictionaries. A function is provided that converts correlation structures into entries of a derived dictionary which can be mounted and queried on the LDB.</Paragraph>
    </Section>
    <Section position="2" start_page="82" end_page="82" type="sub_section">
      <SectionTitle>
3.1 General Functionality of DCK
</SectionTitle>
      <Paragraph position="0"> Entry fields in the source-dict and destination-dict are compared by means of comparators. These are functions which take as input normalized field information extracted from the entries under analysis, and return two values: a score indicating the degree to which the two fields correlate, along with an advisory datum which indicates what kind of action to take. The objective of each match is to produce a correlation structure consisting of a source-dict sense and a set of destination-dict sense/score pairs representing possible matches. Prior to converting correlation structures into derived dictionary entries, the best match is selected for each correlation structure on the basis of the comparator scores. When there is ambiguity as to the best match, a correlation dialog window pops up that allows the user to peruse the candidate matches and manually select the best match (see Figure 3).</Paragraph>
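A minimal sketch of the data flow just described, assuming comparators return a score in [0, 1] together with an advisory string ("ok" or "query"; both values are hypothetical). The names `CorrelationStructure` and `correlate` are illustrative, not DCK's actual interface, and the unweighted averaging of comparator scores is a simplification.

```python
from dataclasses import dataclass, field
from typing import Callable

# A comparator maps two normalized field values to (score, advisory).
# Scores in [0, 1] and the advisory values "ok"/"query" are assumptions.
Comparator = Callable[[object, object], tuple[float, str]]


@dataclass
class CorrelationStructure:
    source_sense: str
    # (destination-dict sense id, score) pairs, best first
    candidates: list[tuple[str, float]] = field(default_factory=list)
    query_user: bool = False  # True if any comparator advised querying


def correlate(source_sense: str, source_fields: dict, dest_senses: dict,
              comparators: dict[str, Comparator]) -> CorrelationStructure:
    """Compare one source-dict sense with every candidate destination-dict sense."""
    cs = CorrelationStructure(source_sense)
    for dest_id, dest_fields in dest_senses.items():
        scores = []
        for field_name, compare in comparators.items():
            score, advisory = compare(source_fields[field_name], dest_fields[field_name])
            scores.append(score)
            if advisory == "query":
                cs.query_user = True
        # Unweighted mean of the comparator scores (a simplifying assumption).
        cs.candidates.append((dest_id, sum(scores) / len(scores)))
    cs.candidates.sort(key=lambda pair: pair[1], reverse=True)
    return cs


def same_value(a: object, b: object) -> tuple[float, str]:
    """Trivial equality comparator, e.g. for normalized part-of-speech values."""
    return (1.0 if a == b else 0.0), "ok"


cs = correlate(
    "lloce:float-1",
    {"pos": "verb"},
    {"ldoce:float-n-1": {"pos": "noun"}, "ldoce:float-v-1": {"pos": "verb"}},
    {"pos": same_value},
)
print(cs.candidates)  # [('ldoce:float-v-1', 1.0), ('ldoce:float-n-1', 0.0)]
```

The sense identifiers in the usage lines are made up for illustration.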
    </Section>
    <Section position="3" start_page="82" end_page="82" type="sub_section">
      <SectionTitle>
3.2 Customising DCK
</SectionTitle>
      <Paragraph position="0"> Two categories of information must be provided in order to correlate a new pair of LDB-mounted dictionaries:
* functions which normalize dictionary-dependent field values, and
* dictionary-independent comparators which provide matching heuristics.</Paragraph>
      <Paragraph position="1"> Field values describing the same information may be labeled differently across dictionaries. For example, pronouns may be tagged as Pron in the part-of-speech field of one dictionary and as Pronoun in the part-of-speech field of another. It is therefore necessary to provide normalizing functions which convert dictionary-specific field values into dictionary-independent ones which can be compared using generic comparators.</Paragraph>
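The Pron/Pronoun example can be handled by a per-dictionary lookup table wrapped in a normalizing function. In this sketch the dictionary names `dict_a` and `dict_b`, the function `make_pos_normalizer` and the table contents beyond Pron/Pronoun are hypothetical.

```python
# Hypothetical per-dictionary tables; "dict_a"/"dict_b" stand for any two
# LDB-mounted dictionaries that label the same category differently.
POS_TABLES = {
    "dict_a": {"Pron": "pronoun", "V": "verb", "N": "noun"},
    "dict_b": {"Pronoun": "pronoun", "Verb": "verb", "Noun": "noun"},
}


def make_pos_normalizer(dictionary: str):
    """Return a function mapping one dictionary's part-of-speech labels to shared ones."""
    table = POS_TABLES[dictionary]

    def normalize(raw_value: str) -> str:
        if raw_value not in table:
            raise ValueError(f"{dictionary}: no normalization for {raw_value!r}")
        return table[raw_value]

    return normalize


norm_a, norm_b = make_pos_normalizer("dict_a"), make_pos_normalizer("dict_b")
print(norm_a("Pron") == norm_b("Pronoun"))  # True
```

Raising on unknown labels, rather than passing them through, makes gaps in a normalization table visible before they silently depress comparator scores.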
      <Paragraph position="2"> Comparators take as arguments pairs of normalized field values relative to the senses of the two MRDs under comparison, and return a score associated with an advisory datum which indicates the course of action to be followed. The score and advisory datum provide an index of the degree of overlap between the two senses.</Paragraph>
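As an illustration of a generic comparator, the sketch below scores the overlap between two definitions with the Dice coefficient over content words and advises querying when the overlap is low. The choice of Dice, the stopword list and the `query_below` cut-off are assumptions, not DCK's documented heuristics; the two definitions are those of float in LLOCE and LDOCE quoted in Section 4.

```python
import re

# Small illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "of", "to", "or", "in", "on", "be", "esp"}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def definition_overlap(def_a: str, def_b: str,
                       query_below: float = 0.2) -> tuple[float, str]:
    """Dice coefficient over content words, plus an advisory datum."""
    a, b = content_words(def_a), content_words(def_b)
    if not a or not b:
        return 0.0, "query"          # nothing to compare: ask the user
    score = 2 * len(a.intersection(b)) / (len(a) + len(b))
    advisory = "query" if query_below > score else "ok"
    return score, advisory


lloce_float = "to stay on or very near the surface of a liquid, esp. water"
ldoce_float = ("to (cause to) stay at the top of a liquid or be held up "
               "in the air without sinking")
print(definition_overlap(lloce_float, ldoce_float))  # (0.25, 'ok')
```

Here the shared content words are stay and liquid (2 of 6 and 10 content words), giving 2 * 2 / 16 = 0.25.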
    </Section>
    <Section position="4" start_page="82" end_page="83" type="sub_section">
      <SectionTitle>
3.3 Determining the Best Sense
</SectionTitle>
      <Paragraph position="0"> A correlation structure contains a list of destination-dict sense/score pairs which indicate possible matches with the corresponding source-dict sense. The most appropriate match can be determined automatically using two user-provided parameters:
1. the threshold, which indicates the minimal acceptable score that a comparator list must achieve for automatic sense selection, and
2. the tolerance, which is the minimum difference between the top two scores that must be achieved if the sense with the highest score is to be selected.
The sense/score pair with the highest score is automatically selected if:
A. the advisory datum provides no indication that the correlation should be queried, and either
B. the score of a single match exceeds the threshold, or
C. the scores of two or more matches exceed the threshold, and the difference between the top two scores exceeds the tolerance.</Paragraph>
      <Paragraph position="1"> If these conditions are not fulfilled, the correlation dialog is invoked to allow a manual choice to be made.</Paragraph>
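One reading of the selection conditions is that A must hold and, in addition, either B or C; the function below implements that reading, assuming that scores, threshold and tolerance share one scale. The function name and argument layout are hypothetical.

```python
from typing import Optional


def select_best(candidates: list[tuple[str, float]], threshold: float,
                tolerance: float, advisory_query: bool = False) -> Optional[str]:
    """Return the automatically selected destination-dict sense, or None to open the dialog."""
    if advisory_query:
        return None  # condition A fails: a comparator asked for the user to be consulted
    above = sorted((c for c in candidates if c[1] > threshold),
                   key=lambda c: c[1], reverse=True)
    if len(above) == 1:
        return above[0][0]  # condition B: a single match exceeds the threshold
    if len(above) >= 2 and above[0][1] - above[1][1] > tolerance:
        return above[0][0]  # condition C: several matches, but a clear winner
    return None  # no condition met: fall back to the correlation dialog


print(select_best([("s1", 0.80), ("s2", 0.70)], 0.65, 0.07))  # s1
print(select_best([("s1", 0.80), ("s2", 0.76)], 0.65, 0.07))  # None
```

With the settings reported for the LLOCE-LDOCE run (threshold 65%, tolerance 7%), a 0.80/0.70 pair is resolved automatically, while a 0.80/0.76 pair is handed to the dialog.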
      <Paragraph position="2"> [Figure 3 residue: LDOCE entry window for feel, sense 3 [T1,5;U3] "to believe, esp. for the moment (something that cannot be proved)"]</Paragraph>
    </Section>
    <Section position="5" start_page="83" end_page="83" type="sub_section">
      <SectionTitle>
3.4 The Correlation Dialog
</SectionTitle>
      <Paragraph position="0"> The correlation dialog allows the user to examine correlation structures and select none, one or more destination-dict senses to be matched with the source-dict sense under analysis. A typical interaction can be seen in Figure 3. A scrollable window in the centre of the dialog box provides information about the destination-dict senses and their associated scores. Single clicking the mouse button on one or more rows makes them the current selection. The large button above the threshold and tolerance indicators summarizes source-dict sense information. Clicking on this button invokes an LDB query window which inspects the source-dict sense (cf. bottom left window in Figure 3).</Paragraph>
      <Paragraph position="1"> The dialog can be in one of three modes:
* Explain Scores: the mode-specific key pops up a window for each destination-dict sense in the current selection, explaining how each score was obtained from the comparators;
* Display Entries: the mode-specific key invokes standard LDB browsers on the destination-dict senses in the current selection (cf. top-left window in Figure 3), and
* Accept Entries: the mode-specific key terminates the dialog and accepts the current selection as the best match.</Paragraph>
      <Paragraph position="2"> Two additional buttons on the top right of the dialog box allow the current selection to be accepted independently of the current mode, or all senses to be rejected (i.e. no match is found). At the bottom of the screen, two 'thermometers' allow the user to adjust the threshold and tolerance parameters dynamically.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="83" end_page="84" type="metho">
    <SectionTitle>
4 Using DCK
</SectionTitle>
    <Paragraph position="0"> We ran DCK with LLOCE as source-dict and LDOCE as destination-dict to produce a derived dictionary, LDOCE_Link, which, when loaded together with LDOCE, would allow us to form LDOCE queries integrating thesaurus information from LLOCE. The work was carried out with specific reference to verbs which express 'Feelings, Emotions, Attitudes, and Sensations' and 'Movement, Location, Travel, and Transport' (sets 'F' and 'M' in LLOCE). Correlation structures were derived for 1194 verb senses (over 1/5 of all verb senses in LLOCE) using as matching parameters the degree of overlap in grammar codes, definitions and examples, as well as equality of headword and part-of-speech. After some trial runs, correlations appeared to yield the best results when all parameters were assigned the same weight, except for the comparator for 'degree of overlap in examples', which was weighted twice as heavily as the others. Tolerance was set at 7% and threshold at 65%. The rate of interactions through the correlation dialog was about one for every 8-10 senses. It took about 10 hours of running time on a Macintosh IIcx to complete the work, with less than three hours' worth of interactions.</Paragraph>
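The reported weighting can be expressed as a weighted mean of comparator scores. How DCK actually combines scores is not stated, so the weighted mean, the comparator names and the example scores below are assumptions; only the relative weights and the 65% threshold come from the text.

```python
# Hypothetical comparator names; the weights mirror the reported setting
# (all equal except example overlap, which counts double).
WEIGHTS = {
    "headword_equal": 1.0,
    "pos_equal": 1.0,
    "grammar_code_overlap": 1.0,
    "definition_overlap": 1.0,
    "example_overlap": 2.0,
}
THRESHOLD = 0.65  # 65%, expressed as a fraction


def combined_score(scores: dict[str, float]) -> float:
    """Weighted mean of per-comparator scores, each assumed to lie in [0, 1]."""
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


example = {"headword_equal": 1.0, "pos_equal": 1.0, "grammar_code_overlap": 0.5,
           "definition_overlap": 0.25, "example_overlap": 0.5}
score = combined_score(example)
print(score, score > THRESHOLD)  # 0.625 False
```

Because example overlap carries a third of the total weight, a sense pair whose examples share little text struggles to clear the threshold even when headword and part-of-speech agree.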
    <Paragraph position="1"> A close examination of over 500 correlated entries disclosed an extremely low incidence of infelicitous matches (below 1%). In some cases, sense-matching inadequacies could be easily redressed without reassignment of correlation links. For example, DCK erroneously correlated the verb sense for float in LLOCE with the first verb sense of float in LDOCE. As shown in (4), the LLOCE sense refers only to the intransitive use of the verb, while the LDOCE sense refers to both transitive and intransitive uses of the verb (i.e. the LLOCE sense is subsumed by the LDOCE sense).</Paragraph>
    <Paragraph position="2"> (4) a. LLOCE float [I0] to stay on or very near the surface of a liquid, esp. water
b. LDOCE float 2 v 1 [I0;T1] to (cause to) stay at the top of a liquid or be held up in the air without sinking
One way to redress this kind of inadequate match would be to augment DCK with a lexical rule module catering for diathesis alternations, which would make it possible to establish a clear relation between distinct syntactic realizations of the same verb. For example, the transitive and intransitive senses of float could be related to each other via the 'causative/inchoative' alternation. This augmentation would be easy to implement since information about the amenability of verbs to diathesis alternations is recoverable from LDOCE_Inter, as is the case for float (Ergative is the term used in LDOCE_Inter to characterize verbs which, like float, are amenable to the causative/inchoative alternation).</Paragraph>
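A lexical rule module of the kind proposed could, at its simplest, close each sense's grammar codes under the causative/inchoative alternation before comparing them. The sketch assumes the LDOCE codes seen in (4), I0 (intransitive) and T1 (transitive with a noun-phrase object), and a boolean `ergative` flag standing in for the LDOCE_Inter Ergative marking; the function names are hypothetical.

```python
def alternation_closure(codes: set[str], ergative: bool) -> set[str]:
    """Add the grammar codes licensed by the causative/inchoative rule."""
    closed = set(codes)
    if ergative and ("I0" in closed or "T1" in closed):
        # An ergative verb's intransitive (inchoative) and transitive
        # (causative) uses license each other.
        closed.update({"I0", "T1"})
    return closed


def compatible_uses(source_codes: set[str], dest_codes: set[str], ergative: bool) -> bool:
    """True if both senses cover the same uses once the rule has applied."""
    return alternation_closure(source_codes, ergative) == alternation_closure(dest_codes, ergative)


# float: LLOCE [I0] vs LDOCE [I0;T1]; LDOCE_Inter marks float as Ergative.
print(compatible_uses({"I0"}, {"I0", "T1"}, ergative=True))   # True
print(compatible_uses({"I0"}, {"I0", "T1"}, ergative=False))  # False
```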
    <Paragraph position="3"> Notice, incidentally, that even though DCK yielded an incorrect sense correlation for the verb entry float, the information which was inherited by LDOCE from LLOCE through the correlation link was still valid. In LLOCE, float is classified as a verb whose set, group and main identifiers are floating-and-sinking, Shipping and Movement-location-travel-and-transport. This information is useful in establishing the semantic class of both the transitive and intransitive uses of float. The same holds in those rare cases where DCK incorrectly preferred one sense match over another, as shown in (6) for the first LLOCE sense of behave, which DCK linked to the third LDOCE sense rather than the first. Either sense of behave is adequately characterized by the set, group and main identifiers 'behaving', 'Feeling-and-behaviour-generally' and 'Feelings-emotions-attitudes-and-sensations', which LDOCE inherits from LLOCE through the incorrect sense correlation established by DCK.</Paragraph>
    <Paragraph position="4"> (6) a. LLOCE behave 1 [L9] to do things, live, etc. usu. in a stated way: She behaved with great courage when her husband died ...</Paragraph>
    <Paragraph position="5"> b. LDOCE behave v 1 [L9] to act; bear oneself: She behaved with great courage ...</Paragraph>
    <Paragraph position="6"> 3 [L9] (of things) to act in a particular way: It can behave either as an acid or as a salt ...</Paragraph>
    <Paragraph position="8"/>
  </Section>
  <Section position="7" start_page="84" end_page="84" type="metho">
    <SectionTitle>
5 LKB Encoding of Lexical Knowledge
from Combined MRD Sources
</SectionTitle>
    <Paragraph position="0"> LDOCE_Link was derived as a list of entries consisting of correlated LLOCE-LDOCE sense pairs plus an explicit reference to the corresponding set identifier in LLOCE, as shown in (7).</Paragraph>
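Since (7) is not reproduced here, the following sketch only assumes what the text states: each LDOCE_Link entry pairs an LLOCE sense with an LDOCE sense and records the LLOCE set identifier. The sense-id format and the function names are hypothetical; the two links are the behave and float correlations discussed in Section 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkEntry:
    lloce_sense: str  # source-dict (LLOCE) sense id
    ldoce_sense: str  # destination-dict (LDOCE) sense id
    set_id: str       # LLOCE set identifier the LDOCE sense inherits


def build_link_dictionary(selected_matches):
    """Turn (lloce_sense, ldoce_sense, set_id) triples into derived-dictionary entries."""
    return [LinkEntry(lloce, ldoce, set_id) for lloce, ldoce, set_id in selected_matches]


def ldoce_senses_in_set(link_entries, set_id):
    """Toy stand-in for an LDOCE query restricted by an LLOCE set identifier."""
    return sorted({e.ldoce_sense for e in link_entries if e.set_id == set_id})


link = build_link_dictionary([
    ("lloce:behave-1", "ldoce:behave-v-3", "behaving"),
    ("lloce:float-1", "ldoce:float-2-v-1", "floating-and-sinking"),
])
print(ldoce_senses_in_set(link, "behaving"))  # ['ldoce:behave-v-3']
```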
    <Paragraph position="2"> Loading LDOCE with LDOCE_Link makes it possible to form LDOCE queries which include thesaurus information from LLOCE (i.e. the set identifiers). The integration of thesaurus information provides adequate means for developing a semantic classification of verbs. With respect to psychological verbs, for example, the set identifiers proved to be very helpful in identifying members of the six subtypes described in (3). The properties used in this classification could thus be used to define a hierarchy of thematic types in the LKB which gave a detailed characterization of argument roles. This is shown in the lattice fragment in Figure 4, where the underlined types correspond to the role types used to distinguish the six semantic varieties of psychological predicates (see footnote 3). The correspondence between LLOCE set identifiers and the thematic role types shown in Figure 4 made it possible to create word-sense templates for psychological verbs from LDB queries which, in addition to providing information about morphological paradigms, subcategorization patterns, diathesis alternations and selectional restrictions, supplied thematic restrictions on the stimulus and experiencer roles. Illustrative LKB entries relative to the six verb subtypes described in (3) are shown in Figure 5.</Paragraph>
  </Section>
  <Section position="8" start_page="84" end_page="86" type="metho">
    <SectionTitle>
6 Final Remarks
</SectionTitle>
    <Paragraph position="0"> Taking into consideration the size of the LLOCE fragment correlated with LDOCE (1/5 of LLOCE verb senses) and the results obtained, it seems reasonable to expect that this work should extend straightforwardly to other verbs as well as to word senses of other category types.</Paragraph>
    <Paragraph position="1"> As far as we were able to establish, the major limitation of the work carried out arises from the fact that the source dictionary contained considerably fewer entries, and fewer senses per homonym, than the destination dictionary (e.g. 16,049 entries with 25,100 senses in LLOCE vs. 41,122 entries with 74,086 senses in LDOCE). Consequently, many senses of correlated verb entries, as well as entire verb entries in LDOCE, are bound to be left without a specification of thesaurus information. We are currently exploring the possibility of using verb taxonomies to extend the results of LLOCE-LDOCE correlations to those LDOCE entries and verb ... down taxonomies using as parent nodes verb entries for ... and let the daughter nodes of these taxonomies inherit the thesaurus specifications associated with the parent nodes. We expect that this use of verb taxonomies should provide a significant remedy for the lack of sense-to-sense correlations due to differences in size.
3 The labels 'p-agt' and 'p-pat' are abbreviations for 'prototypical' agent and patient roles which subsume clusters of entailments of verb meanings which qualify the most and least agentive event participants for each choice of predicate.</Paragraph>
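The proposed use of taxonomies amounts to inheritance down genus chains. The sketch below propagates set identifiers from correlated senses to uncorrelated descendants; the taxonomy representation, the function name `inherit_set_ids` and the drift-to-float link are hypothetical.

```python
from typing import Optional


def inherit_set_ids(parent_of: dict[str, Optional[str]],
                    set_ids: dict[str, str]) -> dict[str, str]:
    """Propagate LLOCE set identifiers down a genus-based verb taxonomy.

    parent_of maps each verb sense to its genus (parent) sense, or None at a root;
    set_ids holds the identifiers already obtained through LLOCE-LDOCE correlation.
    An uncorrelated sense inherits from its nearest correlated ancestor, if any.
    """
    result = dict(set_ids)
    for sense in parent_of:
        node: Optional[str] = sense
        visited = set()
        # Walk up the genus chain until a correlated ancestor, a root, or a cycle.
        while node is not None and node not in result and node not in visited:
            visited.add(node)
            node = parent_of.get(node)
        if node is not None and node in result:
            result[sense] = result[node]
    return result


taxonomy = {
    "ldoce:drift-v-1": "ldoce:float-2-v-1",  # hypothetical genus link
    "ldoce:float-2-v-1": None,
}
print(inherit_set_ids(taxonomy, {"ldoce:float-2-v-1": "floating-and-sinking"}))
# {'ldoce:float-2-v-1': 'floating-and-sinking', 'ldoce:drift-v-1': 'floating-and-sinking'}
```

In practice, inheritance would presumably have to be blocked at very general genus terms such as cause or make, which, as noted earlier, group verbs from distinct semantic classes.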
  </Section>
  <Section position="9" start_page="86" end_page="86" type="metho">
    <SectionTitle>
Acknowledgements
</SectionTitle>
    <Paragraph position="0"> We are indebted to Ted Briscoe, John Carroll and Ann Copestake for helpful comments and technical advice. Many thanks also to Victor Lesk for mounting the LLOCE index and having a first go at correlating</Paragraph>
  </Section>
</Paper>