<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2089">
  <Title>A FEATURE-BASED MODEL FOR LEXICAL DATABASES</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. PREVIOUS MODELS
</SectionTitle>
    <Paragraph position="0"> The classical relational model has been proposed to represent dictionaries (Nakamura and Nagao, 1988).</Paragraph>
    <Paragraph position="1"> However, as Neff, Byrd, and Rizk (1988) point out, the relational model cannot capture the obvious hierarchy in most dictionary entries. For example, the entry for abandon in Fig. 1 has two main sub-parts, one for its verb senses and one for its noun sense, and the two senses of the verb labeled "1" in Fig. 1 are in fact two sub-senses of the first sense given in the entry. These two sub-senses are more closely related to each other than to senses 2, 3, and 4, but the tabular format of relational models obscures this fact.</Paragraph>
    <Paragraph position="2"> Neff, Byrd, and Rizk describe a lexical database (the IBM LDB) based on an unnormalized (also Non First Normal Form or NF²) relational data model, in which attribute values may be nested relations with their own internal structure (see Abiteboul and Bidoit, 1984; Roth et al., 1988). Fig. 2 shows the LDOCE entry for abandon represented in an NF² model. The outermost table consists of a relation between a headword and some number of homographs. In turn, a homograph consists of a part of speech, a grammar code, and some number of senses, etc. This model better captures the hierarchical structure of information in the dictionary and enables the factoring of attributes.</Paragraph>
    <Paragraph position="3"> Although NF² models clearly improve on other models for representing dictionary information, a number of problems, outlined in the following subsections, still remain.</Paragraph>
    <Paragraph position="4"> ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 588 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 a.ban.don¹ /əˈbændən/ v [T1] 1 to leave completely and for ever; desert: The sailors abandoned the burning ship. 2 to leave (a relation or friend) in a thoughtless or cruel way: He abandoned his wife and went away with all their money. 3 to give up, esp.</Paragraph>
    <Paragraph position="5"> without finishing: The search was abandoned when night came, even though the child had not been found. 4 (to) to give (oneself) up completely to a feeling, desire, etc.: He abandoned himself to grief | abandoned behaviour. -- ~ment n [U].</Paragraph>
    <Paragraph position="6"> abandon² n [U] the state when one's feelings and actions are uncontrolled; freedom from control: The people were so excited that they jumped and shouted with abandon / in gay abandon.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Recursive nesting
</SectionTitle>
      <Paragraph position="0"> Some dictionaries take the grouping and nesting of senses several levels deep in order to distinguish finer and finer grains of meaning. The Hachette Zyzomys CD-ROM dictionary, for instance, distinguishes up to five levels in an entry (Fig. 3).</Paragraph>
      <Paragraph position="1"> valeur [valœR] n. f. A. I. 1. Ce par quoi une personne est digne d'estime, ensemble des qualités qui recommandent. (V. mérite). Avoir conscience de sa valeur. C'est un homme de grande valeur. 2. Vx.</Paragraph>
      <Paragraph position="2"> Vaillance, bravoure (spécial., au combat). "La valeur n'attend pas le nombre des années" (Corneille). ◊ Valeur militaire (croix de la): décoration française...</Paragraph>
      <Paragraph position="3"> II. 1. Ce en quoi une chose est digne d'intérêt. Les souvenirs attachés à cet objet font pour moi sa valeur.</Paragraph>
      <Paragraph position="4"> 2. Caractère de ce qui est reconnu digne d'intérêt...</Paragraph>
      <Paragraph position="5"> B. I. 1. Caractère mesurable d'un objet, en tant qu'il est susceptible d'être échangé, désiré, vendu, etc. (V.</Paragraph>
      <Paragraph position="6"> prix). Faire estimer la valeur d'un objet d'art...</Paragraph>
      <Paragraph position="7"> Fig. 3. Part of the definition of 'valeur' in Zyzomys. NF² models explicitly prohibit recursive embedding of relations. Therefore, the only way to represent the recursive nesting of senses is through the proliferation of attributes such as SENSE_LEVEL1, SENSE_LEVEL2, etc., to represent the different levels. This in turn demands that queries take into account all the possible positions where a given sub-attribute (e.g., usage) could appear. For example, multiple queries are required to retrieve all nouns which have an archaic (Vx = vieux) sense. Since any sense at any level could have this attribute value, it is necessary to query each level.</Paragraph>
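      <Paragraph> The contrast can be made concrete with a small sketch. It assumes a hypothetical nested encoding (dictionaries with the keys label, usage, and senses, none of which come from the IBM LDB) in which senses may embed senses to any depth. Under that assumption a single recursive traversal finds archaic senses wherever they occur, which is what a fixed set of level attributes cannot offer.

```python
from typing import Iterator

# Hypothetical encoding: each sense is a dict with an optional "usage"
# label and an optional list of nested sub-senses under "senses".
# The labels follow the outline of the entry in Fig. 3.
valeur = {
    "orth": "valeur",
    "senses": [
        {"label": "A", "senses": [
            {"label": "I", "senses": [
                {"label": "1"},
                {"label": "2", "usage": "Vx"},
            ]},
            {"label": "II", "senses": [{"label": "1"}, {"label": "2"}]},
        ]},
        {"label": "B", "senses": [
            {"label": "I", "senses": [{"label": "1"}]},
        ]},
    ],
}


def senses_with_usage(node: dict, usage: str, path: tuple = ()) -> Iterator[tuple]:
    """Yield the label path of every sense, at any depth, carrying `usage`."""
    for sense in node.get("senses", []):
        here = path + (sense["label"],)
        if sense.get("usage") == usage:
            yield here
        # Recurse: the same query applies unchanged at every level.
        yield from senses_with_usage(sense, usage, here)


archaic = list(senses_with_usage(valeur, "Vx"))
```

With level-specific attributes, the body of this function would have to be repeated once per level, and extended whenever a dictionary nests one level deeper.</Paragraph>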
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Exceptions
</SectionTitle>
      <Paragraph position="0"> Exceptional cases are characteristic of lexical data. For instance, sense 3 of the word "conjure" in the OALD has a pronunciation different from the other senses in the entry, and the entry "heave" in the CED shows that inflected forms may apply to individual senses: in this case, the past tense and past participle are "heaved" for all but the nautical senses, for which they are "hove" (Fig. 4).</Paragraph>
      <Paragraph position="1"> con.jure /ˈkʌndʒə(r)/ vt, vi 1 [VP2A,15A] do clever tricks which appear magical... 2 [VP15B] ~ up, cause to appear as if from nothing... 3 /kənˈdʒʊə(r)/ [VP17] (formal) appeal solemnly to... [OALD] heave (hiːv) vb. heaves, heaving, heaved or (chiefly nautical) hove. ... 5. (past tense and past participle hove) Nautical. a. to move or cause to move in a specified way ... [CED] Fig. 4. Exceptions in dictionary entries. Allowing the same attribute at different levels, in different nested relations (for example, allowing a pronunciation attribute at both the homograph and sense levels) would require a mechanism to "override" an attribute value at an inner level of nesting. NF² models do not provide any such mechanism and, in fact, do not allow the same attribute to appear at different levels. If any attribute can appear in any nested relation, the model becomes ill-defined since the very notion of hierarchy upon which it relies is undermined.</Paragraph>
      <Paragraph position="3"> [Nested table: headword abandon; homograph 1 (v, T1) with senses 1 to 4, each giving a box code, definition text, and example sentence; homograph 2 with one sense, giving the definitions "the state when one's feelings and actions are uncontrolled" and "freedom from control" and the example "The people were so excited that they jumped and shouted with abandon / in gay abandon".] Fig. 2. NF² representation of the entry 'abandon'</Paragraph>
      <Paragraph position="4"> Therefore, the only way exceptions could be handled in an NF² model would be by re-defining the template so that attributes such as pronunciation, inflected forms, etymology, etc., are associated with senses rather than homographs.</Paragraph>
      <Paragraph position="5"> However, this would disable the factoring of this information, which applies to the entire entry in the vast majority of cases.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Variable factoring
</SectionTitle>
      <Paragraph position="0"> Dictionaries differ considerably in their physical layout. For example, in one dictionary, all senses of a given orthographic form with the same etymology will be grouped in a single entry, regardless of part of speech; whereas in another, different entries for the same orthographic form are given if the part of speech is different. The CED, for instance, has only one entry for abandon, including both the noun and verb forms, but the LDOCE gives two entries for abandon, one for each part of speech. As a result of these differences, the IBM LDB template for the LDOCE places the part of speech attribute at the homograph level, whereas in the CED template, part of speech must be given at the level of sense (or "sense group" if some new attribute were defined to group senses with the same part of speech within an entry). This means that the query for part of speech in the LDOCE is completely different from that for the CED. Further, it means that the merging or comparison of information from different dictionaries demands complete (and possibly complex) de-structuring and re-structuring of the data. This makes data sharing and interchange, as well as the development of general software for the manipulation of lexical data, difficult.</Paragraph>
      <Paragraph position="1"> However, differences in dictionary layout are mainly differences in structural organization, whereas the fundamental elements of lexical information seem to be constant. In the example above, for instance, the basic information (orthography, pronunciation, part of speech, etc.) is the same in both the CED and LDOCE, even if its organization is different.</Paragraph>
      <Paragraph position="2"> The only way to have directly compatible databases for different dictionaries in the NF² model, even if one assumes that attributes for the same kind of information (e.g., orthography) can have the same name across databases, is to have a common template across all of them. However, the fixed factoring of attributes in NF² models prohibits the creation of a common template, because the template for a given database mirrors the particular factoring of a single dictionary. Therefore, a more flexible model is needed that would retain the particular factoring of a given dictionary, and at the same time render that factoring transparent to certain database operations.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. A FEATURE-BASED MODEL
</SectionTitle>
    <Paragraph position="0"> We introduce a model for dictionary data based on feature structures. We demonstrate the mapping between the information found in dictionaries and the feature-based model, and show how the various characteristics of lexical data, such as recursive nesting of elements, (variable) factoring of information, and exceptions can be handled using well-developed feature structure mechanisms.</Paragraph>
    <Paragraph position="1"> Fig. 5 shows how feature structures can be used to represent simple dictionary entries. We will consider feature structures as typed (as defined, for instance, by Pollard and Sag, 1987), that is, not all features can appear anywhere, but instead, they must follow a schema that specifies which features are allowable (although not necessarily present), and where. The schema also specifies the domain of values, atomic or complex, allowed for each of these features. For example, entries are described by the type ENTRY, in which the features allowed are form, gram, usage, def, etc. The domain of values for form is feature structures of type FORM, which consists of feature structures whose legal features include orth, hyph, and pron. Each of these features has, in turn, an atomic value of type STRING, etc.</Paragraph>
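    <Paragraph> As a rough illustration of what such a schema does, the sketch below checks a nested dictionary against a table of allowed features. The type names ENTRY, FORM, and STRING and the feature names come from the text and figures; the type names GRAM, USAGE, and DEF, the Python encoding, and the function name conforms are assumptions made only for this example.

```python
# Hypothetical schema: for each type, the features it allows and the
# type required of each feature's value. ENTRY, FORM, and STRING are
# named in the text; GRAM, USAGE, and DEF are assumed type names.
SCHEMA = {
    "ENTRY": {"form": "FORM", "gram": "GRAM", "usage": "USAGE", "def": "DEF"},
    "FORM": {"orth": "STRING", "hyph": "STRING", "pron": "STRING"},
    "GRAM": {"pos": "STRING"},
    "USAGE": {"dom": "STRING"},
    "DEF": {"text": "STRING"},
}


def conforms(value, type_name: str) -> bool:
    """Check that `value` is a legal feature structure of type `type_name`."""
    if type_name == "STRING":
        return isinstance(value, str)
    if not isinstance(value, dict):
        return False
    allowed = SCHEMA[type_name]
    # Features are optional, but any feature present must be allowed
    # for this type and carry a value of the declared type.
    return all(
        name in allowed and conforms(sub, allowed[name])
        for name, sub in value.items()
    )


entry = {
    "form": {"orth": "disproof", "pron": "dIs'pru:f"},
    "gram": {"pos": "n"},
    "def": {"text": "facts that disprove something"},
}
misplaced = {"orth": "disproof"}  # orth is a FORM feature, not an ENTRY one
```

Here conforms(entry, "ENTRY") holds, while misplaced is rejected as an ENTRY but accepted as a FORM, which mirrors the statement that not all features can appear anywhere.</Paragraph>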
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Value disjunction and variants
</SectionTitle>
      <Paragraph position="0"> The use of value disjunction (Karttunen, 1984) enables the representation of variants, common in dictionary entries, as shown in Fig. 6. We have added an extension which allows the specification of either a set (noted {x1, ..., xn}) or a list (noted (x1, ..., xn)) of possible values.</Paragraph>
      <Paragraph position="1"> This enables retaining the order of values, which is in many cases important in dictionaries. For example, the orthographic form given first is most likely the most common or preferred form. Other information, such as grammatical codes, may not be ordered.</Paragraph>
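      <Paragraph> The difference between the two notations can be sketched as follows, under the assumption (ours, for illustration only) that a Python tuple stands for a list of alternatives and a frozenset for a set. The helper names preferred and alternatives are likewise hypothetical.

```python
# Assumed encoding (not from the paper): a tuple stands for an ordered
# list of alternatives (x1, ..., xn), a frozenset for an unordered set
# {x1, ..., xn}.
biryani_form = {"orth": ("biryani", "biriani"), "pron": ",bIrI'A:nI"}


def preferred(value):
    """Return the first alternative of a list-valued disjunction.

    Sets carry no order, so no preferred variant can be read off them.
    """
    if isinstance(value, tuple):
        return value[0]
    if isinstance(value, frozenset):
        raise ValueError("a set of alternatives has no preferred member")
    return value  # a plain atomic value is its own preferred form


def alternatives(value) -> frozenset:
    """Return all alternatives of a value, ignoring any order."""
    if isinstance(value, (tuple, frozenset)):
        return frozenset(value)
    return frozenset([value])


first = preferred(biryani_form["orth"])
same_variants = (
    alternatives(("biryani", "biriani")) == alternatives(("biriani", "biryani"))
)
```

The list keeps "biryani" retrievable as the first-listed spelling, while the set view treats both orders as the same collection of variants.</Paragraph>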
      <Paragraph position="2"> biryani or biriani (ˌbɪrɪˈɑːnɪ) n. Any of a variety of Indian dishes... [CED] [form: [orth: (biryani, biriani), pron: ,bIrI'A:nI]] In many cases, sets or lists of alternatives are not single values but instead groups of features. This is common in dictionaries; for instance, Fig. 7 shows a typical example where the alternatives are groups consisting of orthography and pronunciation.</Paragraph>
      <Paragraph position="3"> mackle (ˈmækəl) or macule (ˈmækjuːl) n. Printing. a double or blurred impression caused by shifting paper or type. [CED] [Feature structure: form is a disjunction of two groups, [orth: mackle, ...] and [orth: macule, pron: 'm&amp;kju:l]; usage: [dom: Printing]; def: [text: a double or blurred...]]</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 General disjunction and factoring
</SectionTitle>
      <Paragraph position="0"> General disjunction (Kay, 1985) provides a means to specify alternative sub-parts of a feature structure.</Paragraph>
      <Paragraph position="1"> Again, we have extended the mechanism to enable the specification of both sets and lists of sub-parts.</Paragraph>
      <Paragraph position="2"> Therefore, feature structures can be described as being of the form [φ1, ..., φn], where each φi is a feature-value pair f: ψ, a set of feature structures {ψ1, ..., ψp}, or a list of feature structures (ψ1, ..., ψp).</Paragraph>
      <Paragraph position="3"> General disjunction allows common parts of components to be factored. Without any disjunction, two different representations for the entry for hospitaller from the CED are required. The use of value disjunction enables localizing the problem and thus eliminates some of the redundancy, but only general disjunction (Fig. 8) captures the obvious factoring and represents the entry cleanly and without redundancy.</Paragraph>
      <Paragraph position="4"> hospitaller or U.S. hospitaler (ˈhɒspɪtələ) n. a person, esp. a member of certain religious orders... [CED] [Fig. 8: feature structure for hospitaller, in which pron: 'hQspIt@l@ is factored over a disjunction of the two orth variants.] General disjunction provides a means to represent multiple senses, since they can be seen as alternatives (Fig. 9).¹ Sense nesting is also easily represented using this mechanism. Fig. 10 shows the representation for abandon given previously. At the outermost level of the feature structure, there is a disjunction between the two different parts of speech (which appear in two separate entries in the LDOCE). The disjunction enables the factoring of orthography, pronunciation, and hyphenation over both homographs. Within the first component of the disjunction, the different senses for the verb comprise an embedded list of disjuncts.</Paragraph>
      <Paragraph position="5"> ¹ Note that in our examples, "//" signals the beginning of a comment which is not part of the feature structure. We have not included the sense number as a feature in our examples because sense numbers can be automatically generated.</Paragraph>
      <Paragraph position="6"> disproof (dɪsˈpruːf) n. 1. facts that disprove something. 2. the act of disproving. [CED] [Fig. 9: feature structure for disproof, with form: [orth: disproof, pron: dIs'pru:f] and gram: [pos: n] factored over a disjunction of senses, beginning //sense 1 [def: [text: facts that disprove..]].] An important characteristic of this model is that there is no different type of feature structure for entries, homographs, or senses. This captures what appears to be a fundamental property of lexical data, that is, that the different levels (entries, homographs, senses) are associated with the same kinds of information. Previous models have treated these different levels as different objects, associated with different kinds of information, which obscures the more fundamental structure of the information.</Paragraph>
      <Paragraph position="7"> Note that we restrict the form of feature structures in our model to a hierarchical normal form. That is, in any feature structure F = [φ1, ..., φn], only one φi, let us say φn = {ψ1, ..., ψp}, is a disjunction. This restriction is applied recursively to embedded feature structures. This scheme enables representing a feature structure as a tree in which factored information [φ1, ..., φn-1] at a given level is associated with a node, and branches from that node correspond to the disjuncts ψ1, ..., ψp. Information associated with a node applies to the whole sub-tree rooted at that node. For example, the tree in Fig. 11 represents the feature structure for abandon given in Fig. 10. The representation of information as a tree of feature structures, where each node represents a level of hierarchy in the dictionary, reflects structure and factoring of information in dictionaries and captures the fundamental similarity among levels cited above.</Paragraph>
      <Paragraph position="8"> 3.3 Disjunctive normal form and equivalence. It is possible to define an unfactor operator to multiply out the terms of alternatives in a general disjunction (Fig. 12), assuming that no feature appears at both a higher level and inside a disjunct.² By applying the unfactor operator recursively, it is possible to eliminate all disjunctions except at the top level. The resulting (extremely redundant) structure is called the disjunctive normal form (DNF). We say that two feature structures are DNF-equivalent if they have the same DNF. The fact that the same DNF may have two or more equivalent factorings enables the representation of different factorings in dictionaries, while retaining a means to recognize their equivalence. Fig. 13a shows the factoring for inflected forms of alumnus in the CED; the same information could have been factored as it appears in Fig. 13b. Note that we have used sets and not lists in Fig. 13. Strictly speaking, the corresponding feature structures with lists would not have the same DNFs. However, since it is trivial to convert lists into sets, it is easy to define a stronger version of DNF-equivalence that disregards order. We can also define a factor operator to apply to a group of disjuncts, in order to factor out common information. Information can be unfactored and refactored in a different format without loss of information, thus enabling various presentations of the same information, which may, in turn, correspond to different printed renderings or "views" of the data.</Paragraph>
      <Paragraph position="9"> ² However, a value disjunction [f: {a, b}] can be converted to a general disjunction [{[f: a], [f: b]}], and subsequently unfactored.</Paragraph>
      <Paragraph position="11"> form: [orth: abandon, hyph: a.ban.don, pron: @"b&amp;ndOn] //homograph 1 gram: [pos: v, gramc: T1] //sense 1 sem: [boxc: ....H....T] def: [text: to leave completely and for ever] [text: desert] ex: [text: The sailors abandoned the burning ship] ... related: [orth: abandonment] //homograph 2 ... def: [text: the state when one's feelings and actions ...] ex: [text: The people were so excited that they jumped..] Fig. 10. Representation of the entry abandon in LDOCE. [Tree for abandon: the root node carries the factored form information (orth, hyph, pron); its branches are homograph 1 and homograph 2, each carrying gramc, with sense nodes beneath holding sem, def, and ex.] Fig. 11. Hierarchical Normal Form. Fig. 12. Unfactoring</Paragraph>
      <Paragraph position="12"> alumnus (əˈlʌmnəs) or (fem.) alumna (əˈlʌmnə) n., pl. -ni (-naɪ) or -nae (-niː) ... [CED] [Feature structures for Fig. 13, with the recoverable features: orth: alumnus, pron: @"l^mn@s; orth: alumna, pron: @"l^mn@; numb: pl; gend: masc; orth: alumni, pron: @"l^mnaI; orth: alumnae, pron: ...]</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Partial factoring
</SectionTitle>
      <Paragraph position="0"> The type of factoring described above does not handle the example in Fig. 14, where only a part of the grammatical information is factored (pos and subc, but not gcode). We can allow a given feature to appear at both the factored level and inside the disjunct, as long as the two values for that feature are compatible. In that case, unfactoring involves taking the unification of the factored information and the information in the disjunct. ca.reen /kəˈriːn/ vt, vi 1 [VP6A] turn (a ship) on one side for cleaning, repairing, etc. 2 [VP6A, 2A] (cause to) tilt, lean over to one side. [OALD] [Fig. 14: feature structure for careen, with form: [orth: careen, hyph: ca.reen, pron: k@'ri:n] and gram: [pos: v, subc: (tr, intr)] factored, and, inside the first disjunct, gram: [gcode: VP6A] and def: [text: turn (a ship)...].] We saw in the previous section that compatible information can appear at various levels in a disjunction. Exceptions in dictionaries will be handled by allowing incompatible information to appear at different levels. When this is the case, unfactoring will be defined to retain only the information at the innermost level. In this way, a value specified at the outer level is overridden by a value specified for the same feature at an inner level. For example, Fig. 15 shows the factored entry for conjure, in which the pronunciation specified at the outermost level applies to all senses except sense 3, where it is overridden.</Paragraph>
      <Paragraph position="1"> conjure /ˈkʌndʒə(r)/ vt, vi 1 [VP2A,15A] do clever tricks which appear magical... 2 [VP15B] ~ up, cause to appear as if from nothing... 3 /kənˈdʒʊə(r)/ [VP17] (formal) appeal solemnly to... [OALD] [Feature structure: form: [orth: conjure, hyph: con.jure, pron: "kVndZ@(r)] and gram: [pos: v, subc: (tr, intr)] at the outer level; disjuncts with def: [text: do clever tricks...]; gram: [gcode: VP15B], related: [orth: conjure up]; and gram: [gcode: VP17], def: [text: appeal solemnly...].] Fig. 15. Overriding of values</Paragraph>
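      <Paragraph> The two unfactoring regimes, unification for compatible information and overriding for exceptions, can be contrasted in a short sketch. The nested-dictionary encoding and the function names unify and override are assumptions for illustration, and the transcription given for sense 3 of conjure is our own rendering of the printed pronunciation rather than a value taken from Fig. 15.

```python
def unify(outer: dict, inner: dict):
    """Unify two feature structures; return None if they are incompatible."""
    result = dict(outer)
    for feat, val in inner.items():
        if feat not in result:
            result[feat] = val
        elif isinstance(result[feat], dict) and isinstance(val, dict):
            sub = unify(result[feat], val)
            if sub is None:
                return None
            result[feat] = sub
        elif result[feat] != val:
            return None  # clash between atomic values
    return result


def override(outer: dict, inner: dict) -> dict:
    """Merge as in unify, but let the innermost value win on a clash."""
    result = dict(outer)
    for feat, val in inner.items():
        if isinstance(result.get(feat), dict) and isinstance(val, dict):
            result[feat] = override(result[feat], val)
        else:
            result[feat] = val
    return result


# Partial factoring (careen): pos and subc are factored, gcode is not.
careen_outer = {"gram": {"pos": "v", "subc": ("tr", "intr")}}
careen_sense1 = {"gram": {"gcode": "VP6A"}, "def": {"text": "turn (a ship)..."}}
sense1 = unify(careen_outer, careen_sense1)

# Exception (conjure): sense 3 carries its own pronunciation.
conjure_outer = {"form": {"orth": "conjure", "pron": '"kVndZ@(r)'}}
conjure_sense3 = {"form": {"pron": "k@n'dZU@(r)"}, "gram": {"gcode": "VP17"}}
clash = unify(conjure_outer, conjure_sense3)
sense3 = override(conjure_outer, conjure_sense3)
```

For careen the two gram values merge into one; for conjure plain unification fails on pron, whereas override keeps the factored orthography and takes the pronunciation from the inner level.</Paragraph>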
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.6 Implementation
</SectionTitle>
      <Paragraph position="0"> Feature-based systems developed so far are designed for parsing natural language and are not intended to be used as general DBMSs. Therefore, they typically do not provide even standard database operations. They are furthermore usually restricted to handle only a few hundred grammar rules, and so even the largest systems are incapable of dealing with the large amounts of data that would be required for a dictionary.</Paragraph>
      <Paragraph position="1"> In Ide, Le Maitre, and Véronis (forthcoming), we describe an object-oriented implementation which provides the required expressiveness and flexibility. We show how the feature-based model can be implemented in an object-oriented DBMS, and demonstrate that feature structures map readily to an object-oriented data model. However, our work suggests that the development of a feature-based DBMS, including built-in mechanisms for disjunction, unification, generalization, etc., is desirable. Such feature-based DBMSs could have applications far beyond the representation of lexical data.</Paragraph>
    </Section>
  </Section>
</Paper>