<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-4189">
  <Title>Genus Disambiguation: A Study in Weighted Preference*</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. Sense Selection Based on LDOCE Semantic
Codes
</SectionTitle>
    <Paragraph position="0"> This section describes our investigation of the use of semantic category information for disambiguation, and outlines the problems in using that type of information. The basic hierarchical structure of the semantic codes provided by LDOCE is depicted in Figure 1. In addition to the codes positioned in that tree structure, seventeen other codes, which we refer to as &quot;composite&quot;, are defined as follows:</Paragraph>
    <Paragraph position="2"> To evaluate our assumption that the semantic category of the genus word is the same as or more general than the semantic category of the headword, it was necessary to define what we meant by &quot;more general&quot; for the composite categories. We did this by incorporating the composite codes into the hierarchical structure displayed in Figure 1, and defining a semantic distance between word senses based on the placement of their respective codes in the hierarchy.</Paragraph>
    <Paragraph position="3"> ... that the listing order of senses in the 1st edition of LDOCE is similar to that of the 2nd, and have found empirical evidence in the work of Guo (1989) and in this study to show that a similar connection between the order in which word senses are listed and the frequency with which they are used (in LDOCE) holds for the 1st edition as well.</Paragraph>
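The text does not spell out how distance is computed from code placement. One natural reading, assumed here, counts the edges on the path between two codes through their lowest common ancestor. The sketch below uses a small hypothetical tree with descriptive labels in place of LDOCE codes; it is not the hierarchy of Figure 1 or Figure 2, and `PARENT`, `ancestors`, and `semantic_distance` are illustrative names.

```python
# Toy fragment of a semantic-code tree (hypothetical; NOT the LDOCE Figure 1 tree).
# Each code maps to its parent; the root has parent None.
PARENT = {
    "top": None,
    "abstract": "top",
    "concrete": "top",
    "animate": "concrete",
    "inanimate": "concrete",
    "solid": "inanimate",
    "liquid": "inanimate",
    "gas": "inanimate",
    "human": "animate",
    "plant": "animate",
}


def ancestors(code):
    """Return [code, parent, grandparent, ..., root]."""
    chain = []
    while code is not None:
        chain.append(code)
        code = PARENT[code]
    return chain


def semantic_distance(a, b):
    """Edges on the path from a to b through their lowest common ancestor
    (an assumed metric, not necessarily the one used by the authors)."""
    depth_in_b = {code: i for i, code in enumerate(ancestors(b))}
    for i, code in enumerate(ancestors(a)):
        if code in depth_in_b:
            return i + depth_in_b[code]
    raise ValueError("codes are not in the same tree")
```

Under this metric a genus sense coded solid sits two edges from a liquid headword but four from a human one, so a minimum-distance criterion would prefer it.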
    <Paragraph position="4"> It was obvious from the start that adding these codes to the tree depicted in Figure 1 would create a tangled hierarchy. The problem was to decide where these codes should be placed in the tree structure in order to preserve inheritance. For example, should &quot;E&quot; (the code for &quot;solid or liquid&quot;) be placed above or below &quot;solid&quot; and &quot;liquid&quot;, and would a similar placement hold for code 7, which reads &quot;gas AND liquid&quot; (as opposed to &quot;liquid OR ...&quot;)?
[Figure 1: Basic Hierarchy of LDOCE Semantic Codes]
To answer such questions, two types of studies were conducted. The first was an in-depth look at the words marked with composite codes (nouns marked to identify a semantic category, and adjectives and verbs marked as to their selection restrictions). The second was a survey of the genus senses for headwords with composite semantic codes. As might be expected, there were inconsistencies in the assignment of noun categories. For example, within the &quot;liquid&quot; categories, we observed that nouns which represent both liquids and solids can be found in both categories L and E, and abstractions of liquids can be found in categories L, 6, and 7. This is not surprising, as it is difficult to create distinct categories for overlapping concepts.</Paragraph>
    <Paragraph position="5"> ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 1188 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
Our proposed placement of composite codes within the hierarchical structure provided by LDOCE is presented in Figure 2. In constructing Figure 2, we attempted to create a hierarchy that would reflect not only the data gathered on the properties of words assigned to each category, but also the most frequently occurring superset for each composite code, based on the results of the second study.</Paragraph>
    <Paragraph position="7"> [Figure 2: Revised Hierarchy of LDOCE Semantic Codes]
Based on this study of the semantic codes used in LDOCE, three implementations of a partial genus sense selection algorithm (partial because at this time we are only considering the contribution made by the semantic code comparison to sense selection) were found to be possible. They are as follows:
1. Select the genus sense with a minimum semantic distance from the headword sense, where semantic distance is measured by the placement of the respective codes in the hierarchy presented in Figure 2. (This formulation of a genus sense selection criterion is the basis of the algorithm reported in Guthrie et al. 1990.)
2. Choose the genus sense with a semantic code belonging to the same code set as the code of the headword, where the code sets are the nodes of the tree structure presented in Figure 2.</Paragraph>
    <Paragraph position="8"> 3. Select the genus sense with a semantic code identical to that of the headword.</Paragraph>
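The three criteria can be sketched as functions that return the index of the chosen genus sense, where index 0 is the first-listed (most frequently used) sense. The tree, the code-set grouping, the path-length distance, and all function names below are hypothetical stand-ins for Figure 2; the earliest-listed fallback mirrors the default rule described in Section 4.

```python
# Hypothetical toy data: NOT the Figure 2 hierarchy or its code sets.
PARENT = {"top": None, "abstract": "top", "concrete": "top",
          "animate": "concrete", "inanimate": "concrete",
          "solid": "inanimate", "liquid": "inanimate", "human": "animate"}
CODE_SET = {"solid": "inanimate", "liquid": "inanimate",
            "human": "animate", "abstract": "abstract"}


def path_to_root(code):
    chain = []
    while code is not None:
        chain.append(code)
        code = PARENT[code]
    return chain


def distance(a, b):
    """Edges between a and b via their lowest common ancestor (assumed metric)."""
    up_b = {c: i for i, c in enumerate(path_to_root(b))}
    for i, c in enumerate(path_to_root(a)):
        if c in up_b:
            return i + up_b[c]
    raise ValueError("codes are not in the same tree")


def first_listed(indices):
    """Usage-frequency default: the earliest-listed sense wins."""
    return min(indices)


def by_min_distance(head, genus_codes):
    """Criterion 1: genus sense whose code is nearest the headword's code."""
    dists = [distance(head, c) for c in genus_codes]
    best = min(dists)
    return first_listed([i for i, d in enumerate(dists) if d == best])


def by_code_set(head, genus_codes):
    """Criterion 2: genus sense whose code is in the headword's code set."""
    head_set = CODE_SET.get(head)
    hits = [i for i, c in enumerate(genus_codes)
            if head_set is not None and CODE_SET.get(c) == head_set]
    return first_listed(hits or range(len(genus_codes)))


def by_identical_code(head, genus_codes):
    """Criterion 3: genus sense whose code equals the headword's code."""
    hits = [i for i, c in enumerate(genus_codes) if c == head]
    return first_listed(hits or range(len(genus_codes)))
```

For a liquid headword with genus senses coded abstract, human, and solid, criteria 1 and 2 both pick the solid sense, while criterion 3 finds no identical code and falls back to the first sense.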
    <Paragraph position="9"> 3. Sense Selection Based on LDOCE Pragmatic Codes
The pragmatic codes in LDOCE are another set of terms organized into a hierarchy, although the hierarchy provided by LDOCE is quite flat. As stated earlier, these terms are used to classify words by subject area. The LDOCE pragmatic coding system divides all possible subjects into 124 major categories, ranging from aeronautics, aerospace, and agriculture to winter sports and zoology. The hierarchy is only two layers deep, and the 124 major categories have equal and unrelated status.</Paragraph>
    <Paragraph position="10"> Slator (1988) implemented a scheme which imposed deeper structure onto the LDOCE pragmatic code hierarchy. He restructured the hierarchy by making Communication, Economics, Entertainment, Household, Politics, Science, and Transportation fundamental categories, and grouping all other pragmatic codes under those headings. His restructuring revealed that words classified under Botany have pragmatic connections to words classified as Plant-Names, as well as connections with other words classified under Science.</Paragraph>
    <Paragraph position="11"> We investigated four implementations of a genus sense selection algorithm based on pragmatic codes. The first implementation used the hierarchy developed by Slator. In that scheme, the pragmatic codes were arranged in a tree structure in which each node of the tree is a single pragmatic code.</Paragraph>
    <Paragraph position="12"> In addition, pragmatic code sets were defined directly from Slator's hierarchy by creating seven large groups corresponding to the seven subtrees of the top level of the hierarchy. Each of the seven code sets contained all codes descended from the corresponding top-level node. Within this construction, lack of common set membership is a strong indication of disjoint subject areas.</Paragraph>
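A minimal sketch of the code-set construction, assuming each pragmatic code is assigned to the set named by its top-level ancestor. The seven top-level categories come from the text; every parent link below them is a hypothetical placeholder, since Slator's full hierarchy is not reproduced here.

```python
# Slator's seven fundamental categories (from the text).
TOP_LEVEL = {"Communication", "Economics", "Entertainment", "Household",
             "Politics", "Science", "Transportation"}

# Hypothetical child -> parent links; illustrative only.
PARENT = {
    "Botany": "Science",
    "Plant-Names": "Botany",
    "Zoology": "Science",
    "Aeronautics": "Transportation",
    "Winter-Sports": "Entertainment",
}


def code_set(code):
    """Return the top-level category whose subtree contains `code`."""
    while code not in TOP_LEVEL:
        code = PARENT[code]  # raises KeyError for codes outside the toy tree
    return code


def same_code_set(a, b):
    """True when two pragmatic codes descend from the same top-level node."""
    return code_set(a) == code_set(b)
```

With these toy links, Botany and Zoology share the Science set, while Botany and Aeronautics fall in different sets and would be treated as disjoint subject areas.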
    <Paragraph position="13"> In summary, we proposed four approaches to genus sense selection based on pragmatic codes:
1. Choose the genus sense with minimum pragmatic distance from the headword sense, where pragmatic distance is measured by the placement of the respective codes in the hierarchy implemented by Slator.</Paragraph>
    <Paragraph position="14"> 2. Select the genus sense with a pragmatic code belonging to the same code set as the code of the headword. Seven code sets were constructed, corresponding to the seven major divisions of Slator's hierarchy.</Paragraph>
    <Paragraph position="15"> 3. Rule out all headword/genus sense combinations with pragmatic codes that are not in the same code set.</Paragraph>
    <Paragraph position="16"> 4. Select the genus sense with a pragmatic code identical to that of the headword.</Paragraph>
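Approaches 2 to 4 differ mainly in how they treat senses that lack a pragmatic code, which matters because so many LDOCE senses are unmarked. The sketch below, using a hypothetical code-to-set mapping and illustrative function names, shows one reading: approach 2 prefers senses in the headword's set, approach 3 only discards senses whose set is known to differ (so unmarked senses survive), and approach 4 requires an exact match. Each falls back to the first-listed sense.

```python
from typing import Optional, Sequence

# Hypothetical mapping from pragmatic code to one of Slator's seven groups.
CODE_SET = {
    "Botany": "Science",
    "Zoology": "Science",
    "Aeronautics": "Transportation",
    "Banking": "Economics",
}


def set_of(code: Optional[str]) -> Optional[str]:
    """Code set of a pragmatic code; None if the sense is unmarked or unknown."""
    return CODE_SET.get(code) if code is not None else None


def approach2_same_set(head: Optional[str], genus: Sequence[Optional[str]]) -> int:
    """Prefer a genus sense in the headword's code set; else the first listed."""
    hs = set_of(head)
    hits = [i for i, c in enumerate(genus) if hs is not None and set_of(c) == hs]
    return hits[0] if hits else 0


def approach3_rule_out(head: Optional[str], genus: Sequence[Optional[str]]) -> int:
    """Discard senses whose code set is known to differ from the headword's.

    Unmarked senses are never ruled out (assumption); the first survivor wins."""
    hs = set_of(head)
    survivors = [i for i, c in enumerate(genus)
                 if hs is None or set_of(c) is None or set_of(c) == hs]
    return survivors[0] if survivors else 0


def approach4_identical(head: Optional[str], genus: Sequence[Optional[str]]) -> int:
    """Require an exact pragmatic code match; else the first listed."""
    hits = [i for i, c in enumerate(genus) if head is not None and c == head]
    return hits[0] if hits else 0
```

For a Botany headword whose genus senses are coded Aeronautics, unmarked, and Zoology, approach 2 picks the third sense, approach 3 the second, and approach 4 falls back to the first.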
    <Paragraph position="17"> 4. Results of the Experimentation
All tests of the proposed sense selection criteria were run on the same random sample of 520 definitions. Table 1 provides a summary of the relevant test results. Although each selection mechanism was evaluated separately, the large number of word senses having either redundant code markings or no markings at all (particularly with pragmatic codes) made it necessary to introduce a default or &quot;tie-breaking&quot; mechanism for all selection criteria other than usage frequency. Usage frequency was established as the default selection mechanism for all tests. When no sense selection (or no unique sense selection) could be made based on the criteria being tested, the sense selection was based on usage frequency (i.e., of the competing senses, the sense occurring first in the listing order was selected).</Paragraph>
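The default rule amounts to a wrapper around whichever criterion is under test. In this sketch, `prefers` is a hypothetical predicate standing for that criterion, and usage frequency is approximated by listing order, as the text describes.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def select_with_default(senses: Sequence[T], prefers: Callable[[T], bool]) -> int:
    """Index of the chosen genus sense, with senses in LDOCE listing order.

    If the criterion under test favours no sense, or several, fall back to
    usage frequency: the earliest-listed competing sense is selected."""
    if not senses:
        raise ValueError("no genus senses to choose from")
    favoured = [i for i, s in enumerate(senses) if prefers(s)]
    competing = favoured if favoured else range(len(senses))
    return min(competing)
```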
    <Paragraph position="18"> The variation in performance between all approaches developed for genus sense selection was relatively small: no more than 8%. Both the best and the worst performance of a single sense selection parameter were achieved using pragmatic code relationships. The best performance (80% success rate) resulted from requiring identical code markings for headword and genus senses. The worst disambiguation performance was the result of sense selection based on common pragmatic code set membership.</Paragraph>
    <Paragraph position="19"> The variation in disambiguation performance was small in the experiments that used only the semantic code information. The maximum success rate of 77% resulted from stipulating common code set membership, while the minimum success rate, 75%, resulted from requiring identical codes.</Paragraph>
    <Paragraph position="20"> Some of the test results were unexpected: for instance, we did not expect selection of the first sense listed to yield a 76% success rate. Nor did we expect sense selection based on a subset/superset relationship between codes to be as unsuccessful as it was, yielding no more than a 78% success rate for both pragmatic and semantic codes.</Paragraph>
    <Paragraph position="21"> Although the experiments showed that a direct match of pragmatic codes was the most successful single selection mechanism, the result is somewhat misleading. Because many words have no pragmatic code, the default rule was applied often, resulting in the selection of the most frequently used sense. Having said that, it remains true that the tests show pragmatic code information to be the best predictor of the correct genus sense when it is present.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SUMMARY OF DISAMBIGUATION EXPERIMENTS
GENUS SENSE SELECTION TEST RESULTS
</SectionTitle>
    <Paragraph position="0"> Factors (weight): success rate
common semantic code set (1); identical pragmatic code (1); usage frequency (tie breaker): 80% correct
common semantic code set (1); identical pragmatic code (2); usage frequency (tie breaker): 80% correct
semantic code hierarchy (1); pragmatic code hierarchy (2); usage frequency (tie breaker): 79% correct
semantic code set (1); identical pragmatic code (2); usage frequency (tie breaker); hand-coded exceptions included: 90% correct
Experiments were also performed using all three factors in combination. These experiments were conducted to determine the optimum weight to assign each of the three factors when considering their cumulative predictive capability. The selection of weights was based on the performance of each factor individually. Again, the variation in performance across all tests of different weightings was small (less than 1%). The highest success rate was achieved when pragmatic code information received the greatest weight.</Paragraph>
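One way to combine the factors with weights, assumed here rather than taken from the paper, is to give each genus sense a score equal to the sum of the weights of the factors it satisfies, and to break ties by listing order. The `Sense` type and its fields are illustrative; the default weights (1 for a shared semantic code set, 2 for an identical pragmatic code) mirror one of the configurations listed above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sense:
    semantic_set: Optional[str]  # code set of the semantic code; None if unmarked
    pragmatic: Optional[str]     # pragmatic code; None if unmarked


def score(head: Sense, genus: Sense, w_sem: float, w_prag: float) -> float:
    """Sum of the weights of the factors this genus sense satisfies."""
    total = 0.0
    if head.semantic_set is not None and genus.semantic_set == head.semantic_set:
        total += w_sem
    if head.pragmatic is not None and genus.pragmatic == head.pragmatic:
        total += w_prag
    return total


def choose(head: Sense, genus_senses: Sequence[Sense],
           w_sem: float = 1.0, w_prag: float = 2.0) -> int:
    """Highest-scoring genus sense. list.index returns the earliest-listed
    sense among ties, which plays the role of the usage-frequency tie breaker."""
    scores = [score(head, g, w_sem, w_prag) for g in genus_senses]
    return scores.index(max(scores))
```

With toy senses where an earlier sense shares only the semantic code set and a later one shares only the pragmatic code, equal weights let the earlier sense win the tie, whereas doubling the pragmatic weight selects the later one.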
    <Paragraph position="1"> As a result of these tests, our disambiguation algorithm was formulated as follows:
* Choose the most frequently used genus sense unless an alternate sense choice is indicated by a strong relationship between headword and genus codes, either semantic or pragmatic.</Paragraph>
    <Paragraph position="2"> * If the sense selection based on semantic codes differs from that inferred from the pragmatic codes, base the sense selection on the pragmatic codes.</Paragraph>
    <Paragraph position="3"> * Select among competing genus senses with identical code markings by choosing the most frequently used sense.</Paragraph>
    <Paragraph position="4"> By a &quot;strong relationship&quot; in the case of semantic codes, we mean membership in the same code set. This is not surprising, given the limited scope of the code sets and the inherent overlap of the composite codes. For pragmatic codes, a strong relationship means an exact match.</Paragraph>
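Read as a decision procedure, the three rules can be sketched as below. This is one possible reading, not the authors' code: a pragmatic exact match is checked first because the pragmatic choice overrides the semantic one when they disagree, then a shared semantic code set, then the first-listed sense. The `Sense` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sense:
    semantic_set: Optional[str]  # code set of the semantic code; None if unmarked
    pragmatic: Optional[str]     # pragmatic code; None if unmarked


def final_choice(head: Sense, genus_senses: Sequence[Sense]) -> int:
    """Index of the selected genus sense (0 = most frequently used)."""
    # Strong pragmatic relationship: an exact code match. Checked first, so it
    # overrides any conflicting choice based on semantic codes.
    pragmatic_hits = [i for i, g in enumerate(genus_senses)
                      if head.pragmatic is not None and g.pragmatic == head.pragmatic]
    if pragmatic_hits:
        return pragmatic_hits[0]  # earliest listed among identical markings
    # Strong semantic relationship: membership in the same code set.
    semantic_hits = [i for i, g in enumerate(genus_senses)
                     if head.semantic_set is not None
                     and g.semantic_set == head.semantic_set]
    if semantic_hits:
        return semantic_hits[0]
    # No strong relationship: keep the most frequently used sense.
    return 0
```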
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5. The Final Disambiguation Algorithm
</SectionTitle>
    <Paragraph position="0"> Review of the output data from each disambiguation trial using the three-parameter algorithm revealed that the majority of the failures were on a very small number of frequently occurring genus words. Often, the pragmatic and semantic classifications of these word senses were either deficient (lacking in code information) or redundant (more than one word sense having the same markings). Such situations frequently arise with very abstract words (e.g. part, quality, piece, and number), where there are numerous word senses and most (if not all) senses have identical semantic codes and no pragmatic codes.</Paragraph>
    <Paragraph position="1"> The final modification to our genus sense selection algorithm was introduced to solve this problem: the correct sense selections for words with errors in their code information, as well as for certain very general words, are pre-selected and assumed to be constant.</Paragraph>
    <Paragraph position="2"> Fewer than ten words required hand coding of the correct sense, and almost all were abstract words such as part or quality. While it is true that the majority of these words are &quot;disturbed heads&quot; (Guthrie et al. 1990) and will, in the future, serve not as genus terms but as identifiers of alternate link types, we still require that they be sense-disambiguated to serve as relation descriptors. This final modification to the sense selection algorithm increased performance by 10 percentage points, resulting in a success rate of 90%.</Paragraph>
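The hand-coded step can be sketched as a lookup that runs before the code-based algorithm. The words part and quality come from the text, but the sense indices in `PRESELECTED` and the `fallback` callable are hypothetical placeholders.

```python
from typing import Callable, Dict, Sequence

# Hypothetical hand-coded table: genus word -> pre-selected sense index.
# The words come from the text; the indices are placeholders.
PRESELECTED: Dict[str, int] = {"part": 0, "quality": 1}


def select_genus_sense(genus_word: str, senses: Sequence[str],
                       fallback: Callable[[Sequence[str]], int]) -> int:
    """Return the constant pre-selected sense if the word has one;
    otherwise defer to the code-based algorithm supplied as `fallback`."""
    if genus_word in PRESELECTED:
        return PRESELECTED[genus_word]
    return fallback(senses)
```

Keeping the override table separate from the scoring logic means a mis-coded dictionary entry can be corrected without retuning the weights.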
  </Section>
</Paper>