<?xml version="1.0" standalone="yes"?>
<Paper uid="W91-0219">
  <Title>Development of the Concept Dictionary - Implementation of Lexical Knowledge</Title>
  <Section position="3" start_page="206" end_page="209" type="metho">
    <SectionTitle>
2. Development of Concept Descriptions
</SectionTitle>
    <Paragraph position="0"> Concept relations are described at the following three levels: a) concept-concept relation descriptions b) concept-category relation descriptions c) category-category reladon descriptions</Paragraph>
    <Section position="1" start_page="206" end_page="208" type="sub_section">
      <SectionTitle>
2.1 Concept-concept Relation Descriptions
</SectionTitle>
      <Paragraph position="0"> We are building an on-line corpus which includes 1,000,000 practical example sentences that are analyzed lexically, syntactically and semantically for the most pan manually (EDR corpus). Figure 1 is an example of an entry of the corpus.</Paragraph>
      <Paragraph position="1"> Firstly, (a) is the word sense selection (lexical analysis) section, where a word sense (concept) has been selected for each word in the sentence. Secondly, (b) is the syntactic analysis section, where all binding relations among words has been analyzed. Finally, (c) is the semantic analysis section, where the semantic network representing the meaning of the sentence is decomposed into a set of triplets. These triplets correspond to the following concept-concept relation descriptions:  As shown above, concept-concept relation descriptions are extracted directly from the semantic analysis secdon (and word sense selection section) in the EDR Corpus. A method of collecting and selecting source sentences for the EDR corpus is described in (Nakao 1990a) and a method of extracting concept descriptions from the FEDR corpus is explained in detail in (Nakao 1990b).</Paragraph>
      <Paragraph position="2"> Source texts of the EDR corpus are selected so as to diversify as much as possible the concepts in them. However, it is impossible to collect all concepts or concept relations from the corpus even if the amount of texts is very large. To compensate for the shortage of examples, we also create example sentences and analyze them lexically and semantically. Concept-concept relation descriptions are also extracted from ~e sentences,  &lt;&lt;Text No : 00040000187d : 6/13/90 from ~J327#O3,y~l/4*&gt; =&gt;&gt; Structurally. the U,N. is still fluid and vulnerable ~o the pressures that its new and enlarged memberships are bringing to bear upon it.</Paragraph>
      <Paragraph position="3">  ; I Ilructumlly .............................................................................................................................. ; 4~e_U.N, * ............................................................................................................................. ; 6b ................................................. ; 13-3~ ; 8 =.ill ........................... ;13-2,M ; tO fluid : |3-I.S : : ; 13 vulnerable fluid~vulnurlble .-&gt;l'luid,.vulnm'lblo .-&gt;is.nuid~vulnurlble ........................................................... ; 1 $ ~ ................................................................................................................. ; 19-3,S : 17 the : 19-t~ ; 19 pr=.,um pm,mu'e_$ --&gt;~he.pressurcc ........................................................ *~e_pmuure -~o.dlo.prmsum ; : 22 thai ................................................................................ : 37-5,S : ; 24 a ............................................... ; 33-2,,M : : : 26 new ............ ; 30-2~ : : : ; 28 mtd ............... : : : : ; 30 ~'ilUrll mtlmrpd -*new.~d.e~lm'lKed : 33- I,,M : : : ; 33 mcmnbership .......................... &gt;membership -.:mwmbership : 37-4.M : : ; 3S tm ............... ; 37-2,S : : : ; 37 brinl bhnlPnl .-&gt;Ite.bnnlml --&gt;ire.brAn|ins ........... &gt;,m.bnnlin I -&gt;mm_brinsins ; 19-2,M ; 40 to : 42-1~ : 42 l~r to_bear --&gt;to.bur ........ ; 37-3,M : 4,4 upon ; 46-1,S :</Paragraph>
    </Section>
    <Section position="2" start_page="208" end_page="208" type="sub_section">
      <SectionTitle>
2.2 Concept-category Relation Descriptions
</SectionTitle>
      <Paragraph position="0"> If some concept-concept relations share a concept, it is possible to bundle them into a representation. For example, concept.concept relation descriptions (2) can be bundled into a concept-category relation description (4), if a super-sub relation (3) is also described simultaneously within the concept taxonomy. 'This level of description corresponds to Fodor and Katz's representation using semantic markers and selection restrictions.</Paragraph>
      <Paragraph position="1"> (2) cteoreak --&lt;object&gt;-- c#promise</Paragraph>
      <Paragraph position="3"> (to break a promise) (to beak a law) (to break a rule) (to break a code)  (3) (rules) c#promise c#1aw~c tie ode (4) c#break--&lt;object&gt;-- (rules)</Paragraph>
    </Section>
    <Section position="3" start_page="208" end_page="209" type="sub_section">
      <SectionTitle>
2.3 Category-category Relation Descriptions
</SectionTitle>
      <Paragraph position="0"> In the previous section, we discussed the cases in which filler concepts of deep case pattems can be bundled into categories, namely concept-category relations. Moreover, frame concepts in deep case patterns can also be bundled and represented by categories (frame categories) (Ogino et. al. 1989). For example, the three categories O1.2.6, 06.8.2 and C)9.14.4 defined in Figure 2 (Hereafter, the notation &amp;quot;C)&amp;quot; also means a category,) bundle concepts and are linked with other categories to describe category-category relations (5), (6)</Paragraph>
      <Paragraph position="2"> This level of descriptions corresponds to Schank's representation using primitive actions, in a sense. For example, 06.8.2 includes a connotation that can be represented by MTRANS, which is one of the primitive actions, although other frame categories do not always correspond to a primitive action. In addition, relations between verbs and adverbs, for example those mentioned in (La.koff 1966), are also described at this level.</Paragraph>
      <Paragraph position="4"> (&lt;agent&gt;: &amp;quot;a person&amp;quot;, &lt;object&gt;: &amp;quot;a button&amp;quot;. &lt;implement&gt;: with &amp;quot;a linger&amp;quot;) \] (&lt;agent&gt;: &amp;quot;a person&amp;quot;, &lt;object&gt;: against &amp;quot;a door&amp;quot;, &lt;implement&gt;: with &amp;quot;one's hands&amp;quot;) \] (&lt;agent&gt;: &amp;quot;a person&amp;quot;. &lt;object&gt;: &amp;quot;a bali&amp;quot;, &lt;implement: with &amp;quot;a fooC) \] (&lt;agent&gt;: &amp;quot;a person&amp;quot;. &lt;object&gt;: on &amp;quot;a can&amp;quot;, &lt;implement&gt;: with &amp;quot;a fooC) \] (&lt;agent&gt;: &amp;quot;a person&amp;quot;, &lt;object&gt;: &amp;quot;a ball&amp;quot;, &lt;implement&gt;: with &amp;quot;a hand&amp;quot;) \] (&lt;agent&gt;: &amp;quot;a person&amp;quot;, &lt;object&gt;: &amp;quot;a box&amp;quot;. &lt;implement&gt;: on &amp;quot;one's shoulder&amp;quot;) \]</Paragraph>
      <Paragraph position="6"> \[speak (&lt;agent&gt;: &amp;quot;he&amp;quot;, &lt;object&gt;: about &amp;quot;the_story&amp;quot;, &lt;goal&gt;: to &amp;quot;her&amp;quot;)l \[tell (&lt;agent&gt;: &amp;quot;he&amp;quot;, &lt;object&gt;: &amp;quot;the_way&amp;quot;, &lt;goal&gt;: &amp;quot;the_lraveler&amp;quot;)l \[dc~riba (&lt;agent&gt;: &amp;quot;he&amp;quot;. &lt;object&gt;: &amp;quot;the_situation&amp;quot;. &lt;goal&gt;: in &amp;quot;the_book&amp;quot;)l \[explain (&lt;agent&gt;: &amp;quot;he&amp;quot;, &lt;object&gt;: &amp;quot;the_plan&amp;quot;, &lt;goal&gt;: to &amp;quot;his_boss&amp;quot;)l \[write (&lt;agent&gt;: &amp;quot;he&amp;quot;, &lt;object&gt;: &amp;quot;his_name&amp;quot;, &lt;goal&gt;: on &amp;quot;the_shcet')l \[input (&lt;agent&gt;: &amp;quot;he&amp;quot;, &lt;object&gt;: &amp;quot;the_data&amp;quot;, &lt;goal&gt;: into &amp;quot;the_file&amp;quot;)l \[copy (&lt;agent&gt;: &amp;quot;he&amp;quot;, &lt;object&gt;: &amp;quot;the_document&amp;quot;. &lt;goal&gt;: into &amp;quot;his.notebook')l O9.14.4 (For_Functions_(Of_H uman_Beings)_to_Become_Lower) \[&lt;object&gt; : (Animals) \] c#baaten, c#go.down, c~il, c~ollapse, c~ispirited c#pyrosis, c~inophobia, c#malnutrition, etc.</Paragraph>
      <Paragraph position="7"> (The notation &amp;quot; \[...\] &amp;quot; is a dccp case pattern to distinguish the category from the other categories at the</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="209" end_page="214" type="metho">
    <SectionTitle>
3 Development of the Concept Taxonomy
</SectionTitle>
    <Paragraph position="0"> The two kinds of concept descriptions including categories, narncly concept-category relation descriptions and category-category relation descriptions, mentioned in the previous sections, must have their descendant concept-concept relations in order to become useful. That is, concepts must be able to be actually classified into categories included in such descriptions.</Paragraph>
    <Paragraph position="1"> A concept can generally be classified into more than one categories (multiple classification). However it is difficult to make exhaustive multiple classification from the beginning, because in the case of multiple classification, we must compare concepts with categories mn times when there are m concepts and n categories. In the case of exclusive classification using a distinctive u'eC/ whose leaves mean categories, on the other hand, we must only compare concepts with nodes on the tree Ofrn(log n)) times. Additionally, the  number of categories which share same sub-concepts with a category (cross categories) is generally much less than the number of all categories. Moreover, it is not so difficult to make a list of cross categories for each category (cross category list) in advance.</Paragraph>
    <Paragraph position="2"> Considering the points mentioned above, we use the following method for concept classification : 1) exclusive classification : selecting categories which hardly share same sub-concepts (exclusive categories), and making the first classification using a distinctive tree locating the exclusive categories at its leaf level. 2) cross classification : making the second classification into categories other than exclusive categories, based on cross category lists, and building a concept taxonomy from the results of the second classification. 3) improvement of the Concept Dictionary : modifying the Concept Dictionary based on the results from tests using various testing systems and automatic concept clustering from concept-concept relation descriptions. In the following sections 3.1 and 3.2, we explain the first exclusive classification and the second cross classification respectively. In section 3.3, we describe a method for modification of the Concept Dictionary.</Paragraph>
    <Section position="1" start_page="210" end_page="213" type="sub_section">
      <SectionTitle>
3.1 Exclusive Classification of Concepts
3.1.1 Classification into MONO-Categories
</SectionTitle>
      <Paragraph position="0"> The first classification into categories for nominal concepts (MONO-concepts) is made by using the MONO-concept taxonomy as shown in Figure 3 as a distinctive tree. That is, the classification starts from the top node, decends along branches of the tree, and when reaching a node, compares the node's children nodes with the input concept. This process is repeated, and if one of the leaves of the tree is reached, the MONO-category corresponding to the leave should ~ selected.</Paragraph>
      <Paragraph position="1">  For example, in the case of the concept &amp;quot;c#police_man&amp;quot;, when we start with the question, &amp;quot;Is that a physical object, a place or an abstract object?&amp;quot; (answer. a physical object), and pass through the questions, &amp;quot;Is that an animate object, a part of the body of an animate object, a natural object, a human artifact or an organization?&amp;quot; (answer: an animate object), &amp;quot;Is that a human being, an animal or a plant?&amp;quot; (answer: a human being), and &amp;quot;Is that a child, a relative, an occupation ...?&amp;quot; (answer: an occupation), then we can classify the concept into the category (Occupations) .</Paragraph>
      <Paragraph position="2"> \  As mentioned above, the first classification of the MONO-concepts is made by using the MONO-concept taxonomy's hierarchy as a distinctive tree. On the other hand, the method for the first classification of verbal concepts (KOTO-concepts) is not made by using the hierarchy as a distinctive tree but by semantic association from the meanings of the concepts and examples of deap case patterns of the concepts.</Paragraph>
      <Paragraph position="3"> The hierarchy has three levels. The highest level has been divided coarsely based on semantic association (coarse semantic clusters; all can be seen in Figure 4). The second level has been also divided based on semantic association (fine semantic clusters; all below coarse semantic cluster ~ I can be seen in Figure 5). On the contrar'y, the third level has been divided based on the deep case pattern shared by concepts (KOTO-categories; all below fine semantic cluster * 1.2 can be seen in Figure 6), where one category has only one deep case pattern which is specified with its distinctive pattern (expressed with the notation&amp;quot; \[... \] &amp;quot; as in Figure 2). We have now 14 coarse semantic clusters, 253 fine semantic clusters and 984 KOTO-categories in the hierarchy.</Paragraph>
      <Paragraph position="5"> lff 10&lt;PROGRESS&gt; : relations and attributes of events meaning degrees of actualization l 1 &lt;TIME&gt; : relations and attributes of events meaning temporal order or distance '~' ! 2&lt;QUANTITY&gt; : relations and attributes of objects meaning quantity or deip'ee lff 13&lt;OTHER_ATTRIBUTES&gt; : attributes other than those above  The first classification of a concept into the KOTO-categories is made based on semantic association with the concept and deap case patterns created with the concept. The procedures are as follows: I) ~signing basic concepts into KOTO-categories: classifying about 4,000 basic concepts into fine semantic clusters, describing deep case patterns underlying example sentences created with the concepts and dividing the clusters into KOTO-categories to make each of them have only one deap case pattern. 2) Establishing two indexes : making a) a word index for retrieving categories by a word, and b) a case frame index for retrieving categories by a deep case set. 3) Searching category candidates: a) searching category candidates by associating basic concepts which seem to share a deap case pattern with the concept and retrieving the word index by words meaning the basic concepts, b) In a case in that it is impossible to associate any basic concepts with the concept, finding category candidates by creating example sentences, making deap case frames from the sentences, and retrieving the case frame index by the frames to find category candidates. 4) Selecting a category from the category candidates: classifying concepts into the most appropriate category by considering from the following three points of view: a) the names of the categories and their upper clusters, b) the distinctive patterns of the categories, and c) the basic concepts assigned to the categories.</Paragraph>
    </Section>
    <Section position="2" start_page="213" end_page="213" type="sub_section">
      <SectionTitle>
3.2 Cross Classification of Concepts
</SectionTitle>
      <Paragraph position="0"> Cross classification of concepts is made in the following way: l) Making cross category lists for each exclusive categories. Types of cross relations are assorted into the following three types: a) a cross category which implies an exclusive category, b) a cross category which intersects an exclusive category, and c) a cross category which includes an exclusive category.</Paragraph>
      <Paragraph position="1"> 2) Contrasting each concept classified into an exclusive categoriy and each cross category listed in the cross category list of the exclusive category and judging whether or not the concept can be classified into the cross category. Here in the above case c), all concepts in the exclusive category can be automatically classified into the cross category.</Paragraph>
    </Section>
    <Section position="3" start_page="213" end_page="214" type="sub_section">
      <SectionTitle>
3.3 Improvement of the Concept Dictionary
</SectionTitle>
      <Paragraph position="0"> Through the following procedures, categories which should be modified ate found and improvement of the Concept Dictionary is made: 1) Collecting negative examples: When an answer other than correct answers is output from a testing system, an inappropriate concept-concept relation must be found deduced from the Concept Dictionary by viewing a debugging ~ace of the process of/he system. Such concept-concept relations are collected as negative examples. 2) Collecting positive examples: When a correct answer is not output from a testing system, a concept-concept relation must be found to be added to the Concept Dictionary. Moreover, all correct answers output from all testing systems  must have their corresponding concept-concept relations deduced from the Concept Dictionary. These concept-concept relations are collected as positive examples.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="214" end_page="216" type="metho">
    <SectionTitle>
3) Estimation:
</SectionTitle>
    <Paragraph position="0"> At a slage when negative and positive examples have been collected to some extent, the divisibility of each concept-category relation description and category-category relation description is estimated by using the following formula: (8) D (I, m, n) = &amp;quot;~&gt;~ qt/</Paragraph>
    <Paragraph position="2"> n : the number of concepts under the category, i : the number of incorrect classifications into the category, m : the number of examples, I : the number of negative examples.</Paragraph>
    <Paragraph position="3"> The formula (8) is derived as follows: We suppose that the concept-category (or category,.cazegory) relation description (9) is an object of our  estimation: (9) a- tel &amp;quot;&amp;quot; B that the number of the concepts classified under the camgory B is n: (I0) B bt b2 ..... b, and that the number of (both negauve and positive) examples is m and the number of negative examples in the  if the number of concepts which are under the category B but not appropriate for the concept description (9) is i,  the probability that 1 negative examples are found out of m examples is given by the formula (12)&amp;quot;.'</Paragraph>
    <Paragraph position="5"> Therefore, the probability that the ratio of the concepts not appropriate for the description (9) to the concepts located under the category B is more than k is given by the formula (8). Here we use Bayse's Theorem because sellections of the number i are events independent from each other.</Paragraph>
    <Paragraph position="6"> 4) Deletion of the concept descriptions: In cases in that the value of the formula (8) is more than 0.9 when k = 0.9, the concept description (9) is deleted from the Concept Dictionary and remaining positive examples are asserted as concept-concept relation descriptions into the Concept Dictionary because most of the examples for the description are negative.</Paragraph>
    <Paragraph position="7"> 5) Division of categories In cases in that the value of the formula (8) is more than 0.9 when k = O. I, the category B is divided in two in order to represent both a category satisfying the relation (9) and a category not satisfying the relation (9) and all concepts under the category B is reclassified into the two categories because we recognize that a) the number of examples are large enough for the estimation, and that b) the number of negative examples is too large to neglect. Here if the divided two categories exist as sub-categories of the category B in the concept taxonomy, the classification is not necessary.</Paragraph>
    <Paragraph position="8"> 6) Accumulation of preference knowledge In cases in that the value of the formula (8) is not more than 0.9 when k = O. I, the collected negative examples are translated to preference knowledge (for data structures and usages of preference knowledge, see Section 4). From a debugging trace of a testing system, together with a negative example, a concept, a word or a pronumciation corresponding to the negative example and a concept description more appropriate than the negative example must be also gained. This information is represented by preterence knowdedge with the following format 03) and accumulated:  (13) on &lt;concept&gt; I &lt;word&gt; I &lt;pronunciation&gt; give preference to &lt;a-more-appropriate-concept-description&gt; over &lt;a-negative-example-of-.concept-description&gt; ... ........... . ..... oo.. ......... . ............</Paragraph>
    <Paragraph position="9"> ( ~., 'r) ~. We ~ay use Poisson distribution asan approximation to (12) if n is large. However. since n~ 3,000, it is realistic to calculate the formula (12).</Paragraph>
    <Paragraph position="10">  7) Clustering of concept-concept relation descriptions Concept-concept relation descriptions remaining after all the above procedures are clustered by using an optimal scaling or using DM-decomposition and a probability-based estimation and the gained clusters are asserted as concept-category relation descriptions into the Concept Dictionary. The clustering algorithms are explained in detail in (Nakao 1988, Matsukawa 1989).</Paragraph>
    <Paragraph position="11"> 8) Reconstruction of the concept taxonomy Category-category relation descriptions are clustered and hierachized by using DM-decomposition and set-relation calculations in order to bundle the descriptions into higher level categories. The hierachization algorithm is explained in detail in (Matsukawa 1990a, 1990b, Yokola 1990).</Paragraph>
  </Section>
  <Section position="6" start_page="216" end_page="216" type="metho">
    <SectionTitle>
4 Preference Knowledge
</SectionTitle>
    <Paragraph position="0"> All concepts, categories and concept descriptions have an ID number called concept ID. Knowledge for ordering input ,sentence interpretations given by using the Concept Dictionary are represented by data individually expressing the order of the concept \[Ds that co-occur with each word pronunciation, word and concept, respectively (preference knowledge). We use the following three methods for ordering concept IDs: a) Linear lists of concept IDs b) Association lists of concept IDs and the concept IDs' preference value c) Dbected graphs including arcs meaning preference relations between concept \[Ds As mentioned in section 3.3, modification of the Concept Dictionary is made based on feedback information from tests performed by various kinds of processes in application systems (testing systems). Word sense selection and translation word candidates selection are ones of these processes. 'When an output answer given by such a testing system is different from correct answers, the reason for the difference is analyzod by viewing u'aces of processes of the system, and the Concept Dictionary and/or the preference knowledge ate/is modified. After such modifications, the correct answers become able to be selected by using the Concept Dictionary and the preference knowledge. For example, the word &amp;quot;suspend&amp;quot; has five senses, as shown in Figure 7. If the concept-concept relation shown in (14) is input, only two out of the five senses match the relation. The two senses are shown in (15): (14) c#suspend --&lt;object&gt;'-&amp;quot; c#police_man (to suspend the policeman)  If we have preference knowledge on &amp;quot;c#police_man&amp;quot; as shown in (I 6), we can select only one sense out of the two senses, namely &amp;quot;c#(0dbT0c)to_prevent_from_taking_part_in_a_team_for_a-time.&amp;quot;  By using this knowledge with the concept descriptions shown in Figure 8, Japanese sentences can be properly translated, for example as shown in (18) (for a method for unification of concepts expressed by different words or in different languages, see (Miike 1990b, Tominaga 1991)):  word candidate selection, but also throiagh those of structual disambiguation, paraphrasing and the like. Therefore, the knowledge includes descriptions corresponding to lexical preference proposed by Ford, Bresnan and Kaplan for structual disambiguation (Ford Bresnan and Kaplan 1982). Although such knowledge provides just a bias of interpretations of ambiguous structures, the knowledge is indispensable for deterministic sentence analysis without any knowledge about the discourse to be refered in order to use the principle of parsimony, the principle of a priori plausibility, etc. (Crain and Steedman 1984, Hirst 1984).</Paragraph>
  </Section>
class="xml-element"></Paper>