<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1098">
  <Title>Combining a Chinese Thesaurus with a Chinese Dictionary</Title>
  <Section position="6" start_page="602" end_page="604" type="evalu">
    <SectionTitle>
4. Results and Evaluation
</SectionTitle>
    <Paragraph position="0"> There are altogether 29,679 words shared by the two resources; they hold 35,193 entries in the thesaurus and 36,426 senses in the dictionary. We now consider the 13,165 entries and 14,398 senses that remain after excluding the 22,028 univocal words. Tab. 2 and Tab. 3 list the distribution of the entries with respect to the number of their sense tags, and the distribution of the senses with respect to the number of their code tags, respectively.</Paragraph>
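The counts above are related by simple subtraction: each univocal word contributes exactly one entry and one sense, so removing the 22,028 univocal words from the totals leaves the ambiguous remainder. A minimal check:

```python
# Sanity check (illustrative): the polysemous portion of the shared
# vocabulary is what remains after removing the 22,028 univocal words
# from the entry and sense totals reported above.
thesaurus_entries = 35193
dictionary_senses = 36426
univocal_words = 22028

# Each univocal word accounts for one entry and one sense.
remaining_entries = thesaurus_entries - univocal_words
remaining_senses = dictionary_senses - univocal_words

print(remaining_entries, remaining_senses)  # 13165 14398
```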
    <Paragraph position="1"> In order to evaluate the efficiency of our method, we define two measures, the accuracy rate and the loss rate, for a group of entries E, as (11) and (12) respectively.8</Paragraph>
    <Paragraph position="2"> 8 We only give the evaluation of the results for entries; the evaluation of the results for senses can be done similarly.</Paragraph>
    <Paragraph position="4"> where RT_E is the set of sense tags produced by the tagging procedure for the entries in E, and CT_E is the set of sense tags for the entries in E that are regarded as correct.</Paragraph>
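Equations (11) and (12) are not reproduced in this version of the text, so the following is a sketch under an assumption: the set-overlap reading that is consistent with the complementarity remark made below (when |RT_E| = |CT_E|, accuracy rate + loss rate = 1). The function names and toy data are illustrative, not the paper's notation.

```python
# Sketch of the two measures, assuming accuracy = |RT ∩ CT| / |RT| and
# loss = |CT − RT| / |CT|; this makes the two complementary exactly when
# |RT_E| = |CT_E|, matching the observation about Tab. 4 below.

def accuracy_rate(rt, ct):
    """Fraction of produced sense tags that are correct."""
    return len(rt & ct) / len(rt) if rt else 0.0

def loss_rate(rt, ct):
    """Fraction of correct sense tags missed by the procedure."""
    return len(ct - rt) / len(ct) if ct else 0.0

rt = {("entry1", "sense_a"), ("entry2", "sense_b")}  # produced tags
ct = {("entry1", "sense_a"), ("entry2", "sense_c")}  # correct tags
print(accuracy_rate(rt, ct), loss_rate(rt, ct))  # 0.5 0.5
```

Note that with |RT_E| = |CT_E| = 2 and one tag in common, the two rates sum to 1, as the complementarity observation requires.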
    <Paragraph position="5"> What we expect from the tagging procedure is that it selects the appropriate sense tags for the entries in the thesaurus, if such tags really exist in the dictionary. Evaluating the procedure directly proves to be difficult, so we deal with it in an indirect way: in particular, we explore the efficiency of the procedure when tagging entries whose appropriate sense tags don't exist in the dictionary. This indirect evaluation, on the one hand, can be carried out automatically on a large scale; on the other hand, it suggests what the direct evaluation entails, because the absence of any appropriate tag can itself be seen as a special tag for the entries, say None.9</Paragraph>
    <Paragraph position="6"> In the first experiment, let's consider again the 18,653 univocal words which were selected in the parameter estimation stage. For each of them, we create a new entry in the thesaurus which is different from its original one. Based on the analysis in section 3, the senses for these words should be the correct tags only for their corresponding original entries; the newly created entries have to take None as their correct tag.</Paragraph>
    <Paragraph position="7"> When creating new entries, we adopt the following 3 different kinds of constraints: i) the new entry belongs to the same medium category as the original one; ii) the new entry belongs to the same major category as the original one; iii) no constraints. (9 None is a default sense tag for the entries.)</Paragraph>
    <Paragraph position="8"> With each constraint, we select 5 groups of new entries respectively, and carry out the experiment for each group. Tab. 4 lists the average accuracy rates and loss rates under the different constraints.</Paragraph>
    <Paragraph position="9"> From Tab. 4, we can see that the accuracy rate under constraint i) is a bit lower than that under constraint ii) or iii); the reason is that when the created new entries belong to the same medium category as the original ones, it is a bit more likely for them to be tagged with the original senses. On the other hand, notice that the accuracy rates and loss rates in Tab. 4 are complementary to each other; the reason is that |RT_E| equals |CT_E| in such cases.</Paragraph>
    <Paragraph position="10"> In another experiment, we select 5 groups each of 0-tag, 1-tag and 2-tag entries, with each group consisting of 20-30 entries. We check their accuracy rates and loss rates manually; Tab. 5 lists the results. Notice that the accuracy rates and loss rates in Tab. 5 are not complementary; the reason is that |RT_E| does not equal |CT_E| in such cases.</Paragraph>
    <Paragraph position="11"> In order to explore the main factors affecting the accuracy and loss rates, we extract the entries that are not correctly tagged with their senses, and check the relevant definitions and semantic codes. The main reasons are: i) No salient codes exist with respect to a category, or the determined salient codes are not the expected ones. This may be attributed to the fact that the words in a category may not be strict synonyms, or that a category may contain too few words, etc.</Paragraph>
    <Paragraph position="12"> ii) The information provided for a word by the resources may be incomplete. For example, the word &quot;~&quot; (/quanshu/, all) holds one semantic code, Ka06, in the thesaurus; its definition in the dictionary is:  The correct tag for the entry should be the sense listed above, but in fact it is tagged with None in the experiment. The reason is that the word ~ (/quanbu/, all) can be an adverb or an adjective, and should hold two semantic codes, Ka06 and Eb02, corresponding to its adverb and adjective usages respectively, but the thesaurus neglects its adverb usage. If Ka06 is added as a semantic code of the word ~ (/quanbu/, all), the entry will be successfully tagged with the expected sense.</Paragraph>
    <Paragraph position="13"> iii) The distance defined between a sense and a category fails to capture the information carried by the order of salient codes, or more generally, the information carried by the syntactic structures involved. As an example, consider the word ~ (/yaochuan/), which has the two definitions listed in the following.</Paragraph>
    <Paragraph position="14"> 1) ~[Da19] ~[Ie01].</Paragraph>
    <Paragraph position="15"> /yaoyan/ /chuanbo/ (hearsay spread): the hearsay spreads.</Paragraph>
    <Paragraph position="16"> 2) ~[Ie01] ~ ~[Da19]: /chuanbo/ /de/ /yaoyan/ (spread of hearsay): the hearsay which spreads. The two definitions contain the same content words; the difference between them lies in the order of the content words, or more generally, in the syntactic structures involved in the definitions: the former presents a subject-object structure, while the latter uses a &quot;~ (/de/, of)&quot; structure. Distinguishing such definitions requires giving more consideration to word order or syntactic structures.</Paragraph>
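Point iii) can be made concrete with a toy example. Any distance computed from the unordered collection of salient codes in a definition ignores word order, so the two definitions of /yaochuan/ collapse to the same representation. The bag-of-codes representation below is a deliberate simplification of the paper's actual distance, used only to show the failure mode.

```python
# Minimal illustration of point iii): a distance based on the multiset of
# salient codes cannot separate the two definitions of /yaochuan/, since
# [Da19][Ie01] and [Ie01] de [Da19] contain the same codes in a
# different order.
from collections import Counter

def code_bag(definition_codes):
    # Order-insensitive representation: counts of semantic codes only.
    return Counter(definition_codes)

sense1 = ["Da19", "Ie01"]  # "the hearsay spreads" (subject-object structure)
sense2 = ["Ie01", "Da19"]  # "the hearsay which spreads" ("de" structure)
print(code_bag(sense1) == code_bag(sense2))  # True: indistinguishable
```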
  </Section>
  <Section position="7" start_page="604" end_page="605" type="evalu">
    <SectionTitle>
5. Discussions
</SectionTitle>
    <Paragraph position="0"> In the tagging procedure, we don't attempt any sense disambiguation on the definitions, due to its known difficulty. Undoubtedly, when the noisy semantic codes carried by some definition words exactly cover the salient ones of a category, they will affect the tagging accuracy. But the probability of such cases may be low, especially when more than one salient code exists with respect to a category.</Paragraph>
    <Paragraph position="1"> The distance between two categories is defined according to the distribution of their member words in a corpus. A natural alternative is based on the shortest path from one category to another in the thesaurus (e.g., Lee et al., 1993; Rada et al., 1989), but it is known that this method suffers from neglecting the wide variability in what a link in the thesaurus entails. Another choice may be the information content method (Resnik, 1995); although it can avoid the difficulty faced by shortest path methods, it gives the minor categories within one medium category the same distance from each other, because the distance is defined in terms of the information content carried by the medium category. What we are concerned with here is evaluating the dissimilarity between different categories, including those within one medium category, so we make use of semantic code based vectors to define their dissimilarity, which is motivated by Shuetze's word frequency based vectors (Shuetze, 1993).</Paragraph>
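A code-based vector dissimilarity in the spirit described above can be sketched as follows. The exact features the paper derives from the corpus are not shown in this version of the text, so the vector construction here (counts of semantic codes observed with a category's member words, compared by cosine) is an assumption for illustration only.

```python
import math

# Sketch: each category is represented as a vector of counts over semantic
# codes, and dissimilarity is 1 - cosine similarity. Unlike the information
# content method, this can separate minor categories inside one medium
# category, since their code vectors differ.

def cosine_distance(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

cat_a = {"Da19": 3, "Ie01": 1}  # invented count vectors
cat_b = {"Da19": 1, "Ie01": 3}
cat_c = {"Ka06": 5}
print(cosine_distance(cat_a, cat_b) < cosine_distance(cat_a, cat_c))  # True
```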
    <Paragraph position="2"> In order to determine the appropriate sense tags for a word entry in one category, we estimate a threshold for the distance between a sense and a category. Another natural choice would be to select the sense holding the smallest distance from the category as the correct tag for the entry. But this choice, although it avoids estimation issues, fails to directly demonstrate the inconsistency between the two resources, or the similarity between two senses with respect to a category.</Paragraph>
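The two policies contrasted above can be sketched side by side. The distances and the threshold value are invented for illustration; only the shape of the two decision rules follows the text.

```python
# Thresholding the sense-category distance (the approach taken above)
# versus always committing to the nearest sense (the alternative).
# Thresholding can return None, exposing an inconsistency between the
# resources; nearest-sense selection never can.

NONE = "None"  # default tag when no sense is close enough

def tag_with_threshold(distances, threshold):
    """Return all senses within the threshold, or None if there are none."""
    tags = [s for s, d in distances.items() if d <= threshold]
    return tags if tags else [NONE]

def tag_nearest(distances):
    """Always commit to the single closest sense."""
    return [min(distances, key=distances.get)]

near = {"sense_1": 0.35, "sense_2": 0.90}
print(tag_with_threshold(near, 0.5))  # ['sense_1']
print(tag_nearest(near))              # ['sense_1']

# When no sense is under the threshold, the two policies diverge:
far = {"sense_1": 0.8, "sense_2": 0.9}
print(tag_with_threshold(far, 0.5))   # ['None']
print(tag_nearest(far))               # ['sense_1']
```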
  </Section>
</Paper>