<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-2007">
  <Title>Building a Bilingual WordNet-Like Lexicon: the New Approach and Algorithms</Title>
  <Section position="3" start_page="0" end_page="2" type="metho">
    <SectionTitle>
2 The New Approach to Building a Bilingual
WordNet-Like Lexicon
</SectionTitle>
    <Paragraph position="0"> The distinctive organizing principles of WordNet can be described as follows: concepts, viz. synsets, act as the basic units of lexical semantics, and hyponymy among concepts acts as the basic relation. Upon this tree structure of hyponymy there also exist other semantic relations, such as holonymy, antonymy, attribute, entailment, and cause, which further interweave all the concepts in the lexicon into a huge semantic network: 99,643 synset nodes in all in WordNet 1.6.</Paragraph>
    <Paragraph position="1"> What really matters, and causes much trouble, in building WordNet itself is how to set up all these synsets and relations properly, and how to maintain semantic consistency under the frequent modifications that occur during revision [Beckwith et al., 1993]. Since no satisfactory development tool that works directly on a large-scale network has yet appeared, owing to the innate complexity of the net structure, this problem remains a Gordian knot for the lexicographers.</Paragraph>
    <Paragraph position="2"> To build a Chinese WordNet along the same route that Princeton took, and then to construct a mapping between the two WordNets, may not be a satisfying idea.</Paragraph>
    <Paragraph position="3"> It is therefore crucial to find an approach that reuses the English common knowledge already described in WordNet as the semantic basis for Chinese when building the bilingual lexicon. Such reuse should allow some adjustment of the bilingual concepts beyond word-for-word translation. If we can manage this, not only the building of the monolingual Chinese lexicon benefits, but also the Chinese-English mapping [Liu et al., 2002]. The practice of mapping then becomes a direct and dynamic process, the evolution of the bilingual lexicon is no longer a problem, and comparatively high efficiency can be achieved.</Paragraph>
    <Paragraph position="4"> Such are the essential ideas of the new solution. A characteristic of this approach is its emphasis on the inheritance and transformation of an already existent monolingual lexicon.</Paragraph>
    <Paragraph position="5"> Accordingly, it involves two processes. The first simply obtains the semantic basis for further use; the lexicographers' work focuses on the second. The bilingual lexicon gradually comes into being in this more natural second process.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 The Inheritance Process of WordNet
</SectionTitle>
      <Paragraph position="0"> This process is intended to extract the common hyponymy information in WordNet as the semantic basis for future use.</Paragraph>
      <Paragraph position="1"> However, extracting the full set of hyponyms for a given concept is by no means easy. As we have examined, the number of hyponyms of a synset ranges from 0 to 499, with a maximal hyponymy depth of 15 levels in WordNet, so the structure of the potential hyponymy tree is quite unbalanced. Due to this high complexity, an ordinary search algorithm can hardly cope. If one enters the word entity in WordNet 1.6 and tries to retrieve its full hyponyms, one gets nothing but a failure message. With another entry word, say entrance, the search will probably succeed. The outcome depends on the location of the entry word in the potential hyponymy tree: the higher its level, the less likely the search is to succeed.</Paragraph>
      <Paragraph position="2"> We have now developed a refined search algorithm for obtaining the full hyponymy information in WordNet [Liu et al., 2002].</Paragraph>
      <Paragraph position="3"> By and large, it involves a series of Two Way Scanning actions and Gathering/Sieving and Encoding actions, each round of the series gathering the nodes on one level of the hyponymy tree.</Paragraph>
      <Paragraph position="4"> This special algorithm greatly reduces the complexity of the search: we can retrieve all 45,148 hyponyms of even the topmost entry word entity in 100 or so seconds on an ordinary PC. More details of the algorithm can be found in [Liu et al., 2002].</Paragraph>
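The full Two Way Scanning and Gathering/Sieving and Encoding machinery is given in [Liu et al., 2002]; its level-by-level idea can be sketched as an iterative traversal that gathers one hyponymy level per round (the `hyponyms_of` mapping below is a hypothetical stand-in for WordNet's hyponymy pointers, not the actual database interface):

```python
def collect_hyponyms(root, hyponyms_of):
    """Gather all hyponyms of `root`, one tree level per round.

    `hyponyms_of` maps a synset to the list of its direct hyponyms.
    """
    collected = []
    frontier = [root]                      # nodes on the current level
    while frontier:
        next_level = []
        for synset in frontier:            # one round = one level
            for hypo in hyponyms_of.get(synset, []):
                collected.append(hypo)
                next_level.append(hypo)
        frontier = next_level
    return collected

# A toy fragment of the 'entity' subtree:
tree = {
    "entity": ["object", "thing"],
    "object": ["artifact"],
    "artifact": ["entrance"],
}
print(collect_hyponyms("entity", tree))
# → ['object', 'thing', 'artifact', 'entrance']
```

Because each round touches only one level, the cost per round stays bounded even though the tree is highly unbalanced.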
    </Section>
    <Section position="2" start_page="0" end_page="2" type="sub_section">
      <SectionTitle>
2.2 The Transformation Process of WordNet
</SectionTitle>
      <Paragraph position="0"> This process is for the lexicographers to operate interactively on the hyponymy tree to express the bilingual semantics. The bilingual lexicon gradually comes into being in this process.</Paragraph>
      <Paragraph position="1"> For this task, we have designed and implemented a visualized, data-sensitive tree control with 8 well-defined operations on it; some of the pivotal algorithms involved are discussed later.</Paragraph>
      <Paragraph position="2"> After extracting the hyponymy information for each initial semantic unit in WordNet, we organize the information into a hyponymy tree using this tree control. Every tree node, viz. synset, still carries all the other semantic relations described in WordNet. The lexicographers can then operate on the tree interactively.</Paragraph>
      <Paragraph position="3"> The actual practices of the lexicographers are as follows: (i) For each English tree node, if a corresponding Chinese concept exists, the lexicographers simply translate the English concept into Chinese.</Paragraph>
      <Paragraph position="4"> (ii) If none exists, the English concept may be either too general or too specific for Chinese.</Paragraph>
      <Paragraph position="5"> (ii-a) In the former case, the lexicographers create new hyponyms in Chinese for the English concept and link all these new Chinese hyponyms to it. (ii-b) In the latter case, the lexicographers delete the English concept in a special way, which records that the English concept has no Chinese equivalent and links its hyponyms directly to its hypernym.</Paragraph>
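The "special delete" used when an English concept is too specific for Chinese can be sketched as follows, assuming a minimal tree node with hypothetical `parent`/`children` fields; this is an illustration of the idea, not the tool's actual implementation:

```python
class Node:
    """A minimal hyponymy-tree node (illustrative, not the tool's class)."""
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def special_delete(node):
    """Remove `node` but keep its hyponyms, re-linking them to the
    hypernym (the deleted English concept has no Chinese equivalent)."""
    hypernym = node.parent
    if hypernym is None:
        raise ValueError("cannot special-delete the root")
    idx = hypernym.children.index(node)
    for child in node.children:
        child.parent = hypernym
    # splice the orphaned hyponyms in at the deleted node's position
    hypernym.children[idx:idx + 1] = node.children

# usage: delete 'b', so 'c' and 'd' hang directly under 'a'
a = Node("a")
b = Node("b", a)
c = Node("c", b)
d = Node("d", b)
special_delete(b)
print([n.label for n in a.children])  # → ['c', 'd']
```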
      <Paragraph position="6"> In fact, all the above semantic manipulations of the hyponymy relation have been encoded into the 8 visualized operations on the hyponymy tree. Within these operations, the other semantic relations described in the WordNet synsets are also properly handled through systematic and reasonable calculations.</Paragraph>
      <Paragraph position="7"> We can see these adjustments clearly in the description of the algorithms.</Paragraph>
      <Paragraph position="8"> Significantly, the lexicographers now need only operate on the hyponymy tree to express their semantic intention, and no longer have to care about the many details of the background database, for the foreground operations automatically carry out all the corresponding database modifications. In this way, the problems of mapping between the bilingual concepts and of the evolution of the bilingual lexicon are resolved dynamically. Our development tool for building the bilingual WordNet-like lexicon is shown below.</Paragraph>
      <Paragraph position="9"> The interface view shows the hyponymy tree for the entry food, one of the 25 initial semantic units for nouns in WordNet, with the category value 13. For the currently chosen node, the lexicographers can apply a suitable operation when needed. This new kind of Visualized Auxiliary Construction of Lexicon is characterized by the inheritance and transformation of an existent monolingual lexicon; we call it the Vacol model for short.</Paragraph>
      <Paragraph position="10"> As we can see, the new approach is in fact independent of any specific language and offers a general solution for building a bilingual WordNet-like lexicon.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="2" end_page="2" type="metho">
    <SectionTitle>
3 Tree Operations and their Algorithms
</SectionTitle>
    <Paragraph position="0"> Since the lexicographers always work with the tool, the visualized, data-sensitive tree control and its operations are the key to the new approach.</Paragraph>
    <Paragraph position="1"> We have designed a set of algorithms based on the TreeView control in Microsoft Visual Studio 6.0 and implemented a data-sensitive tree control with operations on it.</Paragraph>
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.1 Tree Operations
</SectionTitle>
      <Paragraph position="0"> The 8 semantically well-defined operations are listed as follows. When a synset node is chosen in the hyponymy tree, the lexicographers can apply any one of them.</Paragraph>
      <Paragraph position="1"> [1] To add a synset as a brother node;
[2] To add a synset as a child node;
[3] To delete the synset node (excluding its descendants, if any);
[4] To delete the synset node (including all its descendants, if any);
[5] To cut a subtree;
[6] To copy a subtree;
[7] To paste a subtree as a brother node;
[8] To paste a subtree as a child node.
These operations all edit the tree: No. 1 and 2 for addition, No. 3 and 4 for deletion, and No. 5, 6, 7, 8 for batch movement. All of them have been carefully chosen to be concise, capable, and semantically meaningful.</Paragraph>
      <Paragraph position="2"> It is easy to prove that any arbitrary tree form can be attained by iterative application of these 8 operations.</Paragraph>
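As an illustration of the batch-movement operations, cut ([5]) and paste-as-child ([8]) amount to detaching and re-attaching a subtree; a minimal sketch over a parent-to-children mapping (all names are illustrative, not the tool's API):

```python
def cut_subtree(children, parent, node):
    """Operation [5]: detach `node` (and, implicitly, its descendants)
    from `parent`. `children` maps each node to its ordered child list."""
    children[parent].remove(node)
    return node

def paste_as_child(children, target, node):
    """Operation [8]: re-attach a previously cut subtree under `target`."""
    children.setdefault(target, []).append(node)

# move the 'dog' subtree from 'canine' to sit under 'pet'
children = {
    "animal": ["canine", "pet"],
    "canine": ["dog", "wolf"],
    "dog": ["puppy"],
    "pet": [],
}
clip = cut_subtree(children, "canine", "dog")
paste_as_child(children, "pet", clip)
print(children["canine"], children["pet"])  # → ['wolf'] ['dog']
```

The descendants of "dog" travel with it for free, since only the link from the old parent is touched; this is what makes operations [5]-[8] expressible as iterations of the simpler edits.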
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.2 Algorithms for the Tree Operations
</SectionTitle>
      <Paragraph position="0"> The data structure of a hyponymy tree with n nodes can be illustrated by a table of n records. Each record holds 3 parts of information: the structural information {Pos_i}, the relational information {Rel_i}, and the basic information {BasicInfo_i}, the last of which is relevant only to the concept proper. Among these 3 parts, {Pos_i} serves the tree structure and {Rel_i} the lexical semantics. It should be noticed that Pos_i is a special encoding of the tree in the foreground and is somewhat different from Rel_i, a relational pointer of hyponymy that represents its specific semantics in the background database. It is the relations in {Rel_i} that contribute most to the dense net structure of WordNet.</Paragraph>
      <Paragraph position="11"> These analyses show that each operation need only deal properly with these 3 parts of information. First, it is crucial that two sorts of consistency be maintained.</Paragraph>
      <Paragraph position="12"> One is the consistency of the structural information {Pos_i} of the tree, and the other is that of the relational information {Rel_i} of the lexicon. The basic information {BasicInfo_i} is comparatively simple to handle, for only English-Chinese translations are involved.</Paragraph>
      <Paragraph position="15"> Before dwelling on the algorithms, we should touch on the structural information {Pos_i}. A position Pos_i denotes the location of a certain node in the tree and serves to organize the tree. For example, a Pos_i with the value "005001002" represents this location: at the 1st level, the node's ancestor is the 5th; at the 2nd level, its ancestor is the 1st; and at the 3rd level, its ancestor, namely the node itself, is the 2nd. Such an encoding onto a linear string fully expresses the structural information of the tree and makes all the tree-operation algorithms feasible through direct, systematic calculation of new positions.</Paragraph>
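Assuming the 3-digit-per-level fields suggested by the example "005001002" (the actual field width of the tool is not stated), the positional calculations can be sketched as:

```python
WIDTH = 3  # digits per tree level, matching the example "005001002"

def decode(pos):
    """Split a position string into per-level ordinals: "005001002" -> [5, 1, 2]."""
    return [int(pos[i:i + WIDTH]) for i in range(0, len(pos), WIDTH)]

def child_pos(pos, ordinal):
    """Position of the `ordinal`-th child of the node at `pos`."""
    return pos + str(ordinal).zfill(WIDTH)

def brother_pos(pos, ordinal):
    """Position of a brother node with the given ordinal at the same level."""
    return pos[:-WIDTH] + str(ordinal).zfill(WIDTH)

print(decode("005001002"))          # → [5, 1, 2]
print(child_pos("005001002", 1))    # → '005001002001'
print(brother_pos("005001002", 3))  # → '005001003'
```

Because the fields are fixed width, a node's position string is a prefix of all its descendants' positions, so ordering the records by {Pos_i} keeps every ancestor before its descendants.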
      <Paragraph position="16"> Without going into too much detail, the algorithms for the tree operations can be described in a general way, although each line of the pseudocode entails a good deal of work for the programmer.</Paragraph>
      <Paragraph position="17"> The algorithms described here suit the non-batch-movement operations, viz. operations [1]-[4]; the batch-movement operations, viz. operations [5]-[8], can be regarded as iterative applications of them. The lexicographers trigger an action on a chosen node, and the general algorithm then runs. [Pseudocode figure not preserved in this version.] The algorithms have some nice features. Since the structural information {Pos_i}, defined as the primary key of the table, is kept in order, the maintenance of the tree structure can always be completed in a single pass.</Paragraph>
      <Paragraph position="18"> The maintenance of the consistency of the relational information {Rel_i} of the lexicon is likewise limited to a local section of the table.</Paragraph>
    </Section>
  </Section>
</Paper>