File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/c92-3146_metho.xml

Size: 14,393 bytes

Last Modified: 2025-10-06 14:13:01

<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-3146">
  <Title>TOWARDS A NEW GENERATION OF TERMINOLOGICAL RESOURCES: AN EXPERIMENT IN BUILDING A TERMINOLOGICAL KNOWLEDGE BASE</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 RESEARCH ISSUES IN
COMPUTATIONAL TERMINOLOGY
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.1 Terminological vs. Lexical
Knowledge Bases
</SectionTitle>
      <Paragraph position="0"> Much of the world's terminological data is stored in large terminological databases (TDBs) such as Canada's TERMIUM III, which contains over one million bilingual records. These TDBs are useful only to humans, and even then only to a small subset of potential users: translators remain the principal user category, even though TDBs have obvious applications in technical writing, management information and domain learning, not to mention a wide variety of machine uses such as information retrieval, machine translation and expert systems. A major weakness of TDBs is that they provide mainly linguistic information about terms (e.g. equivalents in other languages, morphological information, style labels); conceptual information is sparse (limited to definitions and sometimes contexts), unstructured, inconsistent and implicit.</Paragraph>
      <Paragraph position="1"> Given these problems, a growing number of terminology researchers are calling for the evolution of TDBs into a new generation of terminological repositories that are knowledge-based. Since this vision of a terminological knowledge base (TKB) has recently been paralleled in computational lexicology by the vision of a lexical knowledge base or LKB (e.g. Atkins 1991, Boguraev and Levin 1990, Pustejovsky and Bergler 1991), we would like to briefly position our research framework in relation to these developments. The LKB projected by Boguraev and Levin 1990 differs from a lexical database (LDB) in two ways: 1) the LDB states lexical characteristics on a word-by-word basis, while the LKB permits generalizations; and 2) the LKB permits inferencing, and thus the possibility of dynamically extending the lexicon to accommodate new senses. [ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 · 956 · PROC. OF COLING-92, NANTES, AUG. 23-28, 1992] Both characteristics are extremely important for the TKB as well: 1) a capacity for supporting generalizations is particularly relevant to terminology, since terminological repositories have an important teaching function²; and 2) the accommodation of new senses is even more crucial to terminology than to the general lexicon, since specialized languages grow so rapidly. While the TKB must share these characteristics, it differs from the LKB in one important way, which derives from the fundamental difference between general and specialized lexical items. 
This difference can be summarized in the following two principles: 1) an LKB must make explicit what a native speaker knows about concepts denoted by general lexical items; and 2) a TKB must make explicit what a native speaker who is also a domain expert knows about concepts denoted by specialized lexical items. While the lexicographer's ultimate source of lexical knowledge is his/her own intuition, the terminologist's challenge is to model experts' terminological intuitions, which stem in large part from their domain knowledge. The acquisition of domain knowledge has therefore traditionally been the starting point for any practical terminology project; only when the knowledge structures of a domain are systematized to some degree can terminologists proceed with term extraction, definition construction, analysis of synonymy and polysemy, identification of equivalents in other languages, etc. The crucial importance of modelling domain knowledge in a TKB necessitates a conceptual framework and technology which, in our view, should derive partly from recent insights in knowledge engineering.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.2 Terminology and Knowledge
Engineering
</SectionTitle>
      <Paragraph position="0"> At the heart of the relationship between terminology and knowledge engineering is the fact that practitioners of both disciplines function as intermediaries in a knowledge communication context involving experts on the one hand and a knowledge processing technology on the other.</Paragraph>
      <Paragraph position="1"> This type of knowledge communication context entails three principal activities: Knowledge acquisition. Acquisition of knowledge, whether by elicitation from a human expert or extraction from texts, is complicated by the fact that domain expertise consists of three elements (performance, understanding and communication) that require the expert to play the roles of practitioner, scientist and teacher, respectively (Gaines 1990). Unfortunately, experts vary widely in their teaching skills: they may not have the linguistic ability to express knowledge clearly; they may not provide exactly the knowledge that is required; etc. As well, they may vary in their understanding of the field, presenting the knowledge engineer/terminologist with problems of inconsistency and contradiction. Knowledge formalization. Knowledge does not come &amp;quot;off the shelf, prepackaged, ready for use&amp;quot; (Hayes-Roth 1987:293). As already mentioned, it can be inconsistent and contradictory. It can be multidimensional, since experts' understanding of a conceptual system can depend on their point of view. It may be hard to &amp;quot;capture&amp;quot;, since it is constantly changing, and since emergent knowledge can be incomplete and unclear. Finally, from the knowledge engineer/terminologist's point of view, it will exist in various degrees of &amp;quot;clarity&amp;quot; and &amp;quot;depth&amp;quot;: since knowledge acquisition is incremental, certain concepts will be more clearly or deeply understood than others at any given time. (² Most TDB users are not domain experts, and thus hope to acquire some domain knowledge when they look up a term.)</Paragraph>
      <Paragraph position="2"> Knowledge refinement. Once formalized, knowledge may be refined in two ways: 1) it may be validated by testing the knowledge-based system on the intended application, and/or 2) it may be periodically updated, for example, as the knowledge engineer/terminologist's understanding of the field deepens or expands, when the field itself changes, or when the system needs more knowledge due to changes in the application.</Paragraph>
      <Paragraph position="3"> Knowledge refinement may again entail knowledge acquisition and formalization, making the knowledge engineering cycle a continuous process. Over the last three years, we have developed and tested a knowledge engineering tool called CODE (Conceptually Oriented Description Environment).</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
CODE (Conceptually Oriented Description Environment)
</SectionTitle>
    <Paragraph position="0"> CODE (Conceptually Oriented Description Environment) is designed to assist a user who may or may not be a domain expert in acquiring, formalizing and refining specialized knowledge. Although generic by design, CODE emphasizes linguistic and particularly terminological support, which we feel is crucial to all knowledge engineering applications. From 1987 to 1990, a working prototype was developed and tested in three terminology-intensive applications: term bank construction, software engineering and database design³. Our research has now entered a second three-year phase, with the goal of using CODE to help us develop a clearer concept of a TKB and of an associated methodology.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 COGNITERM: A
TERMINOLOGICAL KNOWLEDGE
BASE
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 General Description
</SectionTitle>
      <Paragraph position="0"> COGNITERM is essentially designed as a hybrid between a conventional TDB and a knowledge base. Each concept is represented in a frame-like structure called a concept descriptor (CD), which has two main information categories. The Conceptual Information category is the knowledge base component, listing conceptual characteristics and their values. CDs are normally, though not necessarily, arranged in inheritance hierarchies.</Paragraph>
      <Paragraph position="1"> The Linguistic Information category is the TDB component, providing all the strictly linguistic information normally found in conventional TDBs.</Paragraph>
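The two-part concept descriptor described above can be pictured as a small data structure. The following is a minimal sketch under our own assumptions, not CODE's actual implementation: the class `ConceptDescriptor`, its fields, the `lookup` and `all_characteristics` helpers, and the sample printer concepts are all hypothetical, and inheritance is modelled as a simple single-parent chain.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConceptDescriptor:
    """Hypothetical frame-like concept descriptor (CD) with two information categories."""
    name: str
    parent: ConceptDescriptor | None = None
    # Conceptual Information: the knowledge base component (characteristic -> value).
    conceptual: dict[str, str] = field(default_factory=dict)
    # Linguistic Information: the conventional TDB component (equivalents, labels, ...).
    linguistic: dict[str, str] = field(default_factory=dict)

    def lookup(self, characteristic: str) -> str | None:
        """Return a characteristic value, inheriting from ancestors if not stated locally."""
        node = self
        while node is not None:
            if characteristic in node.conceptual:
                return node.conceptual[characteristic]
            node = node.parent
        return None

    def all_characteristics(self) -> dict[str, str]:
        """Merge inherited and local characteristics; local values override inherited ones."""
        inherited = self.parent.all_characteristics() if self.parent else {}
        return {**inherited, **self.conceptual}


# Hypothetical usage: a subconcept inherits "function" from its generic concept.
printer = ConceptDescriptor("printer", conceptual={"function": "produces hard copy"})
laser = ConceptDescriptor(
    "laser printer",
    parent=printer,
    conceptual={"imaging technology": "laser"},
    linguistic={"fr": "imprimante laser"},
)
print(laser.lookup("function"))     # produces hard copy
print(laser.all_characteristics())  # {'function': 'produces hard copy', 'imaging technology': 'laser'}
```

Keeping the two categories as separate fields mirrors the hybrid design: the conceptual part participates in inheritance, while the linguistic part stays attached to the individual record, as in a conventional TDB.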
      <Paragraph position="2"> The TKB can be visualized graphically in a variety of semantic net displays. Both hierarchical (e.g. generic-specific, part-whole) and non-hierarchical relations can be graphed. Since knowledge acquisition typically proceeds one subdomain at a time, subwindows may show only a restricted part of the knowledge structure (i.e. a subtree). There is also a masking capability which, for example, can show only concepts that fall within a given &amp;quot;dimension&amp;quot; of reality.</Paragraph>
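The restricted views mentioned above (a subtree for one subdomain, or a mask that keeps only concepts within a given "dimension") amount to simple filters over the hierarchy. The sketch below assumes, purely for illustration, that the hierarchy is stored as a parent-to-children mapping and that each concept carries a set of dimension labels; the names `CHILDREN`, `DIMENSIONS`, `subtree` and `mask`, and the vehicle data, are hypothetical and do not describe CODE's internals.

```python
from collections import deque

# Hypothetical generic-specific hierarchy: parent concept -> child concepts.
CHILDREN: dict[str, list[str]] = {
    "vehicle": ["land vehicle", "watercraft"],
    "land vehicle": ["car", "bicycle"],
    "watercraft": ["sailboat", "motorboat"],
}

# Hypothetical dimension labels consulted by the mask.
DIMENSIONS: dict[str, set[str]] = {
    "car": {"motorized"},
    "motorboat": {"motorized"},
    "bicycle": {"human-powered"},
    "sailboat": {"wind-powered"},
}


def subtree(root: str) -> list[str]:
    """Return the root and all of its descendants in breadth-first order."""
    result, queue = [], deque([root])
    while queue:
        concept = queue.popleft()
        result.append(concept)
        queue.extend(CHILDREN.get(concept, []))
    return result


def mask(concepts: list[str], dimension: str) -> list[str]:
    """Keep only the concepts labelled with the given dimension."""
    return [c for c in concepts if dimension in DIMENSIONS.get(c, set())]


print(subtree("land vehicle"))                # ['land vehicle', 'car', 'bicycle']
print(mask(subtree("vehicle"), "motorized"))  # ['car', 'motorboat']
```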
      <Paragraph position="3"> As an aid to definition construction, and specifically to help determine the differentiating characteristics, CODE offers a Characteristic Comparison Matrix that presents the union of all characteristics of coordinate concepts⁴, excluding those that are identical in all coordinates.</Paragraph>
      <Paragraph position="4"> Finally, navigation through COGNITERM is facilitated by CODE's Browser, which allows the knowledge to be accessed either by the names of concepts or by the names of their characteristics, both of which can be presented in conceptual (i.e. hierarchical) or alphabetical order. A variety of masks can be applied to restrict the knowledge displayed.</Paragraph>
      <Paragraph position="5"> ³ This first phase of our research has already been documented elsewhere: a general technical description of CODE can be found in Skuce (in press b); an analysis of the relationship between terminology and knowledge engineering can be found in Meyer 1991 and Meyer (in press); the three terminology-intensive applications are described in Skuce and Meyer 1990a/b (term bank construction), Skuce (in preparation) (software engineering), and Downs et al. 1991 (database design). ⁴ By coordinate concepts we mean concepts that share the same parent in a hierarchy.</Paragraph>
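The Characteristic Comparison Matrix can be read as a small set computation: take the union of the characteristics of the coordinate concepts (siblings under one parent), then drop every characteristic whose value is the same for all coordinates, since it cannot differentiate them. The sketch below is our own illustration of that rule, not CODE's implementation; the function name `comparison_matrix` and the sample concepts are hypothetical, and a characteristic missing from a concept is shown as `None`.

```python
from __future__ import annotations


def comparison_matrix(
    coordinates: dict[str, dict[str, str]],
) -> dict[str, dict[str, str | None]]:
    """Build a characteristic-by-concept matrix for a set of coordinate concepts.

    Rows are the union of all characteristics, sorted alphabetically; rows
    whose value is identical across every coordinate are excluded.
    """
    union = sorted(set().union(*(chars.keys() for chars in coordinates.values())))
    matrix = {}
    for characteristic in union:
        row = {name: chars.get(characteristic) for name, chars in coordinates.items()}
        if len(set(row.values())) > 1:  # keep only differentiating characteristics
            matrix[characteristic] = row
    return matrix


# Hypothetical coordinate concepts sharing the parent "vehicle".
coords = {
    "car": {"wheels": "4", "power source": "engine", "moves on": "land"},
    "bicycle": {"wheels": "2", "power source": "rider", "moves on": "land"},
    "motorcycle": {"wheels": "2", "power source": "engine", "moves on": "land"},
}
for characteristic, row in comparison_matrix(coords).items():
    print(characteristic, row)
# power source {'car': 'engine', 'bicycle': 'rider', 'motorcycle': 'engine'}
# wheels {'car': '4', 'bicycle': '2', 'motorcycle': '2'}
```

"moves on" disappears because all three coordinates share it; what remains is exactly the material a terminologist would draw on for the differentiae of each definition.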
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Advantages of a TKB over a TDB
</SectionTitle>
      <Paragraph position="0"> The differences between a conventional TDB and a TKB can be examined from three points of view: 1) the information itself, 2) support for acquiring and systematizing the information and 3) facilities for retrieving the information. A brief description⁵ of each is found below.</Paragraph>
      <Paragraph position="1"> The information. In a TDB, conceptual information is encoded implicitly in the form of definitions, contexts, indication of domain(s), etc. In a TKB, it is encoded explicitly. The resulting degree of structure imposed on the information has three important by-products. First, it allows for an explicit representation of conceptual relations (as opposed to implicit representations in TDB definitions or contexts). Second, it facilitates consistency: since generic concepts are explicitly indicated, for example, definitions of all coordinate concepts must have the same genus term; and since characteristics are inherited by subconcepts, they will correspond from one coordinate concept to another. Third, an explicit representation of conceptual relations facilitates graphical representations of knowledge structures; this aspect is particularly emphasized in the COGNITERM Project, since graphical representations aid learning by providing the kind of conceptual &amp;quot;map&amp;quot; advocated by numerous educational psychologists⁶.</Paragraph>
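The consistency benefit can be made concrete with a check that becomes trivial once generic concepts are recorded explicitly: every coordinate concept's definition should use the shared parent as its genus term. The sketch assumes, for illustration only, that each entry stores the genus of its definition in a separate field; the `Entry` class, the `genus_mismatches` function and the printer data are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    term: str
    parent: str | None  # generic concept explicitly recorded in the hierarchy
    genus: str | None   # genus term actually used in the definition text
    definition: str


def genus_mismatches(entries: list[Entry]) -> list[str]:
    """Report entries whose definition genus differs from their explicit parent."""
    return [
        f"{e.term}: genus '{e.genus}' but parent is '{e.parent}'"
        for e in entries
        if e.parent is not None and e.genus != e.parent
    ]


# Hypothetical coordinate concepts of "printer"; the second definition drifts.
entries = [
    Entry("laser printer", "printer", "printer", "a printer that forms images with a laser"),
    Entry("inkjet printer", "printer", "device", "a device that sprays ink onto paper"),
]
print(genus_mismatches(entries))
# ["inkjet printer: genus 'device' but parent is 'printer'"]
```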
      <Paragraph position="2"> Acquisition and systematization of information. Unlike conventional TDBs, a TKB such as COGNITERM provides not only a medium for storing information, but also mechanisms to assist in acquiring and systematizing the information in the first place. Inheritance mechanisms play an important role in this regard: on the simplest level, they free the terminologist from repeating information from one hierarchical level to another, and allow the possibility of &amp;quot;what-if&amp;quot; experiments; on a more interesting level, inheritance can be associated (as it is in CODE) with mechanisms for signalling conflicts when changes to one hierarchical level &amp;quot;percolate&amp;quot; through the knowledge structures. A browsing mechanism such as the one we have implemented provides additional support for acquisition, as it allows the kind of hypertext-like &amp;quot;navigation&amp;quot; through the knowledge structures that is needed to ferret out compatible knowledge &amp;quot;spaces&amp;quot; for a new concept. Other implemented […] developed graphical display, are just some examples of the potential facilities of a TKB environment designed to help terminologists &amp;quot;get the knowledge straight&amp;quot; throughout the acquisition process.</Paragraph>
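Conflict signalling during percolation can be illustrated as follows: when a value is changed at one level, walk down the hierarchy and flag every descendant that locally states a different value for the same characteristic, since the change would otherwise pass over it silently. This is our own minimal sketch, not CODE's mechanism, which this section does not describe; the `CHILDREN` and `LOCAL` tables, the `set_value` function and the bird data are hypothetical.

```python
from __future__ import annotations

# Hypothetical hierarchy and locally stated characteristic values.
CHILDREN: dict[str, list[str]] = {
    "bird": ["sparrow", "penguin"],
    "penguin": ["emperor penguin"],
}
LOCAL: dict[str, dict[str, str]] = {
    "bird": {"locomotion": "flies"},
    "sparrow": {},
    "penguin": {"locomotion": "swims"},
    "emperor penguin": {},
}


def set_value(concept: str, characteristic: str, value: str) -> list[str]:
    """Set a value on a concept and return conflict messages for its descendants.

    A conflict is any descendant that locally overrides the characteristic
    with a different value. The walk does not descend below a descendant that
    states its own value, because that subtree inherits from the override.
    """
    LOCAL[concept][characteristic] = value
    conflicts: list[str] = []
    stack = list(CHILDREN.get(concept, []))
    while stack:
        child = stack.pop()
        local = LOCAL.get(child, {})
        if characteristic in local:
            if local[characteristic] != value:
                conflicts.append(f"{child} overrides {characteristic}={local[characteristic]!r}")
            continue  # the local statement shields this subtree from the change
        stack.extend(CHILDREN.get(child, []))
    return conflicts


print(set_value("bird", "locomotion", "walks"))
# ["penguin overrides locomotion='swims'"]
```

Returning the conflicts instead of resolving them leaves the decision with the terminologist, which fits the "what-if" style of experimentation described above.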
      <Paragraph position="3"> Retrieval of information. Conventional TDBs are severely handicapped by their fundamental term-to-concept orientation: knowing a term, one can expect the TDB to indicate (to some degree, at least) what it means, what its synonyms are, etc. Terminological research, however, is very often concept-to-term oriented: for example, &amp;quot;real-life&amp;quot; terminology is typified by questions like &amp;quot;What do you call the machine with function W?&amp;quot; or &amp;quot;What do you call the material that has physical characteristics X, Y, and Z?&amp;quot; The inability of conventional TDBs to answer these kinds of questions leads to the proliferation of synonyms and quasi-synonyms, one of the greatest impediments to communication in specialized domains. Users of COGNITERM can access its data through any conceptual characteristic to determine whether the concept they have in mind already has a name.</Paragraph>
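Concept-to-term access reduces to searching descriptors by characteristic values rather than by name. The sketch below assumes a flat mapping from concept names to characteristic/value pairs (the `find_terms` function and the office-equipment data are hypothetical) and returns every concept name that matches all requested characteristics; an empty result suggests that the concept the user has in mind may not yet be named in the knowledge base.

```python
from __future__ import annotations


def find_terms(
    descriptors: dict[str, dict[str, str]], query: dict[str, str]
) -> list[str]:
    """Return names of concepts whose characteristics include every query pair."""
    return sorted(
        name
        for name, chars in descriptors.items()
        if all(chars.get(key) == value for key, value in query.items())
    )


# Hypothetical descriptors: "What do you call the machine with function W?"
DESCRIPTORS = {
    "stapler": {"function": "fastens paper", "power source": "manual"},
    "electric stapler": {"function": "fastens paper", "power source": "electric"},
    "hole punch": {"function": "perforates paper", "power source": "manual"},
}

print(find_terms(DESCRIPTORS, {"function": "fastens paper"}))
# ['electric stapler', 'stapler']
print(find_terms(DESCRIPTORS, {"function": "binds paper"}))
# []
```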
    </Section>
  </Section>
</Paper>