<?xml version="1.0" standalone="yes"?>
<Paper uid="W91-0203">
  <Title>KNOWLEDGE MANAGEMENT FOR TERMINOLOGY-INTENSIVE APPLICATIONS: NEEDS AND TOOLS</Title>
  <Section position="3" start_page="0" end_page="21" type="metho">
    <SectionTitle>
1. LEXICAL-SEMANTIC AND ENCYCLOPEDIC KNOWLEDGE IN
TERMINOLOGY
</SectionTitle>
    <Paragraph position="0"> Terminology is the practical discipline concerned with describing and naming concepts in specialized domains. The data produced by this process is increasingly stored in databases known as term banks. Since concepts are the starting point for all practical terminology work, and since concepts are the building-blocks of knowledge, it follows that terminology is a very knowledge-intensive activity: describing concepts involves acquiring knowledge about their characteristics, and naming concepts involves matching conceptual characteristics with linguistic forms (i.e. terms). In fact, terminology is somewhat of a misnomer: most fundamentally, it is not the study of &quot;terms&quot;, but rather of the knowledge conveyed by the terms.</Paragraph>
    <Paragraph position="1"> Given the crucial role of knowledge in terminology, one needs to address the question of what kind of knowledge to include in term banks. In the related discipline of lexicography, the same question, formulated in relation to dictionaries and lexicons, has resulted in the long-standing debate about differences between lexical-semantic (i.e. linguistic) and encyclopedic (i.e. world, extra-linguistic) knowledge (1).</Paragraph>
    <Paragraph position="2"> One viewpoint has been that dictionaries and encyclopedias should be conceived as distinct entities - hence the apothegm &quot;dictionaries are about words, encyclopedias are about things&quot;. Meaning-Text (MT) Lexicography (Mel'cuk 1988a/b), for example, makes a strict distinction between lexical-semantic and encyclopedic knowledge on the basis of semantic features: those which are necessary and sufficient (in the mathematical sense) to a definition are lexical-semantic, while those that are superfluous are encyclopedic, and banned from definitions.</Paragraph>
    <Paragraph position="3"> A contrasting point of view, expressed for example by McArthur (1986: pp. 102-109), is that for certain purposes, it may be useful to produce a hybrid of dictionary and encyclopedia - in other words, an encyclopedic dictionary. McArthur proposes that the dictionary-encyclopedia relationship be seen as a continuum rather than a dichotomy, and suggests the term micro-lexicography to designate the activity dealing with &quot;the world of words...&quot;, and the term macro-lexicography to designate the activity which &quot;shades out into the world of things and subjects, and centres on compendia of knowledge...&quot;.</Paragraph>
    <Paragraph position="4"> Depending on the lexicographic framework and the intended user of the dictionary, a strict differentiation between lexical-semantic and encyclopedic knowledge can be not only theoretically interesting, but also practically relevant: in the MT framework, for example, it underlies virtually all aspects of lexicographic methodology. Learners' dictionaries, on the other hand, whether they are aimed at learners of the mother tongue (i.e. children's dictionaries) or learners of a foreign language, typically feature encyclopedic characteristics, such as pictures and a rich supply of examples, that are intended to supplement definitions. Furthermore, the definitions themselves may include information that exceeds the bounds of &quot;necessary and sufficient&quot;.</Paragraph>
    <Paragraph position="5"> The lexical-semantic vs. encyclopedic debate is pertinent to terminology since this discipline is closely linked to lexicography in purpose and method. In our view, terminology is clearly macro-lexicographic (to use McArthur's term) in orientation: term banks must include not only necessary and sufficient information about concepts, but also a certain amount of encyclopedic information. The following are just some of the reasons for our view: Relationship to specialized domains. Terminology is closely related to the specialized domains of activity whose lexica it describes. This is reflected in the basic organization of term banks according to specialized domains (i.e. subject fields). Until recently, most terminology work was done by domain experts, and the increasing numbers of terminologists who are not domain experts still consider consultation with experts to be crucial to their work. One of the goals of terminology is to provide assistance in the ordering and use of terms within specialized domains. Because of its relationship to languages for special purposes (LSPs), terminology has a need for subject classification and thesaural structure. In other words, it is closely linked to information science, with which it shares tools such as keywords, indexes and thesauri.</Paragraph>
    <Paragraph position="6"> The role of term banks as learning tools. Although term banks can be consulted by users with a wide range of domain expertise, by far the bulk of users are not domain experts. The largest user group has always been translators, who consult term banks not only for strictly linguistic information (e.g. part-of-speech, morphology, target-language equivalent), but also for conceptual information (e.g. conceptual characteristics, relations to other concepts), since it is well known that a certain depth of understanding of the domain is necessary to use its terminology correctly. Term banks can also be seen as learning tools for the terminologists themselves, for example, when they are assigned a new field in which they have little knowledge, or when they are working in a field that is highly influenced by neighbouring fields with which they are not very familiar. Because of the teaching role of term banks, definitions are often complemented by examples of terms in context, much in the same way that definitions in learners' dictionaries are. Like encyclopedias, terminological publications often include pictures and diagrams.</Paragraph>
    <Paragraph position="7"> The multilingual aspect of terminology. All the large term banks that currently exist are multilingual, and this tendency will most likely remain in the face of the increasing importance of international communication for trade and knowledge communication. It is well known that establishing lexical equivalence between different languages is often impossible on the basis of lexical-semantic information alone. To take a well-known example from general language, the word river is defined in Webster's Ninth New Collegiate Dictionary as &quot;a natural stream of water of considerable volume&quot;. French, however, distinguishes between flowing bodies of water that empty into the ocean (fleuve) and those that empty into a lake or another flowing body of water (rivière) - information that would be considered encyclopedic if one applied the necessary-and-sufficient rule.</Paragraph>
    <Paragraph position="8"> The need for multifunctional term banks. In keeping with the increasing emphasis on the shareability of lexical resources in general, term banks will have to aim at meeting the needs of more and more user types, including machines (Freibott and Heid 1990, McNaught 1990, Meyer 1991). Machine uses (e.g. machine translation, expert systems, NL interfaces to databases) will require very large quantities of explicitly represented conceptual information, since machines do not possess much of the world knowledge that humans know implicitly.</Paragraph>
    <Paragraph position="9"> Because of the important encyclopedic dimension of terminology, we feel that a term bank can be conceived as a kind of knowledge base, and we are currently in the process of designing a prototype knowledge-based term bank, called COGNITERM, in the Artificial Intelligence Laboratory of the University of Ottawa, Canada. COGNITERM will be constructed using a knowledge engineering tool called CODE (Conceptually Oriented Design Environment, Skuce et al. 1989), which has already been tested in two terminology-intensive environments, where a number of small knowledge bases (several hundred concepts) were constructed. Before discussing the research in progress for the COGNITERM project (Section 3), we will briefly describe some of the knowledge management needs that our research is aiming to fill.</Paragraph>
  </Section>
  <Section position="4" start_page="21" end_page="26" type="metho">
    <SectionTitle>
2. KNOWLEDGE MANAGEMENT NEEDS ACROSS THE
TERMINOLOGY SPECTRUM
</SectionTitle>
    <Paragraph position="0"> As explained above, the knowledge management problem in terminology is heightened by the fact that persons doing terminology need to manage both lexical-semantic and encyclopedic knowledge. This problem is further complicated by the fact that terminology is a very heterogeneous discipline, since the naming and description of specialized concepts can be carried out in a wide spectrum of working environments, dictating various types of knowledge management support. At one end of this spectrum is what we might call the most &quot;pure&quot; form of terminology, namely terminology practised as a distinct specialization. In this type of environment, we find persons officially designated as terminologists, often with professional training and/or certification in terminology (2), following a controlled methodology (3). At the other end of the spectrum we find a much more &quot;casual&quot; form of terminology as it is practised as a component of document-production. Here, the naming and description of concepts is carried out at various &quot;links&quot; in a &quot;chain&quot; of activities, which can include product design specification, technical writing (e.g. user manuals), revision, proofreading, translation, management information, etc.</Paragraph>
    <Paragraph position="1"> Normally, many of the persons involved in these activities have no specialized training in terminology, their methodology can be highly informal, and there may be no centralized repository for the terminological data.</Paragraph>
    <Paragraph position="2"> The technology and methodology we are developing for terminology-oriented knowledge management support are intended to be generic enough to be useful across the spectrum of terminology environments. The various knowledge management (KM) needs that characterize the two ends of the spectrum are examined in turn below.</Paragraph>
    <Section position="1" start_page="21" end_page="23" type="sub_section">
      <SectionTitle>
2.1 Terminology as a Distinct Specialization
</SectionTitle>
      <Paragraph position="0"> This type of environment is typified by organizations such as the Department of the Secretary of State of Canada, which has had an official terminology service since 1953, employing up to 80 staff terminologists preparing up to 4,000 terminology records a week (4). The mission of these terminologists is to facilitate the proper use of terms, in English and French, throughout the Public Service of Canada. To this end, terminologists maintain what is now the largest term bank in the world (about one million database records). They also prepare bilingual glossaries (which are often published) on subject areas requested by clients, and respond to inquiries from clients on specific problems. The terminological data that is collected can be conceptual or linguistic (5): conceptual data includes information such as subject-field labels, synonyms and antonyms, definitions, and equivalents in the second language; linguistic data includes information such as part-of-speech, morphological anomalies, usage labels, and idiomatic expressions. Terminologists in environments such as this one most often work thematically (6): in other words, they collect and describe (as exhaustively as practical constraints allow) the specialized terms used in a given field.</Paragraph>
      <Paragraph position="1"> The major challenge of terminology is conceptual, not linguistic: terminologists are trained in linguistics and thus are properly prepared for the linguistic dimension of their task; in contrast, they are not normally domain experts, yet they require a substantial amount of expert knowledge in order to do their work. In other words, the major difficulty is pinning down the meanings of terms. Compounding their problem is the fact that terminologists can be required to work in several fields simultaneously, or to change fields frequently depending on clients' needs.</Paragraph>
      <Paragraph position="2"> In the following paragraphs, we summarize (7) the four components of a terminologist's work in terms of the KM tasks on the one hand, and the roles of this knowledge in the production of terminology records on the other.</Paragraph>
      <Paragraph position="3"> KM tasks. Before any collection or analysis of terms can occur, terminologists must select the knowledge sources for the project. Given their linguistic orientation, they have traditionally preferred texts as knowledge sources, although the collaboration of experts is also highly valued. Before collecting the documentary corpus, terminologists acquire some general knowledge about the field by doing introductory reading in textbooks, encyclopedia articles, popularizing journals, etc. They begin to familiarize themselves with the general knowledge structures of the field, trying to determine its boundaries, subdivisions, and areas of overlap with other fields (for multidisciplinary fields). Often, at this stage, terminologists will sketch out these &quot;skeletal&quot; knowledge structures in the form of a concept network. They will also make mental or written notes on a number of individual concepts which emerge as being particularly important.</Paragraph>
      <Paragraph position="4"> Roles of knowledge. These preliminary KM activities are crucial to the selection of the documentary corpus since they help to clarify the project's scope: a clear idea of the conceptual boundaries of the field helps delimit the range of documentation to be sought. Determining areas of overlap with other fields also helps terminologists establish links with related documentation. When the terminologists are ready to begin the search for the documentary corpus, a clear idea of the major subfields helps them orient their work along a number of documentary &quot;paths&quot;, which may be prioritized according to users' needs. The names of subfields, of key concepts, and of the characteristics of these concepts help provide specific points of entry into the documentation. Having a general idea of the hierarchical structure of the field also helps orient the process of documentation selection since terminologists tend to proceed from general to more specific literature.</Paragraph>
      <Paragraph position="5"> Once a preliminary corpus is obtained, their general knowledge of the domain provides terminologists with a yardstick for judging its quality. It also helps them classify documentation according to subfield. In multilingual terminology, classification according to subfield is particularly important: to &quot;manage&quot; the large amounts of documentation to be scanned (see 2.1.2 below), terminologists very often work on one subfield at a time, in one language and then in the other, before proceeding to another subfield.</Paragraph>
      <Paragraph position="6"> Finally, these preliminary conceptual activities provide terminologists with the conceptual framework and basic terminology needed for communicating with librarians and other documentation resource persons, as well as with experts. Communication is particularly important in the case of experts (8), who tend to be very busy: if terminologists have done their &quot;homework&quot;, they will be able to direct the conversation in order to elicit the maximum information in a minimum amount of time. &quot;Starting out on the right foot&quot; in this way boosts terminologists' credibility with experts, and increases their chances of convincing experts to remain involved as the project advances.</Paragraph>
      <Paragraph position="7">  KM tasks. Once the documentation has been selected, it undergoes a process called scanning, i.e. careful reading, with the extraction (9) of potential terms along with their contexts (10). Additional research may be needed for specific problems (e.g. terms not found for concepts identified, terms with inadequate contexts), after which the data is organized by grouping the various instances of a term, noting obvious cases of synonymy, abbreviations, usage labels, etc. Through the scanning process, terminologists begin to analyze (11) the general knowledge structures of the field, fleshing out (whether on paper or in the mind) the skeletal concept network drafted during their background reading. They also begin to analyze the conceptual characteristics of individual terms (i.e. the terms found in the documentation), based on the contexts in which the terms appear.</Paragraph>
      <Paragraph position="8"> Roles of knowledge. Drawing on their general understanding of the domain, terminologists begin identifying the lexical items that are specific to their field. This process involves eliminating terms that would constitute &quot;noise&quot; in the terminology, i.e. lexical items that belong to general rather than specialized vocabulary, or terms that do not fall within the established boundaries of the field. As well, terminologists must identify what are known as &quot;silences,&quot; i.e. lacunae in the preliminary terminology. As terminologists prepare to finalize the nomenclature (i.e. to determine the terms for which records will be prepared) and decide which contexts will be retained for analysis (2.1.3), the conceptual framework acquired so far will help them continue to communicate about problem areas with documentation resource persons and domain experts.</Paragraph>
    </Section>
    <Section position="2" start_page="23" end_page="24" type="sub_section">
      <SectionTitle>
2.1.3 Preparation of term records
</SectionTitle>
      <Paragraph position="0"> KM tasks. Using the established terminology and associated contexts, terminologists can begin a systematic analysis of terms-in-context. The primary function of this analysis is to determine the meanings of the terms, although it also serves to identify other linguistic characteristics such as part-of-speech, gender, frequency, geographic origin, etc. The conceptual goal at this stage is to achieve the depth of understanding needed to complete the term records. Terminologists carefully analyze the various contexts in which the terms have been found in order to identify a certain number of conceptual characteristics for all concepts. These characteristics will then be compared with those of potentially related concepts (e.g. synonyms, equivalents in the other language) in order to determine those which are necessary for establishing a conceptual match.</Paragraph>
      <Paragraph position="1"> Roles of knowledge. The most important application of conceptual analysis is definition construction. If they are attempting the classic intensional (i.e. genus-differentia) definition, terminologists will need to compare the characteristics of a given concept with those of concepts at the same hierarchical level (i.e. with the characteristics of the co-ordinate concepts (12)) in order to determine the distinguishing characteristics (i.e. the differentia in an intensional definition).</Paragraph>
      <Paragraph position="2"> Relations other than the generic-specific (e.g. whole-part, cause-effect, tool-function) may also be analyzed and reflected in definitions.</Paragraph>
      <Paragraph position="3"> Conceptual analysis is also essential to identifying synonyms and equivalents in the second language. Identifying synonyms requires a careful comparison of conceptual characteristics in order to determine that these are indeed identical for the terms in question.</Paragraph>
      <Paragraph position="4">  When two concepts differ in only a very few (and not very significant) characteristics, they may be designated as pseudosynonyms (e.g. one concept may have one more characteristic than another, and thus be more specific). Establishing a conceptual match is also crucial to multilingual terminology work, which is complicated by the fact that conceptual structures often do not correspond perfectly from one language to another, resulting in cases of incomplete equivalence. Sometimes there may be no equivalent in the other language at all, resulting in the need to create a neologism (13). In this case, conceptual analysis is essential for determining whether the concept already exists within the current knowledge structures of the target language, and when it does, what its characteristics are (since the concept is so new, its characteristics, and consequently its location within the knowledge structures, may still be fluctuating). In many cases, an existing term will be adopted to designate the new concept, and conceptual analysis of the candidate terms is essential for determining which one possesses the greatest semantic compatibility with the new concept.</Paragraph>
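The comparisons of conceptual characteristics described above (finding the differentia for an intensional definition, and deciding between synonymy and pseudosynonymy) can be sketched with plain set operations. The following Python fragment is only an illustration and not a procedure given in this paper: the function names, the tolerance threshold used for pseudosynonymy, and the characteristic labels attached to fleuve and rivière are all assumptions made for the example.

```python
from typing import FrozenSet, Iterable

Characteristics = FrozenSet[str]


def differentia(concept: Characteristics,
                coordinates: Iterable[Characteristics]) -> Characteristics:
    """Characteristics of a concept that none of its co-ordinate concepts has."""
    shared = frozenset().union(*coordinates)
    return concept.difference(shared)


def compare_concepts(a: Characteristics, b: Characteristics,
                     tolerance: int = 1) -> str:
    """Classify two concepts by how many characteristics they do not share.

    The tolerance value is an assumed cut-off for "a very few" differences.
    """
    differing = a.symmetric_difference(b)
    if not differing:
        return "synonyms"
    if tolerance >= len(differing):
        return "pseudosynonyms"
    return "distinct concepts"


# Hypothetical characteristic sets, loosely based on the river example.
FLEUVE = frozenset({"natural", "flowing water", "empties into ocean"})
RIVIERE = frozenset({"natural", "flowing water", "empties into lake or river"})

if __name__ == "__main__":
    print(sorted(differentia(FLEUVE, [RIVIERE])))
    print(compare_concepts(FLEUVE, RIVIERE))
    print(compare_concepts(FLEUVE, FLEUVE.union({"navigable"})))
```

In practice the judgement that a differing characteristic is "not very significant" is made by the terminologist; a simple count such as the one above can only flag candidate pairs for closer analysis.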
      <Paragraph position="5"> KM tasks. Quality control can be achieved by two types of activity: revision and updating. On the one hand, before the project is completed, the various types of information collected by the terminologist are revised by domain experts and other terminologists (e.g. terminologists with experience in neighbouring or related fields, or more experienced terminologists). On the other hand, after the project is completed, a periodic updating of terminology records can occur whenever this is justified by changes and expansion in the domain. Revising the results of a terminology project involves analyzing and discussing specific conceptual problems identified by the experts and/or other terminologists. Periodic updating implies a monitoring of changes in knowledge structures and conceptual characteristics.</Paragraph>
      <Paragraph position="6"> Roles of knowledge. To facilitate revision, terminologists need a sound understanding of the domain in order to interpret feedback from experts, and to elicit information on this feedback (e.g. when terminologists do not understand feedback, when the feedback contradicts what the terminologists found, or when experts give conflicting feedback). Regarding updating, a clear understanding of the current state of the knowledge will give the terminologist a basis for comparison when new structures and conceptual characteristics emerge. Conceptual problems increase when a field is particularly large or has complex knowledge structures, or when the field is changing rapidly.</Paragraph>
    </Section>
    <Section position="3" start_page="24" end_page="25" type="sub_section">
      <SectionTitle>
2.2 Terminology as a Component of Document-Production
</SectionTitle>
      <Paragraph position="0"> By document-production, we mean a &quot;chain&quot; of writing activities that are carried out from the inception of a product (14) to the production of public (or widely available) written information about this product. The &quot;links&quot; in a document-production chain can be distributed throughout an organization, and the actual &quot;documents&quot; can be in various states of completion. These documents can include anything from product designers' rough personal notes, to intermediate &quot;current state&quot; documents used to coordinate members of a team, to &quot;official&quot; publications (e.g. technical manuals produced by technical writers), to translations of these official publications.</Paragraph>
      <Paragraph position="1"> Although there are usually no officially designated terminologists in this type of environment, terminology-intensive activities are pervasive nonetheless: concepts are described and named by persons such as product designers, technical writers, proofreaders, revisers, abstracters, management information specialists, public relations officers, and translators (15). Given the heterogeneity of this type of environment, the terminology-related KM problems are much more complex than they are in the &quot;purer&quot; form of terminology work described above. The following are just some of the issues that contribute to this complexity.</Paragraph>
      <Paragraph position="2">  Given the variety of people involved in document-production, this kind of environment typically exhibits a lack of consistent methodology for terminology work.</Paragraph>
      <Paragraph position="3"> This problem is particularly crucial at the early stages of document-production. For example, product designers carry a heavy burden of defining and naming concepts, but have no formal training (and very often, no interest!) in terminology. Terms that are chosen &quot;on the fly&quot; easily become entrenched, even though they may be inappropriate. Normally, this type of environment does not stress a methodology for assuring that terms are clearly described and logically named, nor that the consistent use of approved terminology is enforced.</Paragraph>
      <Paragraph position="4"> Given the number of people that can be involved in document-production, coordinating the various links in the chain is a fundamental problem. Writers in a given link in the chain may, for example, have trouble understanding what the originator of certain terms actually meant by them. If it is impossible to contact the originators of knowledge personally, the meanings of terms may have to be reconstructed from scant resources. Knowing that a given document will soon be passed on to another link in the chain, documentors are easily tempted not to resolve terminological problems that they have inherited, leading to a &quot;pass-my-confusion-onto-the-next-person&quot; phenomenon. A further complication is that documents do not flow in a one-way direction from inception to finalization; documentors, consequently, can be sent in loops. Common terminological problems resulting from this situation are inconsistency (terms being used to mean different things by different people), and overloading (terms used in too many different senses). Coordination is also complicated by the fact that concepts exist at different levels of &quot;clarity&quot; at the various links in the chain: at the initial design stage, they may still be quite fuzzy; by the time they are documented in some kind of &quot;official&quot; text, their conceptual characteristics should be (in principle, at least!) much clearer. From a terminological point of view, this conceptual fluidity means a continuous evolution of concept definitions and names from one link in the document-production chain to the next.</Paragraph>
    </Section>
    <Section position="4" start_page="25" end_page="26" type="sub_section">
      <SectionTitle>
2.2.3 Centralization of terminological data
</SectionTitle>
      <Paragraph position="0"> Most organizations do not maintain centralized repositories (e.g. term banks) of terminological data. When such repositories do exist, they often take the form of informal glossaries that may be out of date, not validated by experts and/or professional writers, and not used consistently throughout the organization. This state of affairs places a heavy onus on the documentor to find out who originated certain terms, what the terms mean, how they should be used in context, how they should be translated, and so on. The lack of centralization of terminological data (especially conceptual data) is particularly problematic for people who are at the end of the document-production chain - for example, the editors, proofreaders, and translators (16): they are the furthest away from the originators of concepts (and have the hardest time accessing these originators); the documents passed on to them are likely to have the greatest number of terminological problems (due to the &quot;pass-my-confusion-onto-the-next-person&quot; phenomenon mentioned above); and finally, these people usually have the least amount of domain expertise (editors, proofreaders and translators are typically language experts, not domain experts). A lack of centralized information about terms is also a drawback for newcomers to a project, since it forces them to acquire knowledge about terms almost from scratch.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="26" end_page="28" type="metho">
    <SectionTitle>
3. A GENERIC TOOL FOR TERMINOLOGY-ORIENTED
KNOWLEDGE MANAGEMENT
</SectionTitle>
    <Paragraph position="0"> As has been argued elsewhere (e.g. Ahmad et al. 1989, Czap and Nedobity 1990, Meyer and Paradis 1991, Parent 1989, Skuce and Meyer 1990a/b, Wijnands 1989), the knowledge management problems of terminology are not unique to this field. Rather, they are general problems of knowledge engineering that are now receiving extensive attention in the literature of AI. The AI research group at the University of Ottawa, Canada, has over the past few years developed a generic knowledge engineering tool called CODE (Conceptually Oriented Design Environment, Skuce et al. 1989), which is written in Smalltalk and runs on a Macintosh, 386 or UNIX platform. CODE can be described as a generic knowledge manager, designed to assist any person (including the non-expert) faced with the task of acquiring, formalizing, refining and accessing the knowledge structures of a specialized domain. CODE allows the user to construct a knowledge base which describes concepts in frame-like units called CDs (concept descriptors) that are normally, though not necessarily, arranged in inheritance hierarchies.</Paragraph>
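As a rough illustration of frame-like units arranged in an inheritance hierarchy, the following Python sketch models a concept descriptor whose properties are inherited from its superconcept unless overridden locally. It is not CODE's actual Smalltalk implementation: the class name, its fields, and the sample concepts are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConceptDescriptor:
    """A frame-like unit describing one concept (hypothetical structure)."""

    name: str
    parent: Optional["ConceptDescriptor"] = None
    properties: Dict[str, str] = field(default_factory=dict)

    def all_properties(self) -> Dict[str, str]:
        """Merge inherited properties with local ones; local values win."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}


# Invented sample hierarchy: machine -> printer -> laser printer.
machine = ConceptDescriptor("machine", properties={"kind": "artefact"})
printer = ConceptDescriptor(
    "printer", parent=machine, properties={"function": "printing"}
)
laser_printer = ConceptDescriptor(
    "laser printer", parent=printer, properties={"technology": "laser"}
)

if __name__ == "__main__":
    print(laser_printer.all_properties())
```

A descriptor with no parent simply reports its own properties, which matches the remark that CDs need not be arranged in a hierarchy.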
    <Paragraph position="1"> CODE has been tested in two terminology applications: a bilingual vocabulary project at the Department of the Secretary of State of Canada (Meyer and Paradis 1991, Skuce and Meyer 1990a/b) and a software documentation project at Bell Northern Research, the Canadian counterpart of Bell Labs (Skuce 1991). These two environments correspond to the two ends of the terminology spectrum described above. Based on what we learned during these experiences, we are now enhancing the system's terminological support in a new version of CODE (Version 4), expected to be operational in late 1991.</Paragraph>
    <Paragraph position="2"> Concurrently with system development, we are using CODE to build a prototype bilingual term bank, called COGNITERM, with a rich, highly structured and easily accessible knowledge component. In a nutshell, this term bank can be described as a hybrid between a traditional term bank (17) and a knowledge base.</Paragraph>
    <Paragraph position="3"> Since a general technical description of the current and forthcoming versions of CODE is found elsewhere (Skuce et al. 1989, Skuce and Meyer 1991), we shall just outline below some of the features that are receiving particular attention in light of the fact that we intend COGNITERM to facilitate the management of both lexical-semantic and encyclopedic information, and to be usable across the spectrum of terminology environments.</Paragraph>
    <Paragraph position="4"> User interface. Given the many different types of users that can be engaged in terminology-intensive activities, and the fact that we see a knowledge-based term bank as both a communication tool (e.g. between terminologists, between terminologists and experts, between the various &amp;quot;links&amp;quot; in a document-production &amp;quot;chain&amp;quot;) and a teaching tool, the user interface has been a top priority in system development from the start. Hence, the current version of CODE is already user-tailorable, i.e., the same knowledge base is accessible in different manners for different purposes. For example, a domain expert or a terminologist who is highly experienced in a domain will have a different set of options than a learner. In the current version of CODE, we have also placed a strong emphasis on graphical representation. The system can easily produce various types of semantic net diagrams, for both hierarchical and non-hierarchical relations. The graphical display updates automatically when changes are made to the knowledge base, and offers mechanisms for focussing on certain parts of the knowledge base, highlighting special concepts (e.g. concepts that are uncertain, unconfirmed, etc.), and comparing and contrasting knowledge substructures. In CODE Version 4, Hypercard-like bit map images will be available, so that one can ask of a term &amp;quot;show me one&amp;quot;, or of an image &amp;quot;what is this called?&amp;quot; Access to, and navigation through, the knowledge base. Since a knowledge base incorporates large amounts of encyclopedic information, and since different users will  require different information, it is important that the knowledge be easily accessible and navigable. A CODE knowledge base is essentially a hierarchically organized hypertext-like system, incorporating the notion of property inheritance. 
One may navigate in whatever manner is appropriate, with the retracing abilities typical of hypertext systems. Unlike traditional term banks, in which access is strictly terminological (i.e. one must know a term in order to get conceptual information about it), CODE allows conceptual characteristics to be entry-points into the knowledge, so that one can ask questions like &amp;quot;what is the term for the machine with function X?&amp;quot;, &amp;quot;what is the term for the material with physical properties X, Y, Z?&amp;quot; Access to, and navigation through, the knowledge is facilitated by the graphical component described above, and also through a browsing capability. In Version 4, the browser will use a basic window whose behaviour is modelled after an outline processor, with the ability to dynamically expand and contract tree-structures. The user can easily tailor the browser to suit a given need. To facilitate the use of terms as entry-points into the knowledge, the current version of CODE has a search/rename browser that permits scanning of the entire knowledge base for every occurrence of a term, and can be restricted to certain contexts (e.g. concept names, names of conceptual characteristics, descriptions of conceptual characteristics) to speed up the search. Version 4 will include a clearly defined set of terminological &amp;quot;status levels&amp;quot;, by which we mean attributes of a term such as how it is used (e.g. as a concept name or the name of a conceptual characteristic), whether it is defined or not, whether it is used in definitions but is not a knowledge base concept or property, etc.</Paragraph>
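The idea of using conceptual characteristics as entry-points, combined with property inheritance, can be illustrated with a minimal Python sketch. It is not CODE's implementation: the Concept class, the terms_with function, and the sample concepts are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A hypothetical concept node; not CODE's actual data model."""
    term: str
    parent: Optional["Concept"] = None
    own: dict = field(default_factory=dict)  # locally stated characteristics

    def characteristics(self) -> dict:
        """Return local characteristics merged over inherited ones."""
        inherited = self.parent.characteristics() if self.parent else {}
        return {**inherited, **self.own}


def terms_with(concepts, **wanted):
    """Return the terms of concepts whose characteristics match all of wanted."""
    return [
        c.term
        for c in concepts
        if all(c.characteristics().get(k) == v for k, v in wanted.items())
    ]


# Illustrative data only.
machine = Concept("machine", own={"kind": "artifact"})
printer = Concept("printer", parent=machine, own={"function": "printing"})
laser = Concept("laser printer", parent=printer, own={"technology": "laser"})
scanner = Concept("scanner", parent=machine, own={"function": "scanning"})
kb = [machine, printer, laser, scanner]

# "What is the term for the machine with function X?"
print(terms_with(kb, function="printing"))
# An inherited and a local characteristic can be combined in one question.
print(terms_with(kb, kind="artifact", technology="laser"))
```

Because characteristics are resolved through the parent chain at query time, a subconcept is found by a characteristic stated only on one of its superconcepts, which is the behaviour the hierarchical organization is meant to provide.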
    <Paragraph position="5"> Informal, trial-and-error knowledge experimentation. The system contains features, which we are still developing, for managing knowledge that is in different states of &amp;quot;clarity&amp;quot; (for want of a better term). Lack of clarity may be due to several causes: for example, a terminologist may be unclear about a concept because he/she does not have the domain expertise to understand it properly; a technical writer, translator, etc. in the document-production chain may be unclear about a concept because people at various preceding links in the chain have used a term inconsistently; a concept may be very new (e.g. in the case of neologisms) and thus intrinsically unclear; and so on. In all these situations, we find problems such as what to call a concept, what the superconcept is, what subconcepts it has, what characteristics it has, what the similar concepts are. CODE permits rapid entry of hunches, guesses, trials, etc., followed by experimentation with the consequences of entering new knowledge. For example, superconcept links may be changed on the graph just by dragging, and the consequences can be seen immediately in textual or graphical form. One may ask for &amp;quot;similar&amp;quot; concepts, or potential terminological conflicts. Previously made changes (up to three) can be discarded in one click.</Paragraph>
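One possible reading of the bounded discard facility (up to three previous changes) is a fixed-depth undo history over superconcept links. The Python sketch below assumes that reading; the TrialKB class and its methods are hypothetical and do not describe how CODE stores or reverts changes.

```python
from collections import deque


class TrialKB:
    """Hypothetical store of superconcept links with a bounded undo history."""

    def __init__(self, undo_depth=3):
        self.parent_of = {}
        # Only the most recent undo_depth changes are remembered.
        self._history = deque(maxlen=undo_depth)

    def set_parent(self, concept, parent):
        """Remember the previous link, then apply the tentative new one."""
        self._history.append((concept, self.parent_of.get(concept)))
        self.parent_of[concept] = parent

    def undo(self):
        """Discard the latest change; return False if no history is left."""
        if not self._history:
            return False
        concept, previous = self._history.pop()
        if previous is None:
            self.parent_of.pop(concept, None)
        else:
            self.parent_of[concept] = previous
        return True


# Illustrative use: try a guess, then withdraw it.
trial = TrialKB()
trial.set_parent("toner", "material")
trial.set_parent("toner", "ink")
trial.undo()
print(trial.parent_of["toner"])
```

A bounded deque keeps experimentation cheap: older changes silently become permanent, while the last few guesses remain reversible.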
    <Paragraph position="6"> Multidimensionality. It is well known that concepts and entire knowledge structures can be &amp;quot;seen&amp;quot; from various &amp;quot;viewpoints&amp;quot; (18), which correspond roughly to the needs or interests of the knowledge base user. CODE offers a &amp;quot;masking&amp;quot; facility that allows one to restrict what is visible in the knowledge base by Boolean conditions on concepts and characteristics. For example, different users might require different types of knowledge about a certain laboratory procedure. CODE allows one user to say &amp;quot;show me only things about this laboratory procedure related to the tools that are required&amp;quot;, and another to say &amp;quot;show me only things related to the types of organisms that the procedure can identify&amp;quot;. The masking facility also allows the notion of viewpoint to be extended to include a notion of depth of domain expertise. For example, the user may request information about the laboratory procedure that would be of interest (and understandable) to a beginning biology student, or to a seasoned researcher.</Paragraph>
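Masking by Boolean conditions on concepts and characteristics can be sketched as composable predicates. This is a minimal Python illustration under our own assumptions: the functions all_of, any_of, negate, and apply_mask, and the placeholder data, are hypothetical and are not taken from CODE.

```python
def all_of(*predicates):
    """AND: an item is visible only if every predicate accepts it."""
    return lambda concept, name, value: all(p(concept, name, value) for p in predicates)


def any_of(*predicates):
    """OR: an item is visible if at least one predicate accepts it."""
    return lambda concept, name, value: any(p(concept, name, value) for p in predicates)


def negate(predicate):
    """NOT: an item is visible only if the predicate rejects it."""
    return lambda concept, name, value: not predicate(concept, name, value)


def apply_mask(kb, mask):
    """Return a filtered copy of kb, a dict mapping concept to characteristics."""
    view = {}
    for concept, characteristics in kb.items():
        kept = {n: v for n, v in characteristics.items() if mask(concept, n, v)}
        if kept:
            view[concept] = kept
    return view


def about_tools(concept, name, value):
    """Condition on a characteristic: keep only required tools."""
    return name == "requires tool"


def about_p(concept, name, value):
    """Condition on a concept: keep only procedure P."""
    return concept == "procedure P"


# Illustrative data only; all names are placeholders.
kb = {
    "procedure P": {"requires tool": "tool A", "identifies": "organism B"},
    "procedure Q": {"requires tool": "tool C"},
}

# "Show me only things about this procedure related to the tools required."
print(apply_mask(kb, all_of(about_p, about_tools)))
```

The same mechanism could carry a depth-of-expertise viewpoint by adding a predicate over a level attribute, although how CODE encodes expertise levels is not stated in the text.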
    <Paragraph position="7"> Ranking of conceptual characteristics. We are currently investigating the usefulness of ranking characteristics according to where they fall in the lexical-semantic/encyclopedic continuum. For certain purposes (e.g. users with different levels of domain expertise), it may be useful to at least distinguish between characteristics that are necessary and sufficient, those that are encyclopedic but useful for establishing interlingual equivalence, and finally, all other encyclopedic characteristics. We are also investigating an algorithm proposed by Maybury (1990) for ranking characteristics according to concept similarity on the one hand (e.g. similarity of characteristics of co-ordinate concepts), and prototypicality on the other (e.g. the degree to which a concept's characteristics are reflected in its subordinate concepts), which offers the possibility of generating definitions of the genus-differentia type automatically.</Paragraph>
    <Paragraph position="8"> Multiple knowledge bases. Facilities for managing multiple knowledge bases (under development for Version 4) are required in order to work in multidisciplinary fields, and in order to work multilingually (since knowledge structures rarely correspond perfectly from one language to another). Both situations require support for isolating areas of correspondence and non-correspondence, and for comparing and contrasting. Multilingual work will require support for automatically generating some knowledge substructures (i.e.</Paragraph>
    <Paragraph position="9"> those that do correspond for the most part); eventually, this would involve a machine translation component. CODE already includes a general high-level ontology, which is being regularly refined; it will eventually serve as a basis for integrating knowledge bases. Quality control. The envisaged use of the system by various persons in a terminology environment necessitates a sophisticated capacity for quality control. CODE offers a capacity for detecting conceptual inconsistencies of various types, carrying out type checking, flagging entries as to source, entry person, date, state of correctness, etc.</Paragraph>
    <Paragraph position="10"> Database-like retrieval facilities permit queries such as &amp;quot;show me all entries about laser printers made by X since last month and not yet approved&amp;quot;. In order to ensure terminological consistency, CODE offers a number of features for assisting users in naming a conceptual characteristic (a common terminological problem in knowledge base building). The system can display all currently used names of similar properties (e.g. all properties belonging to the same category of property), and will prompt the user if a proposed property name has already been used elsewhere.</Paragraph>
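The retrieval query quoted above, and the check for an already-used property name, can be sketched in Python as follows. The Entry record, its field names, and both functions are assumptions made for illustration; the text does not specify how CODE represents entry flags or compares names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    """A hypothetical entry record carrying the flags mentioned in the text."""
    concept: str
    source: str
    entered_by: str
    entered_on: date
    approved: bool


def pending(entries, concept, entered_by, since):
    """Entries about a concept, made by one person since a date, not yet approved."""
    return [
        e for e in entries
        if e.concept == concept
        and e.entered_by == entered_by
        and e.entered_on >= since
        and not e.approved
    ]


def name_already_used(name, used_names):
    """Case- and whitespace-insensitive check for a reused property name."""
    def normalise(text):
        return " ".join(text.lower().split())
    return normalise(name) in {normalise(u) for u in used_names}


# Illustrative data only.
entries = [
    Entry("laser printer", "manual", "X", date(1991, 5, 20), approved=False),
    Entry("laser printer", "manual", "X", date(1991, 3, 2), approved=False),
    Entry("laser printer", "expert", "Y", date(1991, 5, 21), approved=False),
    Entry("laser printer", "manual", "X", date(1991, 5, 25), approved=True),
]

for entry in pending(entries, "laser printer", "X", since=date(1991, 5, 1)):
    print(entry)
print(name_already_used("Print  Speed", ["print speed", "resolution"]))
```

Normalising names before comparison is one simple way to catch near-duplicate property names; a real system might instead compare within a category of property, as the text suggests.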
  </Section>
</Paper>