File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/w04-1801_intro.xml
Size: 19,553 bytes
Last Modified: 2025-10-06 14:02:38
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1801"> <Title>A Lexico-semantic Approach to the Structuring of Terminology</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Recent literature in terminology circles constantly reminds us that methods and practices have changed drastically due mostly to the extensive use of electronic corpora and computer applications. What might appear as normal and standard in computational circles has had profound consequences for terminologists; this has led many to criticize traditional theoretical principles and some to propose new approaches (Bourigault and Slodzian 1999; Cabre, 2003, among others; see L'Homme et al., 2003 for a review).</Paragraph> <Paragraph position="1"> One of the issues at the centre of this debate is that of diverging views on the relationship between the term and the abstract entity it is supposed to represent (a &quot;concept&quot; or a &quot;meaning&quot;). Differing views will inevitably lead to very different ways of envisaging terms and methods of structuring them.</Paragraph> <Paragraph position="2"> Some might be compatible with a given application, while others are much more difficult to accommodate.</Paragraph> <Paragraph position="3"> In this paper, I will try to demonstrate some of the methodological consequences of adopting a conceptual approach or a lexico-semantic approach to terminology structuring. These observations are drawn from my experience in compiling specialized dictionaries using corpora as primary sources and computer applications to exploit them.</Paragraph> <Paragraph position="4"> Even though the application I am familiar with is very specific and obviously influences my view on the structuring of terms, I believe this topic is also relevant for other terminology-related applications. 
For example, in computational terminology, there is an increasing interest in structuring extracted terms (see the articles in Daille et al., 2004 and in Nazarenko and Hamon, 2002, among others).</Paragraph> <Paragraph position="5"> Automatic term structuring is carried out by considering morphological variants (Daille, 2001; Grabar and Zweigenbaum, 2004), performing distributional analysis to build classes of semantically related terms (Nazarenko et al., 2001, among others), or acquiring other types of linguistic units, such as collocations or verbal phrases, from specialized corpora.</Paragraph> <Paragraph position="6"> These questions will be addressed from a linguistic point of view, but many have been dealt with directly or indirectly by computational terminologists and, in fact, are often raised by their work on specialized corpora. I will also try to demonstrate that the problems dealt with in this paper are by no means a reflection of a tendency often attributed to linguists to make things more complicated than they actually are. I would like to show that they are a reflection of the functioning of terms in running text.</Paragraph> <Paragraph position="7"> 2 Two different approaches to terminology The conceptual approach I describe is the one advocated by the Vienna School of terminology, which has been, and still is, applied to work carried out by terminologists. The results of its analyses are encoded in term records in term banks or in articles in terminological dictionaries.</Paragraph> <Paragraph position="8"> CompuTerm 2004 - 3rd International Workshop on Computational Terminology 7 The lexico-semantic approach on which my discussion is based is Explanatory and Combinatorial Lexicology (ECL) (Mel'čuk et al., 1995; Mel'čuk et al. 1984-1999), which is the lexicological component of the Meaning-Text Theory (MTT). 
As will be seen below, ECL provides an apparatus, namely lexical functions (LFs), that can capture a wide variety of semantic relations between lexical units. ECL descriptions are encoded in an Explanatory and Combinatorial Dictionary (ECD) (Mel'čuk et al. 1984-1999).</Paragraph> <Paragraph position="9"> In order to illustrate the methodological consequences of the two approaches under consideration, I will use a basic term in the field of computing, i.e., program. This term was chosen because no one will question its status in computing, whatever his or her view of terms and terminology.</Paragraph> <Paragraph position="10"> In addition, like many basic terms, program is polysemous, ambiguous in some contexts, and semantically related to several other terms. It is therefore well suited to showing the variety of semantic relationships in which terminological units participate. Finally, program does not refer to a concrete object. Hence, its analysis will pose problems different from those raised by terms like printer or computer.</Paragraph> <Paragraph position="11"> I will also frequently refer to a corpus from which my observations are derived. This corpus contains over 53 different texts and amounts to 600,000 words. It was compiled by the terminology team within the group Observatoire de linguistique Sens-Texte (OLST) in Montreal.</Paragraph> <Paragraph position="12"> Since I am not an expert in computer science, I must rely - like other terminologists - on information provided in a corpus and not on previous knowledge to analyze the meaning of program and the other terms to which it is related.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 A conceptual approach to the processing of the term program </SectionTitle> <Paragraph position="0"> When considering a unit such as program, terminologists who adhere to a conceptual approach will define its place within a conceptual structure. 
This is done by considering its characteristics (in fact, often by deciding which ones are relevant), and by analyzing classical relationships, such as hyperonymy (or, rather, generic-specific) and meronymy (or whole-part). In order to achieve this, terminologists usually gather information from reliable corpora.</Paragraph> <Paragraph position="1"> The corpus first informs us that &quot;program&quot; can be subdivided into the following categories: 1. &quot;operating system&quot;; 2. &quot;application software&quot;, e.g., &quot;word processor&quot;, &quot;spreadsheet&quot;, &quot;desktop publishing software&quot;, &quot;browser&quot;, etc.; and 3. &quot;utility program&quot;. It also tells us that there are different types of &quot;programs&quot;: 1. &quot;shareware programs&quot;, &quot;freeware programs&quot;, &quot;educational programs&quot;, and &quot;commercial programs&quot;; 2.</Paragraph> <Paragraph position="2"> &quot;command-driven programs&quot; and &quot;menu-driven programs&quot;.</Paragraph> <Paragraph position="3"> One possible representation of these relationships has been reproduced in Figure 1. Of course, my interpretation of the data listed above is simplified, since it does not take into account all the relationships that can be inferred from it (e.g., the fact that software programs or educational programs can be menu-driven). 
Also, part-whole relationships for some of these subdivisions can be identified (e.g., the fact that programs - classified according to the interface - have parts such as menus, windows, buttons, options, etc.).</Paragraph> <Paragraph position="4"> [Figure 1 (not reproduced). Recoverable labels: program, according to the task or tasks to perform; caption fragment: between &quot;program&quot; and related concepts.] For the time being, I will assume that I have solved the problems related to the relations between &quot;program&quot; and other relevant concepts (which, in fact, is not the case, as we will see below).</Paragraph> <Paragraph position="5"> The corpus also allows me to observe that the concept I am currently dealing with has different names: program and software program. This will normally be dealt with in conceptual representations by taking for granted that all these different linguistic forms refer to the same concept, and thus are true synonyms. In my representation, they will be attached to the same node as &quot;program&quot; (see Figure 2). [1] Furthermore, since concepts and conceptual representations are considered to be language-independent, their description and representation should be valid for all languages. Hence, my representation system should apply to French (and to true synonyms in French) and other languages (see Figure 2).</Paragraph> <Paragraph position="6"> [Figure 2 (not reproduced). Recoverable labels: program (program; software program) (Fr. logiciel), according to the task or tasks to perform.] Regarding this last issue, a choice must often be made between several potential synonyms in order to select a single identifier for a concept. This choice can simply be functional (allowing the labelling of a node in a representation such as that in Figure 1) or result from standardizing efforts. 
The choice of a unique identifier is central in conceptual analyses, since relationships are defined first and foremost between concepts and are considered to be valid for the linguistic forms that label them.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Other issues related to the analysis of program </SectionTitle> <Paragraph position="0"> In my discussion of the processing of program, I deliberately avoided other important issues revealed by the data contained in the corpus. We will look at some of these issues in this section. First, &quot;programs&quot; can be further classified according to the language used to create them (&quot;C programs&quot;, &quot;C++ programs&quot;, &quot;Java programs&quot;), or according to the hardware device they manage (&quot;BIOS program&quot;, &quot;boot program&quot;). [Footnote 1: Large-scale ontologies represent concepts and lexical forms using a similar strategy. For example, the Unified Medical Language System (UMLS) (National Library of Medicine, 2004) makes a clear separation between a Semantic Network and a Lexicon.]</Paragraph> <Paragraph position="1"> Incidentally, in French, the first subdivision (the one represented in section 2.1) corresponds to logiciel. The subdivisions just introduced are named programme.</Paragraph> <Paragraph position="2"> This obviously has consequences for the representation of program produced above. The problem can be solved in conceptual approaches by: a. Considering that program refers to a single concept, and trying to account for the different ways of organizing its relationships with other concepts with new conceptual subdivisions. This will produce a very complex, yet possible, graphical representation; b. Focussing on a single organization of the concept &quot;program&quot; (for example, the one chosen in section 2.1) and defining the others as being related to vague or improper uses of program; or, finally, c. 
Saying that program is associated with two or three different concepts, and possibly classifying them into three different subfields of computing, i.e., concept1 = micro-computing; concept2 = programming; concept3 = hardware. If the description is carried out in a multilingual context, the subdivision will be necessary to account for the fact that, in French, for instance, program can be translated by logiciel or programme. This latter choice is the one that is closest to the distinctions made with the lexico-semantic approach dealt with in the following section.</Paragraph> <Paragraph position="3"> Secondly, program shares with other lexical units many semantic relationships other than the taxonomic and meronymic relations previously considered. All the relationships listed below have been found in the corpus. [2] * Relationships that involve activities and that are expressed linguistically mostly by collocates of program: Function: a program performs tasks; Creation: development, creation of a program, programming; Actions that can be carried out on programs: configuration, installation, running, aborting, etc.</Paragraph> <Paragraph position="4"> [Footnote 2: Some of these have been listed in Sager (1990), who argued that a large variety of conceptual relationships could be found in specialized subject fields.]</Paragraph> <Paragraph position="5"> * Relationships that involve properties and that are also expressed linguistically by collocates of program: powerful program, user-friendly program; feature of a program. * Argument or circumstantial relationships: Agent: user of a program, programmer; Instrument: create a program with a language; Location: install the program on the hard disk, on the computer. * Other relationships expressed by morphological derivatives, i.e., terms that include the meaning of program: programming, programmable, reprogrammable. Most relationships listed above are 
non-hierarchical and may be expressed by parts of speech other than nouns. Consider, for example, actions that can be performed on a program (configuration, configure; installation, install; etc.). [3] Some will be very difficult to account for in terms of conceptual representations. Of course, conceptual-approach advocates might argue that these relationships are not relevant for terminology.</Paragraph> <Paragraph position="6"> Thirdly, in my discussion of the fact that concepts could have different names, I mentioned only one synonym, but concepts are expressed in a variety of forms in corpora. Many of these will not take the form of nouns.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 A lexico-semantic approach </SectionTitle> <Paragraph position="0"> In this section, I repeat my analysis of program, this time using a lexico-semantic approach. This approach is also based on data gathered from corpora. The discussion presented in this section is summarized in Table 1.</Paragraph> <Paragraph position="1"> First, the analysis of program in the corpus reveals that it has three different meanings.</Paragraph> <Paragraph position="2"> Program can be defined as: 1) a set of instructions written by a programmer in a given programming language in order to solve a problem (this meaning is also conveyed by computer program); 2) a set of programs (in sense 1) a user installs and runs on his computer to perform a number of tasks (this meaning being also conveyed by software program); and 3) a small set of instructions designed to run a specific piece of hardware.</Paragraph> <Paragraph position="3"> [Footnote 3: Another non-hierarchical relationship has received a lot of attention recently, that of cause-effect.]</Paragraph> <Paragraph position="4"> This sense distinction is validated by the fact that program can be related to different series of lexical units.</Paragraph> <Paragraph position="5"> For example, a program1 is 
something that someone, called a programmer, writes, executes, compiles and debugs. It can be machine-readable or human-readable. It can also end or terminate.</Paragraph> <Paragraph position="6"> Program can be modified by names given to languages, e.g., C program, C++ program, Java program. Finally, it can also have parts such as modules, routines, and instructions.</Paragraph> <Paragraph position="7"> A program2 is something a user installs on his computer, loads into the memory, runs, and sometimes uninstalls. Different sorts of programs can be identified, such as operating systems, applications, and utilities. Programs can have parts such as windows, menus, options, etc. Finally, a program2 can be user-friendly.</Paragraph> <Paragraph position="8"> A program3 consists of a few lines of code written in order to specify the behaviour of a specific hardware device, such as a memory. The device is then said to be programmable and/or reprogrammable. It can be programmed and reprogrammed.</Paragraph> <Paragraph position="9"> In this lexico-semantic approach, the relationships observed between program and other terms are attached to its specific meanings. This distinction allows us to relate other terms to specific senses. For example, program1 is related to other senses as follows: Synonym: computer ~; Types of programs: C ~, Java ~; Parts of programs: instruction, page, segment, line, routine; Creation of a program: write ~, create ~, to program, programming; Agent: programmer; Cause a program to function: execute ~; The program stops functioning: ~ ends, ~ terminates; etc.</Paragraph> <Paragraph position="10"> Since most semantic relationships are non-hierarchical, they can be represented in a relational model. In ECL, paradigmatic and syntagmatic semantic relations are represented by means of a single formalism, i.e., lexical functions (LFs). 
LFs are used to capture abstract and general senses that remain valid for a large number of lexical units. The relationships listed above could be formalized as follows: [4] synonym: Syn(program1) = computer ~; agent of a program: S1(program1) = programmer; create a program: CausFunc0(program1) = create [DET ~], write [DET ~]; Cause a program to function:</Paragraph> <Paragraph position="12"> Some authors have proposed LFs especially designed to represent these relations (Spec for hyponymy and Part for meronymy). However, ECL prefers to account for these relationships with non-standard lexical functions in order to explain the specific nature of the relationships between a lexical unit and its meronym.</Paragraph> <Paragraph position="13"> 3 General comments on the analyses of terms These two brief analyses of program reveal the following about terms: * Terms can convey multiple meanings. This is not an accidental property that only affects program. Numerous examples can be found in corpora and have been dealt with in recent literature. This, of course, has important consequences for both conceptual and lexico-semantic approaches.</Paragraph> <Paragraph position="14"> * Terms can enter into a large variety of relationships with other terms, and not only taxonomic or meronymic relationships. 
Understanding these relationships is necessary to capture sense distinctions; in addition, relationships are valid for a specific meaning.</Paragraph> <Paragraph position="15"> * Some of the relationships observed between terms are hierarchical: hyperonymy and meronymy.</Paragraph> <Paragraph position="16"> * Most semantic relationships are non-hierarchical: e.g., actions carried out by terms, properties, cause-effect.</Paragraph> <Paragraph position="17"> * Some relationships involve lexical units other than nouns: e.g., actions and creation are often expressed linguistically by means of verbs; properties are expressed by adjectives.</Paragraph> <Paragraph position="18"> * Most relationships involve terms considered as linguistic units rather than labels for concepts: e.g., morphological derivatives.</Paragraph> <Paragraph position="19"> In fact, what these observations tend to show is that terms behave like other lexical units and must be dealt with accordingly. Terms will acquire their specificity through a given application with set objectives, but as units occurring in corpora, terms cannot be differentiated from other lexical units.</Paragraph> </Section> </Section> </Paper>