<?xml version="1.0" standalone="yes"?>
<Paper uid="E95-1003">
  <Title>Criteria for Measuring Term Recognition</Title>
  <Section position="2" start_page="0" end_page="18" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In recent years, the automatic extraction of terms from running text has become a subject of growing interest. Practical applications such as dictionary, lexicon and thesaurus construction and maintenance, automatic indexing and machine translation have fuelled this interest. Given that concerns in automatic term recognition are practical, rather than theoretical, the lack of serious performance measurements in the published literature is surprising. Accounts of term-recognition systems sometimes consist of a purely descriptive statement of the advantages of a particular approach and make no attempt to measure the pay-off the proposed approach yields (David, 1990). Others produce partial figures without any clear statement of how they are derived (Otman, 1991). One of the best efforts to quantify the performance of a term-recognition system (Smadja, 1993) does so only for one processing stage, leaving unassessed the text-to-output performance of the system.</Paragraph>
    <Paragraph position="1"> While most automatic term-recognition systems developed to date have been experimental or in-house ones, a few systems like TermCruncher (Normand, 1993) are now being marketed. Both the developers and users of such systems would benefit greatly by clearly qualifying what each system aims to achieve, and precisely quantifying how closely the system comes to achieving its stated aim.</Paragraph>
    <Paragraph position="2"> Before discussing what a term-recognition system should be expected to recognize and how performance in recognition should be measured, two underlying premises should be made clear. Firstly, the automatic system is designed to recognize segments of text that, conventionally, have been manually identified by a terminologist, indexer, lexicographer or other trained individual. Secondly, the performance of automatic term-recognition systems is best measured against human performance for the same task. These premises mean that for any given application - terminological standardization and vocabulary compilation being the focus here - it is possible to measure the performance of an automatic term-recognition system, and the best yardstick for doing so is human performance.</Paragraph>
    <Paragraph position="3"> Section 2 below draws on the theory of terminology in order to qualify what a true term-recognition system must achieve and what, in the short term, such systems can be expected to achieve. Section 3 specifies how the established ratios used in information retrieval - recall and precision - can best be adapted for measuring the recognition of single- and multi-word noun terms.</Paragraph>
    <Paragraph position="4"> 2 What is to be Recognized? Depending upon the meaning given to the expression &amp;quot;term recognition&amp;quot;, it can be viewed as either a rather trivial, low-level processing task or one that is impossible to automate. A limited form of term recognition has been achieved using current techniques (Perron, 1991; Bourigault, 1994; Normand, 1993). To appreciate what current limitations are and what would be required to achieve full term recognition, it is useful to draw the distinction between &amp;quot;term&amp;quot; and &amp;quot;termform&amp;quot; on the one hand, and &amp;quot;term recognition&amp;quot; and &amp;quot;term interpretation&amp;quot; on the other.</Paragraph>
    <Section position="1" start_page="0" end_page="17" type="sub_section">
      <SectionTitle>
2.1 Term vs Termform
</SectionTitle>
      <Paragraph position="0"> Particularly in the computing community, there is a tendency to consider &amp;quot;terms&amp;quot; as strictly formal entities. Although usage among terminologists varies, a term is generally accepted as being the &amp;quot;designation of a defined concept in a special language by a linguistic expression&amp;quot; (ISO, 1988). A term is hence  the intersection between a conceptual realm (a defined semantic content) and a linguistic realm (an expression or termform) as illustrated in Figure 1.</Paragraph>
      <Paragraph position="1"> A term, thus conceived, cannot be polysemous although termforms can, and often do, have several meanings. As terms precisely defined in information processing, &amp;quot;virus&amp;quot; and &amp;quot;Trojan Horse&amp;quot; are unambiguous; as termforms they have other meanings in medicine and Greek mythology respectively.</Paragraph>
      <Paragraph position="2"> This view of a term has one very important consequence when discussing term recognition: term recognition cannot be carried out on purely formal grounds. It requires some level of linguistic analysis. Indeed, two term-formation processes do not result in new termforms: conversion and semantic drift 1. A third term-formation process, compression, can also result in a new meaning being associated with an existing termform 2.</Paragraph>
      <Paragraph position="3"> Proper attention to capitalization can generally result in the correct recognition of compressed forms.</Paragraph>
      <Paragraph position="4"> Part-of-speech tagging is required to detect new terms formed through conversion. This is quite feasible using statistical taggers like those of Garside (1987), Church (1988) or Foster (1991) which achieve performance upwards of 97% on unrestricted text. Terms formed through semantic drift are the wolves in sheep's clothing stealing through terminological pastures. They are well enough concealed to elude at times even the human reader, and no automatic term-recognition system has attempted to distinguish such terms, despite the prevalence of polysemy in such fields as the social sciences (Riggs, 1993) and the importance for purposes of terminological standardization that &amp;quot;deviant&amp;quot; usage be tracked. Footnote 1: Conversion occurs when a term is formed by a change in grammatical category. Verb-to-noun conversion commonly occurs for commands in programming or word processing (e.g. Undelete works if you catch your mistake quickly). Semantic drift involves a (sometimes subtle) change in meaning without any change in grammatical category (viz. &amp;quot;term&amp;quot; as understood in this paper vs the loose usage of &amp;quot;term&amp;quot; to mean &amp;quot;termform&amp;quot;). Footnote 2: Compression is the shortening of (usually complex) termforms to form acronyms or other initialisms. Thus PAD can designate either a resistive loss in an electrical circuit or a &amp;quot;packet assembler-disassembler&amp;quot;.</Paragraph>
      <Paragraph position="5"> Implementing a system to distinguish new meanings of established termforms would require analyzing discourse-level clues that an author is assigning a new meaning, and possibly require the application of pragmatic knowledge. Until such advanced levels of analysis can be practically implemented, &amp;quot;term recognition&amp;quot; will largely remain &amp;quot;termform recognition&amp;quot; and the failure to detect new terms in old termforms will remain a qualitative shortcoming of all term-recognition systems.</Paragraph>
    </Section>
    <Section position="2" start_page="17" end_page="18" type="sub_section">
      <SectionTitle>
2.2 Term Recognition vs Term Interpretation
</SectionTitle>
      <Paragraph position="0"> The vast majority of terms in published technical dictionaries and terminology standards are nouns.</Paragraph>
      <Paragraph position="1"> Furthermore, most terms have a complex termform, i.e. they are comprised of more than one word.</Paragraph>
      <Paragraph position="2"> Sublanguages create series of complex termforms in which complex forms serve as modifiers (natural language → [natural language] processing) and/or are themselves modified (applied [[natural language] processing]). In special language, complex termforms containing nested termforms, or significant subexpressions (Baudot, 1984), have hundreds of possible syntagmatic structures (Portelance, 1989; Lauriston, 1993). The challenge facing developers of term-recognition systems consists in determining the syntactic and conceptual unity that complex nominals must possess in order to achieve termhood 3. Another, and it will be argued far more ambitious, undertaking is term interpretation. Leonard (1984), Finen (1985) and others have attempted to devise systems that can produce a gloss explicating the semantic relationship that holds between the constituents of complex nominals (e.g. family estate → estate owned by a family). Such attempts at achieving even limited &amp;quot;interpretation&amp;quot; result in large sets of possible relationships but fail to account for all compounds. Furthermore, they have generally been restricted to termforms with two constituents. For complex termforms with three or more constituents, merely identifying how constituents are nested, i.e., between which constituents there exists a semantic relationship, can be difficult to automate (Sparck-Jones, 1985).</Paragraph>
      <Paragraph position="3"> In most cases, however, term recognition can be achieved without interpreting the meaning of the term and without analyzing the internal structure of complex termforms. Many term-recognition systems like TERMINO (David, 1990), the noun-phrase detector of LOGOS (Logos, 1987), LEXTER (Bourigault, 1994), etc., nevertheless attempt to recognize nested termforms. Footnote 3: In this respect, complex termforms, unlike collocations, must designate definable nodes of the conceptual system of an area of specialized human activity. Hence general trend may be as strong a collocation as general election, and yet only the latter be considered a term.</Paragraph>
      <Paragraph position="4"> Encountering &amp;quot;automatic protection switching equipment&amp;quot;, systems adopting this approach would produce as output several nested termforms (switching equipment, protection switching, protection switching equipment, automatic protection, automatic protection switching) as well as the maximal termform automatic protection switching equipment. Because such systems list nested termforms in the absence of higher-level analysis, many erroneous &amp;quot;terms&amp;quot; are generated.</Paragraph>
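      <Paragraph> The proliferation of nested termforms can be illustrated with a minimal sketch. It assumes, purely for illustration, that a candidate nested termform is any contiguous sequence of two or more words inside the maximal termform; this is not a description of how TERMINO, LOGOS or LEXTER actually operate, and the function names are hypothetical.

```python
# Illustrative sketch only. Assumption: every contiguous word sequence of
# length two or more counts as a candidate termform. This is NOT the method
# of any system cited in the text.

def candidate_termforms(maximal: str, min_len: int = 2) -> list[str]:
    """Return every contiguous sub-sequence of at least min_len words."""
    words = maximal.split()
    n = len(words)
    return [
        " ".join(words[i:i + k])
        for k in range(min_len, n + 1)   # candidate length in words
        for i in range(n - k + 1)        # start position of the candidate
    ]


def nested_termforms(maximal: str) -> list[str]:
    """Candidates other than the maximal termform itself."""
    return [c for c in candidate_termforms(maximal) if c != maximal]


maximal = "automatic protection switching equipment"
nested = nested_termforms(maximal)
for termform in nested:
    print(termform)
```

For the four-word example, the sketch yields the same five nested termforms listed above in addition to the maximal termform. The number of such candidates grows quadratically with the length of the termform, which suggests why listing them without higher-level analysis produces many erroneous &amp;quot;terms&amp;quot;.</Paragraph>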
      <Paragraph position="5"> It has been argued previously on pragmatic grounds (Lauriston, 1994) that a safer approach is to detect only the maximal termform. It could further be said that doing so is theoretically sound.</Paragraph>
      <Paragraph position="6"> Nesting termforms is a means by which an author achieves transparency. Once nested, however, a termform no longer fulfills the naming function. It serves as a mnemonic device. In different languages, different nested termforms are sometimes selected to perform this mnemonic function (e.g. on-line credit card checking, for which a documented French equivalent is vérification de crédit au point de vente, literally &amp;quot;point-of-sale credit verification&amp;quot;). Only the maximal termform refers to the designated concept and thus only recognition of the maximal termform constitutes term recognition 4.</Paragraph>
      <Paragraph position="7"> Term interpretation may be required, however, to correctly delimit complex termforms combined by means of conjunctions. Consider the following three conjunctive expressions taken from telecommunication texts: (1) buffer content and packet delay distributions; (2) mean misframe and frame detection times; (3) generalized intersymbol-interference and jitter-free modulated signals. Even the uninitiated reader would probably be inclined to interpret, correctly, that expression (1) is a combination of two complex termforms: buffer content distribution and packet delay distribution. Syntax or coarse semantics do nothing, however, to prevent an incorrect reading: buffer content delay distribution and buffer packet delay distribution. Expression (2) consists of words having the same sequence of grammatical categories as expression (1), but in which this second reading is, in fact, correct: mean misframe detection time and mean frame detection time. Although rather similar to the first two, conjunctive expression (3) is a single term, sometimes designated by the initialism GIJF.</Paragraph>
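      <Paragraph> The structural ambiguity of expressions (1) and (2) can be made concrete with a small sketch. It assumes a purely positional model in which some leading words of the left conjunct and some trailing words of the right conjunct may be shared by both conjuncts; the helper name conjunct_readings is hypothetical, and plural forms are left as they appear in the expression rather than being singularized.

```python
# Hypothetical helper for illustration. It enumerates positional
# reconstructions of "LEFT and RIGHT" and makes no attempt to choose among
# them; choosing is the term-interpretation step discussed in the text.

def conjunct_readings(expression: str) -> list[tuple[str, str]]:
    """List (first conjunct, second conjunct) pairs for each sharing pattern."""
    left, right = (part.split() for part in expression.split(" and ", 1))
    readings = []
    # shared_prefix: leading words of the left conjunct that also modify the right one
    for shared_prefix in range(len(left)):
        # shared_suffix: trailing words of the right conjunct that also close the left one
        for shared_suffix in range(1, len(right)):
            first = left + right[-shared_suffix:]
            second = left[:shared_prefix] + right
            readings.append((" ".join(first), " ".join(second)))
    return readings


readings_1 = conjunct_readings("buffer content and packet delay distributions")
readings_2 = conjunct_readings("mean misframe and frame detection times")
for first, second in readings_1:
    print(first, "|", second)
```

Under these assumptions each expression receives four positional readings. The correct reading of expression (1) and the correct reading of expression (2) correspond to different sharing patterns, and the single-term reading of expression (3) lies outside the enumeration altogether, so nothing in the word sequence itself selects the right analysis.</Paragraph>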
      <Paragraph position="8"> Complex termforms appearing in conjunctive expressions may thus require term interpretation for proper term recognition, i.e. reconstructing the conjuncts. Footnote 4: This does not imply that analyzing the internal structure of complex termforms is valueless. It has the very important, but distinct, value of providing clues to paradigmatic relationships between terms.</Paragraph>
      <Paragraph position="9"> If term recognition is to be carried out independently of and prior to term interpretation, as is presently feasible, then it can only be properly seen as &amp;quot;maximal termform recognition&amp;quot; with the meaning of &amp;quot;maximal termform&amp;quot; extended to include the outermost bracketing of structurally ambiguous conjunctive expressions like the three examples above.</Paragraph>
      <Paragraph position="10"> This extension in meaning is not a matter of theoretical soundness but simply of practical necessity. In summary, current systems recognize termforms but lack mechanisms to detect new terms resulting from several term-formation processes, particularly semantic drift. Under these circumstances, it is best to admit that &amp;quot;termform recognition&amp;quot; is the currently feasible objective and to measure performance in achieving it. Furthermore, since the nested structures of complex termforms perform a mnemonic rather than a naming function, it is theoretically unsound for an automatic term-recognition system to present them as terms. For purposes of measurement and comparison, &amp;quot;term recognition&amp;quot; should thus be regarded as &amp;quot;maximal termform recognition&amp;quot;. Once this goal has been reliably achieved, the output of a term-recognition system could feed a future &amp;quot;term interpreter&amp;quot;, which would also be required to recognize terms in ambiguous conjunctive expressions.</Paragraph>
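      <Paragraph> As a rough sketch of what measuring maximal termform recognition against a human yardstick might look like, the following applies the standard information-retrieval definitions of recall and precision to exact-match sets of termforms. This is an assumption made for illustration and not the adapted measures developed later in the paper; the function name and both sets of termforms are hypothetical.

```python
# Sketch under stated assumptions: standard recall and precision over
# exact-match sets. The data below are invented for illustration only.

def recall_precision(system_output: set[str],
                     human_reference: set[str]) -> tuple[float, float]:
    """Compare system termforms with those identified by a human."""
    correct = system_output.intersection(human_reference)
    # Recall: share of human-identified termforms that the system found.
    recall = len(correct) / len(human_reference) if human_reference else 0.0
    # Precision: share of system termforms that the human also identified.
    precision = len(correct) / len(system_output) if system_output else 0.0
    return recall, precision


human = {"automatic protection switching equipment", "packet delay distribution"}
system = {"automatic protection switching equipment",
          "protection switching",
          "switching equipment"}
recall, precision = recall_precision(system, human)
print(recall, precision)
```

In this invented example the two nested termforms reported by the system are counted against precision, which is consistent with treating only the maximal termform as a recognized term.</Paragraph>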
    </Section>
  </Section>
</Paper>