<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1042"> <Title>TERMSERVICE - AN AUTOMATED SYSTEM FOR TERMINOLOGY SERVICES</Title> <Section position="3" start_page="0" end_page="265" type="metho"> <SectionTitle> BACKGROUND OF THE PROJECT TERMSERVICE </SectionTitle> <Paragraph position="0"> Nobody working in a scientific or technical field can do without information published in other languages. The need for translation is therefore enormous, which means that for each language efforts must be made to compile the correct terminology.</Paragraph> <Paragraph position="1"> The latter, in its turn, needs to be standardized.</Paragraph> <Paragraph position="2"> As elsewhere, in our country translators and specialists in scientific or technical fields are asking to have the terminology relevant to their work made available in foreign languages.</Paragraph> <Paragraph position="3"> Moreover, conventional dictionaries, provided they exist, cause a lot of problems and often turn out to be an inefficient alternative. This is mainly due to the rather long publication times, which make a specialized dictionary inadequate by the time it is finally published, especially in rapidly developing fields where the growth of terminology is most pronounced. Besides, a conventional dictionary, as a rule, makes no provision for updating its files, notably on the basis of feedback from users.</Paragraph> 266 B. NIKOLOVA and I. NENOVA <Paragraph position="4"> The only technique which offers enormous storage capacity and fast retrieval of terminology is computerized data processing. 
A well-designed terminology bank offers ample opportunities to resolve the difficulties that the translator or subject specialist meets in dealing with scientific or technical texts.</Paragraph> <Paragraph position="5"> To meet these needs, the Laboratory of Mathematical Linguistics at the Institute of Mathematics with Computer Centre affiliated to the Bulgarian Academy of Sciences has set itself a number of short, medium and long-term aims, designed to lead to the setting up of automated terminology services - the project TERMSERVICE.</Paragraph> </Section> <Section position="4" start_page="265" end_page="265" type="metho"> <SectionTitle> FUNCTIONS AND USERS OF THE SYSTEM TERMSERVICE </SectionTitle> <Paragraph position="0"> The system TERMSERVICE is designed to be used in the following environments: - in the computer-aided translation environment, the terminological database can be used by human translators and specialists in scientific and technical fields as a computer-aided multilingual dictionary; - in the terminological environment, the database provides sufficient linguistic information to conduct research on terminology and to standardize terms, abbreviations and acronyms in several languages.</Paragraph> <Paragraph position="1"> As secondary use environments we could mention: - machine translation systems, which can adapt and incorporate the terminological database to serve their aims in translating natural language documents; - computer-aided instruction in foreign languages.</Paragraph> </Section> <Section position="5" start_page="265" end_page="265" type="metho"> <SectionTitle> ACQUISITION OF TERMINOLOGY </SectionTitle> <Paragraph position="0"> Dictionary-making is a familiar trade with established traditional procedures for compiling the lexical files to be included in a printed volume. The traditionally high cost of developing user-oriented specialized terminology is also well known. 
This task is quite labour-consuming and requires team efforts from a large number of translators, terminologists and lexicographers.</Paragraph> <Paragraph position="1"> As far as the traditional methods of lexicography are concerned, the computer can help in alphabetizing and updating the files, which is a comparatively elementary level of support. The literature notes the following ways of establishing linguistic material for the development of terminological databases: - analysis of original documents in each language, the comparative study of these documents giving real equivalents of professional language usage; - compilation of terminology by specialized institutions; - inclusion of terminology contained in conventional dictionaries and other reference materials; - interaction between currently existing databases and terminology exchanges; - techniques other than by means of text, as, for instance, experiment, inquiry, introspection.</Paragraph> </Section> <Section position="6" start_page="265" end_page="265" type="metho"> <SectionTitle> PARALLEL TEXTS ANALYSIS </SectionTitle> <Paragraph position="0"> We have chosen to extract terminology from previously translated material by the method of parallel texts analysis. The idea was taken up from existing automated systems for lexicographic services developed in the USSR by the All-Union Centre for Translation of Scientific and Technical Literature and Documentation, Moscow.</Paragraph> <Paragraph position="1"> Preference is given to this method since it yields the most reliable and genuine results about the state of the art of terminology and allows the lexicographical work to be performed mostly automatically.</Paragraph> <Paragraph position="2"> An original text in one language and its translation in another are termed parallel texts. 
The task is to separate the items of translation in the original text, to find their equivalents in the target-language translation, to establish a one-to-one correspondence between them and to record that correspondence in a correspondence file. We have accepted the following definition for &quot;an item of translation&quot;: an item of translation is the minimal language unit of the original text which must be translated as a whole, in the sense that the translated text contains no language units that reproduce the meanings of the components of the given item, in case it has any.</Paragraph> <Paragraph position="3"> Some authors point out that 100-200 thousand wordforms have to be processed before the number of new terms to be included in the dictionary decreases sharply.</Paragraph> <Paragraph position="4"> THE PROGRAM PACKAGE AND THE RESULTS The program package for automated compilation of terminology in English and Bulgarian contains 8 programs for processing the parallel texts, and for compilation, maintenance and usage of a machine dictionary of English terms and their equivalents in Bulgarian. The programs are written in PL/I. Original English texts from scientific papers and their translations in Bulgarian serve as the initial linguistic corpus for lexicographic purposes. 
The aim is to process the texts so that the labour of the linguist in compiling the machine dictionary is reduced as far as possible, thus obtaining in a short time a dictionary that covers most of the terminology relevant to the chosen scientific field.</Paragraph> <Paragraph position="5"> The output from the operation of the program package is as follows: - the texts of the original paper in English and its translation in Bulgarian in a form suitable for the coding of the translation equivalents; - a dictionary-concordance containing wordforms from the text, selected by the linguist, together with the neighbouring context within boundaries also specified by the linguist. The contextual examples have the advantage of giving each term a more precise meaning than if it were isolated, and serve to remove polysemy; - a dictionary of the wordforms from the original text and all of their equivalents that occur in the Bulgarian translation. The package allows terms and their target-language equivalents to be introduced in explicit form, as well as separate items to be excluded from the dictionary.</Paragraph> <Paragraph position="6"> Parallel texts analysis is our main source of terminology but, of course, it is not the only one. We are making inquiries among mathematics professionals about the volume and scope of the terminology they find relevant to their work. Besides, the terminology we have extracted from translated texts could be mapped to a monolingual or bilingual dictionary, manual or handbook specialized in the field. The facilities of on-line mapping will additionally speed up and enlarge the possibilities of terminology acquisition.</Paragraph> <Paragraph position="7"> The program package could be of help not only to the linguist and lexicographer but also offers computer aids to terminologists. 
The automatic context look-up readily supplies usage samples, the text concordance facilities come in handy for building up text-oriented lists of terminology, and the merging of terminology lists from different sources offers opportunities to resolve ambiguities or disagreements in translating a term.</Paragraph> </Section> <Section position="7" start_page="265" end_page="265" type="metho"> <SectionTitle> DESIGN OF THE SYSTEM TERMSERVICE </SectionTitle> <Paragraph position="0"> The system TERMSERVICE is designed to support several languages, the main three being Bulgarian, English and Russian. We use direct linkage between pairs of meanings of a certain term in different languages.</Paragraph> <Paragraph position="1"> The fields of specialization of the database are scientific and technical.</Paragraph> <Paragraph position="2"> For each source-language term (a single word or a phrase) the following information is contained in the database: - target-language equivalents; - synonyms, if any; - subject-field code; - grammatical code, spelling variants, standard abbreviations; - contextual examples of usage of the term; - definitions, for source-language terms having no equivalent in the target language; - possible word combinations with the headword, if any; - the source the term was extracted from; - cross-references to other terms.</Paragraph> <Paragraph position="3"> Since the system is not oriented only towards professional translators, general and specific terminology is complemented by common-use lexemes, their selection being motivated by frequency of usage. Logical access to the database is by lexical item, synonym, subject-field code, grammatical code or source of the entry.</Paragraph> <Paragraph position="4"> The modes of access to the database are batch or interactive query and computer output to microfilm or microfiche. 
Printed dictionaries will also be generated from the database. Source-language terms are stored in conventional dictionary form. To assist the user's query, however, the project envisages a module for automatic reduction of inflected forms to their standard form. Until then the system will work with the help of human pre-editing.</Paragraph> </Section> <Section position="8" start_page="265" end_page="265" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> The project also envisages automatic recording of unsatisfied look-up requests for update purposes.</Paragraph> </Section> </Paper>
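The entry structure listed under DESIGN OF THE SYSTEM TERMSERVICE, together with the planned recording of unsatisfied look-up requests, can be sketched in a few lines. The sketch is in Python purely for illustration (the original programs were written in PL/I). The names TermRecord, TermBank, lookup and update_candidates, the lower-casing of queries, the language code "bg" and the sample entry are all assumptions and do not come from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermRecord:
    """One source-language entry. Field names are hypothetical; they mirror
    part of the information the paper lists for each term."""
    term: str                                 # a single word or a phrase
    equivalents: Dict[str, List[str]]         # target language -> equivalents
    synonyms: List[str] = field(default_factory=list)
    subject_code: str = ""                    # subject-field code
    grammatical_code: str = ""
    contexts: List[str] = field(default_factory=list)  # usage examples
    source: str = ""                          # where the term was extracted from


class TermBank:
    """Toy in-memory term bank (an assumption, not the authors' design)."""

    def __init__(self) -> None:
        self._index: Dict[str, TermRecord] = {}
        self.unsatisfied: Counter = Counter()  # failed query -> request count

    def add(self, record: TermRecord) -> None:
        # Index by headword; a synonym never replaces an existing key.
        self._index[record.term.lower()] = record
        for synonym in record.synonyms:
            self._index.setdefault(synonym.lower(), record)

    def lookup(self, query: str) -> Optional[TermRecord]:
        key = query.strip().lower()
        record = self._index.get(key)
        if record is None:
            # Record the failed request so the entry can be added later.
            self.unsatisfied[key] += 1
        return record

    def update_candidates(self) -> List[str]:
        # Most frequently requested missing terms first.
        return [query for query, _ in self.unsatisfied.most_common()]


# Sample data for illustration only.
bank = TermBank()
bank.add(TermRecord(term="matrix", equivalents={"bg": ["матрица"]},
                    subject_code="MATH", source="parallel text"))
bank.lookup("matrix")   # found in the bank
bank.lookup("tensor")   # not found, so the request is recorded
print(bank.update_candidates())
```

Counting failed requests, rather than merely listing them, lets the most frequently requested missing terms be handled first when the dictionary is updated. That ordering is a design choice of this sketch, not something the paper specifies.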