File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/86/c86-1150_abstr.xml

Size: 5,786 bytes

Last Modified: 2025-10-06 13:46:19

<?xml version="1.0" standalone="yes"?>
<Paper uid="C86-1150">
  <Title>Tong Loong Cheong 'Computer Aided Translation - Technical Report Compilation'</Title>
  <Section position="1" start_page="0" end_page="639" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper presents the results obtained by an English to Malay computer translation system at the level of a laboratory prototype. The translation output obtained for a selected text (secondary school chemistry textbook) is evaluated using a grading scheme based on ease of post-editing. The effect of a change in area and typology of text is investigated by comparing with the translation output obtained for a University level Computer Science text. An analysis of the problems which give rise to incorrect translations is discussed. This paper also provides statistical information on the English to Malay translation system and concludes with an outline of further work being carried out on this system with the aim of attaining an industrial prototype.</Paragraph>
    <Paragraph position="1"> 1. The English to Malay Translation System. Background: Computer Aided Translation (CAT) research at Universiti Sains Malaysia (USM) began in 1976 as an individual research effort. However, at that time, the work was more appropriately classified under natural language data processing, including topics such as 'istilah' (terminology) information retrieval, Malay root-form extraction, parsing of Malay sentences using context-free grammars and Malay language teaching tools [Tong 78, Chang 78].</Paragraph>
    <Paragraph position="2"> In 1978, research into CAT was initiated, and by 1979, the researchers at USM began to develop grammar models for English to Malay translation using the software tool ARIANE [GETA 78].</Paragraph>
    <Paragraph position="3"> In 1980, a national workshop was conducted in USM, where a pilot English to Malay translation system was demonstrated.</Paragraph>
    <Paragraph position="4"> Financial support became available, and further development on the basic translation model was carried out [Tong 82, van Klinken 84, Zaharin 84].</Paragraph>
    <Paragraph position="5"> In 1984, a permanent Computer-Aided-Translation Project unit was set up at USM, and full-time research staff were assigned to this project. Members of this project group now include two computer scientists, one linguist, and five lexicographers / editors / terminologists. This group was assigned the task of producing a laboratory prototype for English to Malay translation, and the result of their efforts is presented in this report.</Paragraph>
    <Paragraph position="6"> System Environment: The ARIANE system is an integrated software environment for computer-aided translation, including tools for compiling grammars and dictionaries, and for processing corpora of the source and target texts. The CAT concepts behind this system are well-known and well-documented [Boitet and Vauquois 1985]. This software has been programmed using different levels of computer languages, from IBM assembly (PL360) to PL/I, and makes extensive use of system tools of the IBM VM/CMS system - XEDIT and EXEC. One of its advantages is efficiency (as compared to other similar systems), which means that it can execute with reasonable speed even on a comparatively small computer system. USM's experience with the ARIANE system has been very satisfactory, and we doubt very much if another system could have been migrated and utilised at this University with similar success. Although there have been some criticisms of ARIANE in the literature, our experience has shown that in spite of its recognised weaknesses and drawbacks, it remains an extremely powerful and practical set of tools for the development of CAT systems. Of course, the methodology pioneered at GETA [Vauquois 75] has been incorporated into many 'new' systems today.</Paragraph>
    <Paragraph position="7"> On the physical side, the ARIANE system itself occupies about 8 Mbyte of secondary storage, while the user machine requires another 5 Mbyte for storing the linguistic data (grammar models and dictionaries, but not including the source and target texts and their intermediate results). A virtual memory size of 2 Mbytes is used for the execution of all the translations from English to Malay described in this report.</Paragraph>
    <Paragraph position="8"> Translation Model and Execution Time: The English to Malay translation system consists of three main dictionaries - source English, English-Malay transfer, target Malay - and five grammar models. The sizes of these various components are as follows: Dictionaries:  The execution time for translation is estimated at 1.0097 Mipw (million of instructions per word). This is consistent with times measured at GETA, Grenoble [Boitet and Vauquois 81~]. In practical terms, this means that on USM's IBM 4381 system (estimated at 2.1 MIPS), the translation time is approximately 0.48 second of virtual CPU time per word. This figure is based on the translation time for about 3,000 words taken from the selected text. The proportionate time for each phase of the translation process is as follows (percent): morphological analysis 0.33; structural analysis 55.21; lexical transfer 0.44; structural transfer 11.34; structural generation 31.47; morphological generation 1.21. From the above, it can be seen that the three dictionary retrieval phases together account for only 2 % of the time, while the structural analysis phase uses up more than half the total time, with the rest taken up by the structural generation (about one-third) and the structural transfer phases. This result is again consistent with those for other translation models at GETA, Grenoble.</Paragraph>
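    <Paragraph position="9"> The timing figures quoted above can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the translation system; all names in it are hypothetical, and the grouping of morphological analysis, lexical transfer and morphological generation as the three dictionary retrieval phases is an assumption inferred from the 2 % figure in the text.

```python
# Figures quoted in the text; every name here is an illustrative assumption.
MIPW = 1.0097          # million instructions executed per translated word
MACHINE_MIPS = 2.1     # estimated speed of the IBM 4381, in MIPS

# Share of execution time per phase, in percent, as listed in the text.
PHASE_PERCENT = {
    "morphological analysis": 0.33,
    "structural analysis": 55.21,
    "lexical transfer": 0.44,
    "structural transfer": 11.34,
    "structural generation": 31.47,
    "morphological generation": 1.21,
}

# Assumed to be the three dictionary retrieval phases mentioned in the text.
DICTIONARY_PHASES = (
    "morphological analysis",
    "lexical transfer",
    "morphological generation",
)


def seconds_per_word(mipw, mips):
    """CPU seconds per word: instructions per word divided by machine speed."""
    return mipw / mips


def combined_share(percentages, phases):
    """Sum of the percentage shares of the named phases."""
    return sum(percentages[name] for name in phases)


per_word = seconds_per_word(MIPW, MACHINE_MIPS)
dictionary_share = combined_share(PHASE_PERCENT, DICTIONARY_PHASES)

print(f"{per_word:.2f} s of CPU time per word")
print(f"{dictionary_share:.2f} % of time in dictionary phases")
```

Dividing the instruction count per word by the machine speed gives the per-word time directly, and summing the three small phase shares shows where the 2 % figure for dictionary retrieval comes from.</Paragraph>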
  </Section>
</Paper>