<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1150"> <Title>Tong Loong Cheong 'Computer Aided Translation - Technical Report Compilation'</Title> <Section position="2" start_page="639" end_page="640" type="metho"> <SectionTitle> 2. The Quality of Translation Grading Scheme </SectionTitle> <Paragraph position="0"> In order to assess the 'quality' of the translation output, a grading scheme (from grade A to grade F) was devised using a sentence as the boundary of assessment. This scheme is based on the ease of post-editing the translation output, and not on the quality or standard of translation in the more usual sense.</Paragraph> <Paragraph position="1"> Currently, there is no established method of evaluating computer-aided-translation or mechanical translation output.</Paragraph> <Paragraph position="2"> Ease of post-editing is a measure which also takes into account the ease of understanding as well as the accuracy of translation.</Paragraph> <Paragraph position="3"> Two important factors which affect any grading scheme are the typology of the source text itself and the expert knowledge of the evaluator in that particular area of text. Some method of evaluating the ease of understanding of the source text and some definition of a neutral evaluator are prerequisites to any standardised evaluation scheme.</Paragraph> <Paragraph position="4"> The grading scheme proposed in this report is a measure of the time required to edit sentences translated by the computer, ranging from fast (as in grade A where no post-editing is required) to slow (as in grade F where a sentence has to be retranslated manually). There has been no attempt to categorise the source sentences into different degrees of difficulty or length. Hence, the typology of text used in this evaluation must be borne in mind when assessing the overall results. 
Although grades are assigned to individual sentences, the source texts were extracted by paragraphs, and hence, the continuity of the text is maintained. The actual grading itself was carried out by more than one individual in order to reduce (as much as possible) the effect of individual 'bias'. After careful scrutiny, it was concluded that variation in the results obtained is within expected limits, thus allowing broad conclusions to be drawn on the effectiveness/usefulness of the translation system.</Paragraph> <Paragraph position="5"> The grades assigned to translated sentences are as follows: A: correct translation, no modification required.</Paragraph> <Paragraph position="6"> B: list of alternative words selected by post-editor.</Paragraph> <Paragraph position="7"> C: understandable translation (with preservation of meaning), single word corrections without reference to source text.</Paragraph> <Paragraph position="8"> D: as in C, but reference to source text is necessary.</Paragraph> <Paragraph position="9"> E: major modifications with reference to source text.</Paragraph> <Paragraph position="10"> F: retranslated manually.</Paragraph> <Paragraph position="11"> Results for Selected Area and Text A Chemistry textbook for upper secondary school was chosen as the first text for the development of the laboratory prototype. A total of 393 sentences were extracted at random from this textbook and translated by the computer. 
The translation output is then graded by three human post-editors and the result given below is based on their combined evaluation.</Paragraph> <Paragraph position="12"> The above result shows that 76% of translated sentences are 'understandable' (no reference to English source text is necessary) and require, at the most, only minor modifications during post-editing.</Paragraph> <Paragraph position="13"> Effect of a Change in Area and Text The new text is a University level Computer Science textbook, from which 207 sentences were extracted, translated by the computer, and then graded. The result is as follows: As expected, the quality of translation in this case is lower than that for the Chemistry text. Most of the additional problems encountered can be solved either through dictionary coding or minor modifications in the grammar. With these changes, the quality of translation for the Computer Science text is expected to be raised to the same level as that for the Chemistry text.</Paragraph> </Section> <Section position="3" start_page="640" end_page="640" type="metho"> <SectionTitle> 3. Existing Problems Classification </SectionTitle> <Paragraph position="0"> An attempt was made to analyse the problems encountered, i.e. the errors in translation output. This involves a tedious process of correctly identifying the source of each error found in the translation output, and then classifying them according to the phase of translation (i.e. analysis, transfer or generation) at which they occur. The purpose is to identify simple problems which can be solved in the existing system through modifications to the linguistic data, while more complex problems can be the subject of further research. 
This analysis of errors also provides statistical information on their distribution and importance, hence giving some guidelines as to their priority for further investigation.</Paragraph> <Section position="1" start_page="640" end_page="640" type="sub_section"> <SectionTitle> The Analysis Phase </SectionTitle> <Paragraph position="0"> The problems of ambiguity and coordination account for more than half of the errors at the analysis phase. The problem of ambiguity here refers to ambiguities which remain unresolved at the end of analysis and to cases of erroneous disambiguation.</Paragraph> <Paragraph position="1"> This type of problem is by far the most important, accounting for close to 50 percent of the existing errors found in the analysis phase.</Paragraph> <Paragraph position="2"> Ambiguities which remain unresolved include verb/noun ('foam', 'works', 'use'), verb/adjective ('direct', 'total'), verb/ven ('.. is unglazed paper..'), noun/adjective ('routine', 'plural'), verb/ving ('.. painting of...'), adj/pronoun ('other'). Coordination (apposition, inclusion) is a serious structural problem not handled particularly well by the existing grammar model. Many different types of elements can participate in coordination (apposition, inclusion) and examples of cases not considered in the current grammar are: complex noun phrases, prepositions, verbal clauses, interrogatives, adjunct phrases.</Paragraph> <Paragraph position="3"> ('to and from and within..') ('...but ..... and .....') ('why .... and do ....') ('..hot and humid..') Other errors in the analysis phase are relatively less complex and can be solved through modifications or improvements in the morphological and structural analysis grammars and in the coding of the source dictionary. 
Errors in this category are: - errors in morphological coding, including idiomatic expressions and compound words; - structures not covered by the current model, such as (elision) 'although large enough to pass through..' (embedded imperative) '; hence the instruction: shake the bottle.'</Paragraph> <Paragraph position="5"> '..the same temperature as that at which..',</Paragraph> <Paragraph position="7"> Various bugs still exist in the analysis grammar model itself and these will be corrected as part of the maintenance on the translation system.</Paragraph> <Paragraph position="8"> The Transfer Phase The main problems at the transfer phase are the incomplete (or incorrect) choice of target lexicals, and the transfer of idiomatic expressions.</Paragraph> <Paragraph position="9"> The disambiguation of a source lexical which carries more than one meaning and which is translated by different target lexicals accounts for more than half of the errors at transfer. The source of this problem is actually at the analysis phase, which was unable to produce a sufficiently deep level of interpretation (e.g. semantics and semantic relations) to solve the ambiguity which manifests itself only at transfer.</Paragraph> <Paragraph position="10"> The two categories of words which are most problematic are the verbal forms ('reveal', 'assume', 'call') and the prepositions ('in', 'by', 'to'). Although disambiguation rules based on context are employed during the structural transfer phase, they can only solve relatively straightforward cases. For the more difficult cases, the current approach of displaying a list of multiple choices of words to the human post-editor seems to be the most acceptable solution. Much deeper work in state semantics and semantic relations will have to be carried out in order to improve on this. 
Even if such improvements are found, there is still the question of weighing the cost of such sophisticated processing by the computer (which is expected to be very high) against the cost of human post-editing.</Paragraph> <Paragraph position="11"> Idiomatic expressions are normally coded directly in the source dictionary. Unfortunately, the ARIANE software does not provide sufficient facilities at analysis or at transfer phase to cater for some of the complex manipulations required. Some idiomatic expressions are ambiguous (i.e. they can be considered idiomatic only in certain contexts), and hence, there is the problem of disambiguating them during analysis. Also, some English idiomatic expressions are particularly difficult to translate into Malay, and perhaps other target languages as well. The Generation Phase Errors during structural generation are relatively few, and also relatively minor from the point of view of post-editing.</Paragraph> <Paragraph position="12"> Most errors during this phase will give rise to grade C sentences if there are no other types of errors in the sentence.</Paragraph> <Paragraph position="13"> The main problems are as follows: ** Position of elements in complex noun phrase.</Paragraph> <Paragraph position="14"> Most of the errors are due to the incorrect placement of the preposition 'bagi' (similar to 'of' but not as commonly used) in a complex Malay noun phrase. Other elements of the noun phrase which give rise to errors are the '-ing' or '-en' form used as an adjective, and the lexicals 'other' and 'only' which seem difficult to translate into Malay. Very often, an adjective in Malay is introduced by the relative pronoun 'yang'. However, there seems to be no consistent rule for this. 
Certain lexicals always require a 'yang', while others do so only under certain not well-defined conditions.</Paragraph> <Paragraph position="15"> ** Position of adverbs and adjuncts of clauses.</Paragraph> <Paragraph position="16"> This problem is not very well investigated in the existing model, and can hopefully be improved upon at a later stage. ** Relative clause introduced by a preposition.</Paragraph> <Paragraph position="17"> The relative clause introduced by a preposition ('in which', 'from where', etc.) is particularly difficult to translate into Malay (even for human translators). Formal linguistic study is being carried out into possible target structures. This is one specific case whereby linguistic research is initiated specifically to cater for the needs of computer-aided translation. ** The generation of Malay pronouns.</Paragraph> <Paragraph position="18"> Another difficult problem is the translation of some pronouns - 'it', 'they', 'another', 'one', 'latter', 'former', 'those'. The Malay language sometimes requires a repetition of the referenced object in place of the pronoun. Even when this is not necessary, as in the case of a pronoun referring to an undefined object, it may be incorrect to translate directly with the equivalent pronoun ('ia', 'mereka', 'yang lain', 'kita').</Paragraph> <Paragraph position="19"> Again further investigation into the linguistic aspects of this problem will be necessary before an acceptable solution can be found.</Paragraph> <Paragraph position="20"> source: 'move from one part of the solid to another' computer: 'bergerak dari 1 bahagian pepejal kepada yang lain' edited: 'bergerak dari 1 bahagian pepejal kepada bahagian yang lain'</Paragraph> </Section> </Section> </Paper>