File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/c00-1075_intro.xml
Size: 4,165 bytes
Last Modified: 2025-10-06 14:00:46
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1075"> <Title>Application of Analogical Modelling to Example Based Machine Translation</Title> <Section position="4" start_page="0" end_page="516" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Ideally, an EBMT system must determine correspondences at a sub-sentence level if optimal adaptation of matching fragments is to be achieved (Collins, B., &amp; Cunningham, P. 1995). In practice, EBMT systems that operate at the sub-sentence level dynamically derive the optimum length of input-sentence segments by analysing the available parallel corpora. This requires a procedure for determining the best &quot;cover&quot; of an input text by segments of sentences contained in the database (Nirenburg, S.,</Paragraph> <Paragraph position="1"> Domashnev, C., Grannes, D. 1993), (Cranias, L. et al 1994), (Frederking, R., Nirenburg, S., 1994), (Sato, S. 1995). What is needed is a procedure for aligning parallel texts at sub-sentence level (Sadler, V., Vendelmans, R. 1990), (Boutsis, S., Piperidis, S. 1998). If sub-sentence alignment is available, the approach is fully automated, but it remains quite vulnerable to the problem of low quality, as well as to translational ambiguity when the produced segments are rather small.</Paragraph> <Paragraph position="2"> Several approaches go a step further by attempting to build a transfer-rule base in the form of abstract representations, through different types of generalization processes applied to the available corpora and relying on different levels of linguistic information and processing (Kaji et al. 92), (Juola, P. 1994), (Furuse, O., Iida, H. 1996), (Veale, T. and Way, A. 1997), (McTait, K., et al.</Paragraph> <Paragraph position="3"> 1999), thus providing more complete &quot;context&quot; information to the translation phase. 
The deeper the linguistic analysis involved in such a process, the more flexible the final translation structures will be and the better the quality of the results. However, this kind of analysis unquestionably leads to systems that are more computationally expensive and more difficult to obtain. Our approach consists of a fully modular analogical framework, which can cope with a lack of resources and will perform even better when these are available.</Paragraph> <Paragraph position="4"> Analogical Modelling (AM) has been proposed as an alternative model of language usage. The main assumption underlying this approach is that many aspects of speaker performance are better accounted for in terms of &quot;analogy&quot;, i.e. the identification of similarities and differences with forms in memory (the lexicon), than by referring to explicit and inaccessible rules. By &quot;analogy&quot; we mean the process of matching between an input pattern and a database of stored examples (exemplars). The result of this matching process is a collection of examples called the &quot;analogical set&quot;, and classification of the input pattern is achieved through extrapolation from this set. At any given time, the main source of knowledge consists of a database of stored translation examples. These examples themselves are used to classify new items, without intermediate abstraction in the form of rules. To achieve this, an exhaustive database search is needed, and during this search less relevant examples need to be discarded. All text features are equally important initially, and serve to partition the database into several disjoint classes of examples.</Paragraph> <Paragraph position="5"> In contrast to most analogy-based systems, our approach applies the same principles during the learning phase in an attempt to extract appropriate generalizations (translation rules) based on similarities and differences between input exemplars. 
In this way, analogy is treated as more than simple pairwise similarity between input and database exemplars; rather, it is considered the main relation underlying a more complex network of relations between database exemplars.</Paragraph> </Section> </Paper>