<?xml version="1.0" standalone="yes"?> <Paper uid="E89-1042"> <Title>ON FORMALISMS AND ANALYSIS, GENERATION AND SYNTHESIS IN MACHINE TRANSLATION Zaharin Yusoff</Title> <Section position="2" start_page="0" end_page="0" type="ackno"> <SectionTitle> ATLAS </SectionTitle> <Paragraph position="0"> Fig. 3 - the central role of general formalisms. On specifications for analysis and synthesis. The two main processes in MT are analysis and synthesis (a third process, called transfer, is present if the approach is not interlingual). Analysis is the process of obtaining some representation(s) of meaning (adequate for translation) from a given text, while synthesis is the reverse process of obtaining a text from a given representation of meaning [1]. Analysis and synthesis can be considered two different ways of interpreting a single concept, this concept being a correspondence between the set of all possible texts and the set of all possible representations of meaning in a language. This correspondence is basically made up of a set of texts (T), a set of representations (S), and a relation between the two, R(T,S), defined in terms of relations between elements of T and elements of S. We illustrate this in Figure 4.</Paragraph> <Paragraph position="1"> Fig. 4 - texts and their representations. Supposing that a correspondence as given in Figure 4 has been defined, analysis is then the process of interpreting the relation R(T,S) in such a way that, given a text t, its corresponding representation s is obtained. Conversely, synthesis is the process of interpreting R(T,S) in such a way that, given s, t is obtained. Clearly, a general formalism to be used as specifications must be capable of defining the correspondence in Figure 4. Defining the correspondence may entail defining just one, two, or all three components of Figure 4, depending on the complexity of the results required. When one works on a natural language, one cannot hope to define the set of texts T directly (unless it is a very restricted sublanguage). Instead, one would attempt to define it by means of the definition of the other two components. As an example, the CFG formalism defines only the component R(T,S), by means of context-free rules. This component generates the set of texts (T) as well as all possible representations (S), given by the parse trees. The formalism of GB defines the relation R(T,S) by means of context-free rules (constrained by X-bar theory), move-α rules (constrained by bounding theory), the phonetic interpretative component and the logical interpretative component. This relation generates the set of all texts (T) and all candidate representations (S) (logical structures). The set S is, however, further defined (constrained) by binding theory, θ-theory and the empty category principle. As a third example, the STCG formalism defines R(T,S) by means of its rules, which in turn generate S and T. The set S is, however, further defined by means of constraints on the writing of the STCG rules.</Paragraph>
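To make concrete the idea that a single declarative definition of the correspondence R(T,S) can be interpreted in two directions, here is a minimal sketch in Python. It is not taken from the paper: the toy grammar, the sentence and the function names are invented, and plain context-free rules stand in for the formalisms discussed above. One static rule set is read once as an analyser (text to parse tree) and once as a synthesiser (parse tree to text).

```python
# One declarative component (RULES) interpreted in two directions:
# analysis (text -> representation) and synthesis (representation -> text).
# The grammar and the sentence are invented for illustration only.

RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N":  [["cat"], ["dog"]],
    "V":  [["sees"], ["hears"]],
}

def analyse(category, words):
    """Read the rules as text -> representation: return (tree, remaining words) or None."""
    if category not in RULES:                     # terminal symbol: must match the next word
        return (category, words[1:]) if words and words[0] == category else None
    for expansion in RULES[category]:             # try each rule for this category
        children, rest = [], words
        for symbol in expansion:
            result = analyse(symbol, rest)
            if result is None:
                break
            child, rest = result
            children.append(child)
        else:                                     # every symbol of the rule matched
            return (category, children), rest
    return None

def synthesise(tree):
    """Read the same correspondence as representation -> text (linearisation)."""
    if isinstance(tree, str):                     # a leaf is just a word
        return [tree]
    _, children = tree
    return [word for child in children for word in synthesise(child)]

text = "the cat sees the dog".split()
tree, remaining = analyse("S", text)              # analysis: text -> parse tree
assert remaining == []                            # the whole text was analysed
print(tree)                                       # the representation in S
print(" ".join(synthesise(tree)))                 # synthesis recovers the text in T
```

Nothing here is specific to MT; the point is only that the rule set itself stays static, while analysis and synthesis are two different interpreters of it.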
<Paragraph position="2"> Having set the specifications for analysis and synthesis by means of a general formalism, one can then proceed to implement the analysis and synthesis. Ideally, one should have an interpreter for the formalism that works both ways. However, an interpreter alone is not enough to complete an MT system: one has to consider other components such as a morphological analyser, a morphological generator, monolingual dictionaries, and, for non-interlingual systems, a transfer phase and bilingual dictionaries. In fact, such an interpreter alone will complete neither the analysis nor the synthesis, a point which is discussed in the paragraphs that follow. For these reasons, the specifications given by the general formalism are usually implemented using available integrated systems, and hence in their SLLPs.</Paragraph> <Paragraph position="3"> For analysis, apart from the linguistic rules given by the general formalism, there is the algorithmic component to be added. This is the control structure that decides on the sequence of application of rules. A general formalism does not, and should not, include the algorithmic component in its description; the description should be static. There is also the problem of lexical and structural ambiguities, which a general formalism does not, and should not, take into consideration either. A fully descriptive and modular specification for analysis should have separate components for linguistic rules (given by the formalism), algorithmic structure, and disambiguation rules. Apart from being theoretically attractive, such modularity leads to easier maintenance (this discussion is taken further in [Zaharin 88]); but most important is the fact that the same linguistic rules given by the formalism will serve as specifications for synthesis, whereas the algorithmic component and disambiguation rules will not.</Paragraph> <Paragraph position="4"> In general, synthesis in MT lacks a proper definition, in particular for transfer systems [2]. It is for this reason (and other reasons similar to those for analysis) that the specifications for synthesis given by the general formalism play a major role but do not suffice for the whole synthesis process. To clarify this point, let us look at the classical global picture for MT in second-generation systems given in Figure 5. The figure gives the possible levels for transfer, from the word level up to the interlingua; the higher one goes, the deeper the meaning.</Paragraph> <Paragraph position="5"> Most current systems attempt to go as high as the level of semantic relations (e.g. AGENT, PATIENT, INSTRUMENT) before embarking on the transfer. Most systems also retain some lower-level information (e.g. logical relations, syntactic functions and syntagmatic classes) as the analysis goes deeper, and this information gets mapped to its equivalents in the target language. One reason for this is that certain lower-level information may be needed to help choose the target text to be generated amongst the many possibilities that can be generated from a given target representation; the other reason is to handle cases that fail to attain a complete analysis (hence fail-soft measures).</Paragraph> <Paragraph position="6"> The consequence of the above is that the output of the transfer, and hence the input to synthesis, may contain a mixture of information. Some of this information is pertinent, namely the information associated with the level of transfer (in this case the semantic relations, and to a large extent the logical relations), while the rest is indicative. The latter can be considered as heuristics that help the choice of the target text, as described above.</Paragraph>
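As a small, hedged illustration of this pertinent/indicative distinction (the feature names, candidate texts and scoring below are invented and do not come from any of the systems discussed), the indicative information carried through transfer can be treated purely as a heuristic for ranking the target texts that the synthesis rules could produce from one target representation.

```python
# Indicative information (e.g. residual syntactic functions or voice) is not
# part of the target representation proper; it only biases the choice among
# the texts the formalism allows for that representation.
# All names and data below are invented for illustration.

def choose_target(candidates, hints):
    """Rank candidate target texts by how many indicative hints they respect."""
    def score(candidate):
        _text, features = candidate
        return sum(1 for key, value in hints.items() if features.get(key) == value)
    return max(candidates, key=score)[0]

# Texts that the synthesis rules could produce from one target representation,
# each annotated with the lower-level features it would realise.
candidates = [
    ("the report was read by Ali", {"voice": "passive", "subject": "report"}),
    ("Ali read the report",        {"voice": "active",  "subject": "Ali"}),
]

# Indicative information carried over from the analysis of the source text.
hints = {"voice": "active", "subject": "Ali"}

print(choose_target(candidates, hints))           # -> Ali read the report
```

If no hint matches, the ranking simply returns the first candidate, so the choice degrades gracefully rather than blocking the synthesis.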
<Paragraph position="7"> Whatever the level of transfer chosen, there is certainly a difference between the input to synthesis and the representative structure described in the set S in Figure 4, the latter being precisely the representative structure specified in the general formalism. In consequence, if the synthesis is to be implemented true to the specifications given by the general formalism (which have also served as the specifications for analysis), the synthesis phase has to be split into two subphases: the first subphase has the role of transforming the input into a structure conforming to the one specified by the formalism (let us call this subphase SYN1), while the other does exactly as required by the general formalism, i.e. generates the required text from the given structure (call this subphase SYN2). The translation process is then as illustrated in Figure 6.</Paragraph> <Paragraph position="8"> As mentioned, the phase SYN2 is exactly as specified by the general formalism used as specifications. What is missing is the algorithmic component, that is, the control structure that decides on the application of rules.</Paragraph> <Paragraph position="9"> However, the phase SYN1 needs some careful study. Some indication is given in the discussion of some of our current work below.</Paragraph> <Paragraph position="10"> Relevant to the discussion in this paper, the following is some current work undertaken within the cooperation in MT between PTMK (Projek Terjemahan Melalui Komputer) in Penang and GETA (Groupe d'Etudes pour la Traduction Automatique) in Grenoble.</Paragraph> <Paragraph position="11"> The formalism SG, and its more formal version STCG, have been used as specifications for analysis and synthesis since 1983, namely for MT applications for French-English, English-French and English-Malay, using the ARIANE system. However, not only have the implementations been in the SLLP ROBRA in ARIANE, but the transfer from the specifications (given by the general formalism) to the implementation formalism has also been done manually. One project undertaken is the construction of an interpreter for the STCG which will do both analysis and generation. Some appropriate modifications will enable the interpreter to handle synthesis (SYN2 above). At the moment, implementation specifications are about to be completed, and the implementation is proposed to be carried out in the programming language C.</Paragraph>
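Recalling the SYN1/SYN2 split introduced above, the following is a rough Python sketch of the two subphases. The pertinent/indicative partition, the feature names and the linearisation rule are illustrative assumptions, not the SG/STCG or ROBRA treatment: SYN1 maps the mixed transfer output onto a structure conforming to the formalism's representation, setting the indicative features aside as hints, and SYN2 then generates a text from that structure.

```python
# A sketch of the SYN1 / SYN2 split.  Feature names ("sem_rel", "syn_fn",
# "lexeme", "class"), the example structure and the ordering rule are all
# invented for illustration.

PERTINENT = {"sem_rel", "lexeme"}      # information tied to the chosen level of transfer
INDICATIVE = {"syn_fn", "class"}       # lower-level information, kept only as hints

def syn1(transfer_output):
    """SYN1: map the mixed transfer output onto a structure conforming to the
    representation S of the formalism; indicative features become mere hints."""
    node = {k: v for k, v in transfer_output.items() if k in PERTINENT}
    node["hints"] = {k: v for k, v in transfer_output.items() if k in INDICATIVE}
    node["children"] = [syn1(c) for c in transfer_output.get("children", [])]
    return node

def syn2(node):
    """SYN2: generate a text from the formalism-conforming structure
    (here, a crude linearisation keyed on the semantic relation)."""
    order = {"AGENT": 0, "PATIENT": 2}                # the head keeps the default rank 1
    parts = [(1, [node["lexeme"]])] if "lexeme" in node else []
    parts += [(order.get(c.get("sem_rel"), 1), syn2(c)) for c in node["children"]]
    return [w for _, ws in sorted(parts, key=lambda p: p[0]) for w in ws]

# Mixed output of transfer: semantic relations plus residual syntactic functions.
transfer_output = {
    "lexeme": "sees", "class": "VERB",
    "children": [
        {"lexeme": "cat", "sem_rel": "AGENT",   "syn_fn": "SUBJECT"},
        {"lexeme": "dog", "sem_rel": "PATIENT", "syn_fn": "OBJECT"},
    ],
}
print(" ".join(syn2(syn1(transfer_output))))      # -> cat sees dog
```

In a real system SYN2 would of course be driven by the formalism's own rules (as in the interpreter project above), while SYN1 is precisely the part that still needs the theoretical clarification discussed in the text; the sketch only fixes the division of labour between the two subphases.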
<Paragraph position="12"> Another project is the construction of a compiler that generates a synthesis program in ROBRA from a given set of specifications written in SG or STCG. Implementation specifications for SYN2 are about to be completed, and the implementation is proposed to be carried out in Turbo-Pascal. The algorithmic component in SYN2 will be automatically deduced from the REFERENCE mechanism of the SG/STCG formalism. The automatic generation of a SYN1 program poses a bigger problem. For this, the output specifications are given by the SG/STCG rules, but, as mentioned earlier, the input specifications can be rather vague. To overcome this problem, we are forced to look more closely into the definitions of the various levels of interpretation as indicated in Figure 5, from which we should be able to separate the pertinent from the indicative type of information in the input structure to SYN1 (as discussed earlier). Once this is done, the interpretation of SG/STCG rules for generating a SYN1 program in ROBRA will not pose such a big problem (the problem is theoretical rather than one of implementation; in fact, implementation specifications for this latter part have been laid down, pending the results of the theoretical research).</Paragraph> <Paragraph position="13"> Concluding remarks. The MT literature cites numerous formalisms. These formalisms can generally be classed as linguistic formalisms, SLLPs and general formalisms. The linguistic formalisms are designed purely for linguistic work, while SLLPs, although designed for MT work, may lack certain desirable properties such as bidirectionality, declarativeness and portability. General formalisms have been designed to bridge the gap between the two extremes, but, more importantly, they can serve as specifications in MT. However, such formalisms may still be insufficient to specify the entire MT process. There is perhaps a call for more theoretical foundations, with more formal definitions for the various processes in MT.</Paragraph> <Paragraph position="14"> Footnotes. [1] The term generation has sometimes been used in place of synthesis, but this is quite incorrect. Generation refers to the process of generating all possible texts from a given representation, usually an axiom, and this is irrelevant in MT, apart from the fact that synthesis can be viewed as a subprocess of generation.</Paragraph> <Paragraph position="15"> [2] Interlingual systems may not lack the definition for synthesis, but they lack the definition for the interlingua itself. To date, all interlingual systems can be argued to be transfer systems in a different guise.</Paragraph> </Section> </Paper>