<?xml version="1.0" standalone="yes"?>
<Paper uid="E89-1042">
  <Title>ON FORMALISMS AND ANALYSIS, GENERATION AND SYNTHESIS IN MACHINE TRANSLATION Zaharin Yusoff</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
ON FORMALISMS AND ANALYSIS, GENERATION AND
SYNTHESIS IN MACHINE TRANSLATION
</SectionTitle>
    <Paragraph position="0"> A formalism is a set of notations with well-defined semantics (namely for the interpretation of the symbols used and their manipulation), by means of which one formally expresses certain domain knowledge, which is to be utilised for specific purposes. In this paper, we are interested in formalisms which are being used or have applications in the domain of machine translation (MT). These can range from specialised languages for linguistic programming (SLLPs) in MT, like ROBRA in the ARIANE system and GRADE in the Mu-system, to linguistic formalisms like those of the Government and Binding theory and the Lexical Functional Grammar theory. Our interest lies mainly in their role in the domain in terms of the ease in expressing linguistic knowledge required for MT, as well as the ease of implementation in MT systems.</Paragraph>
    <Paragraph position="1"> We begin by discussing formalisms within the general context of MT, clearly separating the role of linguistic formalisms at one end, which are more apt for expressing linguistic knowledge, and, at the other, the SLLPs, which are specifically designed for MT systems.</Paragraph>
    <Paragraph position="2"> We argue for another type of formalism, the general formalism, to bridge the gap between the two. Next we discuss the role of formalisms in analysis and in generation, and then, more specific to MT, in synthesis. We sum up with a mention of a relevant part of our current work, the building of a compiler that generates a synthesis program in an SLLP from a set of specifications written in a general formalism.</Paragraph>
    <Paragraph position="3"> On formalisms in MT The field of computational linguistics has seen many formalisms being introduced, studied and compared with other formalisms. Some get established and have been or are still being widely used, some get modified to suit newer needs or to be used for other purposes, while some simply die away. Those that we are interested in are formalisms which play some role in MT.</Paragraph>
    <Paragraph position="4"> The MT literature has cited formalisms like the formalisms for the Government and Binding Theory (GB) [Chomsky  we refer to the formalisms provided by these linguistic theories and not the  To put in perspective the discussions to follow, we present in Figure 1 a rather naive but adequate view of the role of certain formalisms in MT.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
General SLLPs
Formalisms
</SectionTitle>
      <Paragraph position="0"> Fig. 1 - The role of formalisms in MT.</Paragraph>
      <Paragraph position="1"> GB, LFG and GPSG formalisms are classed as linguistic formalisms as they have been designed purely for linguistic work, clearly reflecting the hypotheses of the linguistic theories they are associated with. Although there have been 'LFG-based' and 'GPSG-inspired' MT systems, an LFG or GPSG system for MT has yet to exist. Whether or not linguistic formalisms are suitable for MT (one argues that linguistic formalisms tend to lean towards generative processes as opposed to analysis, the latter being considered very important to MT) is not a major concern to linguists.</Paragraph>
      <Paragraph position="2"> Indeed it should not be, as one tends to get the general feeling that formal linguistics and MT are separate problems, although tapping from the same source. If this is indeed true, there is no reason why one should try to change linguistic formalisms into a form more suitable for MT.</Paragraph>
      <Paragraph position="3"> Linguistics has been, is still, and will continue to be used in MT. What is currently being done is that linguistic knowledge, preferably expressed in formal terms using a linguistic formalism, is coded into an MT system by means of the SLLPs. SLLPs include formalisms like ATN, ROBRA, GRADE, METAL and Q-systems. Tree structures are the main type of data structure manipulated in MT systems, and the SLLPs are mainly tree transducers, string-tree transducers and/or tree-string transducers. Such mechanisms are arguably very suitable for defining the analysis process (parsing a text to some representation of its meaning) and the synthesis process (generating a text from a given representation of meaning). SLLPs which work on feature structures have also been introduced, but these also work on the same principle.</Paragraph>
      <Paragraph position="4"> Despite the fact that SLLPs are specifically designed for programming linguistic data, and that most of them separate the static linguistic data (linguistic rules) from the algorithmic data (the control structure), the problem is that they are still basically programming languages. Indeed, during the period of their inception, they may have been thought of as MT's answer to a linguistic formalism, but this is no longer true these days. To begin with, most if not all SLLPs are procedural in nature, which means that a description can be read in only one direction (not bidirectional), either for analysis or for synthesis. Consequently, for every natural language treated in an MT system, two sets of data will have to be written: one for analysis and one for synthesis. Furthermore, also due to this procedural nature, linguistic rules in SLLPs are usually written with some algorithm in mind. Hence, although separated from the algorithmic component, these linguistic rules are not as declarative as one would have hoped (not declarative). For these reasons, as well as for the fact that SLLPs are very system-oriented, data written in SLLPs are rarely retrievable for use in other systems (not portable). It was due to these shortcomings that other formalisms for MT which are bidirectional, declarative and not totally system-oriented have been designed.</Paragraph>
      <Paragraph position="5"> Such formalisms include the SG and its more formal version, the STCG. One first notes that these formalisms are not designed to replace linguistic formalisms. There may be some linguistic justifications (e.g. in terms of the linguistic model [Zaharin 87b]), but they are designed principally for bridging the gap between linguistic formalisms and SLLPs. Such formalisms are designed to cater for MT problems, and hence may not directly reflect linguistic hypotheses but simply have the possibility to express them in a manner more easily interpretable for MT. They are declarative in nature and also bidirectional. Only one set of data is required to describe both analysis and generation. They are also general in nature, meaning that it is possible to express different linguistic theories using these formalisms, and also that it is possible to implement these formalisms using various SLLPs. One can view such formalisms as specifications for writing SLLPs, as illustrated in Figure 2 (akin to specifications used in software engineering).</Paragraph>
      <Paragraph position="6"> Other formalisms that can be considered to be within this class of general formalisms are TAG, FUG, and perhaps DCG. With such formalisms, one may express knowledge from various linguistic theories (possibly a mixture), and the same set of represented knowledge may be implemented for both analysis and synthesis using various SLLPs in different MT systems (as illustrated in</Paragraph>
    </Section>
  </Section>
</Paper>