<?xml version="1.0" standalone="yes"?> <Paper uid="C88-1007"> <Title>Machine Translation Using Isomorphic UCGs</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Isomorphic Grammars </SectionTitle> <Paragraph position="0"> We can recognise two basic relations of relevance in translation, namely &quot;possible translation&quot; (which is symmetric) and &quot;best translation&quot; given the current context and much extra-linguistic knowledge (which is not symmetric).</Paragraph> <Paragraph position="1"> We take the task of the linguistic component of an MT system to be a correct and complete characterisation of the former, and will have nothing further to say about the latter.</Paragraph> <Paragraph position="2"> An important problem that arises in an interlingual translation system is what Landsbergen [Landsbergen 87b] calls the &quot;subset problem&quot;. If the analysis component generates a set L of interlingual expressions, and the generation component accepts a set L' of them, the only sentences that can be translated are those that correspond to expressions in the intersection L ∩ L'.</Paragraph> <Paragraph position="3"> If the grammars of the source and target languages are written independently, there is no way of guaranteeing that they map the languages into the same subset.</Paragraph> <Paragraph position="4"> The problem arises because a sufficiently powerful system of interlingual representation will contain an infinite number of logically equivalent expressions that represent a meaning of a given Source Language (SL) expression. Of course, the Source Language grammar will only associate a single one of these with a given SL expression.
However, in the absence of specific tuning, this is not guaranteed to be the same one that the Target Language (TL) grammar associates with any of the translation equivalents.</Paragraph> <Paragraph position="5"> Therefore, SL and TL grammars must be tuned to each other.</Paragraph> <Paragraph position="6"> This is not a problem specific to interlingual translation: in the transfer approach to MT system design, this tuning is effected by an explicit transfer module. The use of Isomorphic Grammars is another way of being explicit about this, tuning the grammars themselves rather than their inputs/outputs, which offers a greater possibility of bi-directionality than the transfer approach.</Paragraph> <Paragraph position="7"> Landsbergen assumes the existence of compositional grammars for two languages, that is, grammars in which i) basic expressions correspond to semantic primitives and ii) each syntactic rule that builds up a complex linguistic expression from simpler ones is paired with a semantic rule that builds the meaning of the complex expression from the meanings of the simpler ones.</Paragraph> <Paragraph position="8"> The tuning of grammars consists in ensuring that there is a basic expression in one grammar corresponding to each basic expression in the other, and that for each semantic rule there is a corresponding syntactic rule in each grammar. Two expressions are then considered possible translations of each other if they can be derived from corresponding basic expressions by applying corresponding syntactic rules. In other words, they are possible translations of each other if they are built from expressions having the same meaning, by using syntactic rules that perform the same semantic operations.
Note the lack of directional specificity in this definition of the &quot;possible translation&quot; relation.</Paragraph> <Paragraph position="9"> 3 The (monolingual) UCG formalism
Many recent grammar formalisms [Shieber 86] represent linguistic objects as sets of attribute-value pairs. Values taken by these attributes may be atomic, variables, or they may themselves be sets of attribute-value pairs, so these objects may be thought of as Directed Acyclic Graphs (DAGs), in which directed arcs represent features, and the nodes at the end of these represent values. Such formalisms typically support re-entrancy, that is, they provide a mechanism for specifying that objects at the end of different paths are the same object.</Paragraph> <Paragraph position="10"> Unification Categorial Grammar is such a formalism, which combines a categorial treatment of syntax with semantics similar to Kamp's Discourse Representation [Kamp 81]. Each linguistic expression licensed by the grammar corresponds to what is called a sign. A sign consists of four main entries or features, which are explained below: 1. phonology (orthography in the present case) 2. syntax 3. semantics 4. the order in which the terms combine.</Paragraph> <Paragraph position="11"> Typical signs for the lexical entries Mary and sings may then look something like the following: [phon: &quot;Mary&quot;, synt: np^[num: sing, gen: fem], sem: mary, ord: Order] and [phon: sings, synt: sent^[tense: fin] / [phon: Phon, synt: np^[pers: sing], sem: Sem, ord: post], sem: [E][sings(E,Sem)], ord: Order]. These are briefly explained below. Note that in the above example, as elsewhere, the Prolog-like convention is adopted that constants start with lower-case or are within quotes, and variables start with upper-case.
Also, for the sake of simplicity in an introductory example, the first example above differs from the standard UCG practice of type-raising noun phrases, which follows Montague and others.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.1 Syntax </SectionTitle> <Paragraph position="0"> There are 4 basic categories: nouns (noun), sentences (sent), noun phrases (np) and prepositional phrases (pp). These may be further specified by features (such as number, gender, etc.). Features are indicated by the operator ^.</Paragraph> <Paragraph position="1"> A category is either a basic category, or of the form A/B, where A is a category and B is a sign. Combination of signs is determined by the rule of function application, which allows a functor sign with syntax A/B to combine with an argument sign B', to give a sign like the functor sign but with syntax A. The combination is licensed if B and B' unify, and if the functor and argument signs appear in the order specified by the value of the order feature in B (if the order feature of an argument is pre its functor must precede it, and if it is post the functor follows it).</Paragraph> <Paragraph position="2"> The unification may further instantiate variables in the functor sign (in particular, the semantics). Although Function Application is the main combination rule, there are a few important unary rules, such as Gap Deletion, pp-insertion, and others. Unlike many other extended Categorial Grammars, UCG does not have Functional Composition, as a similar effect is achieved by the technique of Gap Threading, based on work by Johnson and Klein [Johnson and Klein 86].
However, it is envisaged that a richer set of binary rules, and a reduction or elimination of unary rules, will be necessary if the Isomorphic Grammars approach is to be extended to typologically diverse languages.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Semantics </SectionTitle> <Paragraph position="0"> The semantic formalism used in UCG is similar to Kamp's DRT, but with a Davidsonian treatment of predicates. It is called InL (Indexed Language) and is described in [Zeevat 86]. A sentence like &quot;If a linguist owns a donkey, she writes about it&quot; is represented in InL by: [S1][[S2][[X]linguist(X), [Y]donkey(Y), [S2]own(S2,X,Y)] ==&gt; [E]write_about(E,X,Y)]] There is an important difference between InL and DRT: each formula introduces a discourse referent, or index (S1 and S2 above), which corresponds to the semantic object introduced by the formula. Since events, states etc. are primitive semantic objects, InL permits a first-order treatment of modifiers.</Paragraph> <Paragraph position="1"> Indices contain information about the sortal nature of the discourse referent in question. The sorts are coded into a subsumption lattice, and consist of bundles of features which may be unified. Unification ensures, for instance, that predicates have arguments of the right sort.</Paragraph> </Section> </Section> </Paper>