<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1410">
  <Title>Machine Translation with Grammar Association: Some Improvements and the Loco C Model</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Using ECGI language models
</SectionTitle>
    <Paragraph position="0"> The ECGI algorithm (Rulot and Vidal, 1987) is a heuristic technique for the inference of acyclic finite-state automata from positive samples, and determinism can be imposed a posteriori by a well-known transformation for regular grammars.</Paragraph>
    <Paragraph position="1"> Therefore, in principle, ECGI provides exactly the kind of language model Grammar Association needs. Moreover, it was (without imposing determinism) the inference technique employed in (Vidal et al., 1993).</Paragraph>
    <Paragraph position="2"> Informally, ECGI works as follows. With the first sample sentence, it builds an initial automaton consisting in a linear path representing the sentence. Words label states (instead of arcs) and there are two special non-labelled states: the initial one and the final one. For each new sentence, if it is already recognized by the automaton built so far, nothing happens; otherwise, if the current model does not recognize the sentence, new arcs and states are added to the most suitable path (according to a minimum-cost criterion) for recognition to be possible. In a sense, it is like constructing a new path for the new sentence and then finding a maximal merge with a path in the automaton. null For further discussion on some features of the ECGI algorithm, let us first consider the following set of five sentences: (1) &amp;quot;some snakes eat rats&amp;quot;; (2) &amp;quot;some people eat snakes&amp;quot;; (3) &amp;quot;some people eat rats&amp;quot;; (4) &amp;quot;some people are dangerous&amp;quot;; (5) &amp;quot;snakes are dangerous&amp;quot;. Figure 1 shows how ECGI 3Obviously, any algorithm for finding the minimum-cost path in a graph is applicable.</Paragraph>
    <Paragraph position="3">  incrementally builds an automaton able to recognize the whole training set and, moreover, performs some generalizations. For instance, after considering the two first sentences (subfigure b), two more sentences are also represented in the current automaton: &amp;quot;some snakes eat snakes&amp;quot; and &amp;quot;some people eat rats&amp;quot;.</Paragraph>
    <Paragraph position="4"> Thus, when this last sentence is actually presented to the algorithm, there is no need for the automaton to be updated. On the contrary, sentences 4 and 5 imply the addition of new elements and the finally inferred automaton is the one shown in subfigure d.</Paragraph>
    <Paragraph position="5"> Though successful application of ECGI to a variety of tasks has been reported,4 the method  suffers from some drawbacks. For instance, the level of generalization is sometimes lower than expected. In the example presented in Figure 1, when &amp;quot;snakes are dangerous&amp;quot; is employed for updating the model in subfigure c, instead of adding a new state and two arcs to the path corresponding to &amp;quot;some people are dangerous&amp;quot;, the solution in Figure 2 seems to be an appealing alternative: adding just two arcs, more reasonable generalization is obtained. Nevertheless, ECGI chooses the solution in Figure 1 because it searches for just one path to be modified with a minimal number of new elements, and does not take into account combinations of different paths.</Paragraph>
    <Paragraph position="6"> On the other hand, ECGI can suffer from inadequate generalization, especially at early stages of the incremental construction of the automaton. If &amp;quot;some people eat snakes&amp;quot; and &amp;quot;snakes are dangerous&amp;quot; were the first two sentences presented to ECGI, the algorithm would try to make use of the state &amp;quot;snakes&amp;quot; of the initial model for representing the occurrence of that word in the second sentence, leading to an automaton which would recognize &amp;quot;sentences&amp;quot; as&amp;quot;some people eat snakes are dangerous&amp;quot;, or simply &amp;quot;snakes&amp;quot;. The situation that produces this kind of undesired behaviour of the method is characterized by the confluence of a couple of circumstances: a word in a new sentence is also present in the current model, but with a different function, and that automaton has not enough adequate structural information for offering a better merging to the new sentence.</Paragraph>
    <Paragraph position="7"> As pointed out by Prieto and Vidal (1992), a proper ordering of the set of sentences presented to ECGI can provide more compact models, and we think that better ones too. The ordering we propose here simply follows, first, a decreasinglength criterion and then, for breaking ties, applies any dictionary-like ordering. Thus, we try to avoid the problem discussed in the previous paragraph by providing the inference algorithm with as much as possible structural information at first stages of automaton construction and, moreover, dictionary-like ordering inside each length is aimed at frequently presenting to ECGI new sentences that are similar to the previous ones.</Paragraph>
    <Paragraph position="8"> Furthermore, a very common way to reduce the complexity of problems involving languages is the definition of word categories, which can be manually designed or automatically extracted from data (Martin et al., 1995). We think categorization helps in solving the problem of undesired merges and also in increasing the generalization abilities of ECGI. In order to illustrate this point, let us consider a category a86 animalsa87 consisting of words &amp;quot;snakes&amp;quot;, &amp;quot;rats&amp;quot; and &amp;quot;people&amp;quot; in the very simple example of Figure 1. Words can be substituted for the appropriate category in the original sentences; then, the modified sentences are presented to the inference algorithm; finally, categories in the automaton are expanded.</Paragraph>
    <Paragraph position="9"> Figure 3 shows the automata that are successively built in that process.</Paragraph>
    <Paragraph position="10"> As said at the beginning of this section, determinism must be imposed a posteriori for the language models to fit our formal framework. In addition, we will apply them a minimization process in order to simplify the problem that the corresponding association model will have to solve.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Loco C: A new association model
</SectionTitle>
    <Paragraph position="0"> Following a data-driven approach, a Grammar Association system needs to learn from examples an association model capable to estimate the probabilities required by our recently developed framework, that is, the probability of each rule in the grammar that models the output language, conditioned on its left-hand side and the derivation of the input sentence.</Paragraph>
    <Paragraph position="1"> Among the different association models we have studied (Prat, 1998), it is worth emphasizing one we have specifically developed for playing that role in Grammar Association systems: the Loco C model. We based our design on the IBM models 1 and 2 (Brown et al., 1993), but taking into account that our model must generate correct derivations in a given grammar, not any se- null ample of Figure 1.</Paragraph>
    <Paragraph position="2"> quence of rules.5 Moreover, we wanted to model the probability estimation for each output rule as an adequately weighted mixture,6 along with keeping the maximum-likelihood re-estimation of its parameters within the growth transformation framework (Baum and Eagon, 1967; Gopalakr5In those simple IBM translation models, an output sequence (of words) is randomly generated from a given input one by first choosing its length and then, for each position in the output sequence, independently choosing an element (word). If the relation between input and output derivations (sequences of rules) has to be explicitly modelled, the choices of output elements can no longer be independent because a rule is only applicable if its left-hand side has just appeared in the output derivation.</Paragraph>
    <Paragraph position="3"> 6In IBM models, all words in the input sequence have the same influence in the random choice of output words (model 1) or they have a relative influence depending on their positions (model 2). In the case of derivations, we are interested in modelling those relative influences taking into account rule identities (instead of rule positions).</Paragraph>
    <Paragraph position="4"> ishnan et al., 1991). After exploring some similar alternatives (and discarding them because of their poor results in a few translation experiments), Loco C was finally defined as explained below.7 The Loco C model assumes a random generation process (of an output derivation, given an input one) which begins with the starting symbol of the output grammar as the &amp;quot;current sentential form&amp;quot; and then, while the current sentential form contains a non-terminal, iteratively performs the following sequence of two random choices: in Choice 1, one of the rules in the input derivation is chosen; in Choice 2, the non-terminal in the current sentential form is rewritten using a randomly chosen rule of the output grammar.</Paragraph>
    <Paragraph position="5"> The behaviour of the model depends on two kinds of parameters, each one guiding one of the choices mentioned above. Formally, given an input derivation a56 a45 and an output non-terminal a90 a43 to be rewritten, the probability of an input rule  &amp;quot;automatic variable selection&amp;quot; of the input rules that are relevant for discriminatively choosing among the next applicable output rules.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
MLA Task
</SectionTitle>
      <Paragraph position="0"> Spanish: &amp;quot;un c'irculo oscuro est'a encima de un c'irculo&amp;quot; English: &amp;quot;a dark circle is above a circle&amp;quot; Spanish: &amp;quot;se elimina el cuadrado oscuro que est'a debajo del c'irculo y del tri'angulo&amp;quot; English: &amp;quot;the dark square which is below the circle and the triangle is removed&amp;quot; Simplified Tourist Task Spanish: &amp;quot;nos vamos a ir el d'ia diez a la una de la tarde.&amp;quot; English: &amp;quot;we are leaving on the tenth at one in the afternoon.&amp;quot; Spanish: &amp;quot;?puedo pagar la cuenta con dinero en efectivo?&amp;quot; English: &amp;quot;can I pay the bill in  using the rule a68 a43 .</Paragraph>
      <Paragraph position="1"> Consequently, the corresponding likelihood function is not polynomial, but rational, so Baum-Eagon inequality (1967) cannot be applied and Gopalakrishnan et al. inequality (1991) must be used, instead, in order to develop a Loco C model re-estimation algorithm based on growth transformations. Fortunately, both the computational complexity of the resulting re-estimation algorithm (same order as with IBM model 1) and the experimental results are satisfactory.</Paragraph>
    </Section>
  </Section>
</Paper>