<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1075">
  <Title>Application of Analogical Modelling to Example Based Machine Translation</Title>
  <Section position="5" start_page="516" end_page="520" type="metho">
    <SectionTitle>
2. General
</SectionTitle>
    <Paragraph position="0"> The main idea behind our approach is based on the observation that, given any source and target language sentence pair, any alteration of the source sentence will most likely result in one or more changes in the respective target, while it is also highly likely that constant and variable units of the source sentence correspond to constant and variable target units respectively. Apart from cases of so called &amp;quot;translational divergences&amp;quot; (Dorr, B. 1994) as well as cases of idiomatic expressions, in most cases the above assumption holds true. Especially in the case of technical sublanguages, where rather literal and accurate translation is expected, &amp;quot;translational divergences&amp;quot; are limited, while idiomatic expressions can be captured and finally rejected from the main process through certain constraints, as will be explained later on.</Paragraph>
    <Paragraph position="1"> The matching process, as described by (Daelemans W., et al, 1997) based on Skousen's analogical modelling algorithm (Skousen, R. 1989), consists of two subsequent stages. The first stage of the matching process is the construction of &amp;quot;subcontexts&amp;quot;: these are sets of examples, obtained by matching the input pattern, feature by feature, to each database item on an equal/not-equal basis, and classifying the database examples accordingly. Taking the input pattern ABC as an example, eight (= 2^3) different and mutually disjoint subcontexts would be constructed: ABC, A̅BC, AB̅C, ABC̅, A̅B̅C, A̅BC̅, AB̅C̅, A̅B̅C̅, where the macron denotes complementation. Thus exemplars in the second class (A̅BC) share only the second and third feature with the input pattern.</Paragraph>
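The subcontext partition described above can be sketched in a few lines of Python (an illustrative reconstruction, not code from the paper; the database items are hypothetical feature strings):

```python
from itertools import product

def classify(item, pattern):
    """Return the subcontext an item falls into: position i is True
    iff the item agrees with the input pattern on feature i."""
    return tuple(a == b for a, b in zip(item, pattern))

def partition(database, pattern):
    """Partition database items into the 2^n mutually disjoint
    subcontexts of the input pattern."""
    parts = {mask: [] for mask in product([True, False], repeat=len(pattern))}
    for item in database:
        parts[classify(item, pattern)].append(item)
    return parts

parts = partition(["ABC", "XBC", "AYC", "XYZ"], "ABC")
assert len(parts) == 8                        # 2^3 subcontexts for "ABC"
assert parts[(False, True, True)] == ["XBC"]  # shares only 2nd and 3rd feature
```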
    <Paragraph position="2"> In the following stage, &amp;quot;supracontexts&amp;quot; are constructed by generalising over specific feature values. This is done by systematically discarding features from the input pattern and taking the union of the subcontexts that are subsumed by this new pattern. Supracontexts can be ordered with respect to generality, so that the most specific supracontext contains items that share all features with the input pattern, while the less specific ones contain those items that share at least one feature. The most general supracontext contains all database examples, whether or not they share any features with the input pattern.</Paragraph>
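Likewise, supracontext construction by discarding feature positions can be sketched as follows (again an illustrative reconstruction with a toy database; the keep-set representation is our own):

```python
from itertools import combinations

def supracontexts(pattern, database):
    """For every subset of retained feature positions, collect the
    database items that agree with the pattern on all retained
    positions; discarded positions act as wildcards.  The full
    keep-set (0, 1, 2) is the most specific supracontext, the empty
    keep-set the most general one."""
    n = len(pattern)
    return {
        keep: [item for item in database
               if all(item[i] == pattern[i] for i in keep)]
        for k in range(n, -1, -1)
        for keep in combinations(range(n), k)
    }

db = ["ABC", "ABD", "XBC", "XYZ"]
supra = supracontexts("ABC", db)
assert supra[(0, 1, 2)] == ["ABC"]      # shares all features
assert supra[(1, 2)] == ["ABC", "XBC"]  # wildcard on the first feature
assert supra[()] == db                  # most general: whole database
```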
    <Paragraph position="3"> Some exemplary supracontexts together with the respective subcontexts for the input pattern ABC are presented in the following table. In addition, our approach introduces a second dimension to the above described process, that of language, by simultaneously performing the matching process on target language equivalents and aligning individual results, based on the principles described earlier. Therefore, what we are ultimately searching for are source and target sentence pairs for which evidence of correspondence between any or all of the respective subcontexts within the available training corpora is available. This will subsequently lead to links between respective supracontexts. For example :</Paragraph>
    <Section position="1" start_page="516" end_page="517" type="sub_section">
      <SectionTitle>
3.1 Translation Templates
</SectionTitle>
      <Paragraph position="0"> Supracontexts and translation templates can be viewed as two sides of the same coin.</Paragraph>
      <Paragraph position="1"> Generalization through unification on feature values of neighbouring sentences, if these satisfy certain criteria, leads to more abstract expressions of bilingual pairs of &amp;quot;pseudo-sentences&amp;quot;, consisting of sequences of constant and variable elements,</Paragraph>
      <Paragraph position="3"/>
      <Paragraph position="5"> where variable elements are represented by special symbols (&amp;quot;Xi&amp;quot;) and constant-fixed elements act as the context in each case.</Paragraph>
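As a minimal illustration of such a pseudo-sentence pair (the tokens and the instantiate helper are our own, not from the paper; c1 and c2 stand for unspecified target-language context words):

```python
# "X1" marks the shared variable slot; the remaining tokens are the
# constant context of the template (c1, c2 are placeholder target words).
template = {
    "source": ["Customizing", "X1", "settings"],
    "target": ["c1", "X1", "c2"],
}

def instantiate(tokens, bindings):
    """Fill each variable symbol with its bound translation unit."""
    return [bindings.get(t, t) for t in tokens]

assert instantiate(template["source"], {"X1": "display"}) == \
    ["Customizing", "display", "settings"]
```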
    </Section>
    <Section position="2" start_page="517" end_page="517" type="sub_section">
      <SectionTitle>
3.2 Translation Units
</SectionTitle>
      <Paragraph position="0"> Discarded features (represented by the &amp;quot;-&amp;quot; symbol) of corresponding supracontexts, arising from variable elements of the matching sentences, correspond to the translation units of the respective translation patterns. As a result, single or multi-word elements (translation units) of source and target language appearing within corresponding supracontext positions, are linked and stored, comprising the bilingual translation unit lexicon.</Paragraph>
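A rough sketch of this coupling step, assuming equal-length sentence pairs that differ in exactly one position each (the sentences and the couple_units helper are illustrative, not from the paper):

```python
def couple_units(src_pair, tgt_pair, lexicon):
    """Link the differing source and target elements of two matching
    sentence pairs as candidate bilingual translation units.  Assumes
    equal-length sentences differing in exactly one position each."""
    src_diff = [(a, b) for a, b in zip(*src_pair) if a != b]
    tgt_diff = [(a, b) for a, b in zip(*tgt_pair) if a != b]
    if len(src_diff) == 1 and len(tgt_diff) == 1:
        for s, t in zip(src_diff[0], tgt_diff[0]):
            lexicon.setdefault(s, set()).add(t)
    return lexicon

lex = couple_units(
    (["save", "the", "file"], ["save", "the", "report"]),
    (["s1", "s2", "t-file"], ["s1", "s2", "t-report"]),  # placeholder target
    {})
assert lex == {"file": {"t-file"}, "report": {"t-report"}}
```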
    </Section>
    <Section position="3" start_page="517" end_page="518" type="sub_section">
      <SectionTitle>
3.3 The Analogical Network
</SectionTitle>
      <Paragraph position="0"> The main linguistic object for which matching is performed is not the sentence but pairs of source and target sentences/exemplars. Therefore, matching between linguistic objects is performed in two dimensions simultaneously, that is, between source and target sentences of matching pairs respectively. The results of the process, if certain conditions are met, are stored in an &amp;quot;analogical network&amp;quot; (Federici, S. &amp; Pirrelli V., 1994) of inter-sentence and intra-sentence relations between these exemplars and their generalizations. A rather simple example of this is presented in Figure 1.</Paragraph>
      <Paragraph position="1"> Different parts of matching sentences are replaced by corresponding variables, and are consequently assigned the role of translation units, while similar/constant parts are considered to be the context under which variable units are instantiated.</Paragraph>
      <Paragraph position="2"> The union of context and variables establishes the &amp;quot;generalized&amp;quot; translation (paradigmatic) patterns between source and target language. The similar (constant) and different (variable) parts between source and target sentences are factored out and presented as separate nodes in the above diagram.</Paragraph>
      <Paragraph position="3"> For each sentence we can view its constituent single or multi-word, constant or variable units as separate nodes, where links between these nodes indicate the syntagmatic relations between them, that is, the way they actually appear and are ordered in the respective sentence. The vertical axis represents the paradigmatic dimension of available alternants, that is, the information concerning which substrings are in complementary distribution with respect to the same syntagmatic context i.e.</Paragraph>
      <Paragraph position="4"> with respect to the same context &amp;quot;Customizing __ settings&amp;quot;. Syntagmatic links constitute the intra-sentence relations/links between sentence constituents, while paradigmatic ones correspond to the inter-sentential relations. Furthermore, a third dimension is added to the whole framework, that of the &amp;quot;language&amp;quot;, since all principles are applied simultaneously to both source sentences and their target equivalents. In case linguistic annotations are available, they are appropriately incorporated in the respective nodes.</Paragraph>
      <Paragraph position="5"> At this point no conflicts are resolved. All possible patterns are stored in the network, including conflicting as well as overlapping patterns.</Paragraph>
      <Paragraph position="6"> However, all links, both paradigmatic and syntagmatic, are weighted by frequency information. This will eventually provide the necessary information to disable and even discard certain false or useless variables or templates.</Paragraph>
    </Section>
    <Section position="4" start_page="518" end_page="518" type="sub_section">
      <SectionTitle>
3.4 The Algorithm
</SectionTitle>
      <Paragraph position="0"> Translation templates as well as translation units are treated as paradigmatic flexible structures that depend on the available evidence. As new data come into the system, rules can be extended or even replaced by other more general ones. It is usually assumed that there is only one fixed way to assign a structural representation to a symbolic object, be it a translation unit or a translation template.</Paragraph>
      <Paragraph position="1"> However, it is obvious that in our approach there is no initial fixed definition of this particular structure; rather, it is left up to the training corpus and the learning mechanism. As was expected, under this kind of analogy-based approach, linguistic objects were determined based on the paradigmatic context they appeared in, resulting in a more flexible and also corpus-dependent definition of translation units.</Paragraph>
    </Section>
    <Section position="5" start_page="518" end_page="518" type="sub_section">
      <SectionTitle>
Search Space Reduction
</SectionTitle>
      <Paragraph position="0"> In general, if sentence matching were unconstrained and all resulting matches were stored in the analogical network, then the number of all links (inter/intra-sentential), for N equal to the number of translation patterns learned through the process and L equal to the number of words in a sentence (template), would be : while the complexity of the learning phase is also increased by the fact that each candidate rule needs to be verified against the available corpus, introducing an additional parameter S, that of the size of the training corpora (in number of sentences).</Paragraph>
      <Paragraph position="1"> Moreover, if a rather straightforward approach to matching were followed, the complexity involved for each individual candidate sentence would be enormous. In such an approach, for each candidate sentence, all corresponding subcontexts would have to be identified and verified against the available corpora. For instance, a sentence of length L would generate 2^L subcontexts, thus resulting in O(2^L) required search actions against the available corpora. Even if constraints were set upon the length of possible ignore (variable) areas, for example up to 5 words, the process would still be too complex. For example, for a sentence of length L = 10 and for variables of length up to 5 words, the possible subcontexts that have to be matched against the corpus would be C(10,1) + C(10,2) + C(10,3) + C(10,4) + C(10,5) = 10 + 45 + 120 + 210 + 252 = 637, where the terms of the previous equation correspond to the subcontexts with variables of length 1 to 5 respectively.</Paragraph>
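Assuming the count is the sum of binomial coefficients C(L, k) for k = 1..5, the figures can be checked directly (Python 3.8+ for math.comb):

```python
from math import comb

L = 10
counts = [comb(L, k) for k in range(1, 6)]  # variables of length 1..5 words
assert counts == [10, 45, 120, 210, 252]
assert sum(counts) == 637

# Unconstrained matching would instead require every non-trivial
# subcontext of the sentence:
assert 2 ** L - 1 == 1023
```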
      <Paragraph position="2"> The SSR methodology depends on the specific needs of the particular task. Run-time pruning of possible matches can speed up the learning process; however, it also reduces system recall &amp; coverage.</Paragraph>
      <Paragraph position="3"> On the other hand, constraints on paradigmatic relations are more reliable, providing better results, but cannot contribute to the speed of the learning process. SSR was based on an efficient indexing and retrieval mechanism (Willman, N. 1994) allowing fast identification of &amp;quot;relevant&amp;quot; sentences based on common single/multi-word units. In this way, the search space for each individual candidate was significantly reduced to a smaller set of possible matching sentences.</Paragraph>
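A minimal stand-in for such an indexing and retrieval mechanism (our own sketch; the cited system's actual interface is not described in the paper):

```python
from collections import defaultdict

def build_index(corpus):
    """Inverted index from word to the ids of sentences containing it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(corpus):
        for word in sentence.split():
            index[word].add(sid)
    return index

def candidates(index, sentence):
    """Sentences sharing at least one word with the input: the reduced
    search space examined instead of the whole corpus."""
    ids = set()
    for word in sentence.split():
        ids |= index.get(word, set())
    return ids

corpus = ["save the file", "open the file", "print report"]
idx = build_index(corpus)
assert candidates(idx, "save report") == {0, 2}
```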
    </Section>
    <Section position="6" start_page="518" end_page="519" type="sub_section">
      <SectionTitle>
Distance Metric
</SectionTitle>
      <Paragraph position="0"> The main objects of knowledge generated by the learning process are the translation patterns and the bilingual lexicon of translation units. During the learning process, both sources are enriched when possible. Sentences are analysed and encoded as two-dimensional vectors based on the words (first dimension) and the linguistic annotations (second dimension) they might contain. Then sentence vectors are compared on an equal/not-equal basis through a Levenshtein or edit distance algorithm (Damerau, F. 1964), (Oflazer, K. 1996). The algorithm, implemented through a dynamic programming framework (Stephen, G. 1992), computes the minimum number of editing actions (insertions, deletions, substitutions, movements and transpositions) required to transform one sentence into another, recovered through an inverse backtracking procedure. The final similarity score is computed by assigning appropriate weights to these actions. For the time being only insertions and deletions were accounted for. More complex actions, like transpositions or movements of words, and their influence on the final translation pattern will be the focus of future work.</Paragraph>
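Restricted to insertions and deletions, as the text states, the dynamic-programming distance can be sketched as follows (an illustrative reconstruction, not the paper's implementation):

```python
def indel_distance(a, b):
    """Edit distance over word sequences restricted to unit-cost
    insertions and deletions, computed by dynamic programming."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1])
    return d[m][n]

assert indel_distance("save the file now".split(),
                      "save the file".split()) == 1   # one deletion
```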
    </Section>
    <Section position="7" start_page="519" end_page="519" type="sub_section">
      <SectionTitle>
Variable Elements
</SectionTitle>
      <Paragraph position="0"> Differences between matching sentences result in coupling of corresponding source and target words, as explained earlier in this section, thus enriching the lexicon with new information. Coupling is restricted to content words. Content words can usually be replaced by other words of the same category acting as potential variables (Kaji, H. et al 1992). On the other hand, functional words do present an &amp;quot;abnormal&amp;quot; translational behavior, since they sometimes act as optional units which do not appear in both source and target segments, other times have a one-to-one correspondence, yet it is not rare that they affect the target pattern (especially when they participate in verb complementation). &amp;quot;Exclusion lists&amp;quot; were used for this purpose in order to prevent functional words from acting as translation variables.</Paragraph>
      <Paragraph position="1"> Workflow All sentences are stored as vectors of constituent words-annotations. Functional words are marked as such. The process runs iteratively for all sentences, starting from sentences of length 1 up to the maximum length appearing in the training corpus. The process terminates in case of an unsuccessful loop, meaning an iteration where no new information (either translation units or templates) was extracted. The learning process, consisting of five subsequent phases, is depicted in detail in Figure 2 :</Paragraph>
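The outer loop of this workflow might be sketched as follows (extract_new is a hypothetical stand-in for phases 1-5, returning how many new units or templates a sentence yields):

```python
def learn(sentences, extract_new):
    """Iterate over sentences from length 1 up to the maximum length
    in the corpus; stop after a full pass (an 'unsuccessful loop') in
    which extract_new yields nothing new.  Returns the pass count."""
    max_len = max(len(s) for s in sentences)
    passes = 0
    while True:
        passes += 1
        found = 0
        for length in range(1, max_len + 1):
            for s in (x for x in sentences if len(x) == length):
                found += extract_new(s)
        if found == 0:          # unsuccessful loop: terminate
            return passes

# A stub that finds one new template on the first call only.
state = {"calls": 0}
def stub(sentence):
    state["calls"] += 1
    return 1 if state["calls"] == 1 else 0

assert learn([["a"], ["a", "b"]], stub) == 2   # second pass finds nothing
```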
    </Section>
    <Section position="8" start_page="519" end_page="519" type="sub_section">
      <SectionTitle>
Phase1 Search Space Reduction : Extract an
</SectionTitle>
      <Paragraph position="0"> initial set of possibly relevant sentences for the current input sentence.</Paragraph>
      <Paragraph position="1">  against the previous set. Matching candidates are</Paragraph>
    </Section>
    <Section position="9" start_page="519" end_page="520" type="sub_section">
      <SectionTitle>
Phase3 Identification of Subcontexts : For each
</SectionTitle>
      <Paragraph position="0"> matching candidate, identify the respective subcontext of the input sentence that it adheres to.</Paragraph>
      <Paragraph position="1"> Examine target language equivalents. Resolve differences between source and target language matching candidates based on already existing information contained in the bilingual lexicon.</Paragraph>
      <Paragraph position="2"> During this process, the bilingual translation unit lexicon is enriched with any successfully resolved difference (even if the particular candidate will not finally lead to a new translation pattern).</Paragraph>
    </Section>
    <Section position="10" start_page="520" end_page="520" type="sub_section">
      <SectionTitle>
Phase4 Identification of Supracontexts : Based
</SectionTitle>
      <Paragraph position="0"> on the already identified subcontexts, produce the respective supracontexts through unification of respective variable feature values.</Paragraph>
    </Section>
    <Section position="11" start_page="520" end_page="520" type="sub_section">
      <SectionTitle>
Phase5 Extraction of Translation Patterns :
</SectionTitle>
      <Paragraph position="0"> Construct corresponding translation patterns from existing supracontexts. Update the analogical network.</Paragraph>
      <Paragraph position="1"> In case a pattern already exists, update the weight of its constituent links.</Paragraph>
      <Paragraph position="2"> At the end of the learning process the analogical network has been enriched with all possible translation patterns and variables/units extracted from the available corpora. Conflict resolution and network refinement in general are performed on the final results, where all information is available, as described in the next section.</Paragraph>
    </Section>
    <Section position="12" start_page="520" end_page="520" type="sub_section">
      <SectionTitle>
3.5 Network Refinement
</SectionTitle>
      <Paragraph position="0"> As mentioned earlier, the analogical network contains all translation alternatives for individual translation units as well as all translation patterns resulting from the learning process. However, link weight information is also included in the above framework, representing the validity of a particular relation against the training corpus.</Paragraph>
      <Paragraph position="1"> Translation alternatives of individual units (in our case words) are implicitly classified through their context, that is, the constant part of the translation patterns they participate in. These will constitute the main selection criterion during translation. However, frequency information is also used in order to disable and finally discard obsolete or erroneous translation unit alternatives.</Paragraph>
      <Paragraph position="2"> Translation templates are compared with respect to their source and target language constituent patterns: (a) Conflicting templates, that is, templates sharing only one of the two patterns, are subsequently checked in terms of weight information. Templates of equivalent weights are considered equally effective. This is usually the case where different translations are produced from the same source pattern due to semantic differences on the variables it contains. Conflicting templates with significantly low weights (under a predefined threshold) are judged ineffective or &amp;quot;exceptional&amp;quot; (Nomiyama, H. 1992) and are flagged as such in order to receive special treatment during the translation phase (Watanabe, H. 1994). These can even be disabled or discarded from the network depending on their significance weight through a dynamic &amp;quot;forgetting and remembering&amp;quot; process (Streiter, O. et al, 1999). (b) Overlapping templates, where both source and target patterns of one template can be generated from the other by coupling words of the constant part of the template through valid translation alternatives included in the network, are identified and the more general ones are preferred. A basic requirement is that the set of all translation alternatives instantiating the variables of the more general template is a superset of those instantiating the less general one. In any other case, both templates are retained. And finally, (c) complementary templates are also identified and replaced by their union.</Paragraph>
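The weight-threshold check for conflicting templates, case (a), can be sketched as follows (the template keys and threshold value are illustrative, not from the paper):

```python
def flag_exceptional(templates, threshold):
    """Mark templates whose link weight falls below the threshold as
    'exceptional' for special treatment (or later removal); keep the
    rest active.  `templates` maps (source, target) pattern pairs to
    their frequency-based weights."""
    return {key: ("exceptional" if weight < threshold else "active")
            for key, weight in templates.items()}

# Two conflicting templates sharing one source pattern (hypothetical).
t = {("Customizing X1 settings", "T1"): 12,
     ("Customizing X1 settings", "T2"): 1}
assert flag_exceptional(t, 3) == {
    ("Customizing X1 settings", "T1"): "active",
    ("Customizing X1 settings", "T2"): "exceptional",
}
```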
    </Section>
  </Section>
</Paper>