<?xml version="1.0" standalone="yes"?> <Paper uid="A94-1034"> <Title>USING SYNTACTIC DEPENDENCIES FOR WORD ALIGNMENT</Title> <Section position="1" start_page="0" end_page="188" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We attack the problem of aligning words from pairs of bilingual sentences, rather than the well-known, and somewhat easier, problem of aligning sentences. The method that we develop is based on the use of bilingual dictionaries, assuming that lemmatization has already taken place. We first show that this method performs poorly in terms of silence and noise. To improve its performance, we introduce syntactic dependency relations between the words in each of the two sentences considered. In this sense the syntagmatic level comes to the rescue of the paradigmatic level at which the alignment actually takes place.</Paragraph> <Paragraph position="1"> I. Introduction Given two sentences F and E that are translations of each other, is there a simple method for aligning the words in each sentence? In other words, is word alignment algorithmically simple to implement? We shall see that this problem is extremely delicate, even when done by hand. To convince oneself, one need merely attempt it to see how quickly the choices pass from trivial to complicated. The difficulties stem from there not always being a simple one-to-one correspondence between the words of the two sentences. One word may correspond to many words (an expression); in other cases, one or many words may correspond to no other words.
Moreover, word order is rarely maintained and, to top things off, differences in syntactic status create complicated pairings even when correspondences do exist.</Paragraph> <Paragraph position="2"> II. Conventions and restrictions concerning manual alignment of words Let us begin with the cases that will be excluded because of the excessive level of difficulty that they present.</Paragraph> <Paragraph position="3"> Consider the two sentences:</Paragraph> <Paragraph position="5"> 1. All the sentence pairs used as examples are extracted from The Acoustics of the Harpsichord (SCIENTIFIC AMERICAN, February 1991) and its French translation L'acoustique du clavecin (POUR LA SCIENCE, April 1991). Although these sentences correspond to each other in the text in which they appeared, we cannot establish an alignment of their words. We will not study such cases, for two reasons: first, outside of their context it is difficult, even for human readers, to affirm that they are translations of each other; secondly, in order to align these sentences, each entire sentence must be considered as a single expression, and this is debatable.</Paragraph> <Paragraph position="6"> How can manual alignments be represented? We will distinguish the alignment of words and groups of words whose mutual translation is established with the aid of a bilingual dictionary from alignments that are made from a local recomposition based on human &quot;comprehension&quot; of the two sentences.</Paragraph> <Paragraph position="7"> We will use the equal sign (=) to mark links that come from a bilingual dictionary and the star symbol (*) to mark comprehension correspondences. We will call the first type of correspondence &quot;lexical correspondence&quot; and the second type &quot;contextual correspondence&quot;.</Paragraph> <Paragraph position="8"> The alignments (1-n), (m-1) or (m-n) are characterized by the presence, on the same line, of more than one *</Paragraph> <Paragraph position="10"> III. 
Hypothesis Our algorithm rests on the following hypothesis. Consider two sentences F and E which are translations of each other.</Paragraph> <Paragraph position="11"> We say that two words fi and ej, belonging to F and E respectively, correspond to each other if: i) they are translations of each other; ii) they enter into the same dependency relations with their neighbors; iii) they occupy the same positions.</Paragraph> <Paragraph position="12"> IV. Potential Alignments Consider the two sentences F and E. The potential alignment of words is obtained by comparing each of the words of one sentence with all of those from the second sentence. The comparisons (fi, ej) are established with the help of a simple word transfer dictionary and the results are stored in an m x n matrix (m being the number of words in the French sentence and n the number in the English sentence). Each element receives a score that is higher when the two words are: i) translations of each other in the dictionary, ii) long, iii) in the same position. V. Ambiguity, noise and silence An alignment is 'ambiguous' if more than one solution is produced. Typology of errors (noise, silence): We will call errors of noise those alignments created between words that should not be aligned, and errors of silence those missing alignments between words that were aligned manually.</Paragraph> <Paragraph position="13"> VI. The reasons for noise and silence Noise: At the root of noisy alignments we find the problem of polysemy. When it is not resolved, it causes words to be aligned through senses that are improper in the current context.</Paragraph> <Paragraph position="14"> Another source of error corresponds to simple errors of alignment: the two words are translations of each other but in the present context they should not be aligned.
For example, in the following sentences areas28 was incorrectly aligned with zones17.</Paragraph> <Paragraph position="15"> Silence: The main problem is a gap in the dictionary: either the head word is not present, or the correct translation is absent. The problem is essentially the non-recognition of synonymy.</Paragraph> <Paragraph position="16"> VII. Resolution by Analogic Reasoning In order to reduce both noise and silence, we use a mechanism based on analogical reasoning. This is based on the following fundamental hypothesis: paradigmatic relations can help determine syntagmatic relations and vice-versa.</Paragraph> <Paragraph position="17"> Using monolingual dependency relations.</Paragraph> <Paragraph position="18"> The resolution mechanism can be understood from the following diagram.</Paragraph> <Paragraph position="19"> This figure represents four words, of which two are aligned (fp, ep). Syntactic dependencies between two other pairs of words, [fp fq] and [ep eq], are also represented. We want to know how valid the alignment between fp and ep is.</Paragraph> <Paragraph position="20"> To answer this, we reason in the following way: 1. on the syntagmatic plane, since fp is in relation with fq (the relation Rf being supposed valid), and since ep is in relation with eq (the relation Re being supposed valid); 2. on the paradigmatic plane, since fq is the translation of eq (supposing the alignment relation P2 is valid); we then conclude, by analogy, that the alignment relation P1 is also valid, in other words that fp and ep are translations of each other in this context.
This degree of validity will be stronger the closer the dependency relations Rf and Re are (identical or compatible) and the closer P1 and P2 come to identity.</Paragraph> <Paragraph position="21"> We will call strong resolution one that confirms an existing potential alignment, and weak resolution one that negates an existing alignment or that creates a new alignment.</Paragraph> <Paragraph position="22"> VIII. Conclusion The algorithm presented here is divided into three phases. The first phase is one of construction: based on lexical proximity, we try to establish all the possible links between the words of the two sentences being aligned. The second phase is one of elimination: using syntactic dependencies, we attempt to resolve ambiguous attachments and to undo non-ambiguous but erroneous attachments. The third phase is again one of construction: we attempt to reduce silence.</Paragraph> <Paragraph position="23"> We repeat that even human solutions to alignments are subject to wide variations, which shows the difficulty of the problem.</Paragraph> <Paragraph position="24"> Acknowledgements to Hadhemi Achour, Chiraz Ben Othman, Emna Souissi and Gregory Grefenstette.</Paragraph> </Section> </Paper>