<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1074"> <Title>Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora</Title> <Section position="3" start_page="0" end_page="446" type="intro"> <SectionTitle> 2 Alignments and flow networks </SectionTitle> <Paragraph position="0"> Let us first consider the following aligned sentences, with the actual alignment between words 1: Assuming that we have probabilities of associating English and French words, one way to find the preceding alignment is to search for the most probable alignment under the constraint that any given English (resp. French) word is associated with one and only one French (resp. English) word. We can view a connection between an English and a French word as a flow going from the English word to the French word. The preceding constraint states that the outgoing flow of an English word and the incoming flow of a French word must both equal 1. We also have connections entering the English words, from a source, and leaving the French ones, to a sink, to control the quantity of flow that goes through the words.</Paragraph> <Section position="1" start_page="444" end_page="444" type="sub_section"> <SectionTitle> 2.1 Flow networks </SectionTitle> <Paragraph position="0"> We meet here the notion of flow networks, which we can formalise in the following way (we assume that the reader has basic notions of graph theory).</Paragraph> <Paragraph position="1"> Definition 1: let $G = (V, E)$ be a directed connected graph with $m$ edges. 
A flow in $G$ is a vector $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_m)^T \in \mathbb{R}^m$ (where $T$ denotes the transpose of a matrix) such that, for each vertex $i \in V$: $\sum_{u \in \omega^+(i)} \varphi_u = \sum_{u \in \omega^-(i)} \varphi_u$</Paragraph> <Paragraph position="3"> where $\omega^+(i)$ denotes the set of edges entering vertex $i$, whereas $\omega^-(i)$ is the set of edges leaving vertex $i$.</Paragraph> <Paragraph position="4"> We can, furthermore, associate to each edge $u$ of $G = (V, E)$ two numbers, $b_u$ and $c_u$ with $b_u \le c_u$, which will be called the lower capacity bound and the upper capacity bound of the edge.</Paragraph> <Paragraph position="5"> Definition 2: let $G = (V, E)$ be a directed connected graph with lower and upper capacity bounds. We will say that a flow $\varphi$ in $G$ is a feasible flow in $G$ if it satisfies the following capacity constraints: $b_u \le \varphi_u \le c_u$ for every edge $u \in E$</Paragraph> <Paragraph position="7"> Finally, let us associate to each edge $u$ of a directed connected graph $G = (V, E)$ with capacity intervals $[b_u; c_u]$ a cost $\gamma_u$, representing the cost (or inversely the probability) of using this edge in a flow. We can define the total cost, $\gamma \times \varphi$, associated to a flow $\varphi$ in $G$ as follows: $\gamma \times \varphi = \sum_{u \in E} \gamma_u \varphi_u$</Paragraph> <Paragraph position="9"> Definition 3: let $G = (V, E)$ be a connected graph with capacity intervals $[b_u; c_u]$, $u \in E$, and costs $\gamma_u$, $u \in E$. We will call minimal cost flow the feasible flow in $G$ for which $\gamma \times \varphi$ is minimal.</Paragraph> <Paragraph position="10"> Several algorithms have been proposed to compute the minimal cost flow when it exists. We will not detail them here but refer the interested reader to (Ford and Fulkerson, 1962; Klein, 1967).</Paragraph> </Section> <Section position="2" start_page="444" end_page="446" type="sub_section"> <SectionTitle> 2.2 Alignment models </SectionTitle> <Paragraph position="0"> Flows and networks define a general framework in which it is possible to model alignments between words and to find, under certain constraints, the best alignment. 
We now present an instance of such a model, where the only parameters involved are association probabilities between English and French words, and in which we impose that any English (resp. French) word has to be aligned with one and only one French (resp. English) word, possibly empty. We can, of course, consider different constraints. The constraints we define, though they would lead to a complex computation for the EM algorithm, do not favour any direction in an underlying translation process.</Paragraph> <Paragraph position="1"> This model defines for each pair of aligned sentences a graph $G = (V, E)$ as follows: * $V$ comprises a source, a sink, all the English and French words, an empty English word, and an empty French word, * $E$ comprises edges from the source to all the English words (including the empty one), edges from all the French words (including the empty one) to the sink, an edge from the sink to the source, and edges from all the English words (including the empty one) to all the French words (including the empty one) 2.</Paragraph> <Paragraph position="2"> * from the source to all possible English words (excluding the empty one), the capacity interval is $[1; 1]$, 2 The empty words account for the fact that words may not be aligned with other ones, i.e. 
they are not explicitly translated, for example.</Paragraph> <Paragraph position="3"> * from the source to the empty English word, the capacity interval is $[0; \max(l_e, l_f)]$, where $l_f$ is the number of French words and $l_e$ the number of English ones, * from the English words (including the empty one) to the French words (including the empty one), the capacity interval is $[0; 1]$, * from the French words (excluding the empty one) to the sink, the capacity interval is $[1; 1]$,</Paragraph> <Paragraph position="4"> * from the empty French word to the sink, the capacity interval is $[0; \max(l_e, l_f)]$, * from the sink to the source, the capacity interval is $[0; \max(l_e, l_f)]$.</Paragraph> <Paragraph position="5"> Once such a graph has been defined, we have to assign cost values to its edges, to reflect the different association probabilities. We will now see how to define the costs so as to relate the minimal cost flow to a best alignment. Let $a$ be an alignment, under the above constraints, between the English sentence $e_s$ and the French sentence $f_s$. Such an alignment $a$ can be seen as a particular relation from the set of English words with their positions, including empty words, to the set of French words with their positions, including empty words (in our framework, it is formally equivalent to consider a single empty word with a larger upper capacity bound or several ones with smaller upper capacity bounds; for the sake of simplicity in the formulas, we consider here that we add as many empty words as necessary in the sentences to end up with two sentences containing $l_e + l_f$ words). An alignment thus connects each English word $e_i$, located in position $i$, to a French word $f_j$, in position $j$. We consider that the probability of such a connection depends on two distinct and independent probabilities, the one of linking two positions, $p(a_p(i) = a_i)$, and the one of linking two words, $p(a_w(e_i) = f_{a_i})$. 
We can then write:</Paragraph> <Paragraph position="7"> where $P(a, e_s, f_s)$ is the probability of observing the alignment $a$ together with the English and French sentences, $e_s$ and $f_s$, and $(a, e, f)_1^{i-1}$ is a shorthand for $(a_1, \ldots, a_{i-1}, e_1, \ldots, e_{i-1}, f_{a_1}, \ldots, f_{a_{i-1}})$.</Paragraph> <Paragraph position="8"> Since we simply rely in this model on association probabilities, which we assume to be independent, the only dependencies lying in the possibilities to associate words across languages, we can simplify the above formula and write: $P(a, e_s, f_s) = \prod_{i=1}^{l_e + l_f} p(e_i, f_{a_i} \mid a_1^{i-1})$ (5)</Paragraph> <Paragraph position="10"> where $a_1^{i-1}$ is a shorthand for $(a_1, \ldots, a_{i-1})$.</Paragraph> <Paragraph position="11"> $p(e_i, f_{a_i})$ is a shorthand for $p(a_w(e_i) = f_{a_i})$ that we will use throughout the article. Due to the constraints defined, we have: $p(e_i, f_{a_i} \mid a_1^{i-1}) = 0$ if $a_i \in a_1^{i-1}$, and $p(e_i, f_{a_i})$ otherwise.</Paragraph> <Paragraph position="12"> Equation (5) shows that if we define the cost associated to each edge $u$ from an English word $e_i$ (excluding the empty word) to a French word $f_j$ (excluding the empty word) to be $\gamma_u = -\ln p(e_i, f_j)$, the cost of an edge involving an empty word to be $\epsilon$, an arbitrarily small positive value, and the cost of all the other edges (i.e. 
the edges connected to the source and the sink) to be 1 for example, then the minimal cost flow defines the alignment $a$ for which $P(a, e_s, f_s)$ is maximum, under the above constraints and approximations.</Paragraph> <Paragraph position="13"> We can use the following general algorithm, based on maximum likelihood under the maximum approximation, to estimate the parameters of our model:</Paragraph> <Paragraph position="15"> 1. set some initial value to the different parameters of the model, 2. for each sentence pair in the corpus, compute the best alignment (or an approximation of this alignment) between words, with respect to the model, and update the counts of the different parameters with respect to this alignment (the maximum likelihood estimators for model-free distributions are based on relative frequencies, conditioned by the set of best alignments in our case), 3. go back to step 2 until an end condition is reached.</Paragraph> <Paragraph position="16"> This algorithm converges after a few iterations. Here, we have to be careful with step 1. In particular, if we consider at the beginning of the process all the possible alignments to be equiprobable, then all the feasible flows are minimal cost flows. To avoid this situation, we have to start with initial probabilities which make use of the fact that some associations, occurring more often in the corpus, should have a larger probability. Probabilities based on relative frequencies, or derived from the measure defined in (Dunning, 1993), for example, allow us to take this fact into account.</Paragraph> <Paragraph position="17"> We can envisage more complex models, including distortion parameters, multi-word notions, or information on part-of-speech, information derived from bilingual dictionaries or from thesauri. The integration of new parameters is in general straightforward. 
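</Paragraph> <Paragraph position="18"> As a concrete illustration of the basic model, before any such extension, the estimation loop described above (initialise, align, re-count, iterate) can be sketched in Python. This is only a minimal sketch under simplifying assumptions that are not part of the model: the two sentences of a pair have equal length and empty words are left out, an exhaustive search over one-to-one alignments stands in for a minimal cost flow solver, unseen word pairs receive a small floor probability, and probabilities are re-estimated as relative frequencies over all linked pairs. The names best_alignment, initial_probabilities, train and TOY_CORPUS are hypothetical, and the toy corpus is invented for the example.

```python
import math
from collections import Counter
from itertools import permutations

# Toy data invented for this sketch (not taken from the paper).
TOY_CORPUS = [
    (["the", "house"], ["la", "maison"]),
    (["the", "car"], ["la", "voiture"]),
    (["house"], ["maison"]),
]


def best_alignment(english, french, prob, floor=1e-6):
    """Return the one-to-one alignment minimising the sum of -ln p(e, f).

    Assumption: exhaustive search replaces a minimal cost flow solver, so this
    only suits very short sentences of equal length (no empty words).
    """
    if len(english) != len(french):
        raise ValueError("this sketch needs sentences of equal length")

    def cost(perm):
        # Unseen pairs get the floor probability so the logarithm is defined.
        return sum(
            -math.log(max(prob.get((english[i], french[j]), 0.0), floor))
            for i, j in enumerate(perm)
        )

    best = min(permutations(range(len(english))), key=cost)
    return [(english[i], french[j]) for i, j in enumerate(best)]


def initial_probabilities(corpus):
    """Step 1: co-occurrence relative frequencies, so that pairs seen
    together more often start with a larger probability."""
    counts = Counter()
    for english, french in corpus:
        for e in english:
            for f in french:
                counts[(e, f)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def train(corpus, iterations=5):
    """Steps 2 and 3: align every pair, re-count, and repeat.

    Assumption: a fixed number of iterations is used as the end condition.
    """
    prob = initial_probabilities(corpus)
    for _ in range(iterations):
        counts = Counter()
        for english, french in corpus:
            counts.update(best_alignment(english, french, prob))
        total = sum(counts.values())
        prob = {pair: c / total for pair, c in counts.items()}
    return prob
```

Calling train(TOY_CORPUS) returns a dictionary of re-estimated association probabilities that best_alignment can then reuse; replacing the exhaustive search by a minimal cost flow computation on the graph defined above would restore empty words and sentences of unequal length.</Paragraph> <Paragraph position="19">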
For multi-word notions, we have to replace the capacity values of edges connected to the source and the sink with capacity intervals, which raises several issues that we will not address in this paper. Instead, we now present an application of the flow network model to multilingual terminology extraction.</Paragraph> </Section> </Section> </Paper>