<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1050"> <Title>Weighted Rational Transductions and their Application to Human Language Processing</Title> <Section position="3" start_page="262" end_page="264" type="intro"> <SectionTitle> 2. Theory </SectionTitle> <Paragraph position="0"> In the transduction cascade (1), each step corresponds to a mapping from input-output pairs (r, s) to probabilities P(s|r).</Paragraph> <Paragraph position="1"> More formally, steps in the cascade will be weighted transductions T : Σ* × Γ* → K, where Σ* and Γ* are the sets of strings over the alphabets Σ and Γ, and K is an appropriate set of weights, for instance the real numbers between 0 and 1 in the case of probabilities. We will denote by T⁻¹ the inverse of T, defined by T⁻¹(t, s) = T(s, t).</Paragraph> <Paragraph position="2"> The right-most step of (1) is not a transduction, but rather an information source, in that case the language model. We will represent such sources as weighted languages L : Σ* → K.</Paragraph> <Paragraph position="3"> Given two transductions S : Σ* × Γ* → K and T : Γ* × Δ* → K, we can define their composition S ∘ T by (S ∘ T)(r, t) = Σ_s S(r, s) T(s, t). (4)</Paragraph> <Paragraph position="5"> If S represents P(s_j | s_i) and T represents P(s_k | s_j), it is clear that S ∘ T represents P(s_k | s_i).</Paragraph> <Paragraph position="6"> A weighted transduction S : Σ* × Γ* → K can be applied to a weighted language L : Σ* → K to yield a weighted language over Γ. It is convenient to abuse notation somewhat and use (L ∘ S)(t) = Σ_s L(s) S(s, t). (5)</Paragraph> <Paragraph position="8"> Furthermore, if M is a weighted language over Γ, we can reverse apply S to M, written S ∘ M = M ∘ S⁻¹. For example, if S represents P(s_k | s_0) and M represents P(s_0) in (1), then S ∘ M represents P(s_0, s_k).</Paragraph> <Paragraph position="9"> Finally, given two weighted languages M, N : Σ* → K, we define their intersection, also by convenient abuse of notation, as (M ∘ N)(s) = M(s) N(s). (6)</Paragraph> <Paragraph position="11"> In any cascade R_1 ∘ …
∘ R_m, with the intermediate R_i appropriate transductions and R_1 and R_m transductions or languages, it is easy to see that the order of association of the ∘ operators does not matter. For example, if we have L ∘ S ∘ T ∘ M, we could either apply S to L, apply T to the result and intersect the result with M; or compose S with T, reverse apply the result to M and intersect the result with L. We are thus justified in our use of the same symbol for composition, application and intersection, and we will in the rest of the paper use the term &quot;(generalized) composition&quot; for all of these operations.</Paragraph> <Paragraph position="12"> For a more concrete example, consider the transduction cascade for speech recognition depicted in Figure 1, where A is the transduction from acoustic observation sequences to phone sequences, D the transduction from phone sequences to word sequences (essentially a pronunciation dictionary) and M a weighted language representing the language model. Given a particular sequence of observations o, we can represent it as the trivial weighted language O that assigns 1 to o and 0 to any other sequence. Then O ∘ A represents the acoustic likelihoods of possible phone sequences that generate o, O ∘ A ∘ D the acoustic-lexical likelihoods of possible word sequences yielding o, and O ∘ A ∘ D ∘ M the combined acoustic-lexical-linguistic probabilities of word sequences generating o. The word string w with the highest weight (O ∘ A ∘ D ∘ M)(w) is precisely the most likely sentence hypothesis generating o.</Paragraph> <Paragraph position="13"> Exactly the same construction could have been carried out with weights combined by min and sum instead of sum and product in the definitions of application and intersection, for instance (L ∘ S)(t) = min_s (L(s) + S(s, t)), and</Paragraph> <Paragraph position="15"> in that case the string w with the lowest weight (O ∘ A ∘ D ∘ M)(w) would be the best hypothesis.
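To make the interchangeability of weight operations concrete, here is a small illustrative sketch, not the system described in this paper: finite-support weighted transductions are stored as Python dicts from (input, output) pairs to weights, and a single generalized-composition routine is run once with (sum, product) weights and once with (min, sum) weights. All names here (Semiring, compose, REAL, TROPICAL) are assumptions of the sketch.

```python
# Illustrative sketch only: finite-support weighted transductions as dicts.
class Semiring:
    def __init__(self, plus, times, zero):
        self.plus, self.times, self.zero = plus, times, zero

# (sum, product) for probabilities; (min, sum) for negative-log costs.
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"))

def compose(S, T, K):
    """(S o T)(r, t) = generalized sum over s of S(r, s) times T(s, t)."""
    C = {}
    for (r, s), w1 in S.items():
        for (s2, t), w2 in T.items():
            if s == s2:
                C[(r, t)] = K.plus(C.get((r, t), K.zero), K.times(w1, w2))
    return C

# Probabilities: the two derivations a->b->d and a->c->d are summed.
S = {("a", "b"): 0.5, ("a", "c"): 0.5}
T = {("b", "d"): 1.0, ("c", "d"): 0.5}
print(compose(S, T, REAL))        # {('a', 'd'): 0.75}

# Costs: the very same routine now keeps the cheapest derivation.
Sc = {("a", "b"): 1.0, ("a", "c"): 2.0}
Tc = {("b", "d"): 0.5, ("c", "d"): 0.1}
print(compose(Sc, Tc, TROPICAL))  # {('a', 'd'): 1.5}
```

Under (sum, product) the weights of the two derivations of ('a', 'd') are added; under (min, sum) the same routine retains the cheaper one, mirroring the most-likely-hypothesis and lowest-cost readings described above.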
More generally, the sum and product operations in (4), (5) and (6) can be replaced by any two operations forming an appropriate semiring [7, 8, 9], of which numeric addition and multiplication, and numeric minimum and addition, are two examples.¹ Generalized composition is thus the main operation involved in the construction and use of transduction cascades. As we will see in a moment, for rational languages and transductions all instances of generalized composition are implemented by a uniform algorithm, the join of two weighted finite automata.</Paragraph> <Paragraph position="16"> In addition to those operations, weighted languages and transductions can be constructed from simpler ones by the operations shown in Table 1, which generalize in a straightforward way the regular operations well known from traditional automata theory [1]. In fact, the rational languages and transductions are exactly those that can be built from singletons by applications of scaling, sum, concatenation and closure.</Paragraph> <Paragraph position="17"> For example, assume that for each word w in a lexicon we are given a rational transduction D_w such that D_w(p, w) is the probability that w is realized as the phone sequence p. Note that this crucially allows for multiple pronunciations of w.</Paragraph> <Paragraph position="18"> Then the rational transduction (Σ_w D_w)* gives the probabilities for realizations of word sequences as phone sequences (ignoring possible cross-word dependencies, which will be discussed in the next section).</Paragraph> <Paragraph position="19"> Kleene's theorem states that the regular languages are exactly those representable by finite-state acceptors [1]. Its generalization to the weighted case and to transducers states that the weighted rational languages and transductions are exactly those that can be represented by weighted finite automata [8].
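The rational operations just listed (scaling, sum, concatenation, closure) can be sketched on finite-support weighted languages. This is an illustrative toy in the (sum, product) semiring, not the implementation referred to in the text; the function names and the truncation bound in closure (dicts are finite, so L* must be cut off at some power) are assumptions of the sketch.

```python
# Illustrative sketch: rational operations on finite-support weighted
# languages, stored as {string: weight} dicts, (sum, product) semiring.
def scale(k, L):
    """Scaling: multiply every weight by k."""
    return {s: k * w for s, w in L.items()}

def rsum(L, M):
    """Sum (union): pointwise addition of weights."""
    out = dict(L)
    for s, w in M.items():
        out[s] = out.get(s, 0.0) + w
    return out

def concat(L, M):
    """Concatenation: weights of all factorizations are multiplied, then summed."""
    out = {}
    for s, w1 in L.items():
        for t, w2 in M.items():
            out[s + t] = out.get(s + t, 0.0) + w1 * w2
    return out

def closure(L, max_powers=3):
    """L* = sum over n of L^n, truncated at max_powers to keep dicts finite."""
    out, power = {"": 1.0}, {"": 1.0}
    for _ in range(max_powers):
        power = concat(power, L)
        out = rsum(out, power)
    return out

# A toy one-word "lexicon": two phone-string realizations of a single word.
D = {"ey": 0.75, "ah": 0.25}
# The truncated closure weights sequences of realizations, as (sum of D)* does:
print(closure(D, 2))
# {'': 1.0, 'ey': 0.75, 'ah': 0.25, 'eyey': 0.5625, 'eyah': 0.1875,
#  'ahey': 0.1875, 'ahah': 0.0625}
```

The closure example mirrors the (Σ_w D_w)* construction: each key is a sequence of realizations and its weight the product of per-word probabilities, summed over ambiguous factorizations.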
Furthermore, all the operations on languages and transductions we have discussed have finite-automata counterparts, which we have implemented. Any cascade representable in terms of those operations can thus be implemented directly as an appropriate combination of the programs implementing each of the operations.</Paragraph> <Paragraph position="20"> ¹Additional conditions to guarantee the existence of certain infinite sums may be necessary for certain semirings; for details see [7] and [8]. In the present setting, a K-weighted finite automaton A consists of a finite set of states Q_A and a finite set Δ_A of transitions (q, x, k, q') between states, where x is an element of the set of transition labels L_A and k ∈ K is the transition weight. An associative concatenation operation u · v must be defined between transition labels, with identity element ε. As usual, each automaton has an initial state i_A and a final state assignment, which we represent as a column vector of weights F_A indexed by states. A K-weighted finite automaton with L_A = Σ* is just a weighted finite-state acceptor (WFSA). On the other hand, if L_A = Σ* × Γ* with concatenation defined by (r, s) · (u, v) = (ru, sv), we have a weighted finite-state transducer (WFST).</Paragraph> <Paragraph position="21"> As usual, we can define a path in an automaton A as a sequence of connected transitions p = (q_0, x_1, k_1, q_1), …, (q_{m-1}, x_m, k_m, q_m). Such a path has label L_A(p) = x_1 · … · x_m, weight W_A(p) = k_1 ⋯ k_m, and final weight W_A^F(p) = W_A(p) F_A(q_m). We call p reduced if it is the empty path or if x_1 ≠ ε, and we write q →_{u/k} q' if k is the sum of the weights of all reduced paths with label u from q to q'.</Paragraph> <Paragraph position="22"> The language of automaton A is defined by [[A]](u) = Σ_{p ∈ I_A(u)} W_A^F(p), where I_A(u) is the set of paths in A with label u that start in the initial state i_A.
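As an illustration of these definitions (again a sketch, not the authors' implementation), the following minimal WFSA over the (sum, product) semiring computes [[A]](u) by enumerating paths from i_A; for simplicity, labels are nonempty strings and ε-transitions are omitted. The class layout is an assumption of the sketch.

```python
# Illustrative sketch: a K-weighted finite-state acceptor with
# (sum, product) weights, computing the path sum [[A]](u) defined above.
class WFSA:
    def __init__(self, initial, final, transitions):
        self.initial = initial          # initial state i_A
        self.final = final              # dict mapping state to F_A(q)
        self.transitions = transitions  # tuples (q, x, k, q2), x nonempty

    def weight(self, u):
        """[[A]](u): sum, over paths labeled u starting at i_A, of the
        path weight times the final weight of the last state."""
        total = 0.0
        stack = [(self.initial, 0, 1.0)]  # (state, chars of u consumed, weight)
        while stack:
            q, pos, w = stack.pop()
            if pos == len(u):
                total += w * self.final.get(q, 0.0)
            for (q1, x, k, q2) in self.transitions:
                # extend the path if the next input symbols match label x
                if q1 == q and x and u[pos:pos + len(x)] == x:
                    stack.append((q2, pos + len(x), w * k))
        return total

# Two distinct paths accept "ab"; their weights are summed, per the definition.
A = WFSA(0, {2: 1.0},
         [(0, "a", 0.5, 1), (1, "b", 0.75, 2),
          (0, "a", 0.25, 3), (3, "b", 0.5, 2)])
print(A.weight("ab"))   # 0.5*0.75 + 0.25*0.5 = 0.5
```

Strings not labeling any path ending in a final-weighted state receive weight 0, matching the convention that the sum over an empty path set is the semiring zero.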
Obviously, if A is an acceptor, [[A]] is a weighted language, and if A is a transducer, [[A]] is a weighted transduction. The appropriate generalization of Kleene's theorem to weighted acceptors and transducers states that, under mild conditions on the weights (which for instance are satisfied by the (min, sum) semiring), the weighted rational languages and transductions are exactly those defined by weighted automata as outlined here [8].</Paragraph> <Paragraph position="23"> Weighted acceptors and transducers are thus faithful implementations of weighted rational languages and transductions. Given two automata A and B, a new label set J, and a partial label join function ⋈ : L_A × L_B → J, we define their join by ⋈ as a new automaton C with label set J, states Q_C = Q_A × Q_B, initial state i_C = (i_A, i_B), final weights F_C(q, q') = F_A(q) F_B(q'), and a transition ((q, q'), x ⋈ y, k k', (r, r')) whenever (q, x, k, r) is a transition of A, (q', y, k', r') is a transition of B, and x ⋈ y is defined. (7)</Paragraph> <Paragraph position="25"> Different choices of ⋈ correspond to the instances of generalized composition: for intersection, L_A = L_B = Σ* and u ⋈ u = u; for composition, L_A = Σ* × Γ*, L_B = Γ* × Δ* and (r, s) ⋈ (s, t) = (r, t); application and reverse application are the corresponding acceptor-transducer cases.</Paragraph> <Paragraph position="27"> The join is thus the automata counterpart of generalized composition, and we will use the composition symbol indifferently in what follows to represent either composition or join.</Paragraph> <Paragraph position="28"> The operation between automata thus defined has a direct dynamic-programming implementation in which reachable join states (q, q') are placed in a queue and extended in turn using (7). By organizing this queue according to the weights of least-weight paths from the start state, we can combine join computation with search for lowest-weight paths, and compute subautomata of the join whose states are reachable by paths with weights within a beam of the best path.</Paragraph> </Section> </Paper>