<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1006">
  <Title>Generalized Algorithms for Constructing Statistical Language Models</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Counting
</SectionTitle>
    <Paragraph position="0"> This section describes a counting algorithm based on general weighted automata algorithms. Let $A$ be an arbitrary weighted automaton over the probability semiring, with alphabet $\Sigma$, and let $X$ be a regular expression defined over the alphabet $\Sigma$. We are interested in counting the occurrences of the sequences $x \in L(X)$ in $A$ while taking into account the weight of the paths where they appear.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Definition
</SectionTitle>
      <Paragraph position="0"> When $A$ is deterministic and pushed, or stochastic, it can be viewed as a probability distribution $P$ over all strings $x \in \Sigma^*$.1</Paragraph>
      <Paragraph position="2"/>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <Paragraph position="0"> The weight $[[A]](x)$ associated by $A$ to each string $x$ is then $P(x)$. Thus, we define the count $c(x)$ of the sequence $x$ in $A$ as</Paragraph>
    <Paragraph position="2"> $c(x) = \sum_{u \in \Sigma^*} |u|_x \, [[A]](u)$, where $|u|_x$ denotes the number of occurrences of $x$ in the string $u$; that is, $c(x)$ is the expected number of occurrences of $x$ given $A$. More generally, we will define the count of $x$ as above regardless of whether $A$ is stochastic or not.</Paragraph>
    <Paragraph position="3"> In most speech processing applications, $A$ is an acyclic automaton such as a phone or word lattice output by a speech recognition system. But our algorithm is general and does not assume $A$ to be acyclic.</Paragraph>
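    <Paragraph> As a small illustration of this definition (not part of the original paper), the following Python sketch computes $c(x)$ by brute-force enumeration of the complete paths of a toy acyclic weighted automaton over the probability semiring; the dictionary encoding of the automaton and the toy lattice below are hypothetical.
# Brute-force expected-count computation over a small acyclic weighted automaton.
# arcs[state] = list of (label, weight, next_state); finals[state] = final weight.
def paths(arcs, finals, state, prefix=(), weight=1.0):
    """Yield (string, path weight) for every complete path starting at `state`."""
    if state in finals:
        yield prefix, weight * finals[state]
    for label, w, nxt in arcs.get(state, []):
        yield from paths(arcs, finals, nxt, prefix + (label,), weight * w)

def occurrences(u, x):
    """Number of occurrences of the tuple x inside the tuple u."""
    return sum(1 for i in range(len(u) - len(x) + 1) if u[i:i + len(x)] == x)

def expected_count(arcs, finals, start, x):
    # c(x) = sum over strings u of |u|_x * [[A]](u)
    return sum(occurrences(u, x) * w for u, w in paths(arcs, finals, start))

# Toy lattice: "a b" with probability 0.6 and "b b" with probability 0.4.
arcs = {0: [("a", 0.6, 1), ("b", 0.4, 1)], 1: [("b", 1.0, 2)]}
finals = {2: 1.0}
print(expected_count(arcs, finals, 0, ("b",)))   # 0.6 * 1 + 0.4 * 2 = 1.4
    </Paragraph>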
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Algorithm
</SectionTitle>
      <Paragraph position="0"> We describe our algorithm for computing the expected counts of the sequences $x \in L(X)$ and give the proof of its correctness.</Paragraph>
      <Paragraph position="1"> Let $S$ be the formal power series (Kuich and Salomaa, 1986) over the probability semiring defined by $S = \Sigma^* \, x \, \Sigma^*$, where $\Sigma^*$ is identified with its characteristic series.</Paragraph>
      <Paragraph position="3"> Lemma 1 For all $u \in \Sigma^*$, $(S, u) = |u|_x$.</Paragraph>
      <Paragraph position="4"> Proof. By definition of the multiplication of power series in the probability semiring: $(S, u) = \sum_{u = u_1 x u_2} 1 = |u|_x$.</Paragraph>
      <Paragraph position="6"> This proves the lemma.</Paragraph>
      <Paragraph position="7"> $S$ is a rational power series, as a product and closure of the polynomial power series $\Sigma$ and $x$ (Salomaa and Soittola, 1978; Berstel and Reutenauer, 1988). Similarly, since $X$ is regular, the weighted transduction defined by $(\Sigma \times \{\epsilon\})^* \, (X \times X) \, (\Sigma \times \{\epsilon\})^*$ is rational. Thus, by the theorem of Schützenberger (Schützenberger, 1961), there exists a weighted transducer $T$ defined over the alphabet $\Sigma$ and the probability semiring realizing that transduction. Figure 1 shows the transducer $T$ in the particular case of $\Sigma = \{a, b\}$.</Paragraph>
      <Paragraph position="8"> 1There exist general weighted determinization and weight pushing algorithms that can be used to create a deterministic and pushed automaton equivalent to an input word or phone lattice (Mohri, 1997).</Paragraph>
      <Paragraph position="9"> Proposition 1 Let $A$ be a weighted automaton over the probability semiring; then, for all $x \in L(X)$, $[[\Pi_2(A \circ T)]](x) = c(x)$.</Paragraph>
      <Paragraph position="11"> This ends the proof of the proposition.</Paragraph>
      <Paragraph position="12"> The proposition gives a simple algorithm for computing the expected counts of the sequences in $L(X)$ from a weighted automaton $A$ based on two general algorithms: composition (Mohri et al., 1996) and projection of weighted transducers. It is also based on the transducer $T$, which is easy to construct. The size of $T$ is in $O(|\Sigma| + |A_X|)$, where $A_X$ is a finite automaton accepting $X$. With a lazy implementation of $T$, the loops over $\Sigma$ at each state can be represented by a single transition,</Paragraph>
      <Paragraph position="14"> reducing the size of the representation of $T$ to $O(|A_X|)$.</Paragraph>
      <Paragraph position="15"> The weighted automaton $B = \Pi_2(A \circ T)$ contains $\epsilon$-transitions. A general $\epsilon$-removal algorithm can be used to compute an equivalent weighted automaton with no $\epsilon$-transition. The computation of $[[B]](x)$ for a given $x$ is done by composing $B$ with an automaton representing $x$ and by using a simple shortest-distance algorithm (Mohri, 2002) to compute the sum of the weights of all the paths of the result.</Paragraph>
      <Paragraph position="16"> For numerical stability, implementations often replace probabilities with $-\log$ probabilities. The algorithm just described applies in a similar way by taking $-\log$ of the weights of $T$ (thus all the weights of $T$ will be zero in that case) and by using the log semiring version of composition and $\epsilon$-removal.</Paragraph>
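      <Paragraph> As a rough, library-free sketch of the quantity computed by this algorithm (my simplification: an acyclic, epsilon-free lattice and explicit forward/backward path sums, rather than the transducer composition and epsilon-removal described above), expected n-gram counts can be accumulated by letting each occurrence of an n-gram along n consecutive arcs contribute the total probability of the paths through that occurrence:
from collections import defaultdict

def forward_backward(arcs, finals, start, states):
    """Forward/backward path sums in the probability semiring.
    `states` must be in topological order; arcs[q] = [(label, weight, next_state)]."""
    fwd = defaultdict(float)
    fwd[start] = 1.0
    for q in states:
        for _, w, nxt in arcs.get(q, []):
            fwd[nxt] += fwd[q] * w
    bwd = defaultdict(float)
    for q in reversed(states):
        bwd[q] = finals.get(q, 0.0)
        for _, w, nxt in arcs.get(q, []):
            bwd[q] += w * bwd[nxt]
    return fwd, bwd

def expected_ngram_counts(arcs, finals, start, states, n):
    """Expected count of every n-gram of order exactly n in the lattice."""
    fwd, bwd = forward_backward(arcs, finals, start, states)
    counts = defaultdict(float)

    def extend(q, gram, inside):
        # `inside` is the product of the arc weights read so far on this chain.
        if len(gram) == n:
            counts[gram] += fwd[q_start] * inside * bwd[q]
            return
        for label, w, nxt in arcs.get(q, []):
            extend(nxt, gram + (label,), inside * w)

    for q_start in states:
        extend(q_start, (), 1.0)
    return dict(counts)

# Toy lattice: "a b" with probability 0.6, "b b" with probability 0.4.
arcs = {0: [("a", 0.6, 1), ("b", 0.4, 1)], 1: [("b", 1.0, 2)]}
print(expected_ngram_counts(arcs, {2: 1.0}, 0, [0, 1, 2], 2))
# {('a', 'b'): 0.6, ('b', 'b'): 0.4}
      </Paragraph>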
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 GRM Utility and Experimental Results
</SectionTitle>
      <Paragraph position="0"> An efficient implementation of the counting algorithm was incorporated in the GRM library (Allauzen et al., 2003). The GRM utility grmcount can be used in particular to generate a compact representation of the expected counts of the n-gram sequences appearing in a word lattice (of which a string encoded as an automaton is a special case), whose order is less than or equal to a given integer. As an example, the following command line: grmcount -n3 foo.fsm &gt; count.fsm creates an encoded representation count.fsm of the n-gram sequences, $n \le 3$, which can be used to construct a trigram model. The encoded representation itself is also given as an automaton, which we do not describe here.</Paragraph>
      <Paragraph position="1"> The counting utility of the GRM library is used in a variety of language modeling and training adaptation tasks.</Paragraph>
      <Paragraph position="2"> Our experiments show that grmcount is quite efficient.</Paragraph>
      <Paragraph position="3"> We tested this utility with 41,000 weighted automata output by our speech recognition system for the same number of speech utterances. The total number of transitions of these automata was a21a61a230a74a158 a230 M. It took about 1h52m, including I/O, to compute the accumulated expected counts of all n-grams, $n \le 3$, appearing in all these automata on a single processor of a 1GHz Intel Pentium Linux cluster with 2GB of memory and 256 KB cache.</Paragraph>
      <Paragraph position="4"> The time to compute these counts represents just a148a233a85a234 th of the total duration of the 41,000 speech utterances used in our experiment.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Representation of n-gram Language Models with WFAs
</SectionTitle>
    <Paragraph position="0"> Standard smoothed n-gram models, including backoff (Katz, 1987) and interpolated (Jelinek and Mercer, 1980) models, admit a natural representation by WFAs in which each state encodes a conditioning history of length less than $n$. The size of that representation is often prohibitive. Indeed, the corresponding automaton may have up to $|\Sigma|^{n}$ transitions. Thus, even if the vocabulary size is just 1,000, the representation of a classical trigram model may require in the worst case up to one billion transitions. Clearly, this representation is even less adequate for realistic natural language processing applications, where the vocabulary size is on the order of several hundred thousand words.</Paragraph>
    <Paragraph position="1"> In the past, two methods have been used to deal with this problem. One consists of expanding that WFA on demand. Thus, in some speech recognition systems, the states and transitions of the language model automaton are constructed as needed based on the particular input speech utterances. The disadvantage of that method is that it cannot benefit from offline optimization techniques that can substantially improve the efficiency of a recognizer (Mohri et al., 1998). A similar drawback affects other systems where several information sources are combined, such as complex information extraction systems. An alternative method commonly used in many applications consists of constructing instead an approximation of that weighted automaton, whose size is practical for offline optimizations. This method is used in many large-vocabulary speech recognition systems.</Paragraph>
    <Paragraph position="2"> In this section, we present a new method for creating an exact representation of n-gram language models with WFAs whose size is practical even for very large-vocabulary tasks and for relatively high n-gram orders. Thus, our representation does not suffer from the disadvantages just pointed out for the two classical methods. We first briefly present the classical definitions of n-gram language models and several smoothing techniques commonly used. We then describe a natural representation of n-gram language models using failure transitions. This is equivalent to the on-demand construction referred to above, but it helps us introduce both the approximate solution commonly used and our solution for an exact offline representation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Classical Definitions
</SectionTitle>
      <Paragraph position="0"> In an n-gram model, the joint probability of a string $w_1 \cdots w_k$ is given by the product of conditional probabilities: $\Pr(w_1 \cdots w_k) = \prod_{i=1}^{k} \Pr(w_i \mid h_i)$,</Paragraph>
      <Paragraph position="2"> where the conditioning history $h_i$ consists of zero or more words immediately preceding $w_i$ and is dictated by the order of the n-gram model.</Paragraph>
      <Paragraph position="3"> Let $c(hw)$ denote the count of the n-gram $hw$ and let $\hat{\Pr}(w \mid h) = c(hw) / c(h)$ be the maximum likelihood probability of $w$ given $h$, estimated from counts.</Paragraph>
      <Paragraph position="6"> $\hat{\Pr}$ is often adjusted to reserve some probability mass for unseen n-gram sequences. Denote by $\widetilde{\Pr}(w \mid h)$ the adjusted conditional probability. Katz or absolute discounting both lead to an adjusted probability $\widetilde{\Pr}$.</Paragraph>
      <Paragraph position="7"> For all n-grams $h = w h'$ where $h \in \Sigma^k$ for some $k \ge 1$, we refer to $h'$ as the backoff n-gram of $h$. Conditional probabilities in a backoff model are of the form:</Paragraph>
      <Paragraph position="9"> $\Pr(w \mid h) = \widetilde{\Pr}(w \mid h)$ if $c(hw) > 0$, and $\alpha_h \, \Pr(w \mid h')$ otherwise, where $\alpha_h$ is a factor that ensures a normalized model.</Paragraph>
      <Paragraph position="10"> Conditional probabilities in a deleted interpolation model are of the form:</Paragraph>
      <Paragraph position="12"> $\Pr(w \mid h) = \alpha_h \, \hat{\Pr}(w \mid h) + (1 - \alpha_h) \, \Pr(w \mid h')$, where $\alpha_h$ is the mixing parameter between zero and one.</Paragraph>
      <Paragraph position="13"> In practice, as mentioned before, for numerical stability, $-\log$ probabilities are used. Furthermore, due to the Viterbi approximation used in most speech processing applications, the weight associated to a string $x$ by the weighted automaton representing the model is the minimum weight of a path labeled with $x$. Thus, an n-gram language model is represented by a WFA over the tropical semiring.</Paragraph>
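      <Paragraph> To make the two formulas concrete, here is a small Python sketch of backoff and interpolated conditional probabilities and of the corresponding tropical-semiring weight. It is an illustration only: the dictionaries of adjusted probabilities, backoff factors, and mixing parameters are assumed to be given (they are not estimated here), and the encoding of histories as word tuples is my own choice, not the GRM library's.
import math

def backoff_prob(w, h, adjusted, alpha, backoff_of):
    """Katz-style backoff: use the adjusted probability if the n-gram h.w was
    observed; otherwise back off to the lower-order history backoff_of[h]."""
    if (h, w) in adjusted:
        return adjusted[(h, w)]
    if h == ():                                  # history-less: assumed floor value
        return adjusted.get(((), w), 1e-10)
    return alpha[h] * backoff_prob(w, backoff_of[h], adjusted, alpha, backoff_of)

def interpolated_prob(w, h, adjusted, mix, backoff_of):
    """Deleted interpolation: always mix the higher- and lower-order estimates."""
    if h == ():
        return adjusted.get(((), w), 1e-10)
    lower = interpolated_prob(w, backoff_of[h], adjusted, mix, backoff_of)
    return mix[h] * adjusted.get((h, w), 0.0) + (1.0 - mix[h]) * lower

def arc_weight(prob):
    """Weight used by the WFA over the tropical semiring: -log probability."""
    return -math.log(prob)

# Toy bigram model (hypothetical numbers); histories are tuples of words.
adjusted = {(("a",), "b"): 0.6, ((), "a"): 0.5, ((), "b"): 0.4}
alpha = {("a",): 0.4}
backoff_of = {("a",): ()}
print(backoff_prob("b", ("a",), adjusted, alpha, backoff_of))   # 0.6 (observed bigram)
print(backoff_prob("a", ("a",), adjusted, alpha, backoff_of))   # 0.4 * 0.5 = 0.2 (backed off)
print(arc_weight(0.2))                                          # the corresponding arc weight
      </Paragraph>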
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Representation with Failure Transitions
</SectionTitle>
      <Paragraph position="0"> Both backoff and interpolated models can be naturally represented using default or failure transitions. A failure transition is labeled with a distinct symbol $\phi$. It is the default transition taken at state $q$ when $q$ does not admit an outgoing transition labeled with the word considered.</Paragraph>
      <Paragraph position="1"> Thus, failure transitions have the semantics of otherwise.</Paragraph>
      <Paragraph position="3"> The set of states of the WFA representing a backoff or interpolated model is defined by associating a state $q_h$ to each sequence $h$ of length less than $n$ found in the corpus: $Q = \{\, q_h : |h| < n,\ c(h) > 0 \,\}$.</Paragraph>
      <Paragraph position="5"> Its transition set $E$ is defined as the union of the following set of failure transitions: $\{\, (q_{w h'}, \phi, -\log(\alpha_{w h'}), q_{h'}) : q_{w h'} \in Q \,\}$ and the following set of regular transitions: $\{\, (q_h, w, -\log(\Pr(w \mid h)), n(h, w)) : q_h \in Q,\ c(hw) > 0 \,\}$, where $n(h, w)$ denotes the state associated with the longest suffix of $hw$ that indexes a state in $Q$.</Paragraph>
      <Paragraph position="7"> Figure 2 illustrates this construction for a trigram model.</Paragraph>
      <Paragraph position="8"> Treating the failure transitions as regular symbols, this is a deterministic automaton. Figure 3 shows a complete Katz backoff bigram model built from counts taken from the following toy corpus and using failure transitions:</Paragraph>
      <Paragraph position="10"> where &lt;s&gt; denotes the start symbol and &lt;/s&gt; the end symbol for each sentence. Note that the start symbol &lt;s&gt; does not label any transition; it simply encodes the history &lt;s&gt;. All transitions labeled with the end symbol &lt;/s&gt; lead to the single final state of the automaton.</Paragraph>
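      <Paragraph> A minimal sketch of the 'otherwise' semantics described above, assuming a simple dictionary encoding of the deterministic WFA (the encoding, the failure label name, and the toy model below are mine, not the paper's): when the current state has no outgoing arc for the next word, the failure arc is taken, its weight is added, and the lookup is retried from the backoff state.
PHI = "phi"   # hypothetical label reserved for failure arcs

def string_weight(arcs, start, words):
    """Tropical-semiring weight of `words` in a deterministic WFA with failure arcs.
    arcs[q] maps each label to (weight, next_state); PHI marks the failure arc."""
    q, total = start, 0.0
    for w in words:
        while w not in arcs.get(q, {}):
            if PHI not in arcs.get(q, {}):
                return float("inf")          # unseen word and no further backoff
            phi_cost, q_backoff = arcs[q][PHI]
            total += phi_cost                # 'otherwise': pay the failure-arc weight
            q = q_backoff                    # and retry from the backoff state
        arc_cost, q = arcs[q][w]
        total += arc_cost
    return total

# Toy bigram-like WFA with hypothetical -log weights.
arcs = {
    "<s>": {"a": (1.108, "a"), PHI: (0.5, "uni")},
    "uni": {"a": (0.9, "a"), "b": (1.2, "b")},
    "a":   {"b": (0.7, "b"), PHI: (0.4, "uni")},
    "b":   {},
}
print(string_weight(arcs, "<s>", ["a", "b"]))   # 1.108 + 0.7 = 1.808
      </Paragraph>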
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Approximate Offline Representation
</SectionTitle>
      <Paragraph position="0"> The common method used for an offline representation of an n-gram language model can be easily derived from the representation using failure transitions by simply replacing each $\phi$-transition by an $\epsilon$-transition. Thus, a transition that could only be taken in the absence of any other alternative in the exact representation can now be taken regardless of whether there exists an alternative transition.</Paragraph>
      <Paragraph position="1"> Thus the approximate representation may contain paths whose weight does not correspond to the exact probability of the string labeling that path according to the model.</Paragraph>
      <Paragraph position="2"> Consider for example the start state in Figure 3, labeled with &lt;s&gt;. In a failure transition model, there exists only one path from the start state to the state labeled a, with a cost of 1.108, since the $\phi$-transition cannot be traversed with an input of a. If the $\phi$-transition is replaced by an $\epsilon$-transition, there is a second path to the state labeled a: taking the $\epsilon$-transition to the history-less state, then the a transition out of the history-less state. This path is not part of the probabilistic model; we shall refer to it as an invalid path. In this case, there is a problem, because the cost of the invalid path to the state, the sum of the two transition costs (0.672), is lower than the cost of the true path. Hence the WFA with $\epsilon$-transitions gives a lower cost (higher probability) to all strings beginning with the symbol a. Note that the invalid path from the state labeled &lt;s&gt; to the state labeled b has a higher cost than the correct path, which is not a problem in the tropical semiring.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.4 Exact Offline Representation
</SectionTitle>
      <Paragraph position="0"> This section presents a method for constructing an exact offline representation of an n-gram language model whose size remains practical for large-vocabulary tasks.</Paragraph>
      <Paragraph position="1"> The main idea behind our new construction is to modify the topology of the WFA to remove any path containing $\epsilon$-transitions whose cost is lower than the correct cost associated by the model to the string labeling that path.</Paragraph>
      <Paragraph position="2"> Since, as a result, the lowest-cost path for each string will have the correct cost, this will guarantee the correctness of the representation in the tropical semiring.</Paragraph>
      <Paragraph position="3"> Our construction admits two parts: the detection of the invalid paths of the WFA, and the modification of the topology by splitting states to remove the invalid paths.</Paragraph>
      <Paragraph position="4"> To detect invalid paths, we first determine their initial non-$\epsilon$ transitions. Let $E_\epsilon$ denote the set of $\epsilon$-transitions of the original automaton. Let $P_q$ be the set of all paths $\pi = e_1 \cdots e_k$, $k > 0$, leading to state $q$ such that for all $i = 1, \ldots, k$, $p[e_i]$ is the destination state of some $\epsilon$-transition, where $p[e]$ and $n[e]$ denote the origin and destination states of a transition $e$.</Paragraph>
      <Paragraph position="7"> Lemma 2 For an n-gram language model, the number of paths in $P_q$ is less than or equal to the n-gram order: $|P_q| \le n$. Therefore, by recursion, $|P_q| = |P_{\bar{q}}| + 1 = |hw| \le n$. We now define transition sets $D_{q, q'}$ (originally empty) following this procedure: for all states $r \in Q$ and all paths $\pi = e_1 \cdots e_k \in P_r$, if there exists another path $\pi'$ and a transition $e \in E_\epsilon$ such that $n[e] = p[\pi]$, $p[\pi'] = p[e]$, and $i[\pi'] = i[\pi]$ (where $i[\cdot]$ and $w[\cdot]$ denote the input label and the weight of a transition or path), and either (i) $n[\pi'] = n[\pi]$ and $w[e \pi] < w[\pi']$, or (ii) there exists $e' \in E_\epsilon$ such that $p[e'] = n[\pi']$, $n[e'] = n[\pi]$, and $w[e \pi] < w[\pi' e']$, then we add $e_1$ to the set: $D_{p[e], n[e]} \leftarrow D_{p[e], n[e]} \cup \{e_1\}$. See Figure 4 for an illustration of this condition. Using this procedure, we can determine the set $D = \bigcup_{e \in E_\epsilon} D_{p[e], n[e]}$.</Paragraph>
      <Paragraph position="9"> This set provides the first non-$\epsilon$ transition of each invalid path. Thus, we can use these transitions to eliminate invalid paths.</Paragraph>
      <Paragraph position="10"> Proposition 2 The WFA modified following the procedure just outlined is equivalent to the exact online representation with failure transitions. Proof. Assume that there exists a string $x$ for which the WFA returns a weight $\hat{w}(x)$ less than the correct weight $w(x)$ that would have been assigned to $x$ by the exact online representation with failure transitions. We will call an $\epsilon$-transition $e_i$ within a path $\pi = e_1 \cdots e_k$ invalid if the next non-$\epsilon$ transition $e_j$, $j > i$, has a label $w$ for which there is a transition $e$ with $p[e] = p[e_i]$ and $i[e] = w$. By definition, $e_{i+1} \cdots e_j \in P_{n[e_j]}$, since intersection will occur before any $\epsilon$-transitions are traversed in $\pi$. Then it must be the case that $e_{i+1} \in D_{p[e_i], n[e_i]}$, requiring the path to be removed from the WFA. This is a contradiction.</Paragraph>
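      <Paragraph> As a rough illustration of the idea in the bigram case only (my simplification: a single level of backoff and no chains of epsilon-transitions, unlike the general algorithm above, and a hypothetical dictionary encoding of the model), each history state can be redirected to its own copy of the backoff state from which the offending arcs have been removed:
def split_backoff_states(arcs, backoff):
    """arcs[q]: dict mapping a label to (cost, next_state), costs in the tropical
    semiring; backoff[q] = (eps_cost, b) is q's epsilon arc to its backoff state b.
    For every state q, drop from a private copy of b each arc whose label also has
    a direct arc at q and whose backoff path would undercut that direct arc."""
    new_arcs = {q: dict(a) for q, a in arcs.items()}
    new_backoff = dict(backoff)
    for q, (eps_cost, b) in backoff.items():
        offending = {w for w, (c_direct, _) in arcs.get(q, {}).items()
                     if w in arcs.get(b, {}) and eps_cost + arcs[b][w][0] < c_direct}
        if offending:
            b_copy = (b, q)                                   # fresh id for the copy
            new_arcs[b_copy] = {w: a for w, a in arcs.get(b, {}).items()
                                if w not in offending}
            if b in backoff:                                  # the copy keeps b's own backoff
                new_backoff[b_copy] = backoff[b]
            new_backoff[q] = (eps_cost, b_copy)
    return new_arcs, new_backoff
In Figure 5 of the paper this corresponds to the start state backing off to a copy of the history-less state that lacks the transition labeled a.
      </Paragraph>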
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.5 GRM Utility and Experimental Results
</SectionTitle>
      <Paragraph position="0"> Note that some of the new intermediate backoff states ($\bar{q}$) can be fully or partially merged, to reduce the space requirements of the model. Finding the optimal configuration of these states, however, is an NP-hard problem.</Paragraph>
      <Paragraph position="1"> For our experiments, we used a simple greedy approach to sharing structure, which helped reduce space dramatically. Figure 5 shows our example bigram model after application of the algorithm. Notice that there are now two history-less states, which correspond to $q$ and $\hat{q}$ in the algorithm (no $\bar{q}$ was required). The start state backs off to $q$, which does not include a transition to the state labeled a, thus eliminating the invalid path.</Paragraph>
      <Paragraph position="2"> Table 1 gives the sizes of three models in terms of transitions and states, for both the failure transition and $\epsilon$-transition encodings of the model. The DARPA North American Business News (NAB) corpus contains 250 million words, with a vocabulary of 463,331 words. The Switchboard training corpus has 3.1 million words, and a vocabulary of 45,643. The number of transitions needed for the exact offline representation in each case was between 2 and 3 times the number of transitions used in the representation with failure transitions, and the number of states was less than twice the original number of states.</Paragraph>
      <Paragraph position="3"> This shows that our technique is practical even for very large tasks.</Paragraph>
      <Paragraph position="4"> Efficient implementations of model building algorithms have been incorporated into the GRM library.</Paragraph>
      <Paragraph position="5"> The GRM utility grmmake produces basic backoff models, using Katz or absolute discounting (Ney et al., 1994) methods, in the topology shown in Figure 3, with $\epsilon$-transitions in the place of failure transitions. The utility grmshrink removes transitions from the model according to the shrinking methods of Seymore and Rosenfeld (1996) or Stolcke (1998). The utility grmconvert takes a backoff model produced by grmmake or grmshrink and converts it into an exact model using either failure transitions or the algorithm just described. It also converts the model to an interpolated model for use in the tropical semiring. As an example, the following command line: grmmake -n3 counts.fsm &gt; model.fsm creates a basic Katz backoff trigram model from the counts produced by the command line example in the earlier section. The command: grmshrink -c1 model.fsm &gt; m.s1.fsm shrinks the trigram model using the weighted difference method (Seymore and Rosenfeld, 1996) with a threshold of 1. Finally, the command: grmconvert -tfail m.s1.fsm &gt; f.s1.fsm outputs the model represented with failure transitions.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 General class-based language modeling
</SectionTitle>
    <Paragraph position="0"> Standard class-based or phrase-based language models are based on simple classes often reduced to a short list of words or expressions. New spoken-dialog applications require the use of more sophisticated classes, either derived from a series of regular expressions or obtained with general clustering algorithms. Regular expressions can be used to define classes with an infinite number of elements. Such classes can naturally arise, e.g., dates form an infinite set since the year field is unbounded, but they can be easily represented or approximated by a regular expression. Also, representing a class by an automaton can be much more compact than specifying it as a list, especially when dealing with classes representing phone numbers or lists of names or addresses.</Paragraph>
      <Paragraph position="7"> This section describes a simple and efficient method for constructing class-based language models where each class may represent an arbitrary (weighted) regular language. Let $C_1, C_2, \ldots, C_n$ be a set of $n$ classes and assume that each class $C_i$ corresponds to a stochastic weighted automaton $A_i$ defined over the log semiring. Thus, the weight $[[A_i]](w)$ associated by $A_i$ to a string $w$ can be interpreted as $-\log$ of the conditional probability $\Pr(w \mid C_i)$.</Paragraph>
      <Paragraph position="9"> Each class $C_i$ defines a weighted transduction: $A_i \rightarrow C_i$. This can be viewed as a specific obligatory weighted context-dependent rewrite rule in which the left and right contexts are not restricted (Kaplan and Kay, 1994; Mohri and Sproat, 1996). Thus, the transduction corresponding to the class $C_i$ can be viewed as the application of the following obligatory weighted rewrite rule: $A_i \rightarrow C_i \ / \ \epsilon \_\_ \epsilon$. The direction of application of the rule, left-to-right or right-to-left, can be chosen depending on the task.2 Thus, these $n$ classes can be viewed as a set of batch rewrite rules (Kaplan and Kay, 1994) which can be compiled into weighted transducers. The utilities of the GRM library can be used to compile such a batch set of rewrite rules efficiently (Mohri and Sproat, 1996).</Paragraph>
      <Paragraph position="10"> Let $T$ be the weighted transducer obtained by compiling the rules corresponding to the classes. The corpus can be represented as a finite automaton $X$. To apply the rules defining the classes to the input corpus, we just need to compose the automaton $X$ with $T$ and project the result on the output: $\hat{X} = \Pi_2(X \circ T)$.</Paragraph>
      <Paragraph position="12"> $\hat{X}$ can be made stochastic using a pushing algorithm (Mohri, 1997). In general, the transducer $T$ may not be unambiguous. Thus, the result of the application of the class rules to the corpus may not be a single text but an automaton representing a set of alternative sequences.</Paragraph>
      <Paragraph position="13"> However, this is not an issue since we can use the general counting algorithm previously described to construct a language model based on a weighted automaton. When $\bigcup_{i} L(A_i)$, the language defined by the classes, is a code, the transducer $T$ is unambiguous.</Paragraph>
      <Paragraph position="16"> Denote now by $\hat{M}$ the language model constructed from the new corpus $\hat{X}$. To construct our final class-based language model $M$, we simply have to compose $\hat{M}$ with $T^{-1}$ and project the result on the output side: $M = \Pi_2(\hat{M} \circ T^{-1})$. A more general approach would be to have two transducers $T_1$ and $T_2$, the first one to be applied to the corpus and the second one to the language model. In a probabilistic interpretation, $T_1$ should represent the probability distribution $\Pr(C_i \mid w)$ and $T_2$ the probability distribution $\Pr(w \mid C_i)$. Note that we are not limited to this probabilistic interpretation and that our approach can still be used if $T_1$ and $T_2$ do not represent probability distributions, since we can always push $\hat{X}$ and normalize $M$.</Paragraph>
      <Paragraph position="17"> Example. We illustrate this construction in the simple case of the following class containing movie titles: movie = {batman, batman returns}, each title carrying a weight. The compilation of the rewrite rule defined by this class and applied left to right leads to the weighted transducer $T$ given by Figure 6. Our corpus simply consists of the sentence "batman returns" and is represented by the automaton $X$ given by Figure 7. The corpus $\hat{X}$ obtained by composing $X$ with $T$ is given by Figure 7.</Paragraph>
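      <Paragraph> A minimal string-level sketch of this pipeline step (my simplification: the class is given as a plain dictionary of member phrases with hypothetical conditional probabilities, applied greedily left to right with longest match, rather than as a weighted transducer compiled from rewrite rules):
import math

# Hypothetical class definition: member phrase mapped to P(phrase | class).
CLASSES = {"<movie>": {("batman",): 0.5, ("batman", "returns"): 0.5}}

def rewrite_corpus(tokens, classes):
    """Replace class members by the class label; accumulate -log P(member | class)
    as the weight of the rewrite (greedy, left to right, longest match)."""
    out, weight, i = [], 0.0, 0
    while i < len(tokens):
        best = None
        for label, members in classes.items():
            for phrase, prob in members.items():
                if tuple(tokens[i:i + len(phrase)]) == phrase:
                    if best is None or len(phrase) > len(best[0]):
                        best = (phrase, label, prob)
        if best is None:
            out.append(tokens[i])
            i += 1
        else:
            phrase, label, prob = best
            out.append(label)
            weight += -math.log(prob)
            i += len(phrase)
    return out, weight

print(rewrite_corpus(["batman", "returns"], CLASSES))
# (['<movie>'], 0.693...): the sentence of the example rewritten as the class label
      </Paragraph>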
    </Section>
</Paper>