<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1309"> <Title>Implementing Voting Constraints with Finite State Transducers</Title> <Section position="5" start_page="93" end_page="97" type="metho"> <SectionTitle> 4 Implementing Voting Constraints with Finite State Transducers </SectionTitle> <Paragraph position="0"> The approach described above can also be implemented by finite state transducers. For this, we view the parses of the tokens making up a sentence as an acyclic finite state recognizer (or an identity transducer [4]). The states mark word boundaries, the transitions carry labels of the sort L = (w_i, t_ij, v_ij), and the rightmost node denotes the final state.</Paragraph> <Paragraph position="1"> This approach is very different from that of Roche and Schabes [9], who use transducers to implement Brill's transformation-based tagging approach [1]. It shares certain concepts with Tzoukermann and Radev's use of weighted finite state transducers for tagging [13] in that both approaches combine statistical and hand-crafted linguistic information, but the two employ finite state devices in very different ways.</Paragraph> <Paragraph position="2"> Thus the ambiguities of the tokens were limited to the ones found in the training corpus.</Paragraph> <Paragraph position="3"> The basic idea behind using finite state transducers is that the voting constraint rules can be represented as transducers which increment the votes of the matching input sequence segments by an appropriate amount, but pass through unchanged the segments they are not sensitive to. When an identity finite state transducer corresponding to an input sentence is composed with a constraint transducer, the output is a slightly modified version of the sentence transducer, possibly with additional transitions and states, where the votes of some of the transition labels have been appropriately incremented.
When the sentence transducer is composed with all the constraint transducers in sequence, all possible votes are cast and the final sentence transducer reflects all the votes. The parses on the path with the highest total vote, from the start to any of the final states, can then be selected. The key point here is that, due to the nature of the composition operator, the constraint transducers can, if necessary, be composed off-line first, giving a single constraint transducer, which can then be composed once with every sentence transducer.</Paragraph> <Paragraph position="4"> Using a finite state framework provides, by its nature, some additional descriptive advantages in describing rules. For instance, one can use rules involving the Kleene star so that a single rule such as ([TAG=MD], [TAG=RB]*, [TAG=VB]; 100) can deal with any number of intervening adverbials. 2</Paragraph> <Section position="1" start_page="94" end_page="97" type="sub_section"> <SectionTitle> 4.1 The Transducer Architecture </SectionTitle> <Paragraph position="0"> We use the Xerox Finite State Tools to implement our approach. The finite state transducer system consists of the following components, depicted in Figure 3.</Paragraph> <Paragraph position="1"> The lexicon transducer: The lexicon transducer implements \[ L \[ .... \]+ j.3, where the transducer L maps a token to all its possible tags/parses, also inserting the relevant lexical votes for each parse. In our current implementation for English, the transducer L is the union of a set of transducers of the sort: 2 Note that in this case the vote will be added to all matching parses; thus, depending on how many sequential parses match the *'ed constraint, the total vote contribution of the rules will differ.
This may actually be desirable to promote larger votes for longer matches.</Paragraph> <Paragraph position="2"> We use the Xerox regular expression language (see http://www.xrce.xerox.com/research/mltt/fst/home.html) to describe our regular expressions.</Paragraph> <Paragraph position="4"> So a &quot;lookdown&quot; of the token said will produce, on the lower side of the transducer, the outputs (VBD/said&lt;+98&gt;) (VBN/said&lt;+1&gt;) (JJ/said&lt;+1&gt;). Thus when a sentence transducer (representing just the lexical items) is composed with the lexicon transducer as depicted at the top of Figure 3, one gets a transducer with the lexical ambiguities and the appropriate votes inserted, which can then be composed with the constraint transducers.</Paragraph> <Paragraph position="5"> Voting Constraint Transducers: Each voting constraint rule is represented by a transducer that checks whether the constraints imposed by that rule are satisfied on the input and, if so, appropriately increments the votes at the relevant input positions. In order to describe the transducer architecture more clearly, let us concentrate on a specific example rule: ([TAG=MD], [TAG=VB]; 100). Let us assume that the input to the transducer is represented as a sequence of triplets of the sort (tag word vote) 4. The transducer corresponding to the regular expression below will increment the vote fields of a sequence of any two triplets by 100, provided the first one has tag MD and the second one has tag VB.</Paragraph> <Paragraph position="7"> This transducer is the composition of four transducers (separated by the composition operator .o.). The top transducer (1) constrains the input to valid triplets. 5 The second transducer brackets with { and } any sequence of such triplets matching the given rule constraints, using the longest match bracket operator [5]. 6 Thus any sequence of two triplets in the input sequence where the first has tag MD and the second has tag VB is bracketed by this transducer.
4 Please note that this is a slightly different order than described earlier. In practice, this order was found to generate smaller transducers during compositions.</Paragraph> <Paragraph position="8"> 5 Here WORD denotes a regular expression which describes an arbitrary sequence of English characters.</Paragraph> <Paragraph position="9"> TAGS denotes a regular expression which is the union of all (possibly multi-character) tag symbols.</Paragraph> <Paragraph position="10"> VOTES denotes a regular expression of the sort &quot;&lt;&quot; [&quot;+&quot; | &quot;-&quot;] DIGITS+ &quot;&gt;&quot; with DIGITS being the union of all decimal digit symbols.</Paragraph> <Paragraph position="11"> 6 Note that this simple version does not deal with rules whose constraints may overlap (e.g. ([TAG=NN],[TAG=NN]; 100)).</Paragraph> <Paragraph position="13"> The third transducer (3) either passes through the unbracketed sections of the input (as indicated by the first part of the disjunct), or increments by 100 the vote fields of the triplets within the brackets { and }. ADD100 is a transducer that &quot;adds&quot; 100 to the vote field of the matching triplet. It is the 99-fold composition of an ADD1 transducer with itself. The ADD1 transducer will add one to a (signed) number at its upper side input. 7 When compiled, this constraint rule becomes a transducer with 75 states and 1,197 arcs.</Paragraph> <Paragraph position="14"> The transducers for all constraints are obtained in a similar way and composed off-line, giving one big transducer which can do the appropriate vote updates in the appropriate places.
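As a rough illustration of what such a rule does to a sequence of (tag word vote) triplets, the following Python sketch imitates the MD-followed-by-VB rule on a single reading. Plain lists stand in for the transducers, the function name apply_rule and the sample votes are hypothetical, and matches are taken left to right without overlap, as in the simple version discussed in the text; it is not the Xerox implementation.

```python
# Illustrative stand-in for a voting constraint such as "MD followed by VB, +100".
# Each triplet is (tag, word, vote); the votes below are made-up sample values.

def apply_rule(triplets, pattern, bonus):
    """Add `bonus` to the vote of every triplet inside a non-overlapping,
    left-to-right match of the tag sequence `pattern`; pass the rest through."""
    out = list(triplets)
    i, n = 0, len(pattern)
    while len(out) - i >= n:
        window = out[i:i + n]
        if all(t[0] == tag for t, tag in zip(window, pattern)):
            out[i:i + n] = [(tag, word, vote + bonus) for tag, word, vote in window]
            i += n  # skip past the match, so matches never overlap
        else:
            i += 1
    return out

sentence = [("PRP", "I", 100), ("MD", "can", 60), ("VB", "can", 30),
            ("DT", "the", 100), ("NN", "can", 80)]
voted = apply_rule(sentence, ["MD", "VB"], 100)
print(voted)
```

Calling apply_rule repeatedly with different patterns mimics composing the sentence with a cascade of constraint transducers, each one adding its own votes to the output of the previous one.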
In practice, the final voting constraint transducer may be big, so one can instead leave it as a cascade of a small number of transducers.</Paragraph> </Section> <Section position="2" start_page="97" end_page="97" type="sub_section"> <SectionTitle> 4.2 Operational Aspects </SectionTitle> <Paragraph position="0"> A sentence such as &quot;I can can the can.&quot; is represented as the transducer corresponding to the regular expression [ &lt;BS&gt; I can can the can . &lt;ES&gt; ]. 8 When this transducer is composed with the lexicon transducer, the resulting transducer corresponds to the following regular expression:</Paragraph> <Paragraph position="2"> which allows for 64 possible &quot;readings.&quot; After this transducer is composed with the voting constraint transducer(s), one gets a transducer which still has 64 readings, but now the labels reflect votes from any matching constraints. A simple DAG longest path algorithm (e.g. [3]) on the DAG of the resulting transducer gives the path with the largest total vote as</Paragraph> </Section> </Section> <Section position="6" start_page="97" end_page="98" type="metho"> <SectionTitle> 5 Implementation </SectionTitle> <Paragraph position="0"> We have developed two PERL-based rule compilers for compiling lexicon files and constraints into scripts, which are then compiled into transducers by the Xerox finite state tools. In this section we provide some information about the transducers obtained from the WSJ Corpus experiments.</Paragraph> <Paragraph position="1"> 7 This is a slightly modified version of the transducer described at http://www.rxrc.xerox.com/research/mltt/fst/fsexamples.html, dealing with signed numbers. The ADD1 transducer can be composed with itself off-line any number of times to get a transducer adding any number.</Paragraph> <Paragraph position="2"> 8 For better readability, the obligatory spaces between word symbols will not be shown from now on.
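The longest-path selection mentioned in Section 4.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the voted sentence transducer is given as a list of (source state, target state, label, vote) arcs, state numbers are assumed to be in topological order (every arc goes from a lower to a higher number), and the function name best_path, the toy lattice, and its votes are hypothetical rather than taken from the experiments.

```python
# Longest (highest-vote) path through an acyclic lattice of voted arcs.
# Assumption: every arc goes from a lower-numbered to a higher-numbered state,
# so processing arcs in sorted order visits the states topologically.

def best_path(arcs, start, final):
    """Return (total_vote, labels) of the highest-voted path from start to final."""
    best = {start: (0, [])}
    for src, dst, label, vote in sorted(arcs):
        if src not in best:
            continue  # state not reachable from the start state
        total = best[src][0] + vote
        if dst not in best or total > best[dst][0]:
            best[dst] = (total, best[src][1] + [label])
    return best[final]

# Toy lattice for "I can can" with made-up votes on each arc.
lattice = [
    (0, 1, "PRP/I", 100),
    (1, 2, "MD/can", 160), (1, 2, "NN/can", 20), (1, 2, "VB/can", 20),
    (2, 3, "MD/can", 60), (2, 3, "NN/can", 10), (2, 3, "VB/can", 130),
]
total, labels = best_path(lattice, 0, 3)
print(total, labels)
```

The pass touches each arc once after sorting, so selecting the winning reading stays cheap even though the number of readings grows multiplicatively with sentence length.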
The lexicon transducer compiled from about 16,000 unique lexical tokens from the training set had 37,208 states and 52,912 arcs. The three sets of constraints for 2-grams, 3-grams, and hand-crafted constraints (sets 2, 3 and 4 in Figure 2, respectively) were compiled separately into three constraint transducers with 19,954 states and 296,545 arcs; 56,910 states and 685,365 arcs; and 334,215 states and 2,651,550 arcs, respectively. It is certainly possible to combine these transducers by composition at compile time. If size becomes a problem, one can have smaller transducers, which are sequentially composed with the sentence transducer at tag time. For instance, when the hand-crafted constraints are split into three groups of about 200 each, the three resulting transducers have 63,865 states and 467,966 arcs; 44,831 states and 306,257 arcs; and 33,862 states and 233,401 arcs, respectively, the collective size of which is less than that of the fully composed one. We have not really optimized the hand-crafted constraints for finite state compilation, but it is certainly possible to reduce the number of such constraints by utilizing operators such as the Kleene star, complementation, etc.</Paragraph> <Paragraph position="3"> Another observation during constraint compilation is that, as constraints are being compiled, the size of the intermediate compositions does not grow explosively. Thus the problem alluded to by Tapanainen in a similar approach [11] does not seem to occur here, since an intersection is not being computed. The results that we have provided earlier are from a C implementation. The tagging speed with the finite state transducers in the current environment is not very high, since for each sentence the transducers have to be loaded from the disk.
But with a suitable application interface to the lower level functions of the Xerox finite state environment, the tagging speed can be improved significantly.</Paragraph> <Paragraph position="4"> The system deals with unknown words in a rather trivial way, by attaching any meaningful open class word tags to unknown words and later picking the one(s) selected by the voting process.</Paragraph> </Section> </Paper>