<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2197"> <Title>Transforming Lattices into Non-deterministic Automata with Optional Null Arcs</Title> <Section position="4" start_page="1205" end_page="1206" type="metho"> <SectionTitle> 2 The basic algorithm </SectionTitle> <Paragraph position="0"> We now describe our basic transformation procedures. Modifications permitting the creation of epsilon arcs will be discussed below.</Paragraph> <Paragraph position="1"> Lattice.to.automaton, our top-level procedure, initializes two global variables and creates and initializes the new automaton. The variables are *candidate.a-arcs* (a-arcs created to represent the current lnode) and *unconnectable.a-arcs* (a-arcs which could not be connected when processing previous lnodes). During automaton initialization, an initial.anode is created and supplied with a full set of larcs: all outgoing larcs of the initial lnode are included. We then visit every lnode in the lattice in topological order, and for each lnode execute our central procedure, handle.current.lnode.</Paragraph> <Paragraph position="2"> handle.current.lnode: This procedure creates an a-arc to represent the current lnode and connects it (and any pending a-arcs previously unconnectable) to the automaton under construction.</Paragraph> <Paragraph position="3"> We proceed as follows: (1) If current.lnode is the initial lattice node, do nothing and exit. (2) Otherwise, check whether any a-arcs remain on *unconnectable.a-arcs* from previous processing. If so, push them onto *candidate.a-arcs*. (3) Create a candidate automaton arc, or candidate.a-arc, and push it onto *candidate.a-arcs* (see footnote 1). (4) Loop until *candidate.a-arcs* is exhausted. On each loop, pop a candidate.a-arc and try to connect it to the automaton as follows: Seek potential connecting.anodes on the automaton. If none are found, push candidate.a-arc onto *unconnectable.a-arcs*; otherwise, try to merge the set of connecting.anodes. 
(Whether or not the merge succeeds, the result will be an updated set of connecting.anodes.) Finally, execute link.candidate (below) to connect candidate.a-arc to connecting.anodes. Two aspects of this procedure require clarification. First, what is the criterion for seeking potential connecting.anodes for candidate.a-arc? These are nodes already on the automaton whose reflected larcs intersect with those of the origin of candidate.a-arc.</Paragraph> <Paragraph position="4"> Second, what is the final criterion for the success or failure of an attempted merge among connecting.anodes? The resulting anode must not be ill-formed in the sense already outlined above. A good merge indicates that the a-arcs leading to the merged anode compose a legitimate set of common prefixes for candidate.a-arc.</Paragraph> <Paragraph position="5"> Footnote 1: The new a-arc receives the label of the lnode which it reflects. Its origin points to all of that lnode's incoming larcs, and its extremity points to all of its outgoing larcs. Larc.origin.groups and larc.extremity.groups are computed for each new anode. None of the new automaton objects are entered on the automaton yet.</Paragraph> <Paragraph position="6"> link.candidate: The final procedure to be explained has the following purpose: Given a candidate.a-arc and its connecting.anodes (the anodes, already merged so far as possible, whose larcs intersect with the larcs of the a-arc origin), seek a final connecting.anode, an anode to which the candidate.a-arc can attach (see below). If there is no such anode, it will be necessary to split the candidate.a-arc using the procedure split.a-arc. If there is such an anode, we connect to it, possibly after one or more applications of split.anode to split the connecting.anode. 
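The two set tests that link.candidate relies on can be sketched as follows. This is only a minimal illustration, assuming (hypothetically) that each anode is represented by the set of larc numbers it reflects; the function names potential_connecting_anodes and final_connecting_anode and the toy larc numbers are made up here and do not come from the authors' pseudocode. The first function applies the intersection criterion described above; the second applies the superset condition stated in the next sentence.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical representation: anode name -> set of reflected larc numbers.
Anodes = Dict[str, FrozenSet[int]]


def potential_connecting_anodes(anodes: Anodes, origin: FrozenSet[int]) -> List[str]:
    """Return the anodes whose reflected larcs intersect the a-arc origin's larcs."""
    return [name for name, larcs in anodes.items() if not larcs.isdisjoint(origin)]


def final_connecting_anode(
    anodes: Anodes, candidates: List[str], origin: FrozenSet[int]
) -> Optional[str]:
    """Return the first candidate whose larcs are a superset of the origin, else None."""
    for name in candidates:
        if anodes[name].issuperset(origin):
            return name
    return None


# Toy data, made up for illustration (not taken from the paper's figures).
anodes = {"a1": frozenset({1, 2}), "a2": frozenset({3, 4, 5}), "a3": frozenset({6})}

origin = frozenset({3, 4})
candidates = potential_connecting_anodes(anodes, origin)
print(candidates, final_connecting_anode(anodes, candidates, origin))

straddling = frozenset({2, 3})
candidates = potential_connecting_anodes(anodes, straddling)
print(candidates, final_connecting_anode(anodes, candidates, straddling))
```

In the second call no single anode covers the whole origin, which is the situation the text resolves by splitting the candidate.a-arc with split.a-arc.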
A final connecting.anode is one whose reflected larcs are a superset of those of the candidate.a-arc's origin. This condition assures that all of the lnodes to be reflected as incoming a-arcs of the connectable anode have outgoing larcs leading to the lnode to be reflected as candidate.a-arc.</Paragraph> <Paragraph position="7"> Before stepping through the link.candidate procedure in detail, let us preview split.a-arc and split.anode, the subprocedures which split candidate.a-arc or connecting.anodes, and their significance.</Paragraph> <Paragraph position="8"> split.a-arc: This subroutine is needed when (1) the origin of candidate.a-arc contains both initial and non-initial larcs, or (2) no connecting.anode can be found whose larcs are a superset of the larcs of the origin of candidate.a-arc. In either case, we must split the current candidate.a-arc into several new candidate.a-arcs, each of which can eventually connect to a connecting.anode. In preparation, we sort the larcs of the current candidate.a-arc's origin according to the connecting.anodes which contain them. Each grouping of larcs then serves as the larc set of the origin of a new candidate.a-arc, now guaranteed to (eventually) connect. We create and return these candidate.a-arcs in a list, to be pushed onto *candidate.a-arcs*. The original candidate.a-arc is discarded.</Paragraph> <Paragraph position="9"> split.anode: This subroutine splits connecting.anode when either (1) it contains both final and non-final larcs or (2) the attempted connection between the origin of candidate.a-arc and connecting.anode would give rise to an ill-formed anode. In case (1), we separate final from non-final larcs, and establish a new splittee anode for each partition. The splittee containing only non-final larcs becomes the connecting.anode for further processing. 
In case (2), some larc origin groups in the attempted merge do not intersect with all larc extremity groups.</Paragraph> <Paragraph position="10"> We separate the larcs in the non-intersecting origin groups from those in the intersecting origin groups and establish a splittee anode for each partition. The splittee with only intersecting origin groups can now be connected to candidate.a-arc with no further problems.</Paragraph> <Paragraph position="11"> In either case, the original anode is discarded, and both splittees are (re)connected to the a-arcs of the automaton. (See available pseudocode for details.) We now describe link.candidate in detail. The procedure is as follows: Test whether the origin of candidate.a-arc contains both initial and non-initial larcs; if so, using split.a-arc, we split candidate.a-arc, and push the splittees onto *candidate.a-arcs*. Otherwise, seek a connecting.anode whose larcs are a superset of the larcs of the origin of candidate.a-arc. If there is none, then no connection is possible during the current procedure call. Split candidate.a-arc, push all splittee a-arcs onto *candidate.a-arcs*, and exit. If there is a connecting.anode, then a connection can be made, possibly after one or more applications of split.anode. Check whether connecting.anode contains both final and non-final larcs. If not, no splitting will be necessary, so connect candidate.a-arc to connecting.anode.</Paragraph> <Paragraph position="12"> But if so, split connecting.anode, separating final from non-final larcs. The splitting procedure returns the splittee anode having only non-final larcs, and this anode becomes the connecting.anode. Now attempt to connect candidate.a-arc to connecting.anode. If the merged anode at the connection point would be ill-formed, then split connecting.anode (a second time, if necessary). 
In this case, split.anode returns a connectable anode as connecting.anode, and we connect candidate.a-arc to it.</Paragraph> <Paragraph position="13"> A final detail in our description of lattice.to.automaton concerns the special handling of the final.lnode. For this last stage of the procedure, the subroutine which makes a new candidate.a-arc makes a dummy a-arc whose (real) origin is the final.anode. This anode is stocked with larcs reflecting all of the final larcs. The dummy candidate.a-arc can then be processed as usual. When its origin has been connected to the automaton, it becomes the final.anode, with all final a-arcs as its incoming a-arcs, and the automaton is complete.</Paragraph> </Section> <Section position="5" start_page="1206" end_page="1207" type="metho"> <SectionTitle> 3 Epsilon (null) transitions </SectionTitle> <Paragraph position="0"> The basic algorithm described thus far does not permit the creation of epsilon transitions, and thus yields automata which are not minimal.</Paragraph> <Paragraph position="1"> However, epsilon arcs can be enabled by varying the current procedure split.a-arc, which breaks an unconnectable candidate.a-arc into several eventually connectable a-arcs and pushes them onto *candidate.a-arcs*.</Paragraph> <Paragraph position="2"> In the splitting procedure described thus far, the a-arc is split by dividing its origin; its label and extremity are duplicated. In the variant (proposed by the third author) which enables epsilon a-arcs, however, if the antecedence condition (below) is verified for a given splittee a-arc, then its label is instead ε (epsilon), and its extremity instead contains the larcs of a sibling splittee's origin. This procedure ensures that the sibling's origin will eventually connect with the epsilon a-arc's extremity. 
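The epsilon variant of split.a-arc can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the AArc class, the EPSILON constant, and the function signature are hypothetical; the antecedence test is passed in as a parameter, since the text defines it further on; and when several siblings satisfy the condition, the first one found is used, a choice the text does not spell out. Epsilon splittees are returned first, as the text goes on to require.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

EPSILON = "ε"
LarcSet = FrozenSet[int]


@dataclass(frozen=True)
class AArc:
    """Hypothetical a-arc record: a label plus origin and extremity larc sets."""
    label: str
    origin: LarcSet
    extremity: LarcSet


def split_a_arc(
    arc: AArc,
    groups: List[LarcSet],
    is_antecedent: Callable[[LarcSet, LarcSet], bool],
) -> List[AArc]:
    """Split arc along the given origin groups, epsilon splittees first."""
    epsilons: List[AArc] = []
    labelled: List[AArc] = []
    for group in groups:
        # Assumption: if several siblings qualify, the first one found is used.
        sibling = next(
            (g for g in groups if g != group and is_antecedent(group, g)), None
        )
        if sibling is None:
            # Basic behaviour: label and extremity are duplicated.
            labelled.append(AArc(arc.label, group, arc.extremity))
        else:
            # Epsilon variant: the extremity takes the sibling's origin larcs.
            epsilons.append(AArc(EPSILON, group, sibling))
    return epsilons + labelled


# Toy call: the antecedence relation is simply listed by hand here.
low, mid, high = frozenset({1}), frozenset({2, 3}), frozenset({4, 5})
antecedent_pairs = {(low, high), (mid, high)}
arc = AArc("X", low | mid | high, frozenset({9}))
splittees = split_a_arc(arc, [high, mid, low], lambda p, q: (p, q) in antecedent_pairs)
print([(a.label, sorted(a.origin), sorted(a.extremity)) for a in splittees])
```

Returning the epsilon splittees at the front of the list is one simple way to make sure they are popped, and therefore connected, before their labelled siblings.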
Splittee a-arcs with epsilon labels are placed at the top of the list pushed onto *candidate.a-arcs* to ensure that they will be connected before sibling splittees.</Paragraph> <Paragraph position="3"> What is the antecedence condition? Recall that during the present tests for split.a-arc, we partition the a-arc's origin larcs. The antecedence condition obtains when one such larc partition is antecedent to another partition. Partition P1 is antecedent to P2 if every larc in P1 is antecedent to every larc in P2. And larc1 is antecedent to larc2 if, moving leftward in the lattice from larc2, one can arrive at an lnode where larc1 is an outgoing larc.</Paragraph> <Paragraph position="4"> A final detail: the revised procedure can create duplicate epsilon a-arcs. We eliminate such redundancy at connection time: duplicate epsilon a-arcs are discarded, thus aborting the connection procedure.</Paragraph> </Section> <Section position="6" start_page="1207" end_page="1209" type="metho"> <SectionTitle> 4 Extended example </SectionTitle> <Paragraph position="0"> We now step through an extended example showing the complete procedure in action. Several epsilon arcs will be formed.</Paragraph> <Paragraph position="1"> We show anodes containing numbers indicating their reflected larcs.</Paragraph> <Paragraph position="2"> We show larc.origin.groups on the left side of anodes when relevant, and larc.extremity.groups on the right.</Paragraph> <Paragraph position="3"> Consider the lattice of Arabic forms shown in Figure 3. After initializing a new automaton, we proceed as follows: * Visit lnode W, constructing this candidate.a-arc: [figure not reproduced] The a-arc is connected to the initial anode. * Visit lnode F, constructing this candidate.a-arc: [figure not reproduced] The only connecting.anode is that containing the label of the initial lnode, >. After connection, we obtain: [figure not reproduced] * Visit lnode L, constructing this candidate.a-arc: [figure not reproduced] Anodes 1 and 2 in the automaton are connecting.anodes. 
We try to merge them, and get: [figure not reproduced] The tentative merged anode is well-formed, and the merge is completed. Thus, before connection, the automaton appears as follows. (For graphic economy, we show two a-arcs with common terminals as a single a-arc with two labels.) [figure not reproduced] Now, in link.candidate, we split candidate.a-arc so as to separate initial larcs from other larcs. The split yields two candidate.a-arcs: the first contains larc 9, since it departs from the initial lnode; and the second contains the other larcs. [figure not reproduced] Following our basic procedure, the connection of these two arcs would give the following automaton: [figure not reproduced] However, the augmented procedure will instead create one epsilon and one labeled transition. Why? Our split separated larc 9 and larcs (3, 13) in the candidate.a-arc. But larc 9 is antecedent to larcs 3 and 13. So the splittee candidate.a-arc whose origin contains larc 9 becomes an epsilon a-arc, which connects to the automaton at the initial anode. The sibling splittee, the a-arc whose origin contains (3, 13), is processed as usual. Because the epsilon a-arc's extremity was given the larcs of this sibling's origin, connection of the sibling will bring about a merge between that extremity and anode 1. The result is as follows: [figure not reproduced] * Visit lnode S, constructing this candidate.a-arc: [figure not reproduced] Anode 1 is the tentative connection point for the candidate.a-arc, since its larc set has the intersection (4, 14) with that of candidate.a-arc's origin.</Paragraph> <Paragraph position="4"> Once again, we split candidate.a-arc, since it contains larc 10, one of the larcs of the initial node. But larc 10 is an antecedent of larcs 4 and 14. We thus create an epsilon a-arc with larc 10 in its origin which would connect to the initial anode. Its extremity will contain larcs 4 and 14, and would again merge with anode 1 during the connection of the sibling splittee. However, the epsilon a-arc is recognized as redundant, and eliminated at connection time. 
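The antecedence test invoked in these steps can be sketched as a leftward reachability check. This is only an illustration under stated assumptions: larcs are stored as (source lnode, target lnode) pairs, the helper names are hypothetical, the toy lattice is made up rather than taken from Figure 3, and larc1 is taken to be antecedent to larc2 only when larc1 leaves an lnode lying strictly to the left of larc2's source (the text leaves open whether two larcs leaving the same lnode count).

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: larc id -> (source lnode, target lnode).
Lattice = Dict[str, Tuple[str, str]]


def lnodes_to_the_left(lnode: str, lattice: Lattice) -> Set[str]:
    """Collect every lnode reachable by following larcs backwards from lnode."""
    seen: Set[str] = set()
    stack = [lnode]
    while stack:
        node = stack.pop()
        for source, target in lattice.values():
            if target == node and source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


def larc_is_antecedent(larc1: str, larc2: str, lattice: Lattice) -> bool:
    """True if larc1 leaves an lnode strictly to the left of larc2's source."""
    return lattice[larc1][0] in lnodes_to_the_left(lattice[larc2][0], lattice)


def partition_is_antecedent(
    p1: Iterable[str], p2: Iterable[str], lattice: Lattice
) -> bool:
    """P1 is antecedent to P2 if every larc in P1 is antecedent to every larc in P2."""
    p2 = list(p2)  # p2 is scanned once per larc in p1
    return all(larc_is_antecedent(a, b, lattice) for a in p1 for b in p2)


# Toy lattice (made up): larc r skips directly from the start to z,
# while s and t reach z through the intermediate lnodes x and y.
toy = {
    "p": ("start", "x"),
    "q": ("start", "y"),
    "r": ("start", "z"),
    "s": ("x", "z"),
    "t": ("y", "z"),
}
print(partition_is_antecedent(["r"], ["s", "t"], toy))
print(partition_is_antecedent(["s", "t"], ["r"], toy))
```

In this toy lattice the partition holding r is antecedent to the partition holding s and t, but not the reverse, so under the epsilon variant only the splittee built from r would be a candidate for an epsilon label.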
The sibling a-arc labeled S connects to anode 1, giving: [figure not reproduced] * Visit lnode A, constructing this candidate.a-arc: [figure not reproduced] The two connecting.anodes for the candidate.a-arc are 2 and 3. Their merge succeeds, yielding: [figure not reproduced] We now split the candidate.a-arc, since we find no anode containing a superset of its origin's larcs: larcs (12, 19, 21) do not appear in the merged connecting.anode. Three splittee candidate automaton arcs are produced, with three larc sets in their origins: (5, 18), (12, 19), and (21). But larcs 12 and 19 are antecedents of larcs 5 and 18. Thus one of the splittees will become an epsilon a-arc which will, after all siblings have been connected, span from anode 1 to anode 2. And since (21) is also antecedent to (5, 18), a second sibling will become an epsilon a-arc from the initial anode to anode 2. The third sibling splittee connects to the same anode, giving Figure 4.</Paragraph> <Paragraph position="5"> * Visit lnode N, constructing this candidate.a-arc: [figure not reproduced] The connecting.anode is anode 2. Once again, a split is required, since this anode does not contain larcs 11, 16, and 22. Again, three candidate.a-arcs are composed, with larc sets (6, 17), (11, 16) and (22). But the last two sets are antecedent to the first set. Two epsilon arcs would thus be created, but both already exist. After connection of the third sibling splittee, the automaton of Figure 5 is obtained.</Paragraph> <Paragraph position="6"> * Visit lnode K, constructing this candidate.a-arc: [figure not reproduced] We find and successfully merge connecting.anodes (3 and 4). For reasons already discussed, the candidate.a-arc is split into two siblings. The first, with an origin containing larcs (15, 16), will require our first application of split.anode to divide anode 1. The division is necessary because the connecting merge would be ill-formed, and connection would create the parasite path KTB. 
The split creates anode 4 (not shown) as the extremity of a new pair of a-arcs W, F: a second a-arc pair departing the initial anode with this same label set.</Paragraph> <Paragraph position="7"> The second splittee a-arc contains larcs 7 and 8 in its origin. It connects to both anode 3 and anode 4, which successfully merge, giving the automaton of Figure 6.</Paragraph> <Paragraph position="8"> * Visit lnode T, constructing this candidate.a-arc: [figure not reproduced] The arc connects to the automaton at anode 5. * Visit lnode B, making this candidate.a-arc: [figure not reproduced] The arc connects to anode 6, giving the final automaton of Figure 7.</Paragraph> <Paragraph position="9"> Conclusion and Plans: The algorithm for transforming lattices into non-deterministic finite state automata which we have presented here has been successfully applied to lattices derived from dictionaries, i.e., very large corpora of strings (Meddeb-Hamrouni (1996), pages 205-217).</Paragraph> <Paragraph position="10"> Applications of the algorithm to the parsing of speech recognition results are also planned: lattices of phones or words produced by speech recognizers can be converted into initialized charts suitable for chart parsing.</Paragraph> </Section> </Paper>