<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0111"> <Title>Vi-xfst: A Visual Regular Expression Development Environment for Xerox Finite State Tool</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Overview of xfst </SectionTitle> <Paragraph position="0"> xfst is a sophisticated command-line-oriented interface developed by Xerox Research Centre Europe for building large finite state transducers for language processing applications. Users of xfst employ a high-level regular expression language which provides an extensive palette of high-level operators [1]. Such regular expressions are then compiled into finite state transducers and interpreted by a run-time engine built into the tool. xfst also provides a further set of commands for combining, testing and inspecting the finite state transducers produced by the regular expression compiler. Transducers may be loaded onto a stack maintained by the system, and the top-most transducer on the stack is available for testing or any further operations. Transducers can also be saved to files which can later be reused or used by other programs in the Xerox finite state suite.</Paragraph> <Paragraph position="1"> Although xfst provides quite useful debugging facilities for testing finite state networks, it does not provide additional functionality beyond the command-line interface to alleviate the complexity of developing large scale projects. [1] [...] fsCompiler/fssyntax-explicit.html.</Paragraph> <Paragraph position="2"> Building a large scale finite state transducer-based application such as a morphological analyzer or a shallow finite state parser, consisting of tens to hundreds of regular expressions, is also a large software engineering undertaking. Large finite state projects can utilize the make functionality in Linux/Unix/cygwin environments, by manually entering (file level) dependencies between regular expressions into a makefile. 
The make program then invokes the compiler at the shell level on the relevant files by tracking the modification times of files. Since whole files are recompiled even when a very small change is made, there may be redundant recompilations that increase the development time.</Paragraph> <Paragraph position="3"> 3 Vi-xfst: a visual interface to xfst. As a development environment, Vi-xfst has two important features that improve the development process of complex large scale finite state projects with xfst.</Paragraph> <Paragraph position="4"> 1. It enables the construction of regular expressions by combining previously defined regular expressions via a drag-and-drop interface.</Paragraph> <Paragraph position="5"> 2. As regular expressions are built by combining other regular expressions, Vi-xfst keeps track of the topological structure of the regular expression, that is, how component regular expressions relate to each other. It derives and maintains the dependency relationships of a regular expression to its components, and via transitive closure, to the components they depend on. This structure and these dependency relations can then be used to visualize a regular expression at various levels of detail, and also in very fine-grained recompilations when some regular expressions are modified.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Using Vi-xfst </SectionTitle> <Paragraph position="0"> In this section, we describe important features of Vi-xfst through some examples [2]. The first example is for a simple date parser described in Karttunen et al. (1996). This date parser is implemented in xfst using the following regular expressions [3]: [...] expression which can then be subsequently referred to in later regular expressions. | denotes the union operator. 0 (without quotes) denotes the empty string, traditionally represented by ε in the literature. 
The quotes &quot; are used to literalize sequences of symbols which have special roles in the regular expression language. The most important regular expression above is AllDates, a pattern that describes a set of calendar dates. It matches date expressions such as Sunday, January 23, 2004 or just Monday. The subsequent regular expression AllDatesParser uses the longest match downward bracket operator (the combination of @-> and ...) to define a transducer that puts [ and ] around the longest matching patterns in the input side of the transducer.</Paragraph> <Paragraph position="1"> Figure 1 shows the state of the screen of Vi-xfst just after the AllDatesParser regular expression is constructed. In this figure, the left side window shows, under the Definitions tab, the regular expressions defined. The top right window shows the template for the longest match regular expression, with slots filled by drag and drop from the list on the left. The AllDatesParser regular expression is entered by selecting the longest-match downward bracket operator (depicted with the icon @-> with ... underneath) from the palette above, which then inserts a template that has empty slots, three in this case. The user then picks up regular expressions from the left and drops them into the appropriate slots. When the regular expression is completed, it can be sent to the xfst process for compilation. The bottom right window, under the Messages tab, shows the messages received from the xfst process running in the background during the compilation of this and the previous regular expressions.</Paragraph> <Paragraph position="2"> Figure 2 shows the user testing a regular expression loaded onto the xfst stack. The left window, under the Networks tab, shows the networks pushed onto the xfst stack. 
The bottom right window, under the Test tab, lists a series of inputs, one of which can be selected as the input string and then applied up or down to the topmost network on the stack [4]. The result of the application appears in the bottom pane on the right. In this case, we see the input with brackets inserted around the longest matching date pattern, here Sunday, January 23, 2004.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Visualizing regular expression structure </SectionTitle> <Paragraph position="0"> When developing or testing a large finite state transducer compiled from a regular expression built as a hierarchy of smaller regular expressions, it is very helpful to visualize the overall structure of the regular expression to easily see how components relate to each other.</Paragraph> <Paragraph position="1"> Vi-xfst provides a facility for viewing the structure of a regular expression at various levels of detail.</Paragraph> <Paragraph position="2"> To illustrate this, we use a simple cascade of transducers simulating a coke machine dispensing cans of soft drink when the right amount of coins is dropped in [5]. The regular expressions for this example [...] representing the cross-product and .o. representing the composition of transducers, and the caret operator (^) denoting the repeated concatenation of its left argument as many times as indicated by its right argument.</Paragraph> <Paragraph position="3"> Figure 1: Constructing a regular expression via the drag-and-drop interface. The last regular expression here, BuyCoke, defines a transducer that consists of the composition of two other transducers. The transducer [ CENTS ]* maps any sequence of symbols n, d, and q, representing nickels, dimes and quarters, into the appropriate number of cents, represented as a sequence of c symbols. 
The transducer SixtyFiveCents maps a sequence of 65 c symbols to the symbol PLONK representing a can of soft drink (falling).</Paragraph> <Paragraph position="4"> Figure 3 shows the simplest visualization of the BuyCoke transducer, in which only the top level components of the compose operator (.o.) are displayed. The user can navigate among the visible regular expressions and zoom into any regular expression further, if necessary. For instance, Figure 4 shows the rendering of the same transducer after the top transducer is expanded, where we see the union of three cross-product operators, while Figure 5 shows the rendering after both components are expanded. When a regular expression is laid out, the user can select any of the regular expressions displayed and make that the active transducer for testing (that is, push it onto the top of the xfst transducer stack) and rapidly navigate among the regular expressions without having to remember their names and locations in the files.</Paragraph> <Paragraph position="5"> As we re-render the layout of a regular expression, we place the components of the compose and cross-product operators in a vertical layout, and others in [...] of the components to be displayed in a rectangular bounding box. It is also possible to render the upward and downward replace operators in a vertical layout, but we have opted to render them in a horizontal layout (as in Figure 1). The main reason for this is that although the components of the replace part of such an expression can be placed vertically, the contexts need to be placed in a horizontal layout. A visualization of a complex network employing a different layout of the replace rules is shown in Figure 6 with the Windows version of Vi-xfst. 
Here we see a portion of a Number-to-English mapping network [7] where different components are visualized at different structural resolutions.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Interaction of Vi-xfst with xfst </SectionTitle> <Paragraph position="0"> Vi-xfst interacts with xfst via inter-process communication. User actions on the Vi-xfst side are translated into xfst commands and sent to xfst, which maintains the overall state of the system in its own universe. Messages and outputs produced by xfst are piped back to Vi-xfst, where they are parsed and presented to the user. If a direct API to xfst were available, it would certainly be possible to implement a tighter interface that would provide better error-handling and slightly improved interaction with the xfst functionality.</Paragraph> <Paragraph position="1"> All the files that Vi-xfst produces for a project are directly compatible with and usable by xfst; that is, as far as xfst is concerned, those files are valid regular expression script files. Vi-xfst maintains all the additional bookkeeping as comments in these files; such information is meaningful only to Vi-xfst and is used when a project is re-loaded to recover all dependency and debugging information originally computed or entered. Currently, Vi-xfst has some primitive facilities for directly importing hand-generated xfst files to enable manipulation of already existing projects.</Paragraph> </Section> </Section> </Paper>