<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0111">
  <Title>Vi-xfst: A Visual Regular Expression Development Environment for Xerox Finite State Tool</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Finite state machines are widely used in many language processing applications to implement components such as tokenizers, morphological analyzers/generators, shallow parsers, etc. Large-scale finite state language processing systems built using tools such as the Xerox Finite State Tool (Karttunen et al., 1996; Karttunen et al., 1997; Beesley and Karttunen, 2003), van Noord's Prolog-based tool (van Noord, 1997), the AT&amp;T weighted finite state machine suite (Mohri et al., 1998) or the INTEX System (Silberztein, 2000) involve tens or hundreds of regular expressions, which are compiled into finite state transducers that are interpreted by the underlying run-time engines of the respective tools.</Paragraph>
    <Paragraph position="1"> Developing such large-scale finite state systems is currently done without much support for the software engineering aspects. Regular expressions are constructed manually by the developer with a text editor and then compiled, and the resulting transducers are tested. Any modifications have to be made afterwards in the same text file(s), and the whole project has to be recompiled many times in a development cycle. Visualization, an important aid in understanding and managing the complexity of any large-scale system, is limited to displaying the finite state machine graph (e.g., Gansner and North (1999), or the visualization functionality in INTEX (Silberztein, 2000)). However, such visualization (akin to visualizing the machine code of a program written in a high-level language) may not be very helpful, as developers rarely, if ever, think of such large systems in terms of states and transitions. The relationship between the regular expressions and the finite state machines they are compiled into is opaque except for the simplest of regular expressions. Further, the size of the resulting machines, in terms of states and transitions, is very large, usually thousands to hundreds of thousands of states, if not more, making such visualization meaningless. On the other hand, it may prove quite useful to visualize the structural components of a set of regular expressions and how they are put together, much in the spirit of visualizing the relationships among the data objects and/or modules in a large program. However, such visualization and other maintenance operations for large finite state projects spanning many files depend on tracking the structural relationships and dependencies among the regular expressions, which may prove hard or inconvenient when text editors are the only development tool.</Paragraph>
    <Paragraph position="2"> This paper presents a visual interface and development environment, Vi-xfst (Yılmaz, 2003), for the Xerox Finite State Tool, xfst, one of the most sophisticated tools for constructing finite state language processing applications (Karttunen et al., 1997).</Paragraph>
    <Paragraph position="3"> Vi-xfst enables incremental construction of complex regular expressions via a drag-and-drop interface, treating simpler regular expressions as Lego Blocks. Vi-xfst also enables the visualization of the structure of the regular expression components, so that the developer can have a bird's eye view of the overall system, easily understanding and tracking the relationships among the components involved.</Paragraph>
    <Paragraph position="4"> Since the structure of a large regular expression (built in terms of other regular expressions) is now transparent, the developer can interact with regular expressions at any level of detail, easily navigating among them for testing and debugging. Vi-xfst also keeps track of the dependencies among the regular expressions at a very fine-grained level. So, when a certain regular expression is modified as a result of testing or debugging, only the dependent regular expressions are recompiled. This results in an improvement in development time by avoiding file-level recompiles, which usually cause substantial redundant regular expression compilations.</Paragraph>
    <Paragraph position="5"> In the following sections, after a short overview of the Xerox xfst finite state machine development environment, we describe salient features of Vi-xfst through some simple examples.</Paragraph>
  </Section>
</Paper>