File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/99/p99-1074_intro.xml
Size: 4,131 bytes
Last Modified: 2025-10-06 14:06:56
<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1074"> <Title>Robust, Finite-State Parsing for Spoken Language Understanding</Title> <Section position="3" start_page="573" end_page="574" type="intro"> <SectionTitle> 2 Robust Finite-state Parsing </SectionTitle> <Paragraph position="0"> CMU's Phoenix system is implemented as a recursive transition network (RTN). This is similar to Abney's system of finite-state cascades (1996). Both parsers have a &quot;stratal&quot; system of levels. Both are robust in the sense that they skip over out-of-grammar areas and build up structural islands of certainty. And both can fairly be described as run-time chart parsers. However, Abney's system inserts bracketing and tagging information by means of cascaded transducers, whereas Phoenix accomplishes the same thing by storing state information in the chart edges themselves, thus using the chart edges like tokens. PROFER is similar to Phoenix in this regard.</Paragraph> <Paragraph position="1"> Phoenix performs a depth-first search over its textual input, while Abney's &quot;chunking&quot; and &quot;attaching&quot; parsers perform best-first searches (1991). However, the demands of a tightly coupled, real-time system argue for a breadth-first search strategy, which in turn argues for the use of a finite-state parser as an efficient means of supporting such a search strategy.</Paragraph> <Paragraph position="2"> PROFER is a strictly sequential, breadth-first parser.</Paragraph> <Paragraph position="3"> PROFER uses a regular grammar formalism for defining the patterns that it will parse from the input, as illustrated in Figures 1 and 2.</Paragraph> <Paragraph position="4"> Net name tags correspond to bracketed (i.e., &quot;tagged&quot;) elements in the output. Aside from net names, a grammar definition can also contain non-terminal rewrite names and terminals.</Paragraph>
<Paragraph position="5"> Terminals are directly matched against the input. Non-terminal rewrite names group together several rewrite patterns (see Figure 2), just as net names can, but rewrite names do not appear in the output.</Paragraph> <Paragraph position="6"> Each individual rewrite pattern defines a &quot;conjunction&quot; of particular terms or sub-patterns that can be mapped from the input into the non-terminal at the head of the pattern block, as illustrated in Figure 1, whereas the list of patterns within a block represents a &quot;disjunction&quot; (Figure 2).</Paragraph> <Paragraph position="7"> Since not all Context-Free Grammar (CFG) expressions can be translated into regular expressions, as illustrated in Figure 3, some restrictions are necessary to rule out the possibility of &quot;center-embedding&quot; (see the right-most block in Figure 3). 
The restriction is that neither a net name nor a rewrite name can appear in one of its own descendant blocks of rewrite patterns.</Paragraph> <Paragraph position="8"> Even with this restriction, it is still possible to define regular grammars that allow for self-embedding to any finite depth, by copying the net or rewrite definition and giving it a unique name for each level of self-embedding desired.</Paragraph> <Paragraph position="9"> For example, both grammars illustrated in Figure 4 can robustly parse inputs that contain some number of a's followed by a matching number of b's, up to the level of embedding defined, which in both of these cases is four deep.</Paragraph> </Section> </Paper>
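The restriction on self-reference and the copy-per-level workaround described in the section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PROFER's implementation: the dictionary-based grammar representation, the names `S1` to `S4`, and the functions `violates_restriction`, `unroll`, `match`, and `accepts` are all hypothetical, and the matcher is a strict recursive one that models neither PROFER's robust skipping of out-of-grammar input nor its breadth-first search.

```python
from typing import Dict, List, Set

# Hypothetical in-memory grammar (NOT PROFER's grammar syntax):
# each name maps to a list of rewrite patterns (a disjunction), and each
# pattern is a sequence of terms (a conjunction). A term that is a key of
# the dictionary is a non-terminal; any other term is a terminal.
Grammar = Dict[str, List[List[str]]]


def violates_restriction(grammar: Grammar) -> List[str]:
    """Return the names that appear in one of their own descendant blocks."""

    def reaches(start: str, name: str, seen: Set[str]) -> bool:
        for pattern in grammar.get(name, []):
            for term in pattern:
                if term == start:
                    return True
                if term in grammar and term not in seen:
                    seen.add(term)
                    if reaches(start, term, seen):
                        return True
        return False

    return [name for name in grammar if reaches(name, name, set())]


def unroll(depth: int) -> Grammar:
    """Copy the self-embedding rule once per level, with a unique name per copy."""
    grammar: Grammar = {}
    for level in range(1, depth + 1):
        patterns = [["a", "b"]]
        if level < depth:
            # Refer to the next level's copy instead of to the rule itself.
            patterns.append(["a", f"S{level + 1}", "b"])
        grammar[f"S{level}"] = patterns
    return grammar


def match(grammar: Grammar, name: str, tokens: List[str], pos: int = 0) -> List[int]:
    """Return every end position at which `name` can match starting at `pos`."""
    ends: List[int] = []
    for pattern in grammar[name]:
        positions = [pos]
        for term in pattern:
            next_positions: List[int] = []
            for p in positions:
                if term in grammar:
                    next_positions.extend(match(grammar, term, tokens, p))
                elif p < len(tokens) and tokens[p] == term:
                    next_positions.append(p + 1)
            positions = next_positions
        ends.extend(positions)
    return ends


def accepts(grammar: Grammar, name: str, text: str) -> bool:
    """True if `name` matches the whole of `text`, one character per token."""
    tokens = list(text)
    return len(tokens) in match(grammar, name, tokens)
```

Under these assumptions, a directly recursive rule such as `{"S": [["a", "S", "b"], ["a", "b"]]}` is flagged by `violates_restriction`, while `unroll(4)` passes the check and accepts one to four a's followed by the same number of b's. Because the unrolled grammar contains no name that refers back to itself, the recursion in `match` always terminates.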