<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1040">
  <Title>RELATING COMPLEXITY TO PRACTICAL PERFORMANCE IN PARSING WITH WIDE-COVERAGE UNIFICATION GRAMMARS</Title>
  <Section position="3" start_page="0" end_page="287" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> General-purpose natural language (NL) analysis systems have recently started to use declarative unification-based sentence grammar formalisms; systems of this type include SRI's CLARE system (Alshawi et al., 1992) and the Alvey NL Tools (ANLT; Briscoe et al., 1987a). Using a declarative formalism helps ease the task of developing and maintaining the grammar (Kaplan, 1987). In addition to syntactic processing, the systems incorporate lexical, morphological, and semantic processing, and have been applied successfully to the analysis of naturally-occurring texts (e.g. Alshawi et al., 1992; Briscoe &amp; Carroll, 1993).</Paragraph>
    <Paragraph position="1"> Evaluations of the grammars in these particular systems have shown them to have wide coverage (Alshawi et al., 1992; Taylor, Grover &amp; Briscoe, 1989).[2] However, despite the practical importance of parser throughput with such realistic grammars (for example, when processing large amounts of text or in interactive applications), there is little published research comparing the performance of different parsing algorithms with wide-coverage unification-based grammars. Previous comparisons have either focussed on context-free (CF) or augmented CF parsing (Tomita, 1987; Billot &amp; Lang, 1989), or have used relatively small, limited-coverage unification grammars and lexicons (Shann, 1989; Bouma &amp; van Noord, 1993; Maxwell &amp; Kaplan, 1993). It is not clear that these results scale up to accurately reflect the behaviour of parsers using realistic, complex unification-based grammars. In particular, with grammars that admit less ambiguity, parse time will tend to increase more slowly with input length; and with smaller grammars, rule application can be tightly constrained using relatively simple predictive techniques. Moreover, since none of these studies relates observed performance to that of other comparable parsing systems, implementational oversights may go unnoticed and so confound any general conclusions drawn.</Paragraph>
    <Paragraph position="2"> ... project 4/1/1261 'Extensions to the Alvey Natural Language Tools' and by EC ESPRIT BRA-7315 'ACQUILEX-II'. I am grateful to Ted Briscoe for comments on an earlier version of this paper, to David Weir for valuable discussions, and to Hiyan Alshawi for assistance with the CLARE system.</Paragraph>
    <Paragraph position="3"> [2] For example, Taylor et al. demonstrate that the ANLT grammar is in principle able to analyse 96.8% of a corpus of 10,000 noun phrases taken from a variety of corpora.</Paragraph>
    <Paragraph position="4"> Other research directed towards improving the throughput of unification-based parsing systems has been concerned with the unification operation itself, which can consume up to 90% of parse time (e.g. Tomabechi, 1991) in systems using lexicalist grammar formalisms (e.g. HPSG; Pollard &amp; Sag, 1987). However, parsing algorithms become more important for grammars with more substantial phrase-structure components, such as CLARE (which, although it employs some HPSG-like analyses, still contains several tens of rules) and the ANLT (which uses a formalism derived from GPSG; Gazdar et al., 1985), since the more specific rule set can be used to control which unifications are performed.</Paragraph>
    <Paragraph position="5"> In NL analysis, the syntactic information associated with lexical items makes top-down parsing less attractive than bottom-up parsing (e.g. CKY; Kasami, 1965; Younger, 1967), although the latter is often augmented with top-down prediction to improve performance (e.g. Earley, 1970; Lang, 1974; Pratt, 1975). Section 2 describes three unification-based parsers that are related to polynomial-complexity bottom-up CF parsing algorithms. Although incorporating unification makes their complexity exponential in grammar size and input length (section 3), this appears to have little impact on practical performance (section 4). Sections 5 and 6 discuss these findings and present conclusions.</Paragraph>
  </Section>
</Paper>