<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-1033">
  <Title>TTP: A FAST AND ROBUST PARSER FOR NATURAL LANGUAGE</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Recently, there has been a growing demand for fast and reliable natural language processing tools, capable of performing reasonably accurate syntactic analysis of large volumes of text within an acceptable time. A full sentential parser that produces a complete analysis of its input may be considered reasonably fast if the average parsing time per sentence falls anywhere between 2 and 10 seconds. A large volume of text, perhaps a gigabyte or more, would contain as many as 7 million sentences. At a speed of, say, 6 sec/sentence, this much text would require well over a year to parse. While 7 million sentences is a lot of text, this much may easily be contained in a fair-sized text database. Therefore, the parsing speed would have to be increased by at least a factor of 10 to make such a task manageable.</Paragraph>
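A back-of-the-envelope check of these estimates, as a minimal Python sketch; the corpus size and per-sentence times are the figures quoted above, and the rest is plain arithmetic:

```python
# Sanity check of the parsing-time estimate: 7 million sentences at
# 6 sec/sentence, and the same corpus after a 10x speedup.

SENTENCES = 7_000_000

def total_days(sec_per_sentence: float) -> float:
    """Total wall-clock time, in days, to parse the whole corpus."""
    return SENTENCES * sec_per_sentence / 86_400  # 86,400 seconds per day

print(f"at 6.0 sec/sentence: {total_days(6.0):6.1f} days (~1.3 years)")
print(f"at 0.6 sec/sentence: {total_days(0.6):6.1f} days (a 10x speedup)")
```

At 6 sec/sentence the corpus takes about 486 days of continuous parsing, which matches the "well over a year" figure; a factor-of-10 speedup brings it down to roughly seven weeks.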
    <Paragraph position="1"> In this paper we describe a fast and robust natural language parser that can analyze written text and generate regularized parse structures at a speed of less than 1 second per sentence. In experiments conducted on a variety of natural language texts, including technical prose, news messages, and newspaper articles, the average parsing time varied between 0.4 sec/sentence and 0.7 sec/sentence, or between 1600 and 2600 words per minute, as we tried to find an acceptable compromise between the parser's speed and precision.1</Paragraph>
    <Paragraph position="2"> It has long been assumed that in order to gain speed, one may have to trade away some of the parser's accuracy. For example, we may have to settle for partial parsing that would recognize only selected grammatical structures (e.g. noun phrases; Ruge et al., 1991), or would avoid making difficult decisions (e.g. pp-attachment; Hindle, 1983). Much of the overhead and inefficiency comes from the fact that the lexical and structural ambiguity of natural language input can only be dealt with using the limited context information available to the parser. Partial parsing techniques have been used with considerable success in processing large volumes of text: for example, AT&amp;T's Fidditch (Hindle and Rooth, 1991) parsed 13 million words of Associated Press news messages, while MIT's parser (de Marcken, 1990) was used to process the 1 million word Lancaster/Oslo/Bergen (LOB) corpus. In both cases, the parsers were designed to do partial processing only; that is, they would never attempt a complete analysis of certain constructions, such as the attachment of pp-adjuncts, subordinate clauses, or coordinations. This kind of partial analysis may be sufficient in some applications because of the relatively high precision with which correct syntactic dependencies are identified.2 However, the rate at which these dependencies are identified (that is, the recall level) is not sufficiently high, due to the inherently partial character of the parsing process. The low recall means that many of the important dependencies are lost in parsing, and therefore partial parsing may not be suitable in applications such as information extraction or document retrieval.</Paragraph>
    <Paragraph position="3"> 1 These results were obtained on a 21 MIPS SparcStation ELC. The experiments were performed within an information retrieval system, so that the final recall and precision statistics were used to measure the effectiveness of the parser.</Paragraph>
    <Paragraph position="4"> 2 Hindle and Rooth (1991) and Church and Hanks (1990) used partial parses generated by Fidditch to study word occurrence patterns in syntactic contexts.</Paragraph>
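As a rough illustration of what partial parsing in this sense produces, the toy sketch below brackets only simple noun phrases over pre-tagged input and leaves everything else unanalyzed; it is a minimal example, not the actual algorithm of Fidditch or the MIT parser:

```python
# Partial parsing as selective bracketing: only [DET? ADJ* NOUN+]
# sequences are recognized; all other material is skipped. The tag
# set and grammar pattern here are illustrative assumptions.

from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

def np_chunk(tagged: List[Token]) -> List[str]:
    """Bracket maximal [DET? ADJ* NOUN+] sequences; skip everything else."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DET":
            j += 1
        while j < len(tagged) and tagged[j][1] == "ADJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NOUN":
            k += 1
        if k > j:  # at least one noun: emit an NP chunk
            chunks.append("[NP " + " ".join(w for w, _ in tagged[i:k]) + "]")
            i = k
        else:      # no NP starts here; leave the word unanalyzed
            i += 1
    return chunks

print(np_chunk([("the", "DET"), ("fast", "ADJ"), ("parser", "NOUN"),
                ("analyzes", "VERB"), ("large", "ADJ"), ("text", "NOUN"),
                ("databases", "NOUN")]))
# -> ['[NP the fast parser]', '[NP large text databases]']
```

Note how the verb and its attachments are simply dropped: the chunks it does emit tend to be correct (high precision), but any dependency outside the recognized patterns is lost (low recall), which is exactly the trade-off discussed above.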
    <Paragraph position="5"> The alternative is to create a parser that would attempt to produce a complete parse, and would resort to partial or approximate analysis only under exceptional conditions, such as extra-grammatical input or severe time pressure. Encountering a construction that it couldn't handle, the parser would first try to produce an approximate analysis of the difficult fragment, and then resume normal processing for the rest of the input. The outcome is a kind of "fitted" parse, reflecting a compromise between the actual input and grammar-encoded preferences (imposed, mainly, in rule ordering).</Paragraph>
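This recovery strategy can be pictured with a minimal sketch; the fragment segmentation, the toy "grammar", and its failure test below are hypothetical stand-ins rather than TTP's actual machinery:

```python
# A "fitted" parse: attempt a complete analysis of each fragment, and
# fall back to a flat approximate analysis only when the grammar fails,
# then resume normal processing for the rest of the input.

class ParseFailure(Exception):
    """Raised when a fragment cannot be fully analyzed."""

def parse_full(fragment: list) -> str:
    # Toy grammar: treat any fragment containing an unknown
    # word (marked '??') as extra-grammatical.
    if any(w == "??" for w in fragment):
        raise ParseFailure(fragment)
    return "(S " + " ".join(fragment) + ")"

def parse_approximate(fragment: list) -> str:
    # Approximate analysis: a flat bracketing of the difficult fragment.
    return "(FRAG " + " ".join(fragment) + ")"

def fitted_parse(fragments: list) -> list:
    """Full analysis where possible, approximate analysis elsewhere."""
    out = []
    for fragment in fragments:
        try:
            out.append(parse_full(fragment))       # normal processing
        except ParseFailure:
            out.append(parse_approximate(fragment))  # recover and resume
    return out

print(fitted_parse([["the", "parser", "runs"],
                    ["??", "garbled", "??"],
                    ["and", "then", "resumes"]]))
# -> ['(S the parser runs)', '(FRAG ?? garbled ??)', '(S and then resumes)']
```

The key design point is that failure is local: an unparseable fragment costs only an approximate bracket for that fragment, and the parser returns to full analysis immediately afterwards.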
  </Section>
</Paper>