<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0302">
  <Title>Global Thresholding and Multiple-Pass Parsing*</Title>
  <Section position="2" start_page="0" end_page="11" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper, we examine thresholding techniques for statistical parsers. While there exist theoretically efficient (O(n^3)) algorithms for parsing Probabilistic Context-Free Grammars (PCFGs) and related formalisms, practical parsing algorithms usually make use of pruning techniques, such as beam thresholding, for increased speed.</Paragraph>
    <Paragraph position="1"> We introduce two novel thresholding techniques, global thresholding and multiple-pass parsing, and one significant variation on traditional beam thresholding. We examine the value of these techniques when used separately, and when combined.</Paragraph>
    <Paragraph position="2"> In order to examine the combined techniques, we also introduce an algorithm for optimizing the settings of multiple thresholds. When all three thresholding methods are used together, they yield very significant speedups over traditional beam thresholding, while achieving the same level of performance.</Paragraph>
    <Paragraph position="3"> *This material is based in part upon work supported by the National Science Foundation under Grant No. IRI-9350192 and a National Science Foundation Graduate Student Fellowship. I would also like to thank Michael Collins, Rebecca Hwa, Lillian Lee, Wheeler Ruml, and Stuart Shieber for helpful discussions, and comments on earlier drafts, and the anonymous reviewers for their extensive comments.</Paragraph>
    <Paragraph position="4"> We apply our techniques to CKY chart parsing, one of the most commonly used parsing methods in natural language processing. In a CKY chart parser, a two-dimensional matrix of cells, the chart, is filled in. Each cell in the chart corresponds to a span of the sentence, and each cell of the chart contains the nonterminals that could generate that span. Cells covering shorter spans are filled in first, so we also refer to this kind of parser as a bottom-up chart parser.</Paragraph>
    <Paragraph position="5"> The parser fills in a cell in the chart by examining the nonterminals in lower, shorter cells, and combining these nonterminals according to the rules of the grammar. The more nonterminals there are in the shorter cells, the more combinations of nonterminals the parser must consider.</Paragraph>
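    The chart-filling procedure described above can be sketched roughly as follows. This is a minimal, non-probabilistic illustration and not the parser used in the paper: the function name cky_fill, the dictionary-based grammar encoding (lexicon and binary_rules), and the toy grammar in the usage note are all assumptions made for the example.

```python
from collections import defaultdict


def cky_fill(words, lexicon, binary_rules):
    """Fill a CKY chart bottom-up (hypothetical sketch).

    chart[(i, j)] holds the set of nonterminals that can generate words[i:j].
    lexicon maps a word to the nonterminals that can produce it.
    binary_rules maps a (left, right) pair of nonterminals to possible parents.
    """
    n = len(words)
    chart = defaultdict(set)

    # Cells covering a single word are filled from the lexicon.
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(lexicon.get(word, ()))

    # Longer spans are built by combining pairs of shorter, adjacent cells.
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            end = start + length
            for split in range(start + 1, end):
                for left in chart[(start, split)]:
                    for right in chart[(split, end)]:
                        parents = binary_rules.get((left, right), ())
                        chart[(start, end)].update(parents)
    return chart


toy_lexicon = {"the": ["Det"], "dog": ["N"], "barks": ["VP"]}
toy_rules = {("Det", "N"): ["NP"], ("NP", "VP"): ["S"]}
toy_chart = cky_fill(["the", "dog", "barks"], toy_lexicon, toy_rules)
```

    The two innermost loops run once for every pair of nonterminals in the two shorter cells, which is why fuller cells mean more combinations for the parser to consider.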
    <Paragraph position="6"> In some grammars, such as PCFGs, probabilities are associated with the grammar rules. This introduces problems, since in many PCFGs, almost any combination of nonterminals is possible, perhaps with some low probability. The large number of possibilities can greatly slow parsing. On the other hand, the probabilities also introduce new opportunities. For instance, if in a particular cell in the chart there is some nonterminal that generates the span with high probability, and another that generates that span with low probability, then we can remove the less likely nonterminal from the cell. The less likely nonterminal will probably not be part of either the correct parse or the tree returned by the parser, so removing it will do little harm. This technique is called beam thresholding.</Paragraph>
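    As a rough illustration of beam thresholding, the sketch below prunes a single chart cell. It assumes, for the purposes of the example only, that a cell is a dictionary from nonterminal to the probability of generating the span, and that the threshold is expressed as a ratio to the best probability in the cell. The name beam_threshold and the sample probabilities are hypothetical.

```python
def beam_threshold(cell, beam):
    """Prune one chart cell (hypothetical sketch).

    cell maps each nonterminal to the probability that it generates the span.
    beam is a ratio between 0 and 1: a nonterminal is kept only if its
    probability is at least beam times the best probability in the cell.
    A small beam is a loose threshold; a beam close to 1 is a tight one.
    """
    if not cell:
        return {}
    best = max(cell.values())
    return {nt: p for nt, p in cell.items() if p >= beam * best}


sample_cell = {"NP": 0.02, "S": 0.015, "VP": 0.0001}
loose = beam_threshold(sample_cell, 0.01)  # drops only the very unlikely VP
tight = beam_threshold(sample_cell, 0.9)   # keeps only the best nonterminal
```

    With the loose setting only the far less probable nonterminal is removed, whereas the tight setting also discards a nonterminal that is nearly as probable as the best one.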
    <Paragraph position="7"> If we use a loose beam threshold, removing only those nonterminals that are much less probable than the best nonterminal in a cell, our parser will run only slightly faster than with no thresholding, while performance measures such as precision and recall will remain virtually unchanged.</Paragraph>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
Thresholding
</SectionTitle>
      <Paragraph position="0"> On the other hand, if we use a tight threshold, removing nonterminals that are almost as probable as the best nonterminal in a cell, then we can get a considerable speedup, but at a considerable cost in accuracy. Figure 1 shows the tradeoff between accuracy and time.</Paragraph>
      <Paragraph position="1"> In this paper, we will consider three different kinds of thresholding. The first of these is a variation on traditional beam search. In traditional beam search, only the probability of a nonterminal generating the terminals of the cell's span is used. We have found that a minor variation, introduced in Section 2, in which we also consider the prior probability that each nonterminal is part of the correct parse, can lead to nearly an order of magnitude improvement.</Paragraph>
      <Paragraph position="2"> The problem with beam search is that it only compares nonterminals to other nonterminals in the same cell. Consider the case in which a particular cell contains only bad nonterminals, all of roughly equal probability. We can't threshold out these nodes, because even though they are all bad, none is much worse than the best. Thus, what we want is a thresholding technique that uses some global information for thresholding, rather than just using information in a single cell. The second kind of thresholding we consider is a novel technique, global thresholding, described in Section 3. Global thresholding makes use of the observation that for a nonterminal to be part of the correct parse, it must be part of a sequence of reasonably probable nonterminals covering the whole sentence.</Paragraph>
      <Paragraph position="3"> The last technique we consider, multiple-pass parsing, is introduced in Section 4. The basic idea is that we can use information from parsing with one grammar to speed parsing with another. We run two passes, the first of which is fast and simple, eliminating from consideration many unlikely potential constituents. The second pass is more complicated and slower, but also more accurate. Because we have already eliminated many nodes in our first pass, the second pass can run much faster, and, despite the fact that we have to run two passes, the added savings in the second pass can easily outweigh the cost of the first one.</Paragraph>
      <Paragraph position="4"> Experimental comparisons of these techniques show that they lead to considerable speedups over traditional thresholding, when used separately. We also wished to combine the thresholding techniques; this is relatively difficult, since searching for the optimal thresholding parameters in a multi-dimensional space is potentially very time-consuming. We designed a variant on a gradient descent search algorithm to find the optimal parameters. Using all three thresholding methods together, and the parameter search algorithm, we achieved our best results, running an estimated 30 times faster than traditional beam search, at the same performance level.</Paragraph>
    </Section>
  </Section>
</Paper>