<?xml version="1.0" standalone="yes"?>
<Paper uid="P92-1006">
  <Title>Efficiency, Robustness and Accuracy in Picky Chart Parsing*</Title>
  <Section position="6" start_page="43" end_page="45" type="evalu">
    <SectionTitle>
5. Results of Experiments
</SectionTitle>
    <Paragraph position="0"> The Picky parser was tested on 3 sets of 100 sentences which were held out from the rest of the corpus during training. The training corpus consisted of 982 sentences which were parsed using the same grammar that Picky used. The training and test corpora are samples from the MIT's Voyager direction-finding system. 7 Using Picky's grammar, these test sentences generate, on average, over 100 parses per sentence, with some sentences generated over 1,000 parses.</Paragraph>
    <Paragraph position="1"> The purpose of these experiments is to explore the impact of varying of Picky's parsing algorithm on parsing accuracy, efficiency, and robustness. For these experiments, we varied three attributes of the parser: the phases used by parser, the maximum number of edges the parser can produce before failure, and the minimum probability parse acceptable.</Paragraph>
    <Paragraph position="2"> In the following analysis, the accuracy rate represents the percentage of the test sentences for which the highest probability parse generated by the parser is identical to the &amp;quot;correct&amp;quot; pa.rse tree indicated in the parsed test corpus, s Efficiency is measured by two ratios, the prediction ratio and the completion ratio. The prediction ratio is defined as the ratio of number of predictions made by the parser  parser generates a plausible parse for a sentences which has multipie plausible int.erpretations, the parse is considered cc~rrcct. Also. if the parser generates a correct; pal'se~ I)ll~ the parsecl test corpus contains an incorrect parse (i.e. if there is an error in the answer key), the parse is considered col-rect.</Paragraph>
    <Paragraph position="3">  during the parse of a sentence to the number of constituents necessary for a correct parse. The completion ratio is the ratio of the number of completed edges to the number of predictions during the parse of sentence.</Paragraph>
    <Paragraph position="4"> Robustness cannot be measured directly by these experiments, since there are few ungrammatical sentences and there is no implemented method for interpreting the well-formed substring table when a parse fails. However, for each configuration of the parser, we will explore the expected behavior of the parser in the face of ungrammatical input.</Paragraph>
    <Paragraph position="5"> Since Picky has the power of a pure bottom-up parser, it would be useful to compare its performance and efficiency to that of a probabilistic bottom-up parser. However, an implementation of a probabilistic bottom-up parser using the same grammar produces on average over 1000 constituents for each sentence, generating over 15,000 edges without generating a parse at all! This supports our claim that exhaustive CKY-like parsing algorithms are not feasible when probabilistic models are applied to them.</Paragraph>
    <Section position="1" start_page="44" end_page="45" type="sub_section">
      <SectionTitle>
5.1. Control Configuration
</SectionTitle>
      <Paragraph position="0"> The control for our experiments is the configuration of Picky with all three phases and with a maximum edge count of 15,000. Using this configuration, :Picky parsed the 3 test sets with an 89.3% accuracy rate. This is a slight improvement over Pearl's 87.5% accuracy rate reported in \[12\].</Paragraph>
      <Paragraph position="1"> Recall that we will measure the efficiency of a parser configuration by its prediction ratio and completion ratio on the test sentences. A perfect prediction ratio is 1:1, i.e. every edge predicted is used in the eventual parse.</Paragraph>
      <Paragraph position="2"> However, since there is ambiguity in the input sentences, a 1:1 prediction ratio is not likely to be achieved. Picky's prediction ratio is approximately than 4.3:1, and its ratio of predicted edges to completed edges is nearly 1.3:1.</Paragraph>
      <Paragraph position="3"> Thus, although the prediction ratio is not perfect, on average for every edge that is predicted more than one completed constituent results.</Paragraph>
      <Paragraph position="4"> This is the most robust configuration of Picky which will be attempted in our experiments, since it includes bidirectional parsing (phase II) and allows so many edges to be created. Although there was not a sufficient number or variety of ungrammatical sentences to explore the robustness of this configuration further, one interesting example did occur in the test sets. The sentence How do I how do I get to MIT? is an ungranm~atical but interpretable sentence which begins with a restart. The Pearl parser would have generated no analysis tbr the latter part of the sentence and the corresponding sections of the chart would be empty.</Paragraph>
      <Paragraph position="5"> Using bidirectional probabilistic prediction, Picky produced a correct partial interpretation of the last 6 words of the sentence, &amp;quot;how do I get to MIT?&amp;quot; One sentence does not make for conclusive evidence, but it represents the type of performance which is expected from the Picky algorithm.</Paragraph>
      <Paragraph position="6"> 5.2. Phases vs. Efficiency Each of Picky's three phases has a distinct role in the parsing process. Phase I tries to parse the sentences which are most standard, i.e. most consistent with the training material. Phase II uses bidirectional parsing to try to complete the parses for sentences which are nearly completely parsed by Phase I. Phase III uses a simplistic heuristic to glue together constituents generated by phases I and II. Phase III is obviously inefficient, since it is by definition processing atypical sentences. Phase II is also inefficient because of the bidirectional predictions added in this phase. But phase II also amplifies the inefficiency of phase III, since the bidirectional predictions added in phase II are processed further in phase III.</Paragraph>
      <Paragraph position="7">  statistics for Picky configured with different subsets of Picky's three phases.</Paragraph>
      <Paragraph position="8"> In Table 1, we see the efficiency and accuracy of Picky using different, subsets of the parser's phases. Using the control parser (phases I, II, and II), the parser has a 4.3:1 prediction ratio and a 1.3:1 completion ratio.</Paragraph>
      <Paragraph position="9"> By omitting phase III, we eliminate nearly half of the predictions and half the completed edges, resulting in a 2.15:1 prediction ratio. But this efficiency comes at the cost of coverage, which will be discussed in the next section.</Paragraph>
      <Paragraph position="10"> By omitting phase II, we observe a slight reduction in predictions, but an increase in completed edges. This behavior results from the elimination of the bidirectional predictions, which tend to genera.re duplicate edges.</Paragraph>
      <Paragraph position="11"> Note that this configuration, while slightly more efficient,  is less robust in processing ungrammatical input.</Paragraph>
      <Paragraph position="12"> 5.3. Phases vs. Accuracy For some natural language applications, such as a natural language interface to a nuclear reactor or to a computer operating system, it is imperative for the user to have confidence in the parses generated by the parser.</Paragraph>
      <Paragraph position="13"> Picky has a relatively high parsing accuracy rate of nearly 90%; however, 10% error is far too high for faultintolerant applications.</Paragraph>
      <Paragraph position="14">  phase which the parser reached in processing the test sentences.</Paragraph>
      <Paragraph position="15"> Consider the data in Table 2. While the parser has an overall accuracy rate of 89.3%, it is.far more accurate on sentences which are parsed by phases I and II, at 97%.</Paragraph>
      <Paragraph position="16"> Note that 238 of the 300 sentences, or 79%, of the test sentences are parsed in these two phases. Thus, by eliminating phase III, the percent error can be reduced to 3%, while maintaining 77% coverage. An alternative to eliminating phase III is to replace the length-based heuristic of this phase with a secondary probabilistic model of the difficult sentences in this domain. This secondary model might be trained on a set of sentences which cannot be parsed in phases I and II.</Paragraph>
      <Paragraph position="17"> 5.4. Edge Count vs. Accuracy In the original implementation of the Picky algorithm, we intended to allow the parser to generate edges until it found a complete interpretation or exhausted all possible predictions. However, for some ungrammatical sentences, the parser generates tens of thousands of edges without terminating. To limit the processing time for the experiments, we implemented a maximum edge count which was sufficiently large so that all grammatical sentences in the test corpus would be parsed. All of the grammatical test sentences generated a parse before producing 15,000 edges. However, some sentences produced thousands of edges only to generate an incorrect parse. In fact, it seemed likely tha,t there might be a correlation between very high edge counts and incorrect parses. We tested this hypothesis by varying the maximum edge count.</Paragraph>
      <Paragraph position="18"> In Table 3, we see an increase in efficiency and a decrease  statistics for 7~icky configured with different maximum edge count.</Paragraph>
      <Paragraph position="19"> in accuracy as we reduce the maximum number of edges the parser will generate before declaring a sentence ungrammatical. By reducing the maximum edge count by a factor of 50, from 15,000 to 300, we can nearly cut in half the number of predicts and edges generated by the parser. And while this causes the accuracy rate to fall from 89.3% to 79.3%, it also results in a significant decrease in error rate, down to 2.7%. By decreasing the maximum edge count down to 150, the error rate can be reduced to 1.7%.</Paragraph>
      <Paragraph position="20"> 5.5. Probability vs. Accuracy Since a probability represents the likelihood of an interpretation, it is not unreasonable to expect the probability of a parse tree to be correlated with the accuracy of the parse. However, based on the probabilities associated with the &amp;quot;correct&amp;quot; parse trees of the test sentences, there appears to be no such correlation. Many of the test sentences had correct parses with very low probabilities (10-1deg), while others had much higher probabilities (10-2). And the probabilities associated with incorrect parses were not distinguishable from the probabilities of correct parses.</Paragraph>
      <Paragraph position="21"> The failure to find a correlation between probability a.nd accuracy in this experiment does not prove conclusively that no such correlation exists. Admittedly, the training corpus used for all of these experiments is far smaller than one would hope to estimate the CFG with CSP model parameters. Thus, while the model is trained well enough to steer the parsing search, it may not be sufficiently trained to provide meaningful probability values.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>