<?xml version="1.0" standalone="yes"?>
<Paper uid="P92-1024">
  <Title>Development and Evaluation of a Broad-Coverage Probabilistic Grammar of English-Language Computer Manuals</Title>
  <Section position="9" start_page="190" end_page="191" type="evalu">
    <SectionTitle>
5. Experimental Results
</SectionTitle>
    <Paragraph position="0"> We report results below for two test sets. One (Test Set A) is drawn from the 600,000-word subsection of our corpus of computer manuals text which we referred to above. The other (Test Set B) is drawn from our full 40-million-word computer manuals corpus. Due to a more or less constant error rate of 2.5% in the treebank parses themselves, there is a corresponding built-in margin of error in our scores. For each of the two test sets, results are presented first for the linguistic task: making sure that a correct parse is present in the set of parses the grammar proposes for each sentence of the test set. Second, results are presented for the statistical task, which is to ensure that the parse which is selected as most likely, for each sentence of the test set, is a correct parse.</Paragraph>
    <Paragraph position="1">  Recall (see above) that the geometric mean of the number of parses per word, or equivalently the total number of parses for the entire test set, must be held constant over the course of the grammar's development, to eliminate trivial solutions to the coverage task. In the roughly year-long period since we began work on the computer manuals task, this average has been held steady at roughly 1.35 parses per word. What this works out to is a range of from 8 parses for a 7-word sentence, through 34 parses for a 12-word sentence, to 144 parses for a 17-word sentence. In addition, during this development period, performance on the task of picking the most likely parse went from 58% to 73% on Test Set A. Periodic results on Test Set A for the task of providing at least one correct parse for each sentence are displayed in Table 8.</Paragraph>
    <Paragraph position="2"> We present additional experimental results to show that our grammar is completely separable from its accompanying &amp;quot;semantics&amp;quot;. Note that semantic categories are not &amp;quot;written into&amp;quot; the grammar; i.e., with a few minor exceptions, no rules refer to them. They simply percolate up from the lexical items to the non-terminal level, and contribute information to the mnemonic productions which constitute the parameters of the statistical training model.</Paragraph>
    <Paragraph position="3"> An example was given in Section 3 of a case in which the version of our grammar that includes semantics out-performed the version of the same grammar without semantics. The effect of the semantic information in that particular case was apprently to bias the trained grammar towards choosing a correct parse as most likely. However, we did not quantify this effect when we presented the example. This is the purpose of the experimental results shown in Table 9. Test B was used to test our current grammar, first with and then without semantic categories in the mnemonics.</Paragraph>
    <Paragraph position="4"> It follows from the fact that the semantics are not written into the grammar that the coverage figure is the same with and without semantics. Perhaps surprising, however, is the slight degree of improvement due to the semantics on the task of picking the most likely parse: only 2 percentage points. The more detailed parametriza- null tion with semantic categories, which has about 13,000 mnemonics achieved only a modest improvement in parsing accuracy over the parametrization without semantics, which has about 4,600 mnemonics.</Paragraph>
  </Section>
</Paper>