<?xml version="1.0" standalone="yes"?> <Paper uid="P93-1006"> <Title>USING BRACKETED PARSES TO EVALUATE A GRAMMAR CHECKING APPLICATION</Title> <Section position="11" start_page="41" end_page="267" type="evalu"> <SectionTitle> RESULTS </SectionTitle> <Paragraph position="0"> Our 297-sentence corpus had the following characteristics. Sentence length ranged from three to 32 words; the median was 12 words and the mean was 13.8 words. (Since most of the sentences in our corpus were intended to be in Simplified English, it is not surprising that they tended to be under the 20-word limit imposed by the standard.) Table 2 shows the aggregated outcomes for the three reports.</Paragraph> <Paragraph position="1"> The table shows the coverage of the system and the impact of the spurious parses. Coverage is reflected in the Unmodified Bracketed column, where 248 parses out of 297 sentences indicate a coverage of 84 percent for the underlying system in this domain. The table also reveals that there were 24 spurious parses in the unbracketed corpus, that is, parses corresponding to no valid parse tree in our grammar. The Modified Bracketed column shows the effect on the report generator of forcing the system to have the same coverage as the unbracketed run.</Paragraph> <Paragraph position="2"> Table 3 shows, by type, the errors detected in instances where errors were reported. The Spurious Error column indicates the number of errors from the unbracketed sentences which we judged to be bad. The Missed Errors column indicates errors which were missed in the unbracketed report but which showed up in the modified bracketed report.</Paragraph> <Paragraph position="3"> For this data, the estimate of Precision (rate of correct error critiques for unbracketed data) is (302-64)/302, or 79 percent. We estimate that this Precision rate is accurate to within 5 percent with 95 percent confidence.
Our estimate of Recall (rate of correct critiques from the set of possible critiques) is (267-29)/267, or 89 percent. We estimate that this Recall rate is accurate to within 4 percent with 95 percent confidence.</Paragraph> <Paragraph position="4"> It is instructive to look at a report that contains an incorrectly identified error. The following report resulted from our unbracketed test run: If strut requires six fluid ounces or more to fill, find leakage source and repair.</Paragraph> <Paragraph position="5"> Two commands - possible error: find leakage source and repair The bracketed run produced a no-parse for this sentence because of an inadequacy in our grammar that blocked fill from parsing as a verb. Since it parsed as a noun in the unbracketed run, the system complained that fill was allowed as a verb. In our statistics, we counted the fill Noun error as an incorrect POS error and the requires Verb error as a correct one. This critique contains two POS errors, one TWO-COMMAND error, and two MISSING ARTICLE errors. Four of the five error critiques are accurate.</Paragraph> </Section></Paper>
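<!--
The Precision, Recall and coverage figures quoted in this section can be re-derived from the counts given in the text. The sketch below is only a check of that arithmetic. The section does not say how the 95 percent confidence bounds were obtained, so the normal-approximation margin for a proportion used here is an assumption, and the function name rate_with_margin is hypothetical. Reading 302 as the number of reported critiques (64 of them spurious) and 267 as the number of possible critiques (29 of them missed) is likewise an interpretation of the two formulas.

```python
import math
from typing import Tuple


def rate_with_margin(total: int, bad: int, z: float = 1.96) -> Tuple[float, float]:
    """Return (rate, margin) for rate = (total - bad) / total.

    The margin is z * sqrt(rate * (1 - rate) / total), the usual
    normal-approximation half-width for a proportion. This choice is an
    assumption; the section does not name its method.
    """
    rate = (total - bad) / total
    margin = z * math.sqrt(rate * (1.0 - rate) / total)
    return rate, margin


# Counts taken from the formulas quoted in the section.
precision, precision_margin = rate_with_margin(total=302, bad=64)
recall, recall_margin = rate_with_margin(total=267, bad=29)

# Coverage: 248 bracketed parses out of the 297-sentence corpus.
coverage = 248 / 297

print(f"precision = {precision:.3f} +/- {precision_margin:.3f}")
print(f"recall    = {recall:.3f} +/- {recall_margin:.3f}")
print(f"coverage  = {coverage:.3f}")
```

Under this assumption the computed margins fall below the 5 percent and 4 percent bounds stated for Precision and Recall, and the rounded rates match the 79, 89 and 84 percent figures in the text.
-->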