<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0601">
  <Title>Reading Comprehension Programs in a Statistical-Language-Processing Class*</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
Methods                                        Results (%)
1  Best of Deep Read                           36
2  BOW Stem Coref Class                        37
3  BOV Stem NE Coref Tfidf Subj Why MainV      38
4  BOV Stem NE Defaults Coref                  38
5  BOV Stem NE Defaults Qspecific              41
</SectionTitle>
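The method names in the table above (BOW/BOV, Stem, NE, Coref, Tfidf, and so on) denote variants of the Deep Read-style word-overlap approach to sentence selection. As a minimal, hedged sketch of the simplest such method, a bag-of-words overlap scorer might look like the following; the tokenizer and function names are our own assumptions, not the students' code.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def answer_by_word_overlap(question, story_sentences):
    """Pick the story sentence sharing the most word tokens with the question.

    Ties are broken in favor of the earlier sentence.  Stemming, stopword
    removal, named entities, coreference, tf-idf weighting, etc. (the other
    components named in the table) would refine this basic score.
    """
    q_bag = Counter(tokenize(question))
    def overlap(i):
        return sum((q_bag & Counter(tokenize(story_sentences[i]))).values())
    return max(range(len(story_sentences)), key=overlap)
```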
    <Paragraph position="0"> tandem. We implemented several of those metrics ourselves, but to keep things simple we only report results on one of them how often (in percent) the program answers a question by choosing a correct sentence (as judged in the answer mark-ups). Following \[3\] we refer to this as the &amp;quot;humsent&amp;quot; (human annotated sentence) metric. Note that if more than one sentence is marked as acceptable, a program response of any of those sentences is considered correct. If no sentence is marked, the program cannot get the answer correct, so there is an upper bound of approximately 90% accuracy for this metric.</Paragraph>
    <Paragraph position="1"> The results were both en- and discouraging. On the encouraging side, three of the four groups were able to improve, at least somewhat, on the previous best results.</Paragraph>
    <Paragraph position="2"> On the other hand, the extra annotation we provided (machine-generated parses of all the sentences \[1\] and machine-generated pronoun coreference information \[2\]) proved of limited utility.</Paragraph>
  </Section>
class="xml-element"></Paper>