<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0604"> <Title>Answer Extraction: Towards better Evaluations of NLP Systems</Title> <Section position="3" start_page="21" end_page="23" type="intro"> <SectionTitle> 2 Why Reading Comprehension Tests via QA are Too Difficult </SectionTitle> <Paragraph position="0"> Reading comprehension tests are designed to measure how well human readers understand what they read. Each story comes with a set of questions about information that is stated or implied in the text. Readers demonstrate their understanding of the story by answering these questions. Reading comprehension tests thus presuppose a human cognitive process. This process involves expanding the mental model of a text by using its implications and presuppositions, retrieving stored information, performing inferences to make implicit information explicit, and generating the surface strings that express this information. Many different forms of knowledge take part in this process: linguistic, procedural and world knowledge. All these forms coalesce in the memory of the reader, and it is very difficult to clearly distinguish them and reconstruct them in a QA system. At first sight, the story published in (WRC, 2000) is easy to understand because the sentences are short and cohesive. It turns out, however, that a classic QA system would need vast amounts of knowledge and inference rules in order to understand the text and to give sensible answers.</Paragraph> <Paragraph position="1"> Let us investigate what kind of information a full-fledged QA system needs in order to answer the questions that come with the reading comprehension test (Figure 1), and discuss how difficult it is to provide this information.</Paragraph> <Paragraph position="2"> To answer the first question, (1) &quot;Who collects maple sap?&quot;,
the system needs to know that the mass noun &quot;sap&quot; in the text sentence &quot;Farmers collect the sap.&quot;</Paragraph> <Paragraph position="3"> refers to the maple sap mentioned in the question. The compound noun &quot;maple sap&quot; is a semantically narrower term than the noun &quot;sap&quot; and encodes an implicit relation between its first element, &quot;maple&quot;, and its head noun, &quot;sap&quot;; this relation names the origin of the material. Since the text gives no explicit information about how the two are related, an ideal QA system would have to assume such a relation by a form of abductive reasoning.</Paragraph> <Paragraph position="4"> How Maple Syrup is Made
Maple syrup comes from sugar maple trees. At one time, maple syrup was used to make sugar.</Paragraph> <Paragraph position="5"> This is why the tree is called a &quot;sugar&quot; maple tree.</Paragraph> <Paragraph position="6"> Sugar maple trees make sap. Farmers collect the sap. The best time to collect sap is in February and March. The nights must be cold and the days warm.</Paragraph> <Paragraph position="7"> The farmer drills a few small holes in each tree. He puts a spout in each hole. Then he hangs a bucket on the end of each spout. The bucket has a cover to keep rain and snow out. The sap drips into the bucket. About 10 gallons of sap come from each hole.</Paragraph> <Paragraph position="8"> 1. Who collects maple sap? (Farmers)
2. What does the farmer hang from a spout? (A bucket)
3. When is sap collected? (February and March)
4. Where does the maple sap come from? (Sugar maple trees)
5. Why is the bucket covered? (To keep rain and snow out)
To successfully answer the second question, (2) &quot;What does the farmer hang from a spout?&quot;, the system would need at least three different kinds of knowledge. First, it would need discourse knowledge to resolve the intersentential coreference between the anaphor &quot;he&quot; and the antecedent &quot;the farmer&quot; in the following text sequence: &quot;The farmer drills a few small holes in each tree.&quot;
[...] &quot;Then he hangs a bucket ...&quot;</Paragraph> <Paragraph position="9"> Although locating antecedents has proved to be one of the hard problems of natural language processing, resolving the anaphoric reference is easy in this case because the antecedent is the most recent preceding noun phrase that agrees with the pronoun in gender, number and person. Second, the system would require linguistic knowledge to deal with the synonymy relation between &quot;hang on&quot; and &quot;hang from&quot;, and with the attachment ambiguity of the prepositional phrase used in the text sentence and in the query.</Paragraph> <Paragraph position="10"> Third, the system needs an inference rule that makes clear that the noun phrase &quot;a spout&quot; in the query is entailed by the more complex noun phrase &quot;the end of each spout&quot; in the text sentence. Additionally, to process this relation, the system would require an inference rule of the form:</Paragraph> <Paragraph position="12"> The third question, (3) &quot;When is sap collected?&quot;, asks for the time point when sap is collected, but the text gives only a rule-like recommendation: &quot;The best time to collect sap is in February and March.&quot;</Paragraph> <Paragraph position="13"> with an additional constraint, &quot;The nights must be cold and the days warm.&quot;, and it does not say that the sap is in fact collected in February and March. The bridging inference that the system would need to model here is founded not on linguistic knowledge but on world knowledge, and solving this problem is very hard. It could be argued that default rules might solve such problems, but it is not clear whether formal methods can handle the kind of default reasoning that common-sense reasoning requires.</Paragraph> <Paragraph position="14"> To answer the fourth question, (4) &quot;Where does the maple sap come from?&quot;, the system needs to know that maple sap comes from sugar maple trees. This information is not explicitly available in the text. 
Instead of saying where maple sap comes from, the text says where maple syrup comes from: &quot;Maple syrup comes from sugar maple trees.&quot;</Paragraph> <Paragraph position="15"> There is a metonymy relation between these two compound nouns. The compound noun &quot;maple syrup&quot; (i.e., the product) can be substituted by &quot;maple sap&quot; (i.e., the material) only if the system is able to deal with metonymy. Together with the information in the sentence &quot;Sugar maple trees make sap.&quot;</Paragraph> <Paragraph position="16"> and an additional lexical inference rule in the form of a meaning postulate, IF X makes Y THEN Y comes from X,</Paragraph> <Paragraph position="17"> the system could, in theory, first deduce that sap comes from sugar maple trees and then assume by abductive reasoning that the sap found is maple sap. Meaning postulates are true by virtue of the meanings they link; observation cannot prove them false.</Paragraph> <Paragraph position="18"> To answer the fifth question, (5) &quot;Why is the bucket covered?&quot;, the system needs to know that the syntactically different expressions &quot;has a cover&quot; and &quot;is covered&quot; have the same propositional content. The system needs an explicit lexical inference rule in the form of a conditional equivalence</Paragraph> </Section> </Paper>
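<!--
The recency-and-agreement strategy described for resolving "he" in question (2) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' system: the NounPhrase record, the gender_compatible helper and all feature values are hypothetical and hand-assigned, whereas a real QA system would obtain noun phrases and their features from a parser and a lexicon. It also assumes that a noun such as "farmer", whose gender is lexically unspecified, is compatible with any pronoun.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NounPhrase:
    """Hypothetical record for a noun phrase and its agreement features."""
    text: str
    gender: Optional[str]  # "masc", "fem", "neut", or None if lexically unspecified
    number: str            # "sg" or "pl"
    person: int            # 1, 2 or 3


def gender_compatible(a: Optional[str], b: Optional[str]) -> bool:
    # Assumption: an unspecified gender (e.g. "farmer") is compatible with any pronoun.
    return a is None or b is None or a == b


def resolve_pronoun(pronoun: NounPhrase,
                    preceding: List[NounPhrase]) -> Optional[NounPhrase]:
    """Return the most recent preceding noun phrase that agrees with the pronoun
    in gender, number and person, or None if no candidate agrees."""
    # Scan backwards from the candidate nearest to the pronoun (recency).
    for candidate in reversed(preceding):
        if (gender_compatible(candidate.gender, pronoun.gender)
                and candidate.number == pronoun.number
                and candidate.person == pronoun.person):
            return candidate
    return None


if __name__ == "__main__":
    # Noun phrases of "The farmer drills a few small holes in each tree.",
    # in textual order, with hand-assigned (hypothetical) features.
    candidates = [
        NounPhrase("the farmer", None, "sg", 3),
        NounPhrase("a few small holes", "neut", "pl", 3),
        NounPhrase("each tree", "neut", "sg", 3),
    ]
    he = NounPhrase("he", "masc", "sg", 3)
    antecedent = resolve_pronoun(he, candidates)
    print(antecedent.text if antecedent is not None else "no antecedent found")  # prints "the farmer"
```

Scanning the candidates in reverse textual order implements recency, and the agreement test filters them, so the first survivor is taken as the antecedent. On the sequence from question (2), the neuter phrases "each tree" and "a few small holes" are rejected and "the farmer" is returned. The heuristic works here only because the text is so simple; it offers no help with the harder cases the paper alludes to.
-->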
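<!--
The meaning-postulate route for question (4) can be sketched as forward chaining over (subject, relation, object) triples. This is an illustrative sketch under stated assumptions, not the paper's method: the triple representation, the lemmatized relation names "make" and "come from", and the functions apply_meaning_postulate and where_from are hypothetical. The abductive step is approximated crudely by falling back from a compound such as "maple sap" to its head noun "sap", and the metonymy between "maple syrup" and "maple sap" is not modelled at all.

```python
from typing import Optional, Set, Tuple

# A fact is a (subject, relation, object) triple with a lemmatized relation.
Fact = Tuple[str, str, str]


def apply_meaning_postulate(facts: Set[Fact]) -> Set[Fact]:
    """Apply the meaning postulate IF X makes Y THEN Y comes from X
    to every matching fact and return the enlarged fact set."""
    derived = {(y, "come from", x) for (x, rel, y) in facts if rel == "make"}
    return facts | derived


def where_from(entity: str, facts: Set[Fact]) -> Optional[Tuple[str, bool]]:
    """Answer "Where does ENTITY come from?" from the enlarged fact set.

    Returns (answer, abduced). abduced is True when the answer was found only
    by assuming that a noun compound (e.g. "maple sap") refers to the same
    material as its head noun (e.g. "sap"), a crude stand-in for abduction.
    """
    closed = apply_meaning_postulate(facts)

    def lookup(term: str) -> Optional[str]:
        for subj, rel, obj in closed:
            if subj == term and rel == "come from":
                return obj
        return None

    direct = lookup(entity)  # deduction alone
    if direct is not None:
        return direct, False
    head = entity.split()[-1]  # the head of an English noun compound is its last word
    if head != entity:
        assumed = lookup(head)  # answer rests on an abductive assumption
        if assumed is not None:
            return assumed, True
    return None


if __name__ == "__main__":
    # Hypothetical hand-built facts from the story, with lemmatized relations.
    story = {
        ("sugar maple trees", "make", "sap"),
        ("farmers", "collect", "sap"),
        ("maple syrup", "come from", "sugar maple trees"),
    }
    print(where_from("maple sap", story))  # prints ('sugar maple trees', True)
```

Because this postulate only adds "come from" facts and only consumes "make" facts, a single pass already reaches a fixpoint; a system with many interacting postulates would have to iterate until no new facts appear. The abduced flag keeps the deduced part of the answer separate from the part that rests on an assumption, mirroring the two steps described for question (4).
-->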