<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2012"> <Title>Answer Validation by Keyword Association</Title> <Section position="3" start_page="0" end_page="1" type="intro"> <SectionTitle> 2 Answer Validation by Keyword Association 2.1 Keyword Association </SectionTitle> <Paragraph position="0"> Here is an example of a multiple-choice quiz question.</Paragraph> <Paragraph position="1"> Q1: Who is the director of &quot;American Graffiti&quot;? a: George Lucas b: Steven Spielberg c: Francis Ford Coppola d: Akira Kurosawa Suppose that you do not know the correct answer and try to find it using a search engine on the Web. The simplest way is to input the query &quot;American Graffiti&quot; to the search engine and skim the retrieved pages. This strategy assumes that the correct answer may appear on a page that includes the keyword &quot;American Graffiti&quot;. A slightly cleverer way is to consider the number of pages that contain both the keyword and a choice. This number can be estimated from the hits a search engine reports for a conjunct query such as &quot;American Graffiti&quot; and &quot;George Lucas&quot;. Under this assumption, it is reasonable to hypothesize that the choice with the largest number of hits is the answer. For the above question Q1, this strategy works. Table 1 shows the hits of the conjunct queries for each of the choices. We used Google as the search engine.</Paragraph> <Paragraph position="2"> Here, let X be the set of keywords and Y be the choice. 
The function hits is defined as follows.</Paragraph> <Paragraph position="4"> The conjunct query with &quot;George Lucas&quot;, which is the correct answer, returns the largest number of hits.</Paragraph> <Paragraph position="5"> Here, the question Q1 can be regarded as a question about the strength of association between the keyword and a choice, and can be converted into the following form.</Paragraph> <Paragraph position="6"> We call this association between the keyword and the choice keyword association.</Paragraph> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 2.2 How to Select Keywords </SectionTitle> <Paragraph position="0"> It is important to select appropriate keywords from a question sentence. Consider the following question.</Paragraph> <Paragraph position="1"> Q2: Who is the original author of the famous movie &quot;Lord of the Rings&quot;? a: Elijah Wood b: JRR Tolkien c: Peter Jackson d: Liv Tyler The numbers of hits are shown in Table 2. Here, let X be &quot;Lord of the Rings&quot;, and let X' be &quot;Lord of the Rings&quot; and &quot;original author&quot;. When you select only the title of the movie, &quot;Lord of the Rings&quot;, as a keyword, the choice with the maximum hits is &quot;Peter Jackson&quot;, not the correct answer, &quot;JRR Tolkien&quot;. However, if you select both &quot;Lord of the Rings&quot; and &quot;original author&quot; as keywords, the question can be solved by selecting the choice with the maximum hits. This example shows that the selection of appropriate keywords is important.</Paragraph> </Section> <Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 2.3 Forward and Backward Association </SectionTitle> <Paragraph position="0"> For certain questions, it is not enough to generate a conjunct query consisting of some keywords and a choice, and then simply select the choice with the maximum hits. This section introduces more sophisticated measures for selecting an appropriate answer. 
Consider the following question.</Paragraph> <Paragraph position="1"> Q3: Where is Pyramid? a: Canada b: Egypt c: Japan d: China The numbers of hits are shown in Table 3. In this case, given a conjunct query consisting of a keyword &quot;Pyramid&quot; and a choice, the choice with the maximum hits, &quot;Canada&quot;, is not the correct answer, &quot;Egypt&quot;. Why could this question not be solved? Let us consider the hits of the choices alone. The number of hits for the atomic query &quot;Canada&quot; is about seven times larger than that for the atomic query &quot;Egypt&quot;. From this observation, we can hypothesize that the hits of a conjunct query consisting of &quot;Pyramid&quot; and a choice are affected by the hits of the choice alone. Therefore, some normalization might be required.</Paragraph> <Paragraph position="2"> Based on the analysis above, we employ the two metrics proposed by Sato and Sasaki (2003) for evaluating the strength of the relation between two terms. Let X be the set of keywords and Y be the choice. In this paper, we call the hits of a conjunct query consisting of keywords X and a choice Y, normalized by the hits of X, the forward association FA(X, Y). We also call the hits of a conjunct query consisting of X and Y, normalized by the hits of Y, the backward association BA(X, Y).</Paragraph> </Section> </Section> </Paper>