<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3807">
  <Title>Learning of Graph-based Question Answering Rules</Title>
  <Section position="3" start_page="37" end_page="39" type="metho">
    <SectionTitle>
2 Question Answering Rules
</SectionTitle>
    <Paragraph position="0"> In one form or another, a question answering rule must contain the following information: 1. a pattern that matches the question; 2. a pattern that matches the corresponding answer sentence; and 3. a pointer to the answer in the answer sentence. The patterns in our rules are expressed as graphs with vertices containing variables. A vertex with a variable can unify with a subgraph. For example, Figure 1 shows two graphs and a pattern that matches both graphs.</Paragraph>
    <Paragraph position="1"> Such patterns are used to match the graph representation of the question. If a pattern defined in a rule matches a question sentence, then the rule applies to the sentence.</Paragraph>
    <Paragraph position="2"> Our rules specify the pattern of the answer sentence in an unusual way. Instead of keeping a pattern to match the answer sentence, our rules define an extension graph that will be added to the graph of the question. The rationale for this is that we want to reward answer sentences that have a high similarity with the question. Therefore, the larger the number of vertices and edges that are shared between the question and the answer, the better. The extension graph contains information that simulates the difference between a question sentence and a sentence containing an answer.</Paragraph>
    <Paragraph position="3"> For example, let us use graph representations of syntactic dependency structures. We will base our representation on the output of Connexor (Tapanainen and Järvinen, 1997), but the choice of parser is arbitrary. The same method applies to the output of any parser, as long as it can be represented as a graph. In our case, the dependency structure is represented as a bipartite graph where the lexical entries are the vertices represented in boxes and the dependency labels are the vertices represented in ovals. Figure 2 shows the graphs of a question and an answer sentence, and an extension of the question graph. The answer is shown in thick lines, and the extension is shown in dashed lines. This is what we aim to reproduce with our graph rules. In particular, the extension of the question graph is such that the graph of the answer sentence becomes a subgraph of the extended question graph.</Paragraph>
    <Paragraph position="4"> The question and answer sentence of Figure 2 have almost identical dependency graphs, and consequently the extension required for the question graph is very small. Sentence pairs with more differences would induce a more substantial extension graph.</Paragraph>
    <Paragraph position="5"> Note that the extended graph still contains the representation of information that does not appear in the answer sentence, namely the question term what book. There is no need to remove any element from the question graph because, as we will see later, the criteria used to score the extracted answer are based on the overlap between graphs.</Paragraph>
    <Paragraph position="6"> In sum, a graph rule has the following components: Rp, a question pattern; Re, an extension graph, which is a graph to be added to the question graph; and Ra, a pointer to the answer in the extension graph. An example of a rule is shown in Figure 3. This rule is derived from the question and answer sentence pair shown in Figure 2.</Paragraph>
    <Paragraph position="7"> Re is in dashed lines, and Ra is in thick lines. The rule can be used with a fresh pair of question qi and answer sentence asi. Let us use the notation Gr(s) to denote the graph that represents the string s. Also, unless stated otherwise, names starting with uppercase denote graphs, and names starting with lowercase denote strings. Informally, the process to find the answer is: 1. If Gr(qi) matches Rp then the rule applies.</Paragraph>
    <Paragraph position="8"> Otherwise try a new rule.</Paragraph>
    <Paragraph position="9"> 2. Extend Gr(qi) with Re to produce a new graph EReqi.</Paragraph>
    <Paragraph position="10"> 3. Compute the overlap between EReqi and Gr(asi).</Paragraph>
    <Paragraph position="11"> 4. If a part of Ra is in the resulting overlap, then expand its projection on Gr(asi).</Paragraph>
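    <Paragraph> The four steps above can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions, not the implementation used in the paper: graphs are sets of (source, target) edges, vertex labels are assumed to identify vertices uniquely, the pattern contains no variables (so matching reduces to a subset test), and the overlap is a plain edge intersection rather than a maximum common subgraph. The names Rule, vertices, reachable and apply_rule, and the toy graphs, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: frozenset    # Rp: edges that must be present in the question graph
    extension: frozenset  # Re: edges added to the question graph
    answer: frozenset     # Ra: vertices of Re that point to the answer


def vertices(edges):
    """Set of vertex labels mentioned by a set of (source, target) edges."""
    return {vertex for edge in edges for vertex in edge}


def reachable(edges, start):
    """Vertices reachable from the start set by following outgoing edges."""
    seen = set(start)
    frontier = list(start)
    while frontier:
        node = frontier.pop()
        for source, target in edges:
            if source == node and target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen


def apply_rule(rule, question, answer_sentence):
    # Step 1: the rule applies only if its pattern matches the question.
    if not rule.pattern.issubset(question):
        return None
    # Step 2: extend the question graph with Re.
    extended = set(question) | rule.extension
    # Step 3: overlap between the extended question and the answer sentence.
    overlap = extended.intersection(answer_sentence)
    # Step 4: if part of Ra is in the overlap, expand it on the answer graph.
    found = rule.answer.intersection(vertices(overlap))
    if not found:
        return None
    return reachable(answer_sentence, found)


# Toy graphs (hypothetical labels, not actual parser output).
question = {("write", "subj"), ("subj", "ende"), ("write", "obj"), ("obj", "book")}
answer_sentence = {
    ("write", "subj"), ("subj", "ende"), ("write", "obj"), ("obj", "novel"),
    ("novel", "det"), ("det", "the"), ("novel", "mod"), ("mod", "titled"),
}
rule = Rule(
    pattern=frozenset({("write", "subj"), ("write", "obj")}),
    extension=frozenset({("obj", "novel")}),
    answer=frozenset({"novel"}),
)
print(apply_rule(rule, question, answer_sentence))
```

In this toy case the overlap contains the vertex novel, and the expansion collects the vertices below it in the answer sentence graph, mirroring how a partial answer is grown into the full answer.</Paragraph>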
    <Paragraph position="12"> The crucial point in the process is to determine the projection of an overlap on the answer sentence, and then to extend it. Once the overlap is found in step 3, if this overlap includes part of the annotated answer, that is, if it includes part of Ra, then part of the answer will be the string in the answer sentence that corresponds to the overlap. The full answer can be retrieved by expanding the answer found in the overlap by following the outgoing edges in the graph. [Figure: graph of qi, "What book did Michael Ende write in 1984?", extended with the extension graph (Re) of Figure 3.] In Figure 5 the overlap between the extended question graph and the answer sentence graph contains the answer fragment novel. After expanding it we obtain the full answer the novel titled "The Never Ending Story". [1]</Paragraph>
  </Section>
  <Section position="4" start_page="39" end_page="42" type="metho">
    <SectionTitle>
3 Learning of Graph Rules
</SectionTitle>
    <Paragraph position="0"> To learn a QA rule we need to determine the information that is common between a question and a sentence containing an answer. In terms of graphs, this is a variant of the well-known problem of finding the maximum common subgraph (MCS) of two graphs (Bunke et al., 2002).</Paragraph>
    <Paragraph position="1"> The problem of finding the MCS of two graphs is known to be NP-complete, but there are implementations that are fast enough for practical uses, especially if the graphs are not particularly large (Bunke et al., 2002). Given that our graphs are used to represent sentences, their size would usually stay within a few tens of vertices. This size is acceptable.</Paragraph>
    <Paragraph position="2"> There is an algorithm based on Conceptual Graphs (Myaeng and López-López, 1992) which is particularly efficient for our purposes. Their method follows the traditional procedure of building the association graph of the two input graphs.</Paragraph>
    <Paragraph position="3"> However, in contrast with the traditional approach, which finds the cliques of the association graph (and this is the part that is NP-complete), the method by Myaeng and López-López (1992) first simplifies the association graph by merging some of its vertices, and then proceeds to search for the cliques. By so doing, the algorithm is still exponential in the size n of the association graph, but n is now smaller than with the traditional approach for the same input graphs. [Footnote 1: Note that the answer the novel titled "The Never Ending Story" is not an exact answer according to the TREC definition, since it contains the string the novel titled; one further step would be needed to extract the exact answer. This is work for further research.]</Paragraph>
    <Paragraph position="4"> The method presented by Myaeng and López-López (1992) finds connected graphs, but we also need to find overlaps that form unconnected graphs.</Paragraph>
    <Paragraph position="5"> For example, Figure 6 shows two graphs and their MCS. The resulting MCS is an unconnected graph, though the algorithm of Myaeng and López-López (1992) returns the two parts of the graph as independent MCSs. It is easy to modify the original algorithm to obtain the desired output, as we did.</Paragraph>
    <Paragraph position="6"> Given two graphs G1 and G2, their MCS is denoted MCS(G1,G2). To simplify the notation, we will often refer to the MCS of two sentences as MCS(s1,s2). This is to be understood as the MCS of the graphs of the two sentences, MCS(Gr(s1),Gr(s2)).</Paragraph>
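    <Paragraph> The traditional association-graph procedure mentioned above can be illustrated with a brute-force sketch. It is not the algorithm of Myaeng and López-López (1992): it performs no vertex merging and simply tries all vertex subsets, so it is only usable on tiny graphs. Graphs are assumed to be given as a dictionary from vertex identifiers to labels plus a set of directed edges; the function names and the toy graphs are hypothetical. A maximum clique of the association graph corresponds to a maximum common induced subgraph, which may be unconnected.

```python
from itertools import combinations


def association_graph(labels1, edges1, labels2, edges2):
    """Nodes pair up same-labelled vertices of the two graphs; two nodes are
    adjacent when both graphs agree on the edges between the paired vertices."""
    nodes = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    adjacent = set()
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue  # a vertex cannot be matched twice
        same_forward = ((u1, u2) in edges1) == ((v1, v2) in edges2)
        same_backward = ((u2, u1) in edges1) == ((v2, v1) in edges2)
        if same_forward and same_backward:
            adjacent.add(frozenset([(u1, v1), (u2, v2)]))
    return nodes, adjacent


def maximum_clique(nodes, adjacent):
    """Exhaustive search from the largest subset down (exponential time)."""
    for size in range(len(nodes), 0, -1):
        for candidate in combinations(nodes, size):
            pairs = combinations(candidate, 2)
            if all(frozenset(pair) in adjacent for pair in pairs):
                return list(candidate)
    return []


# Toy graphs: the same structure except for the last lexical vertex.
labels1 = {1: "write", 2: "subj", 3: "ende", 4: "obj", 5: "book"}
edges1 = {(1, 2), (2, 3), (1, 4), (4, 5)}
labels2 = {"a": "write", "b": "subj", "c": "ende", "d": "obj", "e": "novel"}
edges2 = {("a", "b"), ("b", "c"), ("a", "d"), ("d", "e")}

nodes, adjacent = association_graph(labels1, edges1, labels2, edges2)
mcs = maximum_clique(nodes, adjacent)
print(sorted(labels1[u] for u, _ in mcs))
```

The returned list pairs each vertex of the first graph with its counterpart in the second, which is the kind of projection needed later to locate the pattern inside the answer sentence graph.</Paragraph>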
    <Paragraph position="7"> Let us now assume that the graph rule R originates from a pair (q,as) in the training corpus, where q is a question and as a sentence containing the answer a. The rule components are built as follows: Rp is the MCS of q and as, that is, MCS(q,as).</Paragraph>
    <Paragraph position="8"> Re is the path between the projection of Rp in Gr(as) and the actual answer Gr(a).</Paragraph>
    <Paragraph position="9"> Ra is the graph representation of the exact answer.</Paragraph>
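    <Paragraph> As a rough illustration of how the three components could be derived from one training pair, the sketch below makes the same simplifying assumptions as a toy setting requires: graphs are sets of (source, target) edges whose vertex labels identify vertices, Rp is approximated by the shared edges instead of a true MCS, and Re is taken to be a shortest path (ignoring edge direction) from the projection of Rp to an answer vertex. The function learn_rule and the example graphs are hypothetical.

```python
from collections import deque


def vertices(edges):
    """Set of vertex labels mentioned by a set of (source, target) edges."""
    return {vertex for edge in edges for vertex in edge}


def learn_rule(question, answer_sentence, answer_vertices):
    # Rp: what the question and the answer sentence share (simplified MCS).
    pattern = set(question).intersection(answer_sentence)
    # Neighbours in the answer sentence graph, remembering the edge used.
    neighbours = {}
    for source, target in answer_sentence:
        neighbours.setdefault(source, []).append((target, (source, target)))
        neighbours.setdefault(target, []).append((source, (source, target)))
    # Re: breadth-first search from the projection of Rp towards the answer.
    start = vertices(pattern)
    came_from = {vertex: None for vertex in start}
    queue = deque(sorted(start))
    extension = set()
    while queue:
        node = queue.popleft()
        if node in answer_vertices:
            # Walk back to the pattern, collecting the edges on the path.
            while came_from[node] is not None:
                node, edge = came_from[node]
                extension.add(edge)
            break
        for neighbour, edge in neighbours.get(node, []):
            if neighbour not in came_from:
                came_from[neighbour] = (node, edge)
                queue.append(neighbour)
    # Ra: the vertices of the exact answer.
    return pattern, extension, set(answer_vertices)


question = {("write", "subj"), ("subj", "ende"), ("write", "obj"), ("obj", "book")}
answer_sentence = {
    ("write", "subj"), ("subj", "ende"), ("write", "obj"),
    ("obj", "novel"), ("novel", "mod"), ("mod", "titled"),
}
pattern, extension, answer = learn_rule(question, answer_sentence, {"novel"})
print(sorted(pattern), sorted(extension), sorted(answer))
```

Here the extension reduces to the single edge that links the shared object relation to the answer vertex, in line with the observation that near-identical sentence pairs induce very small extension graphs.</Paragraph>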
    <Paragraph position="10"> Note that this process defines Rp as the MCS of question and answer sentence. Consequently, Rp is a subgraph of both the question and the answer sentence. This constraint is stronger than that of a typical QA rule, where the pattern needs to match the question only. The resulting question pattern is therefore more general than it could be had one manually built the rule. Rp does not include question-only elements in the question pattern because it is difficult to determine what components of the question are to be added to the pattern, and what components are idiosyncratic to the specific question used in the training set.</Paragraph>
    <Paragraph position="11"> Rules learnt this way need to be generalised in order to form generic patterns. We currently use a simple method of generalisation: convert a subset of the vertices into variables. To decide whether a vertex can be generalised, a list of very common vertices is used. This is the list of "stop vertices", in analogy to the concept of stop words in methods to detect keywords in a string. Thus, if a vertex is not in the list of stop vertices, then the vertex can be generalised.</Paragraph>
    <Paragraph position="12"> The list of stop vertices is fixed and depends on the graph formalism used.</Paragraph>
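    <Paragraph> The generalisation step can be sketched as follows, assuming again that a graph is a set of (source, target) edges over vertex labels. The function generalise, the variable naming scheme X1, X2, ... and the example stop list are hypothetical; the paper does not prescribe them.

```python
def generalise(edges, stop_vertices):
    """Replace every vertex that is not a stop vertex by a variable."""
    names = {}

    def rename(vertex):
        if vertex in stop_vertices:
            return vertex  # stop vertices are kept as they are
        if vertex not in names:
            names[vertex] = f"X{len(names) + 1}"
        return names[vertex]

    # Sorting makes the variable numbering deterministic.
    return {(rename(source), rename(target)) for source, target in sorted(edges)}


edges = {("write", "subj"), ("subj", "ende")}
general = generalise(edges, stop_vertices={"write", "subj"})
print(sorted(general))
```

The same vertex always receives the same variable, so the structure of the graph is preserved while its lexical content is abstracted away.</Paragraph>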
    <Paragraph position="13"> For the question answering process it is useful to associate a weight with every rule learnt. The rule weight is computed by testing the accuracy of the rule on the training corpus. This way, rules that overgeneralise acquire a low weight. The weight W(r) of a rule r is computed according to its precision on the training set: W(r) = \frac{\text{number of correct answers found}}{\text{number of answers found}}</Paragraph>
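    <Paragraph> A precision-based weight of this kind can be computed as in the sketch below. The representation of answers as (question identifier, answer string) pairs, the function name rule_weight and the choice of returning 0.0 for a rule that finds nothing are assumptions made for the example.

```python
def rule_weight(found_answers, correct_answers):
    """Precision of a rule: correct answers found / answers found."""
    if not found_answers:
        return 0.0  # assumption: a rule that never fires gets no weight
    hits = sum(1 for item in found_answers if item in correct_answers)
    return hits / len(found_answers)


# Answers proposed by one rule on the training set, and the gold annotations.
found = [("q1", "novel"), ("q2", "1984"), ("q3", "ende")]
gold = {("q1", "novel"), ("q3", "ende")}
weight = rule_weight(found, gold)
print(round(weight, 3))
```

A rule that overgeneralises proposes many answers of which few are correct, which drives this ratio towards zero.</Paragraph>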
    <Paragraph position="15"> The above method has been applied to graphs representing the logical contents of sentences. There has been a long tradition in the use of graphs for this kind of sentence representation, such as Sowa's Conceptual Graphs (Sowa, 1979), and Quillian's Semantic Nets (Quillian, 1968). In our particular experiment we have used a graph representation that can be built automatically and that can be used efficiently for QA (Mollá and van Zaanen, 2006).</Paragraph>
    <Paragraph position="16"> A Logical Graph (LG) is a directed, bipartite graph with two types of vertices, concepts and relations. Concepts: Examples of concepts are objects (dog, table), events and states (run, love), and properties (red, quick).</Paragraph>
    <Paragraph position="17"> Relations: Relations act as links between concepts. To facilitate the production of the LGs we have decided to use relation labels that represent verb argument positions. Thus, the relation 1 indicates the link to the first argument of a verb (that is, what is usually a subject). The relation 2 indicates the link to the second argument of a verb (usually the direct object), and so forth. Furthermore, relations introduced by prepositions are labelled with the prepositions themselves. Our relations are therefore close to the syntactic structure.</Paragraph>
    <Paragraph position="18"> An example of an LG is shown in Figure 7, where the concepts are pictured in boxes and the relations are pictured in ovals.</Paragraph>
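    <Paragraph> A minimal data structure for such a graph might look like the sketch below. The class LogicalGraph, its methods, the vertex identifiers and the direction of the edges (from the head concept to the relation, and from the relation to the argument) are assumptions made for illustration; the example sentence is invented and is not the one in Figure 7.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalGraph:
    concepts: dict = field(default_factory=dict)   # vertex id -> concept label
    relations: dict = field(default_factory=dict)  # vertex id -> relation label
    edges: set = field(default_factory=set)        # directed (source id, target id)

    def add_concept(self, label):
        vertex = f"c{len(self.concepts)}"
        self.concepts[vertex] = label
        return vertex

    def link(self, head, label, argument):
        """Connect two concepts through a new relation vertex."""
        vertex = f"r{len(self.relations)}"
        self.relations[vertex] = label
        self.edges.add((head, vertex))
        self.edges.add((vertex, argument))
        return vertex

    def is_bipartite(self):
        """Every edge must join a concept vertex and a relation vertex."""
        return all((s in self.concepts) != (t in self.concepts) for s, t in self.edges)


# "The dog runs in the park" (invented example).
lg = LogicalGraph()
run, dog, park = (lg.add_concept(word) for word in ("run", "dog", "park"))
lg.link(run, "1", dog)     # first argument of the verb (usually the subject)
lg.link(run, "in", park)   # relation introduced by a preposition
print(lg.is_bipartite(), sorted(lg.relations.values()))
```

Because relations are ordinary vertices, the structure remains a plain graph and can be passed unchanged to standard graph algorithms.</Paragraph>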
    <Paragraph position="19"> The example in Figure 7 shows LG's ability to provide the graph representation of sentences with embedded clauses. In contrast, other theories (such as Sowa (1979)'s Conceptual Graphs) would represent the sentence as a graph containing vertices that are themselves graphs. This departs from the usual definition of a graph, and therefore standard Graph Theory algorithms would need to be adapted for Conceptual Graphs. An advantage of our LGs, therefore, is that they can be manipulated with standard Graph Theory algorithms such as the ones described in this paper.</Paragraph>
    <Paragraph position="20"> Using the LG as the graph representation of questions and answer sentences, we implemented a proof-of-concept QA system. The implementation and examples of graphs are described by Mollá and van Zaanen (2005), and here we only describe the method to generalise rules and the decisions taken to choose the exact answer.</Paragraph>
    <Paragraph position="21"> The process to generalise rules takes advantage of the two kinds of vertices. Basically, relation vertices represent names of relations and we considered these to be important in the rule. Consequently, relation vertices were left unmodified in the generalised rule. Concept vertices are generalised by replacing them with generic variables, except for a specific set of "stop concepts", which were not generalised. The list of stop concepts is very small. Every question/answer pair in the training corpus generates one rule (or more if we use a process of increasingly generalising the rules). Since the rule is based on deep linguistic information, it generalises over syntactic paraphrases. Consequently, a small training corpus suffices to produce a relatively large number of rules.</Paragraph>
    <Paragraph position="22"> The QA system was trained with an annotated corpus of 560 pairs of TREC questions and answer sentences where the answers were manually annotated. We only tested the ability of the system to extract the exact answers. Thus, the system accepted pairs of question and answer sentences (where the sentence is guaranteed to contain an answer), and returned the exact answer. Given a question and answer sentence pair, the answer is found by applying all matching rules. All strings found as answers are ranked by multiplying the rule weights and the sizes of the overlaps. If an answer is found by several rules, its score is the sum of the individual scores. Finally, if an answer occurs in the question it is ignored. A five-fold cross validation on the annotated corpus gave an accuracy (percentage of questions where the correct answer was found) of 21.44%. Given that the QA system does not do any kind of question classification and does not use any NE recogniser, the results are satisfactory.</Paragraph>
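    <Paragraph> The ranking described above can be sketched as follows. The input format (one triple of answer string, rule weight and overlap size per rule application), the function name rank_answers and the case-insensitive substring test used to decide whether an answer occurs in the question are assumptions made for the example.

```python
from collections import defaultdict


def rank_answers(candidates, question):
    """Score each answer as the sum of weight * overlap size over all rules."""
    scores = defaultdict(float)
    for answer, weight, overlap_size in candidates:
        # Answers that already occur in the question are ignored.
        if answer.lower() in question.lower():
            continue
        scores[answer] += weight * overlap_size
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


question = "What book did Michael Ende write in 1984?"
candidates = [
    ("the novel", 0.5, 4),
    ("Michael Ende", 0.9, 5),   # occurs in the question, so it is dropped
    ("the novel", 0.25, 2),     # found again by a second rule
    ("1979", 0.1, 3),
]
ranked = rank_answers(candidates, question)
print([answer for answer, _ in ranked])
```

An answer proposed by several rules accumulates their scores, so agreement between rules pushes it up the ranking.</Paragraph>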
  </Section>
  <Section position="5" start_page="42" end_page="42" type="metho">
    <SectionTitle>
5 Related Research
</SectionTitle>
    <Paragraph position="0"> There have been other attempts to learn QA rules automatically. For example, Ravichandran and Hovy (2002) learn rules based on simple surface patterns.</Paragraph>
    <Paragraph position="1"> Given that surface patterns ignore much linguistic information, it becomes necessary to gather a large corpus of questions together with their answers and sentences containing the answers. To obtain such a corpus Ravichandran and Hovy (2002) mine the Web to gather the relevant data.</Paragraph>
    <Paragraph position="2"> Other methods learn patterns based on syntactic information. For example, Shen et al. (2005) develop a method of extracting dependency paths connecting answers with words found in the question.</Paragraph>
    <Paragraph position="3"> However, we are not aware of any method that attempts to learn patterns based on logical information, other than our own.</Paragraph>
    <Paragraph position="4"> There is recent interest in the use of graph methods for Natural Language Processing, such as document summarisation (Mihalcea, 2004), document retrieval (Montes-y-Gómez et al., 2000; Mishne, 2004), and recognition of textual entailment (Pazienza et al., 2005). The present workshop itself shows the current interest in the area. However, we are not aware of any significant research about the use of conceptual graphs (or any other form of graph representation) for question answering other than our own.</Paragraph>
  </Section>
</Paper>