<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1317">
  <Title>Automated Construction of Database Interfaces: Integrating Statistical and Relational Learning for Semantic Parsing</Title>
  <Section position="3" start_page="0" end_page="134" type="metho">
    <SectionTitle>
2 Overview of the Approach
</SectionTitle>
    <Paragraph position="0"> This section reviews our overall approach using an interface developed for a U.S. Geography database (Geoquery) as a sample application (Zelle and Mooney, 1996), which is available on the Web (see http://www.cs.utexas.edu/users/ml/geo.html).</Paragraph>
    <Section position="1" start_page="0" end_page="133" type="sub_section">
      <SectionTitle>
2.1 Semantic Representation
</SectionTitle>
      <Paragraph position="0"> First-order logic is used as a semantic representation language. CHILL has also been applied to a restaurant database in which the logical form resembles SQL and is translated automatically into SQL (see Figure 1). We explain the features of the Geoquery representation language through a sample query:
      Input: &quot;What is the largest city in Texas?&quot;
      Query: answer(C,largest(C,(city(C),loc(C,S),const(S,stateid(texas))))).</Paragraph>
      <Paragraph position="1"> Objects are represented as logical terms and are typed with a semantic category using logical functions applied to possibly ambiguous English constants (e.g. stateid(Mississippi), riverid(Mississippi)). Relationships between objects are expressed using predicates; for instance, loc(X,Y) states that X is located in Y. We also need to handle quantifiers such as 'largest'. We represent these using meta-predicates for which at least one argument is a conjunction of literals. For example, largest(X, Goal) states that the object X satisfies Goal and is the largest object that does so, using the appropriate measure of size for objects of its type (e.g. area for states, population for cities). Finally, an unspecified object required as an argument to a predicate can appear elsewhere in the sentence, requiring the use of the predicate const(X,C) to bind the variable X to the constant C. Some other database queries (or training examples) for the U.S. Geography domain are shown below:
      What is the capital of Texas?
      answer(C,(capital(C,S),const(S,stateid(texas)))).
      What state has the most rivers running through it?
      answer(S,most(S,R,(state(S),river(R),traverse(R,S)))).</Paragraph>
    </Section>
    <Section position="2" start_page="133" end_page="134" type="sub_section">
      <SectionTitle>
2.2 Parsing Actions
</SectionTitle>
      <Paragraph position="0"> Our semantic parser employs a shift-reduce architecture that maintains a stack of previously built semantic constituents and a buffer of remaining words in the input. The parsing actions are automatically generated from templates given the training data. The templates are INTRODUCE, COREF_VARS, DROP_CONJ, LIFT_CONJ, and SHIFT. INTRODUCE pushes a predicate onto the stack based on a word appearing in the input and information about its possible meanings in the lexicon. COREF_VARS binds two arguments of two different predicates on the stack. DROP_CONJ (or LIFT_CONJ) takes a predicate on the stack and puts it into one of the arguments of a meta-predicate on the stack.</Paragraph>
      <Paragraph position="1"> SHIFT simply pushes a word from the input buffer onto the stack. The parsing actions are tried in exactly this order. The parser also requires a lexicon to map phrases in the input into specific predicates; this lexicon can also be learned automatically from the training data (Thompson and Mooney, 1999).</Paragraph>
      <Paragraph position="2"> Let's go through a simple trace of parsing the request &quot;What is the capital of Texas?&quot; A lexicon that maps 'capital' to 'capital(_,_)' and 'Texas' to 'const(_,stateid(texas))' suffices here. Interrogatives like &quot;what&quot; may be mapped to predicates in the lexicon if necessary. The parser begins with an initial stack and a buffer holding the input sentence. Each predicate on the parse stack has an attached buffer to hold the context in which it was introduced; words from the input sentence are shifted onto this buffer during parsing.</Paragraph>
      <Paragraph position="3"> The initial parse state is shown below:
      Parse Stack: [answer(_,_):[]]
      Input Buffer: [what,is,the,capital,of,texas,?]
      Since the first three words in the input buffer do not map to any predicates, three SHIFT actions are performed. The next action is an INTRODUCE, as 'capital' is at the head of the input buffer:
      Parse Stack: [capital(_,_):[], answer(_,_):[the,is,what]]
      Input Buffer: [capital,of,texas,?]
      The next action is a COREF_VARS that binds the first argument of capital(_,_) with the first argument of answer(_,_).</Paragraph>
      <Paragraph position="4"> Parse Stack: [capital(C,_):[], answer(C,_):[the,is,what]]
      Input Buffer: [capital,of,texas,?]
      The next sequence of steps is two SHIFTs, an INTRODUCE, and then a COREF_VARS:
      Parse Stack: [const(S,stateid(texas)):[], capital(C,S):[of,capital], answer(C,_):[the,is,what]]
      Input Buffer: [texas,?]
      The last four steps are two DROP_CONJs followed by two SHIFTs:
      Parse Stack: [answer(C,(capital(C,S),const(S,stateid(texas)))):[?,texas,of,capital,the,is,what]]
      Input Buffer: []
      This is the final state and the logical query is extracted from the stack.</Paragraph>
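      <Paragraph> The trace above can be replayed with a small sketch. The data model (each stack item is a list of functor, arguments, and context, with the stack top at index 0), the function signatures, and the order of the two DROP_CONJ steps are assumptions made for illustration; the sketch replays one fixed action sequence and does not choose actions itself.

```python
def shift(stack, buf):
    """SHIFT: move the next input word onto the context of the top item."""
    stack[0][2].insert(0, buf.pop(0))

def introduce(stack, functor, args):
    """INTRODUCE: push a predicate for the word at the head of the buffer."""
    stack.insert(0, [functor, list(args), []])

def coref_vars(stack, i, arg_i, j, arg_j, var):
    """COREF_VARS: bind one argument of each of two predicates to var."""
    stack[i][1][arg_i] = var
    stack[j][1][arg_j] = var

def drop_conj(stack, i, meta):
    """DROP_CONJ: move item i into the last argument of the item at meta."""
    functor, args, context = stack[i]
    target = stack[meta]
    if target[1][-1] == "_":
        target[1][-1] = []                 # start a new conjunction
    target[1][-1].append((functor, tuple(args)))
    target[2][:0] = context                # merge the attached contexts
    del stack[i]

def render(functor, args):
    """Print a predicate; a list argument is a conjunction of literals."""
    parts = []
    for a in args:
        if isinstance(a, list):
            parts.append("(" + ",".join(render(f, xs) for f, xs in a) + ")")
        else:
            parts.append(a)
    return functor + "(" + ",".join(parts) + ")"

stack = [["answer", ["_", "_"], []]]
buf = ["what", "is", "the", "capital", "of", "texas", "?"]
for _ in range(3):
    shift(stack, buf)
introduce(stack, "capital", ["_", "_"])
coref_vars(stack, 0, 0, 1, 0, "C")
for _ in range(2):
    shift(stack, buf)
introduce(stack, "const", ["_", "stateid(texas)"])
coref_vars(stack, 0, 0, 1, 1, "S")
drop_conj(stack, 1, 2)    # capital(C,S) into answer
drop_conj(stack, 0, 1)    # const(S,stateid(texas)) into answer
for _ in range(2):
    shift(stack, buf)

functor, args, context = stack[0]
print(render(functor, args), context)
```

      In the learned parser each of these calls would be selected by control rules or probability estimates rather than hard-coded.</Paragraph>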
    </Section>
    <Section position="3" start_page="134" end_page="134" type="sub_section">
      <SectionTitle>
2.3 Learning Control Rules
</SectionTitle>
      <Paragraph position="0"> The initially constructed parser has no constraints on when to apply actions, and is therefore overly general and generates numerous spurious parses. Positive and negative examples are collected for each action by parsing each training example and recording the parse states encountered. Parse states to which an action should be applied (i.e. the action leads to building the correct semantic representation) are labeled positive examples for that action. Otherwise, a parse state is labeled a negative example for an action if it is a positive example for another action below the current one in the ordered list of actions. Control conditions which decide the correct action for a given parse state are learned for each action from these positive and negative examples.</Paragraph>
      <Paragraph position="1"> The initial CHILL system used ILP (Lavrac and Dzeroski, 1994) to learn Prolog control rules and employed deterministic parsing, using the learned rules to decide the appropriate parse action for each state. The current approach learns a model for estimating the probability that each action should be applied to a given state, and employs statistical parsing (Manning and Schütze, 1999) to try to find the overall most probable parse, using beam search to control the complexity. The advantage of ILP is that it can perform induction over the logical description of the complete parse state without the need to pre-engineer a fixed set of features (which vary greatly from one domain to another) that are relevant to making decisions. We maintain this advantage by using ILP to learn a committee of hypotheses, and basing probability estimates on a weighted vote of them (Ali and Pazzani, 1996). We believe that using such a probabilistic relational model (Getoor and Jensen, 2000) combines the advantages of frameworks based on first-order logic and those based on standard statistical techniques.</Paragraph>
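      <Paragraph> The labeling scheme just described can be sketched as follows. The trace format (pairs of a state and the action that leads to the correct representation) and the function name collect_examples are assumptions for this sketch; a fuller version would also check that an earlier action is actually applicable to the state before recording a negative example.

```python
# Ordered list of action templates, as given in Section 2.2.
ACTIONS = ["INTRODUCE", "COREF_VARS", "DROP_CONJ", "LIFT_CONJ", "SHIFT"]

def collect_examples(trace, actions=ACTIONS):
    """trace: (state, correct_action) pairs recorded while parsing."""
    pos = {a: [] for a in actions}
    neg = {a: [] for a in actions}
    for state, correct in trace:
        rank = actions.index(correct)
        pos[correct].append(state)
        # The state is positive for an action further down the list, so it
        # is a negative example for every action tried before that one.
        for earlier in actions[:rank]:
            neg[earlier].append(state)
    return pos, neg

# Toy trace with opaque state labels (hypothetical data).
trace = [("s0", "SHIFT"), ("s1", "INTRODUCE"), ("s2", "COREF_VARS")]
pos, neg = collect_examples(trace)
print(pos["SHIFT"], neg["INTRODUCE"])
```

      Because SHIFT is last in the list, it never receives negative examples under this scheme.</Paragraph>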
    </Section>
  </Section>
  <Section position="4" start_page="134" end_page="135" type="metho">
    <SectionTitle>
3 The TABULATE ILP Method
</SectionTitle>
    <Paragraph position="0"> This section discusses the ILP method used to build a committee of logical control hypotheses for each action.</Paragraph>
    <Section position="1" start_page="134" end_page="135" type="sub_section">
      <SectionTitle>
3.1 The Basic TABULATE Algorithm
</SectionTitle>
      <Paragraph position="0"> Most ILP methods use a set-covering method to learn one clause (rule) at a time and construct clauses using either a strictly top-down (general to specific) or bottom-up (specific to general) search through the space of possible rules (Lavrac and Dzeroski, 1994). TABULATE (Top-down And Bottom-Up cLAuse construction with Theory Evaluation), on the other hand, employs both bottom-up and top-down methods to construct potential clauses and searches through the hypothesis space of complete logic programs (i.e. sets of clauses called theories). It uses beam search to find a set of alternative hypotheses guided by a theory evaluation metric discussed below.</Paragraph>
      <Paragraph position="1"> Figure 2: Outline of the TABULATE algorithm. Input: t(X1,...,Xn), the target concept to learn; the positive examples; the negative examples. Output: Q, a queue of learned theories.</Paragraph>
      <Paragraph position="3"> The search starts with the most specific hypothesis (the set of positive examples each represented as a separate clause). Each iteration of the loop attempts to refine each of the hypotheses in the current search queue. There are two cases in each iteration: 1) an existing clause in a theory is refined or 2) a new clause is begun. Clauses are learned using both top-down specialization using a method similar to FOIL (Quinlan, 1990) and bottom-up generalization using Least General Generalizations (LGGs). Advantages of combining both ILP approaches were explored in CHILLIN (Zelle and Mooney, 1994), an ILP method which motivated the design of TABULATE. An outline of the TABULATE algorithm is given in Figure 2.</Paragraph>
      <Paragraph position="4"> A noise-handling criterion is used to decide when an individual clause in a hypothesis is sufficiently accurate to be permanently retained. There are three possible outcomes in a refinement: 1) the current clause satisfies the noise-handling criterion and is simply returned (next_j is set to empty), 2) the current clause does not satisfy the noise-handling criterion and all possible refinements are returned (next_j is set to the refined clause), and 3) the current clause does not satisfy the noise-handling criterion but there are no further refinements (next_j is set to fail). If the refinement is a new clause, clauses in the current theory subsumed by it are removed. Otherwise, it is a specialization of an existing clause. Positive examples that are not covered by the resulting theory, due to specializing the clause, are added back into the theory as individual clauses. Hence, the theory is always kept complete (i.e. covering all positive examples). These final steps are performed in the Complete procedure.</Paragraph>
      <Paragraph position="5"> The termination criterion checks for two conditions. The first is satisfied if the next search queue does not improve the sum of the metric score over all hypotheses in the queue. The second is satisfied if no clause is currently being built for any theory in the search queue and the last finished clause of each theory satisfies the noise-handling criterion. Finally, a committee of hypotheses found by the algorithm is returned.</Paragraph>
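      <Paragraph> The bookkeeping done by the Complete procedure can be sketched under a strong simplification: here a clause is modelled only by the set of positive example ids it covers, and subsumption is approximated by coverage of a superset. Both choices, and the function names, are assumptions of this sketch; real TABULATE clauses are Prolog clauses.

```python
def add_new_clause(theory, new_clause):
    """Add a clause and drop the clauses it subsumes (approximated here
    as clauses whose coverage is a subset of the new clause's coverage)."""
    kept = [c for c in theory if not c.issubset(new_clause)]
    return kept + [new_clause]

def complete(theory, positives):
    """Re-add every uncovered positive example as its own unit clause,
    so that the theory again covers all positive examples."""
    covered = set().union(*theory)
    missing = sorted(positives - covered)
    return theory + [frozenset([e]) for e in missing]

positives = {1, 2, 3}
most_specific = [frozenset([e]) for e in sorted(positives)]

# Case 1: a new, more general clause replaces the clauses it subsumes.
generalized = add_new_clause(most_specific, frozenset([1, 2]))

# Case 2: specializing a clause lost example 2; Complete restores it.
specialized = [frozenset([1]), frozenset([3])]
restored = complete(specialized, positives)
print(generalized, restored)
```
      </Paragraph>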
    </Section>
    <Section position="2" start_page="135" end_page="135" type="sub_section">
      <SectionTitle>
3.2 Compression and Accuracy
</SectionTitle>
      <Paragraph position="0"> The goal of the search is to find accurate and yet simple hypotheses. We measure accuracy using the m-estimate (Cestnik, 1990), a smoothed measure of accuracy on the training data which in the case of a two-class problem is defined as:</Paragraph>
      <Paragraph position="1"> where s is the number of positive examples covered by the hypothesis H, n is the total number of examples covered, $p_+$ is the prior probability of the class ⊕, and m is a smoothing parameter.</Paragraph>
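      <Paragraph> For concreteness, here is a minimal sketch of the accuracy measure, assuming the standard two-class form of the m-estimate, (s + m * p_plus) / (n + m), with the symbols defined above. The function name and the example numbers are illustrative only.

```python
def m_estimate(s, n, p_plus, m):
    """Smoothed accuracy: s positives among n covered examples, pulled
    towards the class prior p_plus with strength m."""
    return (s + m * p_plus) / (n + m)

# 8 of 10 covered examples are positive; prior 0.5, smoothing m = 2.
print(m_estimate(8, 10, 0.5, 2))
# With no covered examples the estimate falls back to the prior.
print(m_estimate(0, 0, 0.25, 4))
```

      Larger values of m pull the estimate more strongly towards the prior, which penalizes hypotheses that cover only a handful of examples.</Paragraph>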
      <Paragraph position="3"> We measure theory complexity using a metric similar to that introduced in (Muggleton and Buntine, 1988). The size of a Clause having a Head and a Body is defined as follows (ts = &quot;term size&quot; and ar = &quot;arity&quot;):</Paragraph>
      <Paragraph position="5"> The size of a clause is roughly the number of variables, constants, or predicate symbols it contains. The size of a theory is the sum of the sizes of its clauses. The metric M(H) used as the search heuristic is defined as:</Paragraph>
      <Paragraph position="7"> where C is a constant used to control the relative weight of accuracy vs. complexity. We assume that the most general hypothesis is as good as the most specific hypothesis; thus, C is determined to be:</Paragraph>
      <Paragraph position="9"> where Et, Eb are the accuracy estimates of the most general and most specific hypotheses respectively, and St, Sb are their sizes.</Paragraph>
    </Section>
    <Section position="3" start_page="135" end_page="135" type="sub_section">
      <SectionTitle>
3.3 Noise Handling
</SectionTitle>
      <Paragraph position="0"> A clause needs no further refinement when it meets the following criterion (as in RIPPER (Cohen, 1995)):</Paragraph>
      <Paragraph position="2"> where p is the number of positive examples covered by the clause, n is the number of negative examples covered, and $-1 &lt; \beta \le 1$ is a parameter. The value of $\beta$ is decreased whenever the sum of the metric over the hypotheses in the queue does not improve although some of them still have unfinished or failed clauses.</Paragraph>
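      <Paragraph> A hedged sketch of such a test follows. It assumes the criterion takes the RIPPER-style form (p - n) / (p + n) greater than the noise parameter (called beta here); the function name and the treatment of a clause that covers no examples are choices made for this sketch.

```python
def needs_no_refinement(p, n, beta):
    """Return True when a clause covering p positive and n negative
    examples is accurate enough to be retained (assumed criterion)."""
    if p + n == 0:
        return False          # a clause covering nothing is never retained
    return (p - n) / (p + n) > beta

print(needs_no_refinement(9, 1, 0.5))   # ratio 0.8
print(needs_no_refinement(6, 4, 0.5))   # ratio 0.2
```

      Lowering beta makes the test easier to pass, which matches the description of relaxing the parameter when the search stops improving.</Paragraph>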
    </Section>
  </Section>
  <Section position="5" start_page="135" end_page="138" type="metho">
    <SectionTitle>
4 Statistical Semantic Parsing
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="135" end_page="137" type="sub_section">
      <SectionTitle>
4.1 The Parsing Model
</SectionTitle>
      <Paragraph position="0"> A parser is a relation $Parser \subseteq Sentences \times Queries$, where Sentences and Queries are the sets of natural language sentences and database queries respectively. Given a sentence $l \in Sentences$, the set $Q(l) = \{q \in Queries \mid (l, q) \in Parser\}$ is the set of queries that are correct interpretations of $l$.</Paragraph>
      <Paragraph position="1"> A parse state consists of a stack of lexicalized predicates and a list of words from the input sentence. $S$ is the set of states reachable by the parser. Suppose our learned parser has $n$ different parsing actions; the $i$th action $a_i$ is a function $a_i(s) : IS_i \to OS_i$, where $IS_i \subseteq S$ is the set of states to which the action is applicable and $OS_i \subseteq S$ is the set of states constructed by the action. The function $a_0(l) : Sentences \to IniS$ maps each sentence $l$ to a corresponding unique initial parse state in $IniS \subseteq S$. A state is called a final state if there exists no parsing action applicable to it. The partial function $a_{n+1}(s) : FS \to Queries$ is defined as the action that retrieves the query from the final state $s \in FS \subseteq S$ if one exists.</Paragraph>
      <Paragraph position="2"> Some final states may not &quot;contain&quot; a query (e.g. when the parse stack contains predicates with unbound variables), and therefore $a_{n+1}$ is a partial function. When the parser meets such a final state, it reports a failure.</Paragraph>
      <Paragraph position="3"> A path is a finite sequence of parsing actions. Given a sentence $l$, a good state $s$ is one such that there exists a path from it to a query $q \in Q(l)$. Otherwise, it is a bad state.</Paragraph>
      <Paragraph position="4"> The set of parse states can be uniquely divided into the set of good states and the set of bad states given $l$ and Parser. $S^+$ and $S^-$ are the sets of good and bad states respectively.</Paragraph>
      <Paragraph position="5"> Given a sentence $l$, the goal is to construct the query $\hat{q}$ such that $$\hat{q} = \arg\max_{q} P(q \in Q(l) \mid l \Rightarrow q) \qquad (7)$$</Paragraph>
      <Paragraph position="7"> where $l \Rightarrow q$ means a path exists from $l$ to $q$.</Paragraph>
      <Paragraph position="8"> Now, we need to estimate $P(q \in Q(l) \mid l \Rightarrow q)$. First, we notice that:</Paragraph>
      <Paragraph position="10"> For convenience we drop the conditions and denote the above probabilities as $P(q \in Q(l))$ and</Paragraph>
      <Paragraph position="12"> ... conditions in the following discussion. The equation states that the probability that a given query is a correct meaning for $l$ is the same as the probability that the final state (reached by parsing $l$) is a good state. We need to determine in general the probability of having a good resulting parse state. Given any parse state $s_j$ at the $j$th step of parsing and an action $a_i$ such that $s_{j+1} = a_i(s_j)$, we have:</Paragraph>
      <Paragraph position="14"> where $IS_i^+ = IS_i \cap S^+$ and $OS_i^+ = OS_i \cap S^+$.</Paragraph>
      <Paragraph position="15"> Since no parsing action can produce a good parse state from a bad one, the second term is zero. Now, we are ready to derive $P(q \in Q(l))$:</Paragraph>
      <Paragraph position="17"> where $a_k$ denotes the index of the action applied at the $k$th step. We assume that $\gamma = P(s_1 \in IS_{a_1}^+) \neq 0$ (which may not be true in general). Now, we have</Paragraph>
      <Paragraph position="19"> Next we describe how we estimate the probability of the goodness of each action in a given state, $P(a_i(s) \in OS_i^+ \mid s \in IS_i^+)$. We need not estimate $\gamma$ since its value does not affect the outcome of equation (7).</Paragraph>
    </Section>
    <Section position="2" start_page="137" end_page="137" type="sub_section">
      <SectionTitle>
4.2 Estimating Probabilities for
Parsing Actions
</SectionTitle>
      <Paragraph position="0"> The committee of hypotheses learned by TABULATE is used to estimate the probability that a particular action is a good one to apply to a given parse state. Some hypotheses are more &quot;important&quot; than others in the sense that they carry more weight in the decision. A weighting parameter is also included to lower the probability estimate of actions that appear further down the decision list. For actions $a_i$ where $1 \le i \le n - 1$:</Paragraph>
      <Paragraph position="2"> where $s$ is a given parse state, $pos(i)$ is the position of the action $a_i$ in the list of actions applicable to state $s$, $\lambda_k$ and $0 &lt; \mu &lt; 1$ are weighting parameters ($\mu$ is set to 0.95 for all the experiments performed), $H_i$ is the set of hypotheses learned for the action $a_i$, and $\sum_k \lambda_k = 1$.</Paragraph>
      <Paragraph position="3"> To estimate the probability for the last action $a_n$, we devise a simple test that checks if the maximum of the set $A(s)$ of probability estimates for the subset of the actions $\{a_1, \ldots, a_{n-1}\}$ applicable to $s$ is less than or equal to a threshold $\alpha$. If $A(s)$ is empty, we assume the maximum is zero. More precisely,</Paragraph>
      <Paragraph position="5"> where $\alpha$ is the threshold, $c(a_n(s) \in OS_n^+)$ is the count of the number of good states produced by the last action, and $c(s \in IS_n^+)$ is the count of the number of good states to which the last action is applicable.</Paragraph>
      <Paragraph position="6"> Now, let's discuss how $P(a_i(s) \in OS_i^+ \mid h_k)$ and $\lambda_k$ are estimated. If $h_k \models s$ (i.e. $h_k$ covers $s$), we have</Paragraph>
      <Paragraph position="8"> where $p_c$ and $n_c$ are the number of positive and negative examples covered by $h_k$ respectively. Otherwise, if $h_k \not\models s$ (i.e. $h_k$ does not cover $s$), we have $$P(a_i(s) \in OS_i^+ \mid h_k) = \frac{p_u + \theta \cdot n_u}{p_u + n_u} \qquad (15)$$ where $p_u$ and $n_u$ are the number of positive and negative examples rejected by $h_k$ respectively. $\theta$ is the probability that a negative example is mislabelled, and its value can be estimated given $\beta$ (in equation (6)) and the total number of positive and negative examples. One could use a variety of linear combination methods to estimate the weights $\lambda_k$ (e.g. Bayesian combination (Buntine, 1990)). However, we have taken a simple approach and weighted hypotheses based on their relative simplicity:</Paragraph>
      <Paragraph position="10"/>
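      <Paragraph> The committee vote can be sketched as follows, under several labeled assumptions: the estimate for a covering hypothesis mirrors equation (15) with the covered counts, the weights are taken proportional to the inverse of each hypothesis size (one possible reading of &quot;relative simplicity&quot;), and the vote is discounted by mu raised to pos - 1. All function and parameter names are hypothetical.

```python
def hypothesis_estimate(covers, p_c, n_c, p_u, n_u, theta):
    """Estimate for one hypothesis, using the covered counts when it
    covers the state and the rejected counts otherwise (assumed form)."""
    if covers:
        return (p_c + theta * n_c) / (p_c + n_c)
    return (p_u + theta * n_u) / (p_u + n_u)

def simplicity_weights(sizes):
    """Weights proportional to 1/size, normalized to sum to one."""
    inverse = [1.0 / s for s in sizes]
    total = sum(inverse)
    return [w / total for w in inverse]

def action_probability(estimates, weights, pos, mu=0.95):
    """Weighted committee vote, discounted by the action's position."""
    vote = sum(w * e for w, e in zip(weights, estimates))
    return mu ** (pos - 1) * vote

# Two hypotheses of sizes 2 and 4; only the first covers the state.
weights = simplicity_weights([2, 4])
estimates = [
    hypothesis_estimate(True, 8, 2, 1, 9, theta=0.0),
    hypothesis_estimate(False, 8, 2, 1, 9, theta=0.0),
]
print(action_probability(estimates, weights, pos=1))
print(action_probability(estimates, weights, pos=2))
```

      With these toy counts the smaller hypothesis dominates the vote, and the same action scores lower when it sits further down the list of applicable actions.</Paragraph>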
    </Section>
    <Section position="3" start_page="137" end_page="138" type="sub_section">
      <SectionTitle>
4.3 Searching for a Parse
</SectionTitle>
      <Paragraph position="0"> To find the most probably correct parse, the parser employs a beam search. At each step, the parser finds all of the parsing actions applicable to each parse state on the queue and calculates the probability of goodness of each of them using equations (12) and (13). It then computes the probability that the resulting state of each possible action is a good state using equation (11), sorts the queue of possible next states accordingly, and keeps the best B options. The parser stops when a complete parse is found on the top of the parse queue or a failure is reported. (The threshold used in the test for the last action is set to 0.5 for all the experiments performed.)</Paragraph>
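      <Paragraph> A generic beam search of this kind can be sketched as below. The callback names expand and is_final, the toy state space, and the choice to keep finished parses in the queue are assumptions of this sketch; in the parser, expand would apply the learned parsing actions and return their estimated probabilities of goodness.

```python
import heapq

def beam_search(initial, expand, is_final, beam_width):
    """expand(state) returns (probability, next_state) pairs; the score
    of a state is the product of the probabilities along its path."""
    queue = [(1.0, initial)]
    while queue:
        best_prob, best_state = queue[0]
        if is_final(best_state):
            return best_prob, best_state     # complete parse on top
        candidates = []
        for prob, state in queue:
            if is_final(state):
                candidates.append((prob, state))   # keep finished parses
                continue
            for step_prob, nxt in expand(state):
                candidates.append((prob * step_prob, nxt))
        # Sort by score and keep the best beam_width options.
        queue = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return None                              # failure: nothing left

# Toy state space (hypothetical): strings ending in "X" are complete.
STEPS = {"": [(0.6, "a"), (0.4, "b")], "a": [(0.5, "aX")], "b": [(0.9, "bX")]}
expand = lambda state: STEPS.get(state, [])
is_final = lambda state: state.endswith("X")

wide = beam_search("", expand, is_final, beam_width=2)
narrow = beam_search("", expand, is_final, beam_width=1)
print(wide, narrow)
```

      In this toy example a beam of width 1 commits to the locally best first step and returns a less probable parse than a beam of width 2, which illustrates the trade-off that the beam width B controls.</Paragraph>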
    </Section>
  </Section>
</Paper>