<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1027">
<Title>Learning theories from text</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>2 Some background</SectionTitle>
<Paragraph position="0">
(Pulman, 2000) showed that it was possible to learn a simple domain theory from a disambiguated corpus: a subset of the ATIS (air travel information service) corpus (Doddington and Godfrey, 1990). Ambiguous sentences were annotated as shown to indicate the preferred reading:
[i,would,like, [the,cheapest,flight,
The 'good' and the 'bad' parses were used to produce simplified first-order logical forms representing the semantic content of the various readings of the sentences. The 'good' readings were used as positive evidence, and the 'bad' readings (or, more accurately, the bad parts of some of the readings) were used as negative evidence. Next, a particular Inductive Logic Programming (ILP) algorithm, Progol (Muggleton, 1995), was used to learn a theory of prepositional relations in this domain, i.e. which kinds of entities can stand in these relations and which cannot.
</Paragraph>
<Paragraph position="2">
The +any declaration says that there are no prior assumptions about sortal restrictions on these predicates. Among others, generalisations like the following were obtained (all variables are implicitly universally quantified):
</Paragraph>
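<Paragraph>
The mode declarations and induced clauses displayed at these points are not reproduced here. Purely as an illustration of their shape (the predicate and sort names are hypothetical, not the actual rules obtained in (Pulman, 2000)), Progol mode declarations using the +any type, and induced clauses about ATIS-style prepositional relations, might look like this:

% hypothetical mode declarations: +any places no sortal restriction
% on the arguments of the target and body predicates
:- modeh(1, at(+any,+any))?
:- modeb(1, meal(+any))?
:- modeb(1, flight(+any))?
:- modeb(1, city(+any))?

% induced clauses of the kind described; variables are
% implicitly universally quantified
at(X,Y) :- meal(X), flight(Y).
from(X,Y) :- flight(X), city(Y).
</Paragraph>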
<Paragraph position="4">
This domain theory was then used successfully in disambiguating a small held-out section of the corpus, by checking for consistency between logical forms and the domain theory.
</Paragraph>
<Paragraph position="5">
While the number of sentences involved in that experiment was too small for the results to be statistically meaningful, the experiment demonstrated that the method works in principle, although of course in practice the notion of logical consistency is too strong a test in many cases. Note also that the results of the theory induction process are perfectly comprehensible: the outcome is a theory with some logical structure, rather than a black box.
</Paragraph>
<Paragraph position="6">
The method requires a fully parsed corpus with corresponding logical forms. Using a similar technique, we have experimented with slightly larger datasets, using the Penn Tree Bank (Marcus et al., 1994), since the syntactic annotations for sentences given there are intended to be complete enough for semantic interpretation, in principle at least.
</Paragraph>
<Paragraph position="7">
In practice, as (Liakata and Pulman, 2002) report, this is by no means easy to do. It is possible to recover partial logical forms from a large proportion of the treebank, but these are not complete or accurate enough to simply replicate the ATIS experiment. In the work reported here, we selected about 40 texts containing the verb 'resign', all reporting, among other things, 'company succession' events, a scenario familiar from the Message Understanding Conference (MUC) task (Grishman and Sundheim, 1995). The texts amounted to almost 4000 words in all. We then corrected and completed some automatically produced logical forms by hand to get a fairly full representation of the meanings of these texts (as far as is possible in first-order logic). We also resolved by hand some of the simpler forms of anaphoric reference to individuals, to simulate a fuller discourse processing of the texts. To give an example, a sequence of sentences like:
J.P. Bolduc, vice chairman of W.R. Grace & Co. (...) was elected a director. He succeeds Terrence D. Daniels, ... who resigned.
was represented by the following sequence of literals:
verb(e1,elect).
funct_of('J.P._Bolduc',x1).
...
subj(e1,unspecified).
obj(e1,x1).
description(e1,x1,director,de1).
verb(e5,succeed).
subj(e5,x1).
funct_of('Terrence_D._Daniels',x6).
obj(e5,x6).
verb(e4,resign).
subj(e4,x6).
</Paragraph>
<Paragraph position="21">
The representation is a little opaque, for various implementation reasons. It can be paraphrased as follows: there is an event, e1, of electing, the subject of which is unspecified, and the object of which is x1; x1 is characterised as 'J.P. Bolduc', and e1 assigns the description de1 of 'director' to x1. There is an event e5 of succeeding, and x1 is the subject of that event. The object of e5 is x6, which is characterised as Terrence D. Daniels. There is an event e4 of resigning, and the subject of that event is x6. The reason for all this logical circumlocution is that we are trying to learn a theory of the 'verb' predicate; in particular, we are interested in relations between the arguments of different verbs, since these may well be indicative of causal or other regularities that should be captured in the theory of the company succession domain. If the individual verbs were represented as predicates rather than as arguments of a 'verb' predicate, we would not be able to generalise over them: we are restricted to first-order logic, and this would require higher-order variables.
</Paragraph>
<Paragraph>
We also need to add some background knowledge. We assume a fairly simple flat ontology, so as to be able to reuse existing resources. Some entities were assigned to classes automatically using clustering techniques; others had to be classified by hand. The set of categories used was: company, financial instrument, financial transaction, location, money, number, person, company position, product, time, and unit (of organisation). As before, the representation has these categories as an argument of a 'class' predicate, to enable generalisation:
class(person,x1).
class(company,x3).
etc.
</Paragraph>
<Paragraph position="24">
Ideally, to narrow down the hypothesis space for ILP, we need some negative evidence. But in the Penn Tree Bank, only the good parse is represented. There are several possible ways of obtaining negative data, of course: one could use a parser trained on the Tree Bank to reparse sentences and recover all the parses. However, there still remains the problem of recovering logical forms from 'bad' parses. An alternative would be to use a kind of 'closed world' assumption: take the set of predicates and arguments in the good logical forms, and assume that any combination not observed is actually impossible. One could generate artificial negative evidence this way.
</Paragraph>
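<Paragraph>
As a sketch of this closed-world route (hypothetical code, reusing the verb/subj/class representation above; the predicates seen and artificial_negative are invented names), the unobserved combinations could be enumerated in Prolog:

% verb/subject-class pairings actually attested in the good logical forms
seen(V, C) :- verb(E, V), subj(E, X), class(C, X).

% closed-world assumption: any pairing of an attested verb with an
% attested class that is never observed is treated as impossible
artificial_negative(V, C) :-
    verb(_, V), class(C, _),
    \+ seen(V, C).

% collect the artificial negative evidence without duplicates
all_negatives(Ns) :- setof(V-C, artificial_negative(V, C), Ns).
</Paragraph>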
<Paragraph position="25">
Alternatively, one can try learning from positive data only. With the appropriate settings, the ILP systems Progol (Muggleton, 1995) and Aleph (Srinivasan, 1999) are able to learn from positive examples alone. Likewise, so-called 'descriptive' ILP systems like WARMR (Dehaspe, 1998) do not always need negative data: they are in effect data mining engines for first-order logic, learning generalisations and correlations in some set of data.
</Paragraph>
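<Paragraph>
For concreteness, a minimal sketch of such settings in Aleph (the mode and determination declarations are illustrative, matching the representation used here; Aleph is assumed to be already loaded into the Prolog system):

% request Aleph's positive-only evaluation function
:- set(evalfn, posonly).

% illustrative declarations for learning a theory of the verb predicate
:- modeh(1, verb(+event, #vtype)).
:- modeb(1, subj(+event, -entity)).
:- modeb(1, obj(+event, -entity)).
:- modeb(1, class(#ctype, +entity)).
:- determination(verb/2, subj/2).
:- determination(verb/2, obj/2).
:- determination(verb/2, class/2).
</Paragraph>
</Section>
</Paper>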