File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/w00-0723_abstr.xml
Size: 10,796 bytes
Last Modified: 2025-10-06 13:41:46
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0723"> <Title>Learning IE Rules for a Set of Related Concepts</Title> <Section position="1" start_page="0" end_page="117" type="abstr"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The growing availability of on-line text has led to an increase in the use of automatic approaches to knowledge acquisition from textual data. In fact, a number of Information Extraction (IE) systems have emerged in the past few years in relation to the MUC conferences 1. The aim of an IE system is to automatically extract pieces of information from text that are relevant to a set of prescribed concepts (the scenario). One of the main drawbacks of applying IE systems is the high cost of manually adapting them to new domains and text styles.</Paragraph> <Paragraph position="1"> In recent years, a variety of Machine Learning (ML) techniques have been used to improve the portability of IE systems to new domains, as in SRV (Freitag, 1998), RAPIER (Califf and Mooney, 1997), LIEP (Huffman, 1996), CRYSTAL (Soderland et al., 1995) and WHISK (Soderland, 1999).
However, some drawbacks remain in the portability of these systems: a) existing systems generally depend on the supported text style and learn IE rules for either structured texts, semi-structured texts or free text; b) IE systems are mostly single-concept learning systems; c) consequently, an extractor (e.g., a rule set) is learned independently for each concept within the scenario; d) the order of execution of the learners is set manually, as are the scheduling and the way of combining the resulting extractors; and e) regarding the training data, the available training corpora can be too small to accurately learn extractors for all the concepts within the scenario 2.</Paragraph> <Paragraph position="2"> text style and domain.</Paragraph> <Paragraph position="3"> This paper describes EVIUS, a multi-concept learning system for free text that follows a multi-strategy constructive learning approach (MCL) (Michalski, 1993) and copes with insufficient amounts of training corpora. EVIUS is a component of a multilingual IE system, M-TURBIO (Turmo et al., 1999).</Paragraph> <Paragraph position="4"> 2 EVIUS. Learning rule sets for a set of related concepts The input of EVIUS is both a partially-parsed, semantically-tagged 3 training corpus and a description of the desired target structure. This description is provided as a set of concepts C related by a set of asymmetric binary relations, R.</Paragraph> <Paragraph position="5"> To learn the set S of IE rule sets for the whole of C, EVIUS uses an MCL approach integrating constructive learning, closed-loop learning and deductive restructuring (Ko, 1998). In this multi-concept situation, the system determines which concepts to learn and, later, incrementally updates S. This can be relatively straightforward when using knowledge about the target structure in a closed-loop learning approach.
Starting with C, EVIUS iteratively reduces the set U of unlearned concepts by selecting the subset P ⊆ U formed by the primitive concepts in U and learning a rule set for each c ∈ P 4. 4 ... sort of C is possible, which starts with a set of primitive concepts.</Paragraph> <Paragraph position="6"> 5 Our testing domain is mycology. The texts consist of Spanish descriptions of specimens. There is a rich variety of colour descriptions, including basic colours, intervals, changes, etc.</Paragraph> <Paragraph position="7"> For instance, the single colour scenario 5 in figure 1 is provided to learn from instances of the following three related concepts: colour, as in the instance &quot;azul ligeramente claro&quot; (slightly pale blue); colour_interval, as in &quot;entre rosa y rojo sangre&quot; (between pink and blood red); and to_change, as in &quot;rojo vira a marrón&quot; (red changes to brown).</Paragraph> <Paragraph position="8"> Initially, U = C = {colour, colour_interval, to_change}. Then, EVIUS calculates P = {colour} and, once a rule set has been learned for colour, the new U = {colour_interval, to_change} is studied, identifying P = U.</Paragraph> <Paragraph position="9"> [Figure 1: the single colour scenario, with relation edges labelled &quot;to&quot; and &quot;from&quot;.] In order to learn a rule set for a concept, EVIUS uses the relational learning method explained in section 3, and defines the learning space by means of a dynamic predicate model.
As a pre-process of the system, the training corpus is translated into predicates using the following initial predicate model: a) attributive meta-predicates: pos_X(A), isa_X(A), has_hypernym_X(A), word_X(A) and lemma_X(A), where X is instantiated with closed categories; b) relational meta-predicates: distance_le_X(A,B), stating that there are at most X terminal nodes between A and B; and c) relational predicates: ancestor(A,B), where B is the syntactic ancestor of A, and brother(A,B), where B is the right brother node of A sharing the syntactic ancestor.</Paragraph> <Paragraph position="10"> Once a rule set for concept c is learned, new examples are added for further learning by means of a deductive restructuring approach: training examples are reduced to generate more compact and useful knowledge of the learned concept. This is achieved by using the induced rule set and a syntactico-semantic transformational grammar. In addition, a new predicate isa_c is added to the model.</Paragraph> <Paragraph position="11"> 6 Which is presented here as a partially-parsed tree for simplicity.</Paragraph> <Paragraph position="12"> [Figure 2: partially-parsed tree rooted at S (n12), with terminal categories spec n a v prep n a.]</Paragraph> <Paragraph position="14"> [Figure 2, continued: the tree with terminal categories spec n a v prep and a gnom node.]</Paragraph> <Paragraph position="16"> For instance, in figure 2 6, the Spanish sentence &quot;su color rojo vira a marrón oscuro&quot; (its red colour changes to dark brown) has two examples of colour, n3 and n6+n7, namely &quot;rojo&quot; (red) and &quot;marrón&quot;+&quot;oscuro&quot; (dark brown). The former requires no reduction, whereas the latter is reduced to node n6'. As a consequence, two new attributes are added to the model: isa_colour(n3) and isa_colour(n6'). This new knowledge will be used to learn the concepts to_change and colour_interval.</Paragraph> <Paragraph position="17"> stances of already learned concepts related to c in the scenario.
For instance, the ontology relation to_change(n3,n6') 7, in the same figure, means that the colour represented by instance n3 changes to that represented by n6'.</Paragraph> <Paragraph position="18"> Negative examples E- are automatically selected as explained in section 3.1.</Paragraph> <Paragraph position="19"> 7 Note that, after the deductive restructuring step, both n3 and n6' are instances of the concept colour. If a set of uncovered examples remains after FOIL has run, this is due to the lack of sufficient examples. Thus, the system tries to improve recall by growing the set E+ with artificial examples (pseudo-examples), as explained in section 3.2. FOIL is then run again on the new E+. The resulting rule set R' is combined with R0 to create R1 by appending the new rules from R' to R0. Consequently, the recall of R1 is forced to be at least equal to that of R0, although the accuracy can decrease. A better method seems to be merging the rules from R' and R0 by studying empirical subsumptions. This latter combination makes it possible to create more compact and accurate rule sets.</Paragraph> <Paragraph position="20"> EVIUS uses an incremental learning approach to learn rule sets for each concept. This is done by iterating the process above while uncovered examples remain and the F1 score increment (ΔF1) is greater than a pre-defined constant α: select E+ and generate E-</Paragraph> <Paragraph position="22"/> <Section position="1" start_page="116" end_page="116" type="sub_section"> <SectionTitle> 3.1 Generating relevant negative examples </SectionTitle> <Paragraph position="0"> Negative examples can be defined as any combination of terminal nodes not in E+. However, this approach produces an extremely large number of examples, of which only a small subset is relevant for learning the concept.
Related to this, (Freitag, 1998) uses words to learn only slot rules (learned from text-relation examples), selecting as negative those non-positive word pairs that define a string neither longer than the maximum length in positive examples nor shorter than the minimum.</Paragraph> <Paragraph position="1"> A more general approach is adopted here: a distance is defined between possible examples in the learning space, and a clustering method is applied using positive examples as medoids 8. The N nearest non-positive examples to each medoid can be selected as negative ones. Distance, in our case, must be defined as multidimensional due to the typology of occurring features. It is relatively easy to define distances between examples for word_X and lemma_X predicates, which are 1 when the X values are equal and 0 otherwise. For isa_X predicates, the minimum of all possible conceptual distances (Agirre and Rigau, 1995) between X values in EuroWordNet (EWN) has been used. Greater difficulty is encountered when defining a distance from a morpho-syntactic point of view (e.g., a pronoun seems to be closer to a noun than to a verb). In (Turmo et al., 1999), the concept of 5-set was presented as a generalization of syntactic relations, and a distance measure was based on this concept.</Paragraph> </Section> <Section position="2" start_page="116" end_page="117" type="sub_section"> <SectionTitle> 3.2 Creating pseudo-examples </SectionTitle> <Paragraph position="0"> The method used is inspired by the generation of convex pseudo-data (Breiman, 1998), which uses a process similar to gene combination in genetic algorithms.</Paragraph> <Paragraph position="1"> For each positive example c(A1,... 
,An) 9 of concept c to be dealt with, an attribute vector is defined as</Paragraph> <Paragraph position="3"> where B1,..., Bn are the distinct terminal nodes from A1,..., An, context is the set of all predicates subsumed by the syntactico-semantic structure between the nearest positive example on the left and the nearest one on the right, and sem_X_Bi is the list of isa_X and has_hypernym_X predicates for Bi.</Paragraph> <Paragraph position="4"> Then, for each example left uncovered by the rule set learned by FOIL, a set of pseudo-examples is generated. A pseudo-example is built by combining the vector of the uncovered example with that of a randomly selected covered one, as follows: for each dimension, one of the two possible values is randomly selected as the value for the pseudo-example.</Paragraph> </Section> </Section> </Paper>
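The per-dimension combination described in section 3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the EVIUS implementation: an attribute vector is modelled as a Python dictionary from dimension names to values, and the function names, the `per_example` parameter (how many pseudo-examples to build for each uncovered example, which the text leaves unspecified) and the `seed` parameter are all hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence

# Hypothetical representation: one value per named dimension of the vector.
Vector = Dict[str, Hashable]


def make_pseudo_example(uncovered: Vector, covered: Vector,
                        rng: random.Random) -> Vector:
    """For each dimension, keep the value of one parent chosen at random."""
    if uncovered.keys() != covered.keys():
        raise ValueError("both vectors must have the same dimensions")
    return {dim: rng.choice((uncovered[dim], covered[dim]))
            for dim in uncovered}


def make_pseudo_examples(uncovered_examples: Sequence[Vector],
                         covered_examples: Sequence[Vector],
                         per_example: int = 3,
                         seed: Optional[int] = None) -> List[Vector]:
    """Build `per_example` pseudo-examples for every uncovered example.

    Each pseudo-example pairs the uncovered vector with a randomly
    selected covered vector. `per_example` is an assumption of this sketch.
    """
    if uncovered_examples and not covered_examples:
        raise ValueError("at least one covered example is required")
    rng = random.Random(seed)
    pseudo: List[Vector] = []
    for example in uncovered_examples:
        for _ in range(per_example):
            partner = rng.choice(covered_examples)
            pseudo.append(make_pseudo_example(example, partner, rng))
    return pseudo


if __name__ == "__main__":
    # Toy vectors with invented dimension names and values.
    uncovered = {"word_B1": "rojo", "lemma_B1": "rojo", "context": "ctx_u"}
    covered = {"word_B1": "azul", "lemma_B1": "azul", "context": "ctx_c"}
    for vector in make_pseudo_examples([uncovered], [covered], seed=0):
        print(vector)
```

Every generated vector takes each of its values from one of its two parents, which mirrors the gene-combination analogy in the text. Because the choice is independent per dimension, a pseudo-example may coincide with either parent; whether such duplicates should be filtered out is not stated in the text and is left open here.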