<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1028">
  <Title>Closing the Gap: Learning-Based Information Extraction Rivaling Knowledge-Engineering Methods</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Figure 2 (excerpt): an example output template.
0  MESSAGE: ID                    TST3-MUC4-0014
1  MESSAGE: TEMPLATE              1
2  INCIDENT: DATE                 19-JAN-89
3  INCIDENT: LOCATION             PERU: SAN JUAN BAUTISTA (MUNICIPALITY)
4  INCIDENT: TYPE                 BOMBING
5  INCIDENT: STAGE OF EXECUTION   ACCOMPLISHED
6  INCIDENT: INSTRUMENT ID        "BOMB"
7  INCIDENT: INSTRUMENT TYPE      BOMB: "BOMB"
8  PERP: INCIDENT CATEGORY        TERRORIST ACT
9  PERP: INDIVIDUAL ID            "SHINING PATH MEMBERS"
10 PERP: ORGANIZATION ID          "SHINING PATH"
11 PERP: ORGANIZATION CONFIDENCE  SUSPECTED OR ACCUSED BY AUTHORITIES: "SHINING PATH"
12 PHYS TGT: ID                   -
13 PHYS TGT: TYPE                 -
14 PHYS TGT: NUMBER               -
15 PHYS TGT: FOREIGN NATION       -
</SectionTitle>
    <Paragraph position="0"> proach, where manually engineered rules were used for IE. More recently, machine learning approaches have been used for IE from semi-structured texts (Califf and Mooney, 1999; Soderland, 1999; Roth and Yih, 2001; Ciravegna, 2001; Chieu and Ng, 2002a), named entity extraction (Chieu and Ng, 2002b), template element extraction, and template relation extraction (Miller et al., 1998). These machine learning approaches have been successful for these tasks, achieving accuracy comparable to the knowledge-engineering approach.</Paragraph>
    <Paragraph position="1"> However, for the full-scale ST task of generic IE from free texts, the best reported method to date is still the knowledge-engineering approach. For example, almost all participating IE systems in MUC used the knowledge-engineering approach for the full-scale ST task. The one notable exception is the work of UMass at MUC-6 (Fisher et al., 1995).</Paragraph>
    <Paragraph position="2"> Unfortunately, their learning approach did considerably worse than the best MUC-6 systems. Soderland (1999) and Chieu and Ng (2002a) attempted machine learning approaches for a scaled-down version of the ST task, where it was assumed that the information needed to fill one template came from one sentence only.</Paragraph>
    <Paragraph position="3"> In this paper, we present a learning approach to the full-scale ST task of extracting information from free texts. The task we tackle is considerably more complex than that of (Soderland, 1999; Chieu and Ng, 2002a), since we need to deal with merging information from multiple sentences to fill one template. We evaluated our learning approach on the MUC-4 task of extracting terrorist events from free texts. We chose the MUC-4 task since manually prepared templates required for training are available.1 When trained and tested on the official benchmark data of MUC-4, our learning approach achieves accuracy competitive with the best MUC-4 systems, which were all built using manually engineered rules. To our knowledge, our work is the first learning-based approach to have achieved performance competitive with the knowledge-engineering approach on the full-scale ST task.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Task Definition
</SectionTitle>
    <Paragraph position="0"> The task addressed in this paper is the Scenario Template (ST) task defined in the Fourth Message Understanding Conference (MUC-4).2 The objective of this task is to extract information on terrorist events occurring in Latin American countries from free text documents. For example, given the input document in Figure 1, an IE system is to extract information items related to any terrorist events to fill zero or more database records, or templates. Each distinct terrorist event is to fill one template. An example of an output template is shown in Figure 2. Each of the 25 fields in the template is called a slot, and the string or value that fills a slot is called a slot fill.</Paragraph>
    <Paragraph position="1"> Different slots in the MUC-4 template need to be treated differently. Besides slot 0 (MESSAGE: ID) and slot 1 (MESSAGE: TEMPLATE), the other 23 slots have to be extracted or inferred from the text document. These slots can be divided into the following categories: String Slots. These slots are filled using strings extracted directly from the text document (slot 6, 9, 10, 12, 18, 19).</Paragraph>
    <Paragraph position="2"> Text Conversion Slots. These slots have to be inferred from strings in the document (slot 2, 14, 17, 21, 24). For example, INCIDENT: DATE has to be inferred from temporal expressions such as &amp;quot;TO- null and MUC-7, when other subtasks like NE and TE tasks were defined. Here, we adopted this terminology also in describing the full-scale IE task for MUC-4.</Paragraph>
    <Paragraph position="3"> Figure 3: ALICE: our information extraction system DAY&amp;quot;, &amp;quot;LAST WEEK&amp;quot;, etc.</Paragraph>
    <Paragraph position="4"> Set Fill Slots. This category includes the rest of the slots. The value of a set fill slot comes from a finite set of possible values. They often have to be inferred from the document.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 The Learning Approach
</SectionTitle>
    <Paragraph position="0"> Our supervised learning approach is illustrated in  requires manually extracted templates paired with their corresponding documents that contain terrorist events for training. After the training phase, ALICE is then able to extract relevant templates from new documents, using the model learnt during training. In the training phase, each input training document is first preprocessed through a chain of preprocessing modules. The outcome of the preprocessing is a full parse tree for each sentence, and coreference chains linking various coreferring noun phrases both within and across sentences. The core of ALICE uses supervised learning to build one classifier for each string slot. The candidates to fill a template slot are base (non-recursive) noun phrases. A noun phrase a0a2a1 that occurs in a training document a3 and fills a template slot a4 is used to generate one positive training example for the classifier of slot a4 . Other noun phrases in the training document a3 are negative training examples for the classifier of slot a4 . The features of a training example generated from a0a5a1 are the verbs and other noun phrases (serving roles like agent and patient) related to a0a5a1 in the same sentence, as well as similar features for coreferring noun phrases of a0a5a1 . Thus, our features for a template slot classifier encode semantic (agent and patient roles) and discourse (coreference) information.</Paragraph>
    <Paragraph position="1"> Our experimental results in this paper demonstrate that such features are effective in learning what to fill a template slot.</Paragraph>
    <Paragraph position="2"> During testing, a new document is preprocessed through the same chain of preprocessing modules.</Paragraph>
    <Paragraph position="3"> Each candidate noun phrase a0a5a1 generates one test example, and it is presented to the classifier of a template slot a4 to determine whether a0a5a1 fills the slot a4 . A separate template manager decides whether a new template should be created to include slot a4 , or slot a4 should fill the existing template.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Preprocessing
</SectionTitle>
      <Paragraph position="0"> All the preprocessing modules of ALICE were built with supervised learning techniques. They include sentence segmentation (Ratnaparkhi, 1998), part-of-speech tagging (Charniak et al., 1993), named entity recognition (Chieu and Ng, 2002b), full parsing (Collins, 1999), and coreference resolution (Soon et al., 2001). Each module performs at or near state-of-the-art accuracy, but errors are unavoidable, and later modules in the preprocessing chain have to deal with errors made by the previous modules.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Features in Training and Test Examples
</SectionTitle>
      <Paragraph position="0"> As mentioned earlier, the features of an example are generated based on a base noun phrase (denoted as baseNP), which is a candidate for filling a template slot. While most strings that fill a string slot are base noun phrases, this is not always the case. For instance, consider the two examples in Figure 4. In the first example, &amp;quot;BOMB&amp;quot; should fill the string slot IN-CIDENT: INSTRUMENT ID, while in the second example, &amp;quot;FMLN&amp;quot; should fill the string slot PERP: ORGANIZATION ID. However, &amp;quot;BOMB&amp;quot; is itself not a baseNP (the baseNP is &amp;quot;A BOMB EXPLO-SION&amp;quot;). Similarly for &amp;quot;FMLN&amp;quot;.</Paragraph>
      <Paragraph position="1"> As such, a string that fills a template slot but is itself not a baseNP (like &amp;quot;BOMB&amp;quot;) is also used to generate a training example, by using its smallest encompassing noun phrase (like &amp;quot;A BOMB EXPLO- null (1) ONE PERSON WAS KILLED TONIGHT AS THE RE-SULT OF A BOMB EXPLOSION IN SAN SALVADOR.</Paragraph>
      <Paragraph position="2"> (2) FORTUNATELY, NO CASUALTIES WERE REPORTED</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
AS A RESULT OF THIS INCIDENT, FOR WHICH THE
FMLN GUERRILLAS ARE BEING HELD RESPONSIBLE.
</SectionTitle>
    <Paragraph position="0"> not be filled by baseNPs.</Paragraph>
    <Paragraph position="1"> (1) MEMBERS OF THAT SECURITY GROUP ARE COMB-</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ING THE AREA TO DETERMINE THE FINAL OUTCOME
OF THE FIGHTING.
(2) A BOMB WAS THROWN AT THE HOUSE OF FRE-
DEMO CANDIDATE FOR DEPUTY MIGUEL ANGEL
BARTRA BY TERRORISTS.
</SectionTitle>
    <Paragraph position="0"> SION&amp;quot;) to generate the training example features.</Paragraph>
    <Paragraph position="1"> During training, a list of such words is compiled for slots 6 and 10 from the training templates. During testing, these words are also used as candidates for generating test examples for slots 6 and 10, in addition to base NPs.</Paragraph>
    <Paragraph position="2"> The features of an example are derived from the treebank-style parse tree output by an implementation of Collins' parser (Collins, 1999). In particular, we traverse the full parse tree to determine the verbs, agents, patients, and indirect objects related to a noun phrase candidate a0a5a1 . While a machine learning approach is used in (Gildea and Jurafsky, 2000) to determine general semantic roles, we used a simple rule-based traversal of the parse tree instead, which could also reliably determine the generic agent and patient role of a sentence, and this suffices for our current purpose.</Paragraph>
    <Paragraph position="3"> Specifically, for a given noun phrase candidate a0a5a1 , the following groups of features are used: Verb of Agent NP (VAg) When a0a2a1 is an agent in a sentence, each of its associated verbs is a VAg feature. For example, in sentence (1) of Figure 5, if a0a5a1 is MEMBERS, then its VAg features are COMB and DETERMINE.</Paragraph>
    <Paragraph position="4"> Verb of Patient NP (VPa) When a0a2a1 is a patient in a sentence, each of its associated verbs is a VPa feature. For example, in sentence (2) of Figure 5, if a0a5a1 is BOMB, then its VPa feature is THROW.</Paragraph>
    <Paragraph position="5"> Verb-Preposition of NP-in-PP (V-Prep) When a0a5a1 is the NP in a prepositional phrase PP, then this feature is the main verb and the preposition of PP.</Paragraph>
    <Paragraph position="6"> For example, in sentence (2) of Figure 5, if a0a5a1 is HOUSE, its V-Prep feature is THROW-AT.</Paragraph>
    <Paragraph position="7"> VPa and related NPs/PPs (VPaRel) If a0a2a1 is a patient in a sentence, each of its VPa may have its own agents (Ag) and prepositional phrases (Prep-NP). In this case, the tuples (VPa, Ag) and (VPa, Prep-NP) are used as features. For example, in</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
&amp;quot;GUARDS WERE SHOT TO DEATH&amp;quot;, if a0a5a1 is
</SectionTitle>
    <Paragraph position="0"> GUARDS, then its VPa SHOOT, and the prepositional phrase TO-DEATH form the feature (SHOOT, TO-DEATH).</Paragraph>
    <Paragraph position="1"> VAg and related NPs/PPs (VAgRel) This is similar to VPa above, but for VAg.</Paragraph>
    <Paragraph position="2"> V-Prep and related NPs (V-PrepRel) When a0a5a1 is the NP in a prepositional phrase PP, then the main verb (V) may have its own agents (Ag) and patients (Pa). In this case, the tuples (Ag, V-Prep) and (V-Prep, Pa) are used as features. For example,  used as a feature. In a parse tree, there is a head word at each tree node. In cases where a phrase does not fit into a parse tree node, the last word of the phrase is used as the head word. This feature is useful as the system has no information of the semantic class of a0a5a1 . From the head word, the system can get some clue to help decide if a0a5a1 is a possible candidate for a slot. For example, an a0a2a1 with head word PEASANT is more likely to fill the human target slot compared to another a0a5a1 with head word CLASH.</Paragraph>
    <Paragraph position="3"> Named Entity Class (NE) The named entity class of a0a5a1 is used as a feature.</Paragraph>
    <Paragraph position="4"> Real Head (RH) For a phrase that does not fit into a parse node, the head word feature is taken to be the last word of the phrase. The real head word of its encompassing parse node is used as another feature. For example, in the NP &amp;quot;FMLN GUERRILLAS&amp;quot;, &amp;quot;FMLN&amp;quot; is a positive example for slot 10, with head word &amp;quot;FMLN&amp;quot; and real head &amp;quot;GUERRILLA&amp;quot;. Coreference features Coreference chains found by our coreference resolution module based on decision tree learning are used to determine the noun phrases that corefer with a0a5a1 . In particular, we use the two noun phrases a0a5a1 a0 a1 and a0a2a1a3a2 a1 , where a0a5a1 a0 a1 (a0a5a1a4a2 a1 ) is the noun phrase that corefers with a0a5a1 and immediately precedes (follows) a0a2a1 . If such a preceding (or following) noun phrase a0a5a1a4a5 exists, we generate the following features based on a0a5a1 a5 : VAg, VPa, and N-Prep.</Paragraph>
    <Paragraph position="5"> To give an idea of the informative features used in the classifier of a slot, we rank the features used for a slot classifier according to their correlation metric values (Chieu and Ng, 2002a), where informative features are ranked higher. Table 1 shows the top-ranking features for a few feature groups and template slots. The bracketed number behind each feature indicates the rank of this feature for that slot classifier, ordered by the correlation metric value.</Paragraph>
    <Paragraph position="6"> We observed that certain feature groups are more useful for certain slots. For example, DIE is the top VAg verb for the human target slot, and is ranked 12 among all features used for the human target slot.</Paragraph>
    <Paragraph position="7"> On the other hand, VAg is so unimportant for the physical target slot that the top VAg verb is due to a preprocessing error that made MONSERRAT a verb.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Supervised Learning Algorithms
</SectionTitle>
      <Paragraph position="0"> We evaluated four supervised learning algorithms.</Paragraph>
      <Paragraph position="1"> Maximum Entropy Classifier (Alice-ME) The maximum entropy (ME) framework is a recent learning approach which has been successfully used in various NLP tasks such as sentence segmentation, part-of-speech tagging, and parsing (Ratnaparkhi, 1998). However, to our knowledge, ours is the first research effort to have applied ME learning to the full-scale ST task. We used the implementation of maximum entropy modeling from the opennlp.maxent package.3.</Paragraph>
      <Paragraph position="2"> Support Vector Machine (Alice-SVM) The Support Vector Machine (SVM) (Vapnik, 1995) has been successfully used in many recent applications such as text categorization and handwritten digit recognition. The learning algorithm finds a hyper-plane that separates the training data with the largest margin. We used a linear kernel for all our experiments. null  test example to the class which has the highest posterior probability. Add-one smoothing was used. Decision Tree (Alice-DT) The decision tree (DT) algorithm (Quinlan, 1993) partitions training examples using the feature with the highest information gain. It repeats this process recursively for each partition until all examples in each partition belong to one class.</Paragraph>
      <Paragraph position="3"> We used the WEKA package4 for the implementation of SVM, NB, and DT algorithms.</Paragraph>
      <Paragraph position="4"> A feature cutoff a6 is used for each algorithm: features occurring less than a6 times are rejected. For all experiments, a6 is set to 3. For ME and SVM, no other feature selection is applied. For NB and DT, the top 100 features as determined by chi-square are selected. While not trying to do a serious comparison of machine learning algorithms, ME and SVM seem to be able to perform well without feature selection, whereas NB and DT require some form of feature selection in order to perform reasonably well.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Template Manager
</SectionTitle>
      <Paragraph position="0"> As each sentence is processed, phrases classified as positive for any of the string slots are sent to the Template Manager (TM), which will decide if a new template should be created when it receives a new slot fill.</Paragraph>
      <Paragraph position="1"> The system first attempts to attach a date and a location to each slot fill a0a5a1 . Dates and locations are first attached to their syntactically nearest verb, by traversing the parse tree. Then, for each string fill a0a5a1 , we search its syntactically nearest verb a7 in the same manner and assign the date and location attached to a7 to a0a5a1 .</Paragraph>
      <Paragraph position="2"> When a new slot fill is found, the Template Manager will decide to start a new template if one of the following conditions is true: Date The date attached to the current slot fill is different from the date of the current template.</Paragraph>
      <Paragraph position="3">  This is determined by using location lists provided by the MUC-4 conference, which specify whether one location is contained in another. An entry in this list has the format of &amp;quot;PLACE-NAME1:PLACE-NAME2&amp;quot;, where PLACE-NAME2 is contained in PLACE-NAME1 (e.g., CUBA: HAVANA (CITY)).</Paragraph>
      <Paragraph position="4"> Seed Word The sentence of the current slot fill contains a seed word for a different incident type. A number of seed words are automatically learned for each of the incident types ATTACK, BOMBING, and KIDNAPPING. They are automatically derived based on the correlation metric value used in (Chieu and Ng, 2002a). For the remaining incident types, there are too few incidents in the training data for seed words to be collected. The seeds words used are shown in Table 2.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.5 Enriching Templates
</SectionTitle>
      <Paragraph position="0"> In the last stage before output, the template content is further enriched in the following manner: Removal of redundant slot fills For each slot in the template, there might be several slot fills referring to the same thing. For example, for HUM TGT: DESCRIPTION, the system might have found both &amp;quot;PRIESTS&amp;quot; and &amp;quot;JESUIT PRIESTS&amp;quot;. A slot fill that is a substring of another slot fill will be removed from the template.</Paragraph>
      <Paragraph position="1"> Effect/Confidence and Type Classifiers are also trained for effect and confidence slots 11, 16, and 23 (ES slots), as well as type slots 7, 13, and 20 (TS slots). ES slots used exactly the same features as string slots, while TS slots used only head words and adjectives as features. For such slots, each entry refers to another slot fill. For example, slot 23 may contain the entry &amp;quot;DEATH&amp;quot; : &amp;quot;PRIESTS&amp;quot;, where &amp;quot;PRIESTS&amp;quot; fills slot 19. During training, each training example is a fill of a reference slot (e.g., for slot 23, the reference slots are slot 18 and 19). For slot 23, for example, each instance will have a class such as DEATH or INJURY, or if there is no entry in slot 23, UNKNOWN EFFECT. During testing, slot fills of reference slots will be classified to determine if they should have an entry in an ES or a TS slot.</Paragraph>
      <Paragraph position="2"> Date and Location. If the system is unable to fill the DATE or LOCATION slot of a template, it will use as default value the date and country of the city in the dateline of the document.</Paragraph>
      <Paragraph position="3"> Other Slots. The remaining slots are filled with default values. For example, slot 5 has the default value &amp;quot;ACCOMPLISHED&amp;quot;, and slot 8 &amp;quot;TERROR-IST ACT&amp;quot; (except when the perpetrator contains strings such as &amp;quot;GOVERNMENT&amp;quot;, in which case it will be changed to &amp;quot;STATE-SPONSORED VIO-LENCE&amp;quot;). Slot 15, 17, 22, and 24 are always left unfilled.</Paragraph>
    </Section>
  </Section>
  <Section position="10" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> There are 1,300 training documents, of which 700 are relevant (i.e., have one or more event templates).</Paragraph>
    <Paragraph position="1"> There are two official test sets, i.e., TST3 and TST4, containing 100 documents each. We trained our system ALICE using the 700 documents with relevant templates, and then tested it on the two official test sets. The output templates were scored using the scorer provided on the official website.</Paragraph>
    <Paragraph position="2"> The accuracy figures of ALICE (with different learning algorithms) on string slots and all slots are listed in Table 3 and Table 4, respectively. Accuracy is measured in terms of recall (R), precision (P), and F-measure (F). We also list in the two tables the accuracy figures of the top 7 (out of a total of 17) systems that participated in MUC-4. The accuracy figures in the two tables are obtained by running the official scorer on the output templates of ALICE, and those of the MUC-4 participating systems (available  on the official web site). The same history file downloaded from the official web site is uniformly used for scoring the output templates of all systems (the history file contains the arbitration decisions for ambiguous cases).</Paragraph>
    <Paragraph position="3"> We conducted statistical significance test, using the approximate randomization method adopted in MUC-4. Table 5 shows the systems that are not significantly different from Alice-ME.</Paragraph>
    <Paragraph position="4"> Our system ALICE-ME, using a learning approach, is able to achieve accuracy competitive to the best of the MUC-4 participating systems, which were all built using manually engineered rules. We also observed that ME and SVM, the more recent machine learning algorithms, performed better than DT and NB.</Paragraph>
    <Paragraph position="5"> Full Parsing. To illustrate the benefit of full parsing, we conducted experiments using a subset of features, with and without full parsing. We used ME as the learning algorithm in these experiments. The results on string slots are summarized in Table 6. The  baseline system used only two features, head word (H) and named entity class (NE). Next, we added three features, VAg, VPa, and V-Prep. Without full parsing, these verbs were obtained based on the immediately preceding (or following) verb of a noun phrase, and the voice of the verb. With full parsing, these verbs were obtained based on traversing the full parse tree. The results indicate that verb features contribute to the performance of the system, even without full parsing. With full parsing, verbs can be determined more accurately, leading to better overall performance.</Paragraph>
  </Section>
  <Section position="11" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> Although the best MUC-4 participating systems, GE/GE-CMU, still outperform ALICE-ME, it must be noted that for GE, &amp;quot;10 1/2 person months&amp;quot; were spent on MUC-4 using the GE NLTOOLSET , after spending &amp;quot;15 person months&amp;quot; on MUC-3 (Rau et al., 1992). With a learning approach, IE systems are more portable across domains.</Paragraph>
    <Paragraph position="1"> Not all occurrences of a string in a document that match a slot fill of a template provide good positive training examples. For example, in the same document, there might be the following sentences &amp;quot;THE</Paragraph>
  </Section>
  <Section position="12" start_page="0" end_page="0" type="metho">
    <SectionTitle>
MNR REPORTS THE KIDNAPPING OF OQUELI
</SectionTitle>
    <Paragraph position="0"> COLINDRES...&amp;quot;, followed by &amp;quot;OQUELI COLIN-</Paragraph>
  </Section>
class="xml-element"></Paper>