<?xml version="1.0" standalone="yes"?>
<Paper uid="M92-1025">
  <Title>BOMBING ACCOMPLISHED &amp;quot;BOMB&amp;quot; BOMB: &amp;quot;BOMB&amp;quot; TERRORIST ACT &amp;quot;FARABUNDO MARTI NATIONAL LIBERATION FRONT&amp;quot; POSSIBLE: &amp;quot;FARABUNDO MARTI NATIONAL LIBERATION FRONT&amp;quot; DESTROYED: &amp;quot;-&amp;quot; &amp;quot;BODYGUARDS&amp;quot; SECURITY GUARD : &amp;quot;BODYGUARDS&amp;quot; PLURAL: &amp;quot;BODYGUARDS&amp;quot;</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> The GE NLTooLsET aims at extracting and deriving useful information from text using a knowledge-based , domain-independent core of text processing tools, and customizing the existing programs to each new task .</Paragraph>
    <Paragraph position="1"> The program achieves this transportability by using a core knowledge base and lexicon that adapts easil y to new applications, along with a flexible text processing strategy that is tolerant of gaps in the program 's knowledge base .</Paragraph>
    <Paragraph position="2"> The language analysis strategy in the NLTooLsET uses fairly detailed, chart-style syntactic parsing guided by conceptual expectations . Domain-driven conceptual structures provide feedback in parsing, con tribute to scoring alternative interpretations, help recovery from failed parses, and tie together information across sentence boundaries. The interaction between linguistic and conceptual knowledge sources at the leve l of linguistic relations, called &amp;quot;relation-driven control&amp;quot; was added to the system in a first implementation be fore MUC-4 .</Paragraph>
    <Paragraph position="3"> In addition to flexible control, the design of the NLTooLsET allows each knowledge source to influenc e different stages of processing . For example, discourse processing starts before parsing, although many decisions about template merging and splitting are made after parsing . This allows context to guide language analysis, while language analysis still determines context .</Paragraph>
    <Paragraph position="4"> The NLTooLsET, now in Version 3 .0, has been developed and extended during the three years since th e MUCK-II evaluation . During this time, several person-years of development have gone into the system . The fundamental knowledge-based strategy has remained basically unchanged, but various modules have been extended and replaced, and new components have been added while the system has served as a testbed for a variety of experiments . The only new module added for MUC-4 was a mechanism for dealing with spatia l and temporal information ; most of the other improvements to the system were knowledge base extensions , enhancements to existing components, and bug fixes .</Paragraph>
    <Paragraph position="5"> The next section briefly describes the major portions of the NLTooLsET and its control flow ; the remainder of the paper will discuss the application of the Toolset to the MUC-4 task .</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="177" type="metho">
    <SectionTitle>
SYSTEM OVERVIEW
</SectionTitle>
    <Paragraph position="0"> Processing in the NLTooLsET divides roughly into three stages : (1) pre-processing, consisting mainly of a pattern matcher and discourse processing module, (2) linguistic analysis, including parsing and semanti c  interpretation, and (3) post-processing, or template filling . Each stage of analysis applies a combination o f linguistic, conceptual, and domain knowledge, as shown in Figure 1 .</Paragraph>
    <Paragraph position="1">  The pre-processor uses lexico-semantic patterns to perform some initial segmentation of the text, identifying phrases that are template activators, filtering out irrelevant text, combining and collapsing som e linguistic constructs, and marking portions of text that could describe discrete events . This component i s described in [1] . Linguistic analysis combines parsing and word sense-based semantic interpretation wit h domain-driven conceptual processing . The programs for linguistic analysis are largely those explained i n [2, 3]--the changes made for MUC-4 involved mainly some additional mechanisms for recovering from faile d processing and heavy pruning of spurious parses . Post-processing includes the final selection of template s and mapping semantic categories and roles onto those templates . This component used the basic elements from MUCK-II, adding a number of specialized rules for handling guerrilla warfare, types, and refines th e discourse structures to perform the template splitting and merging required for MUC-3 and MUC-4.</Paragraph>
    <Paragraph position="2"> The control flow of the system is primarily from linguistic analysis to conceptual interpretation to domai n interpretation, but there is substantial feedback from conceptual and domain interpretation to linguisti c analysis. The MUC-4 version of the Toolset includes a version of a strategy called relation-driven control, which helps to mediate between the various knowledge sources involved in interpretation . Basically, relation-driven control gives each linguistic relation in the text (such as subject-verb, verb-complement, or verb adjunct) a preference score based on its interpretation in context . Because these relations can apply to a great many different surface structures, relation-driven control provides a means of combining preference s without the tremendous combinatorics of scoring many complete parses . Effectively, relation-driven contro l permits a &amp;quot;beam&amp;quot; strategy for considering multiple interpretations without producing hundreds or thousand s of new paths through the linguistic chart .</Paragraph>
    <Paragraph position="3"> The knowledge base of the system, consisting of a feature and function (unification-style) grammar wit h associated linguistic relations, and a core sense-based lexicon, still proves transportable and largely generic . The core lexicon contains over 10,000 entries, of which 37 are restricted because of specialized usage in th e MUC-4 domain (such as device, which always means a bomb, and plant, which as a verb usually means to place a bomb and as a noun usually means the target of an attack) . The core grammar contains abou t 170 rules, with 50 relations and 80 additional subcategories . There were 23 MUC-specific additions to this grammatical knowledge base, including 8 grammar rules, most of them dealing with unusual noun phrase s that describe organizations in the corpus .</Paragraph>
    <Paragraph position="4"> The control, pre-processing, and transportable knowledge base were all extremely successful for MUC-4 ; remarkably, lexical and grammatical coverage, along with the associated problems in controlling search an d selecting among interpretations, proved not to be the major stumbling blocks for our system . While the program rarely produce an incorrect answer as a result of a sentence interpretation error, it frequently fail s  to distinguish multiple events, resolve vague or subtle references, and pick up subtle clues from non-ke y sentences. These are the major areas for future improvements in MUC-like tasks .</Paragraph>
  </Section>
  <Section position="5" start_page="177" end_page="177" type="metho">
    <SectionTitle>
ANALYSIS OF TST2-0048
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="177" end_page="177" type="sub_section">
      <SectionTitle>
Overview of Example
</SectionTitle>
      <Paragraph position="0"> TST2-0048 is faily representative of how the NLTooLSET performed on MUC-4 . The program successfully interpreted most of the key sentences but missed some references and failed to tie some additional informatio n in to the main event . As a result, it filled two templates for what should have been one event and misse d some additional fills . The program thus derived 53 slots out of a possible 52, with 34 correct, 19 missing , and 19 spurious for .65 recall, .64 precision, and .35 overgeneration . We made no special effort to adapt th e system or fix problems for this particular example ; in fact, we used TST2 as a &amp;quot;blind&amp;quot; test and did not d o any development on that set at all .</Paragraph>
      <Paragraph position="1"> Detail of Message Ru n This example is actually quite simple at the sentence level 1 : The sentences are fairly short and grammatical , especially when compared to some of the convoluted propaganda stories, and TRUMP had no real problems with them . The story is difficult from a discourse perspective, because it returns to the main event (th e attack on Alvarado) essentially without any cue after describing a background event (the attack on Merino' s home). In addition, the story is difficult and a bit unusual in the implicit information that is captured in the answer key--that the seven children, because they were home when Merino's house was attacked, ar e targets . Most of the difference between our system's response and the correct templates was due to thes e two story-level problems .</Paragraph>
      <Paragraph position="2"> The program made one or two other minor mistakes ; for example, it was penalized for filling in &amp;quot;INDIVIDUAL&amp;quot; as a perpetrator (from the phrase AN INDIVIDUAL PLACED A BOMB ON THE ROOF OF THE ARMORED VEHICLE), an apparently correct fill that could have been resolved to &amp;quot;URBAN GUER-RILLAS&amp;quot;. It missed the SOME DAMAGE effect for the vehicle, which should have been inferred from th e fact that the story later says the roof of the vehicle collapsed .</Paragraph>
      <Paragraph position="3"> The system correctly parsed most of the main sentences, correctly linked the accusation in the firs t sentence to the murder of the Attorney General in the same sentence, and correctly separated the secon d event, which was distinguished by the temporal expression 5 days ago .</Paragraph>
      <Paragraph position="4"> As explained earlier, the Toolset uses pattern matching for pre-processing, followed by discourse processing, parsing and semantic interpretation, and finally template-filling . The pre-processor in this example filters out most of the irrelevant sentences (and, in this case, two of the relevant ones), recognizes mos t of the compound names (e .g. SALVADORAN PRESIDENT-ELECT ALFREDO CRISTIANI and AT-</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="177" end_page="179" type="metho">
    <SectionTitle>
TORNEY GENERAL ROBERTO GARCIA ALVARADO). The pre-processor marks phrases that activate
</SectionTitle>
    <Paragraph position="0"> templates (such as A BOMB PLACED and CLAIMED CREDIT), brackets out phrases like source an d location (ACCORDING TO CRISTIANI and IN DOWNTOWN SAN SALVADOR), and tags a few word s with part-of-speech to help the parser (e .g. auxiliaries (HAS), complementizers (THAT), and certain verb s following &amp;quot;to&amp;quot; (COLLAPSE)) .</Paragraph>
    <Paragraph position="1"> The last stage of pre-processing is a discourse processing module, which attempts a preliminary segmentation of the input story using temporal, spatial, and other cues, event types, and looking for certain definit e and indefinite descriptions of events . In this case, the module identifies five potential segments. The first three turn out to be different descriptions of the same event (the killing of Alvarado), but they are late r correctly merged into one template . The fourth segment is correctly identified as a new event (the attack o n Merino's home). The fifth segment (describing the injury to Alvarado's bodyguards) is correctly treated a s a new description, but is never identified as being part of the same event as the attack on Alvarado .</Paragraph>
    <Paragraph position="2"> Linguistic analysis parses each sentence and produces (possibly alternative) semantic interpretations a t the sentence level . These interpretations select word senses and roles, heavily favoring domain-specific senses . The parser did fail in one important sentence in TST2-0048 : In the sentence &amp;quot;A 15-YEAR-OLD NIECE O F I See Appendix F for the text and answer templates for the example .</Paragraph>
  </Section>
  <Section position="7" start_page="179" end_page="183" type="metho">
    <SectionTitle>
MERINO'S WAS INJURED&amp;quot;, it could not parse the apostrophe-s construct. This was a harmless failure
</SectionTitle>
    <Paragraph position="0"> because it occurs between a noun phrase and a verb phrase, and one of the parser's recovery strategie s attaches any remaining compatible fragments that will contribute to a template fill.</Paragraph>
    <Paragraph position="1"> The interpretation of each sentence is interleaved with domain-driven analysis . The conceptual analyzer, TRUMPET, takes the results of interpreting each phrase and tries to map them onto domain-base d expectations, determining, for example, the appropriate role for the FMLN in &amp;quot;ACCUSED THE FMLN&amp;quot; a s well as associating &amp;quot;support&amp;quot; events (such as accusations and effects) with main events (such as attacks o r bombings). Because the discourse pre-processing module is prone to error, TRUMPET has begun to play a major role in resolving references as well as in guiding semantic interpretation .</Paragraph>
    <Paragraph position="2"> Post-processing maps the semantic interpretations onto templates, eliminating invalid fills (in this cas e none), combining certain multiple references (in the attack on Alvarado), and &amp;quot;cleaning up&amp;quot; the final output . Interpretation of Key Sentence s The TRUMP parser of the NLTooLSET successfully parsed and interpreted the first sentence (Si) and correctly applied conjunction reduction to get Cristiani as the accuser and get the `&amp;quot;SUSPECTED OR AC-CUSED BY AUTHORITIES&amp;quot; fill. Embedded clauses are typically handled in much the same way as main clauses, except that the main clauses often add information about the CONFIDENCE slot . The syste m correctly treats the main event and the accusing as a single event, in spite of ignoring the definite referenc e &amp;quot;THE CRIME&amp;quot; . In our system, linking an accusation (C-BLAME-TEMPLATE in the output below) to an event is the default .</Paragraph>
    <Paragraph position="3"> The following is the pre-processed input and final sentence-level interpretation of Si :  The next set of examples sentences (S11-13) are more difficult . There was one parser failure, with a successful recovery . As we have mentioned, we correctly identify this as a new event based on tempora l information, but filter out S12 because it has no explicit event reference . This is not a bug--this sort of implicit target description is fairly infrequent, so we chose not to address it at this stage .</Paragraph>
    <Paragraph position="5"> The system filters S21 (this is an omission, because &amp;quot;ESCAPED UNSCATHED&amp;quot; should be recognize d as an effect), but successfully interprets S22 and resolves &amp;quot;ONE OF THEM&amp;quot; to &amp;quot;BODYGUARDS&amp;quot; . Note that it is the pronoun &amp;quot;THEM&amp;quot;, not &amp;quot;ONE&amp;quot;, that gets resolved, using a simple reference resolution heuristi c that looks for the most recent syntactically and semantically compatible noun phrase . However, this action results in a penalty rather than a reward because the system does not tie the injury to the attack on Alvarado at the beginning of the story.</Paragraph>
    <Paragraph position="6">  case preceded by %, and blank slot (-) fills have been deleted to save space) .</Paragraph>
    <Paragraph position="7"> 0. MESSAGE: ID TST2-MUC4-0048 1. MESSAGE: TEMPLATE 1 2. INCIDENT: DATE - 19 APR 89 3. INCIDENT: LOCATION EL SALVADOR : SAI SALVADOR (CITY ) 4. INCIDENT: TYPE BOMBING 5. INCIDENT: STAGE OF EXECUTION ACCOMPLISHED 6. INCIDENT: INSTRUMENT ID &amp;quot;BOMB&amp;quot; 7. INCIDENT: INSTRUMENT TYPE BOMB: &amp;quot;BOMB&amp;quot; 8. PERP: INCIDENT CATEGORY TERRORIST ACT 9. PERP: INDIVIDUAL ID &amp;quot;URBAN GUERRILLAS&amp;quot; &amp;quot;INDIVIDUAL&amp;quot; % spurious fill 10. PERP: ORGANIZATION ID &amp;quot;FARABUNDO MARTI NATIONAL LIBERATION FRONT&amp;quot; 11. PERP: ORGANIZATION.CONFIDENCE SUSPECTED OR ACCUSED BY AUTHORITIES : &amp;quot;FARABUNDO MARTI NATIONAL LIBERATION FRONT&amp;quot; 12. PRYS TGT: ID &amp;quot;HIS VEHICLE&amp;quot; 13. PRYS TGT: TYPE TRANSPORT VEHICLE: &amp;quot;HIS VEHICLE&amp;quot; 14. PRYS TGT: NUMBER 1: &amp;quot;HIS VEHICLE&amp;quot; 16. PHYS TGT: EFFECT OF INCIDENT - I missed SOME DAMAGE : &amp;quot;HIS VEHICLE&amp;quot; 18. HUM TGT: LAME &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot; 19. HUM TGT: DESCRIPTION &amp;quot;ATTORNEY GENERAL&amp;quot; : &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot; % missed fills DRIVER % missed fills BODYGUARDS &amp;quot;ATTORNEY GENERAL&amp;quot; % spurious fill 20. HUM TGT: TYPE GOVERNMENT OFFICIAL : &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot;  O. MESSAGE: ID TST2-MUC4-0048 1. MESSAGE: TEMPLATE 2 2. INCIDENT: DATE 14 APR 89 3. INCIDENT: LOCATION EL SALVADOR: SAI SALVADOR (CITY) 4. INCIDENT: TYPE BOMBING 5. INCIDENT: STAGE OF EXECUTION ACCOMPLISHED 6. INCIDENT: INSTRUMENT ID &amp;quot;EXPLOSIVES&amp;quot; 7. INCIDENT: INSTRUMENT TYPE EXPLOSIVE : &amp;quot;EXPLOSIVES&amp;quot; 8. PERP: INCIDENT CATEGORY TERRORIST ACT 9. PERP: INDIVIDUAL ID &amp;quot;GUERRILLAS&amp;quot; 10. PERP: ORGANIZATION ID &amp;quot;FARABUNDO MARTI NATIONAL LIBERATION FRONT&amp;quot; 11. PERP: ORGANIZATION CONFIDENCE SUSPECTED OR ACCUSED BY AUTHORITIES: &amp;quot;FARABUNDO MARTI NATIONAL LIBERATION FRONT&amp;quot; 12. PHYS TGT: ID &amp;quot;MERINO'S HOME&amp;quot; 13. PHYS TGT: TYPE CIVILIAN RESIDENCE: &amp;quot;MERINO'S HOME&amp;quot; % missed type should be GOVERNMENT OFFICE OR RESIDENn 14. PHYS TGT: NUMBER 1: &amp;quot;MERINO'S HOME&amp;quot; 19. HUM TGT : DESCRIPTION &amp;quot;NIECE OF MERINO&amp;quot;  1 completely spurious template; should have been merged with template 1 20. HUM TGT: TYPE 21. HUM TCT: LUMBER 23. HUM TGT : EFFECT OF INCIDENT 24. HUM TGT : TOTAL NUMBER 0. MESSAGE : ID 1. MESSAGE : TEMPLATE 2. INCIDENT: DATE 3. INCIDENT: LOCATION 4. INCIDENT: TYPE 6. INCIDENT : STAGE OF EXECUTION 6. INCIDENT: INSTRUMENT ID 7. INCIDENT: INSTRUMENT TYPE 8. PERP: INCIDENT CATEGORY 10. PEEP: ORGANIZATION ID 11. PERP: ORGANIZATION CONFIDENCE 16. PHYS TOT : EFFECT OF INCIDENT 19. HUM TGT: DESCRIPTION 20. HUM TGT: TYPE 21. HUM TGT: NUMBER 23. HUM TGT: EFFECT OF INCIDENT  Some of the missing information in the response template comes from failing to tie information in t o the main event or failing to recover implicit information . This is the case with the damage to the vehicle , which is described in passing, the children who were in Merino's home, and the driver who escaped unscathed . Almost all the rest of the departures owe to some aspect of reference resolution--from failing to recognize th e injury to the bodyguards as part of Alvarado 's murder, to the extra fills &amp;quot;INDIVIDUAL&amp;quot; and &amp;quot;ATTORNE Y GENERAL&amp;quot; that were co-referential with others. 
One of these turned out to be a simple bug, in that the title &amp;quot;ATTORNEY GENERAL&amp;quot; in our system was interpreted as a different type (GOVERNMENT OFFICIAL ) from the noun phrase &amp;quot;ATTORNEY GENERAL&amp;quot; (LEGAL OR JUDICIAL) ; thus the system failed to unify the references. However, the general problem of reference resolution is certainly one of the main areas wher e future progress can come .</Paragraph>
    <Paragraph position="8"> The other illustrative problem with this example is the degree to which relatively inconsequential fact s can be pieced together into an interpretation . There is no theoretical reason why our system didn't know about different forms of damage to vehicles, but we certainly wouldn 't want to spend a lot of time encoding this sort of knowledge . This turned out to be a rather tedious part of the MUC task. We did go so far as to have template filling heuristics, for example, that tell the system: (1) When vehicles explode near buildings , it is the buildings and not the vehicles that are the targets, (2) When parts of buildings are destroyed o r damaged (e.g. &amp;quot;the bomb shattered windows&amp;quot;) this means that the buildings sustained some damage, an d 18 4 (3) When body parts are damaged (e .g. &amp;quot;the bomb destroyed his head&amp;quot;), it is the owner of the body part s that is affected . However, such rules only scratch the surface of the reasoning that contributes to templat e filling.</Paragraph>
    <Paragraph position="9"> While the reference resolution problem is quite general and very interesting from a research perspective , the reasoning problem seems more MUC-specific, and it's hard to separate general reasoning issues from th e peculiar details of the fill rules .</Paragraph>
    <Paragraph position="10"> Aside from these problems, our system performed pretty well on this example, as for MUC on the whole . The recall and precision for this message were both over .60, with the program recovering most of the information from the text . As is typical from our MUC experience, the local processing of sentences was very accurate and complete, while the general handling of story level details and template filling had som e loose ends .</Paragraph>
  </Section>
class="xml-element"></Paper>