<?xml version="1.0" standalone="yes"?>
<Paper uid="M92-1035">
  <Title>S1: SALVADORAN PRESIDENT-ELECT ALFREDO CRISTIANI CONDEMNED THE TERRORIST KILLING OF ATTORNEY</Title>
  <Section position="3" start_page="0" end_page="265" type="metho">
    <SectionTitle>
MUC-4 SYSTEM ARCHITECTURE
</SectionTitle>
    <Paragraph position="0"> SRA's system as used for MUC-4 consists of the core NLP system SOLOMON, the Message Zoner, and Extract, as shown in Figure 1. SOLOMON consists of 5 processing modules: the Preprocessing, Syntax, Semantics, Discourse and Pragmatics modules. The data SOLOMON used for MUC-4 consists of the lexicons, the grammar, the patterns, and the knowledge bases. In order to handle MUC-4 messages, the Message Zoner and the Pragmatics module were significantly extended, and the MUC-4-specific lexicons and knowledge bases were added to the existing data. In the following, each of the modules is explained along with examples from message TST2-MUC4-0048.</Paragraph>
    <Section position="1" start_page="0" end_page="259" type="sub_section">
      <SectionTitle>
Message Zoner
</SectionTitle>
      <Paragraph position="0"> The Message Zoner is the entry point for text into the MUC-4 Data Extraction system. It parses the free text areas of the incoming message into sections, tables, itemized lists, paragraphs, sentences, and individual tokens. This processing is domain-independent. The Zoner also parses the formatted header information for the particular message type. The Zoner's output is a canonical structure that we use for all of our projects, including projects which deal with non-English texts. Only paragraphs that contain certain MUC-specific keywords are processed by SOLOMON.</Paragraph>
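      <Paragraph> The zoning and keyword-filtering step described above can be pictured as follows. This is a minimal editorial sketch, not SRA's code: the keyword set and function names are illustrative, and real zoning also handles sections, tables, and itemized lists. <![CDATA[

```python
# Illustrative keyword set; the actual MUC-4 keyword list is not given in the paper.
MUC_KEYWORDS = {"BOMB", "ATTACK", "KILLING", "MURDER", "KIDNAPPED"}

def zone_paragraphs(text):
    """Split free text into paragraphs (blank-line delimited) and tokens."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [(p, p.upper().split()) for p in paragraphs]

def relevant_paragraphs(text):
    """Keep only paragraphs mentioning at least one domain keyword;
    only these would be forwarded to SOLOMON."""
    return [p for p, tokens in zone_paragraphs(text)
            if MUC_KEYWORDS & set(tokens)]
```

]]> Paragraphs with no domain keyword are dropped before any expensive linguistic processing begins.</Paragraph>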
    </Section>
    <Section position="2" start_page="259" end_page="259" type="sub_section">
      <SectionTitle>
Preprocessing
</SectionTitle>
      <Paragraph position="0"> The Preprocessing module performs word- and phrase-level analyses of input sentences. Since there are three types of lexicons, namely, the domain lexicons, the core lexicon and the &amp;quot;shallow&amp;quot; lexicon derived from a large corpus (i.e. the Dow Jones corpus from the Penn Treebank), when there is more than one entry with the same category for a word, the entry from the more specific lexicon is preferred.</Paragraph>
      <Paragraph position="1"> In addition to regular lexical lookup and morphological analysis, the Preprocessing module uses various patterns to recognize productive multiwords and complex phrases like dates, personal names, organization names, locations, and so on. Also, it performs acronym absorption, where an acronym after a proper noun like &amp;quot;FARABUNDO MARTI NATIONAL LIBERATION FRONT (FMLN)&amp;quot; is removed from the output of preprocessing and learned by the system. The next time that acronym appears in isolation, preprocessing will understand that it has the same meaning as the original proper noun. Spelling correction and unknown word handling based on morphological endings are also performed.</Paragraph>
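      <Paragraph> Acronym absorption, as described above, can be sketched in a few lines. This is a hypothetical reconstruction: the regular expressions and the function name are illustrative assumptions, not the system's actual pattern language. <![CDATA[

```python
import re

def absorb_acronyms(text, acronym_table):
    """Learn acronyms from 'LONG NAME (ABBR)' patterns, drop the
    parenthesized acronym from the output, and expand later isolated
    uses of a learned acronym to its full name."""
    # Learn: a parenthesized all-caps token following a run of caps words.
    for match in re.finditer(r"([A-Z][A-Z ]+?) \(([A-Z]{2,})\)", text):
        full, abbr = match.group(1), match.group(2)
        acronym_table[abbr] = full
    # Absorb: remove the parenthesized acronym from the preprocessing output.
    stripped = re.sub(r" \([A-Z]{2,}\)", "", text)
    # Expand: replace isolated known acronyms with their learned meaning.
    def expand(m):
        return acronym_table.get(m.group(0), m.group(0))
    return re.sub(r"\b[A-Z]{2,}\b", expand, stripped)
```

]]> On a first message the table is populated; on later messages an isolated &amp;quot;FMLN&amp;quot; expands back to the full organization name.</Paragraph>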
    </Section>
    <Section position="3" start_page="259" end_page="259" type="sub_section">
      <SectionTitle>
Name Recognition
</SectionTitle>
      <Paragraph position="0"> During preprocessing, proper names like &amp;quot;ALFREDO CRISTIANI&amp;quot; and &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot; are dynamically recognized by the Spanish name pattern, which was developed for the MURASAKI project, using the first names as anchors. The output of preprocessing for &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot; is shown in Figure 2.</Paragraph>
      <Paragraph position="1"> In addition, subsequent references to parts of these names, like &amp;quot;GARCIA&amp;quot;, are resolved using the information learned by the pattern. In this way, we do not need to put all the possible name combinations in the lexicon, but rather put only first names in the lexicon.</Paragraph>
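      <Paragraph> The anchor-and-learn strategy above might be sketched as follows. This is an editorial simplification: it assumes mixed-case input (the actual MUC texts are all caps, which would require a surname gazetteer or similar cue), and the first-name list is illustrative. <![CDATA[

```python
FIRST_NAMES = {"Alfredo", "Roberto", "Manuel"}  # illustrative anchor lexicon

def recognize_names(tokens, learned):
    """Anchor on a known first name, absorb following capitalized tokens
    as surnames, and record each surname so later isolated mentions
    (e.g. "Garcia") resolve to the full name."""
    names, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in FIRST_NAMES:
            j = i + 1
            while j < len(tokens) and tokens[j][:1].isupper() and tokens[j][1:].islower():
                j += 1
            full = " ".join(tokens[i:j])
            names.append(full)
            for surname in tokens[i + 1:j]:
                learned[surname] = full
            i = j
        elif tok in learned:            # subsequent partial reference
            names.append(learned[tok])
            i += 1
        else:
            i += 1
    return names
```

]]> Only first names live in the lexicon; surname combinations are learned dynamically per message.</Paragraph>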
      <Paragraph position="2"> Parsing
Sentences are parsed using an X-bar-based phrase structure grammar and SRA's custom modification of the Tomita parser, which handles Japanese and Spanish as well as English. The parser's output is grammatical structures called Functionally Labelled Templates (FLTs), which are built using a linguistic formalism that modifies and extends the f-structure of Lexical-Functional Grammar (LFG). These structures mark grammatical functions, like subject, object, specifier, and complement. Since the FLT formalism is language-independent, the same semantic interpretation module is used for all languages.</Paragraph>
      <Paragraph position="3"> Preparsing
The MUC sentences are fairly long and complex, but in many cases SOLOMON will recognize major constituent boundaries using simple heuristics. For example, if a proper name is directly followed by a comma, some words, and another comma, then those words between the commas are assumed to be a constituent attaching to the proper name as an appositive (e.g. &amp;quot;ALFREDO CRISTIANI, NATIONALIST REPUBLICAN ALLIANCE (ARENA) PRESIDENT-ELECT,&amp;quot;). Other easily recognized probable constituents include &amp;quot;according to&amp;quot; phrases and &amp;quot;that&amp;quot; clauses following communication verbs. These smaller constituents are sent to general parsing in isolation before the entire sentence is processed.</Paragraph>
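      <Paragraph> The comma heuristic above amounts to a very small amount of code. The sketch below is a hypothetical rendering (function name and token representation are assumptions); it shows only the appositive case, not the &amp;quot;according to&amp;quot; or &amp;quot;that&amp;quot;-clause heuristics. <![CDATA[

```python
def find_appositive(tokens, name_index):
    """If the token at name_index (a recognized proper name) is directly
    followed by ',' ... ',' then the tokens between the commas are assumed
    to form an appositive constituent attached to that name."""
    i = name_index + 1
    if i >= len(tokens) or tokens[i] != ",":
        return None
    try:
        close = tokens.index(",", i + 1)
    except ValueError:
        return None
    return tokens[i + 1:close]
```

]]> The extracted span can then be parsed in isolation before the full sentence is attempted.</Paragraph>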
    </Section>
    <Section position="4" start_page="259" end_page="261" type="sub_section">
      <SectionTitle>
Debris Parsing
</SectionTitle>
      <Paragraph position="0"> If general syntactic parsing of a sentence or constituent either fails or is taking too much time, the Debris Parsing module is invoked. First, the largest and best-weighted non-overlapping constituents recognized during parsing are extracted from the parse stack. The rest of the input is sent back into general parsing and debris parsing if necessary. When the entire sentence has been passed back to the parser, the resulting constituents are put together in a debris FLT. These structures are handled by a special submodule of Semantic Interpretation, called Debris Semantics.</Paragraph>
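      <Paragraph> The selection of the largest, best-weighted non-overlapping constituents can be approximated greedily. This is an editorial sketch under assumptions: the span representation and the exact weighting scheme are not specified in the paper. <![CDATA[

```python
def debris_constituents(spans):
    """Greedy debris recovery: spans are (start, end, weight) constituents
    left on the parse stack; pick a best-weighted (ties broken by length)
    mutually non-overlapping subset, returned in textual order."""
    chosen = []
    for start, end, weight in sorted(spans, key=lambda s: (-s[2], -(s[1] - s[0]))):
        if all(end <= c0 or start >= c1 for c0, c1, _ in chosen):
            chosen.append((start, end, weight))
    return sorted(chosen)
```

]]> Regions of the input not covered by any chosen span would then be resubmitted to general parsing.</Paragraph>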
      <Paragraph position="1"> Semantic Interpretation
The Semantic Interpretation module interprets the grammatical structures (FLTs) to produce language-independent meaning representations called Semantically Labelled Templates (SLTs). It performs semantic ambiguity resolution both during parsing (to reduce the number of parses) and during the construction of SLTs (so that the best possible semantic interpretation is obtained). The representation at this level is language-independent because the representation language is based on the concepts in the knowledge bases, which are shared among languages.</Paragraph>
      <Paragraph position="2"> Verb mapping information is derived from both lexicons and KBs. In general, a lexical entry tells how each surface syntactic role is mapped to its corresponding thematic role, and a KB entry tells what the semantic type restrictions on these roles are. When necessary, however, lexical idiosyncrasies, either syntactic or semantic, can be recorded in the lexicons. The mapping information for &amp;quot;accuse&amp;quot; is shown in Figure 3. The semantic concepts representing verbs like &amp;quot;accuse&amp;quot;, &amp;quot;condemn&amp;quot;, and &amp;quot;blame&amp;quot; are subclasses of a concept called JUDGEMENT-EVENT in our KB. The GOAL of this event (i.e. the embedded sentence under such verbs) is thus taken as fact, and mapped to the template as such.</Paragraph>
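      <Paragraph> The division of labor described above (lexicon: surface role to thematic role; KB: type restrictions on those roles) can be encoded as two small tables. The data below is an illustrative assumption, not the actual lexicon or KB entries (the real entry for &amp;quot;accuse&amp;quot; is in Figure 3). <![CDATA[

```python
# Lexicon: how surface syntactic roles map to thematic roles.
LEXICON = {"accuse": {"subject": "AGENT", "object": "GOAL"}}
# KB: semantic type restrictions on each thematic role of the concept.
KB = {"JUDGEMENT-EVENT": {"AGENT": "PERSON-OR-ORG", "GOAL": "EVENT-OR-PERSON"}}
VERB_CONCEPT = {"accuse": "JUDGEMENT-EVENT"}

def map_verb(verb, flt_roles):
    """Map an FLT's grammatical roles to thematic roles, attaching the
    KB type restriction for each filler."""
    concept = VERB_CONCEPT[verb]
    slt = {"concept": concept}
    for surface, filler in flt_roles.items():
        thematic = LEXICON[verb][surface]
        slt[thematic] = {"filler": filler,
                         "type-restriction": KB[concept][thematic]}
    return slt
```

]]> Keeping the type restrictions in the KB rather than the lexicon lets every subclass of JUDGEMENT-EVENT share them.</Paragraph>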
    </Section>
    <Section position="5" start_page="261" end_page="264" type="sub_section">
      <SectionTitle>
Debris Semantics
</SectionTitle>
      <Paragraph position="0"> When the Semantics module receives the output of Debris Parsing, it must process a collection of fragmentary syntactic constituents rather than a fully analyzed FLT. Debris Semantics will call general semantic interpretation on each of these constituents and fit them together as best it can based on semantic knowledge and constraints. This involves choosing a top-level S from the syntactic fragments, fitting the other fragments into it, and producing the most salient semantic interpretation for the sentence.</Paragraph>
      <Paragraph position="1"> Nominalized verbs, which often describe terrorist events as in &amp;quot;THE KILLING OF ATTORNEY GENERAL ROBERTO GARCIA ALVARADO&amp;quot;, &amp;quot;THE MURDER OF 10 UNION MEMBERS&amp;quot;, and &amp;quot;THE ATTACK ON FENASTRAS&amp;quot;, are treated semantically like ordinary verbs. That is, the nouns &amp;quot;killing&amp;quot;, &amp;quot;murder&amp;quot;, and &amp;quot;attack&amp;quot; are mapped to event frames in the KBs (i.e. KILL, MURDER, and ATTACK respectively), and the modifying PPs of appropriate types become the THEME of these events, as in Figure 4.</Paragraph>
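      <Paragraph> The nominalization treatment can be illustrated as below. This is a deliberately simplified sketch: the mapping table follows the nouns and frames named in the text, but the PP attachment here ignores the type checks the real system applies. <![CDATA[

```python
# Nominalized nouns and their KB event frames, as named in the text.
NOMINALIZATION = {"KILLING": "KILL", "MURDER": "MURDER", "ATTACK": "ATTACK"}

def interpret_nominalization(head_noun, pp_tokens):
    """Map a nominalized verb to its KB event frame and make the object
    of an OF/ON PP the THEME of that event (no semantic type checks)."""
    event = NOMINALIZATION.get(head_noun.upper())
    if event is None:
        return None
    frame = {"isa": event}
    if pp_tokens and pp_tokens[0].upper() in {"OF", "ON"}:
        frame["THEME"] = " ".join(pp_tokens[1:])
    return frame
```

]]> The resulting frame is then indistinguishable from one produced by the verbal form of the same event.</Paragraph>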
      <Paragraph position="2"> Both pre- and post-appositives like &amp;quot;ATTORNEY GENERAL ROBERTO GARCIA ALVARADO&amp;quot; and &amp;quot;MANUEL VALLEJO URIBE, A BUSINESSMAN&amp;quot; are interpreted so that the KB objects for the head nouns get additional class information provided by the appositives. In Figure 4, the appositive &amp;quot;ATTORNEY GENERAL&amp;quot; is interpreted so that the frame MAN.472 representing &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot; obtains additional ISA information (i.e. GOVERNMENT-OFFICIAL) from the appositive. This semantic interpretation enables resolution of the subsequent reference to the same man by &amp;quot;THE ATTORNEY GENERAL&amp;quot; in S21 (cf. Appendix A) in discourse processing.</Paragraph>
      <Paragraph position="3"> Discourse
While this module handles some interesting phenomena such as partitives and super-subclass reference, it needs the most work, especially to be able to handle phenomena which occur in other languages like Spanish and Japanese. Limited event discourse in terms of causality reasoning is done by Pragmatic Inferencing. For example, if it is mentioned that there was some terrorist attack and subsequently 3 people were found dead, we infer that the terrorist attack was the cause of the death of 3 people. Thus, we merge these 2 events into one terrorist event. We are planning to expand and incorporate the event discourse component into the Discourse module.
Partitives
SOLOMON handles partitives well because many of the domains for which it has been used call for understanding complex quantity expressions. Partitives like &amp;quot;FOUR OF THE VICE PRESIDENT'S CHILDREN&amp;quot; and &amp;quot;ONE OF THEM&amp;quot; are interpreted by semantics so that the head noun (e.g. &amp;quot;ONE&amp;quot;, &amp;quot;FOUR&amp;quot;) represents a part of the object represented by the NP in the &amp;quot;of&amp;quot; phrase. The NP in the &amp;quot;of&amp;quot; phrase of the partitive construction must be a definite NP. Thus, getting the correct interpretation for partitives always requires correct definite anaphora resolution. In Figure 5, &amp;quot;THEM&amp;quot; in S22 was correctly resolved to &amp;quot;TWO BODYGUARDS&amp;quot;, which is represented by SECURITY-GUARD.292 in the SET-PARENT slot of ENTITY.299 representing &amp;quot;ONE&amp;quot;.</Paragraph>
      <Paragraph position="4"> Reference by Superclass Concepts
The discourse resolution of &amp;quot;THE CRIME&amp;quot; to &amp;quot;KILLING&amp;quot; in S1 is handled by resorting to the KB hierarchy. One of SOLOMON's anaphora resolution strategies is to look for an antecedent whose concept is a subclass of the concept represented by the anaphor. For example, in &amp;quot;John has a pet iguana, and he loves this lizard.&amp;quot;, &amp;quot;this lizard&amp;quot; is resolved to &amp;quot;a pet iguana&amp;quot; because the concept IGUANA is a subclass of the concept LIZARD in the KB.</Paragraph>
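      <Paragraph> The subclass test behind this strategy is a simple walk up the ISA hierarchy. The fragment below is an editorial sketch: the KB is reduced to a tiny parent table built from the concepts named in the text. <![CDATA[

```python
KB_PARENT = {  # tiny illustrative fragment of the ISA hierarchy
    "IGUANA": "LIZARD", "LIZARD": "REPTILE",
    "KILL": "ANTI-CREATION-EVENT", "MURDER": "ANTI-CREATION-EVENT",
}

def isa(concept, ancestor):
    """True if concept equals or is a subclass of ancestor in the KB."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = KB_PARENT.get(concept)
    return False

def resolve_by_superclass(anaphor_concept, candidate_antecedents):
    """Resolve a definite anaphor to the first antecedent whose concept
    lies at or below the anaphor's concept in the hierarchy."""
    for mention, concept in candidate_antecedents:
        if isa(concept, anaphor_concept):
            return mention
    return None
```

]]> The same machinery resolves nominalized event references, as the next paragraph describes for &amp;quot;THE CRIME&amp;quot;.</Paragraph>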
      <Paragraph position="5"> The nominalized event reference &amp;quot;THE CRIME&amp;quot; is resolved in the same way. As explained earlier, a nominalized verb like &amp;quot;killing&amp;quot; is mapped to an event concept, in this case KILL, in the KB. The noun &amp;quot;crime&amp;quot; is mapped to the concept ANTI-CREATION-EVENT, which has subclasses like MURDER, ATTACK, BOMB-EVENT, DESTROY, and so on. KILL is also a subclass of ANTI-CREATION-EVENT, and therefore &amp;quot;THE CRIME&amp;quot; is resolved to &amp;quot;KILLING&amp;quot;. In this way, the two events are merged into a single event.
Pragmatic Inferencing
This module was exploited extensively for the MUC-4 task in order to perform the reasoning needed to go from a literal interpretation of messages in our semantic representation to the MUC-4 template representation. For example, in S11 of message 0048, &amp;quot;MERINO'S HOME&amp;quot; should be categorized as GOVERNMENT OFFICE OR RESIDENCE because Merino is a vice president-elect. However, the default semantic type of &amp;quot;HOME&amp;quot; is CIVILIAN RESIDENCE, as shown in Figure 6. To get from this representation to the actual template, one must infer that a residence occupied by a government official is a government residence.</Paragraph>
      <Paragraph position="6"> We made extensive use of the forward chainer of SRA's knowledge representation language TURNKEY for this kind of reasoning. It should be made clear that none of the forward rules are specific to particular terrorist incidents. Rather, all the rules reflect our commonsense reasoning. The rule which deals with the type of inference needed for the Merino example is Rule-025 in Figure 7.</Paragraph>
      <Paragraph position="7"> In order to handle S12, where it should be determined that people in Merino's home were also targets, we added, after the final testing, another rule, Rule-064, which says that any person inside a physical target is a human target.</Paragraph>
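      <Paragraph> The two rules just described, and the fixed-point forward chaining that applies them, can be sketched as follows. This is a hypothetical rendering: the actual rules are written in TURNKEY (Rule-025 appears in Figure 7), and the fact encoding below is an editorial assumption. <![CDATA[

```python
def rule_025(facts):
    """A residence occupied by a government official is a government residence."""
    derived = set()
    for f in facts:
        if (f[0] == "occupies"
                and ("is-a", f[1], "GOVERNMENT-OFFICIAL") in facts
                and ("is-a", f[2], "CIVILIAN-RESIDENCE") in facts):
            derived.add(("is-a", f[2], "GOVERNMENT-RESIDENCE"))
    return derived

def rule_064(facts):
    """Any person inside a physical target is a human target."""
    derived = set()
    for f in facts:
        if (f[0] == "inside"
                and ("is-a", f[1], "PERSON") in facts
                and ("is-a", f[2], "PHYSICAL-TARGET") in facts):
            derived.add(("is-a", f[1], "HUMAN-TARGET"))
    return derived

def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts are derived."""
    facts = set(facts)
    changed = True
    while changed:
        new = set().union(*(r(facts) for r in rules)) - facts
        changed = bool(new)
        facts |= new
    return facts
```

]]> Neither rule mentions any particular incident; both encode the commonsense generalizations stated in the text.</Paragraph>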
    </Section>
    <Section position="6" start_page="264" end_page="265" type="sub_section">
      <SectionTitle>
Extract
</SectionTitle>
      <Paragraph position="0"> The Extract module translates the domain-relevant portions of our language-independent meaning representation into database records. We maintain a strong distinction between code and data, and in fact use the same code to output to several databases, including flat template-style and more object-oriented schemas.</Paragraph>
      <Paragraph position="1"> Given a top-level event for each processed sentence in the text, Extract decides what subevents of those top-level events can be assumed true and therefore extracted from. For example, if a killing is condemned, as in S1, then that killing is mapped to the database.</Paragraph>
      <Paragraph position="2"> We employ a fairly simple event merging strategy. Eventually we hope to handle this in discourse. Two events are merged when they have the same stage of execution, their &amp;quot;types&amp;quot; are compatible (i.e. either identical or one is just an attack), and one of the following conditions is met: 1. Both events have the same target.</Paragraph>
      <Paragraph position="3"> 2. Either event has no target.</Paragraph>
      <Paragraph position="4"> 3. Either event is only reporting deaths, injuries, or victims.</Paragraph>
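      <Paragraph> The merging test above can be written as a single predicate. The event representation (dictionaries with hypothetical keys such as &amp;quot;casualties-only&amp;quot;) is an editorial assumption. <![CDATA[

```python
def compatible_types(t1, t2):
    """Two incident types are compatible when identical or one is just ATTACK."""
    return t1 == t2 or "ATTACK" in (t1, t2)

def mergeable(e1, e2):
    """Same stage of execution, compatible types, and one of the three
    target conditions listed above."""
    if e1["stage"] != e2["stage"] or not compatible_types(e1["type"], e2["type"]):
        return False
    return bool(e1.get("target") == e2.get("target")
                or e1.get("target") is None or e2.get("target") is None
                or e1.get("casualties-only") or e2.get("casualties-only"))
```

]]> A bare ATTACK event with no target thus merges with a more specific incident, while two fully specified incidents with different targets stay separate.</Paragraph>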
      <Paragraph position="5"> Unfortunately, this strategy does not merge the events in S21-22 with the event described in S1, since both incidents already have human targets. Of these merged events, Extract filters out those events which should not be mapped according to the rather complicated description provided in the MUC-4 task documentation. To do the actual template filling, we rely on Extract data made up of kb-object/slot to db-table/field mapping rules and conversion functions for the individual values. For example, our AGENT slot in an ATTACK event corresponds to the PERPETRATOR fields in the MUC template. Information from the free text of the message is combined with that in the header when the text is not explicit about the date or location of the incidents.</Paragraph>
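      <Paragraph> The code/data split for template filling might look like the sketch below: mapping rules and conversion functions live in data tables, and one generic routine walks an event's slots. The table contents here are illustrative assumptions, not the actual Extract data. <![CDATA[

```python
# kb-object/slot pairs mapped to template fields (illustrative entries).
SLOT_MAP = {("ATTACK", "AGENT"): "PERP: INDIVIDUAL ID",
            ("ATTACK", "THEME"): "HUM TGT: NAME"}
# Per-field conversion functions for the individual values.
CONVERT = {"PERP: INDIVIDUAL ID": lambda v: '"%s"' % v,
           "HUM TGT: NAME": lambda v: '"%s"' % v}

def fill_template(event):
    """Walk an event frame's slots and emit template field/value pairs,
    skipping slots with no mapping rule."""
    rows = {}
    for slot, value in event["slots"].items():
        field = SLOT_MAP.get((event["type"], slot))
        if field:
            rows[field] = CONVERT[field](value)
    return rows
```

]]> Because the mapping lives in data, the same filling code can serve both the flat MUC template and an object-oriented schema.</Paragraph>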
      <Paragraph position="6">
2. INCIDENT: DATE - 19 APR 89
3. INCIDENT: LOCATION EL SALVADOR
4. INCIDENT: TYPE ATTACK
5. INCIDENT: STAGE OF EXECUTION ACCOMPLISHED
6. INCIDENT: INSTRUMENT ID &amp;quot;BOMB&amp;quot;
7. INCIDENT: INSTRUMENT TYPE BOMB: &amp;quot;BOMB&amp;quot;
8. PERP: INCIDENT CATEGORY TERRORIST ACT
9. PERP: INDIVIDUAL ID &amp;quot;NO GROUP&amp;quot;
10. PERP: ORGANIZATION ID &amp;quot;THE FARABUNDO MARTI NATIONAL LIBERATION FRONT&amp;quot;
11. PERP: ORGANIZATION CONFIDENCE SUSPECTED OR ACCUSED BY AUTHORITIES: &amp;quot;THE FARABUNDO MARTI NATIONAL LIBERATION FRONT&amp;quot;
12. PHYS TGT: ID
13. PHYS TGT: TYPE
14. PHYS TGT: NUMBER
15. PHYS TGT: FOREIGN NATION
16. PHYS TGT: EFFECT OF INCIDENT
17. PHYS TGT: TOTAL NUMBER
18. HUM TGT: NAME &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot;
19. HUM TGT: DESCRIPTION &amp;quot;ATTORNEY GENERAL&amp;quot;: &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot;
20. HUM TGT: TYPE GOVERNMENT OFFICIAL: &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot;
21. HUM TGT: NUMBER 1: &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot;
22. HUM TGT: FOREIGN NATION
23. HUM TGT: EFFECT OF INCIDENT DEATH: &amp;quot;ROBERTO GARCIA ALVARADO&amp;quot;
24. HUM TGT: TOTAL NUMBER

0. MESSAGE: ID TST2-MUC4-0048
1. MESSAGE: TEMPLATE 2
2. INCIDENT: DATE - 19 APR 89
3. INCIDENT: LOCATION EL SALVADOR
4. INCIDENT: TYPE BOMBING
5. INCIDENT: STAGE OF EXECUTION ACCOMPLISHED
6. INCIDENT: INSTRUMENT ID &amp;quot;A BOMB&amp;quot;
7. INCIDENT: INSTRUMENT TYPE BOMB: &amp;quot;A BOMB&amp;quot;
8. PERP: INCIDENT CATEGORY TERRORIST ACT
9. PERP: INDIVIDUAL ID &amp;quot;AN INDIVIDUAL&amp;quot;
10. PERP: ORGANIZATION ID &amp;quot;THE FARABUNDO MARTI NATIONAL LIBERATION FRONT&amp;quot;
11. PERP: ORGANIZATION CONFIDENCE SUSPECTED OR ACCUSED BY AUTHORITIES: &amp;quot;THE FARABUNDO MARTI NATIONAL</Paragraph>
    </Section>
  </Section>
</Paper>