<?xml version="1.0" standalone="yes"?>
<Paper uid="M91-1028">
  <Title>Reference Resolution</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
STAGES OF PROCESSING
</SectionTitle>
    <Paragraph position="0"> The text goes through five major stages of processing: lexical analysis, syntactic analysis, semantic analysis, reference resolution, and template generation (see Figure 1). In addition, some restructuring of the logical form is performed both after semantic analysis and after reference resolution (only the restructuring after reference resolution is shown in Figure 1). Processing is basically sequential: each sentence goes through lexical, syntactic, and semantic analysis and reference resolution; the logical form for the entire message is then fed to template generation. However, semantic (selectional) checking is performed during syntactic analysis, employing essentially the same code later used for semantic analysis.</Paragraph>
    <Paragraph position="1"> Each of these stages is described in a section that follows.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
LEXICAL ANALYSIS
Dictionary Format
</SectionTitle>
    <Paragraph position="0"> Our dictionaries contain only syntactic information: the parts of speech for each word, information about the complement structure of verbs, distributional information (e.g., for adjectives and adverbs), etc. We follow closely the set of syntactic features established for the NYU Linguistic String Parser. This information is entered in LISP form using noun, verb, adjective, and adverb macros for the open-class words, and a word macro for other parts of speech:</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="183" type="metho">
    <SectionTitle>
(ADVERB &quot;ABRUPTLY&quot; :ATTRIBUTES (DSA))
(ADJECTIVE &quot;ABRUPT&quot;)
(NOUN :ROOT &quot;ABSCESS&quot; :ATTRIBUTES (NCOUNT))
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="6" start_page="183" end_page="184" type="metho">
    <SectionTitle>
(VERB :ROOT &quot;ABSCOND&quot; :OBJLIST (NULLOBJ PN (PVAL (FROM WITH))))
</SectionTitle>
    <Paragraph position="0"> The noun and verb macros automatically generate the regular inflectional forms.</Paragraph>
    <Paragraph position="1"> Dictionary Files. The primary source of our dictionary information about open-class words (nouns, verbs, adjectives, and adverbs) is the machine-readable version of the Oxford Advanced Learner's Dictionary (&quot;OALD&quot;). We have written programs which take the SGML (Standard Generalized Markup Language) version of the dictionary, extract information on inflections, parts of speech, and verb subcategorization (including information on adverbial particles and prepositions gleaned from the examples), and generate the LISP-ified form shown above. This is supplemented by a manually-coded dictionary (about 500 lines) for closed-class words and a few very common words. For MUC-3 we used several additional dictionaries. There was a dictionary (about 800 lines) for English words not defined in the OALD, or not adequately defined or too richly defined there. In addition, we extracted from the text and templates lists of organizations, locations, and proper names, and prepared small dictionaries for each (about 2000 lines total).</Paragraph>
    <Paragraph position="2"> Lookup. The text reader splits the input text into tokens and then attempts to assign to each token (or sequence of tokens, in the case of an idiom) a definition (part of speech and syntactic attributes). The matching process proceeds in four steps: dictionary lookup, lexical pattern matching, spelling correction, and prefix stripping. Dictionary lookup immediately retrieves definitions assigned by any of the dictionaries (including inflected forms), while lexical pattern matching is used to identify a variety of specialized patterns, such as numbers, dates, times, and possessive forms.</Paragraph>
    <Paragraph position="3"> If neither dictionary lookup nor lexical pattern matching is successful, spelling correction and prefix stripping are attempted. Based on an analysis of the errors we found, we have used for MUC-3 a rather conservative spelling corrector, which identifies an input token as a misspelled form of a dictionary entry only if one of the two has a single instance of a letter while the other has a doubled instance of the letter (e.g., &quot;mispelled&quot; and &quot;misspelled&quot;) (see footnote 1). The prefix stripper attempts to identify the token as a combination of a prefix and a word defined in the dictionary. We currently use a list of 17 prefixes, including standard English ones like &quot;un&quot; and MUC-3 specials like &quot;narco-&quot;.</Paragraph>
    <Paragraph position="4"> If all of these procedures fail, the word is tagged as a proper noun (name), since we found that most of our remaining undefined words were names.</Paragraph>
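    <Paragraph> The four-step lookup and the proper-noun fallback described above can be sketched roughly as follows. This is an illustrative Python sketch, not the original LISP code: the toy DICTIONARY, the two-entry PREFIXES list, the digit-only stand-in for lexical pattern matching, and the function names are all assumptions made for the example.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import Tuple

# Hypothetical toy data; the real dictionaries and the 17-prefix list are much larger.
DICTIONARY = {"misspelled": "VERB", "known": "ADJECTIVE", "terrorist": "NOUN"}
PREFIXES = ("un", "narco-")


def differs_by_doubled_letter(token: str, entry: str) -> bool:
    """True if one string equals the other with exactly one single letter doubled."""
    short, long_ = sorted((token, entry), key=len)
    if len(long_) - len(short) != 1:
        return False
    for i in range(1, len(long_)):
        if long_[i] != long_[i - 1]:
            continue
        # The repeated letter must form a pair, not a longer run.
        before_ok = i < 2 or long_[i - 2] != long_[i]
        after_ok = i + 1 >= len(long_) or long_[i + 1] != long_[i]
        # Dropping one letter of the pair must give the shorter string.
        if before_ok and after_ok and long_[:i] + long_[i + 1:] == short:
            return True
    return False


def lookup(token: str) -> Tuple[str, str]:
    """Return (part of speech, name of the step that produced it)."""
    word = token.lower()
    if word in DICTIONARY:                      # step 1: dictionary lookup
        return DICTIONARY[word], "dictionary"
    if word.isdigit():                          # step 2: pattern matching (numbers only here)
        return "NUMBER", "pattern"
    for entry, pos in DICTIONARY.items():       # step 3: conservative spelling correction
        if differs_by_doubled_letter(word, entry):
            return pos, "spelling"
    for prefix in PREFIXES:                     # step 4: prefix stripping
        stem = word[len(prefix):]
        if word.startswith(prefix) and stem in DICTIONARY:
            return DICTIONARY[stem], "prefix"
    return "PROPER-NOUN", "default"             # fallback: treat as a name
```
]]></Paragraph>
    <Paragraph> In this sketch, lookup("mispelled") is resolved by the spelling step and lookup("narco-terrorist") by the prefix step, while an unknown token such as "Zamora" falls through to the proper-noun default.</Paragraph>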
    <Section position="1" start_page="184" end_page="184" type="sub_section">
      <SectionTitle>
Filtering
</SectionTitle>
      <Paragraph position="0"> In order to avoid full processing of sentences which would make no contribution to the templates, we perform a keyword-based filtering at the sentence level: if a sentence contains no key terms, it is skipped. This filtering is done after lexical analysis because the lexical analysis has identified the root form of all inflected words; these root forms provide links into the semantic hierarchy. The filtering can therefore be specified in terms of a small number of word classes, one of which must be present for the sentence to be worth processing.</Paragraph>
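      <Paragraph> A minimal Python sketch of this sentence filter is given below. The root-form table, the word-class links, the set of key classes, and the function name are hypothetical stand-ins chosen for illustration; the actual system consults its dictionaries and concept hierarchy instead.</Paragraph>
      <Paragraph><![CDATA[
```python
# Hypothetical data: inflected form -> root, and root -> word class in the hierarchy.
ROOT_FORM = {"bombed": "bomb", "bombs": "bomb", "attacked": "attack", "killed": "kill"}
WORD_CLASS = {"bomb": "attack-event", "attack": "attack-event", "kill": "harm-event"}
KEY_CLASSES = {"attack-event", "harm-event"}


def worth_processing(tokens):
    """Keep a sentence only if some token's root form belongs to a key word class."""
    for token in tokens:
        word = token.lower()
        root = ROOT_FORM.get(word, word)  # uninflected or unknown words are their own root
        if WORD_CLASS.get(root) in KEY_CLASSES:
            return True
    return False
```
]]></Paragraph>
      <Paragraph> Because the test is stated over word classes rather than surface forms, adding a new inflected form or synonym only requires a dictionary or hierarchy entry, not a change to the filter.</Paragraph>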
    </Section>
  </Section>
  <Section position="7" start_page="184" end_page="184" type="metho">
    <SectionTitle>
SYNTACTIC ANALYSIS
</SectionTitle>
    <Paragraph position="0"> Syntactic analysis involves two stages of processing: parsing and syntactic regularization. At the core of the system is an active chart parser. The grammar is an augmented context-free grammar, consisting of BNF rules plus procedural restrictions which check grammatical constraints not easily captured in the BNF rules. Most restrictions are stated in PROTEUS Restriction Language (a variant of the language developed for the Linguistic String Parser) and translated into LISP; a few are coded directly in LISP [1]. For example, the count noun restriction (that singular countable nouns have a determiner) is stated as</Paragraph>
    <Paragraph position="2"/>
  </Section>
  <Section position="8" start_page="184" end_page="185" type="metho">
    <SectionTitle>
IF BOTH CORE Xcore IS NCOUNT AND Xcore IS SINGULAR
THEN IN LN, TPOS IS NOT EMPTY .
</SectionTitle>
    <Paragraph position="0"> Associated with each BNF rule is a regularization rule, which computes the regularized form of each node in the parse tree from the regularized forms of its immediate constituents. These regularization rules are based on lambda-reduction, as in GPSG. The primary function of syntactic regularization is to reduce all clauses to a standard form consisting of aspect and tense markers, the operator (verb or adjective), and syntactically marked cases.</Paragraph>
    <Paragraph position="1"> For example, the definition of assertion, the basic S structure in our grammar, is</Paragraph>
    <Paragraph position="3"> Here the portion after the single colon defines the regularized structure.</Paragraph>
    <Paragraph position="4"> Footnote 1: The more standard corrector we used for MUCK-2, which allowed for any single insertion, deletion, transposition, or substitution, gave too many incorrect matches.</Paragraph>
    <Paragraph position="5"> Coordinate conjunction is introduced by a metarule (as in GPSG), which is applied to the context-free components of the grammar prior to parsing. The regularization procedure expands any conjunction into a conjunction of clauses or of noun phrases.</Paragraph>
    <Paragraph position="6"> The output of the parser for the first sentence of DEV-0099,</Paragraph>
  </Section>
  <Section position="9" start_page="185" end_page="186" type="metho">
    <SectionTitle>
&quot;POLICE HAVE REPORTED THAT TERRORISTS TONIGHT BOMBED THE EMBASSIES OF THE PRC AND THE SOVIET UNION.&quot;, is
</SectionTitle>
    <Paragraph position="0"> The system uses a chart parser operating top-down, left-to-right. As edges are completed (i.e., as nodes of the parse tree are built), restrictions associated with those productions are invoked to assign and test features of the parse tree nodes. If a restriction fails, that edge is not added to the chart. When certain levels of the tree are complete (those producing noun phrase and clause structures), the regularization rules are invoked to compute a regularized structure for the partial parse, and selection is invoked to verify the semantic well-formedness of the structure (as noted earlier, selection uses the same &quot;semantic analysis&quot; code subsequently employed to translate the tree into logical form).</Paragraph>
    <Paragraph position="1"> One unusual feature of the parser is its weighting capability. Restrictions may assign scores to nodes; the parser will perform a best-first search for the parse tree with the highest score. This scoring is used to implement various preference mechanisms:
      * closest attachment of modifiers (we penalize each modifier by the number of words separating it from its head)
      * preferred narrow conjoining for clauses (we penalize a conjoined clause structure by the number of words it subsumes)
      * preference semantics (selection does not reject a structure, but imposes a heavy penalty if the structure does not match any lexico-semantic model, and a lesser penalty if the structure matches a model but with some operands or modifiers left over) [2,3]
      * relaxation of certain syntactic constraints, such as the count noun constraint, adverb position constraints, and comma constraints
      * disfavoring (penalizing) headless noun phrases and headless relatives (this is important for parsing efficiency)
      The grammar is based on Harris's Linguistic String Theory and adapted from the larger Linguistic String Parser (LSP) grammar developed by Naomi Sager at NYU [4]. The grammar is gradually being enlarged to cover more of the LSP grammar. The current grammar is 1200 lines of BNF and Restriction Language plus 300 lines of Lisp; it includes 150 non-terminals, 365 productions, and 103 restrictions.</Paragraph>
    <Paragraph position="2"> Over the course of MUC-2 and MUC-3 we have added several mechanisms for recovering from sentences the grammar cannot fully parse; these are described in our site report.</Paragraph>
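    <Paragraph> The closest-attachment preference and the best-first search can be illustrated with a toy Python sketch. It is not the PROTEUS chart parser: it only chooses a head for each modifier, it treats the highest score as the lowest accumulated penalty, and the function names and the position-based input encoding are assumptions made for the example.</Paragraph>
    <Paragraph><![CDATA[
```python
import heapq


def attachment_penalty(modifier_pos: int, head_pos: int) -> int:
    """Number of words strictly between a modifier and its candidate head."""
    return abs(modifier_pos - head_pos) - 1


def best_attachments(modifiers, candidate_heads):
    """Best-first search over head choices, extending the cheapest partial analysis first.

    modifiers: list of modifier word positions.
    candidate_heads: dict mapping a modifier position to its possible head positions.
    Returns (total penalty, tuple of chosen heads), or None if no analysis exists.
    """
    frontier = [(0, ())]  # entries are (penalty so far, heads chosen so far)
    while frontier:
        penalty, chosen = heapq.heappop(frontier)
        if len(chosen) == len(modifiers):
            # Penalties are never negative, so the first complete analysis
            # taken off the queue has the lowest total penalty.
            return penalty, chosen
        modifier = modifiers[len(chosen)]
        for head in candidate_heads[modifier]:
            step = attachment_penalty(modifier, head)
            heapq.heappush(frontier, (penalty + step, chosen + (head,)))
    return None
```
]]></Paragraph>
    <Paragraph> The other preferences listed above (narrow conjoining, preference semantics, relaxed constraints, headless phrases) would enter such a search the same way, as additional penalty terms added to the running total.</Paragraph>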
  </Section>
  <Section position="10" start_page="186" end_page="186" type="metho">
    <SectionTitle>
SEMANTIC ANALYSIS AND REFERENCE RESOLUTION
</SectionTitle>
    <Paragraph position="0"> The output of syntactic analysis goes through semantic analysis and reference resolution and is then added to the accumulating logical form for the message. Following both semantic analysis and reference resolution, certain transformations are performed to simplify the logical form. All of this processing makes use of a concept hierarchy which captures the class/subclass/instance relations in the domain.</Paragraph>
    <Paragraph position="1"> Semantic analysis uses a set of lexico-semantic models to map the regularized syntactic analysis into a semantic representation. Each model specifies a class of verbs, adjectives, or nouns and a set of operands; for each operand it indicates the possible syntactic case markers, the semantic class of the operand, whether or not the operand is required, and the semantic case to be assigned to the operand in the output representation. For example, the model for &quot;&lt;explosive-object&gt; damages &lt;target&gt;&quot; is
      The models are arranged in a shallow hierarchy with inheritance, so that arguments and modifiers which are shared by a class of verbs need only be stated once. The model above inherits only from the most general clause model, clause-any, which includes general clausal modifiers such as negation, time, tense, modality, etc. The evaluated MUC-3 system had 98 clause models, 14 nominalization models, and 31 other noun phrase models, a total of about 2000 lines. The class explosive-object in the clause model refers to the concept in the concept hierarchy, whose entries have the form:
      There are currently a total of 2098 concepts in the hierarchy, of which 1439 are place names.</Paragraph>
    <Paragraph position="2"> The output of semantic analysis is a nested set of entity and event structures, with arguments labeled by keywords primarily designating semantic roles. For the first sentence of DEV-0099, the output is
      Reference resolution is applied to the output of semantic analysis in order to replace anaphoric noun phrases (representing either events or entities) by appropriate antecedents. Each potential anaphor is compared to prior entities or events, looking for a suitable antecedent such that the class of the anaphor (in the concept hierarchy) is equal to or more general than that of the antecedent, the anaphor and antecedent match in number, the restrictive modifiers in the anaphor have corresponding arguments in the antecedent, and the non-restrictive modifiers (e.g., apposition) of the anaphor are not inconsistent with those of the antecedent. Special tests are provided for names (people may be referred to by a subset of their names) and for referring to groups by typical members (&quot;terrorist force&quot; ... &quot;terrorists&quot;). Some further discussion of reference resolution and the subsequent process of template merging is included in a separate paper on discourse analysis in this volume (&quot;Computational Aspects of Discourse in the Context of MUC-3&quot;).</Paragraph>
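    <Paragraph> The antecedent tests described above can be sketched in Python as follows. The hierarchy fragment, the Mention record, and the function names are hypothetical. The sketch also assumes that the most recent compatible mention is preferred, which the text does not state, and it omits the check on non-restrictive modifiers and the special tests for names and groups.</Paragraph>
    <Paragraph><![CDATA[
```python
from dataclasses import dataclass, field

# Hypothetical fragment of a concept hierarchy, stored as child -> parent.
PARENT = {"terrorist": "person", "person": "entity",
          "embassy": "building", "building": "entity"}


def is_same_or_more_general(anaphor_class: str, antecedent_class: str) -> bool:
    """True if anaphor_class equals antecedent_class or is one of its ancestors."""
    current = antecedent_class
    while current is not None:
        if current == anaphor_class:
            return True
        current = PARENT.get(current)
    return False


@dataclass
class Mention:
    concept: str
    number: str                                       # "singular" or "plural"
    restrictive: dict = field(default_factory=dict)   # restrictive modifiers or arguments


def find_antecedent(anaphor, prior_mentions):
    """Return the most recent prior mention that passes all three tests, else None."""
    for candidate in reversed(prior_mentions):
        if not is_same_or_more_general(anaphor.concept, candidate.concept):
            continue  # anaphor class must equal or subsume the antecedent class
        if anaphor.number != candidate.number:
            continue  # number must match
        if any(candidate.restrictive.get(key) != value
               for key, value in anaphor.restrictive.items()):
            continue  # every restrictive modifier needs a corresponding argument
        return candidate
    return None
```
]]></Paragraph>
    <Paragraph> With this hierarchy, an anaphor of class person can resolve to an earlier terrorist mention, but an anaphor of class terrorist cannot resolve to an earlier mention known only as a person.</Paragraph>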
  </Section>
  <Section position="11" start_page="186" end_page="186" type="metho">
    <SectionTitle>
Logical Form Transformations
</SectionTitle>
    <Paragraph position="0"> The transformations which are applied after semantic analysis and after reference resolution simplify and regularize the logical form in various ways. For example, if a verb governs an argument of a nominalization, the argument is inserted into the event created from the nominalization: &quot;x conducts the attack&quot;, &quot;x claims responsibility for the attack&quot;, &quot;x was accused of the attack&quot;, etc. are all mapped to &quot;x attacks&quot; (with appropriate settings of the confidence slot). For example, the rule to take &quot;X was accused of Y&quot; and make X the agent of Y is
      This rule is used in message TST1-0099, for example, to expand THE EMBASSIES OF THE PRC AND THE SOVIET UNION into THE EMBASSY OF THE PRC AND THE EMBASSY OF THE SOVIET UNION.</Paragraph>
    <Paragraph position="1"> There are currently 32 such rules. These transformations are written as productions and applied using a simple data-driven production system interpreter which is part of the PROTEUS system.</Paragraph>
  </Section>
  <Section position="12" start_page="186" end_page="186" type="metho">
    <SectionTitle>
TEMPLATE GENERATOR
</SectionTitle>
    <Paragraph position="0"> Once all the sentences in an article have been processed through syntactic and semantic analysis, the resulting logical forms are sent to the template generator. The template generator operates in four stages. First, a frame structure resembling a simplified template (with incident-type, perpetrator, physical-target, human-target, date, location, instrument, physical-effect, and human-effect slots) is generated for each event. Date and location expressions are reduced to a normalized form at this point. In particular, date expressions such as &quot;tonight&quot;, &quot;last month&quot;, &quot;last April&quot;, &quot;a year ago&quot;, etc. are replaced by explicit dates or date ranges, based on the dateline of the article. Second, a series of heuristics attempts to merge these frames, merging
      * frames referring to a common target
      * frames arising from the same sentence
      * an effect frame following an attack frame (e.g., &quot;The FMLN attacked the town. Seven civilians died.&quot;)
      This merging is blocked if the dates or locations are different, the incident types are incompatible, or the perpetrators are incompatible. Third, a series of filters removes frames involving only military targets and those involving events more than two months old. Finally, MUC templates are generated from these frames.</Paragraph>
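    <Paragraph> The merging conditions and the age filter can be sketched in Python as below. Frames are modeled as plain dictionaries keyed by the slot names listed above; the function names are hypothetical. The sketch makes several assumptions not stated in the text: an unfilled slot is treated as compatible with anything, plain equality stands in for the compatibility tests on incident types and perpetrators, and two months is approximated as 61 days.</Paragraph>
    <Paragraph><![CDATA[
```python
from datetime import date

BLOCKING_SLOTS = ("date", "location", "incident-type", "perpetrator")


def compatible(a, b) -> bool:
    """Two slot values are compatible if either is unfilled or they are equal."""
    return a is None or b is None or a == b


def can_merge(frame1: dict, frame2: dict) -> bool:
    """Merging is blocked when any blocking slot holds conflicting values."""
    return all(compatible(frame1.get(slot), frame2.get(slot)) for slot in BLOCKING_SLOTS)


def merge_frames(frame1: dict, frame2: dict) -> dict:
    """Fill the unfilled slots of frame1 with values from frame2."""
    merged = dict(frame1)
    for slot, value in frame2.items():
        if merged.get(slot) is None:
            merged[slot] = value
    return merged


def is_recent(frame: dict, dateline: date, max_age_days: int = 61) -> bool:
    """Keep frames with no date or with an event at most max_age_days before the dateline."""
    event_date = frame.get("date")
    return event_date is None or (dateline - event_date).days <= max_age_days
```
]]></Paragraph>
    <Paragraph> For the example in the text, an attack frame with perpetrator FMLN and an effect frame recording civilian deaths on the same date would pass can_merge and combine into a single frame carrying both the perpetrator and the human-effect slot.</Paragraph>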
  </Section>
</Paper>