File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/98/m98-1011_metho.xml
Size: 11,034 bytes
Last Modified: 2025-10-06 14:14:51
<?xml version="1.0" standalone="yes"?> <Paper uid="M98-1011"> <Title>NYU: Description of the Proteus/PET System as Used for MUC-7 ST</Title> <Section position="3" start_page="0" end_page="3" type="metho"> <SectionTitle> STRUCTURE OF THE PROTEUS IE SYSTEM </SectionTitle> <Paragraph position="0"> Figure 1 shows an overview of our IE system.</Paragraph> <Paragraph position="1"> The system is a pipeline of modules, each of which draws on attendant knowledge bases (KBs) to process its input and passes its output to the next module. The modular design ensures that control is encapsulated in immutable, domain-independent core components, while the domain-specific information resides in the knowledge bases. It is the latter which need to be customized for each new domain and scenario, as discussed in the next section.</Paragraph> <Paragraph position="2"> The lexical analysis module (LexAn) is responsible for splitting the document into sentences, and the sentences into tokens. LexAn draws on a set of on-line dictionaries; these include the general COMLEX syntactic dictionary, and domain-specific lists of words and names. As a result, each token receives a reading, or a list of alternative readings in case the token is syntactically ambiguous. A reading consists of a list of features and their values (e.g., &quot;syntactic category = Noun&quot;). LexAn optionally invokes a statistical part-of-speech tagger, which eliminates unlikely readings for each token.</Paragraph> <Paragraph position="3"> The next three phases operate by deterministic, bottom-up, partial parsing, or pattern matching; the patterns are regular expressions which trigger associated actions. This style of text analysis, as contrasted with full syntactic parsing, has gained wider popularity due to limitations on the accuracy of full syntactic parsers, and the adequacy of partial, semantically-constrained parsing for this task [3,2,1]. 
The name recognition patterns identify proper names in the text by using local contextual cues, such as capitalization, personal titles (&quot;Mr.&quot;, &quot;Esq.&quot;), and company suffixes (&quot;Inc.&quot;, &quot;Co.&quot;). (At present, the result of the NYU MENE system, as used in the NE evaluation, does not yet feed into the ST processing.) The next module finds small syntactic units, such as basic NPs and VPs. When it identifies a phrase, the system marks the text segment with semantic information, e.g. the semantic class of the head of the phrase. (These marks are pointers to the corresponding entities, which are created and added to the list of logical forms representing the discourse.)</Paragraph> <Paragraph position="4"> The next phase finds higher-level syntactic constructions using local semantic information: apposition, prepositional phrase attachment, limited conjunctions, and clausal constructions.</Paragraph> <Paragraph position="5"> The actions operate on the logical form representation (LF) of the discourse segments encountered so far. The discourse is thus a sequence of LFs corresponding to the entities, relationships, and events encountered in the analysis. An LF is an object with named slots (see example in figure 2). One slot in each LF, named &quot;Class&quot;, has distinguished status, and determines the number and type of other slots that the object may contain. E.g., an entity of class &quot;Company&quot; has a slot called &quot;Name&quot;. It also contains a slot &quot;Location&quot; which points to another entity, thereby establishing a relation between the location entity and the matrix entity. Events are specific kinds of relations, usually having several operands.</Paragraph> <Paragraph position="6"> The subsequent phases operate on the logical forms built in the pattern matching phases. Reference resolution (RefRes) links anaphoric pronouns to their antecedents and merges other co-referring expressions. The discourse analysis module uses higher-level inference rules to build more complex event structures, where the information needed to extract a single complex fact is spread across several clauses.</Paragraph>
<Paragraph position="7"> For example, there is a rule that merges a Mission entity with a corresponding Launch event. At this stage, we also convert all date expressions (&quot;yesterday&quot;, &quot;last month&quot;, etc.) to starting and ending dates as required for the MUC templates. Another set of rules formats the resultant LF into such a form as is directly translatable, in a one-to-one fashion, into the MUC template structure, with the translation performed by the final template-generation phase.</Paragraph> </Section> <Section position="4" start_page="3" end_page="5" type="metho"> <SectionTitle> PET USER INTERFACE </SectionTitle> <Paragraph position="0"> Our prior MUC experience has shown that building effective patterns for a new domain is a complex and time-consuming part of the customization process; it is highly error-prone, and usually requires detailed knowledge of system internals. With this in view, we have sought a disciplined method of customization of knowledge bases, and the pattern base in particular.</Paragraph> <Paragraph position="1"> Organization of Patterns: The pattern base is organized in layers, corresponding to different levels of processing. This stratification naturally reflects the range of applicability of the patterns. At the lowest level are the most general patterns; they are applied first, and capture the most basic constructs. These include the proper names, temporal expressions, expressions for numeric entities, and currencies. At the next level are the patterns that perform partial syntactic analysis (noun and verb groups). These are domain-independent patterns, useful in a wide range of tasks. 
At the next level are domain-specific patterns, useful across a narrower range of scenarios, but still having considerable generality. These patterns find relationships among entities, such as between persons and organizations. Lastly, at the highest level will be the scenario-specific patterns, such as the clausal patterns that capture events.</Paragraph> <Paragraph position="2"> Proteus treats the patterns at the different levels differently. The lowest level patterns, having the widest applicability, are built in as a core part of the system. These change little when the system is ported. The mid-range patterns, applicable in certain commonly encountered domains, are provided as pattern libraries, which can be plugged in as required by the extraction task. For example, for the domain of &quot;business/economic news&quot;, Proteus has a library with patterns that capture:</Paragraph> <Paragraph position="4"> Lastly, the system acquires the most specific patterns directly from the user, on a per-scenario basis, through PET, a set of interactive graphical tools. In the process of building the custom pattern base, PET engages the user only at the level of surface representations, hiding the internal operation. The user's input Based on this information, PET automatically: creates the appropriate patterns to extract the user-specified structures from the user-specified text; and suggests generalizations for the newly created patterns to broaden coverage.</Paragraph> <Section position="1" start_page="3" end_page="5" type="sub_section"> <SectionTitle> Pattern Acquisition </SectionTitle> <Paragraph position="0"> The initial pattern base consists of the built-in patterns and the plugged-in pattern libraries corresponding to the domain of interest. These serve as the foundation for example-based acquisition. The development cycle, from the user's perspective, consists of iteratively acquiring patterns to augment the pattern base. 
The acquisition process entails several steps:
Enter an example: the user enters a sentence containing a salient event (or copies-pastes text from a document through the corpus browser, a tool provided in the PET suite). We will consider the example &quot;Arianespace Co. has launched an Intelsat communications satellite.&quot;
Choose an event template: the user selects from a menu of event names. A list of events, with their associated slots, must be given to the system at the outset, as part of the scenario definition. This example will generate an event called &quot;Launch&quot;, with slots as in figure 4: Vehicle, Payload, Agent, Site, etc.
Apply existing patterns: the system applies the current patterns to the example, to obtain an initial analysis, as in figure 3. In the example shown, the system identified some noun/verb groups and their semantic types. For each element it matches, the system applies minimal generalization (in the sense that to be any less general, the element would have to match the example text literally). The system then presents the analysis to the user and initiates an interaction with her:
np(C-company) vg(Launch) np(Satellite)
Tune pattern elements: the user can modify each pattern element in several ways: choose the appropriate level of generalization of its concept class, within the semantic concept hierarchy; force the element to match the corresponding text in the original example literally; make the element optional; remove it; etc. In this example, the user should likely generalize &quot;satellite&quot; to match any phrase designating a payload, and generalize the verb &quot;launch&quot; to a class containing its synonyms (e.g. &quot;fire&quot;):
np(C-company) vg(C-Launch) np(C-Payload)
Fill event slots: the user specifies how pattern elements are used to fill slots in the event template. Clicking on an element displays its logical form (LF). 
The user can drag-and-drop the LF, or any sub-component thereof, into a slot in the target event, as in figure 4.</Paragraph> <Paragraph position="1"> Build pattern: when the user &quot;accepts&quot; it, the system builds a new pattern to match the example, and compiles the associated action; the action will fire when the pattern matches, and will fill the slots in the event template as in the example. The pattern is then added to the pattern base, which can be saved for later use.</Paragraph> <Paragraph position="2"> Syntactic generalization: In fact, the pattern base acquires much more than the basic pattern that the user accepted. The system applies built-in meta-rules [1, 4] to produce a set of syntactic transformations from a simple active clause pattern or a bare noun phrase. For this active example, the pattern base will automatically acquire its variants: the passive, relative, relative passive, reduced relative, etc. The system also inserts optional modifiers into the generated variants, such as sentence adjuncts, to broaden the coverage of the pattern. In consequence, a passive pattern which the system acquires from this simple example will match the event in the walk-through message, &quot;... said Televisa expects a second Intelsat satellite to be launched by Arianespace from French Guyana later this month ...&quot;, with the help of lower-level patterns for named objects, and locative and temporal sentence adjuncts.</Paragraph> </Section> </Section> </Paper>
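The tuning and slot-filling steps described under Pattern Acquisition can be illustrated with a minimal Python sketch. It is not the Proteus/PET implementation, which the text does not show. The concept hierarchy, the Chunk and Element classes, the is_a and match_pattern functions, and the choice of the Agent slot for the company are all hypothetical, introduced here only to show how a generalized pattern such as np(C-company) vg(C-Launch) np(C-Payload) might match chunked text and fill an event with named slots.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical semantic concept hierarchy (child -> parent).
# The real Proteus hierarchy is not given in the text.
HIERARCHY = {
    "Satellite": "C-Payload",
    "Launch": "C-Launch",
    "Fire": "C-Launch",
}


def is_a(concept, target):
    """Return True if `concept` equals `target` or lies below it."""
    while concept is not None:
        if concept == target:
            return True
        concept = HIERARCHY.get(concept)
    return False


@dataclass
class Chunk:
    """A noun or verb group found by lower-level patterns (assumed shape)."""
    kind: str      # "np" or "vg"
    concept: str   # semantic class of the head
    text: str      # surface text of the group


@dataclass
class Element:
    """One pattern element, generalized to a concept class."""
    kind: str
    concept: str
    slot: Optional[str] = None  # event slot this element fills, if any


def match_pattern(pattern, chunks, event_class):
    """Match elements to chunks one-to-one; return a slot dict or None."""
    if len(pattern) != len(chunks):
        return None
    event = {"Class": event_class}
    for element, chunk in zip(pattern, chunks):
        if element.kind != chunk.kind:
            return None
        if not is_a(chunk.concept, element.concept):
            return None
        if element.slot:
            event[element.slot] = chunk.text
    return event


# np(C-company) vg(C-Launch) np(C-Payload); slot choices are assumptions.
LAUNCH_PATTERN = [
    Element("np", "C-company", slot="Agent"),
    Element("vg", "C-Launch"),
    Element("np", "C-Payload", slot="Payload"),
]

EXAMPLE = [
    Chunk("np", "C-company", "Arianespace Co."),
    Chunk("vg", "Launch", "has launched"),
    Chunk("np", "Satellite", "an Intelsat communications satellite"),
]

if __name__ == "__main__":
    print(match_pattern(LAUNCH_PATTERN, EXAMPLE, "Launch"))
```

Because the verb element is generalized to C-Launch, the same pattern in this sketch would also accept a verb group whose concept is Fire. The passive and relative variants produced by the meta-rules are outside the scope of the sketch.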