<?xml version="1.0" standalone="yes"?>
<Paper uid="M92-1029">
<Title>MCDONNELL DOUGLAS ELECTRONIC SYSTEMS COMPANY: DESCRIPTION OF THE TEXUS SYSTEM AS USED FOR MUC-4 Amnon Meyers and David de Hilster</Title>
<Section position="4" start_page="210" end_page="210" type="metho">
<SectionTitle> SYSTEM INFORMATION </SectionTitle>
<Paragraph position="0"> TexUS comprises 89,000 lines of code, is written in C with SunView, and runs on Sun SPARCstations. The system customized for MUC4 has about 1,800 vocabulary words (not counting conjugations), plus an additional 4,000 Hispanic names and geographic locations. The analyzer uses about 260 rules and processes text at about 2 words per second.</Paragraph>
</Section>
<Section position="5" start_page="210" end_page="212" type="metho">
<SectionTitle> MUC4 DISCUSSION </SectionTitle>
<Paragraph position="0"> That our current system is in transition is clearly evidenced by comparing the MUC3 and MUC4 scores. In fact, the rescored MUC3 results are better than those of the current system.</Paragraph>
<Paragraph position="1"> We have also implemented extensive automated testing facilities to augment the MUC scoring apparatus. We have used the testing system in preparing for MUC4 and will make extensive use of it during the remainder of 1992 to improve performance on the MUC task.</Paragraph>
<Paragraph position="2"> Relevance filter: Keyword and key-pattern search helps identify relevant portions of the text.
Lexical: The lexical pass is primarily concerned with identifying locations and names, and implements n-gram methods to decide whether unknown words are English. To augment the system's vocabulary, the lexical pass made extensive use of the Collins English Dictionary (CED). The set list of locations is used by the lexical pass, as is a set of personal names extracted from the development corpus. Spelling correction and morphological analysis algorithms also apply to unknown words.</Paragraph>
<Paragraph position="3"> For message 48, the lexical analyzer failed to find 'yet' in the CED, so n-gram analysis guessed that it is an English word. The word 'there' was absent from our core vocabulary, indicating the incompleteness of our coverage. The complete list of unknown words found in the CED for message 48 follows: abroad, accused, appointed, approve, armored, christian, closely, confirmed, considered, cordoned, credit, declared, democrat, drastic, elect, escaped, halt, including, intended, intersection, job, laws, legislative, linked, moments, napoleon, niece, noted, occasions, old, operation, possibility, prompt, reaction, replace, represent, responsible, roof, ruled, same, sources, stopped, street, termed, there, threatened, time, traveling, unscathed, warned. The spelling corrector converted &quot;asssembly&quot; to &quot;assembly&quot;. Finally, all the names in the message, such as &quot;Roberto Garcia Alvarado&quot;, were correctly determined.</Paragraph>
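<Paragraph> The paper does not spell out the n-gram method used to guess whether an unknown word such as 'yet' is English. The following is a minimal illustrative sketch (in Python, rather than the C used by TexUS), assuming a character-bigram frequency score with an arbitrary threshold; the training list, scoring, and threshold are assumptions for illustration, not the TexUS implementation.

# Minimal, illustrative sketch only (assumed details, not the TexUS code):
# score an unknown word by the average frequency of its character bigrams
# over a list of known English words, and guess "English" above a threshold.
from collections import Counter

def train_bigram_counts(words):
    counts = Counter()
    for w in words:
        padded = "^" + w.lower() + "$"        # mark word boundaries
        for i in range(len(padded) - 1):
            counts[padded[i:i + 2]] += 1
    return counts

def looks_english(word, counts, threshold=1.0):
    padded = "^" + word.lower() + "$"
    bigrams = [padded[i:i + 2] for i in range(len(padded) - 1)]
    avg = sum(counts[b] for b in bigrams) / len(bigrams)
    return avg >= threshold                   # threshold is arbitrary here

# In practice the counts would come from a large word list (e.g. CED headwords).
counts = train_bigram_counts(["there", "street", "warned", "time", "sources", "escaped"])
print(looks_english("yet", counts))           # guess for a short English word
print(looks_english("fmln", counts))          # guess for an acronym-like token
</Paragraph>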
<Paragraph position="4"> Before and after lexical analysis, bottom-up passes through the message text located several types of idioms. Before lexical analysis, the following were found state of our knowledge base and the large degree of syntactic ambiguity supplied by the CED. For example, the first sentence in message 48 was segmented into the following components: In general, the assignment of np, vp, and pp was correct, even in this sentence. Lack of patterns such as &lt;alphabetic&gt; &lt;hyphen&gt; &lt;alphabetic&gt; led to the mishandling of &quot;president-elect&quot;. &quot;The terrorist killing&quot; is difficult to assign correctly in general, and TexUS did well to assign the noun sense of &quot;killing&quot;. As described for the lexical pass, &quot;roberto garcia alvarado and accused&quot; was misparsed because of a noun list pattern &lt;noun&gt; &lt;and&gt; &lt;noun&gt; that was too unrestricted.</Paragraph>
<Paragraph position="5"> Semantic analysis: The semantic structures produced for the first sentence in message 48 derive directly from the syntactic segmentation shown above. We have edited the internal semantic representation to be human-readable:
event = condemned
actors = (1) Salvadoran president, (2) elect, (3) alfredo cristiani.</Paragraph>
<Paragraph position="6"> actions = (1) killing, (2) crime.</Paragraph>
<Paragraph position="7"> objects = (1) terrorist, (2) attorney general, (3) roberto garcia alvarado and accused, (4) fmln.
The assignments are generally reasonable, except that merging of appositives and split noun phrases is not yet implemented. In the two weeks following the formal MUC4 test, we have improved the semantic analyzer to output separate event structures for nominal actions such as &quot;the terrorist killing&quot; and &quot;the crime&quot;, so that adverbial information can be properly attached to these events.</Paragraph>
<Paragraph position="8"> After fixing some of the segmentation bugs noted earlier, the semantic output is greatly improved:</Paragraph>
<Paragraph position="10"> Discourse analysis: Discourse analysis links or separates semantic information based on syntactic, semantic, and discourse knowledge. In general, semantic information is separated or merged by comparing date/time, location, actors, and objects. Actors and objects are classified as proper nouns, pronouns, or abstract nouns (e.g., &quot;the home&quot;) and are compared by successively relaxing constraints on agreement, as in the syntactic and semantic passes. If the object being compared is the name &quot;Garcia&quot;, then the first precedence for comparison will be other names, such as &quot;Roberto Garcia Alvarado&quot; or &quot;Garcia Alvarado&quot;, which contain the name &quot;Garcia&quot;. If none are found, a proper name is then matched with pronouns such as &quot;he&quot; or &quot;him&quot;. If that also fails, then &quot;Garcia&quot; is matched with abstract nouns such as &quot;the attorney general&quot;. Time, location, and other concepts are compared similarly.</Paragraph>
<Paragraph position="11"> One construction not currently handled by the discourse analyzer is the phrase &quot;Merino's home&quot;. The discourse analyzer does not yet link possessive nouns with other nouns in the corpus, which would help classify &quot;Merino's home&quot; as a GOVERNMENT OFFICE OR RESIDENCE instead of CIVILIAN RESIDENCE.</Paragraph>
<Paragraph position="12"> For meaningful work on message 48, discourse analysis depended on modifications to the earlier passes. In addition, we added a pragmatic rule that merged events based on location, allowing the attack on Merino's home to be merged with the fact that children were in the home at the time (sentences 11-13 of message 48). In general, the discourse process works well on MUC messages when the prior passes produce a correct internal semantic representation.</Paragraph>
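<Paragraph> The constraint-relaxation comparison described above can be summarized with a small sketch. The tiering below (other names sharing the token, then pronouns, then abstract nouns) follows the Garcia example in the text; the classification heuristic, the data, and the Python realization are assumptions for illustration, not the TexUS implementation.

# Minimal, illustrative sketch only (assumed tiers and heuristics, not the TexUS code):
# resolve a proper name against prior mentions by successively relaxing constraints,
# preferring names that share the token, then pronouns, then abstract nouns.
PRONOUNS = {"he", "him", "she", "her", "they", "them"}

def classify(mention):
    if mention.lower() in PRONOUNS:
        return "pronoun"
    if mention[:1].isupper():                 # crude proper-noun test
        return "name"
    return "abstract"

def match_reference(name, prior_mentions):
    token = name.lower()
    for tier in ("name", "pronoun", "abstract"):
        for mention in prior_mentions:
            if classify(mention) != tier:
                continue
            if tier == "name" and token not in mention.lower().split():
                continue                      # a candidate name must contain the token
            return mention
    return None

mentions = ["Roberto Garcia Alvarado", "he", "the attorney general"]
print(match_reference("Garcia", mentions))    # prefers the fuller matching name
</Paragraph>
</Section>
</Paper>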