<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1024"> <Title>Automatic Extraction of Facts from Press Releases to Generate News Stories</Title> <Section position="3" start_page="170" end_page="170" type="relat"> <SectionTitle> 2. Related Work </SectionTitle> <Paragraph position="0"> Text understanding systems have generally fallen into two categories: systems which attempt to perform a complete linguistic analysis of the text, and systems which perform partial understanding to accomplish certain specific understanding tasks. Most of the linguistically-based systems perform a more or less pure syntactic analysis and a semantic and/or pragmatic analysis to arrive at a representation of the meaning of the text. TACITUS [3], PROTEUS [5], PUNDIT [2], CAUCUS [9], and the News Analysis System (NAS) [6] all fall into this category. The systems differ in the specifics of the syntactic, semantic and pragmatic analysis used and in the degree to which the different levels of processing are integrated. For example, TACITUS' syntactic step does enough semantic processing to produce a logical form; a second step performs pragmatic tasks such as reference resolution. Some systems base their processing on a particular linguistic theory; for example, CAUCUS uses Lexical Functional Grammar and NAS uses a Government-Binding approach to syntax and semantics.</Paragraph> <Paragraph position="1"> Other systems use more idiosyncratic approaches to the analysis.</Paragraph> <Paragraph position="2"> These linguistically-based systems have a tremendous potential for complete understanding of a wide range of text, because, in theory, they do a complete analysis of the text. However, the processing of such systems tends to be relatively slow; in addition, these systems have tended to be used in research contexts, in part because the range of coverage that they can provide is necessarily limited.
A full analysis of text that covers diverse topics or that must be processed at a high rate of throughput is not feasible given the current state of the art.</Paragraph> <Paragraph position="3"> Systems which do not attempt a complete understanding of the text, but rather focus on specific understanding tasks, are more likely to result in deployable applications.</Paragraph> <Paragraph position="4"> ATRANS [7], the only major deployed fact extraction system before JASPER, is the most notable example.</Paragraph> <Paragraph position="5"> ATRANS operates in the domain of international banking telexes, dealing with one major subclass of such telexes: money transfer telexes. ATRANS automatically extracts the information required to complete the transfer (the various banks mentioned in the telex, their roles in the money transfer, payment amounts, dates, security keys, etc.) and formats it for entry into the bank's automated transaction processing system. The understanding techniques used in ATRANS are based on caseframe analysis using the Conceptual Dependency formalism [8], which relies on semantics over syntax and does not require a complete analysis of the text.</Paragraph> <Paragraph position="6"> General Electric's SCISOR system [4] uses a hybrid approach, combining syntactic and caseframe parsing. This allows it to exploit the strong top-down domain expectations provided by caseframes to deal with relevant fragments from text that it cannot fully analyze, while at the same time generating complete linguistic analyses when possible.</Paragraph> <Paragraph position="7"> SCISOR is also designed so that general grammatical knowledge and domain-specific knowledge are kept separate. This will greatly facilitate its transfer to other domains.</Paragraph> </Section> </Paper>