File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/92/a92-1024_intro.xml
Size: 3,399 bytes
Last Modified: 2025-10-06 14:05:06
<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1024"> <Title>Automatic Extraction of Facts from Press Releases to Generate News Stories</Title> <Section position="2" start_page="0" end_page="170" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> While a computer program that can provide complete understanding of arbitrary input text remains a distant dream, it is currently possible to construct natural language processing systems that provide a partial understanding of certain types of text with limited accuracy. Moreover, such systems can provide cost-effective solutions to commercially-significant business problems. This paper describes one such system: JASPER. JASPER (Journalist's Assistant for Preparing Earnings Reports) is a fact extraction system recently developed and deployed by Carnegie Group for Reuters Ltd. JASPER uses a template-driven approach and partial understanding techniques to extract certain key pieces of information from a limited range of text. Specifically, JASPER takes as input a live feed of company press releases from PR Newswire. It identifies which of those releases contain information on company earnings and dividends, and for those releases, it extracts a predetermined set of information. It then reformats that information into a candidate Reuters news story and ships it off to a financial journalist for validation or editing. JASPER improves both the speed and accuracy of producing Reuters stories and hence provides a significant competitive advantage in the fast-paced world of financial journalism.</Paragraph> <Paragraph position="1"> JASPER gets excellent results in terms of both accuracy and speed. It does this by combining frame-based knowledge representation, object-oriented processing, powerful pattern matching, and heuristics which take advantage of stylistic conventions, including lexical, syntactic, semantic, and pragmatic regularities observed in the text corpus. 
The shallow, localized processing approach that we have adopted focusses on the information to be extracted and ignores irrelevant text. The first phase of JASPER has been deployed at Reuters for use and testing.</Paragraph> <Paragraph position="2"> It provides a low-risk and high-value solution to a real-world business problem.</Paragraph> <Paragraph position="3"> JASPER's architecture facilitates transfer to other fact extraction applications; the domain-independent core which controls processing is separate from the application-specific knowledge base which makes decisions about extracting information, so only the latter needs to be rewritten for other applications. Still, the knowledge engineering required to build an application is significant. We estimate that the JASPER application involved approximately eight person-months in knowledge engineering, apart from basic system development.</Paragraph> <Paragraph position="4"> Many significant business problems can be solved by similarly focussed fact extraction applications. The information extracted can be used in a variety of ways, such as filling in values in a database, generating summaries of the input text, serving as a part of the knowledge in an expert system, or feeding into another program which bases decisions on it. We expect to develop many such applications in the future using similar techniques.</Paragraph> </Section></Paper>