<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1101"> <Title>Knowledge-Based Multilingual Document Analysis</Title> <Section position="7" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Discussion and Future Work </SectionTitle> <Paragraph position="0"> The NAMIC system was created to provide an environment for the automatic hypertextual authoring of multilingual news articles. To address that task, we created language processors for three languages (English, Italian and Spanish), which allow us to build a database of conceptually analysed text. The ability to analyse text in this way is vital for the authoring process, but it is also applicable to a wide range of technologies, including Information Retrieval in general and Question Answering in particular.</Paragraph> <Paragraph position="1"> Information Retrieval (Sparck Jones and Willett, 1997; Rijsbergen, 1979), or document retrieval as it is in practice, is a widely used, robust technology that allows users to access a subset of documents by means of a set of keywords. However, retrieving answers to questions by keywords, whilst easy to implement, suffers from the restrictive nature of keywords. For example, a keyword-based retrieval mechanism would be unable to distinguish between the queries &quot;Who killed Lee Harvey Oswald?&quot; and &quot;Who did Lee Harvey Oswald kill?&quot;, because it reduces both queries to essentially the same bag of stemmed words. 
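The contrast between the keyword view and an event-based view can be illustrated with a minimal Python sketch; it is not the NAMIC implementation. It assumes a hypothetical stop-word list, a toy suffix-stripping stemmer standing in for a real one such as Porter's, and a hypothetical store of (event, agent, patient) triples standing in for the knowledge base, populated with two illustrative facts. The mapping from each question to the right structured lookup is chosen by hand here; a real system would have to derive it from an analysis of the question.

```python
from collections import Counter

# Hypothetical stop-word list; real IR systems use much larger ones.
STOP_WORDS = {'who', 'did', 'the', 'a', 'an'}

# Hypothetical event store of (event type, agent, patient) triples, standing in
# for an IE-built knowledge base; NOT the NAMIC schema. Illustrative facts only.
EVENTS = [
    ('kill', 'Jack Ruby', 'Lee Harvey Oswald'),
    ('kill', 'Lee Harvey Oswald', 'John F. Kennedy'),
]


def crude_stem(word: str) -> str:
    """Toy suffix stripper (an assumption, not the Porter stemmer)."""
    for suffix in ('ing', 'ed', 's'):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def bag_of_stems(query: str) -> Counter:
    """Keyword view: lower-case, drop '?' and stop words, stem the rest."""
    tokens = query.lower().replace('?', ' ').split()
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def who_killed(victim: str) -> list[str]:
    """Structured view: agents of 'kill' events whose patient is `victim`."""
    return [agent for event, agent, patient in EVENTS
            if event == 'kill' and patient == victim]


def whom_did_kill(killer: str) -> list[str]:
    """Structured view: patients of 'kill' events whose agent is `killer`."""
    return [patient for event, agent, patient in EVENTS
            if event == 'kill' and agent == killer]


if __name__ == '__main__':
    q1 = 'Who killed Lee Harvey Oswald?'
    q2 = 'Who did Lee Harvey Oswald kill?'
    # Keyword view: both questions collapse to the same bag of stems.
    print(bag_of_stems(q1) == bag_of_stems(q2))  # True
    # Structured view (question-to-lookup mapping chosen by hand):
    print(who_killed('Lee Harvey Oswald'))       # ['Jack Ruby']
    print(whom_did_kill('Lee Harvey Oswald'))    # ['John F. Kennedy']
```

Both questions reduce to the same multiset of stems, so a purely bag-of-words index returns the same documents for each, whereas the role-aware lookups return different answers.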
By accessing a knowledge base of the kind we created in the NAMIC project, in which events and their relations are explicitly represented (for example, which participant is the agent of a killing and which is its victim), an IR system would be able to distinguish between the two queries above, and between any other queries that require this kind of data mining.</Paragraph> <Paragraph position="2"> One possible future extension of the NAMIC scenario is to move from a system that only allows users to browse through a space of connected articles to one that supports journalists in the creation of news articles.</Paragraph> <Paragraph position="3"> State-of-the-art techniques for searching, analysing, authoring and disseminating news-domain information originating from sources in diverse languages are needed to support the working activities of authors (i.e. journalists) within a complex environment for searching, elaborating and delivering news. The information thus derived will enter the dissemination process (archives to the agencies and/or Web channels), and its presentation to the user will be enhanced so that it can be readily understood, accepted, rejected or amended as necessary.</Paragraph> <Paragraph position="4"> Reporters covering the early stages of a &quot;breaking&quot; story rely on a standard set of questions, typically: What? Where? Who? When? However, although definitions of a news story stress the originality of the event (&quot;Something that happened today which did not happen yesterday&quot;), coverage also relies on archives. Checks made in the archives, which are potentially multilingual and increasingly composed of digital resources, make up one of the most important phases of reporting. If a computer could imitate such a search path, it would greatly enhance the speed and accuracy of archive searches. For example, in the immediate aftermath of a crash involving a passenger airliner, a number of simple questions may be addressed to the archive. Has this type of aircraft crashed before? 
If so, what happened? How many fatalities have there been in incidents involving this type of aircraft? Has there been a crash at this airport before? What are the main characteristics of this aircraft? What are those of the airport? Answers to these questions may prompt a series of subsidiary questions.</Paragraph> <Paragraph position="5"> A computer cannot hope to imitate the depth of interpretation that an experienced and educated journalist brings to events, at least not for some considerable time. What does seem possible, however, is that a computerised assistant, a sort of electronic cub reporter, could help the human journalist by finding and collating relevant archival material in an intelligent fashion, i.e. without precise, low-level instructions from the journalist. The development of the proposed system would aid this multilingual question-answering task.</Paragraph> <Paragraph position="6"> In conclusion, we believe that the creation of a sophisticated knowledge base resource can benefit many Information Technology applications, IR and Question Answering to name two. We were able to create such a resource in the NAMIC project by implementing a scalable IE system containing a robust world model based on EuroWordNet. We feel that this kind of automatic resource building will play a significant part in future IT applications.</Paragraph> </Section> </Paper>