File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/93/h93-1028_intro.xml
Size: 1,279 bytes
Last Modified: 2025-10-06 14:05:25
<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1028"> <Title>THE MURASAKI PROJECT: MULTILINGUAL NATURAL LANGUAGE UNDERSTANDING</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> This paper describes a multilingual data extraction system under development for the Department of Defense (DoD). The system, called Murasaki, processes Spanish and Japanese newspaper articles reporting AIDS disease statistics. Key to Murasaki's design is its language-independent and domain-independent architecture. The system consists of shared processing modules across the three languages it currently handles (English, Japanese, and Spanish), shared general and domain-specific knowledge bases, and separate data modules for language-specific knowledge such as grammars, lexicons, morphological data and discourse data. This data-driven architecture is crucial to the success of Murasaki as a language-independent system; extending Murasaki to additional languages can be done for the most part merely by adding new data. Some of the data can be added with user-friendly tools, others by exploiting existing on-line data or by deriving relevant data from corpora.</Paragraph> </Section> </Paper>