<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1101">
  <Title>Knowledge-Based Multilingual Document Analysis</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Modern information technologies face the problem of selecting, filtering, linking and managing growing amounts of multilingual information, access to which is usually critical. Our work is motivated by the need to link multilingual information across a wide range of domains. Although this problem appears to be directly related to the Information Retrieval task, we aimed to link articles not in the broad sense of clustering documents on the same topic, but in the more specific sense of linking particular pieces of information from different documents. Furthermore, we found that Information Extraction (IE) research, although appropriate for our task, was not designed for the scale and variety of domains that we needed to process. In general, creating the world model needed to add a new domain to an IE system is a time-consuming process. We therefore designed an IE system that can be adapted to new domains easily and semi-automatically, a process we will refer to as large-scale IE. The key to creating new world models was the incorporation of large amounts of domain knowledge, and so we selected EuroWordnet as our base knowledge source. EuroWordnet has two advantages: 1) it provides the foundation for broad knowledge across many domains and 2) it is multilingual in nature. In this paper, we explain how our system works, describe how the knowledge base was incorporated, and discuss other applications that could make use of the same technology.</Paragraph>
  </Section>
</Paper>