<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1034">
  <Title>Using SGML as a Basis for Data-Intensive NLP</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The theme of this paper is the design of software and data architectures for natural language processing using corpora. Two major issues in corpus-based NLP are how best to handle medium- to large-scale corpora, which often carry complex linguistic annotations, and which system architecture best supports the reuse of software components in a modular and interchangeable fashion.</Paragraph>
    <Paragraph position="1"> In this paper we describe the LT NSL system (McKelvie et al., 1996), an architecture for writing corpus processing tools that we developed to address these issues. We then compare this system with two other systems that address some of the same issues: the GATE system (Cunningham et al., 1995) and the IMS Corpus Workbench (Christ, 1994). In particular, we discuss the advantages and disadvantages of an SGML-based approach compared with a non-SGML database approach. Finally, to support our claims about the merits of SGML-based corpus processing, we present several case studies of the use of the LT NSL system for corpus preparation and linguistic analysis.</Paragraph>
  </Section>
</Paper>