<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1034">
  <Title>XML-Based Data Preparation for Robust Deep Parsing</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar.</Paragraph>
    <Paragraph position="1"> Hand-crafted grammars inevitably lack coverage but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the 'messiness' in real language data and improve parse performance.</Paragraph>
  </Section>
</Paper>