File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/01/p01-1034_abstr.xml
Size: 957 bytes
Last Modified: 2025-10-06 13:42:06
<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1034"> <Title>XML-Based Data Preparation for Robust Deep Parsing</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar.</Paragraph> <Paragraph position="1"> Hand-crafted grammars inevitably lack coverage but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the 'messiness' in real language data and improve parse performance.</Paragraph> </Section> </Paper>