<?xml version="1.0" standalone="yes"?>
<Paper uid="W95-0113">
  <Title>Development of a Partially Bracketed Corpus with Part-of-Speech Information Only</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Research based on treebanks is active in many natural language applications. However, building a large-scale treebank is laborious and tedious. This paper proposes a probabilistic chunker to help develop a partially bracketed corpus. The chunker partitions a part-of-speech sequence into segments called chunks. Rather than using a treebank as the training corpus, we use a corpus tagged with part-of-speech information only. Experimental results show that the probabilistic chunker achieves a correct rate of more than 92% in the outside test. A well-formed partially bracketed corpus is a milestone in the development of a treebank. In addition, the simple but effective chunker can be applied to many natural language applications.</Paragraph>
  </Section>
</Paper>