File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/w04-3234_abstr.xml

Size: 952 bytes

Last Modified: 2025-10-06 13:44:08

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3234">
  <Title>Trained Named Entity Recognition Using Distributional Clusters</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recognition. The default feature set of BWI is augmented with features based on distributional term clusters induced from a large unlabeled text corpus. Using no traditional linguistic resources, such as syntactic tags or special-purpose gazetteers, this approach yields results near the state of the art in the MUC 6 named entity domain. Supervised learning using features derived through unsupervised corpus analysis may be regarded as an alternative to bootstrapping methods.</Paragraph>
  </Section>
</Paper>