File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/w06-1671_abstr.xml

Size: 1,093 bytes

Last Modified: 2025-10-06 13:45:29

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1671">
  <Title>Learning Field Compatibilities to Extract Database Records from Unstructured Text</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Named-entity recognition systems extract entities such as people, organizations, and locations from unstructured text. Rather than extract these mentions in isolation, this paper presents a record extraction system that assembles mentions into records (i.e. database tuples). We construct a probabilistic model of the compatibility between field values, then employ graph partitioning algorithms to cluster fields into cohesive records. We also investigate compatibility functions over sets of fields, rather than simply pairs of fields, to examine how higher representational power can impact performance. We apply our techniques to the task of extracting contact records from faculty and student homepages, demonstrating a 53% error reduction over baseline approaches.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML