<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1009">
  <Title>Role of Local Context in Automatic Deidentification of Ungrammatical, Fragmented Text</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Deidentification of clinical records is a crucial step before these records can be distributed to non-hospital researchers.</Paragraph>
    <Paragraph position="1"> Most approaches to deidentification rely heavily on dictionaries and heuristic rules; these approaches fail to remove most personal health information (PHI) that cannot be found in dictionaries. They can also fail to remove words that are ambiguous between PHI and non-PHI.</Paragraph>
    <Paragraph position="2"> Named entity recognition (NER) technologies can be used for deidentification. Some of these technologies exploit both local and global context of a word to identify its entity type. When documents are grammatically written, global context can improve NER.</Paragraph>
    <Paragraph position="3"> In this paper, we show that we can deidentify medical discharge summaries using support vector machines that rely on a statistical representation of local context. We compare our approach with three different systems. Comparison with a rule-based approach shows that a statistical representation of local context contributes more to deidentification than dictionaries and hand-tailored heuristics. Comparison with two well-known systems, SNoW and IdentiFinder, shows that when the language of documents is fragmented, local context contributes more to deidentification than global context.</Paragraph>
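    <Paragraph position="4"> A minimal sketch of what a local-context representation for a single token can look like, assuming a window of two tokens on each side of the target word plus two orthographic cues. The function name `local_context_features`, the window size, the `[PAD]` token, and the feature names are hypothetical illustrations, not the authors' actual feature set. Each token is described only by its immediate neighbors, so no sentence-level parse of the fragmented text is required; the resulting sparse binary features could then be passed to a linear classifier such as an SVM.

```python
from typing import Dict, List

# Hypothetical sketch: the window size, PAD token, and feature names are
# illustrative assumptions, not the feature set used in the paper.
PAD = "[PAD]"


def local_context_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, int]:
    """Binary features for tokens[i] drawn only from a +/- `window` token neighborhood."""
    feats: Dict[str, int] = {"w[0]=" + tokens[i].lower(): 1}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Neighbors that fall outside the fragment are replaced by PAD.
        word = tokens[j].lower() if j in range(len(tokens)) else PAD
        # Key records the relative position, e.g. "w[-1]=dr." or "w[+2]=[PAD]".
        feats[f"w[{offset:+d}]={word}"] = 1
    # Simple orthographic cues of the target word itself.
    if tokens[i][:1].isupper():
        feats["is_capitalized"] = 1
    if any(ch.isdigit() for ch in tokens[i]):
        feats["has_digit"] = 1
    return feats
```

For example, in the fragment `seen by Dr. Smith today`, the token `Smith` receives the features `w[0]=smith`, `w[-2]=by`, `w[-1]=dr.`, `w[+1]=today`, `w[+2]=[PAD]`, and `is_capitalized`; a cue such as `w[-1]=dr.` remains available even when the surrounding text is not a grammatical sentence.</Paragraph>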
  </Section>
</Paper>
