File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/w06-2918_abstr.xml
Size: 1,165 bytes
Last Modified: 2025-10-06 13:45:34
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2918"> <Title>Using Gazetteers in Discriminative Information Extraction</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Much work on information extraction has successfully used gazetteers to recognise uncommon entities that cannot be reliably identified from local context alone. Approaches to such tasks often involve the use of maximum entropy-style models, where gazetteers usually appear as highly informative features in the model. Although such features can improve model accuracy, they can also introduce hidden negative effects. In this paper we describe and analyse these effects and suggest ways in which they may be overcome.</Paragraph> <Paragraph position="1"> In particular, we show that by quarantining gazetteer features and training them in a separate model, then decoding using a logarithmic opinion pool (Smith et al., 2005), we may achieve much higher accuracy. Finally, we suggest ways in which other features with gazetteer feature-like behaviour may be identified.</Paragraph> </Section> </Paper>