File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/05/h05-1046_abstr.xml

Size: 1,405 bytes

Last Modified: 2025-10-06 13:44:13

<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1046">
  <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 363-370, Vancouver, October 2005. ©2005 Association for Computational Linguistics. Disambiguating Toponyms in News</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This research addresses the problem of disambiguating toponyms (place names) with respect to a classification derived by merging information from two publicly available gazetteers. To establish the difficulty of the problem, we measured the degree of ambiguity, with respect to a gazetteer, of toponyms in news. We found that 67.82% of the toponyms in a news corpus that were ambiguous in the gazetteer lacked a local discriminator in the text. Given the scarcity of human-annotated data, our method used unsupervised machine learning to develop disambiguation rules. Toponyms were automatically tagged with the information found about them in a gazetteer, and a toponym that was ambiguous in the gazetteer was automatically disambiguated using preference heuristics. These automatically tagged data were used to train a machine learner, which disambiguated toponyms in a human-annotated news corpus with 78.5% accuracy.</Paragraph>
  </Section>
</Paper>
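The pipeline outlined in the abstract (gazetteer lookup, heuristic disambiguation to produce automatically tagged data, then training a learner on those data) can be illustrated with a minimal sketch. It is not the authors' system and rests on hypothetical choices that the abstract does not specify: the one-entry `GAZETTEER` with its sense IDs and rough populations, the two preference heuristics (an explicit "Paris , Texas"-style local discriminator first, otherwise the most populous sense), the three-word context window, and a Naive Bayes learner swapped in as the "machine learner". `ambiguity_stats` mirrors, on toy data, the measurement of ambiguous toponyms that lack a local discriminator.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy gazetteer (NOT the merged gazetteer from the paper):
# toponym -> list of (sense_id, containing_region, approximate population).
GAZETTEER = {
    "Paris": [("Paris/France", "France", 2_100_000),
              ("Paris/Texas", "Texas", 25_000)],
}

def candidates(toponym):
    return GAZETTEER.get(toponym, [])

def local_discriminator(tokens, i):
    """Return the sense named by a pattern like 'Paris , Texas', else None."""
    if i + 2 < len(tokens) and tokens[i + 1] == ",":
        for sense_id, region, _ in candidates(tokens[i]):
            if tokens[i + 2] == region:
                return sense_id
    return None

def preference_heuristic(tokens, i):
    """Assumed preferences: a local discriminator wins; else the most populous sense."""
    found = local_discriminator(tokens, i)
    if found is not None:
        return found
    return max(candidates(tokens[i]), key=lambda c: c[2])[0]

def context(tokens, i, window=3):
    """Words within `window` positions on either side of the toponym."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]

def ambiguity_stats(corpus):
    """Share of gazetteer-ambiguous mentions that lack a local discriminator."""
    ambiguous = lacking = 0
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if len(candidates(tok)) > 1:
                ambiguous += 1
                lacking += local_discriminator(tokens, i) is None
    return lacking / ambiguous if ambiguous else 0.0

def auto_tag(corpus):
    """Silver-label each ambiguous mention as (toponym, context words, sense)."""
    return [(tok, context(tokens, i), preference_heuristic(tokens, i))
            for tokens in corpus
            for i, tok in enumerate(tokens)
            if len(candidates(tok)) > 1]

class NaiveBayes:
    """Multinomial Naive Bayes over context words with add-one smoothing."""

    def fit(self, data):
        self.sense_counts = Counter(sense for _, _, sense in data)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for _, words, sense in data:
            self.word_counts[sense].update(words)
            self.vocab.update(words)
        return self

    def predict(self, toponym, words):
        """Pick the best-scoring sense among this toponym's gazetteer candidates."""
        best, best_score = None, -math.inf
        for sense_id, _, _ in candidates(toponym):
            n = sum(self.word_counts[sense_id].values())
            # Add-one prior; its constant denominator does not change the argmax.
            score = math.log(self.sense_counts[sense_id] + 1)
            for w in words:
                score += math.log((self.word_counts[sense_id][w] + 1) / (n + len(self.vocab)))
            if score > best_score:
                best, best_score = sense_id, score
        return best

train = [s.split() for s in [
    "rodeo fans in Paris , Texas cheered",
    "cattle and rodeo news from Paris , Texas",
    "the Louvre in Paris reopened",
    "museums in Paris drew tourists",
]]
print(ambiguity_stats(train))                    # 0.5: two of four mentions lack a discriminator
model = NaiveBayes().fit(auto_tag(train))
test = "a rodeo in Paris".split()
print(preference_heuristic(test, 3))             # Paris/France (population fallback)
print(model.predict("Paris", context(test, 3)))  # Paris/Texas (learned from "rodeo")
```

In this toy run the heuristic alone resolves "a rodeo in Paris" to Paris/France, while the learner, having seen "rodeo" near mentions that were silver-labelled Paris/Texas via a local discriminator, picks Paris/Texas. That is the kind of generalization from automatically tagged data, to mentions without a discriminator, that the approach depends on.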