<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1046">
  <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 363-370, Vancouver, October 2005. ©2005 Association for Computational Linguistics. Disambiguating Toponyms in News</Title>
  <Section position="8" start_page="368" end_page="369" type="relat">
    <SectionTitle>
6 Related Work
</SectionTitle>
    <Paragraph position="0"> Work related to toponym tagging has included harvesting of gazetteers from the Web (Uryupina 2003), hand-coded rules for place name disambiguation, e.g., (Li et al. 2003) (Zong et al. 2005), and machine learning approaches to the problem, e.g., (Smith and Mann 2003). There has, of course, been a large amount of work on the more general problem of word-sense disambiguation, e.g., (Yarowsky 1995) (Kilgarriff and Edmonds 2002).</Paragraph>
    <Paragraph position="1"> We discuss the most relevant work here.</Paragraph>
    <Paragraph position="2"> While (Uryupina 2003) uses machine learning to induce gazetteers from the Internet, we merely download and merge information from two popular Web gazetteers. (Li et al. 2003) use a statistical approach to tag place names as a LOCation class.</Paragraph>
    <Paragraph position="3"> They then use a heuristic approach to location normalization, based on a combination of hand-coded pattern-matching rules and discourse features derived from co-occurring toponyms (e.g., a document containing "Buffalo", "Albany" and "Rochester" will likely have those toponyms disambiguated to New York state). Our TagDiscourse feature is more coarse-grained. Finally, they assume one sense per discourse in their rules, whereas we use it as a CorefClass feature for learning. Overall, our approach is based on unsupervised machine learning, rather than hand-coded rules for location normalization.</Paragraph>
    <Paragraph position="4"> (Smith and Mann 2003) use a "minimally supervised" method that exploits as training data toponyms that are found locally disambiguated, e.g., "Nashville, Tenn."; their disambiguation task is to identify the state or country associated with the toponym in test data that has those disambiguators stripped off. Although they report 87.38% accuracy on news, they address an easier problem than ours, since: (i) our earlier local ambiguity estimate suggests that as many as two-thirds of the gazetteer-ambiguous toponyms may be excluded from their test on news, as they would lack local discriminators; and (ii) the classes our tagger uses (Table 3) are more fine-grained. Finally, they use one sense per discourse as a bootstrapping strategy to expand the machine-annotated data, whereas in our case CorefClass is used as a feature.</Paragraph>
    <Paragraph position="5"> Our approach is distinct from other work in that it, first, attempts to quantify toponym ambiguity and, second, uses an unsupervised approach based on learning from noisy machine-annotated corpora built using publicly available gazetteers.</Paragraph>
  </Section>
</Paper>