<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0107">
  <Title>Bootstrapping toponym classifiers</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> Lack of labelled training or test data is the bane of many word sense disambiguation efforts. For geographic name disambiguation, we can extract training and test instances from contexts where the toponyms are disambiguated by the document's author. Tagging accuracy is quite good, especially for news texts, which have a lower entropy in the disambiguation task. In real applications, however, we do not usually need to disambiguate toponyms that already have state or country labels; we need to disambiguate unmarked place names. We investigated the ability of our classifier to generalize by evaluating on hand-corrected texts with all toponyms marked and disambiguated. The mixed results show that more generalization power is needed in our models, particularly the back-off models that handle toponyms unseen in training.</Paragraph>
    <Paragraph position="1"> In future work, we hope to apply further methods from WSD, such as decision lists and transformation-based learning, to the GND task. We hope that these methods will improve accuracy on toponyms seen in training. As for disambiguating unseen toponyms, incorporating our prior work on heuristic proximity-based disambiguation into the probabilistic framework would be a natural extension. A fully hand-corrected test corpus of news text would also give us more robust evidence of how well the classifier generalizes.</Paragraph>
    <Paragraph position="2"> Evidence learned by classifiers to disambiguate toponyms includes the names of prominent people and industries in a particular place, as well as the topics and dates of current and historical events, and the titles of newspapers (see figures 1 and 2). In our news training corpus, for example, Hawaii was most strongly collocated with &quot;lava&quot; and Poland with &quot;solidarity&quot; (case was ignored). In addition to their use for GND, such associations should be useful in their own right for event detection (Smith, 2002), personal name disambiguation, and augmenting the information in gazetteers.</Paragraph>
  </Section>
</Paper>