<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0107">
<Title>Bootstrapping toponym classifiers</Title>
<Section position="5" start_page="0" end_page="0" type="evalu">
<SectionTitle> 4 Evaluation </SectionTitle>
<Paragraph position="0"> We evaluate our system's performance on geographic name disambiguation using two tasks. For the first task, we use the same sort of untagged raw text used in training. We simply find toponyms that appear with disambiguating labels, e.g., &quot;Portland, Maine&quot;, remove the labels, and see whether the system can restore them from context. For the second task, we use texts in which all toponyms have been marked and disambiguated. The earlier heuristic system described in (Smith and Crane, 2001) was run on the texts, and all disambiguation choices were reviewed by a human editor.</Paragraph>
<Paragraph position="1"> Table 4 shows the results of these experiments. The baseline accuracy was briefly mentioned above: if a toponym has been seen in training, select the state or country with which it was most frequently associated; if a site was not seen, select the most frequent state or country from among the candidates in the gazetteer. The columns for &quot;seen&quot; and &quot;new&quot; provide separate accuracy rates for toponyms that were seen in training and for those that were not. Finally, the overall accuracy of the trained system is reported. For the American Memory and Civil War corpora, we report results on the hand-tagged as well as the raw text.</Paragraph>
<Paragraph position="2"> Not surprisingly, in light of its lower conditional entropy, disambiguation in news text was the most accurate, at 87.38%. Not only was the system accurate on news text overall, but it also degraded the least for unseen toponyms.</Paragraph>
<Paragraph position="3"> The relative accuracy on the American Memory and Civil War texts is also consistent with the entropies presented above. (Hand-tagged data were available for the American Memory and Civil War corpora.)</Paragraph>
<Paragraph position="4"> The classifier shows a more marked degradation when disambiguating toponyms not seen in training.</Paragraph>
<Paragraph position="5"> The accuracy of the classifier at restoring states and countries in raw text is significantly, but not considerably, higher than the baseline. It seems that many of the toponyms mentioned in a text may be only loosely connected to the surrounding discourse. An obituary, for example, might mention that the deceased left a brother, John Doe, of Arlington, Texas. Without tagging our test sets to mark such tangential statements, it would be hard to weigh errors in such cases appropriately.</Paragraph>
<Paragraph position="6"> Although accuracy on the hand-tagged data from the American Memory corpus was better than on the raw text, performance on the Civil War tagged data (Grant's Memoirs) was abysmal. Most of this error seems to have come from toponyms unseen in training, for which the accuracy was 9.38%. In both sets of tagged text, moreover, the full classifier performed below baseline accuracy because of problems with unseen toponyms. The back-off state models are clearly inadequate for the minute topographical references Grant makes in his descriptions of campaigns. Including proximity to other places mentioned is probably the best way to overcome this difficulty. These problems suggest that we need to generalize more robustly from the kinds of environments with labelled toponyms to those without.</Paragraph>
</Section>
</Paper>
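
Editor's illustration. The following is a minimal Python sketch of the baseline heuristic described in Section 4: for a toponym seen in training, pick the state or country most frequently associated with it; for an unseen toponym, pick the most frequent state or country among its gazetteer candidates. The function names and data structures here are assumptions for illustration only; the paper does not publish code, and this is not the authors' implementation.

from collections import Counter

def train_baseline(labeled_toponyms):
    """labeled_toponyms: iterable of (toponym, state_or_country) pairs harvested
    from raw text with disambiguating labels, e.g. ("Portland", "Maine")."""
    counts = {}
    for toponym, label in labeled_toponyms:
        counts.setdefault(toponym, Counter())[label] += 1
    return counts

def baseline_disambiguate(toponym, counts, gazetteer, global_counts):
    """Return the baseline guess for a toponym.

    gazetteer maps a toponym to its candidate states/countries (hypothetical);
    global_counts counts how often each state/country occurred in training."""
    if toponym in counts:
        # Seen in training: most frequently associated state or country.
        return counts[toponym].most_common(1)[0][0]
    # Unseen: most frequent state/country from among the gazetteer candidates.
    candidates = gazetteer.get(toponym, [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: global_counts.get(c, 0))

if __name__ == "__main__":
    # Toy data, not from the paper's corpora.
    pairs = [("Portland", "Maine"), ("Portland", "Oregon"), ("Portland", "Maine")]
    counts = train_baseline(pairs)
    global_counts = Counter(label for _, label in pairs)
    gazetteer = {"Portland": ["Maine", "Oregon"],
                 "Springfield": ["Illinois", "Massachusetts"]}
    print(baseline_disambiguate("Portland", counts, gazetteer, global_counts))     # seen case
    print(baseline_disambiguate("Springfield", counts, gazetteer, global_counts))  # unseen: gazetteer fallback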