<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0106">
  <Title>InfoXtract location normalization: a hybrid approach to geographic references in information extraction</Title>
  <Section position="7" start_page="7" end_page="7" type="evalu">
    <SectionTitle>
5 Experiment and Benchmark
</SectionTitle>
    <Paragraph position="0"> With the information from local context, discourse context and the knowledge of default senses, the location normalization process is efficient and precise.</Paragraph>
    <Paragraph position="1"> The testing documents were randomly selected from CNN news and from travel guide web pages. Table 2 shows the preliminary testing results using different configurations.</Paragraph>
    <Paragraph position="2"> As shown, local patterns (Column 4) alone contribute 12% to the overall performance, while proper use of default senses and the heuristics (Column 5) can achieve close to 90%. In terms of discourse co-occurrence evidence, the new method using Prim's algorithm (Column 7) is clearly better than the previous method using Kruskal's algorithm (Column 6), an improvement of about 13 percentage points (from 73.8% to 86.6%). However, neither method outperforms default senses. Finally, when all three types of evidence are used, the new hybrid method presented in this paper shows a significant performance improvement (96% in Column 9) over the previous method (81.9% in Column 8), in addition to providing a satisfactory solution to the efficiency problem.</Paragraph>
    <Paragraph position="3"> We observed that when a file contains a higher concentration of locations, as in the state introductions in the travel guides for California, Florida and Texas, the accuracy is higher than for the relatively short news articles from CNN.</Paragraph>
  </Section>
</Paper>