<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1008">
  <Title>Assigning Belief Scores to Names in Queries</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>2. DESCRIPTION OF MATCH PROBABILITY CALCULATION FOR PERSON NAMES</SectionTitle>
    <Paragraph position="0"> The motivation for our work is an effort to develop a name search operator to find attorneys and judges in the news. In our particular application, we wish to allow users to search for newspaper references to attorneys and judges listed in a directory of U.S. legal professionals. This directory contains the curriculum vitae of approximately one million people. In this section, we show how we calculate person name match probability.</Paragraph>
    <Paragraph position="1"> To compute the probability of relevance, or match probability, for a name, we perform three steps. First, we compute a probability distribution over the first names and over the last names in our name directory; this is our language model. Second, we compute a name's probability by multiplying its first-name probability by its last-name probability. Third, we compute its match probability by taking the reciprocal of the product of the name probability and the size of the human population likely to be referenced in the corpus. For our Wall Street Journal test corpus, we estimated this size to be approximately the size of the U.S. population, or 300 million. Formulas for the three steps are shown below.</Paragraph>
    <Paragraph position="2"> $P_{first} = F / N$, $P_{last} = L / N$, and $P_{name} = P_{first} \times P_{last}$, where F = number of occurrences of the first name, L = number of occurrences of the last name, and N = number of names in the directory.</Paragraph>
    <Paragraph position="3"> $P_{match} = \frac{1}{P_{name} \times H}$, where H = size of the human population likely to be referenced by the collection.</Paragraph>
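A minimal Python sketch of the three steps, assuming the directory is available as a list of (first, last) name pairs. The toy directory, the function names, and the choice to reject a name whose first or last name never occurs in the directory (rather than smoothing the counts) are hypothetical illustrations, not the authors' implementation; H defaults to the 300 million estimate given for the Wall Street Journal corpus.

```python
from collections import Counter

# Hypothetical toy directory of (first, last) pairs, standing in for the
# roughly one million legal professionals in the real directory.
DIRECTORY = [
    ("john", "smith"), ("john", "doe"), ("mary", "smith"),
    ("trent", "lott"), ("john", "smith"), ("anne", "jones"),
]

# H: size of the population likely to be referenced by the corpus
# (about the U.S. population for the Wall Street Journal test corpus).
H = 300_000_000


def build_language_model(directory):
    """Step 1: first- and last-name counts (F and L) plus directory size N.

    P_first = F / N and P_last = L / N follow directly from these counts.
    """
    first_counts = Counter(first for first, _ in directory)
    last_counts = Counter(last for _, last in directory)
    return first_counts, last_counts, len(directory)


def name_probability(first, last, model):
    """Step 2: P_name = P_first * P_last = (F / N) * (L / N)."""
    first_counts, last_counts, n = model
    return (first_counts[first] / n) * (last_counts[last] / n)


def match_probability(first, last, model, population=H):
    """Step 3: P_match = 1 / (P_name * H).

    Assumption: unseen names are rejected instead of smoothed.
    """
    p_name = name_probability(first, last, model)
    if p_name == 0:
        raise ValueError(f"{first} {last}: first or last name not in directory")
    return 1.0 / (p_name * population)


if __name__ == "__main__":
    model = build_language_model(DIRECTORY)
    for first, last in [("john", "smith"), ("trent", "lott")]:
        print(first, last, match_probability(first, last, model))
```

The sketch returns the raw reciprocal. For a rare enough name, P_name times H falls below 1 and the value exceeds 1; the text does not say how such values are treated, so no clipping is applied here.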
    <Paragraph position="4"> Example calculations for Trent Lott and John Smith are shown below in Table 1.</Paragraph>
    <Paragraph position="5"> In this example, the match probability for Trent Lott is approximately four orders of magnitude higher than the match probability for John Smith, while idf, which is derived from document frequency, suggests that documents retrieved for John Smith are more likely to be relevant than documents retrieved for Trent Lott. Both empirically and intuitively, match probability is a better predictor of relevance here than idf.</Paragraph>
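The size of that gap depends only on the directory counts. Writing P_name for a name's probability (its first-name count over N times its last-name count over N) and P_match = 1 / (P_name H) for its match probability, the ratio of the two match probabilities works out as follows:

```latex
\frac{P_{match}(\text{Trent Lott})}{P_{match}(\text{John Smith})}
  = \frac{P_{name}(\text{John Smith})\, H}{P_{name}(\text{Trent Lott})\, H}
  = \frac{F_{\text{John}}\, L_{\text{Smith}} / N^{2}}{F_{\text{Trent}}\, L_{\text{Lott}} / N^{2}}
  = \frac{F_{\text{John}}\, L_{\text{Smith}}}{F_{\text{Trent}}\, L_{\text{Lott}}}
```

Both H and N cancel, so a gap of about four orders of magnitude means the product of the directory counts for "John" and "Smith" is roughly 10^4 times the product of the counts for "Trent" and "Lott", independent of the population estimate.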
  </Section>
</Paper>