<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2176">
  <Title>Learning Correlations between Linguistic Indicators and Semantic Constraints: Reuse of Context-Dependent Descriptions of Entities</Title>
  <Section position="3" start_page="1072" end_page="1072" type="intro">
    <SectionTitle>
2 Problem Description
</SectionTitle>
    <Paragraph position="0"> We define the relation DescriptionOf(E) as the one that holds between a named entity E and a noun phrase D that describes it.</Paragraph>
    <Paragraph position="1"> In the example shown in Table 1, there are two entity-description pairs.</Paragraph>
    <Paragraph position="3"> Chief U.N. arms inspector Richard Butler met Iraq's Deputy Prime Minister Tareq Aziz Monday after rejecting Iraqi attempts to set deadlines for finishing his work.</Paragraph>
    <Paragraph position="4">  Each entity appearing in a text can have multiple descriptions (up to several dozen) associated with it.</Paragraph>
    <Paragraph position="5"> We call the set of all descriptions related to the same entity in a corpus the profile of that entity. Profiles for a large number of entities were compiled using our earlier system, PROFILE (Radev and McKeown, 1997). Profile size (the number of distinct descriptions) varies widely across entities. Table 1 shows a subset of the profile for Ung Huot, the former foreign minister of Cambodia, who was elected prime minister at some point during our experiment. A few sample semantic features of the descriptions in Table 1 are shown as separate columns.</Paragraph>
    <Paragraph position="6"> We used information extraction techniques to collect entities and descriptions from a corpus and analyzed their lexical and semantic properties. We have processed 178 MB of newswire and analyzed the use of descriptions related to 11,504 entities. (Footnote 1: The corpus contains 19,473 news stories, covering the period October 1, 1997 - January 9, 1998, that were available through PROFILE.) Even though PROFILE extracts other entities in addition to people (e.g., places and organizations), we have restricted our analysis to names of people only.</Paragraph>
    <Paragraph position="7"> We claim, however, that a large portion of our findings applies to the other types of entities as well. We have investigated 35,206 tuples, each consisting of an entity, a description, an article ID, and the position (sentence number) in the article at which the entity-description pair occurs. Since there are 11,504 distinct entities, we had on average 3.06 distinct descriptions per entity (DDPE). Table 2 shows the distribution of DDPE values across the corpus. Notice that a large number of entities (9,053 out of the 11,504) have a single description. These are not as interesting for our analysis as the remaining 2,451 entities, which have DDPE values between 2 and 24.</Paragraph>
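    <Paragraph> The tuple format, the notion of a profile, and the DDPE statistic described above can be illustrated with a minimal Python sketch. It is not the PROFILE system's implementation: the function names, the sample tuples, and the article IDs and sentence numbers are all hypothetical and serve only to show how profiles and a DDPE distribution could be derived from (entity, description, article ID, sentence number) tuples.</Paragraph>

```python
from collections import Counter, defaultdict

# Hypothetical sample tuples: (entity, description, article_id, sentence_number).
# Article IDs, sentence numbers, and the third description are made up.
SAMPLE_TUPLES = [
    ("Richard Butler", "Chief U.N. arms inspector", "a1", 1),
    ("Tareq Aziz", "Iraq's Deputy Prime Minister", "a1", 1),
    ("Richard Butler", "Chief U.N. arms inspector", "a2", 3),
    ("Richard Butler", "U.N. arms inspector", "a3", 2),
]


def build_profiles(tuples):
    """Map each entity to the set of distinct descriptions seen with it."""
    profiles = defaultdict(set)
    for entity, description, _article_id, _sentence in tuples:
        profiles[entity].add(description)
    return dict(profiles)


def ddpe_distribution(profiles):
    """Count how many entities have each profile size (DDPE value)."""
    return Counter(len(descriptions) for descriptions in profiles.values())


def mean_ddpe(profiles):
    """Average number of distinct descriptions per entity."""
    total = sum(len(descriptions) for descriptions in profiles.values())
    return total / len(profiles)


profiles = build_profiles(SAMPLE_TUPLES)
distribution = ddpe_distribution(profiles)
average = mean_ddpe(profiles)
```

    <Paragraph> In this sketch a profile is a set, so a description repeated across articles is counted once. As a check on the figures reported above, 35,206 tuples divided by 11,504 entities gives approximately 3.06.</Paragraph>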
    <Paragraph position="9"/>
  </Section>
</Paper>