<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0625">
  <Title>Normalized? Yes Yes Yes No Yes No Yes Yes Yes Yes Yes Yes</Title>
  <Section position="3" start_page="203" end_page="203" type="intro">
    <SectionTitle>
2 Definition of Similarity
</SectionTitle>
    <Paragraph position="0"> Similarity is a complex concept that has been widely discussed in the linguistic, philosophical, and information theory communities. For example, Frawley [1992] discusses all semantic typing in terms of two mechanisms: the detection of similarity and difference. Jackendoff [1983] argues that standard semantic relations such as synonymy, paraphrase, redundancy, and entailment all result from judgments of likeness, whereas antonymy, contradiction, and inconsistency derive from judgments of difference. Losee [1998] reviews notions of similarity and their impact on information retrieval techniques. For our task, we define two text units as similar if they share the same focus on a common concept, actor, object, or action. In addition, the common actor or object must perform or be subjected to the same action, or be the subject of the same description. For example, Figure 1 shows three input text fragments (paragraphs) taken from the TDT pilot corpus (see Section 5.1), all from the same topic on the forced landing of a U.S. helicopter in North Korea. We consider units (a) and (b) in Figure 1 to be similar, because they both focus on the same event (loss of contact) with the same primary participant (the helicopter). On the other hand, unit (c) in Figure 1 is not similar to either (a) or (b). Although all three refer to a helicopter, the primary focus in (c) is on the emergency landing rather than the loss of contact.</Paragraph>
    <Paragraph position="1"> We discuss an experimental validation of our similarity definition in Section 5.2, after we introduce the corpus we use in our experiments.</Paragraph>
  </Section>
</Paper>