<?xml version="1.0" standalone="yes"?>
<Paper uid="X96-1033">
  <Title>A SIMPLE PROBABILISTIC APPROACH TO CLASSIFICATION AND ROUTING</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> One of the goals of the TIPSTER Phase II Extraction Project [Contract Number 94-F133200-000] has been to integrate extraction and detection technologies. In this paper we extend previous work (Guthrie et al.) [1] on classifying texts into categories, and develop a methodology based on the classification technique for routing documents.</Paragraph>
    <Paragraph position="1"> By classifying and routing texts into categories we mean to include a variety of applications: categorizing texts by topic, by the language the text is written in, or by relevance to a specified task. The techniques used here are not language-specific and can be applied to any language or domain.</Paragraph>
    <Paragraph position="2"> 2.1. The Intuitive Model The mathematical model we use in this paper formalizes the intuitive notion that humans can identify the topic of an unfamiliar article based on the occurrence of topic-specific words and phrases. Note that most people can tell that the first passage below is about music, even though the word 'music' is not in the passage. Similarly, most people can tell that the second passage is from a sports article, even though the word 'sport' is never mentioned.</Paragraph>
    <Paragraph position="3"> &quot;Before the release of his last studio album, 1993's 'Ten Summoner's Tales', Sting commented that he could no longer put his whole heart into his work; it left him feeling too vulnerable. Not surprisingly, that disc was well-crafted, but a bit void of feeling--unfortunate, considering the wondrous synergy of heart and craft on Sting's masterwork, 1987's 'Nothing Like the Sun'.</Paragraph>
    <Paragraph position="4"> Sadly, 'Mercury Falling' makes 'Ten Summoner's Tales' seem brilliant by comparison. It's as if Sting only made it because he looked at his calendar one day and realized, by golly, that it was time to make another record. Easily the worst album of what has until now been a remarkably successful career, the disc is aptly named: the temperature never seems to rise on this turgid effort.&quot;  &quot;Walter McCarty scored 24 points and Antoine Walker had 14 and nine rebounds as Kentucky pulled away in the second half to beat upstart San Jose State, 110-72, in the first round of the Midwest Regional in Dallas.</Paragraph>
    <Paragraph position="5"> The Wildcats (28-3), who are seeking their first national championship since 1978, will meet the winner of the Wisconsin-Green Bay-Virginia Tech game on Saturday at Reunion Arena.</Paragraph>
    <Paragraph position="6"> San Jose State, which was making its first NCAA Tournament appearance, gave Kentucky all it could handle in the first half, tying the game at 37-37 with 2:50 to play. The Wildcats then closed out the first half</Paragraph>
    <Paragraph position="8"> with an 11-4 run to build a 47-41 advantage at the intermission. Olivier Saint-Jean finished with 18 points and seven rebounds for the Spartans (13-17), who were one of two teams in the NCAA Tournament with a losing record.&quot;  The music passage has many music-related words such as 'studio', 'album', 'disc', and 'record', and the sports passage has many sports-related words such as 'scored', 'beat', 'championship', 'game', and 'rebounds'. Any of these words taken singly would not necessarily give a strong indication about the passage topic, but taken together they can predict with a high degree of certainty the topic of the passage.</Paragraph>
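    <Paragraph position="9"> The intuition in the two passages can be illustrated with a minimal sketch that simply counts topic-specific words. This is not the probabilistic model developed in the paper: the cue-word sets below contain only the example words listed above, and the names CUE_WORDS, cue_counts, and guess_topic are hypothetical, introduced here for illustration.

```python
import re

# Assumption: these sets hold only the example words named in the text;
# the paper's actual word sets and probabilities are not given here.
CUE_WORDS = {
    "music": {"studio", "album", "disc", "record"},
    "sports": {"scored", "beat", "championship", "game", "rebounds"},
}


def cue_counts(text):
    """Count how many tokens of the text fall in each topic's cue-word set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        topic: sum(1 for tok in tokens if tok in words)
        for topic, words in CUE_WORDS.items()
    }


def guess_topic(text):
    """Return the topic with the most cue-word hits, or None on a tie."""
    counts = cue_counts(text)
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # With two topics, a tie (including zero hits for both) is undecided.
    if ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

Calling guess_topic on each passage should favor 'music' for the first and 'sports' for the second. The sports passage also contains 'record' (in 'a losing record'), which is one of the music cue words, so a single word is not decisive while the combined counts are.</Paragraph>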
  </Section>
</Paper>