File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/95/p95-1046_intro.xml
Size: 2,777 bytes
Last Modified: 2025-10-06 14:05:53
<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1046"> <Title>Knowledge-based Automatic Topic Identification</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> As the amount of text available online keeps growing, it becomes increasingly difficult for people to keep track of and locate the information of interest to them. To remedy the problem of information overload, a robust and automated text summarizer or information extractor is needed. Topic identification is one of two very important steps in the process of summarizing a text; the second step is summary text generation.</Paragraph> <Paragraph position="1"> A topic is a particular subject that we write about or discuss (Sinclair et al., 1987). To identify the topics of texts, Information Retrieval (IR) researchers use word frequency, cue word, location, and title-keyword techniques (Paice, 1990). Among these techniques, only word frequency counting can be used robustly across different domains; the other techniques rely on stereotypical text structure or the functional structures of specific domains.</Paragraph> <Paragraph position="2"> Underlying the use of word frequency is the assumption that the more a word is used in a text, the more important it is in that text. This method recognizes only the literal word forms and nothing else. [Footnote 1: This research was funded in part by ARPA under order number 8073, issued as Maryland Procurement Contract # MDA904-91-C-5224, and in part by National Science Foundation Grant No. MIP 8902426.]</Paragraph> <Paragraph position="3"> Some morphological processing may help, but pronominalization and other forms of coreferentiality defeat simple word counting. Furthermore, straightforward word counting can be misleading since it misses conceptual generalizations. For example: &quot;John bought some vegetables, fruit, bread, and milk.&quot; What would be the topic of this sentence? 
We can draw no conclusion using the word counting method, whereas the topic should actually be: &quot;John bought some groceries.&quot; The problem is that the word counting method misses the important concepts behind those words: vegetables, fruit, etc. relate to groceries at the deeper level of semantics. Recognizing this inherent problem of the word counting method, researchers have recently started to use artificial intelligence techniques (Jacobs and Rau, 1990; Mauldin, 1991) and statistical techniques (Salton et al., 1994; Grefenstette, 1994) to incorporate the semantic relations among words into their applications. Following this trend, we have developed a new way to identify topics by counting concepts instead of words.</Paragraph> </Section> </Paper>