<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0109"> <Title>EXPLOITING TEXT STRUCTURE FOR TOPIC IDENTIFICATION</Title> <Section position="8" start_page="108" end_page="109" type="concl"> <SectionTitle> 6. FINAL REMARKS </SectionTitle> <Paragraph position="0"> Using text structure in topic identification has two major benefits: improved effectiveness and a considerable reduction in the volume of text needed to identify text topics correctly.</Paragraph> <Paragraph position="1"> For instance, a fixed-length model that uses only the first 20-word block requires about one tenth of the words used by a full-text model, yet it performs significantly and consistently better than the full-text model. Contrary to our expectation, the experimental results cast some doubt on the usefulness of paragraphs for topic identification.</Paragraph> <Paragraph position="2"> Fig. 10 shows a particular implementation of the present method (full-text model), which operates in the Emacs environment. The article shown in Fig. 10 is from Nihon Keizai Shimbun (1992).</Paragraph> <Paragraph position="3"> In conclusion, we note a few points. This paper demonstrated that evidence about text structure improves the identification of topical words in texts, where identification is based on a probabilistic model of text categorization. Importantly, we used texts that are not explicitly structured. A text structure is identified by measuring the similarity between the segments that make up the text and its title. The results show clearly that a text structure identified in this way gives a good clue to the parts of the text most relevant to its content.</Paragraph> </Section> </Paper>