<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1305"> <Title>Topic Analysis Using a Finite Mixture Model</Title> <Section position="10" start_page="40" end_page="42" type="evalu"> <SectionTitle> 7 Experimental Results </SectionTitle> <Paragraph position="0"> We have evaluated the performance of our topic analysis method (STM) in terms of three aspects: topic structure adequacy, text segmentation accuracy, and topic identification accuracy.</Paragraph> <Section position="1" start_page="40" end_page="40" type="sub_section"> <SectionTitle> 7.1 Data Set </SectionTitle> <Paragraph position="0"> We know of no data available for the purpose of evaluating topic analysis. We thus utilized the Reuters news articles referred to as 'Reuters-21578,' which have been widely used in text classification. An exception is the method proposed in (McCallum and Nigam, 1999), which, instead of labeled texts, uses unlabeled texts, pre-determined categories, and keywords defined by humans for each category.</Paragraph> <Paragraph position="1"> Available at http://www.research.att.com/lewis/.</Paragraph> <Paragraph position="2"> We used a prepared split of the data, the 'Apte split,' which consists of 9603 texts for training and 3299 texts for test. All of the texts had already been classified into 90 categories by human subjects.</Paragraph> <Paragraph position="3"> For each text, we used the Oxford Learner's Dictionary to conduct stemming, and removed 'stop words' (e.g., 'the,' 'and') that appeared on a previously prepared list.</Paragraph> <Paragraph position="4"> The average length of a text was about 115 words. (We did not use phrases, which might have further improved the experimental results.)</Paragraph> </Section> <Section position="2" start_page="40" end_page="40" type="sub_section"> <SectionTitle> 7.2 Word Clustering </SectionTitle> <Paragraph position="0"> We conducted word clustering with the 9603 training texts.
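The stemming and stop-word removal described in Section 7.1 can be sketched as follows. This is a minimal illustration only: the suffix rules and the two word lists here are invented stand-ins, not the Oxford Learner's Dictionary or the authors' stop-word list.

```python
# Sketch of the preprocessing step: stem each token by checking whether
# stripping a common suffix yields a known base form, then drop stop words.
# STOP_WORDS and DICTIONARY are toy lists for illustration only.

STOP_WORDS = {"the", "and", "a", "of", "to", "in"}
DICTIONARY = {"export", "tariff", "trade", "rise"}

def stem(token, dictionary=DICTIONARY):
    """Strip a common inflectional suffix if the base form is in the dictionary."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix):
            base = token[: -len(suffix)]
            if base in dictionary:
                return base
    return token

def preprocess(text):
    tokens = [t.lower() for t in text.split()]
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("the tariffs and trade rises"))
# -> ['tariff', 'trade', 'rise']
```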
7340 individual words had a total frequency of more than 5, and we used them as seeds with which to collect frequently co-occurring words. The threshold for clustering was set at 0.005, and this yielded 970 word clusters having more than one word (i.e., not simply containing a seed word alone). Note that the category labels of the training texts need not be used in clustering.</Paragraph> <Paragraph position="1"> We next conducted a topic analysis on all 3299 test texts. The thresholds l, h, and θ were set at 20, 3, and 0.05, respectively, on the basis of preliminary experimental results.</Paragraph> </Section> <Section position="3" start_page="40" end_page="41" type="sub_section"> <SectionTitle> 7.3 Topic Structure </SectionTitle> <Paragraph position="0"> We examined the topic structures of the 3299 texts obtained by our method to determine how well they conformed to human intuition.</Paragraph> <Paragraph position="1"> For topic identification in this experiment, clusters in each block were sorted in descending order of their probabilities, and the top 7 seed words were extracted to represent the topics of the block.</Paragraph> <Paragraph position="2"> Figure 3 shows results for the text with ID 14826; they generally agree well with human intuition. The text has been segmented into 5 blocks, and the topics of each block are represented by 7 seed words. The main topic is represented by the seed words 'trade-export-tariff-import.' The subtopics are represented by 'Japan-Japanese,' 'Taiwan,' 'Hong Kong,' etc. There were, however, a small number of errors. For example, the text should also have been segmented after sentences 11 and 13, but, due to limited sentence content, it was not. Furthermore, assigning the subtopic 'Button' (from 'Mr.
Button') to block 3 (due to the high Shannon information value of the word 'Button') was also undesirable.</Paragraph> <Paragraph position="3"> Available at ftp://sable.ox.ac.uk.</Paragraph> <Paragraph position="4"> Table 1 identification words: earning, share, profit, dividend; acquisition, acquire, sell, buy; currency, dollar, yen, stg; grain, cereal, crop; oil, crude, gas; trade, export, import, tariff; interest & rate; ship, vessel, ferry, tanker; wheat; corn, maize.</Paragraph> </Section> <Section position="4" start_page="41" end_page="42" type="sub_section"> <SectionTitle> 7.4 Main Topic Identification </SectionTitle> <Paragraph position="0"> We conducted an evaluation to determine whether or not the main topics in the topic structures obtained for the 3299 test texts could be approximately matched with the labels (categories) assigned to the test texts.</Paragraph> <Paragraph position="1"> Note that here labels are used only for evaluation, not for training. This is in contrast to the situation in most text classification experiments, in which labels are generally used both for training and for evaluation. It is not particularly meaningful, then, to compare the results for main topic identification obtained here with those for text classification.</Paragraph> <Paragraph position="2"> With STM, clusters in each block were sorted in descending order of their probabilities, and the top k seed words were extracted to represent the topics of the block. Furthermore, a seed word appearing in all the blocks of a text was considered to represent a main topic. When a text had not been segmented (i.e., had only one block), all top k seed words were considered to represent main topics.</Paragraph> <Paragraph position="3"> Table 1 lists the largest 10 categories in the Reuters data.
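The main-topic extraction rule described above can be sketched as follows. The per-block cluster probabilities in the example are invented for illustration; in the paper they come from the fitted finite mixture model.

```python
# Sketch of main-topic extraction: in each block, take the seed words of
# the k most probable clusters; a seed word appearing in every block is
# taken as a main topic.  For an unsegmented text (one block), all top-k
# seed words count as main topics.

def main_topics(block_cluster_probs, k=7):
    """block_cluster_probs: one {seed_word: probability} dict per block."""
    top_per_block = []
    for probs in block_cluster_probs:
        ranked = sorted(probs, key=probs.get, reverse=True)
        top_per_block.append(set(ranked[:k]))
    if len(top_per_block) == 1:              # unsegmented text
        return top_per_block[0]
    return set.intersection(*top_per_block)  # seed words present in every block

blocks = [
    {"trade": 0.30, "export": 0.25, "Japan": 0.20, "ship": 0.05},
    {"trade": 0.28, "export": 0.22, "Taiwan": 0.18, "oil": 0.04},
]
print(sorted(main_topics(blocks, k=3)))
# -> ['export', 'trade']
```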
On the basis of the definition of each of the 10 categories, we intuitively assigned to each of them the identification words listed in Table 1.</Paragraph> <Paragraph position="4"> For the evaluation, when the seed words for main topics contained at least one of the identification words, we considered our method to have identified the corresponding main topic, equivalent to a human-determined category.</Paragraph> <Paragraph position="5"> We then evaluated these in terms of precision and recall. Here, precision is defined as the ratio of the number of decisions correctly made to the total number of decisions made.</Paragraph> <Paragraph position="6"> Recall is defined as the ratio of the number of decisions correctly made to the total number of decisions which should have been correctly made.</Paragraph> <Paragraph position="7"> We also looked at the performance of Com (cf. Section 6). For Com, we extracted from a text the key words with the 20 largest Shannon information values, segmented the text using TextTiling, and extracted in each block the key words having the k largest probability values. Any key word extracted in all blocks was considered to represent a main topic. When the key words for main topics contained at least one of the identification words, we viewed that text as having the corresponding main topic.</Paragraph> <Paragraph position="8"> Table 2 shows the results achieved with STM and Com in the case of k = 7. Table 3 shows the results in the case of k = 5. The comparison may be considered fair in that it requires each of the two methods to provide the same number of words to represent topics. Results indicate that STM significantly outperforms Com, particularly in terms of recall. The main reason for the higher performance achieved by STM is that it utilizes word cluster information. Figure 8 shows topic analysis results for the text with ID 15572, labeled 'wheat.'
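The precision and recall measures defined above can be sketched as follows, treating each (text, category) identification as one decision. The predictions and gold labels in the example are invented for illustration, not taken from the paper's results.

```python
# Sketch of the evaluation: precision = correct decisions / decisions made,
# recall = correct decisions / decisions that should have been made.
# pred and gold map a text id to a set of identified / true categories;
# the values below are illustrative only.

def precision_recall(predicted, gold):
    made = sum(len(p) for p in predicted.values())
    should = sum(len(g) for g in gold.values())
    correct = sum(len(predicted.get(t, set()) & g) for t, g in gold.items())
    precision = correct / made if made else 0.0
    recall = correct / should if should else 0.0
    return precision, recall

pred = {"t1": {"trade"}, "t2": {"wheat", "grain"}, "t3": set()}
gold = {"t1": {"trade"}, "t2": {"wheat"}, "t3": {"crude"}}
p, r = precision_recall(pred, gold)
print(round(p, 3), round(r, 3))  # 2 correct of 3 made; 2 of 3 expected
# -> 0.667 0.667
```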
The text contains only 15 content words (word types), and thus all 15 words were extracted as key words and the text was not segmented by either method. Com was unable to identify the main topic 'wheat,' because the probability of each of the relevant key words 'wheat' and 'flour' was low. In contrast, STM successfully identified the topic because the relevant key words were classified into the same cluster, and that cluster's probability was relatively high.</Paragraph> </Section> <Section position="5" start_page="42" end_page="42" type="sub_section"> <SectionTitle> 7.5 Segmentation and Subtopic Identification </SectionTitle> <Paragraph position="0"> We collected the 50 longest test texts (referred to here as 'seed texts') from each of the 10 categories, and combined each with a test text randomly selected from other categories to produce 500 pseudo-texts. Placement of the seed text within its pseudo-text (i.e., before or after the other text) was determined randomly.</Paragraph> <Paragraph position="1"> We used both STM and Com to segment each of the pseudo-texts into two blocks and identify subtopics. Table 4 shows the segmentation results for the two methods, evaluated in terms of recall, precision, and error probability. See, for example, (Lewis and Ringuette, 1994).</Paragraph> <Paragraph position="2"> Table 5 shows the results of subtopic identification as evaluated in terms of recall and precision. Error probability is a metric for evaluating segmentation results proposed in (Allan et al., 1998; Beeferman et al., 1999). It is defined here as the probability that a randomly chosen pair of sentences a distance of k sentences apart is incorrectly segmented. Experimental results indicate that STM outperforms Com in both segmentation and identification.</Paragraph> </Section> </Section> </Paper>