<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0207"> <Title>Text Type Structure and Logical Document Structure</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4.5 Results </SectionTitle> <Paragraph position="0"> We performed several hundred classification tests with different combinations of data representation, classification algorithm, and classifier parameter setting. Table 1 summarizes some results of these experiments. The baseline (a 'classifier' guessing always the most frequent topic type) had an accuracy of 22%.</Paragraph> <Paragraph position="1"> The best combination of data representation and classifier setting achieved about 47% accuracy. In this configuration we used a mixture of the compound head representation (40%), the POS tag distribution (40%), the segment size (10%), and the selected DocBook features (10%). However, the combination of compound heads (50%) and part-of-speech tags (50%) and a similar combination inclassifier feature K E accuracy accuracy cluding a 2% portion of DocBook path structure features had similar results. In all experiments the KNN algorithm performed better than the simplified Rocchio algorithm. For illustrative purpose, we also included a configuration, where all other segments (i.e. including those from the same document) were available as training segments ('KNN*' in the second line of table 1).</Paragraph> <Paragraph position="2"> The variation of classification accuracy was very high both across the topic types and across the documents. In the best configuration of our classification experiments the average segment classification accuracy per document had a range from 22% to 77%, reflecting the fact that the document collection was very heterogeneous in many respects. The topic type resource had an average recall of 97.56% and an average precision of 91.86%, while several other topic types, e.g. rationale and dataAnalysis were near to zero both w.r.t. precision and recall.</Paragraph> <Paragraph position="3"> The most frequent error was the incorrect assignment of topic type othersWork to segments of topic types framework, concepts, and background.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.6 Discussion </SectionTitle> <Paragraph position="0"> The task of classifying small text segments, as opposed to whole documents, is a rather new application field for general domain-independent text categorization methods. Thus, we lack data from previous experiments to compare our own results with. Nevertheless, there are some conclusions to be drawn from our experiments.</Paragraph> <Paragraph position="1"> Although the results probably suffer from limitations of our data collection (small sample size, restricted thematic domain), our main conclusion is that at least some of the topic types of our hierarchy are successfully learnable. It is, however, questionable if an overall accuracy of less than 50% is sufficient for applications that require a high reliability. 
<Paragraph position="2"> The use of structural information improved the accuracy results slightly, but the impact of this information source was clearly below our expectations.</Paragraph>
<Paragraph position="3"> The effect of adding this kind of information was within the range of improvements that can also be achieved by fine-tuning a classifier parameter such as K.</Paragraph>
<Paragraph position="4"> A somewhat surprising result was that a pure part-of-speech tag representation achieved nearly 42% accuracy in combination with the bigram model.</Paragraph>
<Paragraph position="5"> The use of a bigram model improved the results in almost all configurations.</Paragraph>
</Section> </Section> </Paper>
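As a final illustration: the pure part-of-speech representation that reached nearly 42% accuracy is, in essence, a distribution over POS-tag bigrams. A minimal sketch follows, assuming tags come from some external POS tagger (the paper does not name one) and with all identifiers our own.

```python
from collections import Counter


def pos_bigram_vector(pos_tags):
    """Relative frequencies of adjacent POS-tag pairs in one segment.

    pos_tags: the segment's token tags, e.g. ["DT", "NN", "VBZ", ...],
    as produced by any POS tagger (the paper does not say which was used).
    """
    bigrams = Counter(zip(pos_tags, pos_tags[1:]))
    total = sum(bigrams.values()) or 1  # guard against one-token segments
    return {bigram: count / total for bigram, count in bigrams.items()}
```

Such a vector can serve as the POS-tag representation in the mixture sketched earlier; the reported result suggests that the sequence of word classes alone already carries a strong topic-type signal.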