File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/w04-0207_concl.xml

Size: 1,779 bytes

Last Modified: 2025-10-06 13:54:09

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0207">
  <Title>Text Type Structure and Logical Document Structure</Title>
  <Section position="5" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> The best combination of data representation and classifier configuration included the features ch (40%), pos (40%), sz (10%) and df (10%), combined with a topic type bigram model, and yielded an accuracy of 47%. However, almost the same accuracy could be achieved using only the ch and pos features. Other test runs showed that the dbp features did not improve the results in any combination, even though they are the features that indicate where a segment is situated in an article. An inspection of the data representations revealed that, for a given test document (i.e. text segment), the majority of training documents with an identical dbp representation are often assigned the correct topic type, but this majority is so small that many other test documents with an identical dbp representation are misclassified. Accuracy might therefore be improved by running separate (local) KNN classifiers, each trained on a different feature set, and combining their results afterwards.</Paragraph>
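The weighted feature combination and the proposed ensemble of local KNN classifiers can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this section: each feature set (ch, pos, sz, df) is represented as a numeric vector compared by cosine similarity, the reported percentages are treated as weights on those per-feature-set similarities, and the local classifiers' results are combined by simple majority vote. The topic type bigram model is not modeled, and the function names, toy vectors and labels ("background", "method") are hypothetical.

```python
import math
from collections import Counter

# Hypothetical representation: each text segment maps a feature-set name
# (e.g. "ch", "pos", "sz", "df") to a numeric vector. How these vectors are
# built is not described here; toy values are used below.
Segment = dict[str, list[float]]

# Assumption: the reported percentages act as weights on per-feature-set similarity.
BEST_WEIGHTS = {"ch": 0.4, "pos": 0.4, "sz": 0.1, "df": 0.1}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_similarity(q: Segment, d: Segment, weights: dict[str, float]) -> float:
    """Weighted sum of per-feature-set cosine similarities."""
    return sum(w * cosine(q[name], d[name]) for name, w in weights.items())


def knn_predict(train: list[tuple[Segment, str]], query: Segment,
                weights: dict[str, float], k: int = 3) -> str:
    """Majority topic type among the k most similar training segments.

    Ties go to the label of the higher-ranked neighbor, because Counter keeps
    first-seen order and most_common() sorts stably.
    """
    ranked = sorted(train,
                    key=lambda item: weighted_similarity(query, item[0], weights),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def ensemble_predict(train: list[tuple[Segment, str]], query: Segment,
                     feature_sets: list[str], k: int = 3) -> str:
    """One local KNN per feature set; decisions combined by simple majority vote."""
    decisions = [knn_predict(train, query, {name: 1.0}, k) for name in feature_sets]
    return Counter(decisions).most_common(1)[0][0]


# Toy training data with hypothetical topic-type labels.
TRAIN = [
    ({"ch": [1, 0], "pos": [1, 0], "sz": [1, 0], "df": [1, 0]}, "background"),
    ({"ch": [1, 0.1], "pos": [1, 0], "sz": [0, 1], "df": [0, 1]}, "background"),
    ({"ch": [0, 1], "pos": [0, 1], "sz": [1, 0], "df": [1, 0]}, "method"),
]
QUERY_A = {"ch": [1, 0], "pos": [1, 0], "sz": [0, 1], "df": [0, 1]}
QUERY_B = {"ch": [0, 1], "pos": [0, 1], "sz": [1, 0], "df": [1, 0]}

print(knn_predict(TRAIN, QUERY_A, BEST_WEIGHTS, k=1))              # prints: background
print(knn_predict(TRAIN, QUERY_B, BEST_WEIGHTS, k=1))              # prints: method
print(ensemble_predict(TRAIN, QUERY_B, ["ch", "pos", "sz"], k=1))  # prints: method
```

Majority voting is only one way to combine the local classifiers; weighting each vote, for example by the classifier's accuracy on held-out data, would be an equally plausible reading of "combining their results".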
    <Paragraph position="1"> Further work will focus on inspecting categories with very low precision and recall (such as rationale), possibly followed by a review of the text type ontology. Furthermore, we aim to test alternative algorithms (e.g. support vector machines) and feature selection methods, and to enlarge our training set. Finally, we will investigate to what extent our results generalize to scientific articles from other disciplines and in other languages.</Paragraph>
  </Section>
</Paper>