<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1025">
  <Title>Language and Task Independent Text Categorization with Simple Language Models</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have presented an extremely simple approach to language- and task-independent text categorization based on character-level n-gram language modeling. The approach is evaluated on four different languages and four different text categorization problems. Surprisingly, we observe state-of-the-art or better performance in each case.</Paragraph>
    <Paragraph position="1"> We have also experimentally analyzed the influence of two factors that can affect the accuracy of this approach, and found that for the most part the results are robust to perturbations of the basic method. The wide applicability and simplicity of this approach make it immediately applicable to any sequential data (such as natural language, music, DNA), and it yields effective baseline performance. We are currently investigating more challenging problems, such as multiple-category classification using the Reuters-21578 data set (Lewis, 1992) and subjective sentiment classification (Turney, 2002). To us, these results suggest that basic statistical language modeling ideas might be more relevant to other areas of natural language processing than is commonly perceived.</Paragraph>
  </Section>
</Paper>