File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/w04-2013_abstr.xml

Size: 944 bytes

Last Modified: 2025-10-06 13:44:01

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2013">
  <Title>WordNet-based Text Document Clustering</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Text document clustering can greatly simplify browsing large collections of documents by reorganizing them into a smaller number of manageable clusters. Algorithms to solve this task exist; however, the algorithms are only as good as the data they work on. Problems include ambiguity and synonymy, the former allowing for erroneous groupings and the latter causing similarities between documents to go unnoticed. In this research, naïve, syntax-based disambiguation is attempted by assigning each word a part-of-speech tag and by enriching the 'bag-of-words' data representation often used for document clustering with synonyms and hypernyms from WordNet.</Paragraph>
  </Section>
</Paper>