<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3106">
  <Title>Clustering MeSH Representations of Biomedical Literature</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Document Collections
</SectionTitle>
    <Paragraph position="0"> Collections of documents can be obtained by several means. In the simplest case, any sample of documents contained in PubMed can be drawn for the purposes of document clustering. Such a sample may provide insight into PubMed as a whole, but it is unlikely to be useful for specific text mining tasks.</Paragraph>
    <Paragraph position="1"> A more useful approach for targeted text mining is to build a query, or a collection of queries, centered on a concept. For example, to study prostate cancer, the query string "prostate cancer" is submitted to PubMed. The documents matching the query are retrieved and processed for document clustering. The resulting clusters represent potential topics within prostate cancer research. This approach has been used to build concept profiles for several text mining tasks (Srinivasan and Wedemeyer, 2003; Srinivasan, to appear).</Paragraph>
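As a minimal sketch of the query-based approach described above, the following builds a PubMed search request for a concept. It assumes the query is submitted through the NCBI E-utilities ESearch service, which the text does not specify; the function name build_concept_query_url and the parameter max_docs are hypothetical names chosen for illustration. The sketch only constructs the request URL and does not contact PubMed.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed retrieval route, not stated in the paper).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_concept_query_url(concept: str, max_docs: int = 100) -> str:
    """Return an ESearch URL asking PubMed for documents matching a concept.

    Both the function name and `max_docs` are hypothetical, for illustration only.
    """
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": concept,     # the concept query string, e.g. "prostate cancer"
        "retmax": max_docs,  # upper bound on the number of returned identifiers
        "retmode": "json",   # request a JSON response
    }
    # urlencode escapes the query string and joins the parameters.
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_concept_query_url("prostate cancer"))
```

The identifiers returned by such a request would then be used to fetch the matching records, which form the concept-centered collection passed on to clustering.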
    <Paragraph position="2"> Other methods for obtaining document collections exist as well. To obtain documents for a genome database, such as the Rat Genome Database (RGD) (Twigger et al., 2002), human curators combine PubMed queries with an exhaustive reading of a limited number of journals. This may be viewed as another form of concept-based collection. In this case, however, the collection captures several ill-defined concepts, ones that cannot be specified with a small number of PubMed queries.</Paragraph>
    <Paragraph position="3"> This investigation considers both methods of obtaining document collections.</Paragraph>
  </Section>
</Paper>