<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3110"> <Title>A Large Scale Terminology Resource for Biomedical Text Processing</Title> <Section position="3" start_page="0" end_page="0" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> Since identification and classification of technical terms in biomedical text is an essential step in information extraction and other natural language processing tasks, most natural language processing systems contain a terminological resource of some sort. Some systems make use of existing terminological resources, notably the UMLS Metathesaurus, e.g. Rindflesch et al. (2000), Pustejovsky et al. (2002); other systems rely on resources that have been specifically built for the application, e.g. Humphreys et al. (2000), Thomas et al. (2000). The UMLS Metathesaurus provides a semantic classification of terms drawn from a wide range of vocabularies in the clinical and biomedical domain (Humphreys et al., 1998). It does so by grouping strings from the source vocabularies that are judged to have the same meaning into concepts, and mapping these concepts onto nodes or semantic types in a semantic network. 
Although the UMLS Metathesaurus is used in a number of biomedical natural language processing applications, we have decided not to adopt the UMLS Metathesaurus as the primary terminology resource in AMBIT for a variety of reasons.</Paragraph> <Paragraph position="1"> One of the reasons for this decision is that the Metathesaurus is a closed system: strings are classified in terms of the concepts and the semantic types that are present in the Metathesaurus and the semantic network, whereas we would like to be able to link our terms into multiple ontologies, including in-house ontologies that do not figure in any of the Metathesaurus' source vocabularies and hence are not available through the Metathesaurus.</Paragraph> <Paragraph position="2"> Moreover, we would also like to have access to additional terminological information that is not present in the Metathesaurus, such as, for example, the annotations in the Gene Ontology (The Gene Ontology Consortium, 2001) assigned to a given human protein term.</Paragraph> <Paragraph position="3"> While the terms making up the tripartite Gene Ontology are present in the UMLS Metathesaurus, assignments of these terms to gene products are not recorded in the Metathesaurus. Furthermore, as new terms appear constantly in the biomedical field, we would like to be able to add these to our terminological resource immediately and not have to wait until they have been included in the UMLS Metathesaurus. Additionally, some medical terms appearing in patient notes are hospital-specific and are unlikely to be included in the Metathesaurus at all.</Paragraph> <Paragraph position="4"> With regard to systems that do not use the UMLS Metathesaurus, but rather depend on terminological resources that have been specifically built for an application, we note that these terminological resources tend to be limited in the following two respects. 
First, the structure of these resources is often fixed and in some cases amounts to simple gazetteer lists. Secondly, because of their fixed structure, these resources are usually populated with content from just a few sources, leaving out many other potentially interesting sources of terminological information.</Paragraph> <Paragraph position="5"> Instead, we intend for Termino to be an extensible resource that can hold diverse kinds of terminological information. The information in Termino is either imported from existing, outside knowledge sources, e.g. the Enzyme Nomenclature (http://www.chem.qmw.ac.uk/iubmb/enzyme/), the Structural Classification of Proteins database (Murzin et al., 1995), and the UMLS Metathesaurus, or it is induced from on-line raw text resources, e.g. Medline abstracts. Termino thus provides uniform access to terminological information aggregated across many sources. Using Termino removes the need for multiple, source-specific terminological components within text processing systems that employ multiple terminological resources.</Paragraph> </Section> </Paper>