<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1054"> <Title>Using knowledge from WordNet for conceptual</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> A consumer health information system must be able to comprehend both expert and non-expert medical vocabulary and to map between the two. We describe an ongoing project to create a new lexical database called Medical WordNet (MWN), consisting of medically relevant terms used by and intelligible to non-expert subjects and supplemented by a corpus of natural-language sentences that is designed to provide medically validated contexts for MWN terms.</Paragraph> <Paragraph position="1"> The corpus derives primarily from online health information sources targeted at consumers, and comprises two sub-corpora, called Medical FactNet (MFN) and Medical BeliefNet (MBN), respectively. The former consists of statements accredited as true on the basis of a rigorous process of validation, the latter of statements which non-experts believe to be true. We summarize the MWN / MFN / MBN project, and describe some of its applications.</Paragraph> <Paragraph position="2"> 1 From WordNet to Medical WordNet WordNet is the principal lexical database used in natural language processing (NLP) research and applications (Miller, 1995; Fellbaum, ed., 1998). While WordNet's current version (2.0) has broad medical coverage, it manifests a number of defects, which reflect both the lack of domain expertise on the part of the responsible lexicographers and the fact that WordNet was not built for domain-specific applications. The research community has long been aware of these defects (Magnini and Strapparava, 2001; Bodenreider and Burgun, 2002; Burgun and Bodenreider, 2001; Bodenreider, et al., 2003).
Our response is to create Medical WordNet (MWN), a free-standing lexical database designed specifically for the needs of natural-language processing in the medical domain, with the goal of removing the 'noise' which is associated with the application of WordNet and similar resources to this specialized domain.</Paragraph> <Paragraph position="3"> MWN's initial focus is on English single-word expressions as used and understood by non-experts. We systematically review WordNet's existing medical coverage by assembling a validated corpus of sentences involving specific medically relevant vocabulary. Input to our validation process includes the definitions of medical terms already existing in WordNet, and also sentences generated via the semantic relations linking such terms in WordNet. In addition, input includes sentences derived from online medical information services targeted at consumers.</Paragraph> <Paragraph position="4"> Our methodology is designed (1) to document natural language sentential contexts for each relevant word sense in such a way that the expressed information can be (2) validated by medical experts and (3) accessed automatically by NLP applications such as information retrieval, machine translation, question-answering systems, and text summarization.</Paragraph> <Paragraph position="5"> A major stumbling block for existing NLP applications is automatic sense disambiguation. An automatic system can detect with high reliability that a given occurrence of a word like feel or dead is a verb or adjective. But it cannot easily determine which of a variety of alternative meanings such polysemous words have in any given context.</Paragraph> <Paragraph position="6"> WordNet's architecture, designed for representing and distinguishing word senses, has made an important contribution towards a solution of the automatic word sense disambiguation problem.
Our corpus of English-language sentences relating to medical phenomena is designed to build upon this contribution. The corpus is restricted to grammatically complete, syntactically simple sentences in natural language which have been rated as understandable by non-expert human subjects in controlled questionnaire-based experiments. It is further restricted to sentences which are self-contained, in the sense that they make no reference to any prior context and contain no proper names or anaphoric elements (like it or he or then) that need to be interpreted with respect to other sentences or some surrounding discourse or context. This corpus is designed to be used initially for purposes of quality assurance of MWN and also to support the population of MWN by yielding new families of words and word senses for inclusion. As will become clear, however, our use of human validators will allow us to extend the usefulness of the corpus in a variety of ways. Thus we can use it to build new sorts of applications for information retrieval in the domain of consumer health. But it also opens new avenues of research in linguistics and psychology, for example in allowing us to explore individual and group differences in medical knowledge and vocabulary, and in understanding non-expert medical reasoning and decision-making.</Paragraph> </Section> </Paper>