<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0308">
  <Title>Unsupervised, corpus-based method for extending a biomedical terminology</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> This study confirms the observations made in two previous studies taking advantage of adjectival modification phenomena in various tasks related to terminologies, in particular to suggest hyponymic relations among medical terms [Bodenreider et al. (2001)] and to assess the consistency of a biomedical terminology [Bodenreider et al. (2002)].</Paragraph>
    <Paragraph position="2"> Although a larger-scale evaluation would be required to fully assess the results, the major finding is that the method is effective at automatically identifying many new terms for inclusion in an extended terminological resource. However, the evaluation revealed some limitations, which are analyzed below. Adaptation and generalization issues are addressed as well.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Limitations
</SectionTitle>
      <Paragraph position="0"> The errors discovered during the manual review illustrate some of the limitations of this method.</Paragraph>
      <Paragraph position="1"> More precisely, these limitations are common to many NLP applications. Although acronyms were sometimes associated with their correct meaning in the Metathesaurus, acronyms were responsible for 22% of the non-relevant associations in the set of terms reviewed manually. For example, the MEDLINE term individual black rats, whose two adjectival modifiers are allowable disease modifiers, is wrongly identified as a hyponym of recurrent acute tonsillitis because the acronym RAT is associated (as a synonym) with the disease recurrent acute tonsillitis in the Metathesaurus. In some cases, failure to identify the correct part of speech also resulted in inaccurate associations (e.g., controlling stress mapped to stress, where controlling was not actually an adjective). Finally, not all truncated terms present among the Metathesaurus synonyms of some concepts are identified as such.</Paragraph>
      <Paragraph position="2"> When not identified, truncated terms are used for the mapping, sometimes resulting in inaccurate associations. For example, the candidate term urinary protein is wrongly associated with the concept protein measurement because protein is considered a synonym for the procedure protein measurement in the Metathesaurus.</Paragraph>
      <Paragraph position="3"> Sometimes, the association is not inaccurate, but the concept associated with the candidate term is very general, and the relationship weakly informative. For example, once demodified, aplastic syndrome is associated with syndrome, a concept close to the top of the hierarchy. Although aplastic syndrome is a valid hyponym of syndrome, it would be more accurately categorized as a kind of hematologic syndrome, which requires domain knowledge unavailable here.</Paragraph>
      <Paragraph position="4"> Finally, in some cases, because hyponymy is the only relation considered, the association of a candidate term with a Metathesaurus concept, although relevant, is not necessarily the closest possible. For example, the term colonic vaginal fistula is correctly associated with its hypernym vaginal fistula, but fails to be identified as a synonym of the concept fistula of vagina to large intestine.</Paragraph>
      <Paragraph position="5"> Practically, in a completely automatic setting, the use of this algorithm could result in creating several concepts for the same meaning.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Tuning
</SectionTitle>
      <Paragraph position="6"> This algorithm can be tuned from a strict mode, allowing fewer phrases to automatically become candidate terms, but with greater precision, to a relaxed mode, selecting a larger number of candidate terms when recall is the priority. The latter would require some supervision prior to integrating the candidate terms into the terminology.</Paragraph>
      <Paragraph position="7"> Almost all the limitations mentioned above can be addressed. Terms containing acronyms could be identified and eliminated before mapping to the Metathesaurus. Part-of-speech taggers trained on a terminology would more accurately identify the part of speech of words that can be both adjectives and nouns. Truncated Metathesaurus terms should be systematically excluded from the index used for mapping. Methods for identifying synonymy based on derivational variation or other techniques could also be investigated.</Paragraph>
      <Paragraph position="8"> Moreover, additional refinements could be made to this method. For example, when demodified terms are created, the removal of adjectives could be restricted to the leftmost adjective, thus maximally preserving the structure of the remaining noun phrase and limiting the risk of association with a semantically distant concept.</Paragraph>
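      <Paragraph> The leftmost-only restriction can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name demodify_leftmost, the (word, tag) input format, and the "ADJ" tag label are hypothetical, the part-of-speech tags are supplied by hand rather than by a tagger, and stripping adjectives one at a time from the left edge is only one possible reading of the restriction.

```python
from typing import List, Tuple

# (word, part-of-speech tag); the tag labels are hypothetical, not from the paper.
Token = Tuple[str, str]


def demodify_leftmost(tokens: List[Token]) -> List[str]:
    """Return demodified variants obtained by stripping adjectives
    one at a time from the left edge of a noun phrase only.

    Stripping stops at the first non-adjective token, and the last
    token (assumed to be the head) is never removed.
    """
    variants = []
    start = 0
    # Only the current leftmost token is ever a candidate for removal.
    while len(tokens) - start > 1 and tokens[start][1] == "ADJ":
        start += 1
        variants.append(" ".join(word for word, _ in tokens[start:]))
    return variants


phrase = [("recurrent", "ADJ"), ("acute", "ADJ"), ("tonsillitis", "NOUN")]
print(demodify_leftmost(phrase))
```

      Under this reading, recurrent acute tonsillitis yields acute tonsillitis and then tonsillitis, but never recurrent tonsillitis, because an inner adjective is not removed while an outer one remains.</Paragraph>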
      <Paragraph position="9"> Finally, using statistical information about the distribution of adjectival modifiers could provide a surrogate for the strength of the association. For example, many diseases can be acute; if this adjective is found in the corpus as the modifier of a disease concept, the association could be accepted with a confidence proportional to the relative frequency of this modifier across all diseases. For acute modifying a disease, this confidence would be high.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Generalization
</SectionTitle>
      <Paragraph position="10"> The method presented was deliberately restricted to the domain of disorders and procedures, to adjectival modification, and to the biomedical literature. Generalizing to other domains would pose no problems as long as the terms of their terminologies are amenable to natural language processing techniques and exhibit modification phenomena. This would include domains such as anatomy or physiology. However, domains such as molecular biology, with many gene and gene product names, and chemistry, with many chemical names, would probably yield fewer candidate terms.</Paragraph>
      <Paragraph position="11"> Nominal modification is common in English and in principle can be addressed with a methodology similar to the one discussed here. Nominal modifiers often express a quality more closely related semantically to the head than do adjectives.</Paragraph>
      <Paragraph position="12"> Details in the methodology would be adjusted to accommodate this characteristic.</Paragraph>
      <Paragraph position="13"> Generalization to other corpora such as patient records and electronic textbooks of medicine would likely yield additional terms.</Paragraph>
      <Paragraph position="14"> Finally, although this method relies on features of the UMLS such as the semantic categorization of the concepts, it could also be applied to other terminologies that do not provide this feature, such as the Medical Subject Headings (MeSH). In this case, the concept hierarchy itself could be used as a surrogate for the categorization. For example, if the candidate term chronic rheumatic fever is associated with the MeSH term rheumatic fever, its category is disease because the polyhierarchical structure in which rheumatic fever is involved ultimately converges to the top of the C hierarchy, i.e., the term diseases.</Paragraph>
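      <Paragraph> Using the hierarchy as a surrogate for semantic categorization can be sketched as a walk up the parent links until the top-level terms are reached. The sketch below is illustrative only: the PARENTS table is a toy polyhierarchy invented for this example and does not reproduce actual MeSH data, and the function name top_level_categories is hypothetical.

```python
from typing import Dict, List, Set

# Toy parent links invented for illustration; these are NOT actual MeSH data.
PARENTS: Dict[str, List[str]] = {
    "rheumatic fever": ["streptococcal infections", "heart diseases"],
    "streptococcal infections": ["bacterial infections"],
    "bacterial infections": ["diseases"],
    "heart diseases": ["cardiovascular diseases"],
    "cardiovascular diseases": ["diseases"],
    "diseases": [],
}


def top_level_categories(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every top-level term reachable from `term` via parent links.

    A term with no recorded parents (including one missing from the
    table) is treated as a top-level term.
    """
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already explored through another parent path
        seen.add(current)
        ancestors = parents.get(current, [])
        if ancestors:
            stack.extend(ancestors)
        else:
            roots.add(current)
    return roots


# The candidate term inherits the category of the term it was mapped to.
print(top_level_categories("rheumatic fever", PARENTS))
```

      In this toy table both parent paths of rheumatic fever converge on diseases, so a candidate term such as chronic rheumatic fever mapped to it would be categorized as a disease. A term whose paths reach several top-level terms would return several categories, and how to resolve that case is left open here.</Paragraph>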
    </Section>
  </Section>
</Paper>