<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1106">
  <Title>Language &amp; Information Engineering</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> As we witness the ever-increasing proliferation of volumes of medical and biological documents, the available dictionaries and terminological systems cannot keep up with this pace of growth and, hence, become more and more incomplete. What's worse, the constant stream of new terms is increasingly getting unmanageable because human curators are in the loop. The costly, often error-prone and time-consuming nature of manually identifying new terminology from the most recent literature calls for advanced procedures which can automatically assist database curators in the task of assembling, updating and maintaining domain-specific controlled vocabularies. Whereas the recognition of single-word terms usually does not pose any particular challenges, the vast majority of biomedical or any other domain-specific terms typically consists of multi-word units. Unfortunately, these are much more difficult to recognize and extract than their singleton counterparts. Moreover, although the need to assemble and extend technical and scientific terminologies is currently most pressing in the biomedical domain, virtually any (sub-)field of human research/expertise in which we deal with terminologically structured knowledge calls for high-performance terminology identification and extraction methods. We want to target exactly this challenge.</Paragraph>
  </Section>
</Paper>