<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2083">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. A Term Recognition Approach to Acronym Recognition</Title>
  <Section position="6" start_page="647" end_page="649" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> Several evaluation corpora for acronym recognition are available. The Medstract Gold Standard Evaluation Corpus, which consists of 166 alias pairs annotated over 201 MEDLINE abstracts, is widely used for evaluation (Chang and Schütze, 2006; Schwartz and Hearst, 2003). However, the amount of text in the corpus is insufficient for the proposed method, which makes use of statistical features of a text collection. Therefore, we prepared an evaluation corpus with a large text collection and examined how precisely and comprehensively the proposed algorithm extracts short/long forms. We applied the short-form mining described in Section 3 to 7,306,153 MEDLINE abstracts.</Paragraph>
    <Paragraph position="1"> Out of 921,349 unique short forms recognized by the short-form mining, the top 50 acronyms appearing most frequently in the abstracts were chosen for our evaluation corpus. We excluded several parenthetical expressions such as II (99,378 occurrences), OH (37,452 occurrences), and P&lt;0.05 (23,678 occurrences): even though they are enclosed within parentheses, they do not introduce acronyms. We also excluded a few acronyms such as RA (18,655 occurrences) and AD (15,540 occurrences) because they have too many variations of their expanded forms to prepare the evaluation corpus manually.</Paragraph>
    <Paragraph position="2"> We asked an expert in bioinformatics to extract long forms from 600,375 contextual sentences with the following criteria: a long form with the minimum elements (words) necessary to produce its acronym is accepted; a long form with unnecessary elements, e.g., magnetic resonance imaging unit (MRI) or computed x-ray tomography (CT), is not accepted; a misspelled long form, e.g., hidden markvov model (HMM), is accepted (to separate the acronym-recognition task from a spelling-correction task). Table 3 shows the top 20 acronyms in our evaluation corpus, the number of their contextual sentences, and the number of unique long forms extracted.</Paragraph>
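The acceptance criteria above hinge on whether a candidate long form's letters can actually produce the acronym. As an illustration, here is a minimal sketch of the right-to-left letter matching used by Schwartz and Hearst-style algorithms; the function name and details are our own illustrative assumptions, not the authors' code:

```python
def find_best_long_form(short_form: str, long_form: str):
    """Right-to-left letter matching in the style of Schwartz & Hearst (2003).

    Returns the shortest suffix of `long_form` whose letters can produce
    `short_form`, or None if no match exists.  Sketch for illustration only.
    """
    s_index = len(short_form) - 1
    l_index = len(long_form) - 1
    while s_index >= 0:
        curr = short_form[s_index].lower()
        if not curr.isalnum():          # skip punctuation in the acronym
            s_index -= 1
            continue
        # Move left through the long form until the character matches; the
        # acronym's first letter must additionally start a word.
        while l_index >= 0 and (
            long_form[l_index].lower() != curr
            or (s_index == 0 and l_index > 0
                and long_form[l_index - 1].isalnum())
        ):
            l_index -= 1
        if l_index < 0:
            return None                 # no long form found
        l_index -= 1
        s_index -= 1
    # Expand back to the start of the word where matching stopped.
    start = long_form.rfind(" ", 0, l_index + 2) + 1
    return long_form[start:]
```

Note that such letter matching alone accepts candidates like computed x-ray tomography for CT, which the expert annotation above rejects as carrying unnecessary elements.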
    <Paragraph position="3"> Using this evaluation corpus as a gold standard, we examined the precision, recall, and f-measure of long forms recognized by the proposed algorithm and baseline systems. We compared five systems: the proposed algorithm with Schwartz and Hearst's algorithm integrated (PM+SH); the proposed algorithm without any letter-matching algorithm integrated (PM); the proposed algorithm but using the original C-value measure for long-form likelihood scores (CV+SH); the proposed algorithm but using co-occurrence frequency for long-form likelihood scores (FQ+SH); and Schwartz and Hearst's algorithm alone (SH). The threshold for the proposed algorithm was set to four.</Paragraph>
    <Paragraph position="4"> Table 4 shows the evaluation results. The best-performing configuration (PM+SH) achieved 78% precision and 85% recall. Schwartz and Hearst's algorithm (SH) obtained good recall (93%) but misrecognized a number of long forms (56% precision), e.g., the kinetics of serum tumour necrosis alpha (TNF-ALPHA) and infected mice lacking the gamma interferon (IFN-GAMMA). The SH algorithm also cannot gather variations of long forms for an acronym, e.g., ACE as angiotensin-converting enzyme level, angiotensin i-converting enzyme gene, angiotensin1-converting enzyme, angiotensin-converting, angiotensin converting activity, etc. The proposed method combined with Schwartz and Hearst's algorithm remedied these misrecognitions by means of the likelihood scores and the long-form validation algorithm. PM+SH also outperformed the other likelihood measures, CV+SH and FQ+SH.</Paragraph>
    <Paragraph position="5"> We count the number of unique long forms, i.e., a short/long form pair such as &lt;HMM, hidden markov model&gt; is counted once even if it occurs more than once in the text collection. Porter's stemming algorithm was applied to long forms before comparing them with the gold standard.</Paragraph>
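This counting scheme can be made concrete. The following is a hedged sketch of precision, recall, and f-measure computed over unique, normalized long forms; the `stem` argument stands in for a stemmer such as Porter's, and all names and interfaces are illustrative assumptions rather than the paper's actual evaluation script:

```python
def prf(system_pairs, gold_pairs, stem=lambda w: w):
    """Precision/recall/f-measure over unique <short form, long form> pairs.

    Long forms are lower-cased and word-by-word stemmed before comparison,
    and duplicate pairs count once, mirroring the setup described above.
    """
    def norm(pairs):
        return {(sf, " ".join(stem(w) for w in lf.lower().split()))
                for sf, lf in pairs}

    sys_set, gold_set = norm(system_pairs), norm(gold_pairs)
    tp = len(sys_set & gold_set)                     # true positives
    p = tp / len(sys_set) if sys_set else 0.0        # precision
    r = tp / len(gold_set) if gold_set else 0.0      # recall
    f = 2 * p * r / (p + r) if p + r else 0.0        # f-measure
    return p, r, f
```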
    <Paragraph position="6">  The proposed algorithm without Schwartz and Hearst's algorithm (PM) identified long forms the most precisely (81% precision) but missed a number of long forms in the text collection (14% recall). This result suggested that the proposed likelihood measure performed well in extracting frequently used long forms from a large text collection, but could not extract rare acronym-definition pairs.</Paragraph>
    <Paragraph position="7"> We also found cases where PM missed a set of long forms for the acronym ER that end with rate, e.g., eating rate, elimination rate, embolic rate, etc. This was because the word rate was used with a variety of expansions (i.e., the likelihood score for rate was not reduced much), although rate can also be interpreted as the long form of the acronym.</Paragraph>
    <Paragraph position="8">  Even though the Medstract corpus is insufficient for evaluating the proposed method, we examined the number of long/short form pairs extracted from the 7,306,153 MEDLINE abstracts that also appear in the Medstract corpus. We can neither calculate precision from this experiment nor compare recall directly with other acronym recognition methods, since the size of the source texts differs. Out of the 166 pairs in the Medstract corpus, 123 pairs (74%) were exactly covered by the proposed method, and a further 15 pairs (83% in total) were partially covered. The algorithm missed 28 pairs because: 17 pairs (10%) in the corpus were not acronyms but more generic aliases, e.g., alpha tocopherol (Vitamin E); 4 pairs (2%) in the corpus were incorrectly annotated (e.g., the long form embryo fibroblasts in the corpus lacks the word mouse needed to form the acronym MEFS); and 7 long forms (4%) were missed by the algorithm, e.g., the algorithm recognized the pair protein kinase (PKR) while the correct pair in the corpus is RNA-activated protein kinase (PKR).</Paragraph>
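The exact/partial coverage counts above can be reproduced mechanically once "partly correct" is pinned down. The sketch below treats a gold pair as partially covered when, for the same short form, one long form extends the other by extra leading words (e.g., general transcription factor iib vs. transcription factor iib); this is our approximation of the paper's judgement, and all names are illustrative:

```python
def coverage(system_pairs, gold_pairs):
    """Count gold pairs covered exactly or partially by system output.

    'Partial' means the system and gold long forms for the same short form
    differ only by extra leading words on one side.  Sketch only; it
    approximates, rather than reproduces, the paper's manual judgement.
    """
    sys_by_sf = {}
    for sf, lf in system_pairs:
        sys_by_sf.setdefault(sf.lower(), set()).add(lf.lower())

    exact = partial = 0
    for sf, lf in gold_pairs:
        candidates = sys_by_sf.get(sf.lower(), set())
        lf = lf.lower()
        if lf in candidates:
            exact += 1
        elif any(lf.endswith(" " + c) or c.endswith(" " + lf)
                 for c in candidates):
            partial += 1
    return exact, partial
```

Note that this loose suffix criterion would also count pairs like protein kinase vs. RNA-activated protein kinase as partial, which the manual evaluation above treats as a miss, so a real reimplementation would need a stricter rule.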
    <Paragraph position="9"> The Medstract corpus leaves unnecessary elements attached to some long forms, such as general transcription factor iib (TFIIB), whereas the proposed algorithm may drop the unnecessary elements (i.e., general) based on frequency. We regard such cases as partly correct.</Paragraph>
  </Section>
</Paper>