File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/01/w01-0516_intro.xml

Size: 2,324 bytes

Last Modified: 2025-10-06 14:01:13

<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0516">
  <Title>Hybrid text mining for finding abbreviations and their definitions</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
5. Experiments and Results
</SectionTitle>
    <Paragraph position="0"> We have conducted experiments with three documents: a book about automotive engineering (D1), a technical book from a pharmaceutical company (D2), and NASA press releases for 1999 (D3). The data used in the experiments and experimental results are shown in Table 5. Performance is evaluated using recall and precision.</Paragraph>
    <Paragraph position="1"> For D1, the system found 32 abbreviation-definition pairs, of which 1 is incorrect, giving 93.9% recall and 96.9% precision. For D2, it found 60 pairs and missed 3, giving 95.2% recall and 100% precision. For D3, it found 78 pairs, 2 of them incorrect, and missed 5, giving 93.8% recall and 97.4% precision. Abbreviations were missed for three reasons: (a) the definition fell outside the search space, (b) the part-of-speech tagger misinterpreted the text, or (c) the match was beyond the system's current capability. Some examples of missed abbreviations are:  (1) DEHP di-2-ethylhexylphthalate (2) ALT alanine aminotransferase (3) ASI Italian Space Agency (4) MIDEX medium-class Explorer (5) CAMEX-3 Third Convection and</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Moisture Experiment
</SectionTitle>
      <Paragraph position="0"> For (1), we would need to add the domain-specific prefixes &quot;ethyl&quot; and &quot;hexyl&quot; to the prefix list. In general, adapting our method to a new technical domain will probably involve adding domain-specific prefixes to the prefix list. Example (2) failed because there was no first-letter match for &quot;aminotransferase&quot;. The abbreviation in (3) is an acronym of the Italian translation of the definition. In (4), there is no credible source for the &quot;I&quot; in the abbreviation. In (5), the numeric replacement in the abbreviation is permuted. These and other phenomena, such as compound word processing, will be the subject of further investigation.</Paragraph>
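      <Paragraph position="1"> The effect of extending the prefix list for example (1) can be illustrated with a minimal sketch. It is not the system described in this paper: the function names initials and matches, the rule of taking one letter per word and per known prefix, and the tokenization are all assumptions made for illustration only.

```python
import re


def initials(definition, prefixes):
    """Collect the first letter of each word and of each known prefix inside a word.

    Hypothetical simplification: digits and punctuation are ignored, and a
    prefix is only stripped when something remains after it.
    """
    letters = []
    for word in re.findall(r"[A-Za-z]+", definition):
        rest = word.lower()
        while rest:
            letters.append(rest[0].upper())
            # Longest known prefix that leaves a non-empty remainder, if any.
            hit = max(
                (p for p in prefixes if rest.startswith(p) and len(rest) != len(p)),
                key=len,
                default=None,
            )
            if hit is None:
                break
            rest = rest[len(hit):]
    return "".join(letters)


def matches(abbreviation, definition, prefixes):
    """True when the collected initials spell the abbreviation exactly."""
    return initials(definition, prefixes) == abbreviation.upper()


if __name__ == "__main__":
    definition = "di-2-ethylhexylphthalate"
    print(matches("DEHP", definition, set()))
    print(matches("DEHP", definition, {"ethyl", "hexyl"}))
```

With an empty prefix list the sketch reads only "DE" from the definition, so the pair is rejected; once "ethyl" and "hexyl" are added it reads "DEHP" and the pair is accepted.
      </Paragraph>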
    </Section>
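    <Paragraph position="2"> The recall and precision figures reported above for D1, D2 and D3 can be re-derived from the pair counts with a short sketch. The class name Counts and the helper report are hypothetical. Two counts are inferred, not stated in the text: 2 missed pairs for D1 (implied by the 93.9% recall) and 0 incorrect pairs for D2 (implied by the 100% precision).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    found: int      # pairs the system extracted
    incorrect: int  # extracted pairs judged wrong
    missed: int     # true pairs the system did not extract

    @property
    def correct(self):
        return self.found - self.incorrect

    def precision(self):
        # Share of extracted pairs that are correct, in percent.
        return 100.0 * self.correct / self.found

    def recall(self):
        # Share of all true pairs that were extracted, in percent.
        return 100.0 * self.correct / (self.correct + self.missed)


RESULTS = {
    "D1": Counts(found=32, incorrect=1, missed=2),  # missed=2 is inferred
    "D2": Counts(found=60, incorrect=0, missed=3),  # incorrect=0 is inferred
    "D3": Counts(found=78, incorrect=2, missed=5),
}


def report(results):
    """Return {document: (recall, precision)}, each rounded to one decimal."""
    return {
        name: (round(c.recall(), 1), round(c.precision(), 1))
        for name, c in results.items()
    }


if __name__ == "__main__":
    for name, (recall, precision) in report(RESULTS).items():
        print(f"{name}: recall {recall}%, precision {precision}%")
```

Under these assumptions the sketch reproduces the reported percentages for all three documents.
    </Paragraph>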
  </Section>
</Paper>