<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1083">
  <Title>A Methodology for Terminology-based Knowledge Acquisition and Integration</Title>
  <Section position="5" start_page="0" end_page="85" type="evalu">
    <SectionTitle>
3 Evaluation and discussion
</SectionTitle>
    <Paragraph position="0"> We have conducted preliminary experiments using the proposed framework. In this paper we briefly report the quality of automatic term recognition (ATR) and of the similarity measure calculation, the latter assessed via automatically clustered terms. We then compare the practical performance of tag manipulation in TIMS with that of string-based XML tag manipulation to show the advantage of the tag information management scheme.</Paragraph>
    <Paragraph position="1"> The term recognition evaluation was performed on the NACSIS AI-domain corpus (Koyama et al., 1998), which includes 1,800 abstracts, and on a set of MEDLINE abstracts. Table 1 shows a sample of extracted terms and term variants. The ATR precision values for the top 100 intervals range from 93% to 98% (see Figure 7; for a detailed evaluation, see Mima et al. (2001b) and Nenadic et al. (2002)).</Paragraph>
    <Paragraph position="2"> To evaluate term clustering and tag manipulation performance, we used the GENIA resources (GENIA corpus, 2002), which include 1,000 MEDLINE abstracts (MEDLINE, 2002) with 40,000 semantic tags overall (16,000 distinct) annotated for terms in the domain of nuclear receptors. Similarity measure calculation serves as the central computing mechanism for inferring the relevance between the XML tags and the tags specified in a TIQL/interval operation, i.e. for determining the most relevant tags in the XML-based KS(s). As a gold standard, we used similarities between terms calculated from the hierarchy into which the terms are clustered in the GENIA ontology. In this experiment, we adopted the semantic similarity calculation method described by Oi et al. (1997) to measure the similarity between terms. The three major class sets of manually classified terms in the GENIA ontology (GENIA corpus, 2002), namely nucleic_acid, amino_acid and SOURCE, were used to calculate the average similarities (ASs) of their elements. The ASs of elements within the same class were greater than the ASs between elements from different classes, which indicates that the terms were clustered reliably according to their semantic features.</Paragraph>
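    <Paragraph> The within-class versus between-class comparison above can be illustrated with a short sketch. It assumes toy data: the class names follow the three GENIA classes used in the evaluation, but the term identifiers and pairwise similarity scores are hypothetical placeholders rather than values computed with the method of Oi et al. (1997), and the helpers sim, avg_within, avg_between and within_exceeds_between are illustrative names only.
<![CDATA[
```python
from itertools import combinations, product
from statistics import mean

# Toy data (hypothetical): class names follow the GENIA classes used in the
# evaluation, but the term identifiers are placeholders.
CLASSES = {
    "nucleic_acid": ["na1", "na2", "na3"],
    "amino_acid": ["aa1", "aa2", "aa3"],
    "SOURCE": ["src1", "src2"],
}

# Toy similarity scores in [0, 1] (hypothetical). A real run would compute them
# with a term similarity measure such as that of Oi et al. (1997).
PAIR_SIM = {
    frozenset(("na1", "na2")): 0.9,
    frozenset(("na1", "na3")): 0.7,
    frozenset(("na2", "na3")): 0.8,
    frozenset(("aa1", "aa2")): 0.6,
    frozenset(("aa1", "aa3")): 0.8,
    frozenset(("aa2", "aa3")): 0.7,
    frozenset(("src1", "src2")): 0.5,
    frozenset(("na1", "aa1")): 0.3,
}
DEFAULT_SIM = 0.1  # assumed score for any pair not listed above


def sim(a: str, b: str) -> float:
    """Symmetric lookup of the similarity between two terms."""
    return PAIR_SIM.get(frozenset((a, b)), DEFAULT_SIM)


def avg_within(cls: str) -> float:
    """Average similarity (AS) over all unordered term pairs inside one class."""
    return mean(sim(a, b) for a, b in combinations(CLASSES[cls], 2))


def avg_between(c1: str, c2: str) -> float:
    """AS over all pairs that take one term from each of two classes."""
    return mean(sim(a, b) for a, b in product(CLASSES[c1], CLASSES[c2]))


def within_exceeds_between() -> bool:
    """True if every within-class AS is greater than every between-class AS."""
    names = list(CLASSES)
    within = [avg_within(c) for c in names]
    between = [avg_between(a, b) for a, b in combinations(names, 2)]
    return min(within) > max(between)


if __name__ == "__main__":
    for c in CLASSES:
        print(f"AS within {c}: {avg_within(c):.3f}")
    for c1, c2 in combinations(CLASSES, 2):
        print(f"AS between {c1} and {c2}: {avg_between(c1, c2):.3f}")
    print("within > between:", within_exceeds_between())
```
]]>
 With these toy scores the within-class ASs (0.8, 0.7 and 0.5) all exceed the between-class ASs (at most about 0.12), which is the pattern reported above for the GENIA classes; with real data only the similarity function and the class memberships would change.</Paragraph>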
    <Paragraph position="3"> To examine the tag manipulation performance of TIMS, we measured the processing time needed to execute an interval operation in TIMS and compared it with the time needed when using string-based regular expression matching (REM). We focused on the interval operation '[?]' with the intervals (tags) &lt;title&gt; and &lt;term&gt; (i.e. extracting all terms within titles). We used five samples of different sizes (in terms of the number of tags and the file size in KB) to examine how IE performance varies with size.</Paragraph>
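    <Paragraph> The operation measured above, extracting all &lt;term&gt; elements that lie within &lt;title&gt; elements, can be sketched in two ways. This is a minimal sketch under stated assumptions, not the TIMS implementation or TIQL syntax: in the interval-based variant, tags are indexed once as character intervals and the query becomes an interval-containment test, while the string-based baseline re-matches the raw text with regular expressions. The toy document and the functions index_intervals, within and within_rem are hypothetical, and the sketch makes no performance claim.
<![CDATA[
```python
import re
from collections import defaultdict

# Matches simple opening/closing tags such as <term> and </term>; attributes
# and self-closing tags are ignored in this sketch.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

Span = tuple[int, int]


def index_intervals(doc: str) -> dict[str, list[Span]]:
    """Map each tag name to the (start, end) character spans of its contents."""
    open_stack: dict[str, list[int]] = defaultdict(list)
    spans: dict[str, list[Span]] = defaultdict(list)
    for m in TAG_RE.finditer(doc):
        closing, name = m.group(1), m.group(2)
        if closing:
            # Pair the closing tag with the most recent unmatched opening tag.
            spans[name].append((open_stack[name].pop(), m.start()))
        else:
            open_stack[name].append(m.end())
    return spans


def within(inner: str, outer: str, spans: dict[str, list[Span]], doc: str) -> list[str]:
    """Interval-based query: text of every `inner` element nested in an `outer` one."""
    return [
        doc[s:e]
        for s, e in sorted(spans.get(inner, []))
        if any(lo <= s and e <= hi for lo, hi in spans.get(outer, []))
    ]


def within_rem(inner: str, outer: str, doc: str) -> list[str]:
    """String-based baseline: regex-match each `outer` element, then `inner` ones in it.

    The non-greedy pattern assumes `outer` elements are not nested in each other.
    """
    o, i = re.escape(outer), re.escape(inner)
    result: list[str] = []
    for block in re.findall(rf"<{o}>(.*?)</{o}>", doc, flags=re.S):
        result.extend(re.findall(rf"<{i}>(.*?)</{i}>", block, flags=re.S))
    return result


# Hypothetical toy document; not taken from the GENIA corpus.
DOC = (
    "<doc><title>Activation of <term>NF-kappa B</term> by <term>TNF</term></title>"
    "<abstract>The <term>p50</term> subunit binds DNA.</abstract></doc>"
)

if __name__ == "__main__":
    spans = index_intervals(DOC)  # build the index once, reuse it for many queries
    print(within("term", "title", spans, DOC))  # ['NF-kappa B', 'TNF']
    print(within_rem("term", "title", DOC))     # ['NF-kappa B', 'TNF']
```
]]>
 Both functions return the same terms here. The design difference is that the interval index is built once and can answer further queries (for example, terms within the abstract) without rescanning the string, whereas the baseline repeats the pattern matching for every query; because its non-greedy pattern mishandles nested elements of the same name, it is only a rough stand-in for the REM approach described above.</Paragraph>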
    <Paragraph position="4"> TIMS executed the operation about 1.4 to 1.8 times faster than REM, depending on the number of tags and the corpus length. We therefore believe that the TIMS tag information management scheme can be considered an efficient mechanism for facilitating knowledge acquisition and information extraction.</Paragraph>
  </Section>
</Paper>