<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3302">
  <Title>Ontology-Based Natural Language Query Processing for the Biological Domain</Title>
  <Section position="5" start_page="12" end_page="15" type="evalu">
    <SectionTitle>
3 Experiment Results
</SectionTitle>
    <Paragraph position="0"> We tested our approach on the GENIA corpus and ontology. The evaluation presented in this section focuses on the ability of the system to translate NL queries into their normalized representation, and the corresponding ER queries.</Paragraph>
    <Section position="1" start_page="12" end_page="13" type="sub_section">
      <SectionTitle>
3.1 Test Data
</SectionTitle>
      <Paragraph position="0"> The GENIA corpus contains 2000 annotated MEDLINE abstracts [Ohta 2002]. The main reason we chose this corpus is that we could extract the pre-annotated biological entities to populate a domain lexicon, which is used by the NL parser.</Paragraph>
      <Paragraph position="1"> Therefore, we were able to ensure that the system had complete terminology coverage of the corpus.</Paragraph>
      <Paragraph position="2"> During indexing, we used the raw text data as input by stripping out the annotation tags.</Paragraph>
      <Paragraph position="3"> The GENIA ontology has a complete taxonomy of entities in molecular biology. It is divided into substance and source sub-hierarchies. The substances include sub-paths such as nucleic_acid/DNA and amino_acid/protein. Sources are biological locations where substances are found and their reactions take place. They are also hierarchically subclassified into organisms, body parts, tissues, cells or cell types, etc. Our adoption of the GENIA ontology as a conceptual model for guiding query interpretation is described as follows.</Paragraph>
      <Paragraph position="4"> Entities - For gene and protein names, we added synonyms and variations extracted from the Entrez Gene database (previously LocusLink).</Paragraph>
      <Paragraph position="5"> Interactions - The GENIA ontology does not contain associative relations. By consulting a domain expert, we identified a set of relations that are of particular interest in this domain. Some examples of relevant relations are: activate, bind, interact, regulate. For each type of interaction, we created a list of corresponding action verbs.</Paragraph>
      <Paragraph position="6"> Entity Attributes - We identified two types of entity attributes: 1. Location, e.g. body_part, cell_type, etc., identified by path [genia/source]. 2. Subtype of proteins/genes, e.g. enzymes, transcription factors, etc., identified by types like protein_family_or_group, DNA_family_or_group.</Paragraph>
      <Paragraph position="7"> Event Attributes - Locations were the only event attribute we supported in this experiment.</Paragraph>
      <Paragraph position="8"> Figure 3 shows our natural language query interface. The retrieved subject-verb-object relationships are displayed in a tabular format. The lower screenshot shows the document display page when the user clicks on the last result link &lt;interleukin 2, activate, NF-kappa B&gt;. The sentence that contains the result relationship is highlighted.</Paragraph>
      <Paragraph position="9"> Designators - We added a mapping between each semantic type and its natural language names. For example, when a term such as "gene" or "nucleic acid" appears in a query, we map it to the taxonomic path: [Substance/compound/nucleic_acid]</Paragraph>
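      <Paragraph position="10"> The designator mapping can be pictured as a simple lookup table. The following is a minimal sketch, not the implementation used in the experiment: the table name, the function name, the whitespace normalization, and the naive plural handling are all assumptions made for illustration. The first two entries come from the example above; the "enzyme" entry uses the path mentioned in Section 3.2.

```python
# Hypothetical designator table: natural language name -> taxonomic path.
# Only these three entries are taken from the text; everything else
# (names, normalization, plural handling) is an illustrative assumption.
DESIGNATORS = {
    "gene": "Substance/compound/nucleic_acid",
    "nucleic acid": "Substance/compound/nucleic_acid",
    "enzyme": "protein/protein_family_or_group",
}


def resolve_designator(term):
    """Return the taxonomic path for a query term, or None if it is unknown."""
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(term.lower().split())
    if key in DESIGNATORS:
        return DESIGNATORS[key]
    # Naive plural handling: "enzymes" is looked up as "enzyme".
    if key.endswith("s") and key[:-1] in DESIGNATORS:
        return DESIGNATORS[key[:-1]]
    return None
```

Under these assumptions, resolve_designator("enzymes") returns the protein family path, and a term outside the table, such as "drug", returns None.</Paragraph>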
    </Section>
    <Section position="2" start_page="13" end_page="14" type="sub_section">
      <SectionTitle>
3.2 Evaluation
</SectionTitle>
      <Paragraph position="0"> To demonstrate our ability to interpret and answer NL queries correctly, we selected a set of 50 natural language questions in the molecular biology domain. The queries were collected by consulting a domain expert, with restrictions such as: 1. Focusing on queries concerning entities and interaction events between entities.</Paragraph>
      <Paragraph position="1"> 2. Limiting to taxonomic paths defined within the GENIA ontology, which does not contain important entities such as drugs and diseases.</Paragraph>
      <Paragraph position="2"> For each target question, we first manually created the ground-truth entity-relationship model. Then, we performed automatic question interpretation and answer retrieval using the developed software prototype. The extracted semantic expressions were validated by comparison against the ground truth. Our system correctly interpreted all 50 queries and retrieved answers from the GENIA corpus. In the rest of this section, we describe a number of representative queries. Query on events: With what genes does ap-1 physically interact? An entity's properties are often mentioned in a separate place within the document. We translate these types of queries into a DOC_LEVEL_AND of multiple ER queries. This AND operator is currently implemented using the nested search feature. For example, given the query: What enzymes does HIV-1 Tat suppress? we recognize that the word "enzyme" is associated with the path [protein/protein_family_or_group], and we treat it as an attribute constraint.</Paragraph>
      <Paragraph position="3"> One of the answer sentences is displayed below: "Thus, our experiments demonstrate that the C-terminal region of HIV-1 Tat is required to suppress Mn-SOD expression", while Mn-SOD is indicated as an enzyme in a different sentence: "... Mn-dependent superoxide dismutase (Mn-SOD), a mitochondrial enzyme ..."</Paragraph>
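      <Paragraph position="4"> The document-level conjunction can be illustrated with a small sketch. It assumes each ER query returns hits tagged with a document identifier, and it replaces the nested search feature with a plain set lookup, so it shows the intended semantics rather than the actual implementation. The record type, field names, document identifiers, and the second event (with the placeholder object "entity-X") are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one retrieved relationship; field names are assumptions.
Hit = namedtuple("Hit", ["doc_id", "subject", "verb", "obj"])


def doc_level_and(event_hits, attribute_hits):
    """Keep event hits whose object is typed by an attribute hit in the same document."""
    # (doc_id, entity) pairs for which the attribute query found evidence.
    typed = {(h.doc_id, h.subject) for h in attribute_hits}
    return [e for e in event_hits if (e.doc_id, e.obj) in typed]


# Toy data modeled on the example in the text; document ids are made up.
events = [
    Hit("doc1", "HIV-1 Tat", "suppress", "Mn-SOD"),
    Hit("doc2", "HIV-1 Tat", "suppress", "entity-X"),
]
attributes = [
    Hit("doc1", "Mn-SOD", "is_a", "enzyme"),
]
answers = doc_level_and(events, attributes)
```

In this toy setup only the Mn-SOD event survives, because it is the only event whose object is described as an enzyme somewhere in the same document.</Paragraph>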
    </Section>
    <Section position="3" start_page="14" end_page="15" type="sub_section">
      <SectionTitle>
Inter-Event Relations
</SectionTitle>
      <Paragraph position="0"> The inter-event relations or nested event queries (CLAUSE_LEVEL_AND) are currently implemented using the ER query's local context constraints, i.e. one event must appear within the local context of the other.</Paragraph>
      <Paragraph position="1"> Query on inter-event relations: What protein inhibits the induction of Ikappa- One of the answer sentences is: "In both cell types, the cytokine that inhibits the induction of IkappaBalpha by DEX, also rescues these cells from DEX-induced apoptosis."</Paragraph>
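      <Paragraph position="2"> The local context constraint can be sketched in the same style. The sketch assumes that "local context" can be approximated by the sentence in which an event occurs, which is a simplification chosen for illustration; the record type, field names, identifiers, and toy events are hypothetical and do not reproduce the ER query engine.

```python
from collections import namedtuple

# Hypothetical event record: where the event was found, plus its triple.
Event = namedtuple("Event", ["doc_id", "sentence", "subject", "verb", "obj"])


def clause_level_and(outer_events, inner_events):
    """Pair each outer event with the inner events found in the same local context.

    Assumption: the local context is approximated as the same sentence.
    """
    # Index inner events by (document, sentence) for direct lookup.
    by_context = {}
    for inner in inner_events:
        by_context.setdefault((inner.doc_id, inner.sentence), []).append(inner)
    pairs = []
    for outer_event in outer_events:
        key = (outer_event.doc_id, outer_event.sentence)
        for inner in by_context.get(key, []):
            pairs.append((outer_event, inner))
    return pairs


# Toy data modeled on the answer sentence above; ids and sentence numbers are made up.
outer = [Event("doc1", 3, "cytokine", "inhibit", "induction")]
inner = [
    Event("doc1", 3, "DEX", "induce", "IkappaBalpha"),
    Event("doc1", 7, "DEX", "induce", "IkappaBalpha"),
]
pairs = clause_level_and(outer, inner)
```

Only the inner event in sentence 3 is paired with the outer event; the occurrence in sentence 7 falls outside the assumed local context and is dropped.</Paragraph>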
    </Section>
  </Section>
</Paper>