<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3302">
  <Title>Ontology-Based Natural Language Query Processing for the Biological Domain</Title>
  <Section position="3" start_page="9" end_page="10" type="metho">
    <SectionTitle>
2 Overview of Our Approach
</SectionTitle>
    <Paragraph position="0"> Molecular biology concerns interaction events between proteins, drugs, and other molecules. These events include transcription, translation, dissociation, etc. In addition to basic events, which focus on interactions between molecules, users are also interested in relationships between basic events, e.g. the causality between two such events [Hirschman 2002].</Paragraph>
    <Paragraph position="1"> In order to produce a useful NL query tool, we must be able to correctly interpret and answer typical queries in the domain, e.g.:
* What genes does transcription factor X regulate?
* With what genes does gene G physically interact?
* What proteins interact with drug D?
* What proteins affect the interaction of another protein with drug D?
Figure 1 shows the process diagram of our system. The query interpretation process consists of two major steps: 1) Syntactic analysis - parsing and decomposition of the input query; and 2) Semantic analysis - mapping of syntactic structures to an intermediate conceptual representation. The analysis uses an ontology to extract domain-specific entities/relations and to resolve linguistic ambiguity and variations. Then, the extracted semantic expression is transformed into a query in an entity-relationship query language, which retrieves results from pre-indexed biological literature databases.</Paragraph>
    <Section position="1" start_page="9" end_page="10" type="sub_section">
      <SectionTitle>
2.1 Incorporating Domain Ontology
</SectionTitle>
      <Paragraph position="0"> Domain ontologies explicitly specify the meaning of and relations between the fundamental concepts in an application domain. A concept represents a set or class of entities within a domain. Relations describe the interactions between concepts or a concept's properties. Relations fall into two broad categories: taxonomic relations, which organize concepts into &quot;is-a&quot; and &quot;is-a-member-of&quot; hierarchies, and associative relationships [Stevens 2000]. The associative relationships represent, for example, the functions and processes a concept has or is involved in. A domain ontology also specifies how knowledge is related to linguistic structures such as grammars and lexicons. Therefore, it can be used in NLP to improve expressiveness and accuracy, and to resolve the ambiguity of NL queries.</Paragraph>
      <Paragraph position="1"> There are two major steps for incorporating a domain ontology: 1) building/augmenting a lexicon for entity tagging, including lexical patterns that specify how to recognize the concept in text; and 2) specifying syntactic structure patterns for extracting semantic relationships among concepts.</Paragraph>
      <Paragraph position="2"> The existing ontologies (e.g. UMLS, Gene Ontology) are created mainly for the purpose of database annotation and consolidation. From those ontologies, we could extract concepts and taxonomic relations, e.g., is-a. However, there is also a need for ontologies that specify relevant associative relations between concepts, e.g. &quot;Protein acetylate Protein.&quot;</Paragraph>
      <Paragraph position="3"> In our experiment we investigate the problem of augmenting an existing ontology (i.e. GENIA) with associative relations and other linguistic information required to guide the query interpretation process.</Paragraph>
    </Section>
    <Section position="2" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
2.2 Query Parsing and Normalization
</SectionTitle>
      <Paragraph position="0"> Our NL parser performs the steps of tokenization, part-of-speech tagging, morphological processing, lexical analysis, and identification of phrases and grammatical relations such as subjects and objects.</Paragraph>
      <Paragraph position="1"> The lexical analysis is based on a customizable lexicon and a set of lexical patterns, which make it possible to add words or phrases as dictionary terms, to assign categories (e.g. entity types), and to associate synonyms and related terms with dictionary items. The output of our parser is a dependency tree, represented by a set of dependency relationships of the form (head, relation, modifier).</Paragraph>
      <Paragraph position="2"> In the next step, we perform syntactic decomposition to collapse the dependency tree into subject-verb-object (SVO) expressions. The SVO triples can express most types of syntactic relations between various entities within a sentence. Another advantage of this triple expression is that it becomes easier to write explicit transformational rules that encode specific linguistic variations.</Paragraph>
      <Paragraph position="3"> Figure 2 shows the subject-action-object triplet.</Paragraph>
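      <Paragraph> The decomposition step can be illustrated with a minimal sketch, which is not the parser described here. It assumes that the dependency relationships carry the hypothetical relation labels "subj", "obj" and "det", and it ignores all modifiers other than subjects and objects.

```python
from typing import NamedTuple, Optional


class Dependency(NamedTuple):
    head: str
    relation: str  # hypothetical labels: "subj", "obj", "det", ...
    modifier: str


class SVO(NamedTuple):
    subject: Optional[str]
    verb: str
    obj: Optional[str]


def collapse_to_svo(dependencies: list[Dependency]) -> list[SVO]:
    """Group subject and object dependents under their shared verb head."""
    subjects: dict[str, str] = {}
    objects: dict[str, str] = {}
    verbs: list[str] = []  # keeps the first-seen order of verb heads
    for dep in dependencies:
        if dep.relation not in ("subj", "obj"):
            continue  # other modifiers are ignored in this sketch
        if dep.head not in verbs:
            verbs.append(dep.head)
        target = subjects if dep.relation == "subj" else objects
        target[dep.head] = dep.modifier
    return [SVO(subjects.get(v), v, objects.get(v)) for v in verbs]


# Hand-written dependency set for "What genes does IL-2 regulate?"
parse = [
    Dependency("regulate", "subj", "IL-2"),
    Dependency("regulate", "obj", "genes"),
    Dependency("genes", "det", "what"),
]
triples = collapse_to_svo(parse)
print(triples)
```

The determiner dependency is dropped here; in a fuller treatment it would be kept as a descriptor or reformulated as its own triple, as discussed for modifiers in general.</Paragraph>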
      <Paragraph position="4"> Verb modifiers in the syntactic structure may include prepositional attachment and adverbials. The modifiers add context to the event of the verb, including time, location, negation, etc. Subject/object modifiers include appositive, nominative, genitive, prepositional, descriptive (adjective-noun modification), etc. All these modifiers can be either considered as descriptors (attributes) or reformulated as triple expressions by assigning a type to the pair.</Paragraph>
      <Paragraph position="5"> Linguistic normalization is a process by which linguistic variants that contain the same semantic content are mapped onto the same representational structure. It operates at the morphological, lexical and syntactic levels. Syntactic normalization involves transformational rules that recognize the equivalence of different structures, e.g.:</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="10" end_page="12" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> * Verb Phrase Normalization - elimination of tense, modality and voice.</Paragraph>
    <Paragraph position="1"> * Verbalization of noun phrases - e.g. &quot;Inhibition of X by Y&quot; is rewritten as &quot;Y inhibit X&quot;.</Paragraph>
    <Paragraph position="2"> For example, queries such as:
Proteins activated by IL-2
What proteins are activated by IL-2?
What proteins does IL-2 activate?
Find proteins that are activated by IL-2
are all normalized into the relationship: IL-2 &gt; activate &gt; Protein
As part of the syntactic analysis, we also need to catch certain question-specific patterns or phrases based on their part-of-speech tags and grammatical roles, e.g. determiners like &quot;which&quot; or &quot;what&quot;, and verbs like &quot;find&quot; or &quot;list&quot;.</Paragraph>
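    <Paragraph> The effect of voice normalization can be sketched as follows. The Clause record and its field names are hypothetical: the sketch assumes that an earlier step has already reduced the verb to its lemma (removing tense and modality) and has labelled each clause as active or passive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    verb_lemma: str                 # verb with tense and modality already stripped
    voice: str                      # "active" or "passive"
    subject: Optional[str]          # grammatical subject
    obj: Optional[str] = None       # direct object (active clauses)
    by_agent: Optional[str] = None  # agent of the "by" phrase (passive clauses)


def normalize(clause: Clause) -> tuple:
    """Map a clause onto a canonical (agent, action, target) triple."""
    if clause.voice == "passive":
        # In a passive clause the grammatical subject is the logical object.
        return (clause.by_agent, clause.verb_lemma, clause.subject)
    return (clause.subject, clause.verb_lemma, clause.obj)


variants = [
    # "What proteins are activated by IL-2?"
    Clause("activate", "passive", "Protein", by_agent="IL-2"),
    # "What proteins does IL-2 activate?"
    Clause("activate", "active", "IL-2", obj="Protein"),
]
normalized = [normalize(c) for c in variants]
for triple in normalized:
    print(" > ".join(triple))
```

Both variants yield the same triple, which is the property that makes later pattern matching independent of the surface form of the question.</Paragraph>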
    <Section position="1" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
2.3 Semantic Analysis
</SectionTitle>
      <Paragraph position="0"> The semantic analysis typically involves two steps: 1) Identifying the semantic type of the entity sought by the question; and 2) Determining additional constraints by identifying relations that ought to hold between a candidate answer entity and other entities or events mentioned in the query [Hirschman 2001]. The semantic analysis attempts to map normalized syntactic structures to semantic entities/relations defined in the ontology. When the system is not able to understand the question, the cause of failure will be explained to the user, e.g. unknown word or syntax, no relevant concepts in the ontology, etc.</Paragraph>
      <Paragraph position="1"> The output of semantic analysis is a set of relationship triplets, which can be grouped into four categories:</Paragraph>
    </Section>
    <Section position="2" start_page="10" end_page="11" type="sub_section">
      <SectionTitle>
Subject Action Object
</SectionTitle>
      <Paragraph position="0"> Events, including interactions between entities and inter-event relations (nested events), e.g.</Paragraph>
      <Paragraph position="1"> A natural language query will be decomposed into a list of inter-linked triplets. A user's specific information request is denoted as &quot;UNKNOWN.&quot; Starting with an ontology, we determine the mapping from syntactic structures to semantic relations. Given our example &quot;IL-2 &gt; activate &gt; Protein&quot;, we recognize &quot;IL-2&quot; as an entity, map the verb &quot;activate&quot; to a semantic relation &quot;Activation,&quot; and detect the term &quot;protein&quot; as a designator of the semantic type &quot;Protein.&quot; Therefore, we could easily transform the query to the following triplets:
Given a syntactic triplet of subject/verb/object or head/relation/modifier, the ontology-driven semantic analysis performs the following steps:
1. Assign possible semantic types to the pair of terms,
2. Determine all possible semantic links between each pair of assigned semantic types defined in the ontology,
3. Given the syntactic relation (i.e. verb or modifier-relation) between the two concepts, infer and validate plausible inter-concept semantic relationships from the set determined in Step 2,
4. Resolve linguistic ambiguity by rejecting inconsistent relations or semantic types.</Paragraph>
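      <Paragraph> The four steps above can be sketched over a toy ontology. All lexicon entries, type names, links and verb mappings below are hypothetical placeholders chosen for illustration; they are not taken from GENIA or from the augmented ontology.

```python
# Toy ontology: every entry below is a hypothetical placeholder.
LEXICON = {
    "IL-2": {"Protein"},
    "protein": {"Protein"},
    "drug D": {"Drug"},
}
# (subject type, relation, object type) links permitted by the ontology
LINKS = {
    ("Protein", "Activation", "Protein"),
    ("Drug", "Inhibition", "Protein"),
}
VERB_TO_RELATIONS = {
    "activate": {"Activation"},
    "inhibit": {"Inhibition"},
}


def interpret(subject, verb, obj):
    """Return the ontology links that are consistent with a syntactic triplet."""
    # Step 1: assign possible semantic types to the pair of terms.
    subject_types = LEXICON.get(subject, set())
    object_types = LEXICON.get(obj, set())
    # Step 2: collect every ontology link between the assigned types.
    candidates = [
        link for link in sorted(LINKS)
        if link[0] in subject_types and link[2] in object_types
    ]
    # Steps 3 and 4: keep the links licensed by the verb, reject the rest.
    allowed = VERB_TO_RELATIONS.get(verb, set())
    return [link for link in candidates if link[1] in allowed]


accepted = interpret("IL-2", "activate", "protein")
rejected = interpret("drug D", "activate", "protein")
print(accepted, rejected)
```

An empty result corresponds to the failure case mentioned in Section 2.3, where the system reports that no relevant concept or relation was found in the ontology.</Paragraph>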
      <Paragraph position="2"> It is simpler and more robust to identify the query pattern using the extracted syntactic structure, in which linguistic variations have been normalized into a canonical form, rather than the original question or its full parse tree.</Paragraph>
    </Section>
    <Section position="3" start_page="11" end_page="12" type="sub_section">
      <SectionTitle>
2.4 Entity-Relationship Indexing and Search
</SectionTitle>
      <Paragraph position="0"> In this section, we describe the annotation, indexing and search of text data. In the off-line indexing mode, we annotate the text with ontological concepts and relationships. We perform full linguistic analysis on each document, which involves splitting of text into sentences, sentence parsing, and the same syntactic and semantic analysis as described in previous sections on query processing.</Paragraph>
      <Paragraph position="1"> This step recognizes names of proteins, drugs, and other biological entities mentioned in the texts.</Paragraph>
      <Paragraph position="2"> Then we apply a document-level discourse analysis procedure to resolve entity-level coreference, such as acronyms/aliases and pronoun anaphora. Sentence-level syntactic structures (subject-verb-object triples) and semantic markups are stored in a database and indexed for efficient retrieval.</Paragraph>
      <Paragraph position="3"> In the on-line search mode, we provide a set of entity-relationship (ER) search operators that allow users to search on the indexed annotations. Unlike keyword search engines, we employ a highly expressive query language that combines the power of grammatical roles with the flexibility of Boolean operators, and allows users to search for actions, entities, relationships, and events. We represent the basic relationship between two entities with an expression of the kind:
Subject Entity &gt; Action &gt; Object Entity
We can optionally constrain this expression by specifying modifiers or using Boolean logic. The arrows in the query indicate the directionality of the action. For example,
Entity 1 &lt;&gt; Action &lt;&gt; Entity 2
will retrieve all relationships involving Entity 1 and Entity 2, regardless of their roles as subject or object of the action. An asterisk (*) can be used to denote unknown or unspecified sources or targets, e.g. &quot;Il-2 &gt; inhibit &gt; *&quot;.</Paragraph>
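      <Paragraph> The directional, bidirectional and wildcard behaviour described above can be sketched as a simple matcher over stored triples. This is an illustration only, not the actual search engine: the index contents are invented placeholders, and case-insensitive comparison is an assumption.

```python
def matches(query, triple, bidirectional=False):
    """Check whether an indexed (subject, action, object) triple satisfies a query."""
    def fits(pattern, value):
        # "*" stands for an unknown or unspecified source or target.
        return pattern == "*" or pattern.lower() == value.lower()

    q_subj, q_action, q_obj = query
    subj, action, obj = triple
    if not fits(q_action, action):
        return False
    forward = fits(q_subj, subj) and fits(q_obj, obj)
    backward = fits(q_subj, obj) and fits(q_obj, subj)
    # The bidirectional form ignores the subject and object roles.
    return forward or (bidirectional and backward)


# Hypothetical index entries, for illustration only.
INDEX = [
    ("il-2", "inhibit", "protein_a"),
    ("drug_d", "inhibit", "il-2"),
    ("il-2", "activate", "protein_b"),
]
query = ("Il-2", "inhibit", "*")
directed = [t for t in INDEX if matches(query, t)]
undirected = [t for t in INDEX if matches(query, t, bidirectional=True)]
print(directed)
print(undirected)
```

The directed query returns only triples in which il-2 is the subject of the inhibition, whereas the bidirectional form also returns the triple in which il-2 is the object.</Paragraph>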
      <Paragraph position="4"> In the ER query language we can represent and organize entity types using taxonomy paths, e.g.:
[substance/compound/amino_acid/protein]
[source/natural/cell_type]
The taxonomic paths can encode the &quot;is-a&quot; relation (as in the above examples), or any other relations defined in a particular ontology (e.g. the &quot;part-of&quot; relation). When querying, we can use a taxonomy path to specify an entity type, e.g. [Protein/Molecule], [Source], and the entity type will automatically include all subpaths in the taxonomic hierarchy. The complete list of ER query features that we currently support is given in Table 1.</Paragraph>
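      <Paragraph> One simple way to realize the inclusion of subpaths is segment-wise prefix matching on the taxonomy path. The sketch below assumes that every indexed entity carries its full path from the taxonomy root and that comparison is case-insensitive; both are assumptions made for illustration.

```python
def path_includes(query_path: str, entity_path: str) -> bool:
    """True if entity_path equals query_path or lies below it in the taxonomy."""
    query_parts = query_path.lower().split("/")
    entity_parts = entity_path.lower().split("/")
    # Compare whole segments so that "substance/comp" does not match "substance/compound".
    return entity_parts[:len(query_parts)] == query_parts


protein_path = "substance/compound/amino_acid/protein"
print(path_includes("substance/compound", protein_path))
print(path_includes("source", protein_path))
```

Comparing whole segments rather than raw string prefixes avoids accidental matches between type names that merely share leading characters.</Paragraph>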
      <Paragraph position="5">  [Protein]&quot; will return all instances of il-2 regulating a protein.</Paragraph>
      <Paragraph position="6"> Events restricted to a certain action type: categories of actions that can be used to filter or expand the search. The query &quot;[Protein] &gt; [Inhibition] &gt; [Protein]&quot; will retrieve all inhibition events involving two proteins.</Paragraph>
      <Paragraph position="8"> will only return results mentioning a cell type location where the activation occurs.</Paragraph>
      <Paragraph position="9">  that contain the specified metadata values.</Paragraph>
      <Paragraph position="10"> Nested Search: Allow users to search the results of a given search.</Paragraph>
      <Paragraph position="11"> Negation Filtering: Allow users to filter out negated results that are detected during indexing.</Paragraph>
      <Paragraph position="12"> Table 1 lists various types of ER queries.</Paragraph>
    </Section>
    <Section position="4" start_page="12" end_page="12" type="sub_section">
      <SectionTitle>
2.5 Translation to ER Query
</SectionTitle>
      <Paragraph position="0"> We extract answers through entity-relational matching between the NL query and syntactic/semantic annotations extracted from sentences. Given the query's semantic expression as described in Section 2.3, we translate it to one or more entity-relationship search operators. The different types of semantic triplets (i.e. Event, Attribute, and Type) are treated differently when being converted to ER queries.</Paragraph>
      <Paragraph position="1">  tracted either from same sentence or from somewhere else within document context, using the nested search feature.</Paragraph>
      <Paragraph position="2"> * The Entity Type relations are specified in the ontology taxonomy.</Paragraph>
      <Paragraph position="3"> We translate our example, &quot;proteins activated by il-2&quot;, into the ER query &quot;il-2 &gt; [activation] &gt; [protein]&quot;. Figure 3 shows the list of retrieved subject-verb-object triples that match the query, where each triple is linked to a sentence in the corpus.</Paragraph>
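      <Paragraph> The translation of an Event triplet into the ER query string can be sketched as follows. The Term record is hypothetical, and the bracketing and lower-casing conventions are inferred from the example query above rather than from a specification of the query language.

```python
from typing import NamedTuple


class Term(NamedTuple):
    text: str
    is_type: bool  # True for a semantic type or relation, False for a named entity


def render(term: Term) -> str:
    """Bracket semantic types; leave named entities bare (assumed convention)."""
    name = term.text.lower()
    return f"[{name}]" if term.is_type else name


def to_er_query(subject: Term, relation: Term, obj: Term) -> str:
    """Join the three rendered terms with the directional arrow."""
    return " > ".join(render(t) for t in (subject, relation, obj))


er_query = to_er_query(
    Term("IL-2", False),
    Term("Activation", True),
    Term("Protein", True),
)
print(er_query)  # il-2 > [activation] > [protein]
```

Attribute and Type triplets would need their own translation rules, which this sketch does not cover.</Paragraph>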
    </Section>
  </Section>
</Paper>