<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3302"> <Title>Ontology-Based Natural Language Query Processing for the Biological Domain</Title> <Section position="2" start_page="0" end_page="9" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> New scientific research methods have greatly increased the volume of data available in the biological domain. A growing challenge for researchers and health care professionals is how to access this ever-increasing quantity of information [Hersh 2003]. The general public has even more trouble following current and potential applications. Part of the difficulty lies in the high degree of specialization of most resources. There is thus an urgent need for better access to current data and the various domains of expertise. Key considerations for improving information access include: 1) accessibility to different types of users; 2) high precision; 3) ease of use; 4) transparent retrieval across heterogeneous data sources; and 5) accommodation of rapid language change in the domain.</Paragraph> <Paragraph position="1"> Natural language searching refers to approaches that enable users to express queries in explicit phrases, sentences, or questions. Current information retrieval engines typically return more documents than a user can reasonably review. Natural language querying allows users to express their information needs more precisely and to retrieve specific results instead of ranked documents. It also benefits users who are not familiar with domain terminology.</Paragraph> <Paragraph position="2"> With the increasing availability of textual information related to biology, including MEDLINE abstracts and full-text journal articles, the field of biomedical text mining is growing rapidly. The application of Natural Language Processing (NLP) techniques in the biological domain has focused on tagging entities, such as genes and proteins, and on detecting relations among those entities. 
The main goal of applying these techniques is database curation. There has been little effort, and little success, in using NLP and text mining results to improve search engine performance. In this work, we explore the feasibility of bridging the gap between text mining and search by (1) indexing entities and relationships extracted from text, (2) developing search operators on entities and relationships, and (3) transforming natural language queries into these entity-relationship search operators.</Paragraph> <Paragraph position="3"> The first two steps are performed using our existing text analysis and search platform, called InFact [Liang 2005; Marchisio 2006]. This paper mainly concerns the third step, NL query interpretation and translation. The processes described above are all guided by a domain ontology, which provides a conceptual mapping between linguistic structures and domain concepts/relations. A major drawback of existing NL query interfaces is that their linguistic and conceptual coverage is not clear to the user [Androutsopoulos 1995]. Our approach addresses this problem by pointing out which concepts or syntactic relations are not mapped when we fail to find a consistent interpretation.</Paragraph> <Paragraph position="4"> Figure 1 shows the query processing and retrieval process.</Paragraph> <Paragraph position="5"> There has been skepticism about the usefulness of natural language queries for searching on the web or in the enterprise. Users usually prefer to enter the minimum number of words rather than lengthy, grammatically correct questions. We have developed a prototype system to deal with queries such as &quot;With what genes does AP-1 interact?&quot; Queries need not be standard grammatical questions; they can also take forms such as &quot;proteins regulated by IL-2&quot; or &quot;IL-2 inhibitors&quot;. We apply our system to a corpus of molecular biology literature, the GENIA corpus. 
Preliminary experimental results and evaluation are reported.</Paragraph> </Section> </Paper>