<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3302"> <Title>Ontology-Based Natural Language Query Processing for the Biological Domain</Title> <Section position="6" start_page="15" end_page="15" type="concl"> <SectionTitle> 4 Discussions </SectionTitle> <Paragraph position="0"> We demonstrated the feasibility of our approach using the relatively small GENIA corpus and ontology. A key concern with knowledge-based or semantics-based methods is whether they scale to larger sets of data and queries. As future work, we plan to systematically measure the effectiveness of the approach through large-scale experiments in an information retrieval setting as we increase the knowledge and linguistic coverage of our system.</Paragraph> <Paragraph position="1"> We can address the large data size issue by using InFact as an ingestion and deployment platform. With its distributed architecture, InFact can ingest large data sets (e.g., millions of MEDLINE abstracts) and host web-based search services for a large number of users. We will investigate scalability to larger knowledge coverage by adopting a more comprehensive ontology (i.e., UMLS [Bodenreider 2004]). In addition to genes and proteins, we will include other entity types such as drugs, chemical compounds, diseases and phenotypes, molecular functions, and biological processes. A main challenge will be increasing the linguistic coverage of our system in an automatic or semi-automatic way.</Paragraph> <Paragraph position="2"> Another challenge is to encourage keyword search users to adopt the new NL query format and the semi-structured ER query form. We are investigating a number of usability enhancements, most of which have been implemented and are being tested.</Paragraph> <Paragraph position="3"> For each entity detected within a query, we provide a hyperlink that takes the user to an ontology lookup page. 
For example, if the user enters &quot;protein il-2&quot;, we inform the user that the system recognizes &quot;protein&quot; as a taxonomic path and &quot;il-2&quot; as an entity according to the ontology. If a relationship triplet has any unspecified component, we provide recommendations (or tips) that are hyperlinks to executable ER queries. This allows users who are not familiar with the underlying ontology to navigate through the most plausible results. When the user enters a single entity of a particular type, we display a list of relations in which that entity type is likely to be involved, and a list of other entity types that are usually associated with the given type. Similarly, we define a list of relations between each pair of entity types according to the ontology, and these relations are ranked by popularity. When the user enters a query that involves two entities, we present the list of relevant relations to the user.</Paragraph> <Paragraph position="4"> Acknowledgements: This research was supported in part by grant number 1 R43 LM00846401 from the NIH. The authors thank Dr. David Haynor for his advice on this work, the anonymous reviewers for their helpful comments, and Yvonne Lam for helping with the manuscript.</Paragraph> </Section> </Paper>