<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-1033">
  <Title>SURFACE ANALYSIS OF QUERIES DIRECTED TOWARD A DATABASE</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SURFACE ANALYSIS OF QUERIES DIRECTED TOWARD A DATABASE
</SectionTitle>
    <Paragraph position="0"> A natural language interface is directed toward the database query languages that access machine-stored data. A pattern-driven transformation mechanism supports natural language access: a natural language is mapped onto a more formal computer database language. A human-like &quot;understanding&quot; of the query statement is not required. The transformation mechanism is separate from the target database management system (DBMS). A goal is independence from both domain content and DBMS implementation. The emphasis is on surface analysis over content analysis.</Paragraph>
    <Paragraph position="1"> Two particular questions are at issue. The first is the extent to which a natural language interface to a database may operate independently of the subject domain of the database; specifically, the extent to which natural language queries can be evaluated without the use of a reference system describing the query world. The second is the extent to which natural language queries can be analyzed using pattern recognition techniques.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="207" type="metho">
    <SectionTitle>
1 CONTEXT
</SectionTitle>
    <Paragraph position="0"> As computer mass storage has become cheaper, more data has been stored in computers. The relationship between the stored data and access techniques has become increasingly complex. This has resulted in the creation of very large databases and in the development of the powerful database management systems (DBMS) needed to access stored data. Most existing systems use a sophisticated data manipulation language (DML) to access the information in the database. To be used effectively, these languages require detailed knowledge of the database's organization, the DML, and the host computer system. Computationally naive users must transmit their requests through a highly trained database expert who can use the DMLs. Simply put, data accessibility is limited by a communication barrier.</Paragraph>
    <Paragraph position="1"> One way to make machine-stored data directly available to a wider range of people is to permit queries formulated in a natural language. Natural languages as data access languages have several compelling advantages. Some important ones are:  1. A large number of potential computer users are unwilling or unable to learn and use formal machine languages.</Paragraph>
    <Paragraph position="2"> 2. For at least some applications natural language provides the ideal communications medium (Grishman and Hirschman, 1978).</Paragraph>
    <Paragraph position="3"> 3. Potential users already know their natural language, so little training in its use as a query language would be needed.</Paragraph>
    <Paragraph position="4"> 4. Natural languages are powerful tools for the expression of certain types of non-mathematical ideas and concepts.</Paragraph>
    <Paragraph position="5"> 5. The immediacy and flexibility of information retrieval are significantly improved when end users retrieve the data themselves.</Paragraph>
    <Paragraph position="6">  If the user of computer-stored data is able to access the data by using natural language, the utility of machine-stored data would be increased. Not only would the casual user gain unhindered access, but an expert user could gain easier access, as new DBMS statements would not have to be learned with every system change.</Paragraph>
  </Section>
  <Section position="3" start_page="207" end_page="207" type="metho">
    <SectionTitle>
208 L.J. MAZLACK and R.A. FEINAUER
2 ACTIVITY OVERVIEW
</SectionTitle>
    <Paragraph position="0"> Along with the general goal of developing a natural language database interface, a particular emphasis is on the portability of the interface. The goal is to achieve both domain and DBMS portability. By domain portability, we mean the capability to use the same natural language interface (NLI) to resolve queries against databases concerning different subject matter. By DBMS portability, we mean the capability to use the same NLI for a variety of DBMS implementations.</Paragraph>
    <Paragraph position="1"> To achieve domain portability, it is necessary to develop a system that minimizes and/or localizes the need for semantic referents. To achieve DBMS portability, it is necessary to limit contact between the DBMS mechanism and the NLI.</Paragraph>
    <Paragraph position="2"> For the purposes of this paper, the term &quot;syntax&quot; will reference query surface structure and &quot;semantic&quot; will reference concerns which are not focused on surface structure. The focus of our semantic concerns is relatively narrow, as the primary concern is with intentionality.</Paragraph>
    <Section position="1" start_page="207" end_page="207" type="sub_section">
      <SectionTitle>
2.1 STRATEGY
</SectionTitle>
      <Paragraph position="0"> General machine natural language processing has turned out to be difficult. It is unclear whether we currently have enough knowledge to develop a comprehensive machine natural language processing capability. Perhaps the greatest opportunities for immediate success lie in the solution of subset problems. This investigation focuses on the relatively constrained natural language requirements necessary to support queries of a general database. Database queries require the capability to deal with a large subject context, but have a narrow pragmatic language use requirement. Others have sought to restrict problem complexity by trying to understand general statements about a limited world.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="207" end_page="207" type="metho">
    <SectionTitle>
2.1.1 PHILOSOPHY
</SectionTitle>
    <Paragraph position="0"> Our concern is with the mapping of a natural language into a more formal language, not with an understanding of the natural language. Hillman (1977) identifies this as the distinction between information retrieval and knowledge transfer. Knowledge transfer is dependent on access to knowledge representation systems capable of providing extensive help in gaining understanding. In comparison, in transformational mapping, if the question is HOW MANY DOGS ARE BLACK? then a human-like understanding of the nature of DOGS is not at issue; what is at issue is a way of formulating a database query to search the stored information with regard to the colour of dogs. In focusing on the question of mapping from a natural language to a DML, the primary concern is not with enhancing the understanding of language (as with Schank, 1973, 1975) but rather with attempting to bridge the gap between people and machines (as with Lehmann (1977), Ott (1977), and Berrendonner (1980)).</Paragraph>
    <Paragraph position="1">  Natural language queries have three characteristics that aid their analysis: (1) the portion of a natural language's syntax that must be covered is a subset of the entire language, (2) the pragmatic use of language in queries limits the interpretations of a statement, and (3) the analysis can be significantly guided by the assumption that a statement is a request for data from a known database.</Paragraph>
    <Paragraph position="2">  In separating the problems of access and DBMS design, both are simplified and made more amenable to solution. It would seem to be much more difficult to juggle with the problems of a natural language front end and at the same time to work on database development problems. The two problems would appear to compound each other.</Paragraph>
  </Section>
  <Section position="5" start_page="207" end_page="207" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> Communication with a DBMS can be directed toward either the database structure or the actual contents of the database. In either case, communication flows through the DBMS facilities. We only consider content-directed queries. To enhance the possibilities of DBMS portability, our NLI only makes contact with the data in the DBMS through the DML of the DBMS. Only the mapping between the final internal form of the NLI and the DML of the DBMS must be changed from DBMS to DBMS. The utilization of an existing database's formal query language as the target representation allows the fundamental questions of query transformation to be addressed without the problems associated with the collateral development of a DBMS.</Paragraph>
    <Section position="1" start_page="207" end_page="207" type="sub_section">
      <SectionTitle>
2.2 OVERVIEW: QUERY TRANSFORMATION
</SectionTitle>
      <Paragraph position="0"> Analysis of a query is treated as a transformation problem. The query is transformed from an informal language into a more formal language, the DML. The transformation is done in two steps: (1) the query is transformed from English into an internal representation and (2) the internal representation is transformed into a DML. The transformations are driven by a non-serial surface structure analysis.</Paragraph>
      <Paragraph position="1"> This analysis is supported by non-structural referents which are focused on recognizing the intended use of words and/or word groups.</Paragraph>
      <Paragraph position="2"> The use of an internal intermediate representation allows the desired information to be determined in isolation from the peculiarities of the target DML. Also, because the initial analysis of the query is kept independent of the specific DML, the mechanism that puts the query in standard form does not have to be changed if the system is moved to a new DBMS with a different DML. This allows the analysis to be partitioned into two distinct phases: (a) transformation of the English query into a standard form and (b) subsequent construction of the DML query.</Paragraph>
      <Paragraph position="3"> A simplified frame-type (Minsky, 1975) data structure, called a template, is used as the target internal representation. The analysis process includes identifying the template which best matches the query and filling in all the information needed to complete the stereotyped question.</Paragraph>
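      <Paragraph position="4"> The notion of selecting and filling a template can be illustrated with a minimal Python sketch. Everything named in it is an assumption introduced for illustration and is not taken from the system described here: the two template names, their slot names, and in particular the scoring rule used to decide which template "best matches" a set of recognized fragments.

```python
# Hypothetical templates: each is a stereotyped question with named slots.
TEMPLATES = {
    "COUNT_MATCHING": ["entity", "attribute", "value"],
    "LIST_ATTRIBUTE": ["entity", "attribute"],
}


def best_template(fragments):
    """Pick the template whose slots are best covered by the fragments.

    fragments maps slot names to values recovered from the query.
    Assumed scoring rule (illustration only): slots that can be filled,
    minus fragment keys the template cannot hold, minus slots left empty.
    """
    def score(slots):
        filled = sum(1 for s in slots if s in fragments)
        unused = sum(1 for k in fragments if k not in slots)
        missing = len(slots) - filled
        return filled - unused - missing

    return max(TEMPLATES, key=lambda name: score(TEMPLATES[name]))


def fill_template(name, fragments):
    """Return the completed template, with None marking any unfilled slot."""
    filled = {slot: fragments.get(slot) for slot in TEMPLATES[name]}
    return {"template": name, "slots": filled}


# Fragments that an analyzer might produce for HOW MANY DOGS ARE BLACK?
fragments = {"entity": "DOG", "attribute": "COLOUR", "value": "BLACK"}
completed = fill_template(best_template(fragments), fragments)
print(completed)
```

      Under these assumptions the three fragments select the COUNT_MATCHING template, while a query that yields only an entity and an attribute would select LIST_ATTRIBUTE. Nothing in the sketch refers to a DML, which is the point of the intermediate representation.</Paragraph>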
    </Section>
    <Section position="2" start_page="207" end_page="207" type="sub_section">
      <SectionTitle>
2.3 OVERVIEW: THE ANALYZER
</SectionTitle>
      <Paragraph position="0"> The analyzer in the mechanism uses both syntactic and semantic information to transform the query into an internal representation. Syntactic sources support as much of the analysis as is possible. When semantic sources must be used, existing sources of semantic information are used. This minimizes the amount of effort that must be expended in developing semantic referents. After the query is in a fully notated template representation, control of the mechanism passes to the bridge coding, which transforms the standard form representation of the query into a DML.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="207" end_page="207" type="metho">
    <SectionTitle>
2.4 THE BRIDGE CODE
</SectionTitle>
    <Paragraph position="0"> The bridge coding transforms the query from a completed template into the DML of the host DBMS. Use of the DMLs of DBMSs has several advantages that are not exploited in systems which develop their own access routines. First, using the existing DMLs reduces the amount of new software that must be produced. Second, existing software, such as report generators, would not require modification.</Paragraph>
    <Paragraph position="1"> Lastly, the DML can continue to be used directly, without going through the natural language processor, for those applications, such as updating, where the use of a natural language system may be undesirable.</Paragraph>
    <Paragraph position="2"> After the template has been converted into a DML query, control of the system passes to the DBMS, which will evaluate the query. When the DBMS is finished, the answer and control of the system pass to a response generator.</Paragraph>
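    <Paragraph position="3"> A bridge of this kind might look like the following Python sketch. It rests on several assumptions that are not part of the system described here: the target DML is taken to be SQL-like, and the template names, slot names, and the function name bridge_to_sql are hypothetical. The sketch only shows that the bridge is the single place where DBMS-specific syntax appears.

```python
def bridge_to_sql(completed):
    """Translate a completed template into an SQL-like query string.

    The template names, slot names, and the SQL-like target are all
    assumptions. A different DBMS would need a different bridge function,
    while the completed template itself would stay the same.
    """
    slots = completed["slots"]
    missing = [name for name, value in slots.items() if value is None]
    if missing:
        raise ValueError("template is not fully notated: " + ", ".join(missing))
    # Values are interpolated directly for brevity; a real bridge would
    # need proper quoting or parameter binding.
    if completed["template"] == "COUNT_MATCHING":
        pattern = "SELECT COUNT(*) FROM {entity} WHERE {attribute} = '{value}'"
        return pattern.format(**slots)
    if completed["template"] == "LIST_ATTRIBUTE":
        return "SELECT {attribute} FROM {entity}".format(**slots)
    raise ValueError("no bridge rule for template " + completed["template"])


# A completed template such as an analyzer might hand to the bridge.
completed = {
    "template": "COUNT_MATCHING",
    "slots": {"entity": "DOG", "attribute": "COLOUR", "value": "BLACK"},
}
print(bridge_to_sql(completed))
```

      Moving to another DBMS would, under these assumptions, mean replacing only this function, which mirrors the portability argument made above.</Paragraph>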
  </Section>
  <Section position="7" start_page="207" end_page="207" type="metho">
    <SectionTitle>
3 THE ANALYZER
</SectionTitle>
    <Paragraph position="0"> The analyzer transforms an English query into a semantically equivalent template representation. The analyzer goes through four steps: word role identification, phrase identification, phrase analysis, and template matching. The template matcher is used to match template fragments to a template and to integrate the fragments into a single query. This approach is similar to the method used by Wilks (1975a) in his preference semantics theory for general natural language processing.</Paragraph>
    <Paragraph position="1"> Once the query is in a fully notated internal representation, the mechanism has established exactly what information the user requires. At that point, the query can be transformed from the standard form into the DML of the host DBMS. To transform the query from English into an internal representation, the analyzer has to identify in the query: 1. the desired information;  2. the required attributes; 3. any implied or assumed information.</Paragraph>
    <Paragraph position="2">  Surface analysis of the query is used to do as much as is possible. From the analyzer, an understanding of the use of most of the words and word groups in the query is derived.</Paragraph>
    <Section position="1" start_page="207" end_page="207" type="sub_section">
      <SectionTitle>
3.1 NON-SERIAL PROCESSING
</SectionTitle>
      <Paragraph position="0"> Substantially all formal language analysis (compilers, etc.) proceeds serially (left to right or right to left). Also, most natural language parsing schemes proceed on a serial basis. This has been particularly true for natural language since Woods (1970) developed the powerful ATN concept. This project is significantly different in that it does not proceed in a serial direction through a query. We resolve the easiest elements first, then use these resolutions to resolve the next element. Resolving an easy element first, wherever it is in the query, often makes the resolution of other query elements easier, regardless of where those elements stand in serial relationship to the element resolved first. For example, if the query we are trying to resolve has the form A B C D E and the easiest element to resolve is C, then C would be resolved first. Resolving C might then make the resolution of B easy, and so on.</Paragraph>
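      <Paragraph position="1"> The easiest-first ordering can be sketched in Python as follows. This is an illustration under stated assumptions, not the mechanism itself: "easiest" is taken to mean "fewest candidate roles", the table of permitted neighbouring roles is hypothetical, words are assumed to be distinct, and picking the alphabetically first role when several remain is only a placeholder for a real decision procedure.

```python
# Hypothetical rules: (left role, right role) pairs that may be neighbours.
ALLOWED_PAIRS = {
    ("VERB", "NOUN"),
    ("NOUN", "PREPOSITION"),
    ("PREPOSITION", "NOUN"),
}


def resolve_easiest_first(words, candidates):
    """Resolve word roles in order of difficulty rather than left to right.

    candidates maps each word to its set of possible roles. The word with
    the fewest remaining candidates is treated as the easiest one.
    Returns the chosen roles and the order in which words were resolved.
    """
    remaining = {w: set(candidates[w]) for w in words}
    roles, order = {}, []
    while remaining:
        # Easiest element: fewest candidates; ties go to the leftmost word.
        word = min(remaining, key=lambda w: (len(remaining[w]), words.index(w)))
        options = remaining.pop(word)
        if not options:
            raise ValueError("no role left for " + word)
        roles[word] = sorted(options)[0]  # placeholder choice among survivors
        order.append(word)
        i = words.index(word)
        # Use the new resolution to prune the unresolved neighbours.
        if i != 0 and words[i - 1] in remaining:
            left = words[i - 1]
            remaining[left] = {r for r in remaining[left]
                               if (r, roles[word]) in ALLOWED_PAIRS}
        if i + 1 != len(words) and words[i + 1] in remaining:
            right = words[i + 1]
            remaining[right] = {r for r in remaining[right]
                                if (roles[word], r) in ALLOWED_PAIRS}
    return roles, order


words = ["LIST", "ORDERS", "FOR", "SMITH"]
candidates = {
    "LIST": {"VERB", "NOUN"},
    "ORDERS": {"NOUN", "VERB"},
    "FOR": {"PREPOSITION"},
    "SMITH": {"NOUN", "ADJECTIVE"},
}
roles, order = resolve_easiest_first(words, candidates)
print(order)
print(roles)
```

      In this example the single-role word FOR is resolved first even though it is third in the query. That resolution leaves ORDERS and SMITH with one candidate each, and resolving ORDERS in turn settles LIST, which corresponds to the A B C D E illustration above.</Paragraph>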
    </Section>
    <Section position="2" start_page="207" end_page="207" type="sub_section">
      <SectionTitle>
3.2 IDENTIFYING THE WORD ROLE
</SectionTitle>
      <Paragraph position="0"> The problem of identifying the role of a word is not a trivial one, since the same word may have a different role in different contexts. Some preliminary work on statistically based identification (Mazlack and Feinauer, 1980) has already been reported. Further to this, an identification mechanism using pattern recognition techniques has been developed. Initial word role labelling is supported by the use of various dictionaries and statistical data.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="207" end_page="207" type="metho">
    <SectionTitle>
3.2.1 DICTIONARIES AND VOCABULARY
</SectionTitle>
    <Paragraph position="0"> Several dictionaries are applied successively. They are (a) core dictionaries describing single-role structural words (prepositions, conjunctions, question words, existence verbs, articles, quantifiers) and functional words (total, average, sum, etc.);</Paragraph>
    <Paragraph position="1">  (b) terms appearing in the logical schema; (c) jargon; (d) a general dictionary.  The dictionaries are used to provide candidate word roles. By applying these dictionaries, the words in a query are labelled. The process is reductive in that candidate word roles are reduced in number as successive dictionaries are applied. A natural language interface which accepts a rich natural language input and reduces it to a constrained output must reduce the variability of the words in the</Paragraph>
  </Section>
  <Section position="9" start_page="207" end_page="207" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> query. How the reduction is achieved is more than a simple table lookup with an attendant vocabulary reduction. A reduction down to a minimal set of words similar to Wilks' (1975b) primitives is not required. Of more interest is the identification of a vocabulary which is not obviously redundant, i.e., one in which no two words cover the same or nearly the same subject area. Vocabulary reduction takes place as part of both the word role identification and phrase recognition processes.</Paragraph>
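    <Paragraph position="1"> The successive application of dictionaries can be sketched in Python. The dictionary contents, the role names, and the combination rule are all assumptions made for illustration: here the first dictionary that knows a word supplies its initial candidate roles, and each later dictionary may only narrow them.

```python
# Hypothetical dictionaries, listed in the order in which they are applied.
DICTIONARIES = [
    ("core", {"OF": {"PREPOSITION"}, "TOTAL": {"FUNCTION"}}),
    ("schema", {"SALARY": {"ATTRIBUTE"}, "COUNT": {"ATTRIBUTE", "FUNCTION"}}),
    ("jargon", {"HEADCOUNT": {"FUNCTION"}}),
    ("general", {
        "OF": {"PREPOSITION", "ADVERB"},
        "COUNT": {"FUNCTION", "VERB"},
        "STAFF": {"ENTITY", "VERB"},
    }),
]


def label_words(words):
    """Return the candidate roles of each word after all dictionaries run.

    Assumed rule: the first dictionary that knows a word supplies its
    initial candidates; each later dictionary may only narrow them (set
    intersection) and is ignored if it would leave no candidate at all.
    """
    labels = {}
    for word in words:
        roles = None
        for _name, entries in DICTIONARIES:
            if word not in entries:
                continue
            if roles is None:
                roles = set(entries[word])
            elif roles.intersection(entries[word]):
                roles = roles.intersection(entries[word])
        labels[word] = roles if roles is not None else {"UNKNOWN"}
    return labels


print(label_words(["COUNT", "OF", "STAFF", "ZEBRA"]))
```

      With these assumed entries, COUNT starts with two candidate roles from the schema dictionary and is reduced to one by the general dictionary, OF stays a preposition, STAFF keeps two candidates for later stages to resolve, and a word found in no dictionary is marked as unknown.</Paragraph>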
  </Section>
</Paper>