<?xml version="1.0" standalone="yes"?>
<Paper uid="E85-1030">
  <Title>A NATURAL LANGUAGE INTERFACE USING A WORLD MODEL</Title>
  <Section position="3" start_page="0" end_page="206" type="metho">
    <SectionTitle>
WORLD MODEL
</SectionTitle>
    <Paragraph position="0"> The world model represents the user's image of the application domain. This image does not match the database schema, because the schema reflects the storage structure of the data and the performance considerations of the database system. The world model represents the user's image as classes and the relationships between them.</Paragraph>
    <Paragraph position="1"> A class is represented as an object in the object-oriented programming sense (Bobrow, 81) and describes a thing or event in the domain. There are only two types of relationship: the attribute relationship and the super-sub relationship. Because the model matches the user's image and is very simple, it is easy to design and edit.</Paragraph>
    <Paragraph position="2"> Figure 1 shows part of the world model for a sales domain. The commodity class has two attribute classes, the commodity's name and its fixed price. The beer and whisky classes are subclasses of the commodity class and inherit its attributes. Figure 2 shows part of the definition of the sales class. The internal representation of a class object is a frame expression. A slot represents a relationship to another class using a $class facet, and mapping information to the database schema using a $storage facet. The value of a $storage facet denotes the name of the class that holds the mapping information. The sales class has four attribute classes: RETAILER, COMMODITY, SALES PRICE, and SALES QUANTITY. An object may also include methods for handling the data within it.</Paragraph>
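    <Paragraph> The frame representation described above can be sketched roughly as follows. This is a minimal illustration in Python, not KID's internal representation: the names Slot, Frame, and all_slots are assumptions, and the $storage value SALES_MAP is hypothetical, since the source does not list the actual mapping classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Slot:
    """One attribute relationship of a class frame."""
    class_facet: str                      # $class: name of the attribute class
    storage_facet: Optional[str] = None   # $storage: class holding mapping information


@dataclass
class Frame:
    """A world-model class; `parent` is the super class, if any."""
    name: str
    parent: Optional["Frame"] = None
    slots: Dict[str, Slot] = field(default_factory=dict)

    def all_slots(self) -> Dict[str, Slot]:
        """Own slots plus those inherited through super-sub relationships."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}


commodity = Frame("COMMODITY", slots={
    "name": Slot("COMMODITY NAME"),
    "fixed_price": Slot("FIXED PRICE"),
})
beer = Frame("BEER", parent=commodity)

# Hypothetical storage value: the source does not name the mapping classes.
sales = Frame("SALES", slots={
    "retailer": Slot("RETAILER", storage_facet="SALES_MAP"),
    "commodity": Slot("COMMODITY", storage_facet="SALES_MAP"),
    "sales_price": Slot("SALES PRICE", storage_facet="SALES_MAP"),
    "sales_quantity": Slot("SALES QUANTITY", storage_facet="SALES_MAP"),
})

# The beer class defines no slots of its own but inherits both commodity attributes.
print(sorted(beer.all_slots()))
```

Resolving inherited slots at lookup time, rather than copying them into each subclass, keeps a change to the commodity class visible to beer and whisky without further editing.</Paragraph>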
    <Paragraph position="3"> The system allows the user to define lexical information in the world model. For example, the noun 'commodity' corresponds to the commodity class. The verb 'sell' and the noun 'sale' both correspond to the sales class. The verb 'locate' corresponds to the arc between the relation and location classes. Lexical information is physically stored in the word dictionary, which is represented as a table in the relational database system. Figure 3 shows part of the dictionary. Each entry consists of a headword, an identifier, a part of speech, parsing information, and other fields. The correspondence to the world model is represented in the OBJECT feature of the PARSE field. A verb entry also holds its case frame information in the PARSE field. All the information relating to a specific domain is stored in the world model, so the user need only create the world model to customize KID to a specific application. This makes the system transportable.</Paragraph>
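    <Paragraph> As a rough illustration of a dictionary held in a relational table, the sketch below uses Python's sqlite3 module. The column names, the sample rows, and the 'OBJECT=...' notation for the PARSE field are all assumptions made for the example; the source does not specify the actual table layout or field encoding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dictionary ("
    " headword TEXT, identifier INTEGER, part_of_speech TEXT, parse TEXT)"
)
# Hypothetical rows; the PARSE encoding 'OBJECT=...' is an assumed notation.
conn.executemany(
    "INSERT INTO dictionary VALUES (?, ?, ?, ?)",
    [
        ("commodity", 1, "noun", "OBJECT=COMMODITY"),
        ("sell", 2, "verb", "OBJECT=SALES"),
        ("sale", 3, "noun", "OBJECT=SALES"),
    ],
)


def classes_for(headword: str) -> list:
    """World-model classes that the dictionary associates with a headword."""
    rows = conn.execute(
        "SELECT parse FROM dictionary WHERE headword = ?", (headword,)
    ).fetchall()
    return [parse.split("=", 1)[1] for (parse,) in rows]


# Both the verb and the noun resolve to the same world-model class.
print(classes_for("sell"), classes_for("sale"))
```
</Paragraph>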
  </Section>
  <Section position="4" start_page="206" end_page="206" type="metho">
    <SectionTitle>
SYSTEM CONFIGURATION
</SectionTitle>
    <Paragraph position="0"> KID is a front end to the database management system; its configuration is shown in Figure 4. The user enters a query at a Japanese word-processing terminal. Because a Japanese-language sentence is not separated into words, the morphological analyzer uses the word dictionary to segment the sentence into a list of words.</Paragraph>
    <Paragraph position="1"> The model-based parser analyzes the word list and interprets it semantically, using the world model as a basis. The result is the "meaning structure", which consists of the parse tree and the relevant part of the world model and represents the meaning of the input query. The retriever generates a Japanese-language paraphrase from the meaning structure and outputs it to the user terminal for confirmation. The retriever then translates the meaning structure into the query language of the target database management system and executes it.</Paragraph>
    <Paragraph position="2"> The result is displayed on the user terminal. The world model is managed by the modeling system, REALM (REAL world Modeling system), and is edited by the world model editor.</Paragraph>
  </Section>
  <Section position="5" start_page="206" end_page="207" type="metho">
    <SectionTitle>
MORPHOLOGICAL ANALYZER
</SectionTitle>
    <Paragraph position="0"> A Japanese-language sentence is not separated into words, so the system must segment a sentence into its component words. The morphological analyzer performs this segmentation. KID selects the segmentation candidate with the fewest 'bunsetsu'. We believe this to be the best method for segmenting a Japanese-language sentence (Yoshimura, 83). The method normally uses a breadth-first search of a candidate word graph. Because many candidate words are generated, segmentation performs poorly. We therefore use the optimal graph search algorithm A* (Nilsson, 80) to search the candidate word graph.</Paragraph>
    <Paragraph position="1"> Figure 5 shows an example of morphological analysis. This sentence has three possible segmentations. The first line is the correct segmentation, which has the fewest 'bunsetsu'. At each node of the candidate word graph, the A* algorithm estimates the number of 'bunsetsu' in the whole sentence and uses this estimate to select the next search path. This eliminates useless searching of the candidate graph. In Figure 5, the circled numbers denote the sequence of the graph search.</Paragraph>
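    <Paragraph> The search described above can be sketched as follows. This is a minimal A* illustration under stated assumptions, not KID's analyzer: the romanized toy dictionary is hypothetical, each content word is assumed to open a new 'bunsetsu' while a function word attaches to the preceding one at no cost, and the estimate function is a simple lower bound chosen for the example (the source does not give the estimate it uses).

```python
import heapq
from typing import Dict, List, Optional

CONTENT, FUNCTION = "content", "function"

# Toy romanized dictionary (hypothetical entries): surface form to word category.
DICTIONARY: Dict[str, str] = {
    "kyoto": CONTENT, "kyo": CONTENT, "nomi": CONTENT,
    "se": CONTENT, "mise": CONTENT, "no": FUNCTION,
}


def words_at(text: str, pos: int) -> List[str]:
    """Dictionary words that match the text starting at `pos`."""
    return [word for word in DICTIONARY if text.startswith(word, pos)]


def estimate(text: str, pos: int) -> int:
    """Lower bound on the bunsetsu still needed from `pos` to the end."""
    if pos == len(text):
        return 0
    if any(DICTIONARY[word] == FUNCTION for word in words_at(text, pos)):
        return 0
    return 1  # the next word must be a content word, which opens a bunsetsu


def segment(text: str) -> Optional[List[List[str]]]:
    """A* search for a segmentation with the fewest bunsetsu, or None."""
    # Queue entries: (cost + estimate, cost so far, position, bunsetsu so far)
    queue = [(estimate(text, 0), 0, 0, [])]
    visited = set()
    while queue:
        _, cost, pos, bunsetsu = heapq.heappop(queue)
        if pos == len(text):
            return bunsetsu
        if pos in visited:
            continue
        visited.add(pos)
        for word in words_at(text, pos):
            if DICTIONARY[word] == CONTENT:
                new_cost = cost + 1
                new_bunsetsu = bunsetsu + [[word]]
            elif bunsetsu:
                # A function word joins the preceding bunsetsu at no cost.
                new_cost = cost
                new_bunsetsu = bunsetsu[:-1] + [bunsetsu[-1] + [word]]
            else:
                continue  # no bunsetsu is open yet for a function word to join
            new_pos = pos + len(word)
            priority = new_cost + estimate(text, new_pos)
            heapq.heappush(queue, (priority, new_cost, new_pos, new_bunsetsu))
    return None


print(segment("kyotonomise"))
```

For the toy input, the two-'bunsetsu' reading (kyoto + no, mise) is returned ahead of the three-'bunsetsu' reading (kyoto, nomi, se), and the dead-end branch that starts with 'kyo' is abandoned as soon as no dictionary word matches.</Paragraph>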
    <Paragraph position="2"> The morphological analyzer segments a sentence using connection information for each word. The connection information depends on the part of speech. Detailed classification of words leads to correct segmentation. However, it is difficult for an end-user to perform this kind of classification. Thus, we classify words into two categories: content words and function words.</Paragraph>
    <Paragraph position="3"> Content words are nouns, verbs, adjectives, and adverbs, which depend on the application. They are classified roughly. Function words include auxiliaries, conjunctions, and so on, which are independent of the domain. They are classified in detail. It is easy for the user to roughly classify content words. This morphological analyzer segments sentences precisely and efficiently, and generates a word list. This word list is then passed to the model-based parser.</Paragraph>
  </Section>
  <Section position="6" start_page="207" end_page="209" type="metho">
    <SectionTitle>
MODEL BASED PARSER
</SectionTitle>
    <Paragraph position="0"> In its first phase the parser generates 'bunsetsu' from the word list. The parser syntactically analyzes the relationship between these 'bunsetsu'. At the same time, the parser semantically checks and interprets the relationships, based on the world model.</Paragraph>
    <Paragraph position="1"> The sequence of 'bunsetsu' in a Japanese-language sentence is relatively arbitrary. Conversational sentences may also include errors and ellipses, so the parser must be robust enough to deal with these ill-formed sentences.</Paragraph>
    <Paragraph position="2"> These factors suggest that semantic interpretation should play an important role in the parser.</Paragraph>
    <Paragraph position="3"> The basic rules of semantic interpretation are the identification rule and the connection rule.</Paragraph>
    <Paragraph position="4"> These rules check the relationship between the classes which correspond to the 'bunsetsu' and interpret the meaning of the unified 'bunsetsu'.</Paragraph>
    <Paragraph position="5"> The identification rule corresponds to a super-sub relationship.</Paragraph>
    <Paragraph position="7"> If two classes, corresponding to two phrases, are connected by a super-sub relationship, this rule selects the subclass as the meaning of the unified phrase, because the subclass has a more specific and restricted meaning than the super class. Figure 6 shows an example of the identification rule. In this example, the phrase 'sales price' corresponds to the sales price class, and '2000 yen' corresponds to the price class. The identification rule selects the sales price class as the unified meaning. The connection rule corresponds to an attribute relationship. If two classes are connected by an attribute relationship, this rule selects the root class of the relation as the meaning of the unified phrase, because the root class clarifies the semantic position of the leaf class in the world model. Figure 7 shows an example of the connection rule. In this example, the phrase 'retailer' corresponds to the retailer class, and 'name' corresponds to the name class.</Paragraph>
    <Paragraph position="8"> The connection rule selects the retailer class as the unified meaning.</Paragraph>
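    <Paragraph> The two rules can be sketched as a pair of small functions. This is an illustrative Python sketch under assumptions: the SUPER_OF and ATTRIBUTES tables are a hypothetical fragment of a sales-domain world model, phrases are reduced to bare class names, and the function names identify and connect are not taken from the source.

```python
from typing import Dict, Optional, Set

# Hypothetical fragment of a world model for the sales domain.
SUPER_OF: Dict[str, str] = {"SALES PRICE": "PRICE", "BEER": "COMMODITY"}
ATTRIBUTES: Dict[str, Set[str]] = {
    "RETAILER": {"NAME", "LOCATION"},
    "SALES": {"RETAILER", "COMMODITY", "SALES PRICE", "SALES QUANTITY"},
}


def is_subclass(sub: str, sup: str) -> bool:
    """True if `sub` reaches `sup` by following super-sub links upward."""
    current = SUPER_OF.get(sub)
    while current is not None:
        if current == sup:
            return True
        current = SUPER_OF.get(current)
    return False


def identify(a: str, b: str) -> Optional[str]:
    """Identification rule: the subclass is the meaning of the unified phrase."""
    if is_subclass(a, b):
        return a
    if is_subclass(b, a):
        return b
    return None


def connect(a: str, b: str) -> Optional[str]:
    """Connection rule: the root class of the attribute relationship is the meaning."""
    if b in ATTRIBUTES.get(a, set()):
        return a
    if a in ATTRIBUTES.get(b, set()):
        return b
    return None


print(identify("SALES PRICE", "PRICE"))   # the Figure 6 style case
print(connect("RETAILER", "NAME"))        # the Figure 7 style case
print(connect("LOCATION", "COMMODITY"))   # no attribute relationship
```

Both functions return None when the rule does not apply, which lets a caller try the next rule without special-casing failure.</Paragraph>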
    <Paragraph position="9">  model-based parser. In each process, input sentences are scanned from left to right. In the first phase, 'bunsetsu' are generated from the word list. At the same time the parser attaches the object which is instanciated from the corresponding class to each 'bunsetsu' The following identification and connection phases perform semantic interpretation using these instance objects, and determines the relationship between phrases. The identification process and connection process are executed repeatedly until all the relationship between phrases have been determined. The identification process has priority over the connection process, because a super-sub relationship represents a same concept generalization hierarchy and has stronger connectivity than an attribute relationship, the latter representing a property relation between different concepts. This parsing mechanism is very simple, allowing the user to expand each process easily. Each process consists of a number of production rules, which are grouped into packets according to the relevant syntactic patterns. Each packet has an execution priority according to the syntactic connectivity of each pattern. Thus the identification or addition of the rules are localized in the packet concerned with the modification. This simple parsing mechanism and the modular construction of the parsing rules contribute to the expandability of the parser.</Paragraph>
    <Paragraph position="10"> Figures 9 and 10 show an example of parsing.</Paragraph>
    <Paragraph position="11"> This query means 'What is the name of the retailer in Geneva who sells commodity A?'. The morphological analyzer segments the sentence, and the model-based parser generates the phrases in the parentheses. The identification process is not applied to these phrases, because there is no super-sub relationship between them. Next, the model-based parser applies the connection process. The phrase 'Geneva' can modify the phrase 'commodity A' syntactically, but not semantically, because the corresponding classes, "Location" and "Commodity", do not have an attribute relationship. The phrase 'commodity A' can modify the phrase 'to sell' both semantically and syntactically, because the classes "Commodity" and "Sales" have an attribute relationship. In this case, the predicate connection rule is applied, generating the unified phrase, node 1. The parser uses three kinds of objects to check connectivity. The syntactic object S represents the syntactic center of the unified phrase. In Japanese, the last phrase of the unified phrase is syntactically dominant. The conceptual object C represents the semantic center of the unified phrase, and is determined by the identification and connection rules. The meaning objects M represent the meaning of the unified phrase using the sub-network of the world model.</Paragraph>
    <Paragraph position="12"> The predicate connection rule determines the sales class to be the conceptual object of node 1, because the sales class is the root class of the attribute relationship. The meaning objects are Sales --&gt; Commodity --&gt; Commodity name. The predicate connection rule also generates noun phrase node 2, and the S, C, and M of that node are determined as described in Figure 9. Next, the noun phrase connection rule is applied. This rule applies to a syntactic pattern such as a noun phrase with the postposition 'no' followed by a noun phrase with any kind of postposition. The phrase 'Geneva' and the unified phrase 2 are unified to node 3 by the noun phrase connection rule (see Figure 10). This rule also generates node 4. The meaning of this sentence is that of node 4.</Paragraph>
    <Paragraph position="13"> Errors or ellipses of postpositions, such as 'no' or 'ga', are handled by packets which deal with the syntactic pattern. On the other hand, ellipses are handled by the special packets which deal with non-unified phrases based on the world model. These special packets have a lower priority than the standard packets. Different levels of robustness can be achieved by using the suitable packet for dealing with errors or ellipses.</Paragraph>
  </Section>
  <Section position="7" start_page="209" end_page="209" type="metho">
    <SectionTitle>
CUSTOMIZATION
</SectionTitle>
    <Paragraph position="0"> To customize the KID system to a specific application domain, the user has to perform several domain-dependent tasks. First, the user makes a class network for the domain either from queries, which we call a top-down approach, or from the database schema, a bottom-up approach.</Paragraph>
    <Paragraph position="1"> Then, the user assigns words to the classes or attributes of the class network. Lastly, the user describes mapping information between classes and the database schema within the classes.</Paragraph>
    <Paragraph position="2"> The world model editor supports these customization processes. It has three levels of user interface, in order to assist various users in editing the world model (see Figure 11). The first level is a construction description level, in which the user builds the structure of a class network. The second level is a word assignment level, in which the user assigns words to classes or attributes.</Paragraph>
    <Paragraph position="3"> These two levels are provided for end-users. The third level is a class- or word-contents description level. This level is provided for more sophisticated users, who understand the internal knowledge representation. The world model editor enables users to navigate any of the interface levels. Various users can edit the knowledge, according to their own particular view. Thus, knowledge base editing is made easier.</Paragraph>
  </Section>
</Paper>