<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1908"> <Title>Dialogue based Question Answering System in Telugu</Title> <Section position="3" start_page="53" end_page="53" type="metho"> <SectionTitle> ARISE (Automatic Railway Information System for Europe) </SectionTitle> <Paragraph position="0"> ARISE is a spoken dialogue system that provides train timetable information over the phone. Prototypes have been developed in four languages: Dutch, French, English, and Italian.</Paragraph> <Paragraph position="1"> ARISE uses a mixed-initiative Dialogue Manager (DM). It mixes implicit and explicit confirmation, depending on how confident the system is that an item has been correctly understood.</Paragraph> <Paragraph position="2"> We present this paper as an experiment in designing a keyword-based QA system for a large domain (railways) that answers users' questions in their native language (Telugu). The system generates a SQL query from the natural language question, executes the query over a relational database, and then provides the answer. A Dialogue Manager (DM) is maintained to conduct dialogues with the user and to handle anaphoric and elliptical expressions in the query. The system is implemented on a relatively restricted domain that covers a number of aspects of a railway information system (arrival/departure times, fares between particular stations, trains between important stations, etc.).</Paragraph> <Paragraph position="3"> The precision of the information extraction stage is essential to the success of a QA system, because it places an upper bound on the precision of the entire system.</Paragraph> <Paragraph position="4"> The empirical results obtained with the current system are encouraging. Tested on a set of questions in the railway domain, the QA system achieved 96.34% precision and an 83.96% dialogue success rate.</Paragraph> <Paragraph position="5"> Section 2 describes the system architecture of the QA system. 
Section 3 details the design of the QA system in the railway information domain using the keyword-based approach. The evaluation is presented in Section 4. Section 5 concludes with some directions for future work.</Paragraph> </Section> <Section position="4" start_page="53" end_page="54" type="metho"> <SectionTitle> 2 System Architecture </SectionTitle> <Paragraph position="0"> In this keyword-based approach, the input query statement is analyzed by the query analyzer, which uses a domain ontology stored as a knowledge base to generate tokens and keywords. The appropriate query frame is selected based on the keywords and tokens in the query statement.</Paragraph> <Paragraph position="1"> Each query frame is associated with a SQL generation procedure. The appropriate SQL statement(s) are generated using the tokens retrieved from the input query.</Paragraph> <Paragraph position="2"> The QA system architecture is shown in Figure 1. The Dialogue Manager keeps track of the elliptical queries from the user that constitute the dialogue and assists the SQL generation procedure using the dialogue history (Flycht-Erikson et al., 2000), which contains information about previous tokens and their types as well as other dialogue information, such as the answers retrieved by the current SQL statements and the answers to previous queries in the dialogue. The SQL statements are used to retrieve the correct answer from the database. Based on the result returned by the DBMS, a natural language answer is generated. This answer is forwarded to the DM for onward transmission. If the system cannot decide on the query frame using the keywords extracted from the input query, it enters into a dialogue with the user through the DM. If, during SQL generation, more information is found to be needed from the user to generate the SQL statement, an interactive message is sent to the user through the DM. The user then sends the needed information to the system. 
If the user cannot provide correct information, the DM sends an error message indicating the error in the user query. If the SQL statement returns a null response from the database, the DM sends a cooperative message that depends on the user query.</Paragraph> </Section> <Section position="5" start_page="54" end_page="56" type="metho"> <SectionTitle> 3 Design of Railway Information System </SectionTitle> <Paragraph position="0"> The most important issue in the design of the railway information system is the design of the railway database and the knowledge base. These are detailed in Sections 3.1 and 3.2, respectively. The different components of the dialogue-based QA system, i.e., the Query Analyzer, Query Frame Decision, Dialogue Manager, SQL Generation and Answer Generation subsystems, are described in the subsequent sections.</Paragraph> <Section position="1" start_page="54" end_page="54" type="sub_section"> <SectionTitle> 3.1 Railway Database Management </SectionTitle> <Paragraph position="0"> The system as a whole is engaged in data access. It is a hybrid system with a subsystem that analyzes the natural language query and handles the formal query language SQL, together with a data retrieval and database management system. The database is structured and contains the information needed to provide the railway information service. For example, in a railway information system the database contains information about the arrival/departure times of trains, their fares, their running information, etc. The aim of database management is to describe this information in order to offer the service. For our purposes the relational model has an important advantage: it stresses data independence. This means that the user and front-end programs are effectively isolated from the actual database organization.</Paragraph> <Paragraph position="1"> The main tables used here are a schedule table for each train, fare tables for special trains such as Rajdhani, Shatabdi, etc. 
that have a different fare structure, route tables for each route, and tables that hold train running frequency details, etc.</Paragraph> <Paragraph position="2"> Some temporal tables are maintained in order to check the status of a railway ticket (known as checking the Passenger Name Record, or PNR, status of the ticket) and the reservation availability of a particular train.</Paragraph> </Section> <Section position="2" start_page="54" end_page="54" type="sub_section"> <SectionTitle> 3.2 Design of the Knowledge Base </SectionTitle> <Paragraph position="0"> The system maintains a knowledge base of the domain to facilitate question answering. For a system operating on a restricted domain this is a natural choice, since it greatly improves disambiguation and parsing.</Paragraph> <Paragraph position="1"> The words that occur in a database query for the railway information system include words describing the train name, station name, reservation class, and date and/or period of journey, as well as keywords that specify the topic of the query. Hence we store a domain-dependent ontology in the knowledge base.</Paragraph> <Paragraph position="2"> The knowledge base contains tables for train names and station names, and alias tables for train names and station names. We have stored possible Telugu inflections (ke [to], ku [to], loo [in], tundi [ing], vi [have], etc.; for example, gunturku [to Guntur]), which can be used in the morphological analysis of the input query. We have considered possible postpositions such as nundi [from] and nunchi [from] (for example, newdelhi nundi [from New Delhi]), which can be used to identify the source station in the input query, and route words such as daggara [near], dwara [through], gunda [through], vadda [at], meedugaa [via], etc. 
(for example, gaya meedugaa [via Gaya]), which can be used to identify the route station of the journey. We keep a list of keywords in a table in order to identify the proper query frame.</Paragraph> </Section> <Section position="3" start_page="54" end_page="55" type="sub_section"> <SectionTitle> 3.3 Query Analyzer </SectionTitle> <Paragraph position="0"> During query analysis, morphological analysis of the input query statement is carried out to identify the root words/terms. By analyzing the whole input query, the system identifies several tokens, such as train name, station name, reservation class, date, and period of the day, together with a set of keywords.</Paragraph> <Paragraph position="1"> The query analyzer consults the domain-dependent ontology, i.e., the knowledge base, to recognize these tokens and keywords. Some words/terms may not be found in the knowledge base. Such words carry no semantic information and are simply discarded.</Paragraph> <Paragraph position="2"> EACL 2006 Workshop on Multilingual Question Answering - MLQA06 For example, suppose the input query is eppudu falaknuma express gunturuku veltundi [When the Falaknuma Express goes to Guntur]. The query is split into words on spaces. Each word is then searched for in the knowledge base until it is found. After each word/term has been looked up, its type and semantic information are placed in a list of tokens. Each token has three properties: the token value, its type, and the semantic information that it contains. These tokens and keywords are used to decide the proper query frame.</Paragraph> <Paragraph position="3"> For the above example, the tokens identified are Falaknuma Express as the train name and Guntur as the station name. 
In contrast, eppudu [when] and veltundi [goes] fall under the keyword list.</Paragraph> </Section> <Section position="4" start_page="55" end_page="56" type="sub_section"> <SectionTitle> 3.4 Query Frame Decision </SectionTitle> <Paragraph position="0"> During the analysis of the query, the keywords in the input query are detected. In this step, based on the tokens and keywords, we identify the appropriate query frame.</Paragraph> <Paragraph position="1"> By restricting the query domain and the information resource, the scope of the user request can be focused. That is, there is a finite number of expected question topics. Each expected question topic is defined under a single query frame.</Paragraph> <Paragraph position="2"> Some query frame examples for the railway information system are fare of a journey [Fare], arrival [Arr_Time] or departure time [Dep_Time] of a train, trains between important stations [Trains_Imp_Stations], scheduled time [Sched_Time], weekly frequency of a train</Paragraph> <Paragraph position="4"> of reservation class in a particular train [Reservation_Availability] and PNR enquiry [PNR_Enquiry].</Paragraph> <Paragraph position="5"> It is important to select the appropriate query frame for the user request, because ambiguity can occur in some cases: a single natural language query statement may match more than one query frame when the same keywords are used to identify several query frames.</Paragraph> <Paragraph position="6"> For example, keywords such as vellu [go], vachhu [come], cheru [reach], and bayaluderu [start] are used to identify the query frames [Arr_Time], [Dep_Time], and [Trains_Imp_Stations]. To resolve this ambiguity, we consider the question type: what/which (questions containing words such as ee [what], eeee [what], evi [which], etc.) 
type questions such as newdelhi nundi howrahku bayaluderu raillu evi [What are the trains that start from New Delhi to Howrah] fall under the [Trains_Imp_Stations] query frame. In contrast, when-type questions (questions containing words such as eppudu [when], ennigantalaku [at what time], ennintiki [at what time], etc.), for example ennigantalaku kolkata rajadhani express bayaluderutundi [When Kolkata Rajdhani Express starts], fall under the [Dep_Time] query frame. Similarly, weekday names such as somavaaramu [Monday] and mangalavaaramu [Tuesday], together with the keywords used for the [Arr_Time]/[Dep_Time] query frames, are used to identify the [Arr_Frequency]/[Dep_Frequency] query frames.</Paragraph> <Paragraph position="7"> In general, separate keywords are used to identify the [Arr_Time] and [Dep_Time] query frames. However, keywords such as potundi [go] and vellu [go] are used to identify both the [Arr_Time] and [Dep_Time] query frames. To resolve this ambiguity, we consider the station type, i.e., whether the station is the source or the destination. If the station is the source station (the station name is followed by a postposition such as nundi [from] or nunchi [from]), we conclude that the query falls under the [Dep_Time] query frame.</Paragraph> <Paragraph position="8"> Otherwise the query falls under the [Arr_Time] query frame. For example, a question like eppudu falaknuma express gunturuku veltundi [When the Falaknuma Express goes to Guntur] falls under the [Arr_Time] query frame. 
In contrast, a question like eppudu falaknuma express gunturu nundi veltundi [When the Falaknuma Express goes from Guntur] falls under the [Dep_Time] query frame.</Paragraph> <Paragraph position="9"> The query frame selection process has a great influence on the precision of the system, whereas errors are much less likely in the other processes, such as getting information from the dialogue history, generating SQL statement(s) from the selected query frame, retrieving the answer from the database, and generating the natural language answer from the retrieved result.</Paragraph> </Section> <Section position="5" start_page="56" end_page="56" type="sub_section"> <SectionTitle> 3.5 Dialogue Manager </SectionTitle> <Paragraph position="0"> The role of the Dialogue Manager (DM) differs slightly between dialogue systems. However, the primary responsibility of the DM is to control the flow of the dialogue by deciding how the system should respond to a user request and by coordinating the other components of the system. If some information is missing or a request is ambiguous, clarification questions are formulated by the DM and posed to the user.</Paragraph> <Paragraph position="1"> For example, users often ask questions about arrival/departure time without specifying the direction of the train's journey, i.e., upward or downward; the system then asks the user for the missing information.</Paragraph> <Paragraph position="2"> Sometimes the user may not give correct information (for example, the train name or station name is missing, or the query does not belong to any of the query frames). In that case the DM generates an error message describing the missing information. In another case, the user asks a question without the necessary knowledge. 
In this case the DM generates a cooperative message, which helps the user in further requests.</Paragraph> <Paragraph position="3"> As a basis for the above tasks, the DM uses the dialogue history. The dialogue history records the focal information, i.e., what has been talked about in the past and what is being talked about at present.</Paragraph> <Paragraph position="4"> It is used for dialogue control and for the disambiguation of context-dependent requests. The DM gets a semantic frame from the other system components. This frame is filled by interpreting the request in the context of the ongoing dialogue, domain knowledge, and dialogue history. The DM then prompts for missing information or sends a SQL query. Before the query is sent off, the DM checks whether it contains new information or information that contradicts what was given before. If so, the DM can keep the original information, replace it with the new information in the dialogue history, or engage in a confirmation subdialogue. The DM looks at the query after language processing has been completed (but before the formal query is issued), as well as after the result has been obtained from the formal query. The accuracy of the system depends mainly on the representation of the dialogue history and on how the DM responds to the user's dialogue.</Paragraph> </Section> <Section position="6" start_page="56" end_page="56" type="sub_section"> <SectionTitle> 3.6 SQL Generation </SectionTitle> <Paragraph position="0"> Once the query frame is selected for a question, the corresponding procedure for SQL query generation is called. For each query frame there is a procedure for generating the SQL statement(s). To generate the SQL query, the procedure needs the tokens produced by the query analyzer.</Paragraph> <Paragraph position="1"> If the tokens are present in the current query, it uses them. Otherwise it gets the token information from the dialogue history. 
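The fallback from the current query to the dialogue history can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the system: the function name resolve_tokens, the slot names train_name and station_name, and the representation of the dialogue history as a list of dictionaries (one per earlier turn) are all hypothetical.

```python
# Hypothetical sketch: every name and data layout here is assumed for illustration.
# Slots that each query frame needs before SQL can be generated (assumed names).
REQUIRED_TOKENS = {"Arr_Time": ("train_name", "station_name")}


def resolve_tokens(frame, current_tokens, dialogue_history):
    """Fill the slots of a frame, preferring tokens from the current query."""
    resolved, missing = {}, []
    for slot in REQUIRED_TOKENS[frame]:
        if slot in current_tokens:
            # The token is present in the current query, so use it directly.
            resolved[slot] = current_tokens[slot]
            continue
        # Otherwise search the dialogue history, most recent turn first.
        for past_turn in reversed(dialogue_history):
            if slot in past_turn:
                resolved[slot] = past_turn[slot]
                break
        else:
            # Neither source supplies the slot.
            missing.append(slot)
    return resolved, missing


history = [{"train_name": "Falaknuma Express", "station_name": "Guntur"}]
print(resolve_tokens("Arr_Time", {"station_name": "Howrah"}, history))
```

In this sketch, any slot that neither source can fill is returned in the missing list; in a system of this kind that would presumably be the point at which the DM prompts the user for the missing information. 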
For example, in arrival time queries the user has to specify the train name/number and the name of the station/city where he/she needs to go. If the user does not provide that information, the SQL generation procedure gets it from the dialogue history.</Paragraph> <Paragraph position="2"> For a fare-related query, the SQL generation procedure is called depending on the type of train. The procedure assumes that the user will provide the train name and reservation class. If the train is of the Express type, it assumes that the user may provide either the source and destination stations of the journey or the distance of the journey. If it is of the Rajdhani type, it assumes that the user may provide the source and destination stations of the journey. Similarly, for the other query frames, the SQL generation procedure assumes that the user provides the necessary information.</Paragraph> <Paragraph position="3"> Query: eppudu falaknuma express gunturuku veltundi [When the Falaknuma Express goes to Guntur]? Train name: Falaknuma Express. Station name: Guntur. Keywords: eppudu [when], veltundi [goes].</Paragraph> <Paragraph position="4"> The [Arr_Time] query frame is selected.</Paragraph> <Paragraph position="5"> The system checks with the user, via the DM, for the up/down journey of the train. Suppose the user asks about the upward journey of the train.</Paragraph> </Section> </Section> </Paper>