<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1020"> <Title>STOCHASTIC REPRESENTATION OF CONCEPTUAL STRUCTURE IN THE ATIS TASK</Title> <Section position="4" start_page="0" end_page="121" type="metho"> <SectionTitle> MAP DECODING OF CASES </SectionTitle> <Paragraph position="0"> Let us denote by $A = a_1, a_2, \ldots, a_T$</Paragraph> <Paragraph position="2"> the sequence of acoustic observations extracted from a spoken sentence, by $W = w_1, w_2, \ldots, w_N$</Paragraph> <Paragraph position="4"> the sequence of words constituting the sentence, and by $C = c_1, c_2, \ldots, c_N$</Paragraph> <Paragraph position="6"> the sequence of case labels, where $c_i$ takes its values from a pre-defined set of conceptual relations $\mathcal{C} = \{C_1, C_2, \ldots, C_K\}$. The problem of finding $W$ and $C$ given $A$ can be approached using maximum a posteriori (MAP) decoding. Following this criterion, we want to find the sequence of words $\hat{W}$ and the sequence of cases $\hat{C}$ that maximize the conditional probability $P(\hat{W}, \hat{C} \mid A) = \max_{W \times C} P(W, C \mid A)$. (6) This conditional probability can be written using the Bayes inversion formula as: $P(W, C \mid A) = \frac{P(A \mid W, C)\, P(W \mid C)\, P(C)}{P(A)}$ (7)</Paragraph> <Paragraph position="8"> In this formula $P(C)$ represents the a priori probability of the sequence of cases, $P(W \mid C)$ is the probability of a sentence expressing a given sequence of cases, and $P(A \mid W, C)$ is the acoustic model. We can reasonably assume that the acoustic representation of a word is independent of the conceptual relation it belongs to, hence: $P(A \mid W, C) = P(A \mid W)$ (8)</Paragraph> <Paragraph position="10"> and this is the criterion that is usually maximized in stochastic speech recognizers, for instance those using hidden Markov modeling [1] for the acoustic/phonetic decoding. In this paper we deal with the remaining terms $P(W \mid C)\, P(C)$. (9)</Paragraph> <Paragraph position="12"> We proceed by assuming that: $P(w_i \mid w_{i-1}, \ldots, w_1, C) = P(w_i \mid w_{i-1}, \ldots, w_{i-n}, c_i)$ (10) and $P(c_i \mid c_{i-1}, \ldots, c_1) = P(c_i \mid c_{i-1}, \ldots, c_{i-m})$. (11)</Paragraph> <Paragraph position="14"> These are Markov processes of order n and m respectively, and if n and m are large we do not lose any generality by making this assumption. 
For practical purposes, n and m should be small enough to allow a reliable estimation of the probabilities from a finite set of data. An additional assumption in equation (10) is that a given word in the sentence, used for expressing a certain case, is independent of the cases of the preceding words. Assuming that the sequence of words can be directly observed (for instance, by providing a transcription of the uttered sentence) and that the sequence of cases is unknown, equations (10) and (11) describe a hidden Markov process, where the states of the underlying model correspond to the cases, the observation probabilities of each state are represented by equation (10) in the form of state-local (n + 1)-gram language models, and the transitions between the states are described by equation (11).</Paragraph> </Section> <Section position="5" start_page="121" end_page="121" type="metho"> <SectionTitle> THE FROM-TO TASK </SectionTitle> <Paragraph position="0"> A first evaluation of the model was performed on a set of 825 sentences artificially generated by a finite state grammar [3] using a vocabulary of 41 different words. The sentences express different ways of making requests to travel between two cities. A typical example is: "I want to travel into Boston and I am interested in flights between Boston and Washington". The task consisted of identifying the origin and destination cities of the flight. The relevant cases of this task are therefore flight origin and flight destination. However, the model has three states: ORIGIN, DESTINATION and DUMMY. Fifty sentences, randomly selected out of the 825, were used to estimate the parameters of the model, i.e. the transition probabilities (equation 11) and the state-local language models (equation 10), with n = 1 and m = 1 (i.e. the underlying Markov process was a first-order process and the state-local language models were bigrams). 
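
As a rough illustration of the estimation step just described, the following Python sketch counts case transitions (equation 11 with m = 1) and state-local bigrams (equation 10 with n = 1) in hand-labeled sentences and turns the counts into relative frequencies. The toy sentence, its labels, the function name estimate_parameters and the START marker are assumptions made for this example and are not taken from the original system; relative-frequency counting without smoothing is likewise an assumption about how such parameters could be obtained.

```python
from collections import Counter, defaultdict

START = "START"  # hypothetical marker for the beginning of a sentence


def estimate_parameters(labeled_sentences):
    """Relative-frequency estimates from sentences given as lists of (word, case) pairs.

    Returns (transitions, bigrams), where
      transitions[prev_case][case]    approximates P(c_i | c_{i-1})
      bigrams[case][prev_word][word]  approximates P(w_i | w_{i-1}, c_i)
    """
    trans_counts = defaultdict(Counter)
    bigram_counts = defaultdict(lambda: defaultdict(Counter))
    for sentence in labeled_sentences:
        prev_case, prev_word = START, START
        for word, case in sentence:
            trans_counts[prev_case][case] += 1
            # The word is conditioned on the previous word and on its own case only.
            bigram_counts[case][prev_word][word] += 1
            prev_case, prev_word = case, word

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    transitions = {prev: normalize(c) for prev, c in trans_counts.items()}
    bigrams = {
        case: {prev: normalize(c) for prev, c in by_prev.items()}
        for case, by_prev in bigram_counts.items()
    }
    return transitions, bigrams


# Toy hand-labeled sentence, invented for this example.
training = [
    [("flights", "DUMMY"), ("from", "ORIGIN"), ("Boston", "ORIGIN"),
     ("to", "DESTINATION"), ("Washington", "DESTINATION")],
]
transitions, bigrams = estimate_parameters(training)
print(transitions["ORIGIN"])
```

With only a handful of training sentences most events are never observed, so a practical estimator would need some form of smoothing or flooring; that choice is left open here.
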
The training sentences were hand-labeled with the appropriate cases.</Paragraph> <Paragraph position="1"> The remaining 775 sentences were decoded using the Viterbi decoding algorithm. The performance was assessed by counting the number of sentences whose segmentation assigned the correct words (i.e. the correct city names) to the DESTINATION and ORIGIN states. We observed that 7% of the sentences (55 out of 775) had a wrong origin/destination assignment. In some of the wrong segmentations one of the relevant states was missing, with the other state containing both the real destination and origin cities. In other cases, similar to the sentence shown above, both the destination and the origin states were assigned to the same city name, which appeared twice in the sentence. To improve the performance, we imposed some additional constraints on the decoding procedure. For a given sentence, the decoded state sequence was searched among those sequences of states in which both the origin and destination states were visited only once (i.e. once one of those states was left, the current partial path was not allowed to enter that state again). In addition, the phrases assigned to the origin and destination states had to include different city names.</Paragraph> <Paragraph position="2"> These constraints, representing higher-level a priori knowledge of the task, were imposed in the Viterbi decoding by keeping track of the past sequence of states for each partial candidate solution, and by duplicating the partial solutions when two (or more) candidates merged at the same state and showed conflicting constraints. This approach resulted in a substantial improvement in performance: only one error was observed out of the 775 test sentences (0.13% error rate). 
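
The basic decoding step can be sketched in Python as follows. This is a plain, unconstrained Viterbi search over case states with state-local bigram observation probabilities; the additional constraints described above (visiting ORIGIN and DESTINATION only once, and requiring different city names) are not implemented. The probability tables are toy values invented for the example and are not normalized distributions, and the FLOOR value used for unseen events, the function name viterbi and the table layout are assumptions.

```python
import math

FLOOR = 1e-6  # assumed floor probability for unseen events (not from the source)


def viterbi(words, states, initial, transitions, emissions):
    """Return the most likely case sequence for an observed word sequence.

    initial[c]            approximates P(c_1 = c)
    transitions[p][c]     approximates P(c_i = c | c_{i-1} = p)
    emissions[c][(v, w)]  approximates P(w_i = w | w_{i-1} = v, c_i = c)
    """
    def logp(table, key):
        return math.log(table.get(key, FLOOR))

    # scores[c] is the best log probability of any partial path ending in state c.
    scores = {c: logp(initial, c) + logp(emissions[c], (None, words[0]))
              for c in states}
    backpointers = []
    for prev_word, word in zip(words, words[1:]):
        new_scores, pointers = {}, {}
        for c in states:
            best_prev = max(states, key=lambda p: scores[p] + logp(transitions[p], c))
            new_scores[c] = (scores[best_prev] + logp(transitions[best_prev], c)
                             + logp(emissions[c], (prev_word, word)))
            pointers[c] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda c: scores[c])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


# Toy model, invented for this example.
states = ["DUMMY", "ORIGIN", "DESTINATION"]
initial = {"DUMMY": 0.8, "ORIGIN": 0.1, "DESTINATION": 0.1}
transitions = {
    "DUMMY": {"DUMMY": 0.5, "ORIGIN": 0.4, "DESTINATION": 0.1},
    "ORIGIN": {"ORIGIN": 0.5, "DESTINATION": 0.4, "DUMMY": 0.1},
    "DESTINATION": {"DESTINATION": 0.6, "DUMMY": 0.3, "ORIGIN": 0.1},
}
emissions = {
    "DUMMY": {(None, "flights"): 0.9},
    "ORIGIN": {("flights", "from"): 0.5, ("from", "Boston"): 0.5},
    "DESTINATION": {("Boston", "to"): 0.5, ("to", "Washington"): 0.5},
}
path = viterbi(["flights", "from", "Boston", "to", "Washington"],
               states, initial, transitions, emissions)
print(path)
```

For these toy values the decoded sequence is DUMMY, ORIGIN, ORIGIN, DESTINATION, DESTINATION. Enforcing the once-only constraint would require each partial path to carry its own history of visited states, as the text describes, which this sketch deliberately leaves out.
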
The same level of performance was obtained in experiments using a 1-gram language model inside each state, but increasing the number of states to five: ORIGIN, DESTINATION, DUMMY, FROM, TO.</Paragraph> <Paragraph position="3"> The last two states accounted for the expressions that usually precede the origin and destination city names, respectively. For example, the FROM state was associated with expressions such as from, depart out of, leaving, etc., and the TO state was associated with expressions such as to, going to, arriving into, etc. This experiment indicates that there is a tradeoff between the number of states and the complexity (order) of the state language models. Expanding the set of states to reflect the linguistic structure of the sentences may result in a reduction of the number of parameters to be estimated during training, giving a more robust model.</Paragraph> </Section> <Section position="6" start_page="121" end_page="123" type="metho"> <SectionTitle> THE ATIS TASK </SectionTitle> <Paragraph position="0"> The technique of case decoding is being applied to the class A sentences of the DARPA ATIS task. A sentence of this task can be analyzed in terms of 7 general cases: QUERY, generally associated with the phrases expressing the kind of request; OBJECT, expressing the object of the query; ATTRIBUTE, which describes some attributes of the object; RESTRICTION, describing the restrictions on the values of the answer; Q_ATTR, describing possible attributes of the query; and AND, including connectives like and, or, also, indicating that the sentence may have more than one query. We also include a DUMMY state, as in the examples above. For example, a sentence like: "What type of economy fare could I get from San Francisco to Dallas on the 25th of April" is segmented as: [...] words) were hand-labeled according to this set of states and the transition probabilities and the state-local bigram models were estimated using the maximum likelihood criterion. 
Table 2 shows examples of the phrases used for estimating the bigram language models for some of the defined states. Considering the large number of parameters to be estimated (i.e. the transition probabilities between the 44 states of the model and the 44 bigram models extended to the entire vocabulary of 501 words), and considering the small number of training sentences, this estimation poses robustness problems. One way to alleviate these problems consists of grouping the words in the vocabulary into equivalence classes. For example, all the city names can be grouped in the same class, as can the airport names, the numbers, the airline names, etc. The system was tested on the transcribed Jun-90 and Feb-91 class A test sentences. New words were allocated to a new-word category that was assigned a small probability within each state. Table 3 reports the number of sentences, for each test set, that were correctly labeled by the case decoder, along with the statistics on the correctly assigned cases. Table 4 shows examples of correct segmentations from the FEB-91 test set. It is interesting to notice the allocation of the connective "and" to different cases in sentences 1), 3), and 4). Although sentences 1) and 3) contain similar expressions (between ... and ...), the system recognizes that in the first case the phrase refers to a period of time, while in the second case it refers to the origin and destination cases. Moreover, sentence 3) shows that the conceptual relations origin and destination do not necessarily refer to the origin and destination of a flight, but can refer to other events, like ground transportation in this case. This sensitivity to the context (to the value of the OBJECT in the example above) shown by certain cases must be taken into account by the module that will interpret the conceptual segmentation and generate the SQL query. 
In sentence 4) the word "and" is clearly interpreted as connecting two distinct restrictions on the query. The same phenomenon is shown in sentence 5), where the word "or" connects two alternative possible origins of the flight. Table 5 shows examples of incorrect segmentations from the FEB-91 test set. In sentence 1) the phrase used for Eastern should be assigned to the airline case. The error is due to the fact that the word Eastern was not observed in the training set. In sentence 2) the phrase through Dallas Fort Worth should have been labeled with the connect case, but this case has very few examples in the training set, with a consequent poor estimation of the parameters related to it. The same problem, i.e. inadequate training, is also the cause of the wrong segmentation of sentence 3).

Table 2: Examples of the phrases used for estimating the bigram language models for some of the defined states
  dept_time: leaving after 1:00 pm; that depart in the afternoon
  way: round-trip return; that are round-trip
  class: a class Q W ticket; a 1st class ticket; which have 1st class service available

Table 4: Examples of correct segmentations from the FEB-91 test set
1) Please list all flights between Baltimore and Atlanta on Tuesdays between 4 in the afternoon and 9 in the evening
  DUMMY: Please
  QUERY: list all
  OBJECT: the flights
  origin: between Baltimore
  destin: and Atlanta
  day: on Tuesdays
  time: between 4 in the afternoon and 9 in the evening
2) What's the cheapest round-trip airfare on American flight 1074 from Dallas to Philadelphia
  QUERY: What's
  a_fare: the cheapest
  a_way: round-trip
  OBJECT: airfare
  airline: on American
  flcode: flight 1074
  origin: from Dallas
  destin: to Philadelphia
3) What kind of ground transportation is there between the airport and downtown Atlanta
  QUERY: What kind of
  OBJECT: ground transportation
  Q_ATTR: is there
  origin: between the airport
  destin: and downtown Atlanta
4) What are the restrictions on the cheapest fare from Pittsburgh to Denver and from Denver to San Francisco
  QUERY: What are
  OBJECT: the restrictions
  fare: on the cheapest fare
  origin: from Pittsburgh
  destin: to Denver
  AND: and
  origin: from Denver
  destin: to San Francisco
5) Display flights from Oakland or San Francisco to Denver
  [...]

Table 5: Examples of incorrectly decoded test sentences from FEB-91 test set
  [...]
  origin: from Denver through Dallas Fort Worth
  destin: to Philadelphia
3) Can you please tell me the type of plane that my client would be flying on from Baltimore to Pittsburgh
  DUMMY: Can you please
  QUERY: tell me the type of plane that my client would be
  OBJECT: flying on
  origin: from Baltimore
  destin: to Pittsburgh
</Paragraph> <Section position="1" start_page="123" end_page="123" type="sub_section"> <SectionTitle> Future Work </SectionTitle> <Paragraph position="0"> The goal of the understanding system is to retrieve the information in the ATIS database. In order to do this we are developing a module that translates the conceptual representation of the sentence obtained with the described method into an SQL query.</Paragraph> <Paragraph position="1"> Since the ambiguity of the sentence is resolved by the conceptual segmentation, this module implements a deterministic mapping.</Paragraph> </Section> </Section> </Paper>