File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/02/w02-0708_metho.xml
Size: 21,688 bytes
Last Modified: 2025-10-06 14:08:02
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0708"> <Title>Balancing Expressiveness and Simplicity in an Interlingua for Task Based Dialogue</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The C-star II Domain, Database, and Interlingua </SectionTitle> <Paragraph position="0"> The C-star II interlingua (Levin et al., 1998) was developed between 1997 and 1999 for use in the C-star II 1999 demo (www.c-star.org) with six participating research sites. The semantic domain was travel, including reservations and payments for hotels, tours, and transportation. Figure 1 shows a sample dialogue from the C-star II database. (c is the client and a is the travel agent.)</Paragraph> <Paragraph position="1"> Proceedings of the Workshop on Speech-to-Speech Translation: Algorithms and Systems, Philadelphia, July 2002, pp. 53-60. Association for Computational Linguistics.</Paragraph> <Paragraph position="2"> Figure 1 (sample dialogue): c: can I have some flight times that would leave some time around June sixth a: the there are several flights leaving D C there'd be one at one twenty four there's a three fifty nine flight that arrives at four fifty eight ... what time would you like to go</Paragraph> <Paragraph position="3"> c: I would take the last one that you mentioned ... a: what credit card number would you like to reserve this with c: I have a visa card and the number is double oh five three three one one six ninety nine eighty seven a: okay c: the expiration date is eleven ninety seven ... a: okay they should be ready tomorrow c: okay thank you very much</Paragraph> <Paragraph position="4"> The C-star II database contains 2278 English sentences and 7148 non-English (Japanese, Italian, Korean) sentences tagged with interlingua representations. 
Most of the database consists of transcripts of role-playing conversations.</Paragraph> <Paragraph position="5"> The driving concept behind the C-star II interlingua is that there are a limited number of actions in the domain (requesting the price of a room, telling the price of a room, requesting the time of a flight, giving a credit card number, etc.) and that each utterance can be classified as an instance of one of these domain actions. Figure 2 illustrates the components of the C-star II interlingua: (1) the speaker tag, in this case c for client, (2) a speech act (request-action), (3) a list of concepts (reservation, temporal, hotel), (4) arguments (e.g., time), and (5) values of arguments. The C-star II interlingua specification document contains definitions for 44 speech acts, 93 concepts, and 117 argument names. The domain action is the part of the interlingua consisting of the speech act and concepts, in this case request-action+reservation+temporal+hotel. The domain action does not include the list of argument-value pairs.</Paragraph> <Paragraph position="6"> First, it is important to point out that domain actions are created compositionally. A domain action consists of a speech act followed by zero or more concepts. (Recall that argument-value pairs are not part of the domain action.) The Nespole interlingua includes 65 speech acts and 110 concepts. An interlingua specification document defines the legal combinations of speech acts and arguments.</Paragraph> <Paragraph position="7"> The linguistic justification for an interlingua based on domain actions is that many travel domain utterances contain fixed, formulaic phrases (e.g., can you tell me; I was wondering; how about; would you mind) that signal domain actions, but either do not translate literally into other languages or have a meaning that is sufficiently indirect that the literal meaning is irrelevant for translation. 
To take two examples, how about as a signal of a suggestion does not translate into other languages with the words corresponding to how and about. Also, would you mind might translate literally into some European languages as a way of signaling a request, but the literal meaning of minding is not relevant to the translation, only the fact that it signals politeness.</Paragraph> <Paragraph position="8"> The measure of success for the domain-action based interlingua (as described in (Levin et al., 2000a)) is that (1) it covers the data in the C-star II database with less than 8% no-tag rate, (2) inter-coder agreement across research sites is reasonably high: 82% for speech acts, 88% for concepts, and 65% for domain actions, and (3) end-to-end translation results using an analyzer and generator written at different sites were about the same as end-to-end translation results using an analyzer and generator written at the same site. Figure 2 example utterance: I would like to make a hotel reservation for the fourth through the seventh of july</Paragraph> <Paragraph position="10"> The Nespole interlingua has been under development for the last two years as part of the Nespole project (http://nespole.itc.it). Figure 3 shows a Nespole dialogue. The Nespole domain does not include reservations and payments, but includes more detailed inquiries about hotels and facilities for ski vacations and summer vacations in Val di Fiemme, Italy. (The tourism board of the Trentino area is a partner of the Nespole project.) Most of the database consists of transcripts of dialogues between an Italian-speaking travel agent and an English or German speaker playing the role of a traveller.</Paragraph> <Paragraph position="11"> There are fewer fixed, formulaic phrases in the Nespole domain, prompting us to move toward domain actions that are more general, and also requiring more detailed interlingua representations. Changes from the C-star II interlingua fall into several categories: 1. 
Extending semantic expressivity and syntactic coverage: Increased coverage of modality, tense, aspect, articles, fragments, coordinate structures, number, and rhetorical relations. In addition, we have added more explicit representation of grammatical relations and improved capabilities for representing modification and embedding.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Additional Domain-Specific Concepts: </SectionTitle> <Paragraph position="0"> New concepts include giving directions, describing sizes and dimensions of objects, traveling routes, equipment and gear, airports, tourist services, facilities, vehicles, information objects (brochures, web pages, rules and regulations), hours of operation of businesses and attractions, etc.</Paragraph> <Paragraph position="1"> 3. Utterances that accompany multi-modal gestures: The Nespole system includes capabilities to share web pages and draw marks such as circles and arrows on web pages. The interlingua was extended to cover colors, descriptions of two-dimensional objects, and actions of showing. 4. General concept names from WordNet: The Nespole interlingua includes conventions for making new concept names based on WordNet synsets.</Paragraph> <Paragraph position="2"> 5. More general domain actions replacing specific ones: For example, replacing hotel with accommodation.</Paragraph> <Paragraph position="3"> Interlinguas based on domain actions contrast with interlinguas based on lexical semantics (Dorr, 1993; Lee et al., 2001; Goodman and Nirenburg, 1991). A lexical-semantic interlingua includes a representation of predicates and their arguments. For example, the sentence I want to take a vacation has a predicate want with two arguments, I and to take a vacation, which in turn has a predicate take and two arguments, I and a vacation. 
Of course, predicates like take may be represented as word senses that are less language-dependent, like participate-in. The strength and weakness of the lexical-semantic approach is that it is less domain dependent than the domain-action approach.</Paragraph> <Paragraph position="4"> In order to cover the less formulaic utterances of the Nespole domain, we have taken a step closer to the lexical-semantic approach. However, we have maintained the overall framework of the domain-action approach because there are still many formulaic utterances that are better represented in a non-literal way. Also, to abstract away from English syntax, concepts such as disposition, eventuality, and obligation are not represented in the interlingua as argument-taking main verbs; this accommodates languages in which these meanings are represented as adverbs or suffixes on verbs. Figure 3 (sample Nespole dialogue): c: and I have some questions about coming about a trip I'm gonna be taking to Trento a: okay what are your questions c: I currently have a hotel booking at the Panorama-Hotel in Panchia but at the moment I have no idea how to get to my hotel from Trento and I wanted to ask what would be the best way for me to get there a: okay I'm gonna show you a map that and then describe the directions to you okay so right so you will arrive in the train station in Trento the that is shown in the middle of the map stazione FFSS and just below that here is a bus stop labeled number forty so okay on the map that I'm showing you here the hotel is the orange building off on the right hand side ... c: I also wanted to ask about skiing in the area once I'm in Panchia a: all right just a moment and I'll show you another map c: okay a: okay so on the map you see now Panchia is right in the center of the map c: I see it</Paragraph> <Paragraph position="5"> Figure 4 shows the Nespole interlingua representation corresponding to the C-star II interlingua in Figure 2. 
The specification document for the Nespole interlingua defines 65 speech acts, 110 concepts, 292 arguments, and 7827 values grouped into 222 value classes. As in the C-star II interlingua, domain actions are defined compositionally from speech acts and arguments in combinations that are allowed by the interlingua specification.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Comparison of Nespole and C-star II Interlinguas </SectionTitle> <Paragraph position="0"> It is useful to compare the Nespole and C-star II interlinguas in expressivity, language independence, and simplicity.</Paragraph> <Paragraph position="1"> Expressivity of the Nespole interlingua: One measure of expressivity is the no-tag rate in the databases. The no-tag rate is the percentage of sentences that cannot be assigned an interlingua representation by a human expert. The C-star II database tagged with C-star II interlingua had a no-tag rate of 7.3% (Levin et al., 2000a). The C-star II database tagged with Nespole interlingua has a no-tag rate of 2.4%. More than 300 English sentences in the C-star II database that were not covered by the C-star II interlingua are now covered by the Nespole interlingua. (See Table 2.) We conclude from this that the Nespole interlingua is more expressive in that it covers more data.</Paragraph> <Paragraph position="2"> Language-independence of the Nespole interlingua: We do not have a numerical measure of language-independence, but we note that interlinguas based on domain actions are particularly suitable for avoiding translation mismatches (Dorr, 1994), particularly head-switching mismatches (e.g., I just arrived and Je viens d'arriver, where the meaning of recent past is expressed by an adverb, just, or a syntactic verb, viens (venir)). 
Interlinguas based on domain actions resolve head-switching mismatches by identifying the types of meanings that are often involved in mismatches (modality, evidentiality, disposition, and so on) and assigning them a representation that abstracts away from predicate-argument structure. Interlinguas based on domain actions also neutralize the different ways of expressing indirect speech acts within and across languages (for example, Would you mind..., I was wondering if you could..., and Please... as ways of requesting an action). Although Nespole domain actions are more general than C-star II domain actions, they maintain language independence by abstracting away from predicate-argument structure. Figure 4 example utterance: I would like to make a hotel reservation for the fourth through the seventh of july</Paragraph> <Paragraph position="3"> Simplicity and cross-site reliability of the Nespole interlingua: Simplicity of an interlingua is measured by cross-site reliability in inter-coder agreement and end-to-end translation performance. At the time of writing this paper we have not conducted cross-site inter-coder agreement experiments using the Nespole interlingua. We have, however, conducted cross-site evaluations (Lavie et al., 2002), in which the analyzer and generator were written at different sites. Experiments at the end of C-star II showed that cross-site evaluations were comparable to intra-site evaluations (analyzer and generator written at the same site) (Levin et al., 2000b). Nespole evaluations so far show a loss of cross-site reliability: intra-site evaluations are noticeably better than cross-site evaluations, as reported in (Lavie et al., 2002). This seems to indicate that developers at different sites have a lower level of agreement on the Nespole interlingua. 
However, there are other possible explanations for the discrepancy (for example, developers at different sites may have focused their development on different sub-domains) that are currently under investigation.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Scalability of the Nespole </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Interlingua </SectionTitle> <Paragraph position="0"> The rest of this paper addresses the scalability of the Nespole interlingua. A possible criticism of domain actions is that they are domain dependent and that the number of domain actions might increase too quickly with the size of the domain. In this section, we will examine the rate of increase in the number of domain actions as a function of the amount of data and the diversity of the data.</Paragraph> <Paragraph position="1"> Differences in the C-star and Nespole Domains: We will first show that the C-star and Nespole domains are significantly different even though they both pertain to travel. The combination of the two domains is therefore significantly larger than either domain alone.</Paragraph> <Paragraph position="2"> In order to demonstrate the differences between the C-star travel domain and the Nespole travel domain, we measured the overlap in vocabulary. The numbers in Table 4 are based on the first 7900 word tokens in the C-star English database and the first 7900 word tokens in the Nespole English database. The table shows the number of unique word types in each database, the number of word types that occur in both databases, and the number of word types that occur in one of the databases, but not in the other. In each database, about half of the word types overlap with the other database. 
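As a rough illustration of the vocabulary comparison just described, the following Python sketch counts word types in the first 7900 tokens of two token lists and reports how many are shared or unique. The function name vocabulary_overlap, the dictionary keys, and the toy token lists are assumptions made for this example and are not taken from either database; tokenization is assumed to have been done beforehand.

```python
def vocabulary_overlap(tokens_a, tokens_b, limit=7900):
    """Compare the word types found in the first `limit` tokens of two corpora."""
    types_a = set(tokens_a[:limit])
    types_b = set(tokens_b[:limit])
    shared = types_a.intersection(types_b)
    return {
        "types_a": len(types_a),          # unique word types in corpus A
        "types_b": len(types_b),          # unique word types in corpus B
        "shared": len(shared),            # word types occurring in both
        "only_a": len(types_a - shared),  # word types occurring only in A
        "only_b": len(types_b - shared),  # word types occurring only in B
    }


# Toy token lists, invented for illustration only.
cstar_tokens = "i would like a flight to boston".split()
nespole_tokens = "i would like a map of trento".split()
print(vocabulary_overlap(cstar_tokens, nespole_tokens))
```

With real data, the two arguments would be the tokenized C-star and Nespole English transcripts, and the only_a and only_b counts would correspond to the non-overlapping vocabulary.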
The non-overlapping vocabulary (402 C-star word types and 344 Nespole word types) indicates that the two databases cover quite different aspects of the travel domain.</Paragraph> <Paragraph position="3"> Scalability: Argument 1: We will now begin to address the issue of scalability of the domain action approach to interlingua design.</Paragraph> <Paragraph position="4"> Our first argument concerns the number of speech acts and concepts. The C-star II interlingua, designed for coverage of the C-star travel domain, included 44 speech acts and 93 concepts. The Nespole interlingua, designed for coverage of the combined C-star and Nespole domains, has 65 speech acts and 110 concepts. Thus a relatively small increase in the number of speech acts and concepts is required to cover a significantly larger domain.</Paragraph> <Paragraph position="5"> The increased size of the C-star/Nespole domain is reflected in the number of arguments and values. The C-star II interlingua contained definitions for 117 arguments, whereas the Nespole interlingua contains definitions for 292 arguments. The number of values for arguments has also increased significantly in the Nespole domain. There are 7827 values grouped into 222 classes (airport names, days of the week, etc.). Distributional Data: number of domain actions in each database: Next we will present distributional data concerning the number of domain actions as a function of database size. We will compare several databases: Old C-star English (around 2278 sentences tagged with C-star II interlingua), New C-star English (2564 sentences tagged with Nespole interlingua, including the 2278 sentences from Old C-star English), Nespole English, Nespole German, and Nespole Italian. Table 2 shows the number of sentences and the number of domain actions in each database. 
The number of domain actions refers to the number of types, not tokens, of domain actions.</Paragraph> <Paragraph position="6"> Distributional data: Coverage of the top 50 domain actions: Table 3 shows the percentage of each database that is covered by the 5, 10, 20, and 50 most frequent domain actions in that database. For each database, the domain actions were ordered by frequency. The percentage of sentences covered by the top-n domain actions was then calculated. For this experiment, we separated sentences spoken by the traveller (client) and sentences spoken by the travel agent (agent). C-star data in Table 3 refers to 2564 English sentences from the C-star database that were tagged with Nespole interlingua. Nespole data refers to the English portion of the Nespole database (1446 sentences). Combined data refers to the combination of the two (4014 sentences).</Paragraph> <Paragraph position="7"> Two points are worth noting about Table 3.</Paragraph> <Paragraph position="8"> First, the Nespole agent data has a higher coverage rate than the Nespole client data. That is, more data is covered by the top-n domain actions. This may be because there was only a small amount of English agent data and it was spoken by non-native speakers.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Nespole </SectionTitle> <Paragraph position="0"> Second, the combined data has a slightly lower coverage rate than either the C-star or Nespole databases alone. This is expected because, as shown above, the combined domain is significantly more diverse than either domain by itself.</Paragraph> <Paragraph position="1"> Scalability: Argument 2: Table 3 provides additional evidence for the scalability of the Nespole interlingua to larger domains. In the combined C-star and Nespole domain, the top 50 domain actions cover only slightly less data than the top 50 domain actions in either domain separately. 
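The top-n coverage figures of the kind reported in Table 3 can be sketched in Python as follows: domain actions are ordered by frequency, and the share of sentences tagged with one of the n most frequent actions is computed. This is a minimal sketch that assumes each sentence carries exactly one domain-action label; the function name top_n_coverage and all labels in the toy list other than request-action+reservation+temporal+hotel are invented for illustration.

```python
from collections import Counter


def top_n_coverage(domain_actions, n):
    """Percentage of sentences whose domain action is among the n most frequent."""
    if not domain_actions:
        return 0.0
    counts = Counter(domain_actions)
    # Sum the sentence counts of the n most frequent domain-action types.
    covered = sum(count for _, count in counts.most_common(n))
    return 100.0 * covered / len(domain_actions)


# One domain action per sentence; most labels are hypothetical.
tagged = [
    "request-information+price+room",
    "request-information+price+room",
    "give-information+price+room",
    "give-information+price+room",
    "give-information+price+room",
    "request-action+reservation+temporal+hotel",
    "thank",
    "acknowledge",
]
for n in (1, 2, 5):
    print(n, top_n_coverage(tagged, n))
```

Running the same function separately on client sentences, agent sentences, and the combined data would give the kind of per-speaker comparison discussed for Table 3.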
There is not, in fact, an explosion of domain actions when the C-star and Nespole domains are combined.</Paragraph> <Paragraph position="2"> Distributional Data: domain actions as a function of database size: Table 3 shows that in each of our databases, the 50 most frequent domain actions cover approximately 65% of the sentences. The next issue we address is the nature of the &quot;tail&quot; of less frequent domain actions covering the remainder of the data.</Paragraph> <Paragraph position="3"> Figure 5 shows the number of domain actions as a function of data set size. Sampling was done for intervals of 25 sentences starting at 100 sentences. For each sample size s there was ten-fold cross-validation. Ten random samples of size s were chosen, and the number of different domain actions in each sample was counted. The average of the number of domain actions in the ten samples of size s is plotted in Figure 5. The four databases represented in Figure 5 are the C-star English database tagged with C-star II interlingua, the C-star II database tagged with Nespole interlingua, the Nespole English database, and the combined C-star and Nespole English databases.</Paragraph> <Paragraph position="4"> Expressivity, Argument 2: Figure 5 provides evidence for the increased expressivity of the Nespole interlingua. In contrast to Table 3, which deals with samples containing the most frequent domain actions, the samples plotted in Figure 5 contain random mixtures of frequent and non-frequent domain actions. The curve representing the C-star data with C-star II interlingua is the slowest growing of the four curves. This is because the grain-size of meaning represented in the C-star II interlingua was larger than in the Nespole interlingua. Also, many infrequent domain actions were not covered by the C-star II interlingua. 
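The sampling procedure behind curves like those in Figure 5 can be sketched in Python as below: for each sample size, ten random samples are drawn and the number of distinct domain actions is averaged. This is only a sketch under stated assumptions: sampling without replacement, the fixed random seed, and the function names mean_distinct_actions and growth_curve are choices made for this example and are not specified in the text.

```python
import random


def mean_distinct_actions(domain_actions, sample_size, repeats=10, seed=0):
    """Average number of distinct domain actions over random samples of one size."""
    rng = random.Random(seed)
    total = 0
    for _ in range(repeats):
        # Assumption: each sample is drawn without replacement.
        sample = rng.sample(domain_actions, sample_size)
        total += len(set(sample))
    return total / repeats


def growth_curve(domain_actions, start=100, step=25, repeats=10, seed=0):
    """Pairs of (sample size, mean distinct domain actions) for sizes start, start+step, ..."""
    sizes = range(start, len(domain_actions) + 1, step)
    return [(s, mean_distinct_actions(domain_actions, s, repeats, seed)) for s in sizes]


# Toy data: 300 sentences drawn from 40 hypothetical domain-action labels.
toy = [f"da{i % 40}" for i in range(300)]
print(growth_curve(toy)[:3])
```

Plotting the pairs returned by growth_curve for each tagged database would give one curve per database, with a steeper curve indicating that new domain-action types keep appearing as more data is sampled.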
The faster growth of the curve representing the C-star data with Nespole interlingua indicates improved expressivity of the Nespole interlingua: it covers more of the infrequent domain actions. The highest curve in Figure 5 represents the combined C-star and Nespole domains. This curve is higher than the others because, as shown above, the two travel domains are significantly different from each other.</Paragraph> <Paragraph position="5"> Expressivity and Simplicity, the right balance: Comparing Table 3 and Figure 5, we argue that the Nespole interlingua strikes a good balance between expressivity and simplicity. Table 3 shows evidence for the simplicity of the Nespole interlingua: Only 50 domain actions are needed to cover 60-70% of the sentences in the database. Figure 5 shows evidence for expressivity: because domain actions are compositionally formed from speech acts and concepts, it is possible to form a large number of low-frequency domain actions in order to cover the domain.</Paragraph> <Paragraph position="6"> Over 600 domain actions are used in the combined C-star and Nespole domains.</Paragraph> </Section> </Section> </Paper>