
<?xml version="1.0" standalone="yes"?>
<Paper uid="E83-1007">
  <Title>VOCAL INTERFACE FOR A MAN-MACHINE DIALOG</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
I INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> A great deal of interest is actually being shown in providing computer interfaces through dialog processing systems using speech input and output (Levinson and Shipley, 1979). In the same time, the amelioration of the microprocessor technology has allowed the implantation of word recognition and text-to-speech synthesis systems on single boards (Li~nard and Mariani, 1982 ; Gauvain, 1983 ; Asta and Li~nard, 1979) ; in our laboratory, such modules have been integrated into a compact unit that forms an autonomous vocal processor which has applications in a number of varied domains : vocal command of cars, of planes, office automation and computer-aided learning (N~el et al., 1982).</Paragraph>
    <Paragraph position="1"> Whereas most of the present language understanding systems require large computational resources, our goal has been to implement a dialog-handling board in the LIMSI's Vocal Terminal.</Paragraph>
    <Paragraph position="2"> The use of micro-systems introduces memory size and real-time constraints which have incited us to limit ourselves in the use of presently available computational linguistic techniques. Therefore, we have taken inspiration from a simple model of semantic network ; for the same reasons, the initial parser based on an Augmented Transition Network (Woods, 1970) and implemented on an IBM 370 (Memmi and Mariani, 1982) was replaced by another less time- and memory-consuming one.</Paragraph>
    <Paragraph position="3"> The work presented herein extends possible application fields by allowing an interactive vocal relation between the machine and its user for the execution of a specific task : the application that we have chosen is a man-machine communication with a robot manipulating blocks and using a Plan Generating System.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SPEECH I RECOGNIZER
SEMANTIC \[ SYNTACTIC PROCESSING
ANALYSIS
SEMANTIC \] TREATMENT
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"/>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SPEECH J SYNTHESIZER
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="6" start_page="0" end_page="43" type="metho">
    <SectionTitle>
II SYNTACTIC PROCESSING
A. Prediction Device
</SectionTitle>
    <Paragraph position="0"> Once the acoustic processing of the speech signal is performed by the 250 word-based recognition board, syntactic analysis is carried out.</Paragraph>
    <Paragraph position="1"> It may be noted that response time and word confusions increase with the vocabulary size of word recognition systems. To limit the degradation of performance, syntactic information is used : words that can possibly follow a given word may be predicted at each step of the recognition process with the intention of reducing vocabulary.</Paragraph>
    <Paragraph position="2">  B. Parameters Transfer In order to build a representation of the deep structure of an input sentence, parameters requested by the semanticprocedures must be filled with the correct values. The parsing method that we de ~ velopped considers the naturel language utterances as a set of noun phrases connected with function words (prepositions, verbs ...) which specify their relationships. At the present time, the set of noun phrases is obtained by segmenting the utterance at each function word.</Paragraph>
    <Paragraph position="4"/>
    <Paragraph position="6"/>
  </Section>
  <Section position="7" start_page="43" end_page="44" type="metho">
    <SectionTitle>
III SEMANTIC PROCESSING
</SectionTitle>
    <Paragraph position="0"> A. S\[stem knowledge data The computational semantic memory is inspired by the Collins and Quillian model, a hierarchical network in which each node represents a concept. Properties can be assigned to each node, which also inherits those of its ancestors. Our choice has been influenced by the desire to design a system which would be able to easily learn new conceptS ; that is, to complete or to modify its knowledge according to information coming from a vocal input/ output system.</Paragraph>
    <Paragraph position="1"> Each noun of the vocabulary is represented by a node in such a tree structure. The meaning of any given verb is provided by rules that indicate the type of objects that can be related. As far as adjectives are concerned, they are arranged in exclusive property groups.</Paragraph>
    <Paragraph position="2">  The knowledge-based data (which may be enlarged by information provided by the vocal channel) is complemented by temporary data which chronologically contain, in abbreviated form, events evoked during the dialogue.</Paragraph>
    <Paragraph position="3"> B. Assertion processin~ The small amount of data representing a given universe allows us to approach the computational treatment of these two complementary and contrary components of dialogue: learning and contestation. Every time an assertion is proposed by the user a procedure parses its semantic validity by answering the question &amp;quot;Does this sentence fit with the current state of the knowledge data ?&amp;quot;. If a contradiction is detected, it is pointed out to the user who must justify his proposal. If the user persists in his declaration, the machine may then modify its universe knowledge, otherwise the utterance is not taken into account.</Paragraph>
    <Paragraph position="4"> When no contradiction is encountered, the program enters into a learning process adding to the  temporary data or knowledge-based data.</Paragraph>
    <Paragraph position="5"> User : Un poisson poss~de des plumes (A fish has got feathers) System : J'ai compris ... As-tu quelque chose ajouter ? (I have understood ... Would you like to say something else ?)</Paragraph>
    <Paragraph position="7"> (It is an animal which has got scales)  I. Teaching utterances These assertions, characterized by the presence of a non-action verb, permit both the complete construction of the semantic network and of the concept relation rules specifying the possible entities that can serve as arguments for a predicate. null Although most of our knowledge results from long nurturing and frequent interactions with the outside world, it is possible to give an approximate meaning to concrete objects and verbs by using an elementary syntax. A new concept may be taught by filling in its position within the semantic network and possibly associating it with properties that will differentiate it from its brother nodes. Concept relation rules can be learned, too.</Paragraph>
    <Paragraph position="9"/>
  </Section>
  <Section position="8" start_page="44" end_page="44" type="metho">
    <SectionTitle>
2. Descriptive utterances
</SectionTitle>
    <Paragraph position="0"> Sentences involving an action verb are translated into an unambiguous representation which condenses and organizes information into the very same form as that of the concept relation rules from knowledge data. Therefore, semantic validity can be easily tested by a pattern-matching process. A semantic event reduced to a nested-triplet structure and considered as valid is then inserted in the dynamic-events memory, and can be requested later on by the question-answering process.</Paragraph>
    <Paragraph position="1"> Although the language is limited to a small subset of natural French, several equivalent syntactic structures are allowed to express a given event ; in order to avoid storing multiple representations of the same event, paraphrases of a given utterance are reduced to a single standard form.</Paragraph>
    <Paragraph position="2"> One of the task effected by a language understanding system consists of recognizing the concepts that are evoked inside the input utterances. As soon as ambiguities are detected, they are resolved through interaction with the user.</Paragraph>
    <Paragraph position="3"> U : Je prends le cube I (I am taking the cube I) S : S'agit-il du petit cube I ? (Is the small cube I in question ?)</Paragraph>
    <Paragraph position="5"> S:O.K.</Paragraph>
    <Paragraph position="6"> Relative~ clauses are not represented in the canonical form of the utterance in which they appear, but they are only used to determine which concept is in question.</Paragraph>
    <Paragraph position="7"> article i - Nun ! - Adjective I - Verb - article 2 - Adjec. 2 - Nun 2 abbreviated form : @ (( NI A1 )( N2 A2 ))) = semantic event E relation rule n deg i :</Paragraph>
    <Paragraph position="9"/>
    <Paragraph position="11"> saisis les cubes 2 et 5 (grasp cubes 2 and 5) prends le cube 2 et le 5 (take hold of the cube 2 and the 5 one) attrape le cube 2 et saisis le cube 5 (lay hold of the cube 2 and grasp the cube 5)</Paragraph>
  </Section>
  <Section position="9" start_page="44" end_page="44" type="metho">
    <SectionTitle>
3. Orders
</SectionTitle>
    <Paragraph position="0"> Input utterances beginning with an action verb specify an order that the machine connected to the vocal interface is supposed to execute ; in addition to the deep structure of this natural language message, a formal command language message is built and then sent to the machine. The task universe memory is modified in order to reflect the execution of a user's command.</Paragraph>
    <Section position="1" start_page="44" end_page="44" type="sub_section">
      <SectionTitle>
User : Prends la pyramide qui est sur la table et
</SectionTitle>
      <Paragraph position="0"> pose. la sur le gros cube (grasp the pyramid which is on the table and put it on the big  pyramide et que je pose la petite pyramide sur le gros cube 3 (You have asked me to grasp the small pyramid and put the small pyramid on the big cube 3)  In everyday language, intonation often contitutes the marker that discriminates between questions and assertions. Since prosody information is not presently taken into account by the word recognition system, the presence of an interrogative pronoun switches on the information research processing in permanent knowledge-data or in dynamic-events memory.</Paragraph>
      <Paragraph position="1"> I. Research in permanent knowledge-data The program is allowed to express its knowledge at the user's request, for instance, on concept meanings, or the systems abilities.</Paragraph>
      <Paragraph position="2">  The abbreviated semantic events list is closely examined, from recent to older data, until the question-pattern approximately matches one of the memorized events. Possible analogy between a memorized event and one evoked by the question is then analysed. Coincidences rarely happen, so the system must be able to ask for full specifications about the event that interests the user ; at that time there is a vocal discussion aimed at leading the system to that event in a step-wise manner.</Paragraph>
      <Paragraph position="3">  D. Processing a user's incomplete utterance An important specific quality of the semantic process is that it is able to accomodate bad acoustical recognition through intelligent interactive feedback.</Paragraph>
      <Paragraph position="4"> So, when one part of a given sentence has not been recognized, because of mispronunciation or background noise, the system produces a suitable question bringing the user to repeat the unrecognized word within his answer.</Paragraph>
      <Paragraph position="5"> Two cases can occur : if the word is again unrecognized, the system assumes that the entity is not in the prescribed vocabulary (containing the acoustic features of the words). An explanatory message is then produced through the synthesis module.</Paragraph>
      <Paragraph position="6"> if the lexical entity is well recognized this time, it is added to the previous utterance and computed in the same manner as the others.</Paragraph>
      <Paragraph position="7">  When a certain amount of acoustical components in a sentence have not been recognized, the system asks for the user to repeat his assertion.</Paragraph>
      <Paragraph position="9"> E. Sentence production 1. Translation of a deep structure into an output sentence  This process consists of inserting semantic entities into the suitable syntactic diagram which depends on the computational procedure that is activated (question answering, contradiction, learning, asking for specifications ...). Since each syntactic variation of a word corresponds to a single semantic representation, sentence generation makes use of verb conjugation procedures and concordance procedures.</Paragraph>
      <Paragraph position="10"> In order to improve the natural quality of speech, different types of sentences expressing one same idea may be generated in a pseudo-random manner. The same question asked to the system several times can thus induce different formulated responses. null 2. Text-to-speech transcription ambiguities A module of the synthesis process takes any French text and determines the elements necessary for the diphone synthesis, with the help of a dictionnary containing pronunciation rules and their exceptions (Prouts, 1979). However, some ambiguities concerning text-to-speech transcription can still remain and cannot be resolved without syntactico-semantic information ; for instance : &amp;quot;Les poules du couvent couvent&amp;quot; (the convent hens are sitting on their eggs) is pronounced by the synthesizer : / I PS p u I d y k u v ~ k u v E / (the convent hens ~onvent).</Paragraph>
      <Paragraph position="11"> To deal with that problem, we may send the synthesizer the phonetic form of the words.</Paragraph>
    </Section>
  </Section>
  <Section position="10" start_page="44" end_page="44" type="metho">
    <SectionTitle>
IV CONCLUSION
</SectionTitle>
    <Paragraph position="0"> The dialog experiment is presently running on a PDP 11/23 MINC and on an INTEL development system with a VLISP interpreter in real-time and using a series interface with the vocal terminal.</Paragraph>
    <Paragraph position="1"> The isolated word recognition board we are using for the moment makes the user pause for approximately half a second between each word he pronounces. In the near future we plan to replace this module by a connected word system which will make the dialog more natural. It may be noted that the compactness of the understanding program allows its implantation on a microprocessor board which is to be inserted in the vocal terminal.</Paragraph>
    <Paragraph position="2"> At present we apply ourselves to make the dialog-handling module easily adaptable to various domains of application.</Paragraph>
  </Section>
class="xml-element"></Paper>