<?xml version="1.0" standalone="yes"?>
<Paper uid="H90-1103">
  <Title>Opportunities for Advanced Speech Processing in Military Computer-Based Systems*</Title>
  <Section position="3" start_page="433" end_page="433" type="metho">
    <SectionTitle>
2 Framework of Speech Tech-
</SectionTitle>
    <Paragraph position="0"> nologies and Military Application Areas An outline of speech technology areas which are of importance for military (and non-military) applications is presented in Table 1. All these areas are subjects for on-going research and development. Summaries of the technology are presented in a variety of textbooks and summary papers, and ongoing developments are presented at the annual ICASSP conferences and in other forums.</Paragraph>
    <Paragraph position="1"> Although the terms used in Table 1 are generally well known, they will be defined, as needed, later in this paper in the context of discussions of particular applications.</Paragraph>
  </Section>
  <Section position="4" start_page="433" end_page="433" type="metho">
    <SectionTitle>
Applications
1. Speech Recognition
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="433" end_page="433" type="sub_section">
      <SectionTitle>
1.1 Isolated Word Recognition (IWR)
1.2 Continuous Speech Recognition (CSR)
1.3 Key-Word Recognition (KWR)
1.4 Speech Understanding (SU)
</SectionTitle>
      <Paragraph position="0"/>
    </Section>
  </Section>
  <Section position="5" start_page="433" end_page="433" type="metho">
    <SectionTitle>
2. Speaker Recognition
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="433" end_page="433" type="sub_section">
      <SectionTitle>
2.1 Speaker Verification (SV)
2.2 Speaker Identification (SI)
2.3 Language Identification (LI)
</SectionTitle>
      <Paragraph position="0"> ing communication between people and computers. In the latter area, speech recognition and synthesis generally would serve as a part of a larger system designed to provide a natural user interface between a person and a computer. Tables 1 and 2 together serve as a framework for the remainder of this paper.</Paragraph>
      <Paragraph position="1">  1. Speech Communications (Speech Coding, Speech Enhancement)</Paragraph>
    </Section>
    <Section position="2" start_page="433" end_page="433" type="sub_section">
      <SectionTitle>
1.1 Secure Communications
1.2 Bandwidth Reduction
</SectionTitle>
      <Paragraph position="0"/>
    </Section>
  </Section>
  <Section position="6" start_page="433" end_page="433" type="metho">
    <SectionTitle>
2. Speech Recognition Systems for Command and
</SectionTitle>
    <Paragraph position="0"> Control (C2) (IWR, CSR, KWR, SU, Synthesis)</Paragraph>
    <Section position="1" start_page="433" end_page="433" type="sub_section">
      <SectionTitle>
2.1 Avionics
2.2 Battle Management
2.3 Resource and Data Base Management
2.4 Interface to Computer and Communication
Systems
</SectionTitle>
      <Paragraph position="0"> 3. Speech Recognition Systems for Training (IWR, CSR, SU, Synthesis)
4. Processing of Degraded Speech (Enhancement)
5. Security Access Control (SV)
3. Speech Coding and Digitization</Paragraph>
    </Section>
    <Section position="2" start_page="433" end_page="433" type="sub_section">
      <SectionTitle>
3.1 Waveform Coding
3.2 Source Coding Using Analysis/Synthesis
3.3 Vector Quantization (VQ)
3.4 Multiplexing
4. Speech Enhancement
4.1 Noise Reduction
4.2 Interference Reduction
4.3 Speech Transformations (Rate and Pitch)
4.4 Distortion Compensation
</SectionTitle>
      <Paragraph position="0"/>
    </Section>
  </Section>
  <Section position="7" start_page="433" end_page="433" type="metho">
    <SectionTitle>
5. Speech Synthesis
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="433" end_page="433" type="sub_section">
      <SectionTitle>
5.1 Synthesis from Coded Speech
5.2 Synthesis from Text
</SectionTitle>
      <Paragraph position="0"> Table 2 outlines a number of key military speech application areas which will be addressed in more detail in this paper, and identifies the speech technology areas which are utilized for each application area. The areas in Table 2 generally divide into applications involving speech communication between people, and applications involv-</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="433" end_page="433" type="metho">
    <SectionTitle>
3 Relationship to Previous As-
</SectionTitle>
    <Paragraph position="0"> sessments of Military Applications of Speech Technology</Paragraph>
    <Section position="1" start_page="433" end_page="433" type="sub_section">
      <SectionTitle>
3.1 Beek, Neuburg, and Hodge (1977)
</SectionTitle>
      <Paragraph position="0"> This paper \[10\] provided a comprehensive review of the state-of-the-art of speech technology and military applications as of 1977. It serves as a useful reference point for the present paper. The authors grouped potential military applications into four major categories: (1) security (including access control and surveillance); (2) command and control; (3) data transmission and communication; and (4) processing distorted speech. Other than the training application (which was mentioned briefly in the Beek, et al., paper), these categories cover all the application areas listed in Table 2.</Paragraph>
      <Paragraph position="1"> Most of the applications cited in the 1977 review were in the research and development stage, and a daunting list of unsolved problems was cited. Much progress has been made since 1977 in speech technology (both algorithms, and hardware implementations of these algorithms) and in applications. The following areas of progress are worthy of particular note:</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="433" end_page="435" type="metho">
    <SectionTitle>
1. Digital Narrowband Communication Sys-
</SectionTitle>
    <Paragraph position="0"> tems -- The Linear Predictive Coding (LPC) algorithm was relatively new in 1977. Improvements in technology and the coding algorithm have now led to widespread deployment of digital narrowband secure voice, especially by means of the STU-III (secure terminal unit) family of equipment at 2.4 kb/s. In addition, significant progress has been made in developing practical coders for lower rates using Vector Quantization (i.e., pattern matching) techniques.</Paragraph>
    <Paragraph position="1"> 2. Automatic Speech Recognition -- Major advances both in CSR and IWR have been made largely through the wide-scale development of statistically-based Hidden Markov Model (HMM) techniques, as well as through the development and application of dynamic time warping (DTW) recognition techniques. HMM techniques, which were pioneered prior to 1977, have in recent years been further developed at a large number of laboratories, with significant advances both in recognition performance and in efficiency of implementation. A sampling of basic references on DTW and HMM is provided by \[6,21,59,60,113\]. \[94\] provides a good overview of speech recognition technology, and has many useful references. A comprehensive bibliography on speech recognition has recently been published \[53\].</Paragraph>
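The DTW technique cited above is compact enough to illustrate directly. The following toy sketch (added for illustration; it is not from the original paper, and it uses made-up 1-D sequences rather than real speech feature vectors) computes the classic warping-path distance used in template-matching recognizers:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Classic O(len(a)*len(b)) dynamic program: D[i, j] holds the cost of
    the best warping path aligning the first i samples of a with the
    first j samples of b.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # allowed steps: diagonal match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# A time-stretched copy of a template aligns at zero cost, while an
# unrelated sequence stays expensive -- the property that makes DTW
# usable for matching isolated words against stored templates.
template = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]
other = [3.0, 3.0, 3.0, 3.0, 3.0]
print(dtw_distance(template, stretched))                                  # 0.0
print(dtw_distance(template, other) > dtw_distance(template, stretched))  # True
```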
    <Paragraph position="2"> 3. Noise and Interference Reduction -- Work in application of digital speech processing to noise and interference reduction was relatively new in 1977, and has progressed significantly since that time \[89\]. Hardware systems for speech enhancement have been developed \[28,153\] and have been shown to improve both speech readability and ASR performance under certain conditions of noise and interference.</Paragraph>
    <Section position="1" start_page="434" end_page="434" type="sub_section">
      <SectionTitle>
3.2 Woodard and Cupples (1983)
</SectionTitle>
      <Paragraph position="0"> This paper \[159\] did not attempt a comprehensive review of the state-of-the-art, but instead described selected military applications in three areas: (1) voice input for command and control; (2) message sorting by voice; and (3) very-low-bit-rate voice communications.</Paragraph>
      <Paragraph position="1"> Current and future applications in the first and third areas will be discussed in some detail in the following discussions. For a general discussion of message sorting and surveillance, the reader is referred to \[159\].</Paragraph>
    </Section>
    <Section position="2" start_page="434" end_page="434" type="sub_section">
      <SectionTitle>
3.3 Other Reviews of Military Applica-
</SectionTitle>
      <Paragraph position="0"> tions of Speech Technology The 1984 National Research Council report by Flanagan, et al., \[38\] contains an excellent review of speech recognition system applications to data base management, command and control of weapons systems, and training; a categorization of applications is included, as well as a number of specific case studies. Beek and Vonusa (1983) \[14\] provide a general review of military applications of speech technology, with substantial updates from the 1977 Beek, et al., paper referred to above. An early, but comprehensive, assessment of potential military applications of speech understanding systems is provided by Turn, et al., in 1974 \[143\]. The book by Lea \[68\] contains useful discussions on both military and non-military applications of speech recognition. Other applications overviews are presented in \[11,12,13\]. Taylor (1986) \[140\] provides a more recent review of avionics applications of speech technology. The Proceedings of Military Speech Technology Conferences (1986-1989) contain a substantial number of useful summaries of specific work in a variety of applications areas. A recent update on military applications of audio processing and speech recognition is provided in \[29\].</Paragraph>
    </Section>
    <Section position="3" start_page="434" end_page="435" type="sub_section">
      <SectionTitle>
3.4 The NATO Research Study Group
on Speech Processing
</SectionTitle>
      <Paragraph position="0"> The North Atlantic Treaty Organization (NATO) Research Study Group on Speech Processing (RSG10) \[87\], originally formed in 1977, has as one of its major continuing objectives the identification and analysis of potential military applications of advanced technology. In fact the preparation of this paper has been motivated by the author's association with RSG10 since 1986 as a &amp;quot;technical specialist&amp;quot; in the speech area; much useful information for the paper has been provided by other RSG10 members, or learned during RSG10 site visits to various laboratories in the member NATO countries.</Paragraph>
      <Paragraph position="1"> The RSG10 group has frequently been involved in the past in activities aimed at disseminating information about speech technology, and military applications in particular, to a wider community. For example, in 1983 the group participated in a NATO Advisory Group for Aerospace Research and Development (AGARD) lecture series on speech processing \[2\] which included a number of important papers on military applications of speech recognition \[14,20\]. A similar lecture series was conducted in 1990, and the papers in that series \[3\] represent an up-to-date overview of a number of important topics in speech analysis/synthesis and recognition systems for military applications.</Paragraph>
      <Paragraph position="2"> In another project, RSG10 established in 1983 a working group to look at the human factors aspects of voice input/output systems, which are clearly critical to military (or non-military) applications. A workshop on this subject took place in 1986, resulting in a comprehensive book \[141\] with papers representative of the state-of-the-art in research and applications in the area of multimodal person/machine dialogs including voice.</Paragraph>
      <Paragraph position="3"> In addition to its work in assessing speech technology and opportunities for military applications, RSG10 has continued to initiate and conduct a variety of cooperative international projects \[87\], particularly in the areas of speech recognition in adverse environments, speech data base collection, speech recognition performance assessment, and human factors issues in speech input/output  systems.</Paragraph>
    </Section>
  </Section>
  <Section position="10" start_page="435" end_page="438" type="metho">
    <SectionTitle>
4 Current Work in Development
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="435" end_page="435" type="sub_section">
      <SectionTitle>
of Military Applications of
Speech Technology
Introduction and Summary
</SectionTitle>
      <Paragraph position="0"> This section summarizes and assesses a representative sampling of current work in military applications of speech technology in the following areas: (1) narrowband (2400 b/s-4800 b/s) and low-bit-rate (50-1200 b/s) secure digital voice communications; (2) speech recognition systems in fighter aircraft, military helicopters, battle management, and air traffic control training; and (3) noise and interference suppression.</Paragraph>
    </Section>
    <Section position="2" start_page="435" end_page="435" type="sub_section">
      <SectionTitle>
4.2 Narrowband Secure Voice for Tacti-
cal Applications
</SectionTitle>
      <Paragraph position="0"> Most applications of narrowband voice coders at 2.4 kb/s (e.g., STU-III) have been in office environments where background acoustic noise and other environmental effects are not major problems. Operational military platforms such as fighter aircraft, helicopters, airborne command posts, and tanks pose additional challenges, since the performance of narrowband algorithms tends to be sensitive to noise and distortion in both talker and listener environments. However, substantial progress has been made in developing the voice algorithm, microphone, and system integration technology for tactical deployment of 2.4 kb/s voice. Examples include the Joint Tactical Information Distribution System (JTIDS) narrowband voice efforts in the U.S. \[123,142\] and in the U.K. \[125\], and the development of the Advanced Narrowband Digital Voice Terminal (ANDVT) family of equipment \[134\] for a variety of environments.</Paragraph>
    </Section>
    <Section position="3" start_page="435" end_page="435" type="sub_section">
      <SectionTitle>
4.1 Digital Narrowband Secure Voice -
</SectionTitle>
      <Paragraph position="0"> the STU-III Beek, et al., noted in 1977 \[10\] that &amp;quot;a massive effort is underway to develop and implement an all-digital (secure narrowband speech) communication system.&amp;quot; The development and widespread deployment of the STU-III as described by Nagengast \[88\] has brought this effort to fruition, and probably represents the single most significant operational military application of speech technology. The STU-III represents a marriage of a sophisticated speech algorithm, the Linear Predictive Coding (LPC) technique at 2.4 kb/s, with very large-scale integration (VLSI) digital signal processor (DSP) technology to allow development of a secure terminal which is small enough and low enough in cost to be widely used for secure voice communication over telephone circuits in the United States. It is worth noting that although the STU-III includes recent improvements in the LPC algorithm, the basic algorithm for 2.4 kb/s LPC has not changed significantly over the last ten years. The primary factor which has allowed its widespread application has been progress in VLSI technology.</Paragraph>
      <Paragraph position="1"> Although the 2.4 kb/s LPC algorithm in STU-III produces intelligible speech, it is not toll quality and current efforts are focussed on providing improved quality for secure voice, while maintaining the ability to transmit over standard telephone circuits. Modern technology has evolved to the point where 4.8 kb/s is now generally supportable over the dial network. Hence, recent efforts have focussed, with some success (see \[66\]) on the development of 4.8 kb/s voice coders with higher quality than LPC. Based on this work, the Code-Excited LPC (CELP) technique has been proposed as a standard for 4.8 kb/s secure voice communication \[24\]. CELP provides a better representation of the excitation signal, at the cost of a higher bit rate, than the traditional pitch and voiced/unvoiced excitation coding used in 2.4 kb/s LPC.</Paragraph>
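For illustration (this sketch is not from the paper), the analysis half of an LPC coder reduces to the Levinson-Durbin recursion over frame autocorrelations; a deployed 2.4 kb/s coder would additionally quantize the coefficients together with pitch, voicing, and gain for each frame:

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: solve for the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p from autocorrelations r[0..p].

    This is the core of autocorrelation-method LPC analysis (the
    quantization stages of a real vocoder are omitted here).
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                 # reflection coefficient
        a[1:i + 1] += k * a[i - 1::-1][:i]
        err *= 1.0 - k * k             # residual prediction error energy
    return a, err

# Sanity check on a synthetic AR(2) signal s[n] = 0.5 s[n-1] - 0.25 s[n-2] + e[n];
# the recovered prediction-error filter should be close to [1, -0.5, 0.25].
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
s = np.zeros(5000)
for n in range(2, 5000):
    s[n] = 0.5 * s[n - 1] - 0.25 * s[n - 2] + e[n]
r = np.array([np.dot(s[: len(s) - k], s[k:]) for k in range(3)])
a, err = levinson_durbin(r, 2)
print(np.allclose(a, [1.0, -0.5, 0.25], atol=0.05))  # True
```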
    </Section>
    <Section position="4" start_page="435" end_page="436" type="sub_section">
      <SectionTitle>
4.3 Low-Bit-Rate (50-1200 b/s) Voice
Communications
</SectionTitle>
      <Paragraph position="0"> Significant advances both in speech algorithms and in VLSI technology have greatly enhanced the feasibility of intelligible, practical digital voice communication at low bit rates (i.e., &lt;= 1200 b/s). These coders should have important applicability in a variety of strategic and tactical systems, where channel bandwidth may be extremely limited. Developments in four bit rate ranges are of interest, and will be summarized briefly here. First, it has been demonstrated that frame-fill techniques \[16,85,98\] can be used very effectively to reduce a 2.4 kb/s algorithm to 1.2 kb/s operation, with little loss in speech performance and little added complexity. Second, to enter the 600-800 b/s range, Vector Quantization (VQ) techniques have been successfully developed \[78\] which use pattern matching to reduce bit rate. The performance of these VQ systems tends to be sensitive to the speaker on which the patterns are trained, and adaptive training techniques \[99\] have been developed which effectively adapt the codebook of patterns to the speaker in real time. The third bit rate range of interest is 200-400 b/s. Here, segment vocoder \[120\] and matrix vocoder techniques, which use pattern matching over longer intervals (typically 100 ms), have been developed. Although the quality and intelligibility of these systems are marginal, practical real-time implementations are now feasible, and vocoders at this rate may be useful in selected applications where bandwidth is very limited \[49\].</Paragraph>
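A minimal sketch of the VQ idea described above (the codebook size and frame rate below are illustrative assumptions, not parameters of the cited systems):

```python
import numpy as np

# Toy vector quantizer: the transmitter sends only the index of the
# nearest codeword for each frame, so the channel carries log2(codebook
# size) bits per frame.  A real low-rate coder would train the codebook
# on speech spectra rather than random vectors.
rng = np.random.default_rng(1)
codebook = rng.standard_normal((64, 10))   # 64 ten-dimensional codewords

def quantize(frame):
    """Index of the codeword nearest (Euclidean) to the input frame."""
    return int(np.argmin(np.linalg.norm(codebook - frame, axis=1)))

frames_per_second = 50                     # assumed 20 ms analysis frames
bits_per_frame = int(np.log2(len(codebook)))
print(quantize(codebook[17]))              # 17: a codeword maps to itself
print(frames_per_second * bits_per_frame)  # 300 b/s for the indices alone
```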
      <Paragraph position="1"> Finally, to achieve even lower voice bit rates (say, 50 b/s) it would be necessary to use speech recognition and synthesis techniques, with a restriction on vocabulary.</Paragraph>
      <Paragraph position="2"> These systems may be useful in selected applications \[63\] such as transmission of stereotyped reports from a forward observer post. Recognition/synthesis techniques may also be useful for two-way communication in situations where bandwidth is very limited in one direction, but where real-time voice is possible (say, at 1200 b/s) in the other direction \[39\], allowing confirmation of the correctness of the transmissions which use recognition/synthesis.</Paragraph>
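The 50 b/s figure can be checked with simple arithmetic under assumed vocabulary-size and speaking-rate numbers (these are not figures from the paper):

```python
import math

# Back-of-envelope bit-rate budget for recognition/synthesis
# transmission: instead of a waveform, send one vocabulary index per
# recognized word.  Vocabulary size and speaking rate are assumptions.
vocab_size = 1000                                 # restricted vocabulary
words_per_second = 2.0                            # a typical speaking rate
bits_per_word = math.ceil(math.log2(vocab_size))  # 10 bits indexes 1000 words
print(bits_per_word * words_per_second)           # 20.0 -- well under 50 b/s
```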
    </Section>
    <Section position="5" start_page="436" end_page="436" type="sub_section">
      <SectionTitle>
4.4 Voice/Data Integration in Computer
Networks
</SectionTitle>
      <Paragraph position="0"> The widespread development of computer networks using packet switching technology has opened opportunities for a variety of applications of speech technology, including: packet voice communications \[46,148\] with efficient sharing of network resources for voice and data; advanced intelligent terminals \[39,104\] with multi-media communications; multi-media conferencing \[104\]; and voice control of resources and services (such as voice mail) in computer networks \[62,106\]. Since data communications using packet systems is becoming widely used in military systems, integration of voice and data on these networks provides significant advantages. Applicable technologies are speech coding, speech recognition, speech synthesis, and multiplexing techniques including (see \[148\]) Time-Assigned Speech Interpolation (TASI), which take advantage of the bursty nature of speech communications.</Paragraph>
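The advantage TASI takes from the bursty nature of speech can be illustrated with a rough capacity estimate (the activity factor and trunk count below are assumptions for illustration, not figures from the paper):

```python
# Rough TASI-style capacity estimate: conversational speech is active
# only part of the time, so assigning a trunk only during talkspurts
# lets N trunks serve roughly N / activity_factor conversations,
# ignoring freeze-out when too many talkers are active at once.
activity_factor = 0.4   # assumed fraction of time a talker is speaking
trunks = 24
print(round(trunks / activity_factor))  # 60 conversations, vs. 24 without TASI
```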
    </Section>
    <Section position="6" start_page="436" end_page="436" type="sub_section">
      <SectionTitle>
4.5 Speech Recognition Systems in
High-Performance Fighter Aircraft
</SectionTitle>
      <Paragraph position="0"> The pilot in a high-performance military aircraft operates in a heavy workload environment, where hands and eyes are busy and speech recognition could be of significant advantage. For example, the pilot could use a speech recognizer to set a radio frequency or to choose a weapon, without moving his hands or bringing his gaze inside the cockpit. This would allow the pilot to concentrate more effectively on flying the airplane in combat situations. The potential improvement in pilot effectiveness could be extremely significant in critical situations. In view of the above, substantial efforts have been devoted over recent years to test and evaluate speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft \[55,118,119,121,122,154,158\], the program in France on installing speech recognition systems on Mirage aircraft \[81,136,137,138\], and programs in the U.K. dealing with a variety of aircraft platforms \[9,27,41,43,75,128,139,156, 157\]. In these programs, speech recognizers have been operated successfully in fighter aircraft. Applications have included: setting radio frequencies, commanding an autopilot system, setting steerpoint coordinates and weapons release parameters, and controlling flight displays. Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system.</Paragraph>
      <Paragraph position="1"> An excellent description of the roles and limitations of speech recognition systems in fighters, from the user's (i.e., the pilot's) point of view, has been presented by AFTI/F-16 pilot Major John Howard \[55\]. Several points are worth noting here:
1. speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently;
2. achievement of very high recognition accuracy (say, 95% or more) was the most critical factor for making the speech recognition system useful -- with lower recognition rates, pilots would not use the system;
3. more natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.</Paragraph>
      <Paragraph position="2"> With respect to the first point above, the most encouraging result was that for some of the pilots (those for whom a high recognition rate was achieved), noticeable improvements in overall task performance were achieved with speech recognition for air-to-air tracking and for low-level navigation. A key goal (emphasized by the second point above) is to improve the recognition technology to make these improvements more consistent. Recent laboratory research in robust speech recognition for military environments \[100,101,114\] has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft. With respect to the development of vocabularies and grammars which will be well matched to the pilot's needs, a study at the U.S. Air Force Wright-Patterson Avionics Laboratory \[76,77\] obtained a great deal of useful data by having pilots conduct dialogs with a simulated speech recognition-based system, using mission scenarios simulated in the laboratory. Other discussions of human factors and speech recognition requirements in the cockpit are provided in \[15,126\].</Paragraph>
    </Section>
    <Section position="7" start_page="436" end_page="437" type="sub_section">
      <SectionTitle>
4.6 Speech Recognition Systems in He-
licopter Environments
</SectionTitle>
      <Paragraph position="0"> The opportunities for speech recognition systems to improve pilot performance in military helicopters are similar to those in fighter aircraft. In a hands-busy, eyes-busy, heavy workload situation, speech recognition (as well as speech synthesis) could be of significant benefit to the pilot. Of course, the problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone.</Paragraph>
      <Paragraph position="1"> Substantial test and evaluation programs have been carried out in recent years in speech recognition systems applications in helicopters, notably by the U.S.</Paragraph>
      <Paragraph position="2"> Army Avionics Research and Development Activity (AVRADA) \[51,105,115,135,155\] and by the Royal Aerospace Establishment (RAE) in the UK \[42,74,140\].</Paragraph>
      <Paragraph position="3"> The program in France has included speech recognition in the Puma helicopter \[138\]. Results have been encouraging, and voice applications have included: control of communication radios; setting of navigation systems; and control of an automated target handover system (ATHS) which formats and sends air-air and air-ground messages, and has required a great deal of keyboard entry. As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, where it was found \[135\] that pilots were generally able to run a prescribed course faster and more accurately when speech recognition for radio control was provided. However, these results represent only a feasibility demonstration in a test environment. Much remains to be done both in speech recognition \[80\] and in overall speech recognition technology, in order to consistently achieve performance improvements in operational settings.</Paragraph>
    </Section>
    <Section position="8" start_page="437" end_page="437" type="sub_section">
      <SectionTitle>
4.7 Speech Recognition Systems in Bat-
tle Management
</SectionTitle>
      <Paragraph position="0"> Battle management command centers generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. For example, Hale \[47\] describes the use of a limited-vocabulary recognizer for voice control of a weapons control workstation in a command and control laboratory. Although the system capability was limited, the users reported that the voice recognition provided potential convenience in avoiding the need to redirect eyes between screen and keyboard.</Paragraph>
      <Paragraph position="1"> In another feasibility study \[107\], speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications. Again, users were very optimistic about the potential of the system, although capabilities were limited. Another limited application of speech recognition in naval battle management is described in \[103\].</Paragraph>
      <Paragraph position="2"> Clearly, battle management applications of speech recognition systems have high potential; but in order to fully realize this potential, a much more natural speech interface (continuous speech, natural grammar) is needed. The current speech understanding programs sponsored by the Defense Advanced Research Projects Agency (DARPA) in the U.S. have focussed on this problem in the context of a naval resource management task.</Paragraph>
      <Paragraph position="3"> Speech recognition efforts have focussed on a continuous-speech, large-vocabulary database \[108\] which is designed to be representative of the naval resource management task. Significant advances in the state-of-the-art in CSR have been achieved, and current efforts are focussed on integrating speech recognition and natural language processing to allow spoken language interaction with a naval resource management system. Much of this work is described in the Proceedings of recent DARPA Speech Recognition and Natural Language Workshops \[31,32,33\].</Paragraph>
    </Section>
    <Section position="9" start_page="437" end_page="437" type="sub_section">
      <SectionTitle>
4.8 Training of Air Traffic Controllers
</SectionTitle>
      <Paragraph position="0"> Training for military (or civilian) air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a &amp;quot;pseudo-pilot&amp;quot;, engaging in a voice dialog with the trainee controller, which simulates the dialog which the controller would have to conduct with pilots in a real ATC situation.</Paragraph>
      <Paragraph position="1"> Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel \[20,48,50,127\]. Air controller tasks are also characterized by highly structured speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task. The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition. An excellent overview of this work is presented in \[20\], and further discussion of the results is presented in \[38\]. Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system.</Paragraph>
      <Paragraph position="2"> However, the prototype training systems demonstrated a significant potential for voice interaction in these systems, and in other training applications. The U.S. Navy is currently sponsoring a large-scale effort in ATC training systems \[127\], where a commercial speech recognition unit \[146\] is being integrated with a complex training system including displays and scenario creation. Although the recognizer is constrained in vocabulary, one of the goals of the training programs is to teach the controllers to speak in a constrained language, using vocabulary specifically designed for the ATC task. Recent research in France on application of speech recognition in ATC training systems, directed at issues both in speech recognition and in application of task-domain grammar constraints, is described in \[82,83,84,91\]. In addition to the training application, speech recognition has a variety of other potential applications in ATC systems, as described, for example, in \[1\].</Paragraph>
    </Section>
    <Section position="10" start_page="437" end_page="438" type="sub_section">
      <SectionTitle>
4.9 Removal of Noise from Noise-
Degraded Speech Signals
</SectionTitle>
      <Paragraph position="0"> There are a variety of military and non-military applications where removal of noise and interference from speech signals is important, and a significant amount of work continues to be devoted to this area, both in technology development and in applications. A good summary of the field, with reprints of many important papers, is provided in Lim's book \[73\]. More recently, a 1989 National Research Council study \[89\] summarizes the state-of-the-art in noise removal. Application areas identified in the study include: (1) two-way communication by voice; (2) transcription of a single, important recording; and (3) transcription of quantities of recorded material. The focus of the study is on speech processing to aid the human listener. The panel concluded that, although some noise reduction methods appear to improve speech quality in noise, intelligibility improvements had not been demonstrated using closed-response tests such as the Diagnostic Rhyme Test (DRT). The committee recommended further research both on noise reduction algorithm development and on new testing procedures that assess not only intelligibility but also speech quality, fatigue, workload, and mental effort.</Paragraph>
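As a concrete illustration of the kind of noise reduction method the study surveys, the following sketch implements simple magnitude spectral subtraction, a classic textbook technique; it is not presented here as the algorithm of any cited system, and the demo uses an oracle noise spectrum, which sidesteps the noise-estimation problem that makes the real task hard:

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, n_fft=256):
    """Single-frame magnitude spectral subtraction (illustrative sketch).

    Subtract an estimate of the noise magnitude spectrum, floor the
    result at zero, and resynthesize using the noisy phase.
    """
    spec = np.fft.rfft(noisy, n_fft)
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # half-wave rectify
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n_fft)

rng = np.random.default_rng(2)
t = np.arange(256)
clean = np.sin(2 * np.pi * 8 * t / 256)          # tone in FFT bin 8
noise = 0.5 * rng.standard_normal(256)
enhanced = spectral_subtraction(clean + noise, np.abs(np.fft.rfft(noise, 256)))
# With the oracle noise spectrum, the residual error is far smaller
# than the original noise.
print(np.linalg.norm(noise) > np.linalg.norm(enhanced - clean))  # True
```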
      <Paragraph position="1"> In the area of noise removal, a sustained and successful effort has been sponsored by the Rome Air Development Center \[28,152,153,159\], which has led to the development of a fieldable development model called the Speech Enhancement Unit (SEU). The SEU has been tested under various realistic noise and interference conditions, and improvements in speech readability have been noted, as well as apparent reduction in operator fatigue.</Paragraph>
      <Paragraph position="2"> Prior to any attempts at digital processing for noise removal, it is clearly desirable to apply the most effective possible microphone technology to reduce the noise in the input to the digital system. The effectiveness of standard noise-cancelling microphones is discussed briefly in \[142\]. Multi-microphone techniques for noise reduction have been the subject of much recent work; \[147\] and \[116\] present examples of the work in this area.</Paragraph>
      <Paragraph position="3"> In combatting noise and interference in speech broadcast and communication systems, it may be necessary and appropriate in certain situations to process the signal before transmission rather than after reception. Recent work in this area, directed at improving listenability and range in a broadcast system, is described in \[110,112\].</Paragraph>
      <Paragraph position="4"> Finally, recent work has also been successfully directed at advanced headphone technology \[19,44\] to reduce the noise in the ears of a listener in a high-noise environment. This work has important potential application for speech communications in military environments such as fighter cockpits.</Paragraph>
    </Section>
    <Section position="11" start_page="438" end_page="438" type="sub_section">
      <SectionTitle>
4.10 Speaker Recognition and Speaker
Verification
</SectionTitle>
      <Paragraph position="0"> Automatic speech processing techniques for identification of people from their voice characteristics have a number of military and non-military applications, which are summarized in \[10\] and in \[35\]. These applications include: (1) security, where the task is to verify the identity of an individual (e.g., for control of access to a restricted facility), and where the subject can often be instructed to speak a required phrase (this is referred to as &amp;quot;text-dependent&amp;quot; speaker verification); (2) surveillance of communication channels \[10,35\], where the task is to identify a speaker from samples of unconstrained text (&amp;quot;text-independent&amp;quot; speaker recognition); and (3) forensic applications, which can involve either a recognition or verification task, but where control over the available speech sample is often limited, and the potential number of impostors (i.e., for a verification task) may be very large. Among the above applications, security applications will yield the best speaker recognition performance because of the cooperative user and controlled conditions. A case study of a reasonably successful operational speaker verification system, which has been used to control physical access into a corporate computer center, is described in \[35\], which points out some of the key problems and solutions in making a successful operational system. Current research efforts in speaker recognition are generally being directed toward the more difficult text-independent speaker recognition problem \[35,45,117\], with a goal of high performance under conditions of noise and channel distortion.</Paragraph>
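To make the text-dependent verification task concrete, the sketch below accepts or rejects an identity claim by comparing an utterance's features to an enrolled template. This is purely illustrative and is not the system of \[35\]: the feature (average log-magnitude spectrum), distance measure, and threshold are all simplified assumptions, whereas fielded systems use cepstral features, temporal alignment, and carefully calibrated decision thresholds.

```python
import numpy as np

def feature_vector(signal, frame_len=256):
    """Crude per-utterance feature: the average log-magnitude spectrum
    over fixed-length frames."""
    n = (len(signal) // frame_len) * frame_len
    frames = np.reshape(signal[:n], (-1, frame_len))
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mags + 1e-8).mean(axis=0)

def verify(template, utterance, threshold):
    """Text-dependent verification: accept the identity claim when the
    utterance's features lie within `threshold` (Euclidean distance)
    of the template built at enrollment time."""
    dist = np.linalg.norm(template - feature_vector(utterance))
    return dist <= threshold
```

Setting the threshold is the core design problem: it trades false rejections of the true speaker against false acceptances of impostors, which is why the forensic case, with its potentially very large impostor population, is so much harder than access control.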
    </Section>
    <Section position="12" start_page="438" end_page="438" type="sub_section">
      <SectionTitle>
4.11 Evaluation of Speech Processing
Systems
</SectionTitle>
      <Paragraph position="0"> Careful assessment of speech communication systems, speech synthesis systems, and speech recognition systems, using standard data bases and quantitative evaluation measures, is clearly essential for making progress in speech technology for military or non-military applications. Much attention has been directed at the assessment problem in recent years, and an extensive discussion is beyond the scope of this paper. However, the reader is referred to \[109\] for a comprehensive overview of speech quality assessment, and to \[132\] for a recent review of evaluation efforts in both speech communication and recognition systems.</Paragraph>
    </Section>
  </Section>
  <Section position="11" start_page="438" end_page="443" type="metho">
    <SectionTitle>
5 Opportunities
</SectionTitle>
    <Paragraph position="0"> for Advanced Military Applications of Speech Technology</Paragraph>
    <Section position="1" start_page="438" end_page="438" type="sub_section">
      <SectionTitle>
5.1 Introduction and Summary
</SectionTitle>
      <Paragraph position="0"> In this section, opportunities for advanced military applications of speech technology are identified by means of descriptions of several generic systems which would be possible with advances in speech technology and in system integration. These generic systems include: (1) integrated multi-rate voice/data communications terminal; (2) interactive speech enhancement system; (3) voice-controlled pilot's associate system; (4) advanced air traffic control training system; (5) battle management command and control support system with spoken natural language interface; and (6) spoken language translation system.</Paragraph>
    </Section>
    <Section position="2" start_page="438" end_page="439" type="sub_section">
      <SectionTitle>
5.2 Integrated Multi-Rate Voice/Data
Communications Terminal
</SectionTitle>
      <Paragraph position="0"> Advanced speech processing will play a very important role in meeting the multiple and time-varying communications needs of military users. For example, a commander in a fixed or mobile command center will require communication over a variety of networks at a variety of conditions of stress on the networks. An integrated, multi-rate voice/data terminal \[39\] could be developed to support the commander's needs under normal and stressed conditions as follows: (1) under normal conditions, the terminal would provide secure digital voice, low-rate digital video, and graphics; (2) under heavily stressed conditions with network jamming and damage, the terminal would be limited to stylized data messages; (3) under more favorable but degraded network conditions, more interactive communications would be provided, including very-low-rate secure voice using speech recognition and synthesis techniques.</Paragraph>
      <Paragraph position="1"> A sketch of the commander's terminal is shown in Figure 1. The potential roles of advanced speech processing include: (1) a variable-rate coder capable of rates from 50-9600 b/s, depending on network conditions (higher rates, with the attendant higher quality, would be used when conditions permit) and connectivity requirements; and (2) use of speech recognition as an alternative to the keyboard for control of the terminal modes and displays, and for selection or composition of data messages to be transmitted.</Paragraph>
      <Paragraph position="2">  Variable-rate voice coding, including recognition/synthesis, would also be useful in a scaled-down, very compact terminal for field operations \[129\] (e.g., by a forward observer in a tactical environment). A sketch indicating this application is shown in Figure 2. The requirements for voice processing are similar to those for the commander's terminal but with a greater emphasis on reduction of size, weight, and power.</Paragraph>
      <Paragraph position="3"> The current speech coding technologies discussed above will have to be extended, integrated, and implemented in compact hardware to provide integrated multi-rate terminals for future military communications  needs. In the 200-800 b/s rate range, algorithm and implementation efforts are needed to provide speech coders with good performance. At lower bit rates, improvements to recognition techniques, as well as effective integration of recognition into the communications environment, are needed. As an example, Figure 3 illustrates a concept for recognition-based speech communication in a situation where the two-way link capacities are asymmetric. Here, the outgoing link from one user (e.g., a forward observer with a portable terminal, or an airborne user) might only be able to support rates of 100 b/s or below, while real-time voice (say, at 2400 b/s) is possible in the other direction. The possibility of confirming the recognition/synthesis transmission by means of real-time voice transmission in the reverse direction offers the potential for effective voice communication with disadvantaged links. The development and test of an asymmetric voice coding system of this type could lead to an important military application of advanced speech processing technology.</Paragraph>
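A rough feasibility check supports the asymmetric-link concept above. Under the simplifying assumption that the recognition/synthesis path transmits each recognized word as a fixed-length index into a vocabulary shared by both ends (ignoring framing overhead and error protection), the sustainable word rate is easy to estimate:

```python
import math

def words_per_second(link_bps, vocab_size):
    """Words per second a recognition/synthesis link can carry when each
    recognized word is sent as a fixed-length index into a shared vocabulary."""
    bits_per_word = math.ceil(math.log2(vocab_size))
    return link_bps / bits_per_word

# A 1000-word vocabulary needs ceil(log2(1000)) = 10 bits per word, so even
# a 100 b/s outgoing link carries about 10 words per second -- faster than
# conversational speech -- while the 2400 b/s return link carries real-time
# coded voice for confirmation.
```

This back-of-envelope calculation suggests the disadvantaged link is limited by recognition accuracy and vocabulary coverage rather than by channel rate, which is consistent with the paper's emphasis on improving recognition techniques for this application.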
    </Section>
    <Section position="3" start_page="439" end_page="439" type="sub_section">
      <SectionTitle>
5.3 Interactive Speech Enhancement
Workstation
</SectionTitle>
      <Paragraph position="0"> Advances in speech enhancement technology, coupled with the growing availability of high-performance graphics workstations and signal processing hardware, offer the opportunity for the development of an advanced, interactive speech enhancement workstation with multiple military applications. Such a system, as depicted in Figure 4, would include: (1) real-time speech I/O, including the capability for simultaneous handling of inputs from multiple microphones or sensors \[17,37,116,147\]; (2) high-capacity digital speech storage and playback facilities; (3) a library of speech enhancement software routines, each capable of operating on real-time speech input or on speech from a digital file; and (4) a user interface providing flexible display, playback, and labelling facilities for speech waveforms, spectra, and parameters. A primary application for such a workstation would be as a listening and transcription aid for degraded speech. As described in \[89\], two general classes of transcription tasks can be identified: (1) transcription of large quantities of recorded material (such as public broadcasts, or the monitoring of critical telephone lines in a nuclear power station); and (2) transcription of a single, important, and often very degraded recording (such as from a cockpit voice recorder after a crash, or forensic material obtained by a law enforcement agency). The interactive speech enhancement system could be used for either of these transcription tasks, as well as for enhanced listening to real-time speech when transcription is not required.</Paragraph>
      <Paragraph position="1"> A great deal of speech enhancement algorithm technology, which would be applicable in such an interactive workstation, has already been developed \[28,73,89,152, 153\]; and integrating the available algorithms to operate in real-time under flexible user control would be an important development effort.</Paragraph>
      <Paragraph position="2"> In addition, there is much to do in the further development of noise and interference reduction, particularly in situations where the interference includes co-channel speech (see, e.g., \[90,133,161\]).</Paragraph>
      <Paragraph position="3"> The advanced interactive speech enhancement workstation represents an important application of advancing speech technology. This work would build on ongoing technology and system efforts, most specifically on the pioneering and ongoing work sponsored by RADC on the speech enhancement unit \[28,152\], which has included both algorithm development and real-time system implementation using VLSI technology.</Paragraph>
    </Section>
    <Section position="4" start_page="439" end_page="441" type="sub_section">
      <SectionTitle>
5.4 Voice-Controlled Pilot's Associate
System
</SectionTitle>
      <Paragraph position="0"> Pilots in combat face an overwhelming quantity of incoming data or communications on which they must base life or death decisions. In addition, they are faced with the need to control dozens of switches, buttons, and knobs to handle the multiple avionics functions in a modern military airplane cockpit. Especially for the case of a single-seater military aircraft, substantial benefit could be achieved through the development of a voice-controlled &amp;quot;pilot's associate&amp;quot;, which would reduce the pilot's workload by assisting the pilot in controlling avionics systems and in keeping track of his changing environment.</Paragraph>
      <Paragraph position="1"> The concept of the pilot's associate was developed as part of the planning for the DARPA Strategic Computing Program \[30\], as a paradigm for the development of intelligent &amp;quot;personal associate&amp;quot; systems which could have significant benefits in a variety of human-controlled, complex, military systems.</Paragraph>
      <Paragraph position="2"> The pilot's associate would ultimately consist of an ensemble of real-time natural interface system and expert knowledge-based systems. Figure 5 illustrates a concept for an evolving pilot's associate system, which would initially provide a single set of control aids to the pilot, and would evolve to provide a growing set of more complex, knowledge-based functions. In its simplest form, the pilot's associate would include the capability for the pilot to control routine tasks by voice. The cockpit speech recognition efforts described in earlier sections will have to be extended to make recognition reliable and useful enough in the cockpit to support functions such as setting radio frequencies, setting navigation systems, or selecting weapons systems.</Paragraph>
      <Paragraph position="3"> In its advanced form, the pilot's associate would assist the pilot in adapting mission plans based on the current mission situation. The development of knowledge-based systems to support such tasks presents very difficult challenges, only one of which is an upgrade of the speech recognition interface to improve the naturalness and robustness of pilot interaction with the system.</Paragraph>
      <Paragraph position="6"> The pilot's associate represents both an opportunity and a challenge for advanced computing technology in general and for speech technology in particular. The requirement for real-time operation under stressed conditions is particularly demanding for both knowledge-based information monitoring and planning systems, and for the speech interface to the pilot.</Paragraph>
    </Section>
    <Section position="5" start_page="441" end_page="443" type="sub_section">
      <SectionTitle>
5.5 Advanced Air Traffic Control Train-
ing System
</SectionTitle>
      <Paragraph position="0"> Automated training systems can use computer speech recognition and generation to expedite training and to reduce the load on training personnel in a variety of applications. Speech recognition and synthesis would be very helpful in hands-busy, eyes-busy training situations, for example in training personnel to maintain complex mechanical equipment. Here the individual could request information from an &amp;quot;automated instruction manual&amp;quot; while continuing to carry on a manual task, and while maintaining his view of the equipment (e.g., a complex jet engine). However, as suggested in Section 4.8, voice-interactive systems are perhaps most attractive for training in tasks which require voice communication as an integral part of the operational task, such as air traffic control (ATC).</Paragraph>
      <Paragraph position="1"> Previous efforts in the application of speech technology in ATC training systems have achieved only limited success \[20,48,93\], but advances in speech technology, simulation technology, expert systems for automated instruction, and performance measurement offer significant potential for major advances in ATC training systems. A generic voice-interactive ATC training system is shown in Figure 6. This particular block diagram was originally drawn to represent a Precision Approach Radar Training System \[20\], but similar structures would apply to other training scenarios such as air intercept control.</Paragraph>
      <Paragraph position="3"> The combination of voice-interactive technologies with simulation, environment modelling, and performance measurement has the potential to eliminate the need for a &amp;quot;pseudo-pilot&amp;quot; instructor to interact one-on-one with each student. Automated training has the further advantages of standardizing instruction and of capturing the expertise of the best instructors in the simulated training scenarios. In addition, as new automation capabilities in ATC impose new tasks on the controller (e.g.,\[67\]), the automated training system could be updated to capture the knowledge of human experts in developing training scenarios which utilize voice-interactive pseudo-pilots.</Paragraph>
      <Paragraph position="4"> In the speech technology area, a number of advances will be needed to make an advanced ATC training system effective. Since the controllers are expected to speak in a constrained, stylized language, fully natural speech understanding is not required. However, since controllers will stray from the constraints, it is essential that the recognition system be able to cope effectively with deviations from the constrained vocabulary and grammar.</Paragraph>
      <Paragraph position="5"> At a minimum, recognition of the deviation and a request to the trainee to rephrase his speech input would be needed. Even more desirable would be a system with adaptive training, which learns to extend its vocabulary and grammar based on the trainee's speech to perform correct recognition on an increasing percentage of each trainee's utterances. Adaptive machine learning techniques also offer significant potential in the overall training system, for example in selecting and developing training scenarios which are well-matched to the progress of each ATC trainee.</Paragraph>
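The minimal capability described above, flagging where an utterance strays from the constrained phraseology so the trainee can be asked to rephrase, can be illustrated with a toy slot-based checker. The vocabulary and phrase structure below are invented for illustration and are not actual ATC phraseology or any system described in the cited work:

```python
# Toy constrained-phraseology checker: each position in the phrase must
# come from a small set of allowed words (a crude stand-in for a
# task-domain grammar). Vocabulary is hypothetical.
GRAMMAR = [
    {"turn", "climb", "descend"},       # command word
    {"left", "right", "to"},            # qualifier
    {"heading", "flight", "altitude"},  # object
]

def deviations(utterance):
    """Return (position, word) pairs falling outside the grammar, so the
    training system can ask the trainee to rephrase those portions."""
    words = utterance.lower().split()
    out = []
    for i, word in enumerate(words):
        if i >= len(GRAMMAR) or word not in GRAMMAR[i]:
            out.append((i, word))
    return out
```

The adaptive-training idea in the text corresponds to relaxing this check over time: words that a trainee uses consistently could be added to the allowed sets rather than always triggering a rephrase request.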
      <Paragraph position="6"> In summary, the application of speech technology to ATC training is an area of high current interest \[18\] and significant future potential. In addition, speech recognition and synthesis may have important application in a large variety of intelligent training systems \[70\], where the computer system effectively simulates a &amp;quot;tutor&amp;quot;, communicating with the student in as natural a manner as possible.</Paragraph>
      <Paragraph position="7">  The application of natural spoken language interfaces in C2 systems, including battle management, has been viewed for many years as a long-term goal of speech understanding research, including the DARPA speech understanding program in the 1970's \[143\] and more recent efforts including the DARPA Strategic Computing Program \[8,30,33,38,65,96,150\]. Some current and previous efforts in this area were noted in an earlier section of this paper. Much remains to be done both in spoken language interface research and in the development of associated support systems and knowledge-based expert systems to support C2 users.</Paragraph>
      <Paragraph position="8"> Figure 7 shows a sketch of a system for C2 battle management with a spoken natural language interface. The generic system structure could be applied to a large variety of C2 scenarios \[143\] including tactical, strategic, and logistics systems; considerable effort over the past few years has been devoted to the Naval battle management application, under the Fleet Command and Control Battle Management Program (FCCBMP) (see, e.g., \[8,34,65,96,150\]). There are numerous challenges to be addressed in developing a C2 support system with a spoken natural language interface, which include: 1. Techniques for query and management of a large database by spoken natural language must be developed. For the case of FCCBMP, efforts in this area have included: development of the Naval resource management task domain \[108\], speech understanding work directed at this task domain \[31,32,33,34\], and porting of natural language interfaces to the data base management task for the Naval data base \[8\].</Paragraph>
      <Paragraph position="9">  2. Intelligent expert systems for planning and decision support in the battle management task domain must be developed \[65\].</Paragraph>
      <Paragraph position="10"> 3. The spoken natural language interface must be extended to interact with these complex expert systems. (Figure 7 caption: The system includes both relatively simple database retrieval and data entry functions, and more complex expert system aids for battle planning and management. For both classes of functions, the development of a natural spoken language interface represents a considerable challenge, requiring large-vocabulary, natural-grammar speech understanding.) 4. The speech interface must be combined with other user-interface modalities including graphics, text, and pointing \[96\].</Paragraph>
      <Paragraph position="11"> It is worth emphasizing that although C2 systems represent an important opportunity for advanced speech processing in military systems, speech technology development is only one component of the challenge in advanced C2 support systems.</Paragraph>
      <Paragraph position="12"> Meeting this challenge will require long-term future efforts in speech technology, natural language technology, intelligent system technology, and in system integration. Fortunately, it is not necessary to solve all the problems at once, and a phased approach is possible. For example, initial efforts might involve speech interface to a C2 data base management system only (not to the analysis and planning system); the user could initially be required to speak with a constrained vocabulary and grammar while research proceeds on understanding of spoken natural language. Useful aids to commanders and other system users could be provided with the data base management capability only, while work continues on the development and application of the intelligent system technology for the analysis and planning functions needed to provide additional decision aids to the C2 user.</Paragraph>
    </Section>
    <Section position="6" start_page="443" end_page="443" type="sub_section">
      <SectionTitle>
5.7 Spoken Language Translation System
</SectionTitle>
      <Paragraph position="0"> Automatic translation of spoken natural language certainly represents one of the &amp;quot;grand challenges&amp;quot; \[79\] of speech and natural language technology, as well as a long-term opportunity for advanced speech technology.</Paragraph>
      <Paragraph position="1"> Applications of military relevance include: automatic interpreters for multi-language meetings, NATO field communications, a translating telephone, and translation for cooperative space exploration activities. The impact of automated spoken language translation would clearly be enormous; however, the problem is considerably more difficult than either voice-operated natural language dictation machines or machine translation of text, both of which are unsolved problems requiring much future research. It should be noted, however, that progress continues to be made in dictation systems \[7,61\]; and new initiatives in machine translation of text are being proposed and developed \[54\], including application of the powerful statistical techniques \[23\] which have been successful in speech recognition.</Paragraph>
    </Section>
    <Section position="7" start_page="443" end_page="443" type="sub_section">
      <SectionTitle>
6.3 Noise and Interference Suppression
</SectionTitle>
      <Paragraph position="0"> The state-of-the-art in noise suppression is summarized in \[89\], which identifies a number of areas for further work in both algorithm development and in evaluation methods. In terms of recent approaches, a variety of combinations of recognition and noise suppression algorithms appear promising \[95,36,111\]. The suppression of co-channel talker interference is an even more difficult problem than noise suppression \[161,90,133\], and much work is needed to achieve effective suppression. Following the theme of integration of algorithm technologies, recent work has begun to apply speaker recognition technology to the co-channel interference suppression problem \[161,162\].</Paragraph>
    </Section>
  </Section>
  <Section position="12" start_page="443" end_page="444" type="metho">
    <SectionTitle>
6 Problem Areas for Research
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="443" end_page="443" type="sub_section">
      <SectionTitle>
6.1 Introduction and Summary
</SectionTitle>
      <Paragraph position="0"> The Beek, Neuburg, and Hodge paper of 1977 \[10\] concludes with an impressive list of unsolved problems, particularly in the area of automatic speech recognition. The current situation can (perhaps aptly) be summarized by adapting a popular phrase: &amp;quot;You've come a long way, baby, but you've still got a long way to go!&amp;quot;. Despite all the progress, much research remains to be done before large-vocabulary continuous speech recognition crosses a threshold of performance sufficient for common use in applications. In other speech technology areas, some real military applications are either at hand or close, but still further research and development efforts are needed to achieve sufficient performance for many other applications. This section briefly identifies a number of problem areas for research, with a focus on directing attention to references where problems or progress are described in more detail.</Paragraph>
      <Paragraph position="1"> A theme sometimes observed in current work, which appears likely to produce significant progress and should be encouraged, is the integration of speech algorithm technologies. For example, speech recognition techniques are applied to speech coding to achieve lower bit rates; and speaker recognition techniques may be integrated with speech coders or speech recognizers to improve robustness of performance across different speakers.</Paragraph>
    </Section>
    <Section position="2" start_page="443" end_page="443" type="sub_section">
      <SectionTitle>
6.2 Low-Rate Speech Coding
</SectionTitle>
      <Paragraph position="0"> Many of the problem areas in low-rate speech coding have already been summarized in earlier sections. At 2.4 kb/s there is a need to move beyond the LPC-10 algorithm toward toll quality speech across a variety of conditions. At lower rates (i.e., &lt; 800 b/s), improvements in vector quantization \[78\] and recognition-oriented techniques are needed to make systems effective for general use.</Paragraph>
    </Section>
    <Section position="3" start_page="443" end_page="443" type="sub_section">
      <SectionTitle>
6.4 Speech Recognition in Severe Environments
</SectionTitle>
      <Paragraph position="0"> Prior sections have pointed out both the difficulties and the potential benefits of achieving robust, high-performance speech recognition in severe environments such as fighter aircraft or military helicopters. The National Research Council study report \[38\] summarizes both the state-of-the-art and the research needed for automatic speech recognition in severe environments, as of 1984. Substantial progress has been made since that time, particularly in system development and evaluation on databases of speech collected under stress and noise \[114\], in application of HMM techniques to robust speech recognition \[92,100,101,149\], and in acoustic-phonetic analysis and compensation for effects of stress and noise \[64,130,131\]. A number of recent efforts have focussed specifically on compensating for acoustic noise in the HMM recognizer \[57,58,144,145\]. However, this work has generally been performed for severe conditions which are simulated in the laboratory, and has achieved best performance for isolated-word recognition. Much work remains to achieve high-performance, continuous speech recognition under severe operational conditions; an essential, though costly, requirement for achieving progress in this area is a continuing program of data collection and speech recognizer testing in real (e.g., fighter or helicopter) military environments.</Paragraph>
    </Section>
    <Section position="4" start_page="443" end_page="444" type="sub_section">
      <SectionTitle>
6.5 Large-Vocabulary Continuous
Speech Recognition
</SectionTitle>
      <Paragraph position="0"> There has been a great deal of effort and much progress in the area of large-vocabulary continuous speech recognition in recent years \[5,31,32,33,34\]. But substantial improvements in performance are still needed before such systems achieve high enough accuracy to be usable in practical applications \[79\]. For example, a February 1989 evaluation \[97\] of a number of state-of-the-art systems on the 1000-word, perplexity-60 DARPA resource management task yielded the following best results: (1) speaker-dependent: word error rate 3.1%, sentence error rate 21.0%; (2) speaker-independent: word error rate 6.1%, sentence error rate 34.3%.</Paragraph>
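Word error rates like those quoted above are conventionally computed from a word-level minimum edit distance between the reference transcription and the recognizer's hypothesis; a standard dynamic-programming sketch (generic, not the scoring software used in the cited evaluation):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i-1][j-1] + (ref[i-1] != hyp[j-1])
            d[i][j] = min(sub, d[i-1][j] + 1, d[i][j-1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that because insertions count as errors, WER can exceed 100%, and sentence error rate (any error in the sentence) is always at least as large as would be suggested by the word-level figure, which is why the sentence error rates above are several times the word error rates.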
      <Paragraph position="1"> (Perplexity is a measure of the recognition task difficulty, and is defined as the probabilistically-weighted geometric mean branching factor of the language (see, e.g., \[69\], pp. 145-146)). For a 5000-word, perplexity-93 task, recent systems have achieved a speaker-dependent word error rate of 11.0% \[5\]. For an aggressive (but not unrealistic for application requirements) goal, such as 95% speaker-independent sentence recognition for a 5,000-word vocabulary system, it is clear that an order-of-magnitude improvement in word error rate is needed. Some potential sources of improvement, where research is needed, include \[79\]: better signal representation, better modelling of linguistic units, and better parameter estimation. Recent efforts in phonetic classification using neural networks \[71\], and in combining neural network pattern classifiers with HMM techniques \[22,40,56,72,160\], offer promise in these areas. Additional system performance improvements should be achieved by improved language modelling, and by the integration of speech recognition and natural language processing systems \[34\], as discussed further below.</Paragraph>
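The geometric-mean branching factor defined above is equivalently 2 raised to the average negative log2 probability that the language model assigns to the test words; a small sketch of the computation (illustrative only, assuming per-word model probabilities are already available):

```python
import math

def perplexity(word_probs):
    """Perplexity of a test sequence given the model's per-word
    probabilities: the geometric mean of 1/p, computed as
    2 ** (average negative log2 probability)."""
    n = len(word_probs)
    entropy = -sum(math.log2(p) for p in word_probs) / n
    return 2 ** entropy
```

For instance, a language model that always faces a uniform choice among 60 continuations assigns each word probability 1/60 and has perplexity 60, matching the perplexity-60 characterization of the resource management task.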
    </Section>
    <Section position="5" start_page="444" end_page="444" type="sub_section">
      <SectionTitle>
6.6 Natural Language Processing Technology
</SectionTitle>
      <Paragraph position="0"> Advanced systems involving interaction of people with computers using spoken language will clearly require substantial advances in natural language processing. A summary of the state-of-the-art, major challenges, and research opportunities in natural language processing is presented in \[151\]. A major &amp;quot;grand challenge&amp;quot; cited for natural language processing is support for natural, interactive dialogue; this challenge, which must be addressed even for textual natural language input, is clearly a pressing and more difficult challenge for spoken natural language input.</Paragraph>
    </Section>
    <Section position="6" start_page="444" end_page="444" type="sub_section">
      <SectionTitle>
6.7 Integration of Speech Recognition
and Natural Language Processing
Systems
</SectionTitle>
      <Paragraph position="0"> The integration of speech recognition and natural language processing systems to form interactive spoken language systems is a key focus for current \[34\] and future \[79,151\] research, needed to develop practical systems for complex military (and other) applications. Specific recent efforts which indicate promising directions for research and development in integration of speech recognition and natural language processing are described in \[86,102,124,163\].</Paragraph>
    </Section>
    <Section position="7" start_page="444" end_page="444" type="sub_section">
      <SectionTitle>
6.8 Human Factors
</SectionTitle>
      <Paragraph position="0"> Human factors integration is an often neglected, but crucial, element in realizing the benefits of speech technology. Major areas of concern, summarized in \[38\], include task selection, dialogue design, user characteristics, speech display design, task environment, and overall system performance assessment. Recent progress and research directions in the specific area of person/machine dialogues are covered rather broadly in \[141\].</Paragraph>
      <Paragraph position="1"> An excellent controlled study of the use of voice dialogues to accomplish specific tasks, illustrating the effectiveness of voice as compared with other modalities (e.g., typing, writing), is described in \[25\]. Experiments indicating the capability of people to restrict themselves to limited vocabularies in task-oriented dialogues are described in \[26\]. In general, more attention to human factors and dialogue issues is needed, and speech system developers can benefit significantly from the results of studies such as these, and from other studies cited in \[141\].</Paragraph>
    </Section>
    <Section position="8" start_page="444" end_page="444" type="sub_section">
      <SectionTitle>
6.9 Speaker Recognition and Verification
</SectionTitle>
      <Paragraph position="0"> An excellent summary of the state-of-the-art and of applications in speaker recognition and verification, as of 1985, is provided in \[35\]. Some promising recent efforts on the challenging problem of text-independent speaker recognition are described in \[45,117\]. An additional important current (and projected future) research thrust \[79\] is the application of speaker recognition techniques to adapt speech recognizers to speaker characteristics, and hence to improve speech recognition performance.</Paragraph>
    </Section>
  </Section>
</Paper>