<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1057"> <Title>One DODer's View of ARPA Spoken Language Directions</Title> <Section position="2" start_page="0" end_page="297" type="metho"> <SectionTitle> 2. Tech Transfer vs Research </SectionTitle> <Paragraph position="0"> As we have long recognized, there are several ways to improve performance on a task. One is by improving the underlying science and algorithms. Another is taking advantage of natural structural constraints that are common to many tasks. But equally important is tailoring the system to use constraints and biases unique to the specific task. To clarify this view, consider an application like the ATIS speech understanding task: Finding features that provide a better performing talker independent system is an algorithmic improvement to the acoustic recognizer. Using knowledge of the prior discourse and likely new queries to guide the acoustic recognizer is a natural stn~ctural constraint. Designing a talker specific system because only a small set of talkers will use the system is tailoring based upon task specific biases.</Paragraph> <Paragraph position="1"> I am deeply interested in task specific tailoring, and I don't care whether the information that guides the process comes from a speech or some totally unique task specific bias.</Paragraph> <Paragraph position="2"> To advance speech research we work on general problems, like ATIS and WSJ, which must be constructed with extraordinary caution to avoid bias. But, I love biases, as long as we know about them and can take advantage of them.</Paragraph> <Paragraph position="3"> And my experience is that most problems have biases that may be exploited once they are known. Perhaps some of our work should attack natural or real tasks with a no-holds-hatred approach.</Paragraph> <Paragraph position="4"> Does this mean that speech researchers should give up their careers and become system developers? - Certainly not. But, at this point in time, throwing a speech algorithm over the fence to the system developer does not work.</Paragraph> <Paragraph position="5"> Some percentage of the speech knowledgeable workers will either have to become system developers or must establish a rapport with the system developer that is a top priority of their work. In the current state of development, only by tailoring the algorithm to the constraints of the problem will speech technology become cost effective.</Paragraph> <Paragraph position="6"> soon? I'm skeptical. Considerable work has been done to improve underlying speech processing systems by using natural constraints such as statistical grammars. We need to do more. I do not know whether systems with little or no senmntic or higher level knowledge can ever give acceptable performance on a real application like WSJ. Worse than that, I don't know how to measure the performance limits imposed by this handicap.</Paragraph> <Paragraph position="7"> The alternative is not to live with the handicap, but to begin to introduce higher level knowledge, perhaps in a fragmentary manner.</Paragraph> <Paragraph position="8"> Indeed, some researchers are working to constrain lower level processes by using structure based on higher level considerations. Since I believe that this will prove to be very important, I would like to encourage this work. 
</Section>
<Section position="4" start_page="297" end_page="298" type="metho">
<SectionTitle> 4. Improving the Basic System </SectionTitle>
<Paragraph position="0"> Since everyone is continually trying to improve their system, you might think that there would be little new to surface here. But this is not the case. There hasn't been enough time, enough money or enough manpower to do as thorough a job as we would like.</Paragraph>
<Paragraph position="1"> Improving the understanding of our algorithms is always a useful activity. I strongly suspect that many of the research elements present here do not understand the relative contributions and dependencies intrinsic to each of their system components as well as they believe they do.</Paragraph>
<Paragraph position="2"> Discovering these dependencies is an evolutionary process. In most cases, many diagnostic tests that could be run to gain insight into the system have never been done. - At a minimum, they have never been reported.</Paragraph>
<Paragraph position="3"> One aspect of system performance that is not often measured is consistency across talkers, channel environments, etc. Although we look at changes in word recognition performance, how often do we measure the consistency of our recognizers at the subword level? We can learn more from our experiments and our data bases, if we are willing to make the effort. There are uncontrolled variables, the effect of which could be measured. For instance, we could use channel simulation to gauge the effect of channel differences. As another example, consider the WSJ task. There are great disparities of performance from talker to talker. If we reused our test data and restructured our test to reflect this &quot;great divide&quot;, we would gain new insight. - Are there other ideas for getting more milk from these data? We should explore new testing paradigms. It would be useful to know how our systems differed from human performance. It should be possible to do psychophysical measurements on our systems and compare them to human psychophysics. As one simple experiment we could compare human and system performance using diagnostic rhyme, or nonsense syllable tests. This can be looked on as a contrastive test of human versus acoustic recognizer performance with higher level knowledge denied both humans and machines.</Paragraph>
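As one concrete way to measure the consistency discussed above, the following sketch (again in Python) computes a per-talker word error rate and its spread from a set of scored utterances. The alignment is a plain word-level edit distance, the result triples are placeholders, and no particular scoring format or system is assumed.

# Sketch: per-talker consistency of a recognizer, measured as the spread of
# word error rates across talkers.  The scored utterances below are placeholders.
from collections import defaultdict
from statistics import mean, stdev

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)]

def per_talker_wer(results):
    """Group scored utterances by talker and return each talker's WER."""
    errors, words = defaultdict(int), defaultdict(int)
    for talker, ref, hyp in results:
        ref_words, hyp_words = ref.split(), hyp.split()
        errors[talker] += edit_distance(ref_words, hyp_words)
        words[talker] += len(ref_words)
    return {t: errors[t] / words[t] for t in errors}

if __name__ == "__main__":
    # (talker, reference, hypothesis) triples; in practice read from test logs.
    results = [
        ("t01", "show me flights to boston", "show me flights to boston"),
        ("t01", "list the fares from dallas", "list the fares from dallas"),
        ("t02", "show me flights to boston", "show flights to austin"),
        ("t02", "list the fares from dallas", "list the airfares from dallas"),
    ]
    wer = per_talker_wer(results)
    rates = sorted(wer.values())
    spread = stdev(rates) if len(rates) > 1 else 0.0
    for talker in sorted(wer):
        print(f"{talker}: WER = {wer[talker]:.1%}")
    print(f"mean = {mean(rates):.1%}, spread (s.d.) = {spread:.1%}, "
          f"range = {rates[-1] - rates[0]:.1%}")

The same grouping could be done by channel condition instead of talker, or at the subword level by aligning phone strings rather than words; the point is simply that the spread, and not just the average, gets reported.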
<Paragraph position="4"> A more dramatic experimental change would be to run our systems with an &quot;infinite&quot; corpus of data. That is, we would devote a portion of our energy to testing our systems on a continuing, day to day basis on a realistic problem. We would get experience with how problems change with time, and undoubtedly improve portability. This would also be a good scenario for having the speech systems perform certain levels of self diagnosis. The machine could tell us when discontinuities in the data occurred, and could be structured to assist in learning the necessary repairs. One advantage of this testing paradigm is that we would test our systems on orders of magnitude more data than we do in the normal static test mode, thereby obtaining more exposure to low probability events that could be saved for further study and system updating.</Paragraph>
<Paragraph position="5"> In the infinite data paradigm our view of training would change drastically. We would be blessed by having much more data available for training, and cursed by having less information about the data. From my point of view, it would be extremely beneficial to see the ingenious mechanisms that would evolve to cope with and take advantage of this situation.</Paragraph>
</Section>
<Section position="5" start_page="298" end_page="299" type="metho">
<SectionTitle> 5. Concluding Remark </SectionTitle>
<Paragraph position="0"> I would like to hear a serious discussion of what we might do differently and what the benefits would be. A related issue that should be seriously addressed is what can be done to encourage more diversity of approach, more risk taking, and, consequently, more innovation.</Paragraph>
</Section>
</Paper>