Identification of Non-Linguistic Speech Features

INTRODUCTION

As speech recognition technology advances, so do the aims of system designers and the prospects of potential applications. One of the main efforts underway in the community is the development of speaker-independent, task-independent, large-vocabulary speech recognizers that can easily be adapted to new tasks. It is becoming apparent that many portability issues may depend more on the specification of the task and on ergonomics than on the performance of the speech recognition component itself. The acceptance of speech technology in the world at large will depend on how well the technology can be integrated into systems that simplify the lives of their users. This in turn means that the service provided by such a system must be easy to use and as fast as other means of providing the same service, such as a human operator.

While the focus has been on improving the performance of speech recognizers, it is also of interest to identify what we refer to as "non-linguistic" speech features present in the acoustic signal. For example, one can envision applications where a spoken query must be recognized without prior knowledge of the language being spoken. This is the case for information centers in public places, such as train stations and airports, where the language may change from one user to the next. It is possible to automatically identify the language being spoken and to respond appropriately.

Other applications, such as financial or banking transactions, or access to confidential information such as financial, medical, or insurance records, require accurate identification or verification of the user. Typically, security is provided by a human who "recognizes" the voice of a client he is used to dealing with (often also confirmed by fax), or, for automated systems, by cards and/or codes that must be provided in order to access the data. With the widespread use of telephones, and the new payment and information retrieval services offered by telephone, it is a logical extension to explore the use of speech for user identification. One advantage is that if text-independent speaker verification techniques are used, the speaker's identity can be continually verified during the transaction, in a manner completely transparent to the user. This can avoid the problems caused by theft or duplication of cards, or by playback of the user's voice recorded during an earlier transaction.

With these prospective applications in mind, this paper presents a unified approach to identifying non-linguistic speech features, such as the language being spoken and the identity or sex of the speaker, using phone-based acoustic likelihoods. The basic idea is similar to that of using sex-dependent models for recognition, except that the output is not the recognized string but the characteristic associated with the model set having the highest likelihood.
This approach has been evaluated for French/English language identification, and for speaker and sex identification in both languages.
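To make the decision rule concrete, the following minimal sketch (not from the paper; it substitutes single diagonal-covariance Gaussians for the phone-based models described above, and all names, dimensions, and parameters are illustrative) scores an utterance under one acoustic model per candidate characteristic and returns the label of the model with the highest likelihood.

    import numpy as np

    def log_likelihood(frames, mean, var):
        # Total log-likelihood of the acoustic frames under a diagonal-covariance Gaussian.
        diff = frames - mean
        per_frame = -0.5 * (np.log(2.0 * np.pi * var) + diff ** 2 / var).sum(axis=1)
        return per_frame.sum()

    def identify(frames, model_sets):
        # Score the utterance with each candidate model set and return the best label.
        scores = {label: log_likelihood(frames, m["mean"], m["var"])
                  for label, m in model_sets.items()}
        return max(scores, key=scores.get), scores

    # Toy demonstration with two hypothetical "language" models over 12-dimensional frames.
    rng = np.random.default_rng(0)
    models = {
        "French":  {"mean": np.zeros(12),      "var": np.ones(12)},
        "English": {"mean": 0.5 * np.ones(12), "var": np.ones(12)},
    }
    utterance = rng.normal(0.5, 1.0, size=(200, 12))  # frames lying closer to the "English" model
    best, all_scores = identify(utterance, models)
    print(best)  # expected: "English"

In the actual system, the per-class scores would come from running the phone-based recognizer with each language-, sex-, or speaker-specific model set over the same utterance; only the maximum-likelihood decision rule is carried over unchanged in this sketch.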