<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-1020">
  <Title>GOATS TO SHEEP: CAN RECOGNITION RATE BE IMPROVED FOR POOR TANGORA SPEAKERS?</Title>
  <Section position="4" start_page="145" end_page="146" type="metho">
    <SectionTitle>
METHOD
</SectionTitle>
    <Paragraph position="0"> Twelve users (six males, six females) participated in 21 sessions each. Their task was to produce speech which would be recognized by TANGORA with a high degree of accuracy. To this end, users were encouraged to experiment with their speaking style and to use the feedback provided by recognition errors to shape their speaking style. In the first session, users were given a basic explanation of how their speech would be recognized by the TANGORA system. The importance of clear and consistent speech was stressed. In addition, they were given 30 minutes of experience talking in isolated-word mode with another user's model.</Paragraph>
    <Paragraph position="1"> The remaining 20 sessions consisted of four iterations of a five-session sequence (see Figure 1). The first session in each sequence was devoted to training the system. The user read aloud, in isolated-word mode, a 2400-word (171-sentence) text. A model of the speaker's voice was computed from the speech sample collected during each of these training sessions. Each session lasted approximately one hour.</Paragraph>
    <Paragraph position="2">  Order of sessions 15 &amp; 16 and 20 &amp; 21 was counter-balanced across users.</Paragraph>
    <Paragraph position="3"> In the first two weeks of the study, the speaker model which resulted from a training session was used by TANGORA to decode the speech produced during the following four sessions in the five-session sequence. In each of the final two weeks, the newly created speaker model was used in the next three sessions only. The fourth session was decoded against a model generated during an earlier week, as described below. Two practice sessions followed a training session. Users were given lists of 20 unrelated sentences, selected from a corpus of office correspondence, to read aloud as input to the system. They were instructed to experiment with their speaking style and to try to develop a style which would be successfully recognized by TANGORA. In order to facilitate this process, users immediately re-read a sentence if it was not perfectly recognized, up to a total of four times. They attempted to use the feedback from misrecognized words to selectively modify their speaking styles. The final two sessions in each sequence were devoted to tests: users were given lists of 40 or 50 sentences (also office correspondence) to read aloud to the system and were instructed to use what they had determined to be a &quot;good&quot; speaking style in an effort to produce perfect recognition. They read each sentence only once. It should be noted that all words in both the practice and the test sentences were included in TANGORA's vocabulary and that both practice and test sentence sets were carefully controlled for sentence length and perplexity.</Paragraph>
    <Paragraph position="4"> Prior to the second, third and fourth training sessions, the experimenter analyzed each user's performance and generated hypotheses about speech habits that may have caused the user to be poorly recognized by TANGORA. These hypotheses were described to the user, and suggestions were made on how the user might modify his or her speaking style.</Paragraph>
    <Paragraph position="5"> In order to determine whether re-training the system would improve recognition accuracy, decoding of each user's speech was done against both the current and an older speaker model during weeks 3 and 4. Thus, during the third week, users completed one test session with the newly generated speaker model and one with the model generated at the beginning of the first week. Similarly, the model from the fourth week was compared against the one generated during the second week. If training (which includes the effect of practice) rather than practice alone is the means whereby accuracy is improved, then the following results should be obtained: (1) accuracy during the third and fourth weeks should be better with each week's current model than with the model which had been generated two weeks earlier, and (2) accuracy during the third week with the third week's model should be better than accuracy during the first week with the first week's model and, similarly, accuracy during the fourth week with the fourth week's model should be better than accuracy during the second week with the second week's model.</Paragraph>
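    <Paragraph position="6"> The session layout described above can be summarized in a short sketch. This is an illustrative reconstruction only, not part of the original study materials: the session numbers are inferred from the text and from the Figure 1 note (test sessions 15, 16, 20 and 21), and the function name, field names and the old_model_first flag (standing in for the counter-balanced order of the two test sessions) are hypothetical.

```python
# Illustrative reconstruction of the 21-session schedule (assumed numbering).
# Session 1 is the orientation; each of the four weeks is a five-session
# sequence: training, practice, practice, test, test.

def build_schedule(old_model_first=False):
    """Return one dict per session, numbered 1 through 21.

    model_week is the week whose speaker model decodes the session, or None
    for the orientation (another user's model) and for training sessions
    (speech is collected to build that week's model).
    """
    schedule = [{"session": 1, "week": 0, "kind": "orientation", "model_week": None}]
    kinds = ["training", "practice", "practice", "test", "test"]
    # Hypothetical flag: which of the two test slots gets the older model.
    old_slot = 3 if old_model_first else 4
    session = 2
    for week in range(1, 5):
        for slot, kind in enumerate(kinds):
            if kind == "training":
                model_week = None
            elif week in (3, 4) and slot == old_slot:
                # Weeks 3 and 4: one test session uses the model from two weeks earlier.
                model_week = week - 2
            else:
                model_week = week
            schedule.append(
                {"session": session, "week": week, "kind": kind, "model_week": model_week}
            )
            session += 1
    return schedule


if __name__ == "__main__":
    for row in build_schedule():
        print(row)
```

    Under these assumptions, prediction (1) compares the two test sessions within week 3 (or week 4) that differ only in model_week, while prediction (2) compares current-model test sessions across weeks 1 and 3 (or 2 and 4).</Paragraph>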
  </Section>
</Paper>