File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/92/a92-1004_evalu.xml
Size: 4,388 bytes
Last Modified: 2025-10-06 14:00:02
<?xml version="1.0" standalone="yes"?>
<Paper uid="A92-1004">
<Title>A PARSER FOR REAL-TIME SPEECH SYNTHESIS OF CONVERSATIONAL TEXTS</Title>
<Section position="10" start_page="29" end_page="30" type="evalu">
<SectionTitle>4. EVALUATION OF PERFORMANCE</SectionTitle>
<Paragraph position="0"> Evaluation of the parser has involved two quite different forms of testing: a field trial and a laboratory evaluation.</Paragraph>
<Paragraph position="1"> First, the parser was implemented as a component in a version of the Bell Labs text-to-speech synthesizer (Olive and Liberman 1985). The synthesizer forms the core of a telecommunications system that ran for three months as a feature of TRS in California. Several thousand TDD texts were processed by the system. Although restrictions on confidentiality prevented us from collecting actual TDD text data, results of the field trial far surpassed expectations: disconnect rates for text-to-speech calls averaged less than 20%, and follow-up surveys indicated a high degree of interest in and acceptance of the technology.</Paragraph>
<Paragraph position="2"> A second type of testing, which has enabled us to focus on the parser, involves the collection of data from a questionnaire given to TDD users. Phrasing for these data was assigned manually by a linguist unfamiliar with the rules of the parser, to allow for comparison with the parser's output.</Paragraph>
<Paragraph position="3"> Several issues arise in comparing human judgements of phrasing with a phrase parser's output. One of the more ubiquitous is that of phrasal balancing. Apparently acting under rhythmic constraints, speakers tend to aim for equivalent numbers of stressed syllables on either side of a break. However, the incorporation of rhythm into phrasing varies from speaker to speaker, as well as being partially dependent on semantic intent. For example, the sentence so I feel there should be a better system to say bye, taken from our data, could be phrased as (a), (b), or (c):
(a) so I feel there should be || a better system to say bye
(b) so I feel || there should be || a better system to say bye
(c) so I feel || there should be a better system || to say bye
If the parser assigns, for example, the phrasing in (a) while the human judge assigns (b), this must be counted as qualitatively different from the parser's assignment of a misleading boundary, where the hearer's understanding of the import of the utterance is altered because of the erroneous boundary placement. An example of misleading boundary placement as assigned by the parser is given below, where the hearer is incorrectly led to interpret well as a modification of see, rather than as a discourse comment:
oh i see well || so i call my boss</Paragraph>
<Paragraph position="4"> In a similar vein, giving equal weight in an evaluation to the locations where pauses do and do not occur is misleading. The absence of a phrasal boundary between two words is much more common than the presence of a boundary, so that predicting the absence of a boundary is always safer and leads to inflated evaluation scores that make comparison of systems difficult. For example, in the (a) sentence above there are 12 potential prosodic events, one after each word. If a given system assigns no breaks in this sentence, and if non-events are given equal weight with events, then the system will get a score for this sentence of 91.6 percent, since it gets 11 of the 12 judgements right. Also, if a system assigns one break in this utterance but puts it in a clearly inappropriate place, say before the word bye, it will get a score of 83 percent, since it gets 10 of the 12 judgements right. While 83 percent sounds like a decent score for a system that must capture some subjective performance, this method of evaluation has completely failed to capture the fact that assigning an inappropriate prosodic break in this instance has completely misled the listener. Therefore we need to evaluate a phrasing system on the basis of positive occurrences of phrase boundaries only.</Paragraph>
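A minimal sketch of this scoring contrast, assuming 0-based word indices, the reference phrasing in (a), and illustrative helper names (none of this code is from the original system):

# Illustrative sketch of the evaluation argument above; the slot indexing
# and helper names are assumptions, not the paper's code.

WORDS = "so I feel there should be a better system to say bye".split()

# A phrasing is the set of word indices after which a break ("||") occurs.
# Reference (a): one break after "be" (index 5). There are 12 potential
# prosodic events, one after each word.
reference = {5}
n_slots = len(WORDS)  # 12

def all_slot_accuracy(predicted, reference, n_slots):
    """Score every slot equally, counting non-breaks as judgements too."""
    correct = sum((i in predicted) == (i in reference) for i in range(n_slots))
    return correct / n_slots

def boundary_precision_recall(predicted, reference):
    """Score positive occurrences of phrase boundaries only."""
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

no_breaks = set()    # system that predicts no boundaries at all
wrong_break = {10}   # one misleading break before "bye" (after "say")

for name, pred in [("no breaks", no_breaks), ("wrong break", wrong_break)]:
    acc = all_slot_accuracy(pred, reference, n_slots)
    p, r = boundary_precision_recall(pred, reference)
    print(f"{name}: accuracy={acc:.1%}, boundary precision={p:.1%}, recall={r:.1%}")

# Output:
#   no breaks: accuracy=91.7%, boundary precision=0.0%, recall=0.0%
#   wrong break: accuracy=83.3%, boundary precision=0.0%, recall=0.0%
# (11/12 and 10/12; the paper reports these as 91.6 and 83 percent.)

Both hypothetical systems look respectable under all-slot scoring, yet neither receives any credit for boundary placement, which is exactly the distinction argued for above.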
<Paragraph position="5"> Assigning phrases to TDD output is not a clear-cut task. The output is not intended to be spoken and, because of the device, it has telegraphic characteristics. In addition, many TDD users do not have standard spoken English at their command. Nevertheless, an effort</Paragraph>
</Section>
</Paper>