<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1611"> <Title>A Transcription Scheme for Languages Employing the Arabic Script Motivated by Speech Processing Application</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> This paper argues that the best way to represent data at the phonological/lexical level for language modeling and machine translation (MT) in languages that employ the Arabic script is a hybrid system. Such a system combines the information provided by the orthography with the vowels that the orthography does not represent. The proposed schemes can significantly aid speech-to-speech applications in several ways: (1) the internal pronunciations of the ASR and TTS components can employ the USCPron scheme; (2) the internal transcription of Persian for language modeling, statistical machine translation, and other purposes can employ the USCPers+ scheme; and (3) for a stand-alone TTS system, whose input is plain Persian text, automatic transliteration to the USCPers+ scheme, and hence to the pronunciation, can be generated with statistical language augmentation techniques based on prior model training, as we describe further in Georgiou (2004).</Paragraph> <Paragraph position="1"> This would ensure a uniqueness of representation that is not otherwise available. This paper has also suggested that a modification of IPA that allows the use of ASCII characters is a more convenient way to capture data for acoustic modeling and TTS. Persian data resources developed under the DARPA Babylon program have adopted the conventions described in this paper.</Paragraph> </Section> </Paper>