<?xml version="1.0" standalone="yes"?> <Paper uid="C82-2037"> <Title>A PROCEDURE OF AN AUTOMATIC GRAPHEME-TO-PHONEME TRANSFORMATION OF GERMAN</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> A PROCEDURE OF AN AUTOMATIC GRAPHEME-TO-PHONEME TRANSFORMATION OF GERMAN </SectionTitle> <Paragraph position="0"> Sabine Koch, Wolfgang Menzel, Ingrid Starke (Zentralinstitut für Sprachwissenschaft, AdW DDR, Berlin, DDR). The automatic transformation of graphemically stored texts into the corresponding phonemic symbols will enable the speech synthesizer Rosy 4000 (developed by VEB Robotron Dresden) to extend its field of application, e.g. to information systems and to reading machines for the blind. The texts in such applications cannot be restricted in any way, a fact that had to be taken into account when choosing suitable methods for the procedure. The dictionary method, i.e. storing the entire required vocabulary together with the corresponding phonemic strings, was therefore ruled out.</Paragraph> <Paragraph position="1"> The procedure presented here can be briefly characterized as a rule system. The transformation operates on the level of word forms and does not take syntactic or semantic criteria into account.</Paragraph> <Paragraph position="2"> An important part of the procedure is the analysis of the structure of word forms. The results of this analysis largely determine whether the intended high quality of the transformation is achieved.</Paragraph> <Paragraph position="3"> Given the aim of transforming unrestricted texts, the problem of automatically identifying the boundaries between the elements of compounds could not be solved.
As a correct phonemic transformation requires knowledge of these boundaries, all compounds are split by hand when the input text is stored.</Paragraph> <Paragraph position="4"> The procedure identifies graphemic substrings in the word form to be transformed on the basis of a unique deterministic analysis, and it also checks whether the context of the substring or the state of the system fulfils special conditions. If these tests succeed, the substring is accepted, i.e. its phonemic transcription and its stress information are attached to it. In certain cases the transformation can be postponed to one of the following steps.</Paragraph> <Paragraph position="5"> The graphemic substrings are stored in the information part of the procedure together with the conditions and the results of the transformation. This information part, i.e. the linguistic part, is strictly separated from the algorithm, which was of great advantage when developing the procedure.</Paragraph> <Paragraph position="6"> The transformation is carried out in six stages, the most important of which are the analysis of the structure of word forms (the prefix and suffix strategy) and the transformation of graphemes by a set of rules.</Paragraph> <Paragraph position="7"> The analysis of the structure of word forms splits the word form under consideration into morphemes and marks the morpheme boundaries on the basis of lists containing prefixes and suffixes together with their phonemic realizations and stress information (marking of the stressed syllable or shifting of the stress to other syllables). These lists also contain exceptions.
The exceptions are substrings of certain word forms that are graphemically identical to an affix but differ from it in pronunciation, in stress, or in both.</Paragraph> <Paragraph position="8"> All parts of the word form that are not handled by the prefix or suffix strategy (normally the stem) are transformed by transformation rules.</Paragraph> <Paragraph position="10"> These are context-sensitive rules applied from left to right in a single pass through the word form. Some of the context conditions result from the word structure analysis: the marked morpheme boundaries influence the transformation of graphemic strings with respect to phenomena such as final devoicing, the glottal stop, and vowel length. Classes and subclasses of graphemes and phonemes (consonants, vowels, plosives, etc.) are also used as context conditions for an adequate transformation.</Paragraph> <Paragraph position="11"> The stress strategy, the last part of the procedure, fixes the main stress of the word form, taking into account the stress information supplied by the other strategies. There are three classes of preference: absolute stress information, conditional stress information (used if there is no absolute stress information), and stress information without preference (used if there is no conditional stress information either).</Paragraph> <Paragraph position="12"> For the remaining unstressed word forms, the main stress is fixed by stress patterns. These patterns handle the native German vocabulary without large lists of exceptions.
Most of the exceptions are foreign words.</Paragraph> <Paragraph position="13"> The first strategy, applied before these main parts of the procedure, is a look-up in a list of about 250 of the most frequent German word forms (articles, pronouns), which are transformed as a whole without running through all the strategies of the procedure. This immediate transformation saves a considerable amount of time.</Paragraph> <Paragraph position="14"> Furthermore, there is a list of about 60 homographs that could be transformed unambiguously only with the aid of syntactic or semantic criteria. The word forms in this list are likewise transformed immediately into the corresponding variants. The advantage of this method is that the subsequent parts of the procedure do not have to handle ambiguities.</Paragraph> <Paragraph position="15"> The paper will contain information on the number and kind of transformation errors. In general, the German vocabulary can be transformed correctly by regularities that are easy to formulate. Difficulties and a large number of exceptions to these regularities result from foreign words, which are very frequent in German. The transformation of foreign words cannot be excluded from the procedure, because they are often used in German and sometimes have no German equivalent, e.g. Ingenieur, Cello, Charta, Chaussee.</Paragraph> </Section></Paper>
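<!--
The stages described in the abstract above can be illustrated with a minimal Python sketch: whole-word look-up (frequent word forms and homographs), a prefix and suffix strategy with an exception list, a single left-to-right pass of context-sensitive rules over the stem (final devoicing and glottal stop conditioned on morpheme boundaries and a vowel class), and a stress strategy with three preference classes. Everything concrete here is a hypothetical assumption, not the data or implementation of the original system: the word lists, the rules, the SAMPA-like symbols, the function names (grapheme_to_phoneme, split_affixes, transform_stem, place_stress), and the default stem-stress pattern. Stress is marked before the stressed morpheme, and syllabification is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL toy data (SAMPA-like symbols); not the lists of the original system.
FREQUENT = {"und": "Unt", "der": "de:6"}          # whole-word look-up list
HOMOGRAPHS = {"modern": ("mo'dE6n", "'mo:d6n")}    # all readings are returned
# Affix lists: graphemes -> (phonemes, stress class or None for no stress info).
PREFIXES = {"ver": ("f6", None), "un": ("Un", "conditional")}
SUFFIXES = {"ung": ("UN", None), "lich": ("lIC", None), "tion": ("tsjo:n", "absolute")}
EXCEPTIONS = {"vers"}  # begins like the prefix "ver" but must not be split
# Context-sensitive stem rules (graphemes, phonemes, condition), longer matches first.
RULES = [
    ("st", "St", "initial"),  # morpheme-initial st -> [St]
    ("nk", "Nk", None),
    ("ei", "aI", None),
    ("b", "p", "final"),      # final devoicing at a morpheme boundary
    ("d", "t", "final"),
    ("g", "k", "final"),
    ("v", "f", None),
]
VOWELS = set("aeiouäöü")  # a grapheme class used as a context condition
RANK = {"absolute": 0, "conditional": 1, "plain": 2}  # lower rank = higher preference


@dataclass
class Morpheme:
    graphemes: str
    phonemes: str = ""
    stress: Optional[str] = None  # a key of RANK, or None if no stress information
    is_stem: bool = False


def transform_stem(text: str, followed_by_vowel: bool) -> str:
    """Apply RULES to one stem in a single left-to-right pass."""
    out, i = [], 0
    while i < len(text):
        for graph, phon, cond in RULES:
            end = i + len(graph)
            ok = (cond is None
                  or (cond == "initial" and i == 0)
                  # No devoicing before a vowel-initial suffix (e.g. "Landung").
                  or (cond == "final" and end == len(text) and not followed_by_vowel))
            if ok and text.startswith(graph, i):
                break
        else:
            graph, phon = text[i], text[i]  # default: grapheme maps to itself
        if i == 0 and text[0] in VOWELS:
            phon = "?" + phon  # glottal stop before a morpheme-initial vowel
        out.append(phon)
        i += len(graph)
    return "".join(out)


def split_affixes(word: str) -> list[Morpheme]:
    """Prefix and suffix strategy: strip at most one listed prefix and one suffix."""
    is_exception = word in EXCEPTIONS
    parts: list[Morpheme] = []
    suffix = None
    if not is_exception:
        for pre in sorted(PREFIXES, key=len, reverse=True):
            if word.startswith(pre) and len(word) > len(pre):
                parts.append(Morpheme(pre, *PREFIXES[pre]))
                word = word[len(pre):]
                break
        for suf in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suf) and len(word) > len(suf):
                suffix = Morpheme(suf, *SUFFIXES[suf])
                word = word[: -len(suf)]
                break
    parts.append(Morpheme(word, is_stem=True))
    if suffix is not None:
        parts.append(suffix)
    return parts


def place_stress(parts: list[Morpheme]) -> str:
    """Stress strategy: the best-ranked stress information wins, else a default."""
    marked = [m for m in parts if m.stress is not None]
    if marked:
        target = min(marked, key=lambda m: RANK[m.stress])
    else:
        # ASSUMED default pattern: stress the stem (native German root stress).
        target = next(m for m in parts if m.is_stem)
    # The ' mark precedes the stressed morpheme; syllabification is not modelled.
    return "".join(("'" if m is target else "") + m.phonemes for m in parts)


def grapheme_to_phoneme(word: str) -> str:
    word = word.lower()
    if word in FREQUENT:      # stage 1: frequent word forms, transformed as a whole
        return FREQUENT[word]
    if word in HOMOGRAPHS:    # stage 1: homographs, all variants returned
        return "/".join(HOMOGRAPHS[word])
    parts = split_affixes(word)
    for k, m in enumerate(parts):
        if m.is_stem:
            nxt = parts[k + 1].graphemes if k + 1 < len(parts) else ""
            m.phonemes = transform_stem(m.graphemes, nxt[:1] in VOWELS)
    return place_stress(parts)
```

For instance, grapheme_to_phoneme("Landung") returns "'landUN" (the d stays voiced before the vowel-initial suffix), whereas grapheme_to_phoneme("handlich") returns "'hantlIC". Keeping the lists and RULES apart from the functions mirrors the separation of the linguistic part from the algorithm described in the abstract.
-->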