File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/02/w02-1714_metho.xml
Size: 9,680 bytes
Last Modified: 2025-10-06 14:08:10
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1714"> <Title>XiSTS - XML in Speech Technology Systems</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. REFLEX </SectionTitle> <Paragraph position="0"> REFLEX is a generic, language-independent application that allows for the rapid design and construction of syllable lexicons for any language. Other research aimed at broadening the scope of the lexicon across languages has focused mainly on the development of multilingual lexicons. One such project, PolyLex (Cahill &amp; Gazdar, 1999), captures commonalities across related languages using hierarchical inheritance mechanisms. The main concern of the work presented here, however, is to provide generic, reusable tools that facilitate the development and testing of phonological systems, rather than to create such multilingual lexicons.</Paragraph> <Paragraph position="1"> Work on phonological features and lexical description has either been carried out within this multilingual context (Tiberius &amp; Evans, 2000) or has concentrated on using a feature-based lexicon for comparison with features extracted from a sound signal (Reetz, 2000). By removing reference to specific languages and concentrating on providing mechanisms for lexical generation, REFLEX can generate a syllable lexicon for any language that can be adequately represented in a phonetic notation.</Paragraph> <Paragraph position="2"> Furthermore, because the output data are represented in XML, they are readily available for use and manipulation by external systems with minimal effort. All background processing is hidden; one deals only with the marked-up output, from which idiosyncratic, user-required structures can be rapidly generated.</Paragraph> <Paragraph position="3"> The REFLEX system outputs a feature-based syllable lexicon.
This lexicon is a valid XML document, meaning that it conforms to the REFLEX Document Type Definition (DTD). The DTD stipulates the structure, order and number of XML elements and attributes, modelling all potential syllable structures (e.g. V, CV, CVC, etc.).</Paragraph> <Paragraph position="4"> An example of a typical lexical entry, in this case for the syllable [So:n] and corresponding to the multilinear representation specified in Figure 5, is given below.</Paragraph> <Paragraph position="5"> Figure 5. Typical lexical entry in XML.
The syllable element shown has four children, described as follows: 1) a text child, in this case So:n, the SAMPA representation of the entire syllable; 2) an &lt;onset&gt; element whose type attribute denotes its position within the syllable, e.g. &lt;onset type=&quot;first&quot;&gt;, &lt;onset type=&quot;second&quot;&gt;, etc.; 3) a &lt;nucleus&gt; element and 4) a &lt;coda&gt; element, both defined similarly.</Paragraph> <Paragraph position="6"> Each of the syllable's child elements, &lt;onset&gt;, &lt;nucleus&gt; and &lt;coda&gt;, may have only one child element, &lt;segment&gt;, which tags the given phoneme. Its attribute list describes the phoneme's specification in terms of phonological features. It also has a duration attribute, whose value is derived from corpus analysis.</Paragraph> <Paragraph position="8"> REFLEX provides two methods by which syllables can be added to the lexicon. The first requires users to specify an input file of monosyllables represented in a phonetic notation, in this case SAMPA. The second enables the user to specify syllables in terms of phonemes, position and, if desired, a typical duration, by means of the GUI illustrated in Figure 6.</Paragraph> <Paragraph position="9"> Figure 6. REFLEX lexicographer interface.
Regardless of the input option chosen, new entries are added to the lexicon via a background process.
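The entry structure described above can be illustrated with a minimal Python sketch using the standard xml.etree.ElementTree module. It builds one syllable entry from phonemes given per position, plus an optional typical duration, mirroring the GUI input option; it does not reproduce the DATR-based background process. Because the DTD and Figure 5 are not reproduced here, the root attribute name lang, the feature attribute names and values in FEATURES, and the placing of each phoneme as the text of its segment element are hypothetical assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical IPA-like feature attributes: names and values are assumptions
# for illustration, not the actual REFLEX feature set.
FEATURES = {
    "S": {"voicing": "voiceless", "place": "postalveolar", "manner": "fricative"},
    "o:": {"height": "mid", "backness": "back", "rounding": "rounded", "length": "long"},
    "n": {"voicing": "voiced", "place": "alveolar", "manner": "nasal"},
}

# Values of the type attribute marking a phoneme's order within its position.
ORDINALS = ("first", "second", "third", "fourth")


def make_syllable(onset, nucleus, coda, durations=None):
    """Build one syllable element from phoneme lists for each position.

    durations optionally maps a phoneme to a typical duration (keyed by
    phoneme only to keep the sketch short); it is stored as an attribute.
    """
    durations = durations or {}
    syl = ET.Element("syllable")
    # Text child: the SAMPA form of the whole syllable, e.g. "So:n".
    syl.text = "".join(onset + nucleus + coda)
    for tag, phonemes in (("onset", onset), ("nucleus", nucleus), ("coda", coda)):
        for i, phoneme in enumerate(phonemes):
            # One position element per phoneme, typed by its order.
            slot = ET.SubElement(syl, tag, type=ORDINALS[i])
            # Exactly one segment child; its attributes are the features.
            segment = ET.SubElement(slot, "segment", dict(FEATURES[phoneme]))
            segment.text = phoneme  # assumption: phoneme stored as segment text
            if phoneme in durations:
                segment.set("duration", str(durations[phoneme]))
    return syl


# Root element; the attribute name "lang" is an assumption.
lexicon = ET.Element("lexicon", lang="English")
lexicon.append(make_syllable(["S"], ["o:"], ["n"]))
print(ET.tostring(lexicon, encoding="unicode"))
```

Serialising the tree with ET.tostring, as in the last line, yields the marked-up entry that other systems could consume.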
REFLEX uses DATR, a non-monotonic, inheritance-based lexical representation language (Evans &amp; Gazdar, 1996), to carry out this background process.</Paragraph> <Paragraph position="10"> DATR is used to define the phonological feature descriptions for a given language quickly and comprehensively.</Paragraph> <Paragraph position="11"> For details of how this can be achieved, see Cahill, Carson-Berndsen &amp; Gazdar (2000). Using DATR's inference mechanisms, REFLEX transforms the output into a valid XML document, yielding a phonological feature-based lexicon whose entries take the form shown in Figure 5.</Paragraph> <Paragraph position="12"> All syllable elements are enclosed within the root &lt;lexicon&gt; element, whose sole attribute specifies the lexicon's language.</Paragraph> <Paragraph position="13"> The REFLEX lexicon is a versatile tool with a number of potential applications in speech technology systems. The following sections illustrate how this syllable lexicon, by virtue of being marked up in XML, can contribute to both speech recognition and speech synthesis.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 LIPS and REFLEX </SectionTitle> <Paragraph position="0"> By allowing feature overlap constraints to be relaxed in the case of underspecified input, LIPS can produce a number of candidate syllables. In Figure 4 above, at the final transition, the automaton is expecting either an [m] or an [n]. The input, however, is underspecified: no feature distinguishing [m] from [n], or indeed from any other voiced nasal, is present. By allowing the overlap constraints for [m] and [n] to be relaxed, LIPS can consider both [So:n] and [So:m] to be candidate syllables for the utterance. Both candidate syllables are well-formed, adhering to the phonotactics of English; however, only one, [So:n], is an actual syllable of English.
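The lexicon check needed to filter LIPS candidates such as [So:n] and [So:m] can be approximated with a short Python sketch, again using xml.etree.ElementTree. It assumes a hypothetical layout: a lexicon root with a lang attribute, syllable elements whose text child is the SAMPA form, typed onset, nucleus and coda elements, and segment elements whose attributes are made-up feature names. ElementTree implements only a subset of XPath, so actual_syllables compares each syllable's text child in Python, while feature_search builds a path with chained attribute predicates. The function names are illustrative and not part of REFLEX.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature attributes; names and values are illustrative only.
FEATURES = {
    "S": {"voicing": "voiceless", "place": "postalveolar", "manner": "fricative"},
    "o:": {"voicing": "voiced", "manner": "vowel"},
    "n": {"voicing": "voiced", "place": "alveolar", "manner": "nasal"},
    "b": {"voicing": "voiced", "place": "labial", "manner": "plosive"},
    "I": {"voicing": "voiced", "manner": "vowel"},
    "t": {"voicing": "voiceless", "place": "alveolar", "manner": "plosive"},
}


def build_lexicon(entries):
    """Build a toy lexicon from (onset, nucleus, coda) phoneme lists
    (at most three phonemes per position in this sketch)."""
    lex = ET.Element("lexicon", lang="English")  # "lang" is an assumed name
    for onset, nucleus, coda in entries:
        syl = ET.SubElement(lex, "syllable")
        syl.text = "".join(onset + nucleus + coda)  # SAMPA text child
        for tag, phonemes in (("onset", onset), ("nucleus", nucleus), ("coda", coda)):
            for ordinal, ph in zip(("first", "second", "third"), phonemes):
                slot = ET.SubElement(syl, tag, type=ordinal)
                ET.SubElement(slot, "segment", dict(FEATURES[ph])).text = ph
    return lex


def actual_syllables(lex, candidates):
    """Keep the candidates that equal the text child of some syllable."""
    known = {syl.text for syl in lex.iterfind("syllable")}
    return [c for c in candidates if c in known]


def feature_search(lex, position, ordinal, **features):
    """Return syllables whose given slot holds a segment with all features."""
    # Chained attribute predicates, e.g. [@voicing='voiced'][@place='labial'].
    preds = "".join(f"[@{name}='{value}']" for name, value in features.items())
    path = f"{position}[@type='{ordinal}']/segment{preds}"
    return [syl.text for syl in lex.iterfind("syllable") if syl.find(path) is not None]


lex = build_lexicon([(["S"], ["o:"], ["n"]), (["b"], ["I"], ["t"])])
print(actual_syllables(lex, ["So:n", "So:m"]))  # ['So:n']
print(feature_search(lex, "onset", "first",
                     voicing="voiced", place="labial", manner="plosive"))  # ['bIt']
```

With a full XPath 1.0 engine such as lxml, the text comparison could instead be expressed as a single query over the syllable elements.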
At this point, therefore, a lexicon providing good coverage of the language should reject [So:m] and accept [So:n]. To achieve this, REFLEX makes use of the XPath specification (a means of locating nodes in an XML document): it formulates a query and applies it to the syllable lexicon, checking the value of the text child of each syllable element in the document against each candidate syllable output by LIPS. Any matches returned are therefore not only well-formed but are deemed to be actual syllables. In this case, the lexicon is searched and the syllable [So:n] is recognised. The REFLEX search capability can also be extended to the feature level: users can search the lexicon for syllables that contain specific features in certain positions, e.g. syllables that contain a voiced labial plosive in the first onset position. Again, REFLEX forms an XPath expression and queries the lexicon, returning all matches. REFLEX also functions as a knowledge source for the T-REX system.</Paragraph> <Paragraph position="1"> This system is responsible for mapping output from the lexicon into syllable representations using different feature sets, e.g. features from other phonologies, and is discussed below in the context of speech synthesis.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5. T-REX </SectionTitle> <Paragraph position="0"> The role of this module is to enable lexicographers and speech scientists to generate, via a transduction process, syllable lexicons based on different phonological feature sets. The default feature set employed by REFLEX is based on IPA-like features. However, T-REX provides a GUI that allows lexicographers to define mappings from phonemes to feature attributes. Given this functionality, T-REX serves as a testbed for investigating the merits of different feature sets in the context of speech synthesis.
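The transduction step can be sketched in Python as follows, assuming each segment element carries its phoneme as text and that the user's mapping (the hypothetical NEW_FEATURES, written with SPE-style binary features purely for illustration) gives a complete feature set for every phoneme. The sketch copies the default lexicon and replaces only the feature attributes of each segment, keeping the syllable structure, the SAMPA text and any duration attribute. It shows the idea rather than T-REX's actual implementation.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical user-defined mapping from SAMPA phonemes to a new feature set
# (SPE-style binary features, chosen only for illustration).
NEW_FEATURES = {
    "S": {"consonantal": "+", "sonorant": "-", "coronal": "+", "anterior": "-",
          "continuant": "+", "strident": "+", "voice": "-"},
    "o:": {"consonantal": "-", "sonorant": "+", "high": "-", "low": "-",
           "back": "+", "round": "+"},
    "n": {"consonantal": "+", "sonorant": "+", "coronal": "+", "anterior": "+",
          "nasal": "+", "voice": "+"},
}


def transduce(default_lex, mapping):
    """Return a new lexicon whose segments carry the features in mapping.

    Syllable structure, SAMPA text and any duration attribute are kept;
    only the feature attributes of each segment are replaced.
    """
    new_lex = copy.deepcopy(default_lex)  # leave the default lexicon untouched
    for seg in new_lex.iter("segment"):
        duration = seg.get("duration")
        seg.attrib.clear()
        for name, value in mapping[seg.text].items():  # seg.text is the phoneme
            seg.set(name, value)
        if duration is not None:
            seg.set("duration", duration)
    return new_lex


# A one-syllable default lexicon with hypothetical IPA-like attributes.
default = ET.Element("lexicon", lang="English")
syl = ET.SubElement(default, "syllable")
syl.text = "So:n"
for tag, ph, feats in [
    ("onset", "S", {"place": "postalveolar", "manner": "fricative"}),
    ("nucleus", "o:", {"backness": "back", "rounding": "rounded"}),
    ("coda", "n", {"place": "alveolar", "manner": "nasal"}),
]:
    slot = ET.SubElement(syl, tag, type="first")
    ET.SubElement(slot, "segment", feats).text = ph

new = transduce(default, NEW_FEATURES)
print(new.find("syllable/onset/segment").attrib)
```

Copying the tree rather than editing it in place keeps the default lexicon available, so several transduced lexicons can be generated from it and compared.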
Different lexicons are generated by associating new feature sets with the same phonetic alphabet (SAMPA) via a GUI. The new lexicon is then transduced by T-REX, which maps all syllable entries from the default lexicon (with IPA-like features) to the new lexicon, applying the features input by the user to their associated phonemes. To illustrate this, we return to our sample syllable, [So:n]. Figure 2 above shows the lexical representation of [So:n] using IPA-like features. Figure 7 below shows new features being associated with the phoneme [S].</Paragraph> <Paragraph position="1"> Figure 7. GUI for T-REX.
Similarly, new features are associated with the remaining phonemes, [o:] and [n], and indeed with the rest of the SAMPA alphabet. On completion, the user initiates the transduction process and a new lexicon is produced. The XML representation of the phoneme [S] in the new lexicon is depicted in Figure 8.</Paragraph> <Paragraph position="2"> Note how the feature attributes differ from those in the default lexicon.</Paragraph> <Paragraph position="3"> Figure 8. Phoneme with transduced features.
The advantage of this transduction capability is that numerous lexicons can be rapidly developed and used to investigate the appropriateness of specific formal models of phonological representation for the purposes of speech synthesis.</Paragraph> <Paragraph position="4"> Furthermore, the same computational phonological model, i.e. the Time Map model, can be employed with each feature set. Bohan et al. (2001) describe how the phonotactic automaton is used to generate a multilinear event representation of overlap and precedence constraints for an utterance, which is then mapped to control parameters of the HLsyn (Sensimetrics Corporation) synthesis engine. Different feature sets can be evaluated by assessing how they influence the various control parameters of the HLsyn engine and the quality of the synthesised speech.</Paragraph> </Section> </Paper>