<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1714">
  <Title>XiSTS - XML in Speech Technology Systems</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
3. LIPS
</SectionTitle>
    <Paragraph position="0"> LIPS is a methodology that enables users to construct their own phonotactic automata for any language by means of a graphical user interface. Furthermore, LIPS employs an event logic, which enables it to map from absolute time to relative time and, in a novel approach to ASR, to carry out parsing at the phonological feature level. The system comprises two principal components, the network generator and the parser, which are outlined in the following subsections.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1. The Network Generator
</SectionTitle>
      <Paragraph position="0"> The network generator interface allows users to build their own phonotactic automata. Users input node values and select, from a list of feature overlap relations, those that a given arc is to represent. These relations can be selected from a default list of IPA-like features, or users can specify their own set. In this way, LIPS is feature-set independent. The network generator constructs feature-based networks, and parsing takes place at the feature level. Once the user has completed the network specification, the system generates an XML representation of the phonotactic automaton.</Paragraph>
      <Paragraph position="1"> An automaton representing a small subsection of the phonotactics of English is illustrated in Figure 2. It is clear from this automaton that English permits an [S] followed by an [r] in syllable-initial position, but not the reverse order.</Paragraph>
      <Paragraph position="2">  Figure 3 illustrates a subsection of the XML representation of the English phonotactics output by the network generator. A single arc with a single phoneme, [S], and its overlap constraints, is shown.</Paragraph>
      <Paragraph position="3"> The motivation for generating an XML representation of our phonotactic automata is that XML enables us to specify a well-defined, easy-to-interpret, portable template without compromising the generic nature of the network generator. That is to say, the user can still specify a phonotactic automaton independently of any language or feature set. The generated phonotactic automaton is then used to guide the second principal component of the system, the parser.</Paragraph>
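      <Paragraph position="4"> As a rough illustration of the kind of output the network generator produces, the following Python sketch serialises a single arc with its overlap constraints. Since Figure 3 is not reproduced here, the element and attribute names (automaton, arc, phoneme, constraints, overlap, feature1, feature2) and the node values 0 and 1 are assumptions made for illustration and are not the schema actually used by LIPS. The three constraints are those given for [S] in Section 3.2.<![CDATA[

```python
import xml.etree.ElementTree as ET

# NOTE: all element/attribute names below are hypothetical; the real
# LIPS schema (Figure 3) is not shown in this section.


def arc_to_xml(source, target, phoneme, overlap_pairs):
    """Build one <arc> element with a phoneme label and overlap constraints."""
    arc = ET.Element("arc", {"from": source, "to": target})
    ET.SubElement(arc, "phoneme").text = phoneme
    constraints = ET.SubElement(arc, "constraints")
    for first, second in overlap_pairs:
        # Each constraint states that two features must overlap in time.
        ET.SubElement(constraints, "overlap",
                      {"feature1": first, "feature2": second})
    return arc


def automaton_to_xml(arcs):
    """Serialise a list of (source, target, phoneme, overlap_pairs) tuples."""
    root = ET.Element("automaton")
    for arc in arcs:
        root.append(arc_to_xml(*arc))
    return ET.tostring(root, encoding="unicode")


# The [S] arc; node values "0" and "1" are assumed for illustration.
SH_ARC = ("0", "1", "S", [
    ("voiceless", "fricative"),
    ("palato", "voiceless"),
    ("fricative", "palato"),
])

xml_text = automaton_to_xml([SH_ARC])
print(xml_text)
```
]]> Because the feature names are passed in as plain strings, the same routine would serve a user-defined feature set as readily as the default IPA-like one, which is the sense in which the representation stays feature-set independent.</Paragraph>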
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2. The Parser
</SectionTitle>
      <Paragraph position="0"> LIPS employs a top-down, breadth-first parsing strategy, which is best explained by means of an example.</Paragraph>
      <Paragraph position="1"> Purely for the purposes of describing how the parsing procedure takes place, we return to the phonotactic automaton of Figure 2, which of course represents only a very small subsection of English. This automaton will recognise such syllables as shum, shim, shem, shown, shrun, shran etc., some being actual lexicalised syllables of English and others being phonotactically well-formed, potential syllables of English. For our example we take the multilinear representation of the utterance [So:n], as depicted in Figure 4, as our input to the parser. At the beginning of the parsing process the phonotactic automaton is anticipating an [S] sound; that is, it requires three temporal overlap constraints to be satisfied: the feature voiceless must overlap the feature fricative, the feature palato must overlap the feature voiceless, and the feature fricative must overlap the feature palato. A variable window is applied over the input utterance and the features within the window are examined to see if they satisfy the overlap constraints. As can be seen from Figure 4, the three features are indeed present and all overlap in time. Thus the [S] is recognised, the two arcs bearing the [S] symbol are traversed, and the window moves on. At this point the automaton is anticipating either an [r] or a vowel sound. In a similar fashion the contents of the new window are examined and, in the case of our example, the vowel [o:] is recognised (the [r] is rejected). The vowel transition is traversed, the window moves on, and the automaton is expecting an [n] or an [m]. For full details of the parsing process see Carson-Berndsen &amp; Walsh (2000b). Output from LIPS is then fed through the REFLEX system to determine if actual or potential syllables have been found.</Paragraph>
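      <Paragraph position="2"> The walk-through above can be approximated by the following Python sketch. It is a simplified illustration and not the LIPS implementation. Only the three [S] constraints come from the text. The automaton layout, the feature names and constraints for [r], [o:], [n] and [m], the event timings, and the rule for growing the window (try successively later feature end points until some arc is satisfied) are all assumptions.<![CDATA[

```python
from collections import namedtuple

# One feature event on a tier, with absolute start/end times (e.g. in ms).
Event = namedtuple("Event", "feature start end")


def overlaps(a, b):
    """Relative-time relation derived from absolute times."""
    return max(a.start, b.start) < min(a.end, b.end)


# Constraints for [S] are taken from the text; the rest are assumed.
S = [("voiceless", "fricative"), ("palato", "voiceless"),
     ("fricative", "palato")]
R = [("voiced", "rhotic")]
O = [("voiced", "vocalic")]
N = [("voiced", "nasal"), ("nasal", "alveolar")]
M = [("voiced", "nasal"), ("nasal", "labial")]

# Hypothetical fragment in the spirit of Figure 2:
# state -> list of (symbol, constraints, next_state).
ARCS = {
    0: [("S", S, 1), ("S", S, 2)],   # two arcs bearing [S]
    1: [("r", R, 2)],
    2: [("o:", O, 3)],
    3: [("n", N, 4), ("m", M, 4)],
}
FINAL = {4}


def satisfied(constraints, events):
    """True if every required feature pair has overlapping events."""
    return all(
        any(overlaps(a, b)
            for a in events if a.feature == f1
            for b in events if b.feature == f2)
        for f1, f2 in constraints)


def parse(events, start_state=0):
    """Breadth-first traversal: all active states advance together."""
    active, t, output = {start_state}, 0, []
    while True:
        step = None
        # Grow the window to each later feature end point in turn.
        for end in sorted({e.end for e in events if e.end > t}):
            window = [e for e in events if e.start < end and e.end > t]
            hits = [(sym, nxt)
                    for state in sorted(active)
                    for sym, cons, nxt in ARCS.get(state, [])
                    if satisfied(cons, window)]
            if hits:
                step = (end, hits)
                break
        if step is None:
            break
        t, hits = step
        output.append(sorted({sym for sym, _ in hits}))
        active = {nxt for _, nxt in hits}
    accepted = bool(active & FINAL) and all(e.end <= t for e in events)
    return output, accepted


# Assumed multilinear input for [So:n]; timings are invented.
SHOWN = [
    Event("voiceless", 0, 90), Event("fricative", 0, 100),
    Event("palato", 10, 100), Event("voiced", 95, 400),
    Event("vocalic", 100, 300), Event("nasal", 300, 400),
    Event("alveolar", 300, 400),
]

symbols, accepted = parse(SHOWN)
print(symbols, accepted)
```
]]> In this sketch the [r] arc is rejected simply because no rhotic event falls inside the second window, and the [m] arc is rejected because no labial event falls inside the third. Deciding whether the recognised syllable is actual or merely potential is left to a later stage, as in the text.</Paragraph>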
    </Section>
  </Section>
</Paper>