<?xml version="1.0" standalone="yes"?>
<Paper uid="J00-2003">
  <Title>A Multistrategy Approach to Improving Pronunciation by Analogy</Title>
  <Section position="4" start_page="198" end_page="200" type="intro">
    <SectionTitle>
3. Dedina and Nusbaum's System
</SectionTitle>
    <Paragraph position="0"> The results reported here were obtained using an extended and improved version of PRONOUNCE, the Dedina and Nusbaum (D&amp;N) system, which we now describe.</Paragraph>
    <Section position="1" start_page="198" end_page="200" type="sub_section">
      <SectionTitle>
3.1 Principles
</SectionTitle>
      <Paragraph position="0"> The basic PRONOUNCE system consists of four components: the lexical database; the matcher, which compares the target input to all the words in the database; the pronunciation lattice (a data structure representing possible pronunciations); and the decision function, which selects the &quot;best&quot; pronunciation among the set of possible ones. Reflecting PbA's origins as an empirical, psychological model, this selection is heuristic rather than being based (like certain other approaches to automatic pronunciation) on any statistical model.</Paragraph>
      <Paragraph position="1"> Figure 1: Simplified pronunciation lattice for the word anecdote. For clarity, only a subset of the arcs is shown. Full pattern matching is used as described in Section 3.2. Phoneme symbols are those employed by Sejnowski and Rosenberg.</Paragraph>
      <Paragraph position="2"> 3.1.1 Pattern Matching. The input word is first compared to words listed in the lexicon (Webster's Pocket Dictionary) and substrings common to both are identified. For a given dictionary entry, the process starts with the input string and the dictionary entry left-aligned. Substrings sharing contiguous, common letters in matching positions in the two strings are then found. Information about these matching letter substrings, and about their corresponding phoneme substrings in the dictionary entry under consideration, is entered into the input string's pronunciation lattice as detailed below. (Note that this requires the letters and phonemes of each word in the lexicon to have been previously aligned in one-to-one fashion.) The shorter of the two strings is then shifted right by one letter and the matching process repeated. This continues until the two strings are right-aligned, i.e., the number of right shifts is equal to the difference in length between the two strings. This process can be alternatively seen as a matching between substrings of the incoming word &quot;segmented in all possible ways&quot; (Kay and Marcel 1981, 401) and the entries in the lexicon.</Paragraph>
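      <Paragraph> The shifting procedure of Section 3.1.1 can be sketched as follows. This is a minimal illustration, not the PRONOUNCE implementation: the function name partial_matches, the min_len parameter, the zero-based positions, and the choice to record only maximal runs of matching letters are all assumptions made here, and the phoneme symbols in the usage line are placeholders rather than dictionary transcriptions.</Paragraph>
      <Paragraph><![CDATA[
```python
def partial_matches(word, entry_letters, entry_phonemes, min_len=1):
    """Return (position_in_word, letters, phonemes) for each maximal matched run.

    Assumes entry_phonemes is aligned one-to-one with entry_letters.
    min_len is a hypothetical knob; the text does not specify a minimum length.
    """
    if len(entry_letters) != len(entry_phonemes):
        raise ValueError("letters and phonemes must be aligned one-to-one")
    word_is_shorter = len(entry_letters) >= len(word)
    overlap = min(len(word), len(entry_letters))
    n_shifts = abs(len(word) - len(entry_letters))
    matches = []
    for shift in range(n_shifts + 1):  # shift 0 is left-aligned, the last is right-aligned
        # The shorter string is the one shifted right; the longer one stays put.
        w_off, e_off = (0, shift) if word_is_shorter else (shift, 0)
        run_start = None
        for k in range(overlap + 1):  # the extra step flushes a run that reaches the end
            same = k != overlap and word[w_off + k] == entry_letters[e_off + k]
            if same and run_start is None:
                run_start = k
            elif not same and run_start is not None:
                if k - run_start >= min_len:
                    matches.append((
                        w_off + run_start,
                        word[w_off + run_start:w_off + k],
                        tuple(entry_phonemes[e_off + run_start:e_off + k]),
                    ))
                run_start = None
    return matches


# Placeholder phoneme symbols, one per letter of "sloping" (illustrative only).
print(partial_matches("blope", "sloping", ["s", "l", "o", "p", "I", "G", "-"]))
# prints [(1, 'lop', ('l', 'o', 'p'))]
```
]]></Paragraph>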
      <Paragraph position="3"> 3.1.2 Pronunciation Lattice. Matched substrings, together with their corresponding phonemic mappings as found in the lexicon, are used to build the pronunciation lattice for the input string. A node of the lattice represents a matched letter, L_i, at some position, i, in the input. The node is labeled with its position index i and with the phoneme that corresponds to L_i in the matched substring, P_im say, for the mth matched substring. An arc is placed from node i to node j if there is a matched substring starting with L_i and ending with L_j. The arc is labeled with the phonemes intermediate between P_im and P_jm in the phoneme part of the matched substring. Additionally, arcs are labeled with a &quot;frequency&quot; count (see below), which is incremented by one each time that substring (with that pronunciation) is matched during the pass through the lexicon.</Paragraph>
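      <Paragraph> One possible in-memory representation of such a lattice is sketched below, purely as an illustration. The names add_match and arcs, the (position, phoneme) node tuples, and the decision to add no arc for a one-letter match are assumptions made here; the phoneme strings are placeholders.</Paragraph>
      <Paragraph><![CDATA[
```python
from collections import Counter


def add_match(arcs, start, phonemes):
    """Record one matched substring in the lattice.

    arcs maps (head_node, tail_node, intermediate_phonemes) to a "frequency"
    count, where a node is a (position, phoneme) pair. start is the position
    of the first matched letter; phonemes align one-to-one with the letters.
    """
    if len(phonemes) >= 2:  # assumption: a one-letter match contributes no arc
        head = (start, phonemes[0])
        tail = (start + len(phonemes) - 1, phonemes[-1])
        # Same substring with the same pronunciation: the count goes up by one.
        arcs[(head, tail, tuple(phonemes[1:-1]))] += 1
    return arcs


arcs = Counter()
# Two lexical entries give the same pronunciation, one gives a different one.
for start, phonemes in [(1, "lop"), (1, "lop"), (1, "lap")]:
    add_match(arcs, start, phonemes)
```
]]></Paragraph>
      <Paragraph> After these three calls the arc from node (1, l) to node (3, p) with intermediate phoneme o has count 2, and the parallel arc with intermediate phoneme a has count 1.</Paragraph>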
      <Paragraph position="4"> Figure 1 shows an example pronunciation lattice for the word anecdote. For clarity, the lattice has been simplified to show only a subset of the arcs. This word suffers from the so-called silence problem whereby PbA fails to produce any pronunciation, because there is no complete path through the lattice (see next page). In the case illustrated, there is no cd → /kd/ mapping in the dictionary other than in the word anecdote itself. Hence, in view of the leave-one-out testing strategy (see next page), there will never be an arc between nodes (/k/, 4) and (/d/, 5).</Paragraph>
      <!-- Running head: Marchand and Damper, Improving Pronunciation by Analogy -->
      <Paragraph position="5"> 3.1.3 Decision Function. A possible pronunciation for the input string then corresponds to a complete path through its lattice, with the output string assembled by concatenating the phoneme labels on the nodes/arcs in the order that they are traversed. (Different paths can, of course, correspond to the same pronunciation.) Scoring of candidate pronunciations uses two heuristics in PRONOUNCE. If there is a unique shortest path, then the pronunciation corresponding to this path is taken as the output. If there are tied shortest paths, then the pronunciation corresponding to the best-scoring of these is taken as the output. In D&amp;N's original work, the score used is the sum of arc &quot;frequencies&quot; (Dedina and Nusbaum's term, and nothing to do with frequency of usage in written or spoken communication) obtained by counting the number of times the corresponding substring matches between the input and the entire lexicon.</Paragraph>
      <Paragraph position="6"> The scoring heuristics are one obvious dimension on which different versions of PbA can vary. In the following, when we refer to a multistrategy approach to PbA, it is principally the use of multiple scoring strategies which is at issue.</Paragraph>
    </Section>
    <Section position="2" start_page="200" end_page="200" type="sub_section">
      <SectionTitle>
3.2 Appraisal
</SectionTitle>
      <Paragraph position="0"> PRONOUNCE was evaluated on just 70 monosyllabic pseudowords, a subset of those previously used in reading studies by Glushko (1979). Such a test is largely irrelevant to TTS applications: the test set is not representative of general English, either in the small number of words used or in their length. Also, D&amp;N's claimed results on this pseudoword test set have proved impossible to replicate (Damper and Eastmond 1996, 1997; Yvon 1996; Bagshaw 1998). In addition, no consideration is given to the case where no complete path through the lattice exists (the silence problem mentioned earlier).</Paragraph>
      <Paragraph position="1"> D&amp;N's pattern matching (when building the pronunciation lattice) is a &quot;partial&quot; one. That is, as explained in Section 3.1.1, the process starts with the leftmost letter of the input string and of the current dictionary entry aligned and continues until the two are right-aligned. They give (on page 59) the example of the input word blope matching to the lexical entry sloping. At the first iteration, the initial b of blope aligns with the initial s of sloping, and the common substring lop is extracted. The process terminates at the third iteration, when the final e of blope aligns with the final g of sloping: there are no common substrings in this case. There seems to be no essential reason for starting and discontinuing matching at these points. That is, we could shift and match over the range of all possible overlaps, starting with the final e of blope aligned with the initial s of sloping, and terminating with the initial b of the former aligned with the final g of the latter. We call this &quot;full&quot; as opposed to &quot;partial&quot; matching. (Note that the simplified pronunciation lattice depicted in Figure 1 was obtained using full pattern matching.) One conceivable objection to partial pattern matching is that some morphemes can act both as prefix and suffix (e.g., someBODY and BODYguard). From this point of view, full matching seems worth consideration. A linguistic justification for the full method is that affixation is often implicated in the creation of new words.</Paragraph>
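      <Paragraph> Full matching can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the authors' implementation: the name full_matches, the min_len parameter, the zero-based positions, and the restriction to maximal runs are choices made here, and the phoneme symbols are placeholders. It slides the input over every possible overlap with the entry, so a morpheme that is a suffix in the entry can match a prefix of the input.</Paragraph>
      <Paragraph><![CDATA[
```python
def full_matches(word, entry_letters, entry_phonemes, min_len=1):
    """Return (position_in_word, letters, phonemes) over all possible overlaps.

    Assumes entry_phonemes is aligned one-to-one with entry_letters.
    min_len is a hypothetical knob; the text does not specify a minimum length.
    """
    if len(entry_letters) != len(entry_phonemes):
        raise ValueError("letters and phonemes must be aligned one-to-one")
    matches = []
    # word[i] is compared with entry_letters[i + offset]. The first offset puts
    # the last letter of the word under the first letter of the entry; the last
    # offset puts the first letter of the word under the last letter of the entry.
    for offset in range(-(len(word) - 1), len(entry_letters)):
        lo = max(0, -offset)
        hi = min(len(word), len(entry_letters) - offset)
        run_start = None
        for i in range(lo, hi + 1):  # the extra step flushes a run that reaches the end
            same = i != hi and word[i] == entry_letters[i + offset]
            if same and run_start is None:
                run_start = i
            elif not same and run_start is not None:
                if i - run_start >= min_len:
                    matches.append((
                        run_start,
                        word[run_start:i],
                        tuple(entry_phonemes[run_start + offset:i + offset]),
                    ))
                run_start = None
    return matches


# Placeholder phoneme symbols, one per letter of "somebody" (illustrative only).
somebody = ["s", "^", "m", "-", "b", "a", "d", "i"]
print(full_matches("bodyguard", "somebody", somebody, min_len=2))
# prints [(0, 'body', ('b', 'a', 'd', 'i'))]
```
]]></Paragraph>
      <Paragraph> With only the two alignments that partial matching allows for this pair of words, the shared morpheme body is never brought into register, which is the objection raised above.</Paragraph>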
    </Section>
  </Section>
</Paper>