<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1030"> <Title>Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies</Title> <Section position="3" start_page="0" end_page="232" type="intro"> <SectionTitle> 2 Assumptions and Issues </SectionTitle> <Paragraph position="0"> In designing language technology development tools we find it helpful to envision our target user, who may be characterized as &quot;non-technical.&quot; Such a person speaks, reads, and writes the target language; can enumerate the character set of that language; can distinguish punctuation from whitespace, numerals, and regular letters or graphemes; and can specify whether the language distinguishes upper and lower case. When presented with the pronunciation of a word (as a synthesized wavefile), the user can say whether it is right or wrong. In addition, such a person has basic computer fluency, can record sound files, and can navigate the HTML interface of our software tools. If these latter requirements present a barrier, then we assume the availability of a field agent to configure the computer, familiarize the user, and translate the English instructions, if necessary.</Paragraph> <Paragraph position="1"> Ideally, our target user need not have explicit knowledge of their own language's phoneme set, nor even be aware that a word can be transcribed as a sequence of phonemes (as distinct from letters). However, the ability to reliably discover a workable phoneme set from an unlabeled corpus of speech is not yet at hand. Instead, we elicit a language's phoneme set during an initialization stage by presenting examples of IPA wavefiles (Wells and House, 1995).</Paragraph> <Paragraph position="2"> Currently, pronunciations are spelled out using a romanized phonetic alphabet. Following the recommendation of Davel and Barnard (2005), a candidate pronunciation is accompanied by a wavefile generated from a phoneme-concatenation synthesizer.
Where possible, more than one pronunciation is generated for each word presented, under the assumption that it is easier for a listener to select from among a small number of choices than to correct a wrong prediction.</Paragraph> <Section position="1" start_page="232" end_page="232" type="sub_section"> <SectionTitle> 2.1 Four Questions to Address </SectionTitle> <Paragraph position="0"> 1. What is our measure of success? Ultimately, the time to build a lexicon of a given coverage and correctness. As a proxy for time we use the number of characters presented. (Not words, as is typically the case, since long words contain more information than short ones, yet are harder for a human to verify.) 2. For a given language, how many words (letters) are needed to learn its LTS rule system? The true, yet not very useful, answer is &quot;it depends.&quot; The complexity of the relation between graphemic representation and acoustic realization varies greatly across languages. That being the case, we seek a useful measure of a language's degree of complexity.</Paragraph> <Paragraph position="1"> 3. Can the asymptote of the LTS system be estimated, so that one can determine when the learned rules are 90% or 95% complete? In Section 4 we present evidence that this may not be possible. The fall-back position is percentage coverage of the supplied corpus.</Paragraph> <Paragraph position="2"> 4. Which words should be presented to the user, and in what order? Each additional word should maximize the marginal information gain to the system. However, short words are easier for humans to contend with than long ones. Thus a length-based weighting needs to be considered.</Paragraph> </Section> </Section> </Paper>