<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-2035">
  <Title>A Context-Sensitive Homograph Disambiguation in Thai Text-to-Speech Synthesis</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In traditional Thai TTS, it consists of four main modules: word segmentation, grapheme-to-phoneme, prosody generation, and speech signal processing. The accuracy of pronunciation in Thai TTS mainly depends on accuracies of two modules: word segmentation, and grapheme-to-phoneme. In word segmentation process, if word boundaries cannot be identified correctly, it leads Thai TTS to the incorrect pronunciation such as a string &amp;quot;taaklm&amp;quot; which can be separated into two different ways with different meanings and pronunciations. The first one is &amp;quot;taa(eye) klm(round)&amp;quot;, pronounced [ta:0 klom0] and the other one is &amp;quot;taak(expose) lm(wind)&amp;quot;, pronounced [ta:k1 lom0]. In grapheme-to-phoneme module, it may produce error pronunciations for a homograph which can be pronounced more than one way such as a word &amp;quot;ephlaa&amp;quot; which can be pronounced [phlaw0] or [phe:0 la:0]. Therefore, to improve an accuracy of Thai TTS, we have to focus on solving the problems of word boundary ambiguity and homograph ambiguity which can be viewed as a disambiguation task.</Paragraph>
    <Paragraph position="1"> A number of feature-based methods have been tried for several disambiguation tasks in NLP, including decision lists, Bayesian hybrids, and Winnow. These methods are superior to the previously proposed methods in that they can combine evidence from various sources in disambiguation. To apply the methods in our task, we treat problems of word boundary and homograph ambiguity as a task of word pronunciation disambiguation.</Paragraph>
    <Paragraph position="2"> This task is to decide using the context which was actually intended. Instead of using only one type of syntactic evidence as in N-gram approaches, we employ the synergy of several types of features. Following previous works [4, 6], we adopted two types of features: context words, and collections. Context-word feature is used to test for the presence of a particular word within +/- K words of the target word and collocation test for a pattern of up to L contiguous words and/or part-of-speech tags surrounding the target word. To automatically extract the discriminative features from feature space and to combine them in disambiguation, we have to investigate an efficient technique in our task.</Paragraph>
    <Paragraph position="3"> The problem becomes how to select and combine various kinds of features. Yarowsky [11] proposed decision list as a way to pool several types of features, and to solve the target problem by applying a single strongest feature, whatever type it is. Golding [3] proposed a Bayesian hybrid method to take into account all available evidence, instead of only the strongest one. The method was applied to the task of context-sentitive spelling correction and was reported to be superior to decision lists. Later, Golding and Roth [4] applied Winnow algorithm in the same task and found that the algorithm performs comparably to the Bayesian hybrid method when using pruned feature sets, and is better when using unpruned sets or unfamiliar test set.</Paragraph>
    <Paragraph position="4"> In this paper, we propose a unified framework in solving the problems of word boundary ambiguity and homograph ambiguity altogether. Our approach employs both local and long-distance contexts, which can be automatically extracted by a machine learning technique. In this task, we employ the machine learning technique called Winnow. We then construct our system based on the algorithm and evaluate them by comparing with other existing approaches to Thai homograph problems. null  In Thai TTS, there are two major types of text ambiguities which lead to incorrect pronunciation, namely word boundary ambiguity and homograph ambiguity.</Paragraph>
    <Paragraph position="5"> Word Boundary Ambiguity (WBA) Thai as well as some other Asian languages has no word boundary delimiter. Identifying word boundary, especially in Thai, is a fundamental task in Natural Language Processing (NLP). However, it is not a simple problem because many strings can be segmented into words in different ways. Word boundary ambiguities for Thai can be classified into two main categories defined by [6]: Context Dependent Segmentation Ambiguity (CDSA), and Context Independent Segmentation Ambiguity (CISA).</Paragraph>
    <Paragraph position="6"> CISA can be almost resolved deterministically by the text itself. There is no need to consult any context. Though there are many possible segmentations, there is only one plausible segmentation while other alternatives are very unlikely to occur, for example, a string &amp;quot;aiphaamehsii&amp;quot; which can be segmented into two different ways: &amp;quot;aip(go) haam(carry) eh(deviate) sii(color)&amp;quot; [paj0 ha:m4 he:4 si:4] and &amp;quot;aip(go) haa(see) mehsii(queen)&amp;quot; [paj0 ha:4 ma:3 he:4 si:4]. Only the second choice is plausible. One may say that it is not semantically ambiguous. However, simple algorithms such as maximal matching [6, 9] and longest matching [6] may not be able to discriminate this kind of ambiguity. Probabilistic word segmentation can handle this kind of ambiguity successfully. null CDSA needs surrounding context to decide which segmentation is the most probable one. Though the number of possible alternatives occurs less than the context independent one, it is more difficult to disambiguate and causes more errors. For example, a string &amp;quot;taaklm&amp;quot; can be segmented into &amp;quot;taa klm&amp;quot; (round eye) and &amp;quot;taak lm&amp;quot; (to expose wind) which can be pronounced [ta:0 klom0] and [ta:k1 lom0] respectively.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Homograph Ambiguity
</SectionTitle>
      <Paragraph position="0"> Thai homographs, which cannot be determined the correct pronunciation without context, can be classified into six main categories as follows:  1. Number such as 10400 in postcode, it can be pro- null 4. Proper Name such as &amp;quot;smphl&amp;quot; is pronounced [som4 phon0] or [sa1 ma3 phon0].</Paragraph>
      <Paragraph position="1"> 5. Same Part of Speech such as &amp;quot;ephlaa&amp;quot; (time) can be pronounced [phe:0 la:0], while &amp;quot;ephlaa&amp;quot; (axe) is pronounced [phlaw0].</Paragraph>
      <Paragraph position="2"> 6. Different Part of Speech such as &amp;quot;aehn&amp;quot; is pronounced [nx:4] or [hx:n4].</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Previous Approaches
</SectionTitle>
      <Paragraph position="0"> POS n-gram approaches [7, 10] use statistics of POS bigram or trigram to solve the problem. They can solve only the homograph problem that has different POS tag.</Paragraph>
      <Paragraph position="1"> They cannot capture long distance word associations.</Paragraph>
      <Paragraph position="2"> Thus, they are inappropriate of resolving the cases of semantic ambiguities.</Paragraph>
      <Paragraph position="3"> Bayesian classifiers [8] use long distance word associations regardless of position in resolving semantic ambiguity. These methods can successful capture long distance word association, but cannot capture local context information and sentence structure.</Paragraph>
      <Paragraph position="4"> Decision trees [2] can handle complex condition, but they have a limitation in consuming very large parameter spaces and they solve a target problem by applying only the single strongest feature.</Paragraph>
      <Paragraph position="5"> Hybrid approach [3, 12] combines the strengths of other techniques such as Bayesian classifier, n-gram, and decision list. It can be capture both local and long distance context in disambiguation task.</Paragraph>
      <Paragraph position="6"> Our Model To solve both word boundary ambiguity and homograph ambiguity, we treat these problems as the problem of disambiguating pronunciation. We construct a confusion set by listing all of its possible pronunciations. For example, C = {[ma:0 kwa:1], [ma:k2 wa:2]} is the confusion set of the string &amp;quot;maakkwaa&amp;quot; which is a boundaryambiguity string and C={[phe:0 la:0] ,[phlaw0]} is the confusion set of the homograph &amp;quot;ephlaa&amp;quot;. We obtain the features that can discriminate each pronunciation in the set by Winnow based on our training set.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Winnow
</SectionTitle>
      <Paragraph position="0"> Winnow algorithm used in our experiment is the algorithm described in [1]. Winnow is a neuron-like network where several nodes are connected to a target node [4, 5]. Each node called specialist looks at a particular value of an attribute of the target concept, and will vote for a value of the target concept based on its specialty; i.e. based on a value of the attribute it examines. The global algorithm will then decide on weighted-majority votes receiving from those specialists. The pair of (attribute=value) that a specialist examines is a candidate of features we are trying to extract. The global algorithm updates the weight of any specialist based on the vote of that specialist. The weight of any specialist is initialized to 1. In case that the global algorithm predicts incorrectly, the weight of the specialist that predicts incorrectly is halved and the weight of the specialist that predicts correctly is multiplied by 3/2. The weight of a specialist is halved when it makes a mistake even if the global algorithm predicts correctly.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Features
</SectionTitle>
      <Paragraph position="0"> To train the algorithm to resolve pronunciation ambiguity, the context around a homograph or a boundaryambiguity string is used to form features. The features are the context words, and collocations. Context words are used to test for the presence of a particular word within +10 words and -10 words from the target word.</Paragraph>
      <Paragraph position="1"> Collocations are patterns of up to 2 contiguous words and part-of-speech tags around the target word. Therefore, the total number of features is 10; 2 features for context words, and 8 features for collocations.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML