<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2038">
  <Title>A New Algorithm for the Alignment of Phonetic Sequences</Title>
  <Section position="3" start_page="0" end_page="289" type="metho">
    <SectionTitle>
2 Comparing Phones
</SectionTitle>
    <Paragraph position="0"> To align phonetic sequences, we first need a function for calculating the distance between individual phones. The numerical value assigned by the function to a pair of segments is referred to as the cost, or penalty, of substitution. The function is often extended to cover pairs consisting of a segment and the null character, which correspond to the operations of insertion and deletion (also called indels).</Paragraph>
    <Paragraph position="1"> A distance function that satisfies the following axioms is called a metric:
    1. ∀a, b : d(a, b) ≥ 0 (nonnegative property)
    2. ∀a, b : d(a, b) = 0 ⇔ a = b (zero property)
    3. ∀a, b : d(a, b) = d(b, a) (symmetry)
    4. ∀a, b, c : d(a, b) + d(b, c) ≥ d(a, c) (triangle inequality)</Paragraph>
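    <Paragraph> The four axioms can be checked mechanically on any finite inventory of symbols. The following is a minimal sketch, not part of the original paper; the function name metric_violations and the toy distance discrete are hypothetical.

```python
from itertools import product

def metric_violations(symbols, d):
    """Return every metric axiom that the distance function d violates on symbols."""
    violations = []
    for a, b in product(symbols, repeat=2):
        if 0 > d(a, b):
            violations.append(("nonnegative", a, b))
        if (d(a, b) == 0) != (a == b):
            violations.append(("zero", a, b))
        if d(a, b) != d(b, a):
            violations.append(("symmetry", a, b))
    for a, b, c in product(symbols, repeat=3):
        if d(a, c) > d(a, b) + d(b, c):
            violations.append(("triangle", a, b, c))
    return violations

# Toy distance: 0 for identical symbols, 1 otherwise (the discrete metric).
def discrete(a, b):
    return 0 if a == b else 1

print(metric_violations("abc", discrete))  # []
```

    An empty list means that all four axioms hold on the given inventory; any entry names the violated axiom and the offending symbols.</Paragraph>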
    <Section position="1" start_page="288" end_page="289" type="sub_section">
      <SectionTitle>
2.1 Covington's Distance Function vs.
Feature-Based Metrics
</SectionTitle>
      <Paragraph position="0"> Covington (1996), for his cognate alignment algorithm, constructed a special distance function. It was developed by trial and error on a test set of 82 cognate pairs from various related languages. The distance function is very simple; it uses no phonological features and distinguishes only three types of segments: consonants, vowels, and glides. Many important characteristics of sounds, such as place or manner of articulation, are ignored. For example, both yacht and will are treated identically as a glide-vowel-consonant sequence. The function's values for substitutions are listed in the "penalty" column in Table 2. The penalty for an indel is 40 if it is preceded by another indel, and 50 otherwise. Covington (1998) acknowledges that his distance function is "just a stand-in for a more sophisticated, perhaps feature-based, system". 1 Both Gildea and Jurafsky (1996) and Nerbonne and Heeringa (1997) use distance functions based on binary features. Such functions have the ability to distinguish a large number of different phones.</Paragraph>
      <Paragraph position="1"> 1 Covington's distance function is not a metric. The zero property is not satisfied because the function's value for two identical vowels is greater than zero. Also, the triangle inequality does not hold in all cases; for example: p(e,i) = 30 and p(i,y) = 10, but p(e,y) = 100, where p(x,y) is the penalty for aligning [x] with [y].</Paragraph>
      <Paragraph position="3"> [Table caption, truncated in the source: "... alignment algorithms."]</Paragraph>
      <Paragraph position="4"> The underlying assumption is that the number of binary features by which two given sounds differ is a good indication of their proximity. Phonetic segments are represented by binary vectors in which every entry stands for a single articulatory feature. The penalty for a substitution is defined as the Hamming distance between two feature vectors. The penalty for indels is established more or less arbitrarily. 2 A distance function defined in such a way satisfies all metric axioms.</Paragraph>
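      <Paragraph> As an illustration, the Hamming distance between binary feature vectors takes only a few lines. The sketch below is not from the paper: the four-feature inventory and the vectors for [p], [b] and [m] are hypothetical and far smaller than a realistic feature set. The last lines reuse the three penalties quoted above for Covington's function to show the triangle inequality failing.

```python
def hamming(u, v):
    """Number of positions at which two equal-length binary feature vectors differ."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(1 for a, b in zip(u, v) if a != b)

# Hypothetical feature order: (voiced, nasal, labial, continuant)
FEATURES = {
    "p": (0, 0, 1, 0),
    "b": (1, 0, 1, 0),
    "m": (1, 1, 1, 0),
}

print(hamming(FEATURES["p"], FEATURES["b"]))  # 1
print(hamming(FEATURES["p"], FEATURES["m"]))  # 2

# Covington's penalties p(e,i), p(i,y) and p(e,y) as quoted in the text.
p_ei, p_iy, p_ey = 30, 10, 100
print(p_ei + p_iy >= p_ey)  # False: the triangle inequality is violated
```
      </Paragraph>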
      <Paragraph position="5"> It is interesting to compare the values of Covington's distance function with the average Hamming distances produced by a feature-based metric. Since neither Gildea and Jurafsky (1996) nor Nerbonne and Heeringa (1997) present their feature vectors in sufficient detail to perform the calculations, I adopted a fairly standard set of 17 binary features from Hartman (1981). 3 The average feature distances between pairs of segments corresponding to every clause in Covington's distance function are given in Table 2, next to Covington's "penalties". By definition, the Hamming distance between identical segments is zero. The distance between the segments covered by clause #3 is also constant and equal to one (the feature in question being [long] or [syllabic]). The remaining average feature distances were calculated using a set of the most frequent phonemes represented by 25 letters of the Latin alphabet (all but q). In order to facilitate comparison, the rightmost column of Table 2 contains the average distances interpolated between the minimum and the maximum value of Covington's distance function. The very high correlation (0.998) between Covington's penalties and the average distances demonstrates that feature-based phonology provides a theoretical basis for Covington's manually constructed distance function.</Paragraph>
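      <Paragraph> The interpolation step can be read as a linear rescaling of the average distances onto the range of the penalties. The sketch below assumes that reading; the function name interpolate and all numbers in the example are hypothetical, since Table 2 is not reproduced here.

```python
def interpolate(values, new_min, new_max):
    """Linearly rescale values so that min(values) maps to new_min and max(values) to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant list")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

# Hypothetical average feature distances rescaled to a hypothetical 0..100 penalty range.
print(interpolate([0.0, 1.0, 2.5, 5.0], 0, 100))  # [0.0, 20.0, 50.0, 100.0]
```

      A linear rescaling leaves the correlation coefficient unchanged, so it only makes the two columns easier to compare by eye.</Paragraph>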
      <Paragraph position="6"> 2 Nerbonne and Heeringa (1997) fix the penalty for indels as half the average of the values of all substitutions. Gildea and Jurafsky (1996) set it at one fourth of the maximum substitution cost.</Paragraph>
      <Paragraph position="7"> 3 In order to handle all the phones in Covington's data set, two features were added: [tense] and [spread glottis].
      [Table caption, truncated in the source: "... distance function (columns 4 and 5)."]</Paragraph>
    </Section>
    <Section position="2" start_page="289" end_page="289" type="sub_section">
      <SectionTitle>
2.2 Binary vs. Multivalued Features
</SectionTitle>
      <Paragraph position="0"> Although binary features are elegant and widely used, they might not be optimal for phonetic alignment. Their primary motivation is to classify phonological oppositions rather than to reflect the phonetic characteristics of sounds. In a strictly binary system, sounds that are similar often differ in a disproportionately large number of features. It can be argued that allowing features to have several possible values results in a more natural and phonetically adequate system. For example, there are many possible places of articulation, which form a near-continuum ranging from [labial] to [glottal].</Paragraph>
      <Paragraph position="1"> Ladefoged (1995) devised a phonetically-based multivalued feature system. This system has been adapted by Connolly (1997) and implemented by Somers (1998). It contains about 20 features with values between 0 and 1. Some of them can take as many as ten different values (e.g. [place]), while others are basically binary oppositions (e.g. [nasal]). Table 3 contains examples of multivalued features.</Paragraph>
      <Paragraph position="3"> The main problem with both Somers's and Connolly's approaches is that they do not differentiate the weights, or saliences, that express the relative importance of individual features. For example, they assign the same salience to the feature [place] as to the feature [aspiration], which results in a smaller distance between [p] and [k] than between [p] and [pʰ]. I found that in order to avoid such incongruous outcomes, the salience values need to be carefully differentiated; specifically, the features [place] and [manner] should be assigned significantly higher saliences than other features (the actual values used in my algorithm are given in Table 4). Nerbonne and Heeringa (1997) experimented with weighting each feature by information gain but found it had an adverse effect on the quality of the alignments. The question of how to derive salience values in a principled manner is still open.</Paragraph>
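      <Paragraph> The effect of differentiated saliences can be shown with a small sketch. Everything numeric below is an assumption made for illustration (the feature values, the uniform weights and the differentiated weights); the values actually used by the algorithm are those of Tables 3 and 4, which are not reproduced here. The key "ph" stands for the aspirated stop.

```python
# Hypothetical multivalued feature values, chosen only for illustration.
SEGMENTS = {
    "p":  {"place": 1.0, "manner": 1.0, "aspirated": 0.0},
    "k":  {"place": 0.6, "manner": 1.0, "aspirated": 0.0},
    "ph": {"place": 1.0, "manner": 1.0, "aspirated": 1.0},
}

def distance(p, q, salience):
    """Sum of absolute feature differences, each weighted by its salience."""
    return sum(
        salience[f] * abs(SEGMENTS[p][f] - SEGMENTS[q][f])
        for f in salience
    )

uniform = {"place": 1, "manner": 1, "aspirated": 1}      # hypothetical
weighted = {"place": 40, "manner": 50, "aspirated": 5}   # hypothetical

# With uniform saliences [p] comes out closer to [k] than to its aspirated counterpart.
print(distance("p", "ph", uniform) > distance("p", "k", uniform))    # True
# With differentiated saliences the order is reversed.
print(distance("p", "k", weighted) > distance("p", "ph", weighted))  # True
```
      </Paragraph>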
    </Section>
    <Section position="3" start_page="289" end_page="289" type="sub_section">
      <SectionTitle>
2.3 Similarity vs. Distance
</SectionTitle>
      <Paragraph position="0"> Although all four algorithms listed in Table 1 measure relatedness between phones by means of a distance function, such an approach does not seem to be the best for dealing with phonetic units. The fact that Covington's distance function is not a metric is not an accidental oversight; rather, it reflects certain inherent characteristics of phones. Since vowels are in general more volatile than consonants, the preference for matching identical consonants over identical vowels is justified. This insight cannot be expressed by a metric, which, by definition, assigns a zero distance to all identical pairs of segments. Nor is it certain that the triangle inequality should hold for phonetic segments. A phone that has two different places of articulation, such as labio-velar [w], can be close to two phones that are distant from each other, such as labial [b] and velar [g].</Paragraph>
      <Paragraph position="1"> In my algorithm, below, I employ an alternative approach to comparing segments, which is based on the notion of similarity. A similarity scoring scheme assigns large positive scores to pairs of related segments; large negative scores to pairs of dissimilar segments; and small negative scores to indels. The optimal alignment is the one that maximizes the overall score. Under the similarity approach, the score obtained by two identical segments does not have to be constant. Another important advantage of the similarity approach is the possibility of performing local alignment of phonetic sequences, which is discussed in the following section.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="289" end_page="291" type="metho">
    <SectionTitle>
3 Tree Search vs. Dynamic Programming
</SectionTitle>
    <Paragraph position="0"> Once an appropriate function for measuring similarity between pairs of segments has been designed,  we need an algorithm for finding the optimal alignment of phonetic sequences. While the DP algorithm, which operates in quadratic time, seems to be optimal for the task, both Somers and Covington opt for exhaustive search strategies. In my opinion, this is unwarranted.</Paragraph>
    <Paragraph position="1"> Somers's algorithm is unusual because the selected alignment is not necessarily the one that minimizes the sum of distances between individual segments. Instead, it recursively selects the most similar segments, or "anchor points", in the sequences being compared. Such an approach has a serious flaw. Suppose that the sequences to be aligned are tewos and divut. Even though the corresponding segments are slightly different, the alignment is straightforward. However, an algorithm that looks for the best matching segments first will erroneously align the two t's. Because of its recursive nature, the algorithm has no chance of recovering from such an error. 4</Paragraph>
    <Paragraph position="2"> Covington, who uses a straightforward depth-first search to find the optimal alignment, provides the following arguments for eschewing the DP algorithm. "First, the strings being aligned are relatively short, so the efficiency of dynamic programming on long strings is not needed. Second, dynamic programming normally gives only one alignment for each pair of strings, but comparative reconstruction may need the n best alternatives, or all that meet some criterion. Third, the tree search algorithm lends itself to modification for special handling of metathesis or assimilation." 5 (Covington, 1996)</Paragraph>
    <Paragraph position="3"> The efficiency of the algorithm might not be relevant in the simple case of comparing two words, but if the algorithm is to be of practical use, it will have to operate on large bilingual wordlists. Moreover, combining the alignment algorithm with some sort of strategy for identifying cognates on the basis of phonetic similarity is likely to require comparing thousands of words against one another. Having a polynomially bounded algorithm in the core of such a system is crucial. In any case, since the DP algorithm involves neither significantly larger overhead nor greater programming effort, there is no reason to avoid using it even for relatively small data sets.
    The DP algorithm is also sufficiently flexible to accommodate most of the required extensions without compromising its polynomial complexity. A simple modification will produce all alignments that are within ε of the optimal distance (Myers, 1995). By applying methods from the operations research literature (Fox, 1973), the algorithm can be adapted to deliver the n best solutions. Moreover, the basic set of editing operations (substitutions and indels) can be extended to include both transpositions of adjacent segments (metathesis) (Lowrance and Wagner, 1975) and compressions and expansions (Oommen, 1995). Other extensions of the DP algorithm that are applicable to the problem of phonetic alignment include affine gap scores and local comparison.
    The motivation for generalized gap scores arises from the fact that in diachronic phonology not only individual segments but also entire morphemes and syllables are sometimes deleted. In order to take this fact into account, the penalty for a gap can be calculated as a function of its length, rather than as a simple sum of individual deletions. One solution is to use an affine function of the form gap(x) = r + sx, where r is the penalty for the introduction of a gap, and s is the penalty for each symbol in the gap. Gotoh (1982) describes a method for incorporating affine gap scores into the DP alignment algorithm. Incidentally, Covington's penalties for indels can be expressed by an affine gap function with r = 10 and s = 40.</Paragraph>
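    <Paragraph> The correspondence with Covington's indel penalties is easy to verify. In the sketch below, which is only an illustration, the function name affine_gap is hypothetical; the constants are the ones quoted in the text. A gap of length 1 costs 50, and every further indel in the same gap adds 40.

```python
def affine_gap(length, r, s):
    """Penalty gap(x) = r + s*x for a gap of `length` consecutive indels."""
    if length == 0:
        return 0  # no gap, no penalty
    return r + s * length

# Covington's indel penalties expressed as an affine gap: r = 10, s = 40.
for x in (1, 2, 3):
    print(x, affine_gap(x, r=10, s=40))
# 1 50
# 2 90
# 3 130
```
    </Paragraph>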
    <Paragraph position="4"> Local comparison (Smith and Waterman, 1981) is made possible by using both positive and negative similarity scores. In local, as opposed to global, comparison, only similar subsequences are matched, rather than entire sequences. This often has the beneficial effect of separating inflectional and derivational affixes from the roots. Such affixes tend to make finding the proper alignment more difficult. It would be unreasonable to expect affixes to be stripped before applying the algorithm to the data, because one of the very reasons to use an automatic aligner is to avoid analyzing every word individually.</Paragraph>
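    <Paragraph> A minimal sketch of local comparison in the style of Smith and Waterman follows. It is an illustration under simplifying assumptions: segments are plain characters, and the scores (match 3, mismatch -3, indel -1) are hypothetical stand-ins for a phonetic similarity function. The function name local_align is likewise an assumption.

```python
def local_align(x, y, match=3, mismatch=-3, gap=-1):
    """Smith-Waterman local alignment; returns (score, matched part of x, matched part of y)."""
    rows, cols = len(x) + 1, len(y) + 1
    S = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            S[i][j] = max(0, S[i - 1][j - 1] + sub, S[i - 1][j] + gap, S[i][j - 1] + gap)
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)
    # Trace back from the best cell until a zero entry is reached.
    i, j = best_pos
    end_i, end_j = i, j
    while S[i][j] > 0:
        sub = match if x[i - 1] == y[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, x[i:end_i], y[j:end_j]

# The prefix "re" and the suffix "ed" fall outside the locally aligned region.
print(local_align("rewalk", "walked"))  # (12, 'walk', 'walk')
```

    Because negative running totals are reset to zero, the unmatched affixes never enter the alignment, which is the behaviour described above.</Paragraph>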
  </Section>
  <Section position="5" start_page="291" end_page="292" type="metho">
    <SectionTitle>
4 The algorithm
</SectionTitle>
    <Paragraph position="0"> Many of the ideas discussed in previous sections have been incorporated into the new algorithm for the alignment of phonetic sequences (ALINE). Similarity rather than distance is used to determine a set of best local alignments that fall within ε of the optimal alignment. 6 The set of operations contains insertions/deletions, substitutions, and expansions/compressions. Multivalued features are employed to calculate similarity of phonetic segments.</Paragraph>
    <Paragraph position="1"> 6 Global and semiglobal comparison can also be used. In a semiglobal comparison, the leading and trailing indels are assigned a score of zero.</Paragraph>
    <Paragraph position="2"> Affine gaps were found to make little difference when local comparison is used and they were subsequently removed from ALINE. 7 The algorithm has been implemented in C++ and will be made available in the near future.</Paragraph>
    <Paragraph position="4"> algorithm Alignment
    input: phonetic sequences x and y
    output: alignment of x and y
    define S(i, j) = -∞ when i &lt; 0 or j &lt; 0
    for i ← 0 to |x| do
        S(i, 0) ← 0
    for j ← 0 to |y| do
        S(0, j) ← 0
    for i ← 1 to |x| do
        for j ← 1 to |y| do
            S(i, j) ← max( [...] )</Paragraph>
    <Paragraph position="6"> [...]
    for i ← 1 to |x| do
        for j ← 1 to |y| do
            [...]</Paragraph>
    <Paragraph position="7"> [Figure 1 caption, truncated in the source: "... alignment of two phonetic sequences."]</Paragraph>
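    <Paragraph> The matrix-filling step of Figure 1 can be sketched as follows. This is a reconstruction under stated assumptions, not the C++ implementation mentioned in the text: the candidate terms inside max are inferred from the list of operations (indels, substitutions, expansions/compressions) and from the Retrieve procedure, the zero term reflects local comparison, and the threshold formula T = (1 - ε) × best score is an assumption. The names similarity_matrix, threshold, score_skip, score_sub and score_exp are hypothetical, and the scoring functions in the usage example are toy stand-ins.

```python
def similarity_matrix(x, y, score_skip, score_sub, score_exp):
    """Fill the local similarity matrix S for segment sequences x and y."""
    n, m = len(x), len(y)
    # Row 0 and column 0 stay at zero, as in the initialisation loops of Figure 1.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            xi, yj = x[i - 1], y[j - 1]
            candidates = [
                0,  # local comparison: a negative-scoring prefix is dropped
                S[i - 1][j] + score_skip(xi),
                S[i][j - 1] + score_skip(yj),
                S[i - 1][j - 1] + score_sub(xi, yj),
            ]
            # Expansion and compression need two segments on one side.
            if j >= 2:
                candidates.append(S[i - 1][j - 2] + score_exp(xi, y[j - 2], yj))
            if i >= 2:
                candidates.append(S[i - 2][j - 1] + score_exp(yj, x[i - 2], xi))
            S[i][j] = max(candidates)
    return S

def threshold(S, eps):
    """Assumed threshold: alignments scoring at least (1 - eps) times the best score."""
    best = max(max(row) for row in S)
    return (1 - eps) * best

# Toy scoring functions (hypothetical values, expansions effectively disabled).
S = similarity_matrix(
    "ab", "ab",
    score_skip=lambda p: -10,
    score_sub=lambda p, q: 35 if p == q else -20,
    score_exp=lambda p, q1, q2: -1000,
)
print(S[2][2], threshold(S, 0.0))  # 70 70.0
```
    </Paragraph>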
    <Paragraph position="8"> 7 They may be necessary, however, when dealing with languages that are rich in infixes.</Paragraph>
    <Paragraph position="9"> procedure Retrieve(i, j, s)
    if S(i, j) = 0 then
        print(Out)
        print("alignment score is s")
    else
        if S(i-1, j-1) + σ_sub(x_i, y_j) + s ≥ T then
            push(Out, "align x_i with y_j")
            Retrieve(i-1, j-1, s + σ_sub(x_i, y_j))
            [...]</Paragraph>
    <Paragraph position="11"> [...]
            push(Out, "align null with y_j")
            Retrieve(i, j-1, s + σ_skip(y_j))
            pop(Out)
        if S(i-1, j-2) + σ_exp(x_i, y_{j-1}y_j) + s ≥ T then
            push(Out, "align x_i with y_{j-1}y_j")
            [...]</Paragraph>
    <Paragraph position="13"> [Figure 2 caption, truncated in the source: "... from the similarity matrix."]</Paragraph>
    <Paragraph position="14"> Figure 1 contains the main components of the algorithm. First, the DP approach is applied to compute the similarity matrix S using the σ scoring functions. The optimal score is the maximum entry in the whole matrix. A recursive procedure Retrieve (Figure 2) is called on every matrix entry that exceeds the threshold score T. The alignments are retrieved by traversing the matrix until a zero entry is encountered. The scoring functions for indels, substitutions and expansions are defined in Figure 3. C_skip, C_sub, and C_exp are the maximum scores for indels, substitutions, and expansions, respectively. C_vwl determines the relative weight of consonants and vowels. The default values are C_skip = -10, C_sub = 35, C_exp = 45 and C_vwl = 10. The diff function returns the difference between segments p and q for a given feature f. Set R_V contains features relevant for comparing two vowels: Syllabic, Nasal, Retroflex, High, Back, Round, and Long. Set R_C contains features for comparing other segments: Syllabic, Manner, Voice, Nasal, Retroflex, Lateral, Aspirated, and Place. When dealing with double-articulation consonantal segments, only the nearest places of articulation are used. For a more detailed description of the algorithm see (Kondrak, 1999).</Paragraph>
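    <Paragraph> Since Figure 3 is not reproduced here, the sketch below shows only one plausible form of the scoring functions for indels and substitutions. The constants and the two feature sets are taken from the text; everything else is an assumption made for illustration: the formula (maximum score minus salience-weighted feature difference minus a vowel cost), the uniform saliences, the segment representation with a "vowel" flag, and the feature values.

```python
C_SKIP, C_SUB, C_VOWEL = -10, 35, 10  # default values quoted in the text

R_V = ["syllabic", "nasal", "retroflex", "high", "back", "round", "long"]
R_C = ["syllabic", "manner", "voice", "nasal", "retroflex", "lateral", "aspirated", "place"]

# Hypothetical uniform saliences; the values ALINE actually uses are in Table 4.
SALIENCE = {f: 10 for f in R_V + R_C}

def diff(p, q, f):
    """Absolute difference between segments p and q for feature f (assumed form)."""
    return abs(p["features"].get(f, 0.0) - q["features"].get(f, 0.0))

def delta(p, q):
    """Salience-weighted feature difference over the relevant feature set."""
    relevant = R_V if p["vowel"] and q["vowel"] else R_C
    return sum(diff(p, q, f) * SALIENCE[f] for f in relevant)

def vowel_cost(p):
    """Vowels are charged the vowel weight, consonants nothing (assumed form)."""
    return C_VOWEL if p["vowel"] else 0

def score_skip(p):
    return C_SKIP

def score_sub(p, q):
    # Assumed form: maximum score minus feature difference minus vowel costs.
    return C_SUB - delta(p, q) - vowel_cost(p) - vowel_cost(q)

# Hypothetical segments with made-up feature values.
t = {"vowel": False, "features": {"place": 0.85, "manner": 1.0}}
a = {"vowel": True, "features": {"syllabic": 1.0, "high": 0.0, "back": 0.5}}

print(score_sub(t, t))  # 35.0
print(score_sub(a, a))  # 15.0
```

    Under these assumptions two identical consonants score higher than two identical vowels, which is the kind of asymmetry that a metric cannot express.</Paragraph>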
    <Paragraph position="15"> ALINE represents phonetic segments as vectors of feature values. Table 4 shows the features that are currently used by ALINE. Feature values are encoded as floating-point numbers in the range [0.0, 1.0]. The numerical values of four principal features are listed in Table 3. The numbers are based on the measurements performed by Ladefoged (1995). The remaining features have exactly two possible values, 0.0 and 1.0. A special feature 'Double', which has the same values as 'Place', indicates the second place of articulation. Thanks to its continuous nature, the system of features and their values can easily be adjusted and augmented.</Paragraph>
  </Section>
</Paper>