<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2174">
  <Title>Practical Glossing by Prioritised Tiling</Title>
  <Section position="3" start_page="1061" end_page="1064" type="metho">
    <SectionTitle>
2 A Basic Model of a Glosser
</SectionTitle>
    <Paragraph position="0"> To gloss a text, we first segment it into sentences and use the POS tag probabilities assigned by a bigram tagger to order the results of morphological analysis. We obtain a complete tag probability distribution by using the Forwards-Backwards algorithm (see Chamiak, 1993) and eliminate only those tags whose probability falls below a certain threshold. Each morphological analysis compatible with one of the remaining tags is passed on to the next phase, together with its associated tag probabilities.</Paragraph>
    <Paragraph position="1"> The next phase identifies source words and collocations by matching them against key descriptors, which are variable length, possibly discontinuous, word or morpheme n-grams. A key descriptor is written: WI_RI &lt;d1&gt; W2_R2 &lt;d2&gt; ... &lt;dn-1&gt; Wr~__Rn where Wi_Ri means a word W~ with morpho-syntactic restrictions R~, and W~_R~ &lt;d~&gt; W~/I_Ri+I means W~&lt;_R~+~ must occur within di words to the right of W~Ri. For example, a key descriptor intended to match the collocation in a fragment like a procedure used by many researchers for describing the effects ... might be: procedure_N &lt;5&gt; for_PREP &lt;i&gt; +ing_V0</Paragraph>
    <Section position="1" start_page="1061" end_page="1061" type="sub_section">
      <SectionTitle>
2.1 Collocations and Key Descriptors
</SectionTitle>
      <Paragraph position="0"> We posit the existence of a collocation whenever two or more words or morphemes occur in a fixed syntactic relationship more frequently than would be expected by chance, and which are ideally translated together.</Paragraph>
      <Paragraph position="1"> * refining morpho-syntactic restrictions within the limitations of our current architecture, * using a very thorough dictionary of such collocations, and * prioritising key descriptors and using their elements as consumable resources, we find that the application of key descriptors gives a satisfactory approximation to plausible dependency structures.</Paragraph>
      <Paragraph position="2"> Two major carriers of syntactic dependency information in language are category/word-order and closed class elements. Our notion of collocation embraces the full array of closed-class elements that may be associated with a word in a particular dependency structure. This includes governed prepositions and adverbial particles, light verbs, infinitival markers and bound elements such as participial, tense and case affixes. The morphological analysis phase recognises the component structure of complex words and splits them into resources that may be consumed independently.</Paragraph>
      <Paragraph position="3"> Those aspects of dependency structure that are not signalled collocationally are often recognisable from particular category sequences and thus can be detected by an n-gram tagger.</Paragraph>
      <Paragraph position="4"> For instance, in English, transitivity is not marked by case or adposition, but by the immediate adjacency of predicate and noun phrase. By distinguishing transitive and intransitive verb tags, we provide further constraints to narrow the range of dependency structures.</Paragraph>
    </Section>
    <Section position="2" start_page="1061" end_page="1062" type="sub_section">
      <SectionTitle>
2.2 A Probabilistic Characterisation of
Collocation
</SectionTitle>
      <Paragraph position="0"> As a linguistic representation of collocations, key descriptors are clearly inadequate. A more correct representation would characterise the stretches spanned by the &lt;di&gt; as being of certain categories, or better, that the Wi form a connected piece of dependency representation.</Paragraph>
      <Paragraph position="1"> However, by: * expanding the notion of collocation to include a variety of closed-class morphemes, Key descriptors require prioritisation for the tiling phase. In order to effect this, we associate a probabilistic ranking function, fkd, with each key descriptor kd.</Paragraph>
      <Paragraph position="2"> Consider a collocation such as an English transitive phrasal verb, e.g. make up. We may collect all the instances where the component words occur in a sentence in this order with appropriate constraints. By classifying each as a positive or negative instance of this collocation  (in any sense), we can estimate a probability distribution f~,k,_vr&lt;~&gt;,e_aov(d) over the number of words, d, separating the elements of this collocation. Suppose then that the tagger has assigned tag probability distributions p~ and p~ to the two elements separated by d words in a text fragment, s. The probability that the key descriptor make VT &lt;d&gt; up ADV correctly matches s is given by:</Paragraph>
      <Paragraph position="4"> and thus increases as a proportion of the total.</Paragraph>
      <Paragraph position="5"> The fall in true instances is accentuated by the tendency for languages to order dependent phrases with the smallest ones nearest to the head 2, and is thus most marked in the phrasal verb case.</Paragraph>
      <Paragraph position="6"> As the number of elements in the equivalence goes up, so does the dimensionality of the frequency distribution. While the multiplied tag probabilities must decrease, the f values increase more, since the corpus evidence tells us that a match comprising more elements is nearly always the correct one.</Paragraph>
      <Paragraph position="8"> A typical graph off for the phrasal verb case is depicted in Figure 2. In such cases, we observe that the probability falls slowly over the space of a few words and then sharply at a given d. In other cases, the slope is gentler, but for the vast majority of collocations it decreases monotonically.</Paragraph>
    </Section>
    <Section position="3" start_page="1062" end_page="1062" type="sub_section">
      <SectionTitle>
Figure 2: A typical graph of f for a Verb Particle Collocation
</SectionTitle>
      <Paragraph position="0"> The overall downward trend in f can be attributed to the interaction of two factors. On the one hand, the total number of true instances follows the distribution of length of phrases that may intervene (in the case of make up, noun phrases), i.e. it falls with increasing separation.</Paragraph>
      <Paragraph position="1"> On the other, the absolute number of false instances remains relatively constant as d varies, In section 3.3, we show how we heuristically approximate the various features off.</Paragraph>
      <Paragraph position="2">  We prioritise key descriptors to reflect their appropriateness. We then use this ordering to tile the source sentence with a consistent set of key descriptors, and hence their translations. The following sections describe the algorithm.</Paragraph>
    </Section>
    <Section position="4" start_page="1062" end_page="1062" type="sub_section">
      <SectionTitle>
3.1 General Algorithm
</SectionTitle>
      <Paragraph position="0"> The bilingual equivalences are treated as a simple &amp;quot;one-shot&amp;quot; production system, which annotates a source analysis with all of the possible translations. The tiling algorithm selects the best of these translations by treating bilingual equivalences as consumers competing for a resource (the right to use a word as part of a translation). In order to make the system efficient, we avoid a global view of linguistic structure. Instead, we assume that every equivalence carries enough information with it to decide whether it has the right to lock (claim) a resource. Competing consumers are simply compared in order to decide which has priority.</Paragraph>
      <Paragraph position="1"> To support this algorithm, it is necessary to associate with every translation a justification the source items from which the target item was derived.</Paragraph>
      <Paragraph position="2"> 2 This observation has been extensively explored (in a phrase structure framework) by Hawkins (1994).</Paragraph>
      <Paragraph position="4"> sort consumers according to priority_fn the words from which the equivalence was derived have the words been claimed by a bilingual equivalence? mark the words as consumed  order and progressively lock the available resources. At the end of this process, the bilingual equivalences that have successfully locked resources comprise the fringe.</Paragraph>
    </Section>
    <Section position="5" start_page="1062" end_page="1062" type="sub_section">
      <SectionTitle>
3.2 Complexity
</SectionTitle>
      <Paragraph position="0"> We index each bilingual equivalence by choosing the least frequent source word as a key. We retrieve all bilingual equivalences indexed by all the words in a sentence. Retrieval on each key is more or less constant in time. The total number of equivalences retrieved is proportional to the sentence length, n, and their individual applications are constant in time. Thus, the complexity of the rule application phase is order n. The final phase (the algorithm of Figure 3) is fundamentally a sorting algorithm. Since each phase is independent, the overall complexity is bounded to that of sorting, order n log n.</Paragraph>
      <Paragraph position="1"> This algorithm does not guarantee to fully tile the input sentence. If full filing were desired, a tractable solution is to guarantee that every word has at least one bilingual equivalence with a single word key descriptor. However, as will be apparent from Figure 1, glossing the commonest and most ambiguous words would obscure the clarity of the gloss and reduce its precision.</Paragraph>
      <Paragraph position="2"> The algorithm as presented operates on source language words in their entirety. Morphological analysis introduces a further complexity by splitting a word into component morphemes, each of which can be considered a resource. The algorithm can be adapted to handle this by ensuring that a key descriptor locks a reading as well as the component morphemes. Once a reading is locked, only morphemes within that reading can be consumed.</Paragraph>
    </Section>
    <Section position="6" start_page="1062" end_page="1064" type="sub_section">
      <SectionTitle>
3.3 Prioritising Equivalences
</SectionTitle>
      <Paragraph position="0"> If the probabilistic ranking function, f, were elicited by means of corpus evidence, the prioritisation of equivalences would fall out naturally as the solutions to equation 1. In this section, we show how a sequence of simple heuristics can approximate the behaviour of the equation.</Paragraph>
      <Paragraph position="1"> We first constrain equivalences to apply only over a limited distance (the search radius),  which we currently assume is the same for all discontinuous key descriptors. This corresponds approximately to the steep fall in the cases illustrated in Figure 2.</Paragraph>
      <Paragraph position="2"> After this, we sort the equivalences that have applied according to the following criteria: Reading priority orders equivalences which differ only in the categories they assign to the same words. For instance, in the fragment the  way to London, the key descriptor way__N &lt; 1 &gt; to_PREP (= road to) will be preferred over way_N &lt;1&gt; to_TO (= method of) since the probability of the latter POS for to will be lower. 1. baggability 2. compactness 3. reading 4. rightmostness 5. frequency priority  Baggability is the number of source words consumed by an equivalence. For instance, in the fragment ... make up for lost time .... we prefer make up for (= compensate) over make up (= reconcile, apply cosmetics, etc). We indicated in section 2.2 that baggability is generally correct.</Paragraph>
      <Paragraph position="3"> However, baggability incorrectly models all values of fin n-dimensional space as higher than any value in n-1 dimensional space. In a phrase like formula milk for crying babies, baggability will prefer formula for ... ing to formula milk. Compactness prefers collocations that span a smaller number of words. Consider the fragment ...get something to eat... Assume something to and get to are collocations. The span of something to is 2 words and the span of get to is 3. Given that their baggabflity is identical, we prefer the most compact, i.e. the one with the least span. In this case, we correctly prefer something to, though we will go wrong in the case of get someone to eat. Compactness models the overall downward trend off.</Paragraph>
      <Paragraph position="4"> Reading priority modds the tagger probabilities of equation 1. Of course, placing this here in the ordering means that tagger probabilities never override the contribution of f. There are many cases where this is not accurate, but its effect is mitigated by the use of a threshold for tag probabilities - very unlikely readings are pruned and therefore unavailable to the key descriptor matching process.</Paragraph>
      <Paragraph position="5"> Rightmostness describes how far to the right an expression occurs in the sentence. All other criteria being equal, we prefer the rightmost expression on the grounds that English tends to be right-branching.</Paragraph>
      <Paragraph position="6"> Frequency priority picks out a single equivalence from those with the same key descriptor, which is intended to represent its most frequent sense, or at least its most general translation.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1064" end_page="1064" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> The above algorithm is implemented in the SID system for glossing English into Japanese a. A large dictionary from an existing MT system was used as the basis for our dictionary, which comprises about 200k distinct key descriptors keying about 400k translations. SID reaches a peak glossing speed of about 12,000 words per minute on a 200 MHz Pentium Pro.</Paragraph>
    <Paragraph position="1"> To evaluate SID we compared its output with a 1 million word dependency-parsed corpus (based on the Penn TreeB ank) and rated as correct any collocation which corresponded to a connected piece of dependency structure with matching tags. We added other correctness criteria to cope with those cases where a collocate is not dependency-connected in our corpus, such as a subject-main verb collocate separated by an auxiliary (a rally was held), or a discontinuous adjective phrase (an interesting man to know).</Paragraph>
    <Paragraph position="2"> Correctness is somewhat over-estimated in that a dependent preposition, for example, may not have the intended collocational meaning (it marks an adjunct rather than an argument), but</Paragraph>
  </Section>
  <Section position="5" start_page="1064" end_page="1065" type="metho">
    <SectionTitle>
3 Available in Japan as part of Sharp's Power E/J
</SectionTitle>
    <Paragraph position="0"> translation package on CD-ROM for Windows (r) 95.</Paragraph>
    <Paragraph position="1"> A trial version is available for download at http://www.sharp.co.jp/sc/excite/soft_map/ej-a.htm  this appears to be more than offset by tag mismatch cases which might be significant but are not in many particular cases - e.g. Grand Jury where Grand may be tagged ADJ by SID but NP in Penn, or passed the bill on to the House, where on may be tagged ADV by SID but IN (= preposition) in Penn.</Paragraph>
    <Paragraph position="2"> To obtain a baseline recall figure we ran SID over the corpus with a much lower tag probability threshold and much higher search radius 4, and counted the total number of correct collocations detected anywhere amongst the alternatives.</Paragraph>
    <Paragraph position="3"> SID detected a total of c. 150k collocations with its parameters set to their values in the released version 5, of which we judged 110k correct for an overall precision of 72%, which rises to 82% for fringe elements. Overall recall was 98% (75% for the fringe). These figures indicate that the user would have to consult the alternatives for nearly a fifth of collocations (more if we consider sense ambiguities), but would fail to find the right translation in only 2% of cases. Preliminary inspection of the evaluation results on a collocation by collocation basis reveals large numbers of incorrect key descriptors which could be eliminated, adjusted or further constrained to improve precision with little loss of recall. This leads us to believe that a fringe precision figure of 90% or so might represent the achievable limit of accuracy using our current technology.</Paragraph>
  </Section>
class="xml-element"></Paper>