<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1241">
  <Title>Reconciliation of Unsupervised Clustering, Segmentation and Cohesion</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5. Segmentation and Grammar
</SectionTitle>
    <Paragraph position="0"> Both Powers (1989) and Powers (1994) depend for their hierarchical organization on a fuzzy approach to segments. At the word level, Powers (I 989) allowed four hypotheses: a word should group with the word to the left or the word to the right, or with a phrase to the left of a phrase to the fight, where a phrase has previously been recognized as a candidate group. Hypotheses were rated according to their usage, and those involved in the most highly rated overall parse were reinforced. Powers (1992) allowed one or two (or in some experiments three) given or induced units to operate as a putative unit for the purposes of distributional analysis. Apart from thresholding (to eliminate noise, and to make it amenable to the small computer available), frequency information was ignored and each context was associated with a coset of(one to three) units on either side. Classes were formed by a technique which tunas out to be clustering using a Hamming distance of 2 (or 3 in some experiments), in which classes can be merged (union) and the eornmon coset determined (intersection).</Paragraph>
    <Paragraph position="1"> The size and coverage of the individual left and fight cosets and their union and intersection gave eight measures of the strength of a class, and in all eases identified the vowels as the strongest class for the original dictionary corpus, and for most other corpora tried, with right context appearing more useful than left, eoset size being more accurate than coset coverage, union size being more reliable than intersection size.</Paragraph>
    <Paragraph position="2"> Note that Powers (1997a) generalizes the approach and considers a multitude of different clustering metrics and methods, introducing a pair of goodness measures which allow a more principled approach to closing and evaluating clusters (rather than closing at a specific cluster, you close when the goodness measure reaches its first local maximum).</Paragraph>
    <Paragraph position="3"> In the Powers (1992) experiments, classes were added as new units and the process was repeated. The fuzzy variable size candidate units for the next level meant that hyperelasses of context-free rules were learned.</Paragraph>
    <Paragraph position="4"> However the grammar led to high levels of ambiguity using non-deterministic parsing, and the presented hierarchy is based arbitrarily on a simple greedy approach, but (for this reason) performance as a recognizer/parser was not evaluated.</Paragraph>
    <Paragraph position="5"> Though in this work phonologically, morphologically and grammatically meaningful classes and structure were formed, up to phrase/clause level, no interpretation of the structures or classes was offered, and no attempt was made to discover or propose cohesive constraints or semantic relationships. At the same time however,  parser which uses precisely the kind of morphological and grammatical classes which are thro:wn, up by the self-organizing and clustering experiments, and have started to address how one develop meaningful statistics for a true grammar learning system without any preconceived notions of what the correct parse/phrase structure is (if any). In particular Powers (1997b) performed experiments in the context of grammat checking application, using automatic segmentation techniques based on those of Harris (I 960) and similar to those used by Brent (1997), but combined with context-conditioned probabilities which were used to decide between confusable words. The same technique has been applied in a Loebner Prize entry by Bastin and Cordier (1997).</Paragraph>
    <Paragraph position="6"> This gives us two competing approaches to segmentation. In the first, segmentation is a side effect of the fuzzification of input units during classification (the segments chosen are those which give the best classification according to some metric). Incidentally, Powers (1992) also reports work in which hyphenation points were marked, thus introducing an element of supervision, but it did not improve performance (which agained suffered from ambiguity and thus didn't produce definite results, being non-probabilistie, although a greedy algorithm performed quite reasonably). The second (Harris) approach examines the conditional information or perplexity for each possible prefix/suffix to determine likely segmentation points -- which is expected to show a local maximum in the perplexity.</Paragraph>
  </Section>
class="xml-element"></Paper>