File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/87/e87-1007_concl.xml
Size: 26,269 bytes
Last Modified: 2025-10-06 13:56:09
<?xml version="1.0" standalone="yes"?> <Paper uid="E87-1007"> <Title>in Newsletter of the International Computer Archive of</Title> <Section position="5" start_page="38" end_page="41" type="concl"> <SectionTitle> LOB and CLAWS </SectionTitle> <Paragraph position="0"> One such alternative word-tag disambiguation mechanism was developed for the analysis of the Lancaster-Oslo/Bergen (LOB) Corpus. The LOB Corpus is a million-word collection of English text samples, used for experimentation and inspiration in computational linguistics and related studies (see for example \[Leech et al 83a\], \[Atwell forthcoming b\]). CLAWS, the Constituent-Likelihood Automatic Word-tagging System (\[Leech et al 83b\], \[Atwell et al 84\]), was developed to annotate the raw text with basic grammatical information, to make it more useful for linguistic research; CLAWS did not attempt a full parse of each sentence, but simply marked each word with a grammatical code from a set of 133 WORDTAGS. The word-tagged LOB Corpus is now available to other researchers (see \[Johansson et al 86\]).</Paragraph> <Paragraph position="1"> CLAWS was originally implemented in Pascal, but it is currently being recoded in C and in POPLOG Prolog.</Paragraph> <Paragraph position="2"> CLAWS can deal with Unrestricted English text input, including &quot;noisy&quot; or ill-formed sentences, because it is based on Constituent Likelihood Grammar, a novel probabilistic approach to grammatical description and analysis described in \[Atwell 83\]. A Constituent Likelihood Grammar is used to calculate likelihoods for competing putative analyses; not only does this tell us which is the 'best' analysis, but it also shows how 'good' this analysis is. For assigning word-tags to words, a simple Markovian model can be used instead of a probabilistic rewrite-rule system (such as a probabilistic context-free grammar); this greatly simplifies processing.</Paragraph> <Paragraph position="3"> CLAWS first uses a dictionary, suffix list and other default routines to assign a set of putative tags to each word; then, for each sequence of ambiguously-tagged words, the likelihood of every possible combination or 'chain' of tags is evaluated, and the best chain is chosen. The likelihood of each chain of tags is evaluated as a product of all the 'links' (tag-pair likelihoods) in the sequence; tag-pair likelihood is a function of the frequency of that sequence of two tags in a sample of tagged text, compared to the frequency of each of the two tags individually.</Paragraph> <Paragraph position="4"> An important advantage of this simple Markovian model is that word-tagging is done without parsing: there is no need to work out higher-level constituent-structure trees before assigning unambiguous word-tags to words. Despite its simplicity, this technique is surprisingly robust and successful: CLAWS has been used to analyse a wide variety of Unrestricted English, including extracts from newspapers, novels, diaries, learned journals, E.E.C. regulations, etc., with a consistent accuracy of c. 96%. Although the system did not have parse trees available in deciding word-classes, only c. 4% of words in the LOB Corpus had to have their assigned word-tag corrected by manual editing (see \[Atwell 81, 82\]).</Paragraph> <Paragraph position="5"> Another important advantage of the simple Markovian model is that it is relatively straightforward to transfer the model from English to other Natural Languages.
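To make the mechanism concrete, the chain evaluation described above can be sketched in a few lines of code. The sketch below is illustrative only, not the CLAWS implementation itself (which is written in Pascal and being recoded in C and POPLOG Prolog): the toy lexicon, the tag-pair counts and the exact form of the link-likelihood formula are all invented for the purpose of the example.

```python
# Minimal sketch of CLAWS-style tag disambiguation over one ambiguous span.
# The tiny lexicon and frequency tables below are invented for illustration.
from itertools import product

LEXICON = {"my": ["PP$"], "farther": ["RBR", "JJR"], "was": ["BEDZ"]}

PAIR_FREQ = {("PP$", "RBR"): 2, ("RBR", "BEDZ"): 5,     # pair counts from a
             ("PP$", "JJR"): 1, ("JJR", "BEDZ"): 1}     # tagged training sample
TAG_FREQ = {"PP$": 5000, "RBR": 800, "JJR": 1500, "BEDZ": 7000, "NN": 30000}

def link_likelihood(t1, t2):
    """Tag-pair ('link') likelihood: frequency of the pair relative to the
    frequencies of the two tags individually (one plausible formulation)."""
    return PAIR_FREQ.get((t1, t2), 0.5) / (TAG_FREQ[t1] * TAG_FREQ[t2]) ** 0.5

def best_chain(words):
    """Enumerate every chain of candidate tags for the span and keep the one
    whose product of link likelihoods is highest (exhaustive, for clarity)."""
    candidates = [LEXICON.get(w, ["NN"]) for w in words]    # crude default: NN
    best, best_score = None, -1.0
    for chain in product(*candidates):
        score = 1.0
        for t1, t2 in zip(chain, chain[1:]):
            score *= link_likelihood(t1, t2)
        if score > best_score:
            best, best_score = chain, score
    return best, best_score

print(best_chain(["my", "farther", "was"]))    # -> best tag chain and its score
```

Note that nothing in this sketch is specific to English: swap in a different lexicon and tag-pair table and the same evaluation applies, which is exactly the point made next.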
The basic statistical model remains; only the dictionary and Markovian tag-pair frequency table need to be replaced. We are experimenting with the possibility of (partially) automating even this process - see \[Atwell 86a, 86b, forthcoming c\], \[Atwell and Drakos 87\].</Paragraph> <Paragraph position="6"> The general Constituent Likelihood approach to grammatical analysis, and CLAWS in particular, can be used to analyse text including ill-formed syntax. More importantly, it can also be adapted to flag syntactic errors in texts; unlike other techniques for error-detection, these modifications of CLAWS lead to only limited increases in processing requirements. In fact, various different types of modification are possible, yielding varying degrees of success in error-detection. Several different techniques have been explored.</Paragraph> <Section position="1" start_page="38" end_page="38" type="sub_section"> <SectionTitle> Error Likelihoods </SectionTitle> <Paragraph position="0"> A very simple adaptation of CLAWS (simple in theory at least) is to augment the tag-pair frequency table with a tag-pair error likelihood table. As in the original system, CLAWS uses the tag-pair frequency table and the Constituent Likelihood formulae to find the best word-tag for each word. Having found the best tag for each word, every cooccurring pair of tags in the analysis is re-assessed: the ERROR-LIKELIHOOD of each tag-pair is checked. Error-likelihood is a measure of how frequently a given tag-pair occurs in an error as compared to how frequently it occurs in valid text. For example, if the user types ... my farther was ...</Paragraph> <Paragraph position="1"> CLAWS will yield the word-tag analysis ... PP$ RBR BEDZ ...</Paragraph> <Paragraph position="2"> which means <possessive personal pronoun>, <comparative adverb>, <past singular BE>. This analysis is then passed to the checking module, which uses tag-pair frequency statistics extracted from copious samples of error-full texts. These should show that the tag-pairs <PP$ RBR> and <RBR BEDZ> often occur where there is a typing error, and rarely occur in grammatically correct constructs; so an error can be flagged at the corresponding point in the text.</Paragraph> <Paragraph position="3"> Although the adjustment to the model is theoretically simple, the tag-pair error-likelihood figures required could only be gleaned by human analysis of huge amounts of error-full text. Our initial efforts to collect an Error Corpus convinced us that this approach was impractical because of the time and effort required to collect the necessary data. In any case, an alternative technique which manages without a separate table of tag-pair error likelihoods turns out to be quite successful.</Paragraph> </Section> <Section position="2" start_page="38" end_page="40" type="sub_section"> <SectionTitle> Low Absolute Likelihoods </SectionTitle> <Paragraph position="0"> This alternative technique involves using CLAWS unmodified to choose the best tag for each word, as before, and then measuring the ABSOLUTE LIKELIHOODS of tag-pairs. Instead of a separate tag-pair error likelihood table being used to assess grammaticality, the same tag-pair frequency table is used for both tag-assignment and error-detection. The tag-pair frequency table gives frequencies for grammatically well-formed text, so the second module simply assumes that if a low-likelihood tag pair occurs in the input text, it indicates a grammatical error.
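The checking module itself is equally small. The following sketch is again purely illustrative (the pair likelihoods and the normalisation factor are invented; a real table would be derived from the tagged LOB Corpus): it normalises the likelihood of each adjacent tag pair in the chosen analysis against an empirically set scaling factor and flags anything that falls below one.

```python
# Sketch of the absolute-likelihood check: reuse the same tag-pair statistics
# that drove tagging, normalise each link against an empirically chosen
# scaling factor, and flag anything below 1.0. All figures are invented.

PAIR_LIKELIHOOD = {                 # link likelihoods for the chosen analysis
    ("START", "PP$"): 0.0470,
    ("PP$", "RBR"): 0.0008,
    ("RBR", "BEDZ"): 0.0037,
}
SCALING = 0.003                     # hypothetical normalisation factor

def check(tagged_words):
    """tagged_words: (word, tag) pairs as chosen by the tagger.
    Print word, tag, normalised likelihood and an ERROR? flag."""
    prev_tag = "START"
    for word, tag in tagged_words:
        norm = PAIR_LIKELIHOOD.get((prev_tag, tag), 0.0001) / SCALING
        flag = "ERROR?" if norm < 1.0 else ""
        print(f"{word:10} {tag:5} {norm:10.6f} {flag}")
        prev_tag = tag

check([("my", "PP$"), ("farther", "RBR"), ("was", "BEDZ")])
```

With these invented figures the sketch flags farther and leaves my and was unflagged, mirroring the kind of behaviour discussed next around Figure 1.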
In the example above, the tag-pairs <PP$ RBR> and <RBR BEDZ> have low likelihoods (as they occur only rarely in grammatically well-formed text), so an error can be diagnosed.</Paragraph> <Paragraph position="1"> Figure 1 is a fuller example of this approach to error diagnosis. This shows the analysis of a short text; please note that the text was constructed for illustration purposes only, and the characters mentioned bear no resemblance to real living people! The text contains many mis-typed words, but these mistakes would not be detected by a conventional spelling-checker, since the error-forms happen to coincide with other legal English words; the only way that these errors can be detected is by noticing that the resultant phrases and clauses are ungrammatical. The grammar-checking program first divides the input text into words. Note that this is not entirely trivial: for example, enclitics such as I'll and won't are split into two words, I + 'll and will + n't. The left-hand column in Figure 1 shows the sequence of words in the sample text, one word per line. The second column shows the grammatical tag chosen using the Constituent Likelihood model as best in the given context.</Paragraph> <Paragraph position="2"> The third column shows the absolute likelihood of the chosen grammatical tag; this likelihood is normalised relative to a threshold, so that values greater than one constitute &quot;acceptable&quot; grammatical analyses, whereas values less than one are indicative of unacceptably improbable grammar.</Paragraph> <Paragraph position="3"> Whenever the absolute likelihood value falls below this acceptability threshold, the flag ERROR? is output in the fourth column, to draw visual attention to the putative error. Thus, for example, the first word in the text, my, is tagged PP$ (possessive personal pronoun), and this tag has a normalised absolute likelihood of over 15, which is acceptable; the second word, farther, is tagged RBR (comparative adverb), but this time the absolute likelihood is below one (0.264271), so the word is flagged as a putative ERROR?. This technique is extremely primitive, yet appears to work fairly well. There is no longer any need to gather error-likelihoods from an Error Corpus. However, the definition of what constitutes a &quot;low&quot; likelihood is not straightforward. On the whole, there is a reasonably clear correlation between words marked ERROR? and actual mistakes, so clearly low values can be taken as diagnostic of errors, once the question of what constitutes &quot;lowness&quot; has been defined rigorously. In the example, the acceptability level is defined in terms of a simple threshold: likelihoods are normalised so that values below 1.000000 are deemed too low to be acceptable. The appropriate normalisation scaling factor was found empirically. Unfortunately, a threshold at this level would mean some minor troughs would not be flagged: for example, clever in I stole a meat clever ... (which was tagged JJ (adjective) but should have been the noun cleaver) has a normalised likelihood of 4.516465; tame in the gruesome tame of Eroc Attwell ... (which was also tagged JJ (adjective) but should have been the noun tale) also has a normalised likelihood of 4.516465; and the phrase won day (which should have been one day) involves a normalised likelihood of 4.060886 (although this is, strictly speaking, associated with day rather than won, an error flag would be sufficiently close to the actual error to draw the user's attention to it).
However, if we raised the threshold (or alternatively changed the normalisation function so that these normalised likelihoods fell below 1.000000), then more words would be flagged, lowering the precision of error diagnosis. In some cases, error diagnosis would be &quot;blurred&quot;, since sometimes the words immediately before and/or after the error also have low likelihoods; for example, was in my farther was very crawl ... has a likelihood of 1.216545.</Paragraph> <Paragraph position="4"> Worse, some error flags would appear in completely inappropriate places, with no true errors in the immediate context; for example, the exclamation mark at the end of he won't get away with this! has a likelihood of 4.185351 and so would probably be flagged as an error if the threshold were raised.</Paragraph> <Paragraph position="5"> Another way to define a trough would be as a local minimum, that is, a point where the points immediately before and after have higher likelihood values; even a trough with a quite high value is flagged this way, so long as the surrounding points are higher still. This would catch clever, tame and won day mentioned above. However, strictly speaking, several other words not currently flagged in Figure 1 are also local minima, for example my in perhaps my friends would ... and bald in he bald at me if I ... So this definition is liable to cause a greater number of 'red herring' valid words to be erroneously flagged as putative mistakes, again leading to worse precision.</Paragraph> <Paragraph position="6"> Once an optimal threshold or other computational definition of low likelihood has been chosen, it is a simple matter to amend the output routine to produce output in a simplified format acceptable to Word Processor users, without grammatical tags or likelihood ratings but with putative errors flagged. However, even with an optimal measure of lowness, the success rate is unlikely to be perfect. The model deliberately incorporates only rudimentary knowledge about English: a lexicon of words and their word-tags, and a tag-pair frequency matrix embodying knowledge of tag cooccurrence likelihoods.</Paragraph> <Paragraph position="7"> Certain types of error are unlikely to be detected without some further knowledge. One limited augmentation to this simple model involves the addition of error-tags to the analysis procedure.</Paragraph> <Paragraph position="8"> Error-Tags
A rather more sophisticated technique for taking syntactic context into account involves adding ERROR-TAGS to lexical entries. These are the tags of any similar words (where these are different from the word's own tags). In the analysis phase, the system must then choose the best tag (from error-tag(s) and 'own' tag(s)) according to syntactic context, still using the unmodified CLAWS Constituent-Likelihood model. For example, in the sentence I am very hit, an error can be diagnosed if the system works out that the tags of the input word hit (NN, VB, VBD and VBN: <singular common noun>, <verb infinitive>, <verb past tense>, <verb past participle>) are all much less likely in the given context than JJ (<adjective>), known to be the tag of a similar word (hot). So, a rather more sophisticated error-detection system includes knowledge not just about the tags of words, but also about what alternative word-classes would be plausible if the input was an error.
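In code, the change needed is small. The sketch below is a hypothetical stand-in: the lexicon entries, the '#' marking and the scoring function (a toy replacement for the full constituent-likelihood bond) are all invented; it simply lets error-tags compete with own tags and flags a word whenever an error-tag wins.

```python
# Sketch of error-tag flagging: lexicon entries list a word's own tags plus
# error-tags (suffixed "#") borrowed from similar words; the usual best-tag
# choice runs over both, and a word whose winning tag is an error-tag is
# flagged. Lexicon and scores are invented for illustration.

LEXICON = {
    "very": ["QL"],
    "hit":  ["NN", "VB", "VBD", "VBN", "JJ#"],   # JJ# borrowed from "hot"
}

def context_score(prev_tag, tag):
    """Toy stand-in for the constituent-likelihood bond between adjacent tags."""
    scores = {("QL", "JJ#"): 0.90, ("QL", "NN"): 0.20, ("QL", "VB"): 0.05,
              ("QL", "VBD"): 0.05, ("QL", "VBN"): 0.10}
    return scores.get((prev_tag, tag), 0.01)

def tag_and_flag(words):
    prev = "START"
    for word in words:
        tags = LEXICON.get(word, ["NN"])
        best = max(tags, key=lambda t: context_score(prev, t))
        if best.endswith("#"):
            print(f"{word}: possible error ({best} is an error-tag)")
        prev = best

tag_and_flag(["very", "hit"])        # from the example "I am very hit."
```

The paper's own lexicon representation for this information is described next.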
This information consists of an additional field in lexicon entries: each dictionary entry must hold (i) the word itself, (ii) the word's own tags, and (iii) the error-tags associated with the word. For example, hit would carry its own tags NN, VB, VBD and VBN, plus the error-tag JJ# taken from the similar word hot. Note that error-tags are marked with # to distinguish them from own tags. CLAWS then chooses the best tag for each word as usual. However, in the final output, instead of each word being marked with the chosen word-tag, words whose chosen tag is an error-tag are flagged as potential errors.</Paragraph> <Paragraph position="9"> To illustrate why error-tags might help in error diagnosis, notice that dense in I maid several dense in his ... does not have a below-threshold absolute likelihood, and so is not flagged as a putative error. An error-tag based system could calculate that the best sequence of tags (allowing error-tags) for the word sequence several dense in his ... is \[AP NNS# IN PP$\] (<post-determiner>, <plural common noun>, <preposition>, <possessive personal pronoun>). Since NNS is an error-tag, an error is flagged. However, the simpler absolute-likelihood based model does not allow for the option of choosing NNS as the tag for dense, and is forced to choose the best of the 'own' tags; this in turn causes a mistagging of in as NNU (<abbreviated unit of measurement>), since \[JJ NNU\] (<adjective> <abbreviated unit of measurement>) is likelier than \[JJ IN\] (<adjective> <preposition>). Furthermore, \[JJ NNU\] turns out not to be an exceptionally unusual tag cooccurrence. The point of all this is that, without error-tags, the system may mistag words immediately before or after error-words, and this mistagging may well distort the absolute likelihoods used for error diagnosis.</Paragraph> <Paragraph position="10"> This error-tag-based technique was originally proposed and illustrated in \[Atwell 83\]. The method has been tested with a small test lexicon, but we have yet to build a complete dictionary with error-tags for all words. Adding error-tags to a large lexicon is a non-trivial research task; and adding error-tags to the analysis stage increases computation, since there are more tags to choose between for each word. So far, we have not found conclusive evidence that the success rate is increased significantly; this requires further investigation. Also to be more fully investigated is how to take account of other relevant factors in error diagnosis, in addition to error-tags.</Paragraph> </Section> <Section position="3" start_page="40" end_page="40" type="sub_section"> <SectionTitle> Full Cohorts </SectionTitle> <Paragraph position="0"> In theory at least, the Constituent-Likelihood method could be generalised to take account of all relevant contextual factors, not just syntactic bonding. This could be done by generating COHORTS for each input word, and then choosing the cohort-member word which fits the context best. For example, if the sentence you were very hit were input, the following cohorts would be generated: you, yew, ewe; were, where, wear; very, vary, veery; hit, hot, hut, hat (the term &quot;cohort&quot; is adapted from \[Marslen-Wilson 85\] with a slight modification of meaning). Cohorts of similar words can be discovered from the spelling-check dictionary using the same algorithm employed to suggest corrections for misspellings in current systems; these techniques are fairly well understood (see, for example, \[Yannakoudakis and Fawthrop\], \[Veronis 87\], \[Borland 85\]).
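Cohort generation itself can be sketched with a standard string-similarity measure. The word list, measure and threshold below are illustrative only: a crude character-based ratio catches pairs like were/where/wear and hit/hot/hut/hat, but sound-alike candidates such as yew and ewe for you need the more refined spelling-correction techniques cited above.

```python
# Sketch of cohort generation: gather dictionary words similar to the typed
# word, each with a similarity factor (the typed word itself gets 1.0).
# Word list, measure and threshold are illustrative only.
from difflib import SequenceMatcher

WORD_LIST = ["you", "yew", "ewe", "were", "where", "wear",
             "very", "vary", "veery", "hit", "hot", "hut", "hat"]

def cohort(word, threshold=0.6):
    members = [(word, 1.0)]                       # the word actually typed
    for candidate in WORD_LIST:
        if candidate == word:
            continue
        similarity = SequenceMatcher(None, word, candidate).ratio()
        if similarity >= threshold:
            members.append((candidate, similarity))
    return members

for w in ["you", "were", "very", "hit"]:
    print(w, "->", cohort(w))
```

Each cohort member would then be given a single relative likelihood rating by multiplying together weighted factors of the kind listed next.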
Next, each member of a cohort is assigned a relative likelihood rating, taking into account relevant factors including: i) the degree of similarity to the word actually typed (this measure would be available anyway, as it has to be calculated during cohort generation; the actual word typed gets a similarity factor of 1, and other members of the cohort get appropriately lower weights); ii) the 'degree of fit' in the given syntactic context (measured as the syntactic constituent likelihood bond between the tag(s) of each cohort member and the tag(s) of the words before and after, using the CLAWS constituent likelihood formulae); iii) the frequency of usage in general English (common words like &quot;you&quot; and &quot;very&quot; get a high weighting factor, rare words like &quot;ewe&quot;, &quot;yew&quot;, and &quot;veery&quot; get a much lower weighting; word relative frequency figures can be gleaned from statistical studies of large Corpora, such as \[Hofland and Johansson 82\], \[Francis and Kucera 82\], \[Carroll et al 71\]); iv) if a cohort member occurs in a grammatical idiom or preferred collocation with surrounding words, then its relative weighting is increased (e.g. in the context &quot;fish and ...&quot;, chips gets a higher collocation weighting than chops); collocation preferences can also be elicited from studies of large corpora, using techniques such as those of \[Sinclair et al\];</Paragraph> <Paragraph position="2"> v) domain-dependent lexical preferences should ideally be taken into account, for example in an electronics manual current should get a higher domain weighting than currant.</Paragraph> <Paragraph position="3"> All these factors are multiplied (using appropriate weightings) to yield a relative likelihood rating for each member of the cohort. The cohort-member with the highest rating is (probably) the intended word; if the word actually typed is different, an error can be diagnosed, and furthermore a correction can be offered to the user.</Paragraph> <Paragraph position="4"> Unfortunately, although this approach may seem sensible in theory, in practice it would require a huge R&D effort to gather the statistical information needed to drive such a system, and the resulting model would be computationally complex and expensive. It would be more sensible to try to incorporate only those features which contribute significantly to increased error-detection, and ignore all other factors.</Paragraph> <Paragraph position="5"> This means we must test the existing error-detection system extensively, and analyse the failures to try to discover what additional knowledge would be useful to the system.</Paragraph> </Section> <Section position="4" start_page="40" end_page="41" type="sub_section"> <SectionTitle> Error Corpus </SectionTitle> <Paragraph position="0"> The error-likelihood and full-cohort techniques would appear to give the best error-detection rates, but would require vast amounts of data and computation to build a general-purpose system from scratch.</Paragraph> <Paragraph position="1"> The error-tag technique also requires a substantial research effort to build a large general-purpose lexicon. A version of the Constituent Likelihood Automatic Word-tagging System modified to use the ABSOLUTE LIKELIHOOD method of error-detection has been more extensively tested; this system cannot detect all grammatical errors, but appears to be quite successful with certain classes of errors. To test alternative prototypes, we are building up an ERROR CORPUS of texts containing errors.
The LOB Corpus includes many errors which appeared in the original published texts; these are marked SIC in the text, and noted in the Manual which comes with the Corpus files, \[Johansson et al 78\]. The initial Error Corpus consisted of these errors, and it is being added to from other sources (see Acknowledgements below).</Paragraph> <Paragraph position="2"> The errors in the Error Corpus can be (manually) classified according to the kind of processing required for detection (each example below starts with a LOB line reference number): A: non-word error-forms, where the error can be found by simple dictionary lookup; for example, A21 115 As the news pours in from around the world, beleagared (SIC) Berlin this weekend is a city on a razor's edge.</Paragraph> <Paragraph position="3"> B: error-forms involving valid English words in an invalid grammatical context, the kind of error the CLAWS-based approach could be expected to detect (these may be due to spelling or typing or grammatical mistakes by the typist, but this is irrelevant here: the classification is according to the type of processing required by the detection program); for example, E18 121 Unlike an oil refinery one cannot grumble much about the fumes, smell and industrial dirt, generally, for little comes out of the chimney except possibly invisible gasses. (SIC) C: error-forms which are valid English words, but in an abnormal grammatical/semantic context, which a CLAWS-type system would not detect, but which could conceivably be caught by a very sophisticated parser; for example, breaking 'long-distance' number agreement rules as in A15 170 It is, however, reported that the tariff on textiles and cars imported from the Common Market are (SIC) to be reduced by 10 per cent.</Paragraph> <Paragraph position="4"> D: lexically and syntactically valid error-forms which would require &quot;intelligent&quot; semantic analysis for detection; for example, P17 189 She did not imagine that he would pay her a visit except in Frank's interest, and when she hurried into the room where her mother was trying in vain to learn the reason of his visit, her first words were of her fiancee. (SIC) or K29 35 He had then sown (SIC) her up with a needle, and, after a time she had come back to him cured and able to bear more children.</Paragraph> <Paragraph position="5"> Collection and detailed analysis of texts for this Error Corpus is still in progress at the time of writing; but one important early impression is that different sources show widely different distributions of error-classes. For example, a sample of 150 errors from three different sources shows the following distribution: i) Published (and hence manually proofread) text: A:52% B:28% C:8% D:12% ii) essays by 11- and 12-year-old children:</Paragraph> <Paragraph position="7"> Because of this great variation, precision and recall rates are also liable to vary greatly according to text source. In a production version of the system, the 'unusualness' threshold (or other measure) used to decide when to flag putative errors will be chosen by the user, so that users can optimise precision or recall.
It is not clear how this kind of user-customisation could be built into other WP text-checking systems; but it is an obvious side-benefit of a Constituent Likelihood based system.</Paragraph> <Paragraph position="8"> Conclusions
The figures above indicate that a CLAWS-based grammar-checker would be particularly useful to non-native English speakers; but even for this class of users, precision and recall are imperfect. The CLAWS-based system is inadequate on its own, but should properly be used as one tool amongst many; for example, as an augmentation to the Writer's Workbench collection of text-critiquing and proofreading programs, or in conjunction with other English Language Teaching tools such as a computerised ELT dictionary (such as those discussed by \[Akkerman et al 85\] or \[Atwell forthcoming a\]). Other systems for dealing with syntactically ill-formed English attempt a full grammatical parse of each input sentence, and in addition require error-recovery routines of varying degrees of sophistication. This involves much more processing than the CLAWS-based system; and yet even these systems fail to diagnose all errors in a text. Clearly, the Constituent-Likelihood error-detection technique is ideally suited to applications where fast processing and relatively small computing requirements are of paramount importance, and for users who find imperfect error-detection better than none at all. I freely admit that the system has not yet been comprehensively tested on a wide variety of WP users; as with all AI research systems, a lot of work still has to be done to engineer a generally-acceptable commercial product. We are currently looking for sponsors and collaborators for this research: anyone interested in developing the prototype into a robust system (for example, to be integrated into a WP system) is invited to contact the author!</Paragraph> </Section> </Section> </Paper>