<?xml version="1.0" standalone="yes"?> <Paper uid="C94-2176"> <Title>TOWARDS A MORE USER-FRIENDLY CORRECTION</Title> <Section position="4" start_page="0" end_page="1083" type="metho"> <SectionTitle> 2. DETECTION AND CORRECTION OF SYNTACTIC ERRORS </SectionTitle> <Paragraph position="0"> Any error which prevents the system from producing an interpretation (or, more simply, a parse) for the input sentence is considered to be a syntactic error. These errors may be of very different kinds, but we can distinguish two rough classes: (a) errors due to the system: the input is correct but the linguistic coverage is insufficient; (b) errors due to the user: the input is incorrect. This classification, which can also be used for lexical errors, is far more relevant at the syntactic level because type (a) errors at this level are very frequent in free texts such as newspaper articles. In order to avoid deadlocks due to these errors, one must build robust parsers with wide coverage (Chanod, 91; Genthial, 90). We are going to concentrate here on type (b) errors.</Paragraph> <Paragraph position="1"> We suppose the system has all the required competence and the deadlock is due to a misuse of the language by the user. We may then consider two ways to proceed: * either we relax constraints in order to obtain results, even incorrect ones, then filter these results to find the origin of the error and finally correct it (Douglas, 92; Weischedel, 83); * or we try to foresee the errors and integrate in the grammar a way to express all possible types of errors, thus avoiding deadlocks of the parsing process (Goeser, 90).</Paragraph> <Paragraph position="2"> We have chosen the first way because the richness of natural language makes it very difficult to describe all correct utterances. 
It is therefore, in our opinion, impossible to enumerate exhaustively all possible errors, especially if we intend to verify texts read by automatic devices (scanners and character recognition software).</Paragraph> <Paragraph position="3"> The first method can be found, for example, in systems which aim to build a logico-semantic interpretation of the input sentence: in these systems, syntactic constraints are almost completely relaxed and parsing is based on semantic information (Granger, 83; Wan, 92).</Paragraph> <Paragraph position="4"> We have therefore built a prototype (Courtin, 91) which can detect and correct agreement errors in number, gender and person in simple French sentences. The most interesting feature of this prototype is not its coverage, which is limited, but the exhaustive design and implementation of all agreement rules of French grammar. It works as follows: we first perform a morphological analysis of the input sentence, then we build all possible dependency structures for the sentence.</Paragraph> <Paragraph position="5"> Following the principle of relaxation of constraints, the process of building dependency structures does not take morphological variables into account; it uses only the lexical category of words. The resulting trees are then passed on to a checker which attempts to verify the variables borne by the nodes, examining them in pairs, each pair composed of a governor and a dependant.</Paragraph> <Paragraph position="6"> So to verify les_plu calcul_sin scientifique_sin [2] (scientific computations), we first verify the pair (calcul_sin, les_plu), which is incorrect because of a disagreement in number between calcul_sin and les_plu. We then ask the user to choose between the two solutions: les calculs (plural) and le calcul (singular). 
In order to generate these solutions, we use a morphological generator which is of course based on the same data as the morphological parser mentioned above. (Footnote 2: As it is not easy to find good examples of complex agreement errors in English, we use French examples, but we make explicit the variables causing trouble: here the number, with sin for singular and plu for plural.)</Paragraph> <Paragraph position="7"> The user's choice is then introduced in the tree and the verification process resumes. If the user chose the plural, we will have an error again with (calculs_plu, scientifique_sin), leading to a new, obviously useless, question to the user. This traversal of trees by pairs has proved useful for designing agreement rules, but is clearly not suited to user-friendly correction. Moreover, it does not take into account the context of the incorrect pair. We therefore propose first the use of correction strategies and then a new way of traversing the trees which are to be verified.</Paragraph> </Section> <Section position="5" start_page="1083" end_page="1084" type="metho"> <SectionTitle> 3. USING CORRECTION HEURISTICS </SectionTitle> <Paragraph position="0"> By the very definition of an agreement error, every such error involves two lexical units, either of which may be corrected. The choice of the unit to be corrected is left to the user, but we think that in most cases the proper correction can be chosen automatically. Indeed, when a human being rereads a text, even if he is not the author, he very rarely hesitates between the two possible corrections of an agreement error. One can always say that a human reader understands the written text, but we can also imagine simple (i.e. machine-computable) heuristics which could allow correction without hesitation.</Paragraph> <Section position="1" start_page="1083" end_page="1084" type="sub_section"> <SectionTitle> 3.1. 
Heuristics </SectionTitle> <Paragraph position="0"> As examples of such heuristics, we could have (Véronis, 88, quoted in Genthial, 92): a) number of errors in a group: le_sin vélos_plu est_sin redevenu_sin à la mode will be corrected in the singular, le vélo est redevenu à la mode (only one word corrected), rather than in the plural, les vélos sont redevenus à la mode (three words corrected with, moreover, an alteration of the meaning which is very hard to detect with simple techniques); b) it is better to correct in a way that does not modify the phonetics of the sentence: Les_mas,fem chiens_mas dressées_fem ... will be corrected in the masculine, Les chiens dressés ..., rather than in the feminine, Les chiennes dressées ... We find here again the idea, often used at the lexical level, that incorrect written utterances follow the phonetics of the correct form.</Paragraph> <Paragraph position="1"> c) writer laziness: a writer sometimes omits an s where one is necessary, but rarely adds one where it is not: les_plu enfant_sin ... is thus corrected as les_plu enfants_plu ...</Paragraph> <Paragraph position="2"> d) one can give priority to the head of the phrase (enfant in this example): les_plu petits_plu enfant_sin qui ont_plu ... becomes the singular le petit enfant qui a ... 
The idea here is that the writer takes more care over the main word of a phrase than over the others.</Paragraph> <Paragraph position="3"> We could also find other criteria by studying corpora or by interviewing professionals such as teachers of French or journalists.</Paragraph> <Paragraph position="4"> These heuristics are of course open to criticism, the main argument against them being that they are no longer valid with the use of text editors, because cutting and pasting portions of text may introduce errors which would not have been made in linear writing.</Paragraph> <Paragraph position="5"> Moreover, they are often conflicting: consider for example the sentence j'aime les_plu calcul_sin scientifique_sin, which includes an agreement error in number. The (a) criterion leads us to correct les_plu into le_sin because 2 words out of 3 are singular. Since the s at the end of French words is not pronounced, the (b) criterion leads us to correct in the plural, les calculs scientifiques, without phonetic alteration. The (c) criterion imposes the plural, and the (d) criterion the singular of calcul, which is the governor.</Paragraph> <Paragraph position="6"> 3.2. Weightings. Despite everything, we can hope to obtain automatic corrections by using more than one criterion, provided we are able to weight the various criteria in order to compute a confidence factor for each correction.</Paragraph> <Paragraph position="7"> Consider for example, for the above criteria, that the confidence factor is computed with the following formulae:</Paragraph> <Paragraph position="9"> where the K_i are weights assigned to each criterion. 
We will take K_a = 2, K_b = 2, K_c = 2 and K_d = 1.</Paragraph> <Paragraph position="10"> If we apply these weightings to les_plu calcul_sin scientifique_sin, we get Table 1.</Paragraph> <Paragraph position="12"> A null value corresponds to a case where the confidence factor cannot be evaluated: thus for the (c) criterion we can only correct in the plural and for the (d) criterion, in this example, the singular is imposed by the governor.</Paragraph> <Paragraph position="13"> If we sum the factors of each row, the correction j'aime les calculs scientifiques (plural) wins by 5.33 (51.6%) against 5 (48.4%) for j'aime le calcul scientifique (singular). Admittedly, in this case the difference is so small that it is advisable to ask the user to choose his correction, but we can decide to use a threshold T such that, if the absolute value of the difference between the two confidence factors (0.33 in this example) is above T, the correction is made automatically in favour of the solution with the higher confidence factor.</Paragraph> </Section> <Section position="2" start_page="1084" end_page="1084" type="sub_section"> <SectionTitle> 3.3. Adaptability </SectionTitle> <Paragraph position="0"> One of our hypotheses is that the value, and thus the weight, of a correction criterion depends on a given user or at least on a given class of users (scientists who master the language but not the keyboard, children or foreigners learning the language, secretaries who master both keyboard and language but are inattentive, ...).</Paragraph> <Paragraph position="1"> Consequently, we want to build a system where the criterion weights are not fixed, but may be dynamically updated by means of a simple learning mechanism. Initially, weights are either arbitrarily chosen or chosen following the assignment of the user to a particular class, and the automatic correction threshold is set very high. 
With that configuration, most errors lead to a consultation of the user, and his answer is used to increase the weight of those criteria which would have selected the proper answer and to decrease the weight of the others.</Paragraph> <Paragraph position="2"> In the above example, if the user forces the singular, the system will increase the weight of the (a) and (d) criteria and decrease the weight of (b) and (c).</Paragraph> <Paragraph position="3"> In the same way, the threshold will decrease each time the weights are modified, until it reaches a lower limit, arbitrarily fixed or chosen by the user.</Paragraph> <Paragraph position="4"> However, the implementation of these correction criteria in a verification-correction system for agreement errors requires that the minimal unit of correction, which was a pair (governor, dependant) in the prototype described in Section 2, be redefined so that the confidence factor can be evaluated for each correction proposal.</Paragraph> </Section> </Section> <Section position="6" start_page="1084" end_page="1084" type="metho"> <SectionTitle> 4. A NEW CORRECTION METHOD </SectionTitle> <Paragraph position="0"> Consider for example the sentence: les_plu jeunes_plu cycliste_sin que j'_sin ai_sin rencontré_sin montaient_plu à bon_mas allure_fem [3]. It contains an agreement error in gender between bon_mas and allure_fem, and two agreement errors in number: one in the nominal phrase les_plu jeunes_plu cycliste_sin, and the other between the subject cycliste_sin and the verb montaient_plu. If we choose to correct this sentence by forcing the plural, we introduce a new error between the past participle rencontré_sin and its object complement cyclistes, which has become plural. The associated dependency tree is shown in Fig. 
1.</Paragraph> <Paragraph position="1"> The agreement rules which apply are then: * agreement between determiners, adjectives and noun inside a nominal phrase; * agreement between the past participle of the relative clause, rencontré, and its object cycliste, because the object is placed before it; * agreement between the subject and the verb; * agreement between the subject and the auxiliary ai in the relative clause.</Paragraph> <Paragraph position="2"> Reading these rules suggests dividing the verification-correction problem according to the agreement dependencies existing between the nodes of the tree. We then apply the following method: 1) Partitioning of the tree into three sub-trees, each one connected, but not necessarily pairwise disjoint. There must exist a dependency between the variables (gender, number, person, ...) of the nodes of a sub-tree. 2) Checking of agreement rules for each sub-tree obtained: here we exploit the work of the previous step by verifying only those rules which caused the sub-tree to be formed. We verify by the classical method of tree traversal with unification of the values of variables. We then eliminate the group j'ai, which is correct. 3) If at least one error is detected in a group, we must attempt to correct it by using the heuristics defined above. For bon_mas allure_fem, we will correct in the feminine, bonne allure, because allure has no masculine form. (Footnote 3: Something like: the young cyclist I have met were climbing at good speed.)</Paragraph> <Paragraph position="3"> 3.1) However, it is worthwhile to divide complex groups into simpler ones, again according to the agreement rule involved. In the example, we divide the first group, which includes the relative clause, into three sub-groups. Such a partitioning is useful because the agreement error in number, detected on the whole group, does not appear in all the sub-groups. 
If we attempt to correct each sub-group separately (with the criteria and the weights defined above), we obtain Table 2.</Paragraph> <Paragraph position="4"> If the threshold T is small enough (below 2), we can consider les jeunes cyclistes (plural) as the right correction for the first sub-group; the second sub-group is correct and the plural corrects the third. But these results leave an error in the whole group.</Paragraph> <Paragraph position="5"> 3.2) So we must evaluate the correction of the whole group by using the results of each sub-group. Here again, we can exploit various evaluation criteria: * simple majority: we choose the correction most frequently selected in the sub-groups. Plural wins by 2 to 1. We could also weight each group according to the number of words or to statistical criteria on errors: agreement errors on past participles used with the auxiliary avoir (have) are especially frequent in French, due to the complexity of the rules involved, so the weight of the second sub-group would be lowered.</Paragraph> <Paragraph position="6"> * proportional majority: we sum the confidence factors of all sub-groups for each possible correction. This leads to correction in the plural (17.66) rather than the singular (16.33). Here again we can use a threshold below which the conclusion is not considered reliable.</Paragraph> <Paragraph position="7"> * weighted proportional majority, which uses the percentages and so is a mixture of the two previous ones: we sum the percentages of each sub-group. Plural wins by 161.9 against 138.1 for the singular. Compared with the previous method, we weaken the importance of the second sub-group which, being correct, has a big difference between the two confidence factors.</Paragraph> <Paragraph position="8"> In the example, the plural wins, but when it is not possible to choose the right correction automatically, the choice is left to the user. 
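</Paragraph> <Paragraph> As an illustration, the three evaluation criteria of step 3.2 and the threshold test can be sketched in Python. This is only a sketch under stated assumptions, not the implementation of the prototype: the function names, the SUBGROUPS dictionary and the confidence factors it contains are hypothetical (they are not the values of Table 2); only the three summation rules and the comparison against a threshold follow the text.

```python
# Hypothetical confidence factors per sub-group: (singular, plural).
# Illustrative values only; they are NOT the figures of Table 2.
SUBGROUPS = {
    "nominal phrase": (2.0, 8.0),
    "past participle": (7.5, 2.5),
    "subject-verb": (2.0, 3.0),
}


def simple_majority(groups):
    """Count how many sub-groups prefer each correction."""
    votes = {"singular": 0, "plural": 0}
    for sing, plur in groups.values():
        if sing > plur:
            votes["singular"] += 1
        elif plur > sing:
            votes["plural"] += 1
    return votes


def proportional_majority(groups):
    """Sum the raw confidence factors of all sub-groups."""
    return {
        "singular": sum(s for s, _ in groups.values()),
        "plural": sum(p for _, p in groups.values()),
    }


def weighted_proportional_majority(groups):
    """Sum, for each correction, its percentage within every sub-group."""
    totals = {"singular": 0.0, "plural": 0.0}
    for sing, plur in groups.values():
        whole = sing + plur
        totals["singular"] += 100.0 * sing / whole
        totals["plural"] += 100.0 * plur / whole
    return totals


def decide(scores, threshold):
    """Return the winning correction, or None if the user must be asked."""
    gap = abs(scores["singular"] - scores["plural"])
    if gap > threshold:
        return max(scores, key=scores.get)
    return None
```

With these assumed factors, decide(proportional_majority(SUBGROUPS), 1.0) selects the plural, whereas a threshold of 3.0 makes decide return None, which corresponds to the case where the question is put to the user. </Paragraph> <Paragraph>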
It is then very useful to exploit the partitioning of the tree to ask the user a very relevant question: the intersection of the three sub-trees is the word cycliste, so we can question the user as follows: In the sentence: les jeunes cycliste que j'ai rencontré montaient à bonne allure.</Paragraph> <Paragraph position="9"> Did you want to say un cycliste (singular) or des cyclistes (plural)? According to the answer, the whole sentence is corrected and possibly the weights and the threshold are updated.</Paragraph> </Section> </Paper>