File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/88/c88-1034_metho.xml

Size: 10,701 bytes

Last Modified: 2025-10-06 14:12:08

<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1034">
  <Title>Knowledge integration in a robust and efficient morpho-syntactic analyzer for French</Title>
  <Section position="4" start_page="167" end_page="167" type="metho">
    <SectionTitle>
3. CONSTRUCTION OF A PARSE TREE OR OF A FOREST
</SectionTitle>
    <Paragraph position="0"> We have compiled an empirical grammar of written French, which is expressed as a context-free grammar. Our parser is based on the work of Tomita /Tomita 1986/ /Tomita 1987/. In a Tomita parser, a general-purpose parsing procedure is driven by a parsing table that is generated mechanically from the context-free grammar of the language to be parsed. Tomita's main contribution was to propose a graph-structured stack, which allows the parser to handle multiple structural ambiguities efficiently. We use YACC /Johnson 1983/, an LALR(1) parsing-table generator available under UNIX, to generate automatically the parsing table that drives the general parsing procedure. When generating the parsing table, YACC detects and signals cases of structural ambiguity, which it reports as shift/reduce or reduce/reduce conflicts.</Paragraph>
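To illustrate how a general parsing procedure can follow every action in a conflicting table cell, here is a minimal Python sketch. It is not the authors' parser. The grammar (E -> E + E | a), the hand-built SLR tables ACTION and GOTO, and the function parse_all are hypothetical. Instead of Tomita's graph-structured stack, the sketch simply copies the whole stack whenever a cell holds more than one action. A graph-structured stack would share the common parts of these copies, which is where Tomita's efficiency comes from.

```python
from collections import deque

# Toy ambiguous grammar (hypothetical, not the authors' grammar of French):
#   r1: E -> E + E
#   r2: E -> a
RULES = {"r1": ("E", 3), "r2": ("E", 1)}  # rule -> (left-hand side, right-hand side length)

# Hand-built SLR table. Cell (4, "+") holds a shift/reduce conflict,
# the kind of conflict YACC reports for this ambiguous grammar.
ACTION = {
    (0, "a"): [("shift", 2)],
    (1, "+"): [("shift", 3)],
    (1, "$"): [("accept", None)],
    (2, "+"): [("reduce", "r2")],
    (2, "$"): [("reduce", "r2")],
    (3, "a"): [("shift", 2)],
    (4, "+"): [("shift", 3), ("reduce", "r1")],
    (4, "$"): [("reduce", "r1")],
}
GOTO = {(0, "E"): 1, (3, "E"): 4}


def parse_all(tokens):
    """Return every parse tree, copying the stack whenever a cell has several actions."""
    tokens = list(tokens) + ["$"]
    results = []
    # A configuration is (state stack, tree stack, input position).
    work = deque([((0,), (), 0)])
    while work:
        states, trees, pos = work.popleft()
        tok = tokens[pos]
        for kind, arg in ACTION.get((states[-1], tok), []):
            if kind == "shift":
                work.append((states + (arg,), trees + (tok,), pos + 1))
            elif kind == "reduce":
                lhs, n = RULES[arg]
                node = trees[-n:] if n > 1 else trees[-1]
                base = states[:-n]
                work.append((base + (GOTO[(base[-1], lhs)],), trees[:-n] + (node,), pos))
            else:  # accept: the single tree left on the stack is a complete parse
                results.append(trees[-1])
    return results


print(parse_all(["a", "+", "a", "+", "a"]))
```

For the input a + a + a, parse_all returns two trees, one for each way of grouping the additions.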
    <Paragraph position="1"> Several kinds of ambiguity and error can arise when parsing French.</Paragraph>
    <Paragraph position="2"> Consider first the case where a word has been assigned several categories. Some of these ambiguities can be resolved by considering the expectations of the grammar. Take the word court, which can be an adjective, an adverb, a noun or a verb. If court is found in the context il [ProCl] court [Adj / Adv / N / V; 3rd person singular, etc.], the grammar accepts only the verb at this point. Similarly, the word une, which can be a determiner, a noun or a pronoun, is automatically reduced to the noun reading in the context il a lu la une du journal.</Paragraph>
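Filtering categories by grammatical expectations can be pictured as an intersection. On one side are the categories the dictionary assigns to a word. On the other are the categories the grammar allows at that point. The sketch below is a simplification under stated assumptions. The LEXICON entries, the EXPECTED_AFTER table and filter_categories are hypothetical. In a table-driven LR parser the same effect would presumably come from the parsing table itself, since only categories that have an action in the current state can be consumed.

```python
# Hypothetical lexicon: each word form maps to all the categories it can take.
LEXICON = {
    "il": {"ProCl"},
    "court": {"Adj", "Adv", "N", "V"},
    "une": {"Det", "N", "Pro"},
}

# Hypothetical fragment of the grammar's expectations: the categories that may
# follow a given category. The ProCl entry reflects the rule that only a verb
# or another clitic pronoun may follow a clitic pronoun.
EXPECTED_AFTER = {
    "ProCl": {"V", "ProCl"},
    "Det": {"N", "Adj"},
}


def filter_categories(prev_category, word):
    """Keep only the categories of `word` that the grammar accepts after `prev_category`."""
    return LEXICON[word].intersection(EXPECTED_AFTER.get(prev_category, set()))


print(filter_categories("ProCl", "court"))  # {'V'}
print(filter_categories("Det", "une"))      # {'N'}
```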
    <Paragraph position="3"> Consider now the case where the parser cannot derive a parse tree. On the hypothesis that a spelling error may have caused an erroneous category to be assigned, the parser calls the spelling corrector to revise the spelling of a word, and hence the category assigned to it. In the previous example, il *pin, only peint, the verb, is retained among the spelling alternatives for pin, since pain is no more possible in this context than pin.</Paragraph>
    <Paragraph position="4"> Indeed, in our grammar of the sentence, only a verb or another clitic pronoun may appear after a clitic pronoun. Similarly, in the sentence ils *on apporté le livre, *on will be corrected to ont.</Paragraph>
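The category-driven choice among spelling alternatives can be sketched in a few lines. The sketch assumes a hypothetical list of alternatives for pin and uses the clitic rule just stated. The names CATEGORIES, ALTERNATIVES and correct are illustrative only and do not describe the authors' spelling corrector.

```python
# Hypothetical data: categories of each form, and the spelling alternatives
# a corrector might offer for a misspelled form.
CATEGORIES = {"pin": {"N"}, "pain": {"N"}, "peint": {"V"}}
ALTERNATIVES = {"pin": ["pain", "peint"]}

# After a clitic pronoun, only a verb or another clitic pronoun may appear.
EXPECTED_AFTER = {"ProCl": {"V", "ProCl"}}


def correct(prev_category, word):
    """Return the word itself if one of its categories fits; otherwise the
    alternatives that have a fitting category."""
    allowed = EXPECTED_AFTER.get(prev_category, set())
    if not CATEGORIES.get(word, set()).isdisjoint(allowed):
        return [word]
    return [alt for alt in ALTERNATIVES.get(word, [])
            if not CATEGORIES.get(alt, set()).isdisjoint(allowed)]


print(correct("ProCl", "pin"))  # ['peint']
```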
    <Paragraph position="5"> The parser efficiently constructs a parse tree, or a forest of parse trees, accounting for the sentence. In a Tomita parser, the forest of parse trees is represented by a data structure analogous to a chart /Winograd 1983/, which allows for "local ambiguity packing".</Paragraph>
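Local ambiguity packing can be sketched as a shared forest. In such a forest, all analyses of the same category over the same span of words are stored in a single node. This is a simplification: Tomita packs subtrees that also share the same parser state, whereas the hypothetical PackedForest class below keys nodes on (category, start, end) only. The categories and attachments used for Pierre expédie des porcelaines de Chine are illustrative and are not the trees of the paper's figures.

```python
from collections import defaultdict
from math import prod


class PackedForest:
    """Toy shared forest: one node per (category, start, end); each node keeps
    every distinct sequence of children found for that span."""

    def __init__(self):
        self.nodes = defaultdict(list)  # (cat, start, end) -> list of child-key tuples

    def add(self, cat, start, end, children=()):
        key = (cat, start, end)
        alt = tuple(children)
        if alt not in self.nodes[key]:
            self.nodes[key].append(alt)  # a second analysis is packed into the same node
        return key

    def count_trees(self, key):
        alts = self.nodes[key]
        if alts == [()]:  # a word: exactly one (trivial) tree
            return 1
        return sum(prod(self.count_trees(child) for child in alt) for alt in alts)


# Pierre(0) expédie(1) des(2) porcelaines(3) de(4) Chine(5)
f = PackedForest()
subj = f.add("NP", 0, 1, [f.add("N", 0, 1)])
v, det, n = f.add("V", 1, 2), f.add("Det", 2, 3), f.add("N", 3, 4)
pp = f.add("PP", 4, 6, [f.add("P", 4, 5), f.add("N", 5, 6)])
np_short = f.add("NP", 2, 4, [det, n])
np_long = f.add("NP", 2, 6, [det, n, pp])  # "de Chine" modifies "porcelaines"
vp = f.add("VP", 1, 6, [v, np_long])
f.add("VP", 1, 6, [v, np_short, pp])       # "de Chine" attached to the verb
s = f.add("S", 0, 6, [subj, vp])
print(f.count_trees(s), len(f.nodes[vp]))  # 2 2
```

The sentence node is stored once, yet it stands for two complete trees, because the ambiguity is confined to the packed VP node.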
  </Section>
  <Section position="5" start_page="167" end_page="167" type="metho">
    <SectionTitle>
4. ANALYSIS OF THE PARSE TREE OR FOREST
</SectionTitle>
    <Paragraph position="0"> A forest of parse trees can be produced in classical cases of structural ambiguity, such as Pierre expédie des porcelaines de Chine. The two parse trees generated for this sentence are shown in Figs. 4 and 5, and their bracketed Lisp representations in Figs. 6 and 7.</Paragraph>
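The bracketed Lisp notation is easy to produce from nested tuples. In this sketch, the labels and tree shapes are hypothetical stand-ins for the two readings: de Chine either inside the object noun phrase or attached to the verb phrase. Only the bracketing convention is the point.

```python
def to_sexpr(tree):
    """Render a tree given as (label, child, ...) with string leaves as a Lisp list."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + " ".join([label] + [to_sexpr(c) for c in children]) + ")"


pp = ("PP", ("P", "de"), ("N", "Chine"))
np_obj = ("NP", ("Det", "des"), ("N", "porcelaines"))
# Hypothetical reading 1: "de Chine" inside the object noun phrase.
t1 = ("S", ("NP", ("N", "Pierre")),
      ("VP", ("V", "expédie"), ("NP", ("Det", "des"), ("N", "porcelaines"), pp)))
# Hypothetical reading 2: "de Chine" attached to the verb phrase.
t2 = ("S", ("NP", ("N", "Pierre")), ("VP", ("V", "expédie"), np_obj, pp))

for t in (t1, t2):
    print(to_sexpr(t))
```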
    <Paragraph position="2"> A forest of parse trees can also result from lexical ambiguity, as in il veut le boucher. In many cases only some of the trees in the forest need to be retained, since the system can clear the forest automatically. For example, although two parse trees are constructed for the sentence Jean n'a pas effectué de lancer (lancer could be an infinitive verb or a noun), only the tree with lancer categorized as a noun is retained, as shown in Fig. 8.</Paragraph>
    <Paragraph position="4"> At this level, the sub-categorization of the verb, which is of course also stored in the dictionary, is of great help. For example, effectuer does not allow an infinitive phrase as a complement. Similarly, the sentence il a remarqué Marie arrivant à toute allure has three possible analyses. Marie arrivant à toute allure could be an adverbial phrase. Alternatively, Marie could be the object of remarquer, with arrivant à toute allure an adverbial phrase. Finally, Marie arrivant à toute allure could be the object of remarquer. The first hypothesis (tree) is rejected, since remarquer is sub-categorized as requiring a direct object.</Paragraph>
    <Paragraph position="5"> Sub-categorization is also used to clear the forest of trees (Figs. 9-12) resulting from the analysis of the sentence il pense à l'envie de ... The sub-categorization information for the verb penser allows us to eliminate the trees of Figs. 11 and 12. Since Paul, unlike peur (la peur de s'enrichir), cannot be sub-categorized by an infinitive clause, the tree in Fig. 9 can also be eliminated. The only remaining analysis is the tree in Fig. 10.</Paragraph>
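Clearing the forest by sub-categorization keeps only the trees that satisfy two conditions. Every head must receive only complements its frame allows, and it must receive all the complements its frame requires. The frames, the complement labels (DirObj for a direct object, Inf for an infinitival complement) and the function names below are hypothetical. Each analysis is reduced to a map from head words to the complement types the tree assigns them. Adjuncts such as adverbial phrases are left out.

```python
# Hypothetical frames: head -> (complement types allowed, complement types required).
FRAMES = {
    "effectuer": ({"DirObj"}, {"DirObj"}),
    "remarquer": ({"DirObj"}, {"DirObj"}),
    "peur": ({"Inf"}, set()),
    "Paul": (set(), set()),
}


def satisfies_frames(analysis):
    """analysis maps each head to the complement types one tree assigns to it."""
    for head, complements in analysis.items():
        allowed, required = FRAMES[head]
        if not set(complements).issubset(allowed):
            return False  # a complement the head does not accept
        if not required.issubset(complements):
            return False  # a required complement is missing
    return True


def clear_forest(forest):
    """Keep the names of the analyses compatible with every frame."""
    return [name for name, analysis in forest.items() if satisfies_frames(analysis)]


lancer = {
    "lancer as infinitive": {"effectuer": ["Inf"]},
    "lancer as noun": {"effectuer": ["DirObj"]},
}
remarquer = {
    "whole phrase adverbial": {"remarquer": []},
    "Marie object, rest adverbial": {"remarquer": ["DirObj"]},
    "whole phrase object": {"remarquer": ["DirObj"]},
}
print(clear_forest(lancer))     # ['lancer as noun']
print(clear_forest(remarquer))  # ['Marie object, rest adverbial', 'whole phrase object']
print(satisfies_frames({"peur": ["Inf"]}), satisfies_frames({"Paul": ["Inf"]}))  # True False
```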
    <Paragraph position="6"> Verb sub-categorization also allows the system to correct some spelling mistakes at this stage. For example, the sentence *il panse que Marie viendra will be corrected to il pense que Marie viendra, since panser does not accept a completive (que) clause.</Paragraph>
    <Paragraph position="7"> Similarly, in il va *ou il veut, *ou is corrected to où. At this level we also use information stored in the dictionary to correct an error of the type *quoique tu dises, je partirai to quoi que tu dises, je partirai, since the sub-categorization of dire is not satisfied in the first case. Verb sub-categorization information also allows us to correct certain trees and improve others.</Paragraph>
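The same frame information can be used to choose among spelling alternatives. In this hypothetical sketch, a verb form whose frame rejects the observed complement is replaced by those alternatives whose frames accept it. The alternatives are assumed to come from the spelling corrector. The tables LEMMA, ACCEPTS and ALTERNATIVES and the function correct_verb are illustrative.

```python
# Hypothetical tables: lemma of each form, complement types each verb accepts
# (QueClause: a que-clause; AObj: a complement introduced by à; DirObj: a
# direct object), and the spelling alternatives a corrector might offer.
LEMMA = {"panse": "panser", "pense": "penser"}
ACCEPTS = {"panser": {"DirObj"}, "penser": {"QueClause", "AObj"}}
ALTERNATIVES = {"panse": ["pense"]}


def correct_verb(form, complement):
    """Keep the form if its frame accepts the complement; otherwise return the
    alternatives whose frames do."""
    if complement in ACCEPTS[LEMMA[form]]:
        return [form]
    return [alt for alt in ALTERNATIVES.get(form, [])
            if complement in ACCEPTS[LEMMA[alt]]]


print(correct_verb("panse", "QueClause"))  # ['pense']
print(correct_verb("pense", "QueClause"))  # ['pense']
```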
    <Paragraph position="8"> Consider first the case of correcting a tree. In the sentence il punit qui ment, qui ment is initially labelled as a sentence attached to the verb punir. It is then relabelled as a noun phrase, since punir requires a noun-phrase direct object.</Paragraph>
    <Paragraph position="9"> Consider now the case where sub-categorization allows us to improve a tree. In the sentence Pierre lira un livre cette nuit, cette nuit is initially labelled a noun phrase. It will be relabelled an adverbial phrase, since lire, unlike nommer for example, cannot be sub-categorized by two noun phrases.</Paragraph>
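Both adjustments can be sketched as one pass. The pass compares the number of noun-phrase complements in a tree with the number the verb's frame expects. The counts in NP_COMPLEMENTS and the function improve are hypothetical. A real system would also have to check that a relabelled phrase can actually function as an adverbial, as cette nuit can.

```python
# Hypothetical frames: number of noun-phrase complements each verb takes.
NP_COMPLEMENTS = {"punir": 1, "lire": 1, "nommer": 2}


def improve(verb, complements):
    """complements: list of [text, label] pairs; returns relabelled copies."""
    out = [list(c) for c in complements]
    wanted = NP_COMPLEMENTS[verb]
    nps = [c for c in out if c[1] == "NP"]
    # Too few noun phrases: a clause in object position (a free relative
    # such as "qui ment") is relabelled as a noun phrase.
    for c in out:
        if c[1] == "S" and wanted > len(nps):
            c[1] = "NP"
            nps.append(c)
    # Too many noun phrases: the extra ones become adverbial phrases.
    for c in nps[wanted:]:
        c[1] = "AdvP"
    return out


print(improve("punir", [["qui ment", "S"]]))  # [['qui ment', 'NP']]
print(improve("lire", [["un livre", "NP"], ["cette nuit", "NP"]]))  # [['un livre', 'NP'], ['cette nuit', 'AdvP']]
```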
  </Section>
  <Section position="6" start_page="167" end_page="169" type="metho">
    <SectionTitle>
5. CORRECTING SYNTAX ERRORS AND AGREEMENT ERRORS
</SectionTitle>
    <Paragraph position="0"> Experience has shown that syntactic errors are relatively infrequent. For example, in a study of the syntax of primary school students /Dubuisson &amp; Emirkanian 1982a/ /Dubuisson &amp; Emirkanian 1982b/, only 79 out of 6580 communication units (1.2%) were found to be ungrammatical. The communication unit is equivalent to what traditional grammar calls the sentence, that is, the root sentence together with any embedded sentences /Loban 1976/. We observed /Lafontaine, Dubuisson and Emirkanian 1982/ that the most frequent problem is the use of subordination (53% of the errors), in particular the use of complex relative clauses (24 cases out of 42). Children also have problems with multiple embeddings. In general, when they connect one embedded sentence to another, the resulting sentence is ungrammatical, with the main sentence absent or incomplete. The other problems are related to coordination, to constituent mobility and to the use of clitic pronouns, where we observed a strong influence of spoken language.</Paragraph>
    <Paragraph position="1"> As for relative clauses, we counted non-standard clauses as ungrammatical, even though, like standard relative clauses, they follow rules. La fille que je te parle and la fille que je parle avec are examples of non-standard relative clauses, whereas the sentence *la fille dont que je te parle is ungrammatical. We have chosen for now to focus our attention on two of these problems: complex relative clauses and sequences of clitics. As part of a previous research project, we developed algorithms for handling complex relative clauses /Emirkanian &amp; Bouchard 1987/ and sequences of clitics /Emirkanian &amp; Bouchard 1985/. For the sentence la fille que je parle, the syntax correction algorithm proposes la fille de qui/dont/de laquelle/avec qui/avec laquelle/à qui/à laquelle je parle. On the other hand, for the sentence la fille que je te parle, the algorithm proposes dont, de qui and de laquelle as possible choices. Again, it is the sub-categorization of the verb that gives us a handle on problems with sequences of clitic pronouns. The program corrects *je lui aide to je l'aide, for example. In most cases, however, the error is only reported: the system cannot correct it because it cannot identify the referent of the clitic precisely. *J'y donne and *je lui donne are examples of ungrammatical sentences. The system cannot propose the missing clitics with certainty. It will propose la lui, le lui, etc., in the first case and le lui, la lui, lui en, etc., in the second.</Paragraph>
    <Paragraph position="2"> During morphological analysis, the system draws on the information gleaned from the dictionary, the information collected in the parse tree and the agreement rules of French. On this basis, it isolates the noun phrases and checks whether the agreement rules for number and gender have been applied. It then checks for agreement between the subject and the verb. Note that, for example, in the case of *les belles chameaux, the system proposes both les beaux chameaux and les belles chamelles. In response to the sentence *le professeur explique la leçon aux élève de la classes, the system proposes le professeur explique la leçon aux élèves de la classe, aux élèves des classes, à l'élève de la classe and also à l'élève des classes, even though, based on our knowledge of the world, we know that the last answer is less probable.</Paragraph>
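One way to obtain both les beaux chameaux and les belles chamelles is to re-inflect the whole noun phrase for every combination of gender and number. The versions that change the fewest words are kept. This minimal-change criterion, the paradigm tables and agreement_fixes are assumptions of the sketch, which also ignores elision and forms such as bel. The text does not say how the system selects its proposals.

```python
from itertools import product

# Hypothetical paradigms: the form of each word for every (gender, number) pair.
LE = {("m", "sg"): "le", ("f", "sg"): "la", ("m", "pl"): "les", ("f", "pl"): "les"}
BEAU = {("m", "sg"): "beau", ("f", "sg"): "belle", ("m", "pl"): "beaux", ("f", "pl"): "belles"}
CHAMEAU = {("m", "sg"): "chameau", ("f", "sg"): "chamelle",
           ("m", "pl"): "chameaux", ("f", "pl"): "chamelles"}


def agreement_fixes(noun_phrase):
    """noun_phrase: list of (paradigm, observed form). Returns every agreeing
    version of the phrase that changes the fewest words."""
    candidates = []
    for features in product(["m", "f"], ["sg", "pl"]):
        forms = [paradigm[features] for paradigm, _ in noun_phrase]
        changes = sum(form != seen for form, (_, seen) in zip(forms, noun_phrase))
        candidates.append((changes, " ".join(forms)))
    fewest = min(changes for changes, _ in candidates)
    return [phrase for changes, phrase in candidates if changes == fewest]


print(agreement_fixes([(LE, "les"), (BEAU, "belles"), (CHAMEAU, "chameaux")]))
# ['les beaux chameaux', 'les belles chamelles']
```

A phrase that already agrees is returned unchanged, since its own features cost zero changes.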
    <Paragraph position="3"> The agreement rules we have formalized, some of which are recorded in the dictionary, allow our system to correct the errors most frequently found in written text /Lebrun 1980/ /Pelchat 1980/. These errors, particularly in number agreement, are due to semantic interference or to the proximity of other elements. Examples are *il veut être très riches instead of il veut être très riche, *je les voient instead of je les vois, and *Michel nous donnent des bonbons instead of Michel nous donne des bonbons.</Paragraph>
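The proximity errors in the last two examples arise when the verb agrees with the adjacent clitic object instead of with the subject. The sketch below shows this check. It assumes a hypothetical token format (form, role, person-number, lemma), a small conjugation table and the helper fix_subject_verb, none of which come from the paper.

```python
# Hypothetical present-tense conjugation table for the verbs in the examples.
CONJUGATION = {
    "donner": {"1sg": "donne", "3sg": "donne", "1pl": "donnons", "3pl": "donnent"},
    "voir": {"1sg": "vois", "3sg": "voit", "1pl": "voyons", "3pl": "voient"},
}


def fix_subject_verb(tokens):
    """tokens: (form, role, person-number, lemma). The verb must agree with the
    subject, not with a clitic object pronoun that happens to stand next to it."""
    subject = next(t for t in tokens if t[1] == "subj")
    words = []
    for form, role, feats, lemma in tokens:
        if role == "verb" and feats != subject[2]:
            form = CONJUGATION[lemma][subject[2]]
        words.append(form)
    return " ".join(words)


michel = [("Michel", "subj", "3sg", None), ("nous", "clitic", "1pl", None),
          ("donnent", "verb", "3pl", "donner"), ("des", "det", None, None),
          ("bonbons", "noun", None, None)]
je = [("je", "subj", "1sg", None), ("les", "clitic", "3pl", None),
      ("voient", "verb", "3pl", "voir")]
print(fix_subject_verb(michel))  # Michel nous donne des bonbons
print(fix_subject_verb(je))      # je les vois
```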
    <Paragraph position="4"> Finally, note that certain lexical ambiguities (relatively few remain at this stage) can be resolved here. This is the case, for example, for le chouette anglais, whereas la chouette anglaise remains ambiguous.</Paragraph>
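To see why le chouette anglais is disambiguated while la chouette anglaise is not, enumerate the noun-adjective readings and keep those whose genders agree with the determiner. Chouette can be a feminine noun (owl) or an adjective of either gender (nice). The READINGS table and np_readings are hypothetical. The sketch also assumes the phrase is a determiner followed by exactly one noun and one adjective.

```python
# Hypothetical readings: (category, genders the form is compatible with).
READINGS = {
    "chouette": [("N", {"f"}), ("Adj", {"m", "f"})],  # owl / nice
    "anglais": [("N", {"m"}), ("Adj", {"m"})],        # Englishman / English
    "anglaise": [("N", {"f"}), ("Adj", {"f"})],
}
DET_GENDER = {"le": "m", "la": "f"}


def np_readings(det, first, second):
    """Readings of det + first + second with one noun and one adjective,
    all agreeing in gender with the determiner."""
    gender = DET_GENDER[det]
    readings = []
    for cat1, genders1 in READINGS[first]:
        for cat2, genders2 in READINGS[second]:
            if {cat1, cat2} == {"N", "Adj"} and gender in genders1 and gender in genders2:
                readings.append((cat1, cat2))
    return readings


print(np_readings("le", "chouette", "anglais"))   # [('Adj', 'N')]
print(np_readings("la", "chouette", "anglaise"))  # [('N', 'Adj'), ('Adj', 'N')]
```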
  </Section>
</Paper>