<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1034">
  <Title>Knowledge integration in a robust and efficient morpho-syntactic analyzer for French</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Our goal is to construct a morpho-syntactic analyzer for French that can automatically detect, and then correct (automatically or with help from the user), spelling mistakes, agreement errors and the most important syntax errors. Such a system could be used to analyze word processor output, for example.</Paragraph>
    <Paragraph position="1"> Since our main goal is to implement a robust and efficient analyzer for French, we have designed a system which can detect errors as opposed to one which can only process well-formed input.</Paragraph>
    <Paragraph position="2"> A number of systems for English text analysis have been developed. The Writer's Workbench /Frase 1983/ is a collection of tools developed at AT&amp;T's Bell Laboratories: the two most important ones address proofreading and style analysis. The EPISTLE project /Miller, Heidorn &amp; Jensen 1981/ is a vast project undertaken at IBM's Thomas J. Watson research laboratory, the long term goal of which is to develop a system which not only supports writing, but also text understanding. WANDAH /Friedman 1984/, a system that was developed at UCLA, comprises three sub-systems: a word processor designed to support interactive composition, tools to assist composition and tools to help in the editing and the revising phases.</Paragraph>
    <Paragraph position="3"> These systems are difficult to adapt to French since they are based on knowledge which is specific to English. Furthermore, in these systems the knowledge is rarely represented explicitly: indeed, the knowledge has most often been &quot;compiled&quot; for reasons of efficiency. Thus, these systems cannot easily reason about the knowledge they have. The novel feature of our system is that it is based on an integration at different levels of the knowledge of French. This knowledge is represented explicitly in the system and the system keeps track of the decisions it has made, which will allow it not only to justify its decisions but also to reason about its reasoning. The main problem is in the integration of knowledge of the language, knowledge which is at different levels: knowledge of orthography /Catach 1980/, of traditional grammar /Le nouveau Bescherelle 1980/ /Grevisse 1969/, of syntax /Grevisse 1969/ /Gross 1975/ /Boons, Guillet &amp; Leclère 1976/ and also of the most frequently encountered errors /Catach, Duprez &amp; Legris 1980/ /Class &amp; Horguelin 1979/ /Lafontaine, Dubuisson &amp; Emirkanian 1982/. [Footnote: Research funded by the Social Sciences Research Council of Canada (SSRCC grant no. 410-85-1360).]</Paragraph>
    <Paragraph position="4"> In order to be able to use such knowledge, it must on the one hand be made operational and it must on the other hand be orchestrated.</Paragraph>
    <Paragraph position="5"> In our system, these sources of knowledge are used as follows.</Paragraph>
    <Paragraph position="6"> Each sentence of the text is split up into words. Each word is categorized by dictionary look-up; knowledge of French orthography is represented as a collection of correction rules. An efficient parser, driven by a context-free grammar, builds a parse tree or a forest of parse trees in the case of ambiguity. This parser is deterministic in the sense that it blocks as soon as an error is detected. The parser can recall the spelling corrector, if need be. Then, knowledge of the sub-categorization of French verbs allows the system to eliminate automatically certain ambiguities and to detect and correct many errors. Finally, the user is consulted whenever the system cannot intervene.</Paragraph>
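The flow described above (word splitting, dictionary look-up backed by correction rules, then grammar-driven parsing into a forest) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny lexicon, the two correction rules, the toy grammar and the function names are hypothetical and do not come from the system described here. The parser is a simple backtracking recursive-descent parser standing in for the authors' efficient parser, and verb sub-categorization is left out.

```python
import re

# Hypothetical toy lexicon: word -> possible categories (not from the paper).
LEXICON = {
    "le": {"Det"},
    "la": {"Det", "Pro"},
    "chat": {"N"},
    "souris": {"N"},
    "mange": {"V"},
}

# Hypothetical correction rules: regex rewrites tried on unknown words.
CORRECTION_RULES = [
    (r"(.)\1", r"\1"),  # collapse a doubled letter: "chatt" becomes "chat"
    (r"^sh", "ch"),     # "shat" becomes "chat"
]

# Hypothetical context-free grammar (no left recursion).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP"), ("V",)],
}


def categorize(word):
    """Dictionary look-up; fall back on the correction rules if it fails."""
    if word in LEXICON:
        return word, LEXICON[word]
    for pattern, replacement in CORRECTION_RULES:
        candidate = re.sub(pattern, replacement, word)
        if candidate in LEXICON:
            return candidate, LEXICON[candidate]
    return word, set()  # still unknown: the user would have to be consulted


def parse(symbol, tagged, start):
    """Yield (tree, end) for every way `symbol` can cover tagged[start:end]."""
    if symbol not in GRAMMAR:  # lexical category: match a single word
        if start != len(tagged) and symbol in tagged[start][1]:
            yield (symbol, tagged[start][0]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in expand(rhs, tagged, start):
            yield (symbol, *children), end


def expand(rhs, tagged, start):
    """Yield (children, end) for every way to match the symbols in `rhs`."""
    if not rhs:
        yield (), start
        return
    for first, mid in parse(rhs[0], tagged, start):
        for rest, end in expand(rhs[1:], tagged, mid):
            yield (first, *rest), end


def analyze(sentence):
    """Split into words, categorize them, then build the forest of parses."""
    tagged = [categorize(w) for w in re.findall(r"\w+", sentence.lower())]
    unknown = [w for w, cats in tagged if not cats]
    if unknown:
        return {"status": "ask_user", "unknown": unknown, "forest": []}
    forest = [t for t, end in parse("S", tagged, 0) if end == len(tagged)]
    status = "ok" if forest else "blocked"
    return {"status": status, "unknown": [], "forest": forest}
```

In this sketch, `analyze("Le chatt mange la souris")` first repairs "chatt" through the doubled-letter rule and then returns a forest holding a single tree; a sentence with several readings would yield several trees in the same list. A word that no rule can repair produces the `ask_user` status, and an input the grammar rejects produces `blocked`, loosely mirroring the cases where the described system consults the user or the parser stops on an error.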
    <Paragraph position="7"> Before presenting the system in depth, we must emphasize that the system we have designed is intended to assist at the knowledge level and not at the competence level. It is not designed as a tool to improve written communication skills.</Paragraph>
    <Paragraph position="8"> The main sub-tasks of the system are as follows: word categorization by dictionary look-up and spelling correction, construction of a parse tree or of a forest of parse trees in cases of ambiguity, correction of syntax errors, detection and correction of morphological errors by processing the parse tree.</Paragraph>
    <Paragraph position="9"> We shall now examine these three phases.</Paragraph>
  </Section>
</Paper>