<?xml version="1.0" standalone="yes"?>
<Paper uid="A83-1024">
  <Title>AUTOMATIC REPRESENTATION OF THE SEMANTIC RELATIONSHIPS CORRESPONDING TO A FRENCH SURFACE EXPRESSION</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
AUTOMATIC REPRESENTATION OF THE SEMANTIC RELATIONSHIPS CORRESPONDING TO A
FRENCH SURFACE EXPRESSION
Gian Piero Zarri
</SectionTitle>
    <Paragraph position="0"> Centre National de la Recherche Scientifique, Laboratoire d'Informatique pour les Sciences de l'Homme</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
FRANCE
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> The work presented here is a preliminary study concerning the automatic translation of French natural language statements into the RESEDA semantic metalanguage. The text in natural language is first (pre)processed in order to obtain its syntactic structure. The &quot;semantic parsing&quot; process begins with marking the &quot;triggers&quot;, defined as lexical units which call one or more of the predicative patterns allowed for in the metalanguage.</Paragraph>
    <Paragraph position="1"> The patterns obtained are then merged, and their case slots filled with the elements found in the surface structure according to the predictions associated with the slots.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
I INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> The work that I intend to present here is a preliminary study concerning the automatic translation of French natural language statements into the RESEDA semantic language.</Paragraph>
    <Paragraph position="1"> The RESEDA project itself is concerned with the creation and practical exploitation of a system for managing a biographical database using Artificial Intelligence (AI) techniques. The term &quot;biographical data&quot; must be understood in its widest possible sense: it covers any event, in public or private life, physical or intellectual, etc., that it is possible to gather about the personages we are interested in. In the present state of the system, this information concerns a well-defined period in time (approximately between 1350 and 1450) and a particular subject area (French history), but we are now working on the adaptation of RESEDA's methodology to the processing of other biographical data, for example medical or legal data.</Paragraph>
    <Paragraph position="2"> RESEDA differs from &quot;classical&quot; factual database management systems in two ways: - The information is recorded in the base using a particular Data Definition Language (metalanguage) which uses knowledge representation techniques.</Paragraph>
    <Paragraph position="3"> - A user interrogating the base obtains not only information which has been directly introduced into it, but also &quot;hidden&quot; information found using inference mechanisms particular to the system: in this respect, the most important characteristic of the system lies in the possibility of using inference procedures to question the database about causal relationships which may exist between the different recorded facts, and which are not explicitly declared at the time of data entry (Zarri, 1979; 1981). For example, the system may try to explain by inference top-level changes in the State administration in terms of changes in political power.</Paragraph>
    <Paragraph position="4"> This research is jointly financed by the &quot;Agence de l'Informatique - A.D.I.&quot; (CNRS/ADI contract no. 507568) and the &quot;Centre National de la Recherche Scientifique - C.N.R.S.&quot; (ATP no. 955045).</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="143" type="metho">
    <SectionTitle>
II THE RESEDA METALANGUAGE
</SectionTitle>
    <Paragraph position="0"> The biographical information which constitutes the system's database is organized in the form of units called &quot;planes&quot;. There are several different types of plane, see Zarri et al. (1977); the &quot;predicative planes&quot;, the most important, correspond to a &quot;flash&quot; which illustrates a particular moment in the &quot;life story&quot; of one or more personages. A predicative plane is made up of one of five possible &quot;predicates&quot; (BE-AFFECTED-BY, BEHAVE, BE-PRESENT, MOVE, PRODUCE); one or more &quot;modulators&quot; may be attached to each predicate.</Paragraph>
    <Paragraph position="1"> The modulator's function is to specify and delimit the semantic role of the predicate. Each predicate is accompanied by &quot;case slots&quot; which introduce their own arguments; dating and space location are also given within a predicative plane, as is the bibliographic authority for the statement. Predicative planes can be linked together in a number of ways; one way is to use explicit links of &quot;coordination&quot;, &quot;alternative&quot;, &quot;causality&quot;, &quot;finality&quot;, &quot;condition&quot;, etc. The data representation we have chosen in the RESEDA project is basically, therefore, a kind of &quot;case grammar&quot;, according to the particular meaning attached to the term in an AI context (Bruce, 1975; Charniak, 1981; etc.).</Paragraph>
    <Paragraph position="2"> For example, the data &quot;André Marchant was named provost of Paris by the King's Council on 22nd September 1413; he lost his post on 23rd October 1414, to the benefit of Tanguy du Châtel, who was granted this office&quot;, will be represented in three planes - that of the nomination of André Marchant, his dismissal and the nomination of Tanguy du Châtel.</Paragraph>
    <Paragraph position="3"> The coding of information must be made on two distinct levels: an &quot;external&quot; coding, up until now performed manually by the analyst, gives rise to a first type of representation, formalized according to the categories of the RESEDA metalanguage; a second automatic stage results in the &quot;internal&quot; numeric code. The external &quot;manual&quot; coding of the three events just stated is given in figure 1. The code in capital letters indicates a predicate and its associated &quot;case slots&quot;.</Paragraph>
    <Paragraph position="5"> [Figure 1: external coding of the three planes; only the fragment &quot;bibl: Demurgerl,273&quot; survives in this copy.] Every predicative plane is characterized by a pair of &quot;time references&quot; (date1-date2) which give the duration of the episode in question. In these three planes, the second date slot (date2) is empty because their modulators (begin, end) specify a change of state associated with a punctual event. &quot;André-Marchant&quot; and &quot;Tanguy-du-Châtel&quot; are historical personages known to the system; &quot;provost&quot;, &quot;king's-council&quot; and &quot;letters-of-nomination&quot; are terms of RESEDA's lexicon. The classifications associated with the terms of the lexicon provide the major part of the system's socio-historical knowledge of the period.</Paragraph>
    <Paragraph position="6"> &quot;Paris&quot; is the &quot;location of the object&quot;. If the historical sources analyzed gave us the exact causes of these events, we would introduce into the database the corresponding planes and associate them with these three planes by an explicit link of type &quot;CAUSE&quot;.</Paragraph>
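The structure of a predicative plane described in this section can be sketched as a small record type. This is only an illustration under stated assumptions: the class name Plane, its field names, the date strings, the plane-level location field and the placement of "letters-of-nomination" in the SOURCE slot of the third plane are all hypothetical, since the contents of figure 1 are not recoverable here. The predicates, the modulators (begin, end) and the fillers of the first plane come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The five predicates named in the text.
PREDICATES = {"BE-AFFECTED-BY", "BEHAVE", "BE-PRESENT", "MOVE", "PRODUCE"}


@dataclass
class Plane:
    """Illustrative predicative plane; field names are assumptions."""
    predicate: str
    modulators: List[str] = field(default_factory=list)
    slots: Dict[str, str] = field(default_factory=dict)  # case slot name to filler
    date1: Optional[str] = None
    date2: Optional[str] = None      # left empty for punctual events
    location: Optional[str] = None   # simplification: kept at plane level
    bibl: Optional[str] = None       # bibliographic authority

    def __post_init__(self):
        # Reject anything that is not one of the five predicates.
        if self.predicate not in PREDICATES:
            raise ValueError("unknown predicate: " + self.predicate)


def example_planes():
    """Three planes for the André Marchant example (hypothetical encoding)."""
    nomination = Plane(
        "BE-AFFECTED-BY", ["begin"],
        {"SUBJECT": "André-Marchant", "OBJECT": "provost",
         "SOURCE": "king's-council"},
        date1="22-september-1413", location="Paris",
    )
    dismissal = Plane(
        "BE-AFFECTED-BY", ["end"],
        {"SUBJECT": "André-Marchant", "OBJECT": "provost"},
        date1="23-october-1414", location="Paris",
    )
    successor = Plane(
        "BE-AFFECTED-BY", ["begin"],
        {"SUBJECT": "Tanguy-du-Châtel", "OBJECT": "provost",
         "SOURCE": "letters-of-nomination"},  # slot placement is an assumption
        date1="23-october-1414", location="Paris",
    )
    return [nomination, dismissal, successor]
```

Calling example_planes() returns the three records in the order nomination, dismissal, nomination of the successor; date2 stays empty in each, matching the remark that the begin and end modulators describe punctual events.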
    <Paragraph position="7"> This manual procedure for converting information in natural language into one or more planes has at least two major disadvantages which the proposed study intends to deal with : - The manual representation of biographical information in the terms of the metalanguage can only be performed by a specialist. This is done, at the moment, by the researchers themselves who have constructed the prototype system. Such a method is obviously out of the question if the system is to be used routinely by an uninitiated public, especially as RESEDA was conceived as a system supplied continuously with biographical information extracted from many different sources.</Paragraph>
    <Paragraph position="8"> - In spite of the fact that the syntax of RESEDA's metalanguage imposes strict constraints on the forming of predicative schemata accepted by the system and that these are then thoroughly checked, we cannot completely exclude the possibility of two coders translating the same information differently.</Paragraph>
  </Section>
  <Section position="5" start_page="143" end_page="146" type="metho">
    <SectionTitle>
III DESCRIPTION OF THE METHOD OF AUTOMATIC CODING
</SectionTitle>
    <Paragraph position="0"> To describe our methodology, I will use the example given in the preceding section. The initial text in natural language is first (pre)processed to obtain its constituent structure.</Paragraph>
    <Paragraph position="1"> For this purpose we have used, as a first approach, the French surface grammar implemented in DEREDEC, a software package developed at the University of Quebec at Montreal by Pierre Plante (1980a; 1980b).</Paragraph>
    <Paragraph position="2"> This system, comparable to an ATN parser, permits a breakdown of the surface text into its syntactic constituents, and establishes, between these constituents, syntagmatic relationships of the type &quot;topic-comment&quot;, &quot;determination&quot; and &quot;coordination&quot;. This preliminary analysis provides a context for subsequent processing, without necessarily removing all the ambiguities: in the same vein, see Boguraev and Sparck Jones (1982).</Paragraph>
    <Paragraph position="3"> The specific tools that we intend to develop for this project are of two types: a general procedure which can be likened to a sort of semantic parsing, and a system of heuristic rules.</Paragraph>
    <Paragraph position="4"> A. Semantic Parsing The first stage of the general procedure consists of marking the &quot;triggers&quot;, defined as lexical units which call for one or more of the predicative patterns allowed for in RESEDA's metalanguage. Thus we do not take into consideration every one of the lexical items met in the surface text, retaining only those directly pertaining to the &quot;translation&quot; to be done.</Paragraph>
    <Paragraph position="5"> However, we do not limit ourselves to a simple keyword approach, since a number of operations utilizing data provided by the morpho-syntactic analysis executed by DEREDEC are necessary before the predicative patterns which will be actually used afterwards can be selected.</Paragraph>
    <Paragraph position="6"> One of the results of the DEREDEC analysis is a kind of lemmatization enabling the reduction of surface forms in the text to a canonical form; for example, the infinitive in the case of verbs. The canonical forms found in the text under examination are compared with a list of potential triggers stored permanently in the system. In the case of the sentence we are analyzing, we can construct from this list the following sub-list: verbal forms - &quot;name&quot;, &quot;lose&quot;, &quot;grant&quot;; terms pertaining directly to the metalanguage or terms which have a direct correspondence with elements of the metalanguage: &quot;office&quot;, synonymous with &quot;post&quot; in RESEDA (&quot;post&quot; is a &quot;generic&quot; term, a &quot;head&quot; of a &quot;sub-tree&quot; in RESEDA's lexicon), and its specification &quot;provost&quot;. The results of the pre-analysis executed by DEREDEC enable the elimination of potential patterns associated with the triggers &quot;name&quot; and &quot;grant&quot; which would correspond to surface constructions of type &quot;active&quot;, as in the hypothetical example &quot;The Duke of Orléans named André Marchant provost of Paris ...&quot;. The patterns which will actually be utilized afterwards are therefore those shown in figure 2. Note that in the case of a trigger &quot;name (active form)&quot; the personage who figures as surface object would have been found as the &quot;SUBJECT&quot; of &quot;BE-AFFECTED-BY&quot;, whilst the surface subject would have been associated with the slot &quot;SOURCE&quot; of &quot;BE-AFFECTED-BY&quot;. In reality, the predicative structures selected are not limited to those shown in figure 1.</Paragraph>
    <Paragraph position="7"> They are in fact repeated with predicative patterns of the type &quot;BE-AFFECTED-BY&quot; which have as &quot;SUBJECT&quot; &quot;&lt;social-body&gt;&quot;, and as &quot;OBJECT&quot; &quot;&lt;personage&gt;&quot; accompanied by the specification (&quot;SPECIF&quot;) of a &quot;&lt;post&gt;&quot;. These constructions each correspond to the description: &quot;A personage receives a post in a certain organization (the organization in question, SUBJECT, is &quot;augmented&quot;, BE-AFFECTED-BY, by the personage, OBJECT, in relation, SPECIF, to a given post)&quot;. A corresponding surface expression would be, for example, the following: &quot;André Marchant (personage) is named secretary (post) of the papal court (social body)&quot;. Therefore, for example, the pattern in figure 3 is also associated with the trigger &quot;name (passive form)&quot;. The patterns in this second set will be eliminated at the end of the construction procedure since, as it is not possible to obtain a surface realization of the concept &quot;&lt;social-body&gt;&quot; in the position &quot;SUBJECT&quot;, they cannot provide complete predicative structures.</Paragraph>
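The trigger-marking stage can be sketched as a lookup of canonical forms in a stored trigger list, with the voice supplied by the pre-analysis used to discard patterns that do not fit the surface construction. The sketch below is a minimal illustration, not the RESEDA implementation: the table TRIGGERS, the pattern identifiers and the (lemma, voice) token format are hypothetical, and the output of DEREDEC is simply assumed to be available in that form.

```python
# Hypothetical trigger list: canonical form, then voice, then pattern identifiers.
# A voice of None marks nominal triggers, for which voice is irrelevant.
TRIGGERS = {
    "name": {"active": ["name-active"], "passive": ["name-passive"]},
    "grant": {"active": ["grant-active"], "passive": ["grant-passive"]},
    "lose": {"active": ["lose-active"]},
    "office": {None: ["post-term"]},
    "provost": {None: ["post-term"]},
}


def mark_triggers(tokens):
    """Return (lemma, patterns) for each token that is a known trigger.

    tokens is a list of (lemma, voice) pairs, assumed to come from the
    morpho-syntactic pre-analysis. Lexical items that are not triggers
    are ignored, and patterns for the wrong voice are never selected.
    """
    marked = []
    for lemma, voice in tokens:
        by_voice = TRIGGERS.get(lemma)
        if by_voice is None:
            continue  # not a trigger
        patterns = by_voice.get(voice, [])
        if patterns:
            marked.append((lemma, patterns))
    return marked


# Simplified canonical forms for the André Marchant sentence.
SENTENCE = [
    ("andré marchant", None), ("name", "passive"), ("provost", None),
    ("paris", None), ("lose", "active"), ("grant", "passive"),
    ("office", None),
]
```

With this input, mark_triggers(SENTENCE) keeps "name", "provost", "lose", "grant" and "office", and selects only the passive patterns for "name" and "grant", which mirrors the elimination of the active constructions described above.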
    <Paragraph position="8"> The last stage of the general procedure consists of examining the triggers belonging to the same morpho-syntactic environments, as defined by the results of the DEREDEC analysis. If there are several triggers pertaining to the same environment, and if the predicative patterns triggered are the same - which means that the predicates and case slots must be the same and that the modulators, dates and space location information must be compatible - then it can be said that the triggers refer to the same situation. As a result, the predicative patterns are merged so as to obtain the most complete description possible; the predictions about filling the slots linked with the cases of the resulting patterns together govern the search for fillers in the surface expression. Thus, the first two triggers in figure 2, recognized as relevant to the same environment, are combined in the formula in figure 4, which gives the general framework of plane 1 in figure 1.</Paragraph>
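The merging condition stated above can be sketched as follows. Patterns are represented as plain dictionaries, which is an assumption made for illustration; the example values for the "name" and "provost" patterns are likewise hypothetical, since figures 2 and 4 are not available here. Treating a missing value as compatible with anything, and requiring slot predictions to agree when both are present, is one possible reading of "compatible", not a rule taken from the paper.

```python
OPTIONAL_KEYS = ("modulator", "date1", "date2", "location")


def compatible(a, b):
    """Two optional values agree when either is missing or both are equal."""
    return a is None or b is None or a == b


def can_merge(p, q):
    """Same predicate, same case slots, compatible remaining information."""
    return (
        p["predicate"] == q["predicate"]
        and set(p["slots"]) == set(q["slots"])
        and all(compatible(p["slots"][k], q["slots"][k]) for k in p["slots"])
        and all(compatible(p.get(k), q.get(k)) for k in OPTIONAL_KEYS)
    )


def merge(p, q):
    """Combine two patterns into the most complete description available."""
    if not can_merge(p, q):
        raise ValueError("patterns do not describe the same situation")

    def pick(a, b):
        return a if a is not None else b

    merged = {"predicate": p["predicate"]}
    merged["slots"] = {k: pick(p["slots"][k], q["slots"][k]) for k in p["slots"]}
    for key in OPTIONAL_KEYS:
        merged[key] = pick(p.get(key), q.get(key))
    return merged


# Hypothetical patterns called by the triggers "name (passive)" and "provost".
NAME_PATTERN = {
    "predicate": "BE-AFFECTED-BY", "modulator": "begin",
    "slots": {"SUBJECT": "personage", "OBJECT": None, "SOURCE": None},
}
PROVOST_PATTERN = {
    "predicate": "BE-AFFECTED-BY", "modulator": None,
    "slots": {"SUBJECT": None, "OBJECT": "provost", "SOURCE": None},
}
```

Here merge(NAME_PATTERN, PROVOST_PATTERN) yields a single BE-AFFECTED-BY pattern with modulator begin, a SUBJECT predicted to be a personage, OBJECT "provost" and a SOURCE slot still waiting for a filler from the surface expression.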
    <Paragraph position="9"> The example we are considering illustrates a particularly simple case, in which it is not necessary to establish links between the planes to be created. If we had to process the sentence &quot;Philibert de St Léger is nominated seneschal of Lyon on the 30th of July 1412, in lieu of the late A. de Viry&quot;, three planes should be generated: one for the nomination of Philibert de St Léger, one for the death of A. de Viry, and another establishing a weak causality link (&quot;CONFER&quot;, in our metalanguage) between the first two planes.</Paragraph>
    <Paragraph position="10"> Surface items such as conjunctions, prepositions and sentential adverbs can be used to infer links between planes: causality, finality, coordination, etc. More precisely, in the last example, &quot;in lieu of&quot; is a potential trigger according to the following rule: if the main noun group of the surface prepositional phrase contains a trigger, this phrase constitutes a plane environment and &quot;CONFER&quot; introduces the plane created.</Paragraph>
    <Paragraph position="11"> B. Heuristic Rules The process I have outlined so far requires a corpus of heuristic rules - organized in the form of &quot;grammars&quot; associated with the predicative patterns of RESEDA's metalanguage - which will enable the slots in these patterns to be filled using the surface information in accordance with the predictions which characterize the slots. In the case of the pattern in figure 4, this filling-in poses no real problems, since the surface elements &quot;André Marchant&quot;, &quot;provost&quot;, &quot;King's Council&quot; and &quot;22nd September 1413&quot; - standardized according to RESEDA's conventions, see figure 1 - will take up the slots &quot;SUBJECT&quot;, &quot;OBJECT&quot;, &quot;SOURCE&quot; and &quot;date1&quot; directly. The filling-in operations are usually much more complicated, and require the use of complex inference rules. I shall say just a few words here about the heuristic rules designed to solve cases of anaphora (as in our example, &quot;he&quot;, &quot;this office&quot;, &quot;who&quot;). In the approach that we propose, marks of anaphora are identified during the general analysis procedure; the actual solving brings into play a number of criteria, from simple pairing off and morphological agreement to more subtle criteria, like contextual proximity, persistence of theme, etc. Thus, morphological agreement and contextual proximity are used to replace &quot;who&quot; by &quot;Tanguy du Châtel&quot; in our example; persistence of the theme enables us to fill in the missing date for Tanguy du Châtel's posting with the date &quot;23rd October 1414&quot; appearing in the surface expression.</Paragraph>
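Two of the criteria mentioned for solving anaphora, morphological agreement and contextual proximity, can be combined in a very small sketch: among the antecedents that precede the anaphor and agree with it, take the nearest one. Everything concrete below is an assumption made for illustration: the dictionary layout, the word positions, the (gender, number) feature pairs and the restriction of candidates to personages. Persistence of theme and the other, more subtle criteria are not modelled.

```python
def resolve(anaphor, candidates):
    """Return the name of the nearest preceding candidate that agrees.

    Agreement is reduced to equality of a (gender, number) pair and
    proximity to the distance in word positions; both are simplifications.
    """
    eligible = [
        c for c in candidates
        if anaphor["position"] - c["position"] > 0
        and c["features"] == anaphor["features"]
    ]
    if not eligible:
        return None
    # The closest antecedent is the eligible one with the largest position.
    return max(eligible, key=lambda c: c["position"])["name"]


# Hypothetical positions in the André Marchant sentence.
PERSONAGES = [
    {"name": "André Marchant", "position": 0, "features": ("masc", "sing")},
    {"name": "Tanguy du Châtel", "position": 20, "features": ("masc", "sing")},
]
HE = {"form": "he", "position": 12, "features": ("masc", "sing")}
WHO = {"form": "who", "position": 22, "features": ("masc", "sing")}
```

Under these assumptions resolve(HE, PERSONAGES) selects André Marchant, the only agreeing personage that precedes "he", while resolve(WHO, PERSONAGES) selects Tanguy du Châtel because it is the closer of the two agreeing candidates.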
    <Paragraph position="12"> We would like to integrate this approach, which has been purely empirical up to now, into the framework of a more general theory. Two directions of enquiry seem particularly interesting in order to develop our own philosophy of the subject.</Paragraph>
    <Paragraph position="13"> The PAL system of Candace Sidner (1979; 1981) is a top-down anaphora resolution method which makes use of the notion of focus (likened to the theme of the discourse). By searching in the text for &quot;focuses&quot; which refer to a system of representation organized as a series of &quot;frames&quot;, it is able to solve references. If the reference is not found by using the frames themselves, it is inferred from other frames contained in the database. The interest of this study lies in the fact that RESEDA already has, as permanent data, a certain amount of general knowledge organized in a form very similar to that of frames. Thus, in my example, the nomination and dismissal of André Marchant refer to the context of the &quot;civil war at the beginning of the 15th century&quot;, which is one of those frames (Zarri et al., 1977). The approach used by Klappholz and Lockman (Lockman, 1978) depends on the hypothesis that there is a strong link between co-reference and the cohesive links of a discourse. These links, when marked progressively in the text, become indices of the structure of the discourse, organized as a tree structure and created dynamically. These cohesive links (effect, cause, syllogism, exemplification, etc.) are very similar to the logical connections between planes in RESEDA (causality, finality, condition, etc.).</Paragraph>
  </Section>
</Paper>