<?xml version="1.0" standalone="yes"?>
<Paper uid="E83-1032">
  <Title>NATURAL LANGUAGE INFORMATION RETRIEVAL SYSTEM DIALOG</Title>
  <Section position="3" start_page="0" end_page="200" type="metho">
    <SectionTitle>
2. Transformation of natural language sentences into logical formulae
</SectionTitle>
    <Paragraph position="0"> The user of the DIALOG system, on introducing his utterance into the system, comes into direct contact with the natural language analysis module. This module plays the key role in the machine natural language communication process.</Paragraph>
    <Paragraph position="1"> As in many other information systems of this type, e.g. LUNAR /Woods 72/, PLANES /Waltz 76/, SOPHIE /Burton 76/, RENDEZ-VOUS /Codd 78/, PLIDIS /Berry-Rogghe 78/ and DIALOGIC /Grosz et al. 82/, the purpose of the module is to transform a text in natural language into a chosen formal representation.</Paragraph>
    <Paragraph position="2"> Such a representation must meet a number of requirements. Firstly, it must be &quot;intelligible&quot; to the internal parts of the system, i.e. the deductive component and/or the data base management component. Secondly, it must carry, in a formal and clear manner, the sense and meaning of utterances in natural language. Finally, the representation should allow the original input sentence to be reproduced, with the aim of generating intermediate paraphrases and/or answers for the user.</Paragraph>
    <Paragraph position="3"> In the parser of the DIALOG system, we attempted to build on what are, in our opinion, the greatest achievements in the field of natural language processing. The following works had the greatest influence on the final form of the module: /Berry-Rogghe 78/, /Bates 78/, /Carbonell 81/, /Cercone 80/, /Chomsky 65/, /Ferrari 80/, /Fillmore 68/, /Gershman 79/, /Grosz 82/, /Landsbergen 81/, /Marcus 80/, /Martin 81/, /Moore 81/, /Robinson 82/, /Rosenschein 82/, /Schank 78/, /Steinacker 82/, /Waltz 78/, /Wilensky 80/, /Woods 72/ and /Woods 80/. We have transferred, with greater or lesser success, the most valuable achievements presented in these works, which pertain mainly to the processing of English, into our system, using them in the treatment of the Polish language. We thus attempted to preserve a certain distance with regard to the language itself, as well as to the subject of conversation with the computer, so that the adopted solutions would be of a broader character and thereby comparable with the state of research in this field in other countries.</Paragraph>
    <Paragraph position="4"> 2.1. The role, place and structure of the language analysis module The purpose of the language analysis module in the DIALOG system is to transform the user's utterance /in Polish/ into first-order logic formulae.</Paragraph>
    <Paragraph position="5"> Other formal notations, such as second-order logic formulae, FUZZY formulae, Minsky frames and even the introduction of intensional logic elements, are also being considered. At present, we will concentrate on the process of transforming a natural language sentence into a first-order logic formula.</Paragraph>
    <Paragraph position="6"> The system is equipped with two independent modules: deduction and data base management. The data for these modules are the formulae generated by the parser. We will present only one module, working on the basis of weak second-order logic.</Paragraph>
    <Paragraph position="7"> The parsing system consists of two closely cooperating parts: a syntactic analyser and a semantic interpreter. The whole was programmed with the aid of a mechanism called CATN /Cascaded ATN/ /Woods 80/, /Bolc, Strzalkowski 82a, 82b/, /Kochut 83/, where the syntactic component plays the role of the &quot;upper&quot;, i.e. the dominating, &quot;cascade&quot;. The syntactic analyser produces a grammatical analysis of the sentence, which in turn undergoes semantic verification. If the semantic interpreter is not able to give the meaning of the sentence, the syntactic component is activated again with the aim of presenting another grammatical analysis. If such an analysis cannot be found, the input sentence is treated as incorrect.</Paragraph>
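    <Paragraph> The cooperation of the two cascades described above can be sketched as follows. This is only an illustration in Python, not the CATN implementation: the function names and the toy analyser and interpreter are hypothetical stand-ins. A lazy generator plays the role of the suspended syntactic cascade, which is resumed only when the semantic stage rejects an analysis.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import Callable, Iterable, Iterator, Optional


def parse(sentence: str,
          syntactic_analyses: Callable[[str], Iterable[str]],
          interpret: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the first formula the semantic stage accepts, or None."""
    # Each loop step resumes the syntactic stage to get the next analysis.
    for analysis in syntactic_analyses(sentence):
        formula = interpret(analysis)
        if formula is not None:
            return formula
    # No analysis could be interpreted: the sentence is treated as incorrect.
    return None


def toy_analyses(sentence: str) -> Iterator[str]:
    # Hypothetical stand-in for the syntactic cascade: yields two readings.
    yield "reading-1 of " + sentence
    yield "reading-2 of " + sentence


def toy_interpret(analysis: str) -> Optional[str]:
    # Hypothetical stand-in for the semantic cascade: rejects reading-1.
    if analysis.startswith("reading-2"):
        return "FORMULA(" + analysis + ")"
    return None
```
]]></Paragraph>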
    <Paragraph position="8"> 2.2. The syntactic analyser The syntactic component of the parser produces a grammatical analysis of the input sentence in Polish. This was made possible by careful programming of the rules governing the morphology and syntax of the language. Although the whole system was oriented towards a defined type of text /medical/, the adopted solutions make it a much more universal tool. We do not claim that the syntactic analyser in its present form is able to solve all, or even the majority, of the problems of Polish syntax.</Paragraph>
    <Paragraph position="9"> It covers, however, a rather wide subset of the colloquial language, enriched by constructions characteristic of medical texts.</Paragraph>
    <Paragraph position="10"> A natural language sentence introduced into the parser first undergoes a pretreatment in a so-called spelling corrector. If all the words used in the sentence are listed in the system vocabulary, the sentence is passed on for syntactic analysis. Otherwise the system attempts to establish whether the speaker made a spelling error, giving him a chance to correct the error and even suggesting the proper word, or whether he used a word unknown to the system. In the latter case, the user has the possibility of introducing the questioned word into the vocabulary, but in practice this may turn out to be too troublesome for him. Usually, then, the user is given a chance to withdraw the unfortunate utterance or to formulate it in a different way.</Paragraph>
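    <Paragraph> The vocabulary check described above can be sketched as follows. The paper does not say how the spelling corrector finds its suggestions, so the similarity matching from the Python standard library used here is an assumption, as are the function name and the cutoff value. An empty suggestion list corresponds to a word unknown to the system.</Paragraph>
    <Paragraph><![CDATA[
```python
import difflib


def check_words(sentence, vocabulary):
    """Map each out-of-vocabulary word to a list of suggested corrections."""
    unknown = {}
    for word in sentence.lower().split():
        if word not in vocabulary:
            # Assumed heuristic: propose vocabulary words that look similar.
            unknown[word] = difflib.get_close_matches(
                word, sorted(vocabulary), n=3, cutoff=0.8)
    return unknown
```
]]></Paragraph>
    <Paragraph> An empty result means that the sentence can be passed on to the syntactic analysis.</Paragraph>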
    <Paragraph position="11"> The proper syntactic analysis begins at the moment of activating the first &quot;cascade&quot; of the parser. It consists of five ATN nets, with the aid of which the grammar of the subset of the Polish language has been written. The two largest nets, SENTENCE /sentences/ and NOUN-PHR /nominal groups/, play a superior role in relation to the others: ADJ-PHR /adjective groups/, ADV-PHR /adverb groups/ and Q-EXPR /question phrases/. The process of syntactic analysis is usually quite complex and relies essentially on the non-deterministic character of processing in ATN. This is justified by the specific nature of the Polish language, which is characterised by a developed inflection and a free sentence word order.</Paragraph>
    <Paragraph position="12"> The result of the syntactic analysis is a grammatical analysis of the input sentence in the form of a so-called o-form. It is a non-inflexional form of a sentence, ordered according to a fixed key. The construction of the o-form can be expressed by the structure:</Paragraph>
    <Paragraph position="14"> &lt;prep. phrase&gt; | } (CAUSE/RES [&lt;o-form&gt;] END) The stick mark &quot;|&quot; is usually used as a symbol of the meta-language. Here it is used as a symbol of the defined language.</Paragraph>
    <Paragraph position="15"> The symbols S and END enclose a single clause. Every elementary activity or event expressed in the input sentence is represented by a clause. Often, the o-form has a richer structure than a classical analysis tree. The elements of the o-form called &lt;subject&gt;, &lt;direct object&gt;, &lt;indirect object&gt; and &lt;adjective phrase&gt; can also be expressed or modified with the use of clauses. The stick marks &quot;|&quot; separate the parts of the o-form and are its constant elements. The transformed question is then subjected to semantic interpretation.</Paragraph>
    <Paragraph position="16"> The syntactic analyser manages the vocabulary, where the inflexional forms of words are kept. The vocabulary definition specifies the syntactic categories to which given words belong. It also describes the forms of words with the aid of lexical parameters: case, number, person and gender. These parameters are of great value in examining the grammatical construction of sentences.</Paragraph>
    <Paragraph position="17"> 2.3. The semantic interpreter When the syntactic analysis is successfully completed, the o-form of the input sentence is forwarded for semantic interpretation. The syntactic &quot;cascade&quot; is suspended, i.e. removed from the operational field, leaving room for the semantic &quot;cascade&quot;. The configuration of the removed &quot;cascade&quot; is remembered, however, in case an alternative grammatical analysis has to be generated.</Paragraph>
    <Paragraph position="18"> The semantic interpreter consists of two main parts: a constant controlling part, working on the basis of very general pattern matching, and compatible expert algorithms, in which the knowledge of the system about the field of conversation has been coded. The process of interpretation is assisted by a special vocabulary of semantic rules and an additional vocabulary complementing the expert knowledge.</Paragraph>
    <Paragraph position="19"> The sentence in the o-form is forwarded directly to the controlling part of the interpreter, where parameters such as time, negation, aspect, etc. are evaluated first.</Paragraph>
    <Paragraph position="20"> Then the central predicative element of the sentence &quot;calls for&quot; a proper semantic rule, which from then on guides the interpretation process. The rule has the form of a pattern-concept pair /Wilensky 80/, /Gershman 79/, /Carbonell 81/, where the pattern reflects the scheme of an elementary event, whereas the concept indicates how its meaning should be expressed through formulae. The semantic rule is activated for the duration of the interpretation of a single clause. If the pattern matches the clause, an atomic formula is generated, expressing the meaning of the clause. The meaning of the whole sentence is expressed as a logical combination of the meanings of all the o-form clauses. The semantic rules bring descriptions of the same phenomenon that differ on the surface into a common interpretation.</Paragraph>
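    <Paragraph> A pattern-concept pair of this kind might be represented as sketched below. The data layout is an assumption made for illustration only: the role names, the predicate name and the use of a sort lookup function are hypothetical and are not taken from the DIALOG vocabulary of semantic rules.</Paragraph>
    <Paragraph><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class SemanticRule:
    pattern: dict   # role name -> sort required of the phrase filling it
    concept: tuple  # predicate name followed by the roles used as arguments


def apply_rule(rule, clause, sort_of):
    """Return an atomic formula for the clause, or None if the pattern fails."""
    for role, sort in rule.pattern.items():
        filler = clause.get(role)
        if filler is None or sort_of(filler) != sort:
            return None
    predicate, *roles = rule.concept
    return (predicate, *[clause[role] for role in roles])


# Hypothetical rule: one sickness is a cause of another.
CAUSE_RULE = SemanticRule(
    pattern={"subject": "SICKNESS", "direct_object": "SICKNESS"},
    concept=("CAUSE", "subject", "direct_object"),
)
```
]]></Paragraph>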
    <Paragraph position="21"> The general structure of formulae generated by the interpreter is expressed by an implication: $\varphi_1 \wedge \varphi_2 \wedge \dots \wedge \varphi_n \rightarrow \psi$, where $\psi$ has been introduced from a semantic rule and the $\varphi_i$ come from the system knowledge, i.e. from special compatible parts of the interpreter called the experts.</Paragraph>
    <Paragraph position="22"> Individual o-form phrases are interpreted by the experts in the context of the dialogue subject.</Paragraph>
    <Paragraph position="23"> In our system, designed for conversation with a physician, we have experts for names of sicknesses /SICKNESS/, names of organs /ORGAN/, internal substances /SUBSTANCE/, therapies /TREATMENT/, medicaments /MEDICAMENT/, names of animate objects /ANIMATE/ and the remaining objects foreign to the body /PHYSOBJ/. Experts are activated at the request of a proper semantic rule.</Paragraph>
    <Paragraph position="24"> The controlling part of the interpreter &quot;instructs&quot; the expert/s/ chosen by the pattern to interpret a notion or expression. The indicated expert can solve the problem on its own or seek the help of other experts. Often, one complex expression has to be qualified by two or three experts.</Paragraph>
    <Paragraph position="25"> All the experts, as well as the controlling part of the interpreter /FORMULA, CASES and QWORDS nets/, have been recorded in the ATN formalism and form the lower &quot;cascade&quot; of the parser. The interpreter is also equipped with a mechanism for resolving contextual pronominal references.</Paragraph>
    <Paragraph position="26"> 2.4. Examples of transformation of a medical text into logical formulae We will present two examples of the transformation of medical sentences into first-order logic formulae. Before that, a few words on the adopted convention of formula notation. The symbols IMPLSYM and KONJSYM are the logical operators of implication and conjunction respectively. The integer placed directly after the symbol KONJSYM indicates the number of conjuncts. Names of predicates are preceded by symbols '~&quot; [...] predicate arguments. The arguments specify their type /sort/, the name of the variable and the constant /if there is one/.</Paragraph>
    <Paragraph position="28"> 3. The deduction and knowledge representation module The deduction module is a separate part of the whole DIALOG system. Its main purpose is to collect and represent the knowledge gained by the system, and to use the stored information in accordance with the wishes of the user of the system.</Paragraph>
    <Paragraph position="29"> Our work towards the objectives indicated above was based on the experience presented by E. Konrad and N. Klein /Konrad 76/, /Klein 78/ of the Technical University in West Berlin.</Paragraph>
    <Paragraph position="30"> In the previous chapter we presented how a text written in Polish is transformed into first-order logic formulae. This, of course, determines the way in which the knowledge presented in natural language is represented.</Paragraph>
    <Section position="1" start_page="198" end_page="200" type="sub_section">
      <SectionTitle>
3.1. Knowledge representation
</SectionTitle>
      <Paragraph position="0"> The information included in the logical formulae coming from the language module has to be stored for later use. The logical formulae are therefore introduced into the data base. The data base, adequately filled with these formulae, constitutes the representation of the knowledge carried by the natural language sentences. It is equivalent to the text to the extent that first-order logic is able to convey the meaning of the natural language sentences.</Paragraph>
      <Paragraph position="1"> Data Base The data base consists of three separate parts: a nucleus, an amplifier and a filter /Konrad 76/. Each of the parts includes elements that differ from the conceptual point of view: A. The nucleus includes ground literals, which represent facts occurring in the field of knowledge represented in the base. E.g. the information that the pancreas is a secretory organ is presented as the literal (WYDZ-NARZAD (TRZUSTKA)). From the system's point of view there is no conceptual difference between two facts: the above one and (ORGAN (TRZUSTKA)). Thus the type /sort/ ORGAN may be regarded as a predicate and the above atomic formula as a true one.</Paragraph>
      <Paragraph position="2"> B. The amplifier is the part representing the &quot;fundamental&quot; knowledge of the system. The formulae included in the amplifier can be divided into three categories: 1/ dependent formulae /i/ $\forall x_1 \in S_1 \dots \forall x_n \in S_n \; [A(x_1, \dots, x_n) \rightarrow \pi(x_1, \dots, x_n)]$, where A is any formula and $\pi$ a predicate. As we can see, each variable bound by the universal quantifier is of a specified sort.</Paragraph>
      <Paragraph position="3"> 2/ independent formulae /ii/ $\forall x_1 \in S_1 \dots \forall x_n \in S_n \; \pi(x_1, \dots, x_n)$ 3/ restrictive formulae /iii/ $\forall x_1 \in S_1 \dots \forall x_n \in S_n \; \neg\pi(x_1, \dots, x_n)$ The majority of the formulae generated by the language analysis module are of the /i/ form.</Paragraph>
      <Paragraph position="4"> C. The filter contains the formulae representing the knowledge necessary to preserve the integrity of the data base.</Paragraph>
      <Paragraph position="5">  Recapitulating, the nucleus represents the extensional part of the knowledge represented in the data base. It is the fundamental knowledge which cannot be obtained from the analysis of the presented text, and which is essential to proper deduction. The amplifier represents the intensional part of the data base. The knowledge represented there is a collection of statements used for deduction.</Paragraph>
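      <Paragraph> The three-part organisation of the data base can be pictured with the following sketch. It is an illustration under stated assumptions rather than the actual storage scheme: the class and method names are hypothetical, literals are modelled as tuples, and the filter formulae are modelled as plain checking functions.</Paragraph>
      <Paragraph><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class DataBase:
    nucleus: set = field(default_factory=set)      # ground literals (facts)
    amplifier: list = field(default_factory=list)  # formulae used for deduction
    filter: list = field(default_factory=list)     # integrity checks (assumed callables)

    def add_fact(self, literal):
        """Store a ground literal unless an integrity check rejects it."""
        if all(check(literal, self.nucleus) for check in self.filter):
            self.nucleus.add(literal)
            return True
        return False
```
]]></Paragraph>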
      <Paragraph position="6"> Each of the logical formulae is kept in a certain internal form, corresponding to the method of deduction described later on. As we have already mentioned, the majority of formulae are of the /i/ form. Every such formula is converted, at the moment of insertion into the data base, into a pair of the following form: (&lt;conclusion&gt; &lt;premises testing procedure&gt;) 3.2. The knowledge extraction Because of the manner of storing knowledge described in point 3.1, the answer to a question presented to the system does not have to be represented explicitly in the data base. The deduction module should be able to obtain all the information included in the data base.</Paragraph>
      <Paragraph position="7"> The questions presented to the system are also converted into logical formulae. Thus, the extraction of knowledge is reduced to the verification of a given formula against the present content of the data base.</Paragraph>
      <Paragraph position="8"> The logical formula representing the question is converted to an appropriate LISP form. Evaluating such a form is equivalent to examining whether the formula it represents is true. This form corresponds to the normal form of the logical formula /the LISP functions AND, OR and NOT are used/. The literals are tested by a TESTE function according to the following algorithm: 1. Check the amplifier, trying to find a rule whose conclusion is unifiable with the literal under proof. If no such formula exists, then there is no proof of the given literal. 2. If there is such a formula then: a. if it is marked as an independent formula then STOP with a proof; b. if it is marked as a restrictive formula then STOP without a proof; c. otherwise evaluate the form associated with the conclusion; if we obtain NIL /false in LISP/ then search the amplifier for another rule and go to 2.</Paragraph>
      <Paragraph position="10"> If we obtain a value different from NIL then STOP with a proof. Otherwise STOP without a proof.</Paragraph>
      <Paragraph position="11"> It is therefore a so-called backward deduction system. The proof goes back from the goal formula to the facts, applying the formulae from the amplifier in the &quot;backward&quot; direction. The answer can be YES or NO, or it can be a list of constants, depending on the kind of question.</Paragraph>
      <Paragraph position="12"> First-order logic has been enriched here with some elements of the second-order language. Predicate variables, quantification over these variables and the retrieval of predicates as well as constants have been introduced.</Paragraph>
      <Paragraph position="13"> 3.3. Access to the data base The system communicates with the data base through the commands of a specially designed language. These commands enable formulae to be introduced into the data base and erased from it.</Paragraph>
      <Paragraph position="14"> The basic commands serving the purpose of knowledge extraction are TEST  and FIND: a. TEST A - looking for the proof of a formula A. Answer YES/NO.</Paragraph>
      <Paragraph position="15"> b. FIND $(\pi_1 \dots \pi_m \; x_1 \dots x_n)$ A, where the $\pi_i$ are predicate variables: retrieval of all the pairs of m-tuples of predicates and n-tuples of constants which satisfy a given formula A.</Paragraph>
      <Paragraph position="16"> 3.4. Example  The formula presented in Example 1 and the formula below have been introduced into the amplifier.</Paragraph>
      <Paragraph position="17"> Sentence: Wzrost napięcia mięśniówki dwunastnicy może być przyczyną OZT.</Paragraph>
      <Paragraph position="18"> /A rise in the tonicity of the muscular coat of the duodenum may be a cause of acute pancreatitis/  The formula corresponding to the question is presented in Example 2. The amplifier contains the formula describing the transitivity of the predicate I~LY. Facts, i.e. ground literals, were introduced into the nucleus. E.g.</Paragraph>
      <Paragraph position="20"> After the formulae of the theorems and of the question have been converted into the LISP form, evaluating the question form finds the answer to the question. The answer is, of course, YES.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="200" end_page="200" type="metho">
    <SectionTitle>
4. Conclusion
</SectionTitle>
    <Paragraph position="0"> The results obtained during the work on the system confirmed our direction of research. Our further work will concentrate on the continued improvement of the existing modules. At the same time we will attempt to enrich the system with better deductive modules, such as resolution in modal logic, default reasoning /Reiter/, FUZZY formulae and Minsky frames.</Paragraph>
  </Section>
</Paper>