<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1045">
  <Title>Automatic Extraction of Significant Terms from Texts, EBSAT X. Kirschner, Z. (1987): APAC3-2: An English-to-Czech Machine Translation System, EBSAT XIII</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. The Motivation
</SectionTitle>
    <Paragraph position="0"> In computational processing, the morphological level is the basis for building the syntactico-semantic part of any NL analysis. The CL world began to pay more attention to morphology only after /Koskenniemi 1983/ was published. However, as Kay remarked (e.g.</Paragraph>
    <Paragraph position="1"> in /Kay 1987/), what was actually done in /Koskenniemi 1983/ was phonology. Moreover, the strategy used there is best suited to agglutinative languages with an almost one-to-one mapping between morpheme and grammatical meaning, and Slavonic languages are different in this respect.</Paragraph>
    <Paragraph position="2"> One of the practical reasons for formalizing morphology is that, although there are some computer implementations using a Czech morphology subsystem (/Hajič, Oliva 1986/, /Kirschner 1983/, /Kirschner 1987/), all based on the same sources (/EBSAT VI 198~/, /EBSAT VII 1982/), no unifying formalism for a complete description of formal morphology exists.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="222" type="metho">
    <SectionTitle>
2. The Formalism
</SectionTitle>
    <Paragraph position="0"> The terms alphabet, string and concatenation, the symbol N (positive integers) and the indexes * and + are used here in the same way as in formal grammar theory; the symbol exp(A) denotes the set of all subsets of A, and e denotes the empty string. Uppercase letters are used mainly for denoting sets and newly defined structures; lowercase letters are used for mappings, for elements of an alphabet and for strings.</Paragraph>
    <Paragraph position="1"> Definition 1. A finite set K of symbols is called a set of grammatical meanings (or simply meanings for short); values from K represent values of morphological categories (e.g. sg may represent singular number, p3 may represent dative ("3rd case") for nouns, etc.).</Paragraph>
    <Paragraph position="2"> Definition 2. A finite set D ⊆ {(w,i) ∈ A* × (N ∪ {0})}, where A is an alphabet, is called a dictionary. A pair (w,i) ∈ D is called a dictionary entry; w is called a lexical unit and i the pattern number. In the linguistic interpretation, a lexical unit represents the notion of a "systemic word", but it need not be represented by a traditional dictionary form.</Paragraph>
    <Paragraph position="3"> Definition 3. Let A be a finite alphabet, K a finite set of meanings, and V a finite alphabet of variables such that A ∩ V = ∅. The quintuple (A,V,K,t,R), where t is a mapping t: V → exp(A*) assigning types to variables and R is a finite set of rules (I,H,u,v,C), where I ⊆ N is a finite set (of labels), C ⊆ (N ∪ {0}) is a finite set (of continuations), H ⊆ K is a set of meanings belonging to a particular rule from R, and u,v ∈ (A ∪ V)+, is called a controlled rewriting system (CRS); all variables from the left-hand side (u) must be present on the right-hand side (v) and vice versa (rule symmetry according to variables).</Paragraph>
    <Paragraph position="4"> Definition 4. Let T = (A,V,K,t,R) be a CRS.</Paragraph>
    <Paragraph position="5"> A (simple) substitution on T is any mapping q: V → A* such that q(v) ∈ t(v) for every v ∈ V.</Paragraph>
    <Paragraph position="6"> Definition 5. Let T = (A,V,K,t,R) be a CRS and q a simple substitution on T. A mapping d: (A ∪ V)* → A* such that d(e) = e; d(a) = a for a ∈ A; d(v) = q(v) for v ∈ V; d(bu) = d(b)d(u) for b ∈ (A ∪ V), u ∈ (A ∪ V)* will be called the (generalized) substitution derived from q.</Paragraph>
    <Paragraph position="7"> Comment. The (generalized) substitution replaces all variables in a given string by some strings. The same string is substituted for all occurrences of a given variable (this follows from the definition).</Paragraph>
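    <Paragraph> As an illustration of Definitions 4 and 5, the following minimal Python sketch applies a generalized substitution to a string. It assumes (these conventions are not from the paper) that every symbol is a single character, that the variables are exactly the keys of the dictionary t, and that each type t(v) is given as a regular expression; the function names and the sample data are hypothetical.</Paragraph>
    <Paragraph><![CDATA[
```python
import re
from typing import Dict


def check_simple_substitution(q: Dict[str, str], t: Dict[str, str]) -> None:
    """Raise ValueError unless q is defined on every variable and q(v) is in t(v)."""
    for var, type_regex in t.items():
        if var not in q or re.fullmatch(type_regex, q[var]) is None:
            raise ValueError(f"q({var}) is missing or not of type t({var})")


def generalized_substitution(q: Dict[str, str], u: str) -> str:
    """Compute d(u): variables are replaced by q(v), letters of A are copied."""
    # d(e) = e falls out of joining nothing; d(bu) = d(b)d(u) is the loop.
    return "".join(q.get(symbol, symbol) for symbol in u)


# Hypothetical data: X and Y are variables, every other character is a letter.
t = {"X": r".+", "Y": r"ý|í"}
q = {"X": "nov", "Y": "ý"}
check_simple_substitution(q, t)
print(generalized_substitution(q, "neXY"))
```
]]></Paragraph>
    <Paragraph> Because q is an ordinary mapping, every occurrence of a variable receives the same string, which is the property stated in the comment above.</Paragraph>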
    <Paragraph position="8"> Definition 6. Let T = (A,V,K,t,R) be a CRS and F ⊆ K. Further, let G, G' ⊆ K, w,z ∈ (A ∪ V)*, i ∈ N, i' ∈ (N ∪ {0}). We say that w can be directly rewritten in the state (i,G) to z with a continuation (i',G') according to meanings F (written as w(i,G) =&gt;[T,F] z(i',G')), if there exist a rule (I,H,u,v,C) ∈ R and a simple substitution q on T such that i ∈ I, i' ∈ C, H ⊆ F, G = G' ∪ H, d(u) = w and d(v) = z, where d is the substitution derived from q.</Paragraph>
    <Paragraph position="9"> The relation =&gt;*[T,F] is defined as the reflexive and transitive closure of =&gt;[T,F].</Paragraph>
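    <Paragraph> The direct rewriting step of Definition 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: symbols are single characters, the variables are the keys of t, types are regular expressions, and the sample rule (a superlative rule with label 2 and continuation 3) is invented for the example and is not taken from the paper's rule set. The helper names (Rule, matches, substitute, direct_rewrites) are likewise hypothetical.</Paragraph>
    <Paragraph><![CDATA[
```python
import re
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, NamedTuple, Tuple


class Rule(NamedTuple):
    labels: FrozenSet[int]         # I
    meanings: FrozenSet[str]       # H
    lhs: str                       # u
    rhs: str                       # v
    continuations: FrozenSet[int]  # C


def matches(pattern: str, word: str, t: Dict[str, str],
            q: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every simple substitution extending q such that d(pattern) == word."""
    if not pattern:
        if not word:
            yield dict(q)
        return
    head, rest = pattern[0], pattern[1:]
    if head not in t:                      # a letter of A must match itself
        if word.startswith(head):
            yield from matches(rest, word[1:], t, q)
    elif head in q:                        # a bound variable must repeat its value
        if word.startswith(q[head]):
            yield from matches(rest, word[len(q[head]):], t, q)
    else:                                  # try every prefix of the right type
        for cut in range(len(word) + 1):
            if re.fullmatch(t[head], word[:cut]):
                yield from matches(rest, word[cut:], t, {**q, head: word[:cut]})


def substitute(pattern: str, q: Dict[str, str]) -> str:
    return "".join(q.get(symbol, symbol) for symbol in pattern)


def direct_rewrites(rule: Rule, t: Dict[str, str], w: str, i: int,
                    G: FrozenSet[str], F: FrozenSet[str]
                    ) -> Iterator[Tuple[str, int, FrozenSet[str]]]:
    """Yield every (z, i2, G2) such that w(i,G) => z(i2,G2) by this rule."""
    H = rule.meanings
    # i must be a label, H must lie in F, and G = G2 | H forces H to lie in G.
    if i not in rule.labels or not H.issubset(F) or not H.issubset(G):
        return
    for q in matches(rule.lhs, w, t, {}):
        z = substitute(rule.rhs, q)
        for i2 in sorted(rule.continuations):
            # Any G2 with G = G2 | H is allowed: keep any subset of H.
            for k in range(len(H) + 1):
                for kept in combinations(sorted(H), k):
                    yield z, i2, (G - H) | frozenset(kept)


# Hypothetical rule: in state 2, build a superlative stem form and go to state 3.
t = {"X": r".+"}
rule = Rule(frozenset({2}), frozenset({"sup"}), "Xý", "nejXější", frozenset({3}))
G = frozenset({"sup", "masc", "sg", "nom"})
for result in direct_rewrites(rule, t, "nový", 2, G, G):
    print(result)
```
]]></Paragraph>
    <Paragraph> The sketch yields both the continuation in which the used meaning is removed and the one in which it is kept, since the condition G = G' ∪ H permits either.</Paragraph>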
    <Paragraph position="10"> Comment. The CRS is controlled through continuations and labels. After a direct rewriting operation, the only rules that can be applied next are those whose label contains at least one number from the continuation of that rewriting operation. Please notice that: - this operation always rewrites whole words; - the restriction that the left-hand and right-hand side of a rule be only a string (of letters and/or variables) is not as strong as it may seem, because no restrictions are imposed on the substitution q. However, to be able to implement the rules in a particular implementation as finite state machines, we shall require q to be defined using regular expressions only. Definition 7. Let T = (A,V,K,t,R) be a CRS and let n be the maximal number from all labels from all rules from R; an n-tuple P = (p1, ..., pn) will be called a list of patterns on T (the elements of P are called patterns) if for every i the mapping pi: exp(K) × A* → exp(A*) is defined as z ∈ pi(F,w) &lt;=&gt; w(i,F) =&gt;*[T,F] z(0,{}).</Paragraph>
    <Paragraph position="11"> Comment. The "strange" sets G and G' from Definition 6 acquire a real meaning only in connection with the definition of patterns; they have a controlling task during the construction of pi, namely, they check whether all meanings from F are used during the derivation. "To use a meaning k" means here that there is some rule (I,H,u,v,C) applied in the course of the derivation from w(i,F) to z(0,{}) such that k ∈ H. Such a meaning can then be removed from G when constructing G' (see Def. 7); meanings not from H cannot.</Paragraph>
    <Paragraph position="12"> Thus, to get the empty set in z(0,{}) when starting from w(i,F), all meanings from F must be "used" in this sense.</Paragraph>
    <Paragraph position="13"> A pattern describes how to construct, for a given word w, all possible forms according to the meanings F. In this sense, the notion of pattern does not differ substantially from the traditional notion of pattern in formal morphology, although traditionally it is not the constructive description but just some representative of such a description that is called a pattern.</Paragraph>
    <Paragraph position="14">  Definition 8. Let D be a dictionary over an alphabet A, T = (A,V,K,t,R) a CRS and P a list of patterns on T. A quadruple M = (A,D,K,P) is called a morphology description on T (M[T]-description).</Paragraph>
    <Paragraph position="15"> Definition 9. Let T = (A,V,K,t,R) be a CRS and M = (A,D,K,P) an M[T]-description. The set L = {z ∈ A*; there exist w ∈ A*, i ∈ N, H ⊆ K such that z ∈ pi(H,w)} will be called the language generated by the M[T]-description M. The elements of L will be called word forms.</Paragraph>
    <Paragraph position="16"> Comment. The term morphology description introduced above is a counterpart to a description of a system of formal morphology, as used in the traditional literature on morphology.</Paragraph>
    <Paragraph position="17"> Definition 9 is introduced here just for the purpose of formalizing the notion of word form, i.e. any form derived from any word from the dictionary using all possible meanings according to M[T].</Paragraph>
    <Paragraph position="18">  Definition 10. Let T = (A,V,K,t,R) be a CRS and M = (A,D,K,P) an M[T]-description. The term synthesis on M is used for a mapping s: exp(K) × A* → exp(A*); s(H,w) = {z; there exists i ∈ N, i ≤ n, such that z ∈ pi(H,w) and (w,i) ∈ D}. The term analysis is then used for a mapping a: A* → exp(exp(K) × A*); a(z) = {(H,w); z ∈ s(H,w)}.</Paragraph>
    <Paragraph position="19"> Comment. According to Definition 10, synthesis means using patterns for words from the dictionary only. The definition of analysis is based on the definition of synthesis, so it clearly follows the intuition of what an analysis is. In this sense, these definitions do not differ substantially from the traditional view of formal morphology, as opposed to Koskenniemi; however, the so-called complex word forms ("have been called") are not covered, and their analysis is shifted to syntax.</Paragraph>
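    <Paragraph> To make Definition 10 concrete, here is a small Python sketch in which analysis is computed literally as the inverse of synthesis, by enumerating every subset H of K and every dictionary word. The pattern function p1 is a hand-written toy standing in for a real pattern (it covers only two forms of an adjective of the "mladý" type), and all names and data are assumptions made for the example; the enumeration is exponential in the size of K and is meant only to mirror the definition.</Paragraph>
    <Paragraph><![CDATA[
```python
from itertools import combinations


def synthesis(H, w, patterns, dictionary):
    """s(H, w): forms produced by pattern i for every dictionary entry (w, i)."""
    forms = set()
    for i, pattern in patterns.items():
        if (w, i) in dictionary:
            forms |= pattern(H, w)
    return forms


def analysis(z, K, patterns, dictionary):
    """a(z): all pairs (H, w) with z in s(H, w), found by brute-force enumeration."""
    found = set()
    words = {w for w, _ in dictionary}
    for size in range(len(K) + 1):
        for H in map(frozenset, combinations(sorted(K), size)):
            for w in words:
                if z in synthesis(H, w, patterns, dictionary):
                    found.add((H, w))
    return found


def p1(H, w):
    """Toy pattern (hypothetical): two masculine singular forms only."""
    stem = w[: len(w) - len("ý")]
    if H == frozenset({"masc", "sg", "nom"}):
        return {w}
    if H == frozenset({"masc", "sg", "gen"}):
        return {stem + "ého"}
    return set()


K = {"masc", "sg", "nom", "gen"}
patterns = {1: p1}
dictionary = {("nový", 1)}
print(analysis("nového", K, patterns, dictionary))
```
]]></Paragraph>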
    <Paragraph position="20"> The definition of analysis is quite clear, but it contains no procedure capable of actually carrying out this process.</Paragraph>
    <Paragraph position="21"> However, thanks to rule symmetry it is possible to reverse the rewriting process: Definition 11. Let T = (A,V,K,t,R) be a CRS. Further, let G, G' ⊆ K, i ∈ N, i' ∈ (N ∪ {0}), z,w ∈ A*. We say that under the condition (i',G') it is possible to directly analyse a string z to w with a continuation (i,G) (we write z(i',G') =&lt;[T] w(i,G)), if there exist a rule (I,H,u,v,C) ∈ R and a simple substitution q on T such that i ∈ I, i' ∈ C, G = G' ∪ H, d(u) = w and d(v) = z, where d is the generalized substitution derived from q. The relation "it is possible to analyse" (=&lt;*[T]) is defined as the reflexive and transitive closure of =&lt;[T]. Definition 12. Let T = (A,V,K,t,R) be a CRS and z ∈ A*. Every string w ∈ A* such that z(0,{}) =&lt;*[T] w(i,F) for some i ∈ N and F ⊆ K is called a predecessor of z with a continuation (i,F). Lemma. Let T = (A,V,K,t,R) be a CRS and w ∈ A* a predecessor of a string z ∈ A* with a continuation (i,F). Then z ∈ pi(F,w), where pi is a pattern on T (see Def. 7). Proof (idea). The only "asymmetry" in the definition of =&gt; as opposed to =&lt;, i.e. the condition H ⊆ F, can be resolved by putting (see Def. 11) F = {} ∪ H1 ∪ H2 ∪ ... ∪ Hn (for n analysis steps). Then surely Hi ⊆ F for every i.</Paragraph>
    <Paragraph position="22"> Theorem. Let T = (A,V,K,t,R) be a CRS, M = (A,D,K,P) an M[T]-description, a an analysis on M and w ∈ A* a predecessor of z ∈ A* with a continuation (i,F). Moreover, let (w,i) ∈ D. Then (F,w) ∈ a(z).</Paragraph>
    <Paragraph position="23"> The proof follows from the preceding lemma and from the definition of analysis.</Paragraph>
    <Paragraph position="24"> Comment. This theorem helps us to carry out the analysis of a word form: we begin with the form being analysed (z) and a "continuation" (0,{}), and then use "reversed" rules for back rewriting. In any state w(i,F) during this process, a correct analysis is obtained whenever (w,i) is found in the dictionary.</Paragraph>
    <Paragraph position="25"> At the same time, F contains the appropriate meanings. Passing along all possible paths of back rewriting, we obtain the whole set a(z).</Paragraph>
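    <Paragraph> The back-rewriting procedure described in the comment above might be sketched in Python as follows. The sketch rests on several assumptions that are not in the paper: symbols are single characters, types are regular expressions, the search is cut off after max_steps rounds to guarantee termination, and the five rules and the one-entry dictionary are invented for the illustration (states 1, 2, 3 stand for negation, degree and ending). All identifiers are hypothetical.</Paragraph>
    <Paragraph><![CDATA[
```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    labels: frozenset         # I
    meanings: frozenset       # H
    lhs: str                  # u
    rhs: str                  # v
    continuations: frozenset  # C


def matches(pattern, word, t, q):
    """Yield every simple substitution extending q such that d(pattern) == word."""
    if not pattern:
        if not word:
            yield dict(q)
        return
    head, rest = pattern[0], pattern[1:]
    if head not in t:                      # a letter must match itself
        if word.startswith(head):
            yield from matches(rest, word[1:], t, q)
    elif head in q:                        # a bound variable repeats its value
        if word.startswith(q[head]):
            yield from matches(rest, word[len(q[head]):], t, q)
    else:                                  # try every prefix of the right type
        for cut in range(len(word) + 1):
            if re.fullmatch(t[head], word[:cut]):
                yield from matches(rest, word[cut:], t, {**q, head: word[:cut]})


def substitute(pattern, q):
    return "".join(q.get(symbol, symbol) for symbol in pattern)


def analyse(z, rules, t, dictionary, max_steps=10):
    """Collect (F, w) for every state w(i,F) reached from z(0,{}) with (w,i) in D."""
    results, seen = set(), set()
    frontier = [(z, 0, frozenset())]
    for _ in range(max_steps):
        next_frontier = []
        for word, cont, G2 in frontier:
            for rule in rules:
                if cont not in rule.continuations:
                    continue
                # Reversed rule: match the right-hand side, rebuild the left.
                for q in matches(rule.rhs, word, t, {}):
                    w, G = substitute(rule.lhs, q), G2 | rule.meanings
                    for i in rule.labels:
                        state = (w, i, G)
                        if state in seen:
                            continue
                        seen.add(state)
                        if (w, i) in dictionary:
                            results.add((G, w))
                        next_frontier.append(state)
        frontier = next_frontier
    return results


def fs(*items):
    return frozenset(items)


t = {"X": r".+"}
rules = [
    Rule(fs(1), fs(), "X", "X", fs(2)),                        # no negation
    Rule(fs(1), fs("neg"), "X", "neX", fs(2)),                 # negation
    Rule(fs(2), fs("sup"), "Xý", "nejXějš", fs(3)),            # superlative
    Rule(fs(3), fs("masc", "sg", "nom"), "Xš", "Xší", fs(0)),  # ending, sg
    Rule(fs(3), fs("masc", "pl", "nom"), "Xš", "Xší", fs(0)),  # ending, pl
]
dictionary = {("nový", 1)}
for meanings, word in analyse("nejnovější", rules, t, dictionary):
    print(sorted(meanings), word)
```
]]></Paragraph>
    <Paragraph> With these toy rules the form "nejnovější" receives two analyses of the lexical unit "nový", one singular and one plural, while an orthographically impossible form such as "nejnovějšý" receives none because no reversed rule matches it.</Paragraph>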
  </Section>
  <Section position="4" start_page="222" end_page="223" type="metho">
    <SectionTitle>
3. An Example
</SectionTitle>
    <Paragraph position="0"> To illustrate the most important features of the formalism described above, we have chosen a simplified example of Czech adjectives (regular declension according to two traditional "patterns", mladý (young) and jarní (spring), with negation, full comparative and superlative, sg and pl, but only masc. anim. nominative and genitive).</Paragraph>
    <Paragraph position="2"/>
    <Paragraph position="4"> podl~#(0,{sup}) ... G' not empty, so this is not a solution. Possibilities without removing "used" meanings are not shown; all lead to a non-empty G' in the resulting z(0,G').</Paragraph>
    <Paragraph position="5"> Fig. 2</Paragraph>
    <Paragraph position="6"> jnovější(1,{neg,masc,pl,acc}) ... not in D
 nejnovější(1,{masc,pl,acc}) ... not in D
 nejnovějš(3,{masc,pl,nom}) =&lt; ... not in D
 nový(2,{sup,masc,pl,nom}) =&lt; ... not in D
 nový(1,{sup,masc,pl,nom}) ... ∈ D; SOLUTION
 ... same as 1st alter., but nom instead of acc ...
 nejnovější(3,{masc,sg,nom}) =&lt; ... not in D
 nový(2,{sup,masc,sg,nom}) =&lt; ... not in D
 nový(1,{sup,masc,sg,nom}) ... ∈ D; SOLUTION
 ... same as 1st alter., but sg,nom instead of pl,acc
 nejnovějš(3,{masc,pl,nom}) =&lt; ... not in D
 nejnovějš(2,{masc,pl,nom}) =&lt; ... not in D (2 alter.)
 nejnovějš#(1,{masc,pl,nom}) ... not in D
 jnovějšý(1,{neg,masc,pl,nom}) ... not in D
 Fig. 3
 An example of synthesis: we want to obtain</Paragraph>
    <Paragraph position="9"> see fig. 2. An example of analysis: we want to obtain a(nejnovější#); see fig. 3. Comment. Better written rules in the CRS would not allow for the 4th alternative in the first step ("nejnovějšý"), because "š" cannot be followed by "ý" in any Czech word form; however, the construction of the other unsuccessful alternatives cannot be cancelled a priori: only the dictionary can decide whether e.g. "jnový" is or is not a Czech adjective.</Paragraph>
    <Paragraph position="10"> Comment on comment. No change in the rules would be necessary if a separate phonology and/or orthography level were used; then the "šý" possibility, being orthographically impossible, would of course be excluded there.</Paragraph>
  </Section>
</Paper>