File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/84/p84-1021_metho.xml
Size: 12,340 bytes
Last Modified: 2025-10-06 14:11:38
<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1021"> <Title>GTT : A GENERAL TRANSDUCER FOR TEACHING COMPUTATIONAL LINGUISTICS</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. THE UNIFORMITY AND SIMPLICITY OF THE SYSTEM </SectionTitle> <Paragraph position="0"> Since the system is intended as a tool for training computational linguists, major emphasis was placed on making it user-friendly and uniform, with a legible syntax.</Paragraph> <Paragraph position="1"> One of the important requirements in machine translation is the separation of linguistic data and algorithms (Vauquois, 1975). The linguist should have the means to express his knowledge declaratively without being obliged to mix computational algorithms and linguistic data. [Footnote: This project is sponsored by the Swiss government.] Production systems (Rosner, 1983) seem particularly suited to meet such requirements (Johnson, 1982): the production set that expresses the object-level knowledge is clearly separated from the control part that drives the application of the productions. Colmerauer's Q-system is the classic example of such a uniform production system used for machine translation (Colmerauer, 1970; Chevalier, 1978: TAUM-METEO). The linguistic knowledge is expressed declaratively, using the same data structure throughout the whole translation process and the same type of production rule for dictionary entries, morphology, analysis, transfer and generation. The disadvantages of the Q-system are its rather unnatural rule syntax for non-programmers and its lack of a flexible control mechanism for the user (Vauquois, 1978).</Paragraph> <Paragraph position="2"> In the design of our system, the basic uniform scheme of Q-systems has been followed, but the rule syntax, the linguistic data structure and the control facilities have been modernized according to recent developments in machine translation (Vauquois, 1978; Boitet, 1977; Johnson, 1980; Slocum, 1982).
These three points are developed in the next section.</Paragraph> </Section> <Section position="4" start_page="0" end_page="90" type="metho"> <SectionTitle> 3. DESCRIPTION OF THE SYSTEM </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Overview </SectionTitle> <Paragraph position="0"> The general framework is a production system in which linguistic object knowledge is expressed declaratively, in the form of rules. The system takes the dictionaries and the grammars as data and compiles them; the interpreter then uses the compiled data to process the input text. A decoder transforms the result into a digestible form for the user.</Paragraph> </Section> <Section position="2" start_page="0" end_page="88" type="sub_section"> <SectionTitle> 3.2 Data structure </SectionTitle> <Paragraph position="0"> The data structure of the system is based on a chart (Varile, 1983). One of the main advantages of using a chart is that the data structure does not change throughout the whole translation process (Vauquois, 1978).</Paragraph> <Paragraph position="1"> In the Q-system, all linguistic data on the arcs are represented by bracketed strings, causing an unclean mixture of constituent structure and other linguistic attributes such as grammatical and semantic labels. With this representation, type checking is not possible. Vauquois proposes two changes: 1) tree structures with complex labels on the nodes, to allow interaction between different linguistic levels such as syntax and semantics; 2) a dissociation of the geometry from any particular linguistic level.
With these modifications, a single tree structure with complex labels increases the power of representation, in that several levels of interpretation can be processed simultaneously (Vauquois, 1978; Boitet, 1977).</Paragraph> <Paragraph position="2"> In our system, each arc of the chart carries a tree geometry, and each node of the tree has a complex label consisting of an optional string and the linguistic attributes. Through the separation of geometry and attributes, the linguist can deal with two distinct objects: tree structures, and complex labels on the nodes of the trees.</Paragraph> <Paragraph position="3"> [Figure 1. Tree with complex labelling; example node label: string = 'linguist', cat = noun.] The range or kind of possible linguistic attributes is not predefined by the system. The linguist has to define the types he wants to use in a declaration part.</Paragraph> <Paragraph position="4"> e.g.: category = verb, noun, np, pp.</Paragraph> <Paragraph position="5"> semantic-features = human, animate.</Paragraph> <Paragraph position="6"> gender = masc, fem, neut.</Paragraph> <Paragraph position="7"> An important aspect of type declaration is the control it offers. The system provides strong syntactic and semantic type checking, thereby constraining the application range of the rules in order to avoid inappropriate transductions. The current implementation allows the use of sets and subsets in the type definition. Further extensions are planned.</Paragraph> <Paragraph position="8"> Given that in this system the tree geometry is not bound to a specific linguistic level, the linguist is free to decide which information will be represented by the geometry and which will be treated as attributes on the nodes.
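The representation described above (arcs carrying trees, nodes carrying an optional string plus typed attributes, and a declaration part used for type checking) can be sketched in Python as follows. This is a minimal illustration under assumptions, not GTT's own implementation: the names DECLARATIONS, check_label, Node and Chart are hypothetical, each attribute holds a single declared value (the sets and subsets supported by the system are not modelled), and the chart simply maps a pair of vertices to the alternative trees on that arc.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical declaration part, modelled on the examples in the text:
# each attribute type maps to the set of values the linguist declares.
DECLARATIONS = {
    "category": {"verb", "noun", "np", "pp"},
    "semantic-features": {"human", "animate"},
    "gender": {"masc", "fem", "neut"},
}


def check_label(attrs):
    """Type checking: reject undeclared attribute types or values."""
    for name, value in attrs.items():
        if name not in DECLARATIONS:
            raise TypeError(f"undeclared attribute type: {name}")
        if value not in DECLARATIONS[name]:
            raise TypeError(f"value {value!r} not declared for {name}")


@dataclass
class Node:
    """A tree node: the geometry (children) is kept apart from the label."""
    string: Optional[str]      # optional string part of the complex label
    attrs: dict                # linguistic attributes, e.g. {"category": "noun"}
    children: list = field(default_factory=list)

    def __post_init__(self):
        check_label(self.attrs)  # every label is type-checked on creation


@dataclass
class Chart:
    """Maps an arc (start vertex, end vertex) to its alternative trees."""
    arcs: dict = field(default_factory=dict)

    def add(self, start, end, tree):
        self.arcs.setdefault((start, end), []).append(tree)


chart = Chart()
chart.add(0, 1, Node("linguist", {"category": "noun", "semantic-features": "human"}))
# Node("x", {"gender": "plural"}) would raise TypeError: 'plural' is undeclared.
```

Keeping the children list separate from the attribute dictionary mirrors the separation of geometry and complex labels, so the same tree shape can carry syntactic or semantic information as the linguist chooses.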
This representation tool is thus fairly general and allows the testing of different theories and strategies in MT or computational linguistics.</Paragraph> </Section> <Section position="3" start_page="88" end_page="88" type="sub_section"> <SectionTitle> 3.3 The rule syntax </SectionTitle> <Paragraph position="0"> The basic tool for expressing object knowledge is a set of production rules that are similar in form to context-free phrase structure rules and well known to linguists from formal grammar. In order to have the same rule type for all operations in a translation system, the rules must have the power of type 0 in the Chomsky classification, including string-handling facilities.</Paragraph> <Paragraph position="1"> The rules exhibit two important additions to context-free phrase structure rules: - arbitrary structures can be matched on the left-hand side or built on the right-hand side, giving</Paragraph> <Paragraph position="3"> the power of unrestricted rules or transformational grammar; - arbitrary conditions on the application of the rule can be added, giving the power of a context-sensitive grammar.</Paragraph> <Paragraph position="4"> The power of unrestricted rewriting rules makes the transducer a versatile instrument for expressing any rule-governed aspect of language, whether morphology, syntax or semantics. The fact that the statements are basically phrase structure rules makes this language particularly congenial to linguists and hence well suited for teaching purposes.</Paragraph> <Paragraph position="5"> The format of rules is determined by the separation of tree structure and attributes on the nodes. Each rule has three parts: geometry, conditions and assignments, e.g.:</Paragraph> <Paragraph position="7"> The geometry has the standard left-hand side, production symbol (=>) and right-hand side of a production rule; a, b and c are variables describing the nodes of the tree structure. The '+' indicates the sequence in the chart, e.g.
a+b. Conditions and assignments affect only the objects on the nodes.</Paragraph> </Section> <Section position="4" start_page="88" end_page="90" type="sub_section"> <SectionTitle> 3.4 Control structure </SectionTitle> <Paragraph position="0"> The linguist has two tools for controlling the application of the rewriting rules: 1) The rules can be grouped into packets (grammars), which are executed in sequence.</Paragraph> <Paragraph position="1"> 2) Within a given grammar, rule application can be controlled by means of parameters set by the linguist. Depending on the linguistic operation envisaged, the parameters can be set to a combination of serial or parallel and one-pass or iterate.</Paragraph> <Paragraph position="2"> In all, four different combinations are possible: parallel and one-pass; parallel and iterate; serial and one-pass; serial and iterate. In the parallel mode, the rules within a grammar are considered logically unordered. Different rules can be applied to the same piece of data and produce alternatives in the chart. The chart is updated at the end of every application cycle. In the serial mode, the rules are considered as being ordered in a sequence. Only one rule can fire on a particular piece of data, but subsequent rules can match the result produced by a preceding rule. The chart is updated after every rule that fires. The parameters one-pass and iterate control the number of cycles: either the interpreter goes through a cycle only once, or it iterates the cycles as long as any rule of the grammar can fire.</Paragraph> <Paragraph position="3"> The four combinations allow different uses according to the linguistic task to be performed, e.g.: Parallel and iterate applies the rules non-deterministically to compute all possibilities, which gives the system the power of a Turing machine (this is the only control mode for the Q-system).
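The interplay of the serial/parallel and one-pass/iterate parameters described above can be illustrated with a small Python sketch. It is not GTT's interpreter: the chart is reduced to a set of labelled arcs, a rule is reduced to an arity plus a Python function that combines condition and assignment, and the names matches, fire and run_grammar are hypothetical. Treating iteration as finished when a cycle adds no new arc is also an assumption; with rules that keep building new structures, the iterate mode need not terminate, in line with the Turing-machine power of that mode.

```python
from itertools import product

# Hypothetical sketch of GTT-style control parameters. A chart arc is a
# tuple (start_vertex, end_vertex, label); trees are reduced to their top
# label. A rule is (arity, build): arity 1 matches one arc (a), arity 2
# matches two consecutive arcs (a+b); build returns a new label when its
# condition holds, else None.


def matches(chart, arity):
    """Yield the arc tuples that a rule of the given arity can match."""
    arcs = sorted(chart)
    if arity == 1:
        for a in arcs:
            yield (a,)
    else:  # arity 2
        for a, b in product(arcs, arcs):
            if a[1] == b[0]:  # a+b: b starts where a ends
                yield (a, b)


def fire(rule, chart, used=None):
    """Apply one rule to a frozen chart and return the new arcs it builds.

    In serial mode `used` records data that a rule has already fired on,
    so a later rule in the same cycle cannot fire on it again.
    """
    arity, build = rule
    new = set()
    for m in matches(chart, arity):
        if used is not None and any(arc in used for arc in m):
            continue
        label = build(*[arc[2] for arc in m])
        if label is not None:
            new.add((m[0][0], m[-1][1], label))
            if used is not None:
                used.update(m)
    return new


def run_grammar(rules, chart, mode="parallel", passes="one-pass"):
    chart = set(chart)
    while True:
        size_before = len(chart)
        if mode == "parallel":
            # Unordered rules see the same data; chart updated after the cycle.
            snapshot = frozenset(chart)
            produced = set()
            for rule in rules:
                produced |= fire(rule, snapshot)
            chart |= produced
        else:
            # Ordered rules; chart updated after each rule, so later rules
            # can match earlier results.
            used = set()
            for rule in rules:
                chart |= fire(rule, frozenset(chart), used)
        # "iterate" stops once a cycle adds nothing new to the chart.
        if passes == "one-pass" or len(chart) == size_before:
            return chart


lexicon = [
    (1, lambda w: "noun" if w == "drink" else None),
    (1, lambda w: "verb" if w == "drink" else None),
]
start = {(0, 1, "drink")}
print(sorted(run_grammar(lexicon, start, "parallel")))
# [(0, 1, 'drink'), (0, 1, 'noun'), (0, 1, 'verb')]
print(sorted(run_grammar(lexicon, start, "serial")))
# [(0, 1, 'drink'), (0, 1, 'noun')]
```

In the parallel run both dictionary rules fire on 'drink' and leave two alternatives in the chart; in the serial run the first rule claims the data and the second cannot fire on it. A later serial rule can, however, combine arcs built earlier in the same pass, because the chart is updated after each rule.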
Parallel and one-pass is the typical combination for dictionaries that contain alternatives: two different rules can apply to the same piece of data. The example below (fig. 2) uses this combination in the first GRAMMAR, 'vocabulary'.</Paragraph> <Paragraph position="4"> Serial and one-pass allows rule ordering. A possible application of this combination is a preference mechanism based on explicit rule ordering, using the longest-match-first technique. The 'preference' grammar in the example below (fig. 2) makes use of this by progressively weakening the selectional restrictions of the verb 'drink'.</Paragraph> <Paragraph position="5"> Rule 24 fires without semantic restrictions, and rule 25 accepts sentences where the optional argument is missing.</Paragraph> <Paragraph position="6"> The example should be sufficiently self-explanatory. It begins with the declaration of the attributes and contains three grammars. The result is shown for two sentences (fig. 3). To show which rule in the preference grammar has fired, each rule produces a different top label: rule 21 = PH1, rule 22 = PH2, etc.</Paragraph> <Paragraph position="7"> DECLARE
cat = det, noun, verb, val_node, np, ph1, ph2, ph3, ph4, ph5;
number = sg, pl;
marker = human, liquid, notdrinkable, physobj, abstr;
valency = v1, v2, v3;
argument = arg1, arg2, arg3;</Paragraph> </Section> </Section> <Section position="5" start_page="90" end_page="90" type="metho"> <SectionTitle> 4. FACILITIES FOR THE USER </SectionTitle> <Paragraph position="0"> Both main programs of the system, the compiler and the interpreter, interact with the user. The following example (fig. 4) shows how the compiler's error messages are printed in the compilation listing. Each star with a number points to the approximate position of an error, and a message explains the possible errors.
The compiler tries to correct the error and, in the worst case, ignores the portion of the text following the error.</Paragraph> </Section> <Section position="6" start_page="90" end_page="90" type="metho"> <SectionTitle> GRAMMAR errortest PARALEL ITERATE </SectionTitle> <Paragraph position="0">
*0 pos. 0: -ES- /SERIAL/ or /PARALLEL/ expected
RULE 1 a+b => c(a,b) IF STRING(a)='blable' AND cat(b)=[nom THEN cat(d) := [nom];
*0 *1 *2
pos. 0: -ES- /,/ expected
pos. 1: -ES- /]/ expected
pos. 2: -SEM- id. not defined in the geometry (right-hand side)
RULE 2 a(a) => c(a,b)
*0 pos. 0: -SEM- id. already used for the left-hand side
IF cat(a)=[det] THEN categ(b) := [noun];
*0 *1
pos. 0: -SEM- id. does not represent a set
pos. 1: -SEM- id. does not represent an element
The interpreter has a parameter that allows the sequence of rules that fired to be traced. The trace in figure 5 below corresponds to the execution of example (1) in figure 3.</Paragraph> <Paragraph position="1">
interpreter:
application of rule 1
application of rule 1
application of rule 2
application of rule 3
application of rule 6
application of rule 7</Paragraph> </Section> </Paper>