<?xml version="1.0" standalone="yes"?> <Paper uid="C88-1070"> <Title>Schema Method: A Framework for Correcting Grammatically Ill-formed Input</Title> <Section position="3" start_page="0" end_page="342" type="metho"> <SectionTitle> 2. Non-native speaker's ill-formed phenomena </SectionTitle> <Paragraph position="0"> This section gives examples of the ill-formed input of non-native speakers that the framework treats. The application is a CAI system for Japanese junior high school students in a primary English course. Their errors are different from a native speaker's. Typical errors are shown in Table 1.</Paragraph> <Paragraph position="1"> English is very different from Japanese in parts of speech, word order, tense, etc. For a Japanese speaker, there is no concept of (1) countable and uncountable nouns, (2) singular and plural forms, (3) articles, (4) agreement between subject and verb, or (5) adverb word order (see the examples in Table 1).</Paragraph> <Paragraph position="2"> Japanese interfered with the students' acquisition of English. The following errors are often made by Japanese adults as well: (4) verb style and (5) category mistakes or word misuse. Furthermore, junior high school students are reading and hearing a foreign language (English) for the first time, and thus have no concept of a foreign language whatsoever. (6) Logical error: the student who made the mistake explained that &quot;are + not → aren't&quot; and &quot;is + not → isn't&quot;, so &quot;am + not → amn't&quot;. (7) Primary students are not familiar with English grammar and can't distinguish between &quot;Who&quot; and &quot;Where&quot;. (8) Surface error: letter or punctuation problems. *He plays piano. *He plays the baseball.</Paragraph> <Paragraph position="3"> He plays the piano. 
He plays baseball.</Paragraph> <Paragraph position="4"> *some good advices *I am student.</Paragraph> <Paragraph position="5"> some good advice I am a student.</Paragraph> <Paragraph position="6"> *A moon is smaller than an earth.</Paragraph> <Paragraph position="7"> The moon is smaller than the earth.</Paragraph> <Paragraph position="8"> *He is one of those men who is difficult to please.</Paragraph> <Paragraph position="9"> He is one of those men who are difficult to please.</Paragraph> <Paragraph position="10"> *I have finished my homework already.</Paragraph> <Paragraph position="11"> I have already finished my homework.</Paragraph> <Paragraph position="12"> *He is listening music on the radio now.</Paragraph> <Paragraph position="13"> He is listening to music on the radio now.</Paragraph> <Paragraph position="14"> *We cannot play baseball in here.</Paragraph> <Paragraph position="15"> We cannot play baseball here.</Paragraph> <Paragraph position="16"> *Yes, I amn't.</Paragraph> <Paragraph position="17"> Yes, I am not. Yes, I'm not.</Paragraph> <Paragraph position="18"> *Who does cook breakfast? *Where they live? Who cooks breakfast? Where do they live? *Does mr. brown have a book Does Mr. Brown have a book? *We must stop to complain.</Paragraph> <Paragraph position="19"> We must stop complaining.</Paragraph> <Paragraph position="20"> Grammatical errors are treated, but semantic errors and absolutely ill-formed sentences that are not comprehensible are not. The aim is to diagnose grammatical errors and show a reason for each error. For example: Input sentence: Mr Brown has a pen, Correction: Mr. Brown has a pen.</Paragraph> <Paragraph position="21"> Reason: A period is needed after &quot;Mr&quot;. The comma after &quot;pen&quot; should be a period.</Paragraph> </Section> <Section position="4" start_page="342" end_page="345" type="metho"> <SectionTitle> 3. 
Basic idea </SectionTitle> <Paragraph position="0"> In this section, the basic idea of the framework and the problem of the LFG unification mechanism in dealing with ill-formed input are described.</Paragraph> <Section position="1" start_page="342" end_page="342" type="sub_section"> <SectionTitle> 3.1 Two-level filter </SectionTitle> <Paragraph position="0"> The framework uses two-level filters to classify an input sentence as well-formed, relatively ill-formed or absolutely ill-formed, as shown in Figure 1.</Paragraph> <Paragraph position="1"> (1) First, an attempt is made to parse the input using a normal context-free grammar (Filter I). Both well-formed sentences and relatively ill-formed sentences that include feature errors pass through this filter.</Paragraph> <Paragraph position="2"> (2) Secondly, these inputs are checked with a strong filter (Filter II). A well-formed sentence passes, but a relatively ill-formed sentence does not. (3) An input that does not pass through the first filter (Filter I) includes word-order errors, omitted words or unnecessary words. The input is classified by a filter (Filter III), called Improper Grammar, as relatively ill-formed or absolutely ill-formed.</Paragraph> <Paragraph position="4"/> </Section> <Section position="2" start_page="342" end_page="342" type="sub_section"> <SectionTitle> 3.2 Filter test </SectionTitle> <Paragraph position="0"> Filter I is a context-free grammar. It is a weak filter, so some relatively ill-formed inputs pass. Consider how many sentences are derived from the grammar rules in Figure 2. The grammar rules and dictionary entries generate 25 (5 x 1 x 5) sentences. 
Of course, these include not only well-formed sentences, as in (1) below, but also ill-formed sentences, as in (2), (3) and (4) below.</Paragraph> </Section> <Section position="3" start_page="342" end_page="342" type="sub_section"> <SectionTitle> 3.3 The problem of the LFG unification mechanism for ill-formed input </SectionTitle> <Paragraph position="0"> Relatively ill-formed sentences that contain feature errors pass through Filter I. Filter II must therefore work as a strong grammatical filter. LFG contains such a strong filter, called the unification mechanism (&quot;from F-Descriptions to F-Structures&quot; /Kaplan 1982 (pp. 203)/). For example, for &quot;This is a apple&quot;, LFG rejects the disagreement in &quot;a apple&quot; because the following equations cannot be unified.</Paragraph> <Paragraph position="2"> However, for diagnosis and error correction the LFG framework has some drawbacks: (1) LFG cannot check an error of omission, as in the noun phrase &quot;∅ apple&quot; in the sentence &quot;This is apple&quot;.</Paragraph> <Paragraph position="3"> As the sentence lacks the article &quot;an&quot;, there is no determiner equation and the unification mechanism does not work. Thus the sentence is recognized as a well-formed sentence.</Paragraph> <Paragraph position="4"> (lack of article)</Paragraph> <Paragraph position="6"> (2) LFG has no error-correction framework. It only rejects the ill-formed input. Addition of an error-correction mechanism is thus necessary.</Paragraph> <Paragraph position="7"> 3.4 Improper Grammar [Filter (III)] In this application, users are non-native speakers unfamiliar with English grammar. Thus, a user often makes word-order errors, includes unnecessary words, or leaves out words. A teacher could show why &quot;does&quot; is not necessary in the sentence &quot;*Who does cook breakfast?&quot;, or why &quot;do&quot; is needed in &quot;*Where they live?&quot;. 
If a system is to diagnose such sentences, it needs to provide grammar rules for their analysis. The type of error shown in Figure 3 is called improper grammar.</Paragraph> <Paragraph position="9"> *Who does cook breakfast? *Where they live? Figure 3 Examples of improper grammar. 4. The framework. This section gives an overview of the framework. The unification approach has some drawbacks for diagnosis, as described in 3.3, so a new method is used as Filter II. The idea is to compare the input style with proper surface styles, which are synthesized from lexical and grammatical conditions. An interpretation schema collects the conditions (surface schemata and LFG schemata), and an interpretation rule synthesizes the proper styles and judges whether the sentence is ill-formed or well-formed, as shown in Figure 4. First, the new schemata are notated: surface schema (4.1), surface constraint (4.2), and interpretation schema, interpretation schema with a condition, conditional schema and kill schema (4.3). Then the instantiation mechanism and the interpretation of the new schemata are described (4.4, 4.5). Finally, error correction is illustrated (4.6).</Paragraph> </Section> <Section position="4" start_page="342" end_page="342" type="sub_section"> <SectionTitle> 4.1 Input processing </SectionTitle> <Paragraph position="0"> [Surface schema] Capital letters and punctuation marks are surface information of an input sentence. In this framework such information is represented as a schema, called a surface schema. In input processing, the input sentence is converted into surface schemata. The schema is notated as follows.</Paragraph> <Paragraph position="1"> (gn f-name) = value &quot;gn&quot; is the designator that shows the word order &quot;n&quot;. &quot;f-name&quot; is the function name of the schema, such as word, letter or mark. 
&quot;value&quot; is the schema's value.</Paragraph> <Paragraph position="2"> For example, the ill-formed input &quot;MR. Brown have eat a apple,&quot; is represented as the surface schemata in Figure 5. &quot;MR.&quot; is represented as four surface schemata: &quot;(g1 word) = mr&quot;; the word is &quot;mr&quot;.</Paragraph> <Paragraph position="3"> &quot;(g1 mark) = period&quot;; the mark after the word is a period. &quot;(g1 letter) = 1&quot;; the first letter of the word is a capital (&quot;M&quot;). &quot;(g1 letter) = 2&quot;; the second letter of the word is a capital (&quot;R&quot;). Input sentence: *MR. Brown have eat a apple,</Paragraph> <Paragraph position="5"/> </Section> <Section position="5" start_page="342" end_page="343" type="sub_section"> <SectionTitle> 4.2 Lexicon </SectionTitle> <Paragraph position="0"> [Lexical surface constraint] In the lexicon, lexical features and constraints are stored as schemata. A constraint on a surface schema is called a surface constraint. A surface constraint is notated as follows:</Paragraph> <Paragraph position="2"> &quot;IT&quot; is a meta-variable. It is replaced by &quot;gn&quot; when the surface constraint is instantiated.</Paragraph> <Paragraph position="3"> There are two kinds of surface constraints: lexical and grammatical. The capital letter &quot;M&quot; in &quot;Mr.&quot; is a lexical constraint, because it is capitalized regardless of sentence position. 
A lexical surface constraint is assigned to the dictionary (Figure 6).</Paragraph> <Paragraph position="4"> (IT word) =c mr; the word must be &quot;mr&quot;.</Paragraph> <Paragraph position="6"/> </Section> <Section position="6" start_page="343" end_page="344" type="sub_section"> <SectionTitle> 4.3 Grammar </SectionTitle> <Paragraph position="0"> [Grammatical surface constraint] The first letter in a sentence is always a capital letter, and the last punctuation in a sentence is a mark (a period, a question mark, an exclamation point, etc.).</Paragraph> <Paragraph position="1"> These are regarded as grammatical constraints. In our framework they are represented as grammatical surface constraints, which are assigned to grammar rules as shown in Figure 7.</Paragraph> <Paragraph position="2"> (ITF letter) =c 1; the first letter in the sentence must be a capital letter. &quot;ITF&quot; designates the first position in the sentence.</Paragraph> <Paragraph position="3"> (ITL mark) =c period; the last mark in the sentence must be a period. &quot;ITL&quot; designates the last position in the sentence.</Paragraph> <Paragraph position="4"> [Interpretation schema] In order to diagnose and correct errors, our framework has three steps: (1) collecting information on the input sentence, (2) synthesizing an interpretation and (3) comparing (1) and (2).</Paragraph> <Paragraph position="5"> The interpretation schema collects LFG schemata and surface schemata. It is assigned to the lexicon or to grammar rules. In the parsing process, it is instantiated and collects schemata. The schemata collected by the interpretation schema are conveyed to the interpretation rule. This schema is notated as follows.</Paragraph> <Paragraph position="7"> ↑ is a meta-variable, as in LFG notation, and &quot;f-name&quot; is the function name of the interpretation schema. 
Its values are sets of schemata. For example, an interpretation schema for agreement between determiner and noun is notated as follows.</Paragraph> <Paragraph position="8"> (↑ DET-NOUN) =i {[DET],[NOUN]} [DET] means the set of schemata from the determiner, and [NOUN] means the set from the noun.</Paragraph> <Paragraph position="9"> (Example 1) For the correctly formed noun phrase &quot;an apple&quot;, the interpretation schema DET-NOUN is attached to grammar rule (1), as shown in Figure 8. On instantiation, the interpretation schema collects the LFG schemata in the lexicon and the surface schemata as its values.</Paragraph> <Paragraph position="10"> [Interpretation schema with a condition and conditional schema] An interpretation schema with a condition and its conditional schema form a pair and act together as an interpretation schema. An interpretation schema with a condition can act only when its conditional schema is present. These schemata are notated as (a) an interpretation schema with a condition:</Paragraph> <Paragraph position="12"> For example, this schema means that if a noun phrase [NP:f2] is a pronoun [PRONOUN], it checks whether the case of the pronoun is subjective [subj]. If the noun phrase is not a pronoun, such as &quot;an apple&quot;, there is no need to check.</Paragraph> <Paragraph position="14"> The following schema is its conditional schema. It is attached to grammar rule (5) and means that the noun phrase is a pronoun.</Paragraph> <Paragraph position="16"> A kill schema is the mechanism that inhibits instantiation. It works to kill interpretation schemata and is notated as follows:</Paragraph> <Paragraph position="18"> ① This schema checks agreement between determiner, adjective and noun, as in 'the same name', '*some good advices', '*a good jobs' and '*a interesting book'.</Paragraph> <Paragraph position="19"> ② This schema checks whether the verb form (V-FORM) is a proper form for the subject style (SUBJ). [NP:f2] is the subject. 
For example, &quot;Tom gives ...&quot;, &quot;*He laugh ...&quot;, &quot;You made ...&quot; and &quot;*Mr. and Mrs. Brown laughs ...&quot;. ③ This schema checks whether the auxiliary verb form (A-FORM) is a proper form for the subject style (SUBJ). [NP:f2] is the subject. For example, &quot;*Tom have given ...&quot; and &quot;He can laugh ...&quot;.</Paragraph> <Paragraph position="20"> ④ This schema checks whether the verb form (V-FORM) is a proper form for the auxiliary verb. For example, &quot;Tom has given ...&quot;, &quot;*Tom has give ...&quot;, &quot;*You can laughed ...&quot; and &quot;He is speaking ...&quot;. ⑤ This schema checks agreement between the subject noun phrase, the verb &quot;be&quot; and the complement. [NP:f2] is the subject noun phrase and [NP:f5] is the complement. For example, &quot;*These is apples.&quot;, &quot;*He is students.&quot; and &quot;*They are a student.&quot; Figure 10 Examples of grammar and interpretation schema. ↑ is a meta-variable and &quot;f-name&quot; is the kill schema's name. Its value in { ... } is the names of the killed schemata.</Paragraph> <Paragraph position="21"> There is a hierarchy and priority among interpretation schemata. A kill schema is used to keep interpretation schemata independent. A schema attached to a noun phrase can collect schemata only within the noun phrase, while a schema attached at sentence level can collect schemata across the sentence. Thus, the former is local and the latter is global. Consider, for example, &quot;*This is a apples.&quot; The noun phrase &quot;a apples&quot; is wrong and should be &quot;an apple&quot;. The local interpretation schema (Figure 10) can't determine which is correct, &quot;an apple&quot; or &quot;apples&quot;, while the global interpretation schema can judge that &quot;an apple&quot; is correct. The global interpretation schema checks for agreement within [NP:f5] instead of the local interpretation schemata. 
Therefore, the local interpretation schemata ① and ② are not necessary.</Paragraph> <Paragraph position="22"> Thus, the kill schema that corresponds to the global interpretation schema kills the local interpretation schemata ① and ②.</Paragraph> <Paragraph position="24"/> </Section> <Section position="7" start_page="344" end_page="345" type="sub_section"> <SectionTitle> 4.4 Instantiation </SectionTitle> <Paragraph position="0"> This section explains how schemata are instantiated. Both the ↑ and ↓ meta-variables are assigned to actual variables (f1, f2, ...), as in LFG.</Paragraph> <Paragraph position="1"> A surface schema, a surface constraint and an interpretation schema include &quot;IT&quot; meta-variables. &quot;IT&quot; meta-variables are assigned as follows.</Paragraph> <Paragraph position="2"> (1) In input processing, the designator &quot;gn&quot;, which shows the word order in the input sentence, is assigned to each surface schema.</Paragraph> <Paragraph position="3"> (2) When the dictionary is looked up, surface constraints in the lexicon are instantiated. The &quot;IT&quot; meta-variable in a surface constraint is bound to the designator &quot;gn&quot; in the surface schema.</Paragraph> <Paragraph position="4"> (3) When a grammar rule is applied, the surface constraints in the grammar rule are bound to the designator &quot;gn&quot;.</Paragraph> <Paragraph position="6"/> <Paragraph position="7"> An example is shown in Figure 11.</Paragraph> </Section> <Section position="8" start_page="345" end_page="345" type="sub_section"> <SectionTitle> 4.5 Interpretation (Filter II) </SectionTitle> <Paragraph position="0"> After the parsing process, interpretation schemata, interpretation schemata with a condition, conditional schemata and kill schemata are instantiated. Interpretation schemata are interpreted by interpretation rules. 
The input is judged consistent or inconsistent.</Paragraph> <Paragraph position="1"> The interpretation schemata are independent, so they can be interpreted in any order. The interpretation flow is as follows.</Paragraph> <Paragraph position="2"> (1) Check conditional schemata: if the schema is an interpretation schema with a condition, find the paired conditional schema. If no conditional schema is paired with it, inhibit the instantiated interpretation schema with a condition.</Paragraph> <Paragraph position="3"> (2) Check kill schemata: if a kill schema names the interpretation schema as one to be killed, inhibit the instantiated interpretation schema.</Paragraph> <Paragraph position="4"> (3) Interpretation rule: if the schema is not included in a kill schema, interpret it. [Interpretation rule] An interpretation rule diagnoses the input sentence.</Paragraph> <Paragraph position="5"> The schemata collected by an interpretation schema are checked by an interpretation rule. An interpretation rule synthesizes the word from the collected schemata. The diagnosis process is as follows.</Paragraph> <Paragraph position="6"> (1) Find the input style from the interpretation schemata.</Paragraph> <Paragraph position="7"> (2) Synthesize the correct style by using an interpretation rule. (3) Compare the input style with the synthesized style. If they are consistent, the input style is right. If not, correct the input style to the synthesized correct style.</Paragraph> <Paragraph position="8"> An interpretation rule synthesizes the result from the conditions in the interpretation schema. For example, the DET-NOUN rule is shown in Table 2. This rule determines whether the noun is correct and synthesizes the specification (SPEC) form adapted to the noun.</Paragraph> <Paragraph position="9"> (Example 1) In the case of the correctly formed noun phrase &quot;an apple&quot;, the interpretation rule is shown in Figure 8. 
(1) Input style: (gi word) = an, (gi+1 word) = apple from the surface schemata in Figure 8.</Paragraph> <Paragraph position="10"> (2) Synthesized style: the conditions are (↑ NUM) = SG and (↑ SPEC1) = 'an/the' from the noun, and (↑ SPEC) = 'an' from the determiner in Figure 8; the result is (↑ SPEC) = an, from a rule in Table 2. (3) Compare '(gn word) = an' with '(↑ SPEC) = an'. The values are the same. Thus this noun phrase is correctly formed. (Example 2) In the case of the ill-formed noun phrase &quot;∅ apple&quot;, which lacks an article, the interpretation rules are shown in Figure 9.</Paragraph> <Paragraph position="11"> (1) Input style: ∅, (gj word) = apple from the surface schemata. (2) Synthesized style: the conditions are (↑ NUM) = SG and (↑ SPEC1) = 'an/the' from the noun; the result is (↑ SPEC) = an from rule 10 in Table 2.</Paragraph> <Paragraph position="12"> (3) Comparing ∅ with (↑ SPEC) = an shows that the article &quot;an&quot; is lacking. Add the surface constraint &quot;(gn-0.5 word) =c an&quot; before &quot;(gn word) =c apple&quot;.</Paragraph> </Section> <Section position="9" start_page="345" end_page="345" type="sub_section"> <SectionTitle> 4.6 Error correction </SectionTitle> <Paragraph position="0"> The error correction phase explains the error to the user. For example, for &quot;*MR. Brown have eat apple,&quot; the flow of error correction is shown in Figure 12. The input sentence is converted into surface schemata and parsed. Surface constraints and interpretation schemata are then obtained.</Paragraph> <Paragraph position="1"> The interpretation schemata are diagnosed by the interpretation rules, and three errors are found: (1) SUBJ&amp;A-FORM, (2) AUX&amp;V-FORM and (3) DET-NOUN (Figure 10). Furthermore, surface errors, (4) MARK and (5) LETTER, are found from the difference between the surface schemata and the surface constraints. The surface constraints are replaced by synthesized schemata. The corrected sentence, &quot;Mr. Brown has eaten an apple. 
&quot;, is then synthesized from the surface constraints. The explanations (1) to (5) are generated from the results of the interpretation rules. This framework was applied to an English CAI system designed to teach English to junior high school students. The system has two main modules: (1) machine translation /Kudo 1986/ and (2) this framework.</Paragraph> <Paragraph position="2"> If students produce ill-formed English input, the system corrects the errors and shows why they are wrong. If there are no errors, the sentence is translated into Japanese.</Paragraph> <Paragraph position="3"> This system was implemented in Prolog (about 120KB).</Paragraph> <Paragraph position="4"> Performance is real-time (answers within 5 seconds).</Paragraph> <Paragraph position="5"> The system was actually used by junior high school students. We collected their mistakes and fed them back to the system. This system is one application of the framework in a limited domain. The framework is easy to apply to another domain: to construct a new system, only the grammar, dictionary and interpretation rules need to be changed.</Paragraph> <Paragraph position="6"> 6. Limitations and future work. The framework for grammatically ill-formed input was described. The following problems remain unsolved: (1) The handling of semantically ill-formed input: in this framework a semantically ill-formed sentence is passed. A semantic filter must be added after Filter II.</Paragraph> <Paragraph position="7"> (2) The problem of interpretation: interpretation often changes with context and situation. Human beings correct ill-formed sentences by recognizing context and situation.</Paragraph> <Paragraph position="8"> For example, &quot;He is a boy?&quot; Which interpretation is right: a dialogue situation,</Paragraph> <Paragraph position="9"> a word-order error (Is he a boy?) or mispunctuation (He is a boy.)? 
A system will need a context recognizer and a situation recognizer. Conclusion. This paper has suggested the schema method, a new framework for correcting ill-formed input. This framework recognizes input in two steps, with weak and strong filters. When it is known what sentences are passed by a filter, the filter can be used even if it is imperfect. This method has the following advantages: (1) the problem of control strategies for relaxation can be avoided because the relaxation technique is not used, and (2) computational efficiency. The LFG framework was extended for correcting grammatically ill-formed input; a surface schema and an interpretation schema have been proposed. This framework can correct errors without breaking the LFG framework, because these schemata can be treated in the same way as LFG schemata. Therefore making an applied system is very easy. This framework was implemented in Prolog to devise a useful CAI system.</Paragraph> </Section> </Section> <Section position="5" start_page="345" end_page="345" type="metho"> <SectionTitle> Acknowledgment </SectionTitle> <Paragraph position="0"> We would like to thank Akira Kurematu, president of</Paragraph> </Section> </Paper>