<?xml version="1.0" standalone="yes"?>
<Paper uid="P79-1006">
  <Title>UNGRAMMATICALITY AND EXTRA-GRAMMATICALITY IN NATURAL LANGUAGE UNDERSTANDING SYSTEMS</Title>
  <Section position="4" start_page="19" end_page="20" type="metho">
    <SectionTitle>
II.3 Conjunction
</SectionTitle>
    <Paragraph position="0"> Conjunction is an extremely common phenomenon, but it is seldom directly treated in a grammar. We have considered several types of conjunction.</Paragraph>
    <Paragraph position="1"> Simple forms of conjunction occur most frequently, as in John loves Mary and hates Sue.</Paragraph>
    <Paragraph position="2"> Gapping occurs when internal segments of the second conjunct are missing, as in John loves Mary and Mary John.</Paragraph>
    <Paragraph position="3"> The list form of conjunction occurs when more than two elements are joined in a single phrase, as in John loves Mary, Sue, Nancy, and Bill.</Paragraph>
    <Paragraph position="4"> Correlative conjunction occurs in sentences to coordinate the joining of constituents, as in John both loves and hates Sue.</Paragraph>
    <Paragraph position="5"> The reason conjuncts are generally left out of grammars is that they can appear in so many places that inclusion would dramatically increase the size of the grammar. The same argument applies to the ungrammatical phenomena. Since they allow so much variation compared to grammatical forms, including them with existing techniques would dramatically increase the size of a grammar. Further, there is a real distinction in terms of completeness and clarity of intent between grammatical and ungrammatical forms. Hence we feel justified in suggesting special techniques for their treatment.</Paragraph>
    <Paragraph position="6"> III. Proposed Mechanisms and How They Apply The following presentation of our techniques assumes an understanding of the ATN model. The techniques are applied to the language phenomena discussed in the previous section.</Paragraph>
    <Paragraph position="7"> The first two methods described are relaxation methods which allow the successful traversal of ATN arcs that might not otherwise be traversed. During parsing, whenever an arc cannot be taken, a check is made to see if some form of relaxation can apply. If it can, then a backtrack point is created which includes the relaxed version of the arc. These alternatives are not considered until after all possible grammatical paths have been attempted, thereby ensuring that grammatical inputs are still handled correctly. Relaxation of previously relaxed arcs is also possible. Two methods of relaxation have been investigated.</Paragraph>
    <Paragraph position="8"> Our first method involves relaxing a test on an arc, similar to the method used by Weischedel in [WEI79]. Test relaxation occurs when the test portion of an arc contains a relaxable predicate and the test fails. Two methods of test relaxation have been identified and implemented based on predicate type. Predicates can be designated by the grammar writer as either absolutely violable, in which case the opposite value of the predicate (determined by the LISP function NOT applied to the predicate) is substituted for the predicate during relaxation, or conditionally violable, in which case a substitute predicate is provided. For example, consider the following to be a test that fails:</Paragraph>
    <Paragraph position="10"> If the predicate INFLECTING was declared absolutely violable and its use in this test returned the value NIL, then the negation of (INFLECTING Y) would replace it in the test, creating a new arc with the test:</Paragraph>
    <Paragraph position="12"> If INTRANS were conditionally violable with the substitute predicate TRANS, then the following test would appear on the new arc:</Paragraph>
    <Paragraph position="14"> Whenever more than one test in a failing arc is violable, all possible single relaxations are attempted independently. Absolutely violable predicates can be permitted in cases where the test describes some superficial consistency checking or where the test's failure or success doesn't have a direct effect on meaning, while conditionally violable predicates apply to predicates which must be relaxed cautiously or else loss of meaning may result.</Paragraph>
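    <Paragraph> The two kinds of test relaxation can be sketched roughly as follows. This is an illustrative Python rendering, not the LISP implementation described here: the function `relaxations`, the tables `ABSOLUTE` and `CONDITIONAL`, the representation of a test as a conjunctive list of predicate calls, and the sample failing test are all assumptions made for the example.

```python
# Hypothetical declarations a grammar writer might make (names are illustrative).
ABSOLUTE = {"INFLECTING"}            # relax by negating the predicate
CONDITIONAL = {"INTRANS": "TRANS"}   # relax by substituting another predicate


def relaxations(test):
    """Return every single relaxation of a conjunctive test.

    `test` is a list of (predicate, argument) pairs, all of which must hold.
    Each result differs from `test` in exactly one relaxed predicate.
    """
    results = []
    for i, (pred, arg) in enumerate(test):
        if pred in ABSOLUTE:
            relaxed = ("NOT", (pred, arg))
        elif pred in CONDITIONAL:
            relaxed = (CONDITIONAL[pred], arg)
        else:
            continue  # not violable: this predicate is never relaxed
        results.append(test[:i] + [relaxed] + test[i + 1:])
    return results


# An assumed failing test, not the one from the paper.
failing_test = [("INFLECTING", "V"), ("INTRANS", "V")]
for variant in relaxations(failing_test):
    print(variant)
```

Each variant relaxes exactly one predicate, mirroring the statement that all possible single relaxations are attempted independently.</Paragraph>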
    <Paragraph position="15"> Chomsky discusses the notion of organizing word categories hierarchically in developing his ideas on degrees of grammaticalness. We have applied and extended these ideas in our second method of relaxation, called category relaxation. In this method, the grammar writer produces, along with the grammar, a hierarchy describing the relationship among words, categories, and phrase types which is utilized by the relaxation mechanism to construct relaxed versions of arcs that have failed. When an arc fails because of an arc type failure (i.e., because a particular word, category, or phrase was not found) a new arc (or arcs) may be created according to the description of the word, category, or phrase in the hierarchy. Typically, PUSH arcs will relax to PUSH arcs, CAT arcs to CAT or PUSH arcs, and WRD or MEM arcs to CAT arcs. Consider, for example, the syntactic category hierarchy for pronouns shown in Figure 1. For this example, the category relaxation mechanism would allow the relaxation of PERSONAL pronouns to include the category PRONOUN. The arc produced from category relaxation of PERSONAL pronouns also includes the subcategories REFLEXIVE and DEMONSTRATIVE in order to expand the scope of terms during relaxation. As with test relaxation, successive relaxations could occur.</Paragraph>
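    <Paragraph> A minimal sketch of category relaxation over such a hierarchy might look like the following. The hierarchy fragment is inferred from the prose only (Figure 1 is not reproduced here), and the table `PARENT` and the function `relax_category` are hypothetical names introduced for illustration.

```python
# Assumed fragment of a category hierarchy, stored as child -> parent.
PARENT = {
    "PERSONAL": "PRONOUN",
    "REFLEXIVE": "PRONOUN",
    "DEMONSTRATIVE": "PRONOUN",
}


def relax_category(category):
    """Return the parent category and every category it covers, or None."""
    parent = PARENT.get(category)
    if parent is None:
        return None  # top of the hierarchy: no further relaxation
    covered = sorted(c for c, p in PARENT.items() if p == parent)
    return parent, covered


print(relax_category("PERSONAL"))
```

Relaxing PERSONAL yields PRONOUN together with its sibling subcategories, so a relaxed CAT arc built from the result would accept reflexive and demonstrative pronouns as well.</Paragraph>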
    <Paragraph position="16"> For both methods of relaxation, &quot;deviance notes&quot; are generated which describe the nature of the relaxation in each case. Where multiple types or multiple levels of relaxation occur, a note is generated for each of these. The entire list of deviance notes accompanies the final structure produced by the parser.</Paragraph>
    <Paragraph position="17"> In this way, the final structure is marked as deviant and the nature of the deviance is available for use by other components of the understanding system.</Paragraph>
    <Paragraph position="18"> In our implementation, test relaxation has been fully implemented, while category relaxation has been implemented for all cases except those involving PUSH arcs. Such an implementation is anticipated, but requires a modification to our backtracking algorithm.</Paragraph>
  </Section>
  <Section position="5" start_page="20" end_page="20" type="metho">
    <SectionTitle>
III.2 Co-Occurrence and Relaxation
</SectionTitle>
    <Paragraph position="0"> The solution being proposed to handle forms that are deviant because of co-occurrence violations centers around the use of relaxation methods. Where simple tests exist within a grammar to filter out unacceptable forms of the type noted above, these tests may be relaxed to allow the acceptance of these forms. This doesn't eliminate the need for such tests since these tests help in disambiguation and provide a means by which sentences are marked as having violated certain rules.</Paragraph>
    <Paragraph position="1"> For co-occurrence violations, the point in the grammar where parsing becomes blocked is often exactly where the test or category violation occurs. An arc at that point is being attempted and fails due to a failure of the co-occurrence test or categorization requirements. Relaxation is then applied and an alternative generated which may be explored at a later point via backtracking. For example, the sentence: *John love Mary shows a disagreement between the subject (John) and the verb (love). Most probably this would show up during parsing when an arc is attempted which is expecting the verb of the sentence. The test would fail and the traversal would not be allowed. At that point, an ungrammatical alternative is created for later backtracking to consider.</Paragraph>
    <Paragraph position="2"> III.3 Patterns and the Pattern Arc In this section, relaxation techniques, as applied to the grammar itself, are introduced through the use of patterns and pattern-matching algorithms. Other systems have used patterns for parsing. We have devised a powerful method of integrating, within the ATN formalism, patterns which are flexible and useful.</Paragraph>
    <Paragraph position="3"> In our current formulation, which we have implemented and are now testing, a pattern is a linear sequence of ATN arcs which is matched against the input string. A pattern arc (PAT) has been added to the ATN formalism whose form is similar to that of other arcs: The pattern (&lt;part&gt;) is either the name of a pattern, a &quot;&gt;&quot;, or a list of ATN arcs, each of which may be preceded by the symbol &quot;&gt;&quot;, while the pattern mode (&lt;mode&gt;) can be any of the keywords UNANCHOR, OPTIONAL, or SKIP. These are discussed below. To refer to patterns by name, a dictionary of patterns is supported. A dictionary of arcs is also supported, allowing the referencing of arcs by name as well. Further, named arcs are defined as macros, allowing the dictionary and the grammar to be substantially reduced in size.</Paragraph>
  </Section>
  <Section position="6" start_page="20" end_page="20" type="metho">
    <SectionTitle>
THE PATTERN MATCHER
</SectionTitle>
    <Paragraph position="0"> Pattern matching proceeds by matching each arc in the pattern against the input string, but is affected by the chosen &quot;mode&quot; of matching. Since the individual component arcs are, in a sense, complex patterns, the ATN interpreter can be considered part of the matching algorithm as well. In arcs within patterns, explicit transfer to a new state is ignored and the next arc attempted on success is the one following in the pattern. An arc in a pattern prefaced by &quot;&gt;&quot; can be considered optional, if the OPTIONAL mode has been selected to activate this feature. When this is done, the matching algorithm still attempts to match optional arcs, but may ignore them. A pattern unanchoring capability is activated by specifying the mode UNANCHOR.</Paragraph>
    <Paragraph position="1"> In this mode, patterns are permitted to skip words prior to matching. Finally, selection of the SKIP mode results in words being ignored between matches of the arcs within a pattern. This is a generalization of the UNANCHOR mode.</Paragraph>
    <Paragraph position="2"> Pattern matching again results in deviance notes.</Paragraph>
    <Paragraph position="3"> For patterns, they contain information necessary to determine how matching succeeded.</Paragraph>
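    <Paragraph> The effect of the three matching modes can be sketched with a small backtracking matcher. This is only an approximation under stated assumptions: each ATN arc is reduced to a one-word acceptance function, the pattern must account for the whole input, and the names `match`, `NOUNS`, `VERBS`, and the note labels `omitted-arc` and `skipped-word` are hypothetical.

```python
NOUNS = {"John", "Mary"}
VERBS = {"loves"}


def match(pattern, words, modes=frozenset()):
    """Match a pattern against a list of words; return deviance notes or None.

    `pattern` is a list of (optional_flag, accepts) pairs, where `accepts`
    is a function from a word to True/False standing in for an ATN arc.
    """
    def step(pi, wi, first):
        if pi == len(pattern):
            return [] if wi == len(words) else None
        optional, accepts = pattern[pi]
        # Try to match the current arc against the current word.
        if wi != len(words) and accepts(words[wi]):
            rest = step(pi + 1, wi + 1, False)
            if rest is not None:
                return rest
        # OPTIONAL mode: an arc marked optional may be left unmatched.
        if optional and "OPTIONAL" in modes:
            rest = step(pi + 1, wi, first)
            if rest is not None:
                return [("omitted-arc", pi)] + rest
        # UNANCHOR skips words before the first match; SKIP skips anywhere.
        can_skip = "SKIP" in modes or ("UNANCHOR" in modes and first)
        if can_skip and wi != len(words):
            rest = step(pi, wi + 1, first)
            if rest is not None:
                return [("skipped-word", words[wi])] + rest
        return None

    return step(0, 0, True)


# Hypothetical pattern: optional subject, verb, optional object.
pattern = [
    (True, lambda w: w in NOUNS),
    (False, lambda w: w in VERBS),
    (True, lambda w: w in NOUNS),
]

print(match(pattern, ["loves", "Mary"], {"OPTIONAL"}))
print(match(pattern, ["well", "John", "loves", "Mary"], {"UNANCHOR"}))
print(match(pattern, ["John", "really", "loves", "Mary"], {"SKIP"}))
```

The returned list plays the role of the deviance notes: it records which arcs were left unmatched and which words were passed over, so that later components can see how the match succeeded.</Paragraph>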
  </Section>
  <Section position="7" start_page="20" end_page="20" type="metho">
    <SectionTitle>
SOURCE OF PATTERNS
</SectionTitle>
    <Paragraph position="0"> An automatic pattern generation mechanism has been implemented using the trace of the current execution path to produce a pattern. This is invoked by using a &quot;&gt;&quot; as the pattern name. Patterns produced in this fashion contain only those arcs traversed at the current level of recursion in the network, although we are planning to implement a generalization of this in which PUSH arcs can be automatically replaced by their subnetwork paths. Each arc in an automatic pattern is marked as optional. Patterns can also be constructed dynamically in precisely the same way grammatical structures are built using BUILDQ. The vehicle by which this is accomplished is discussed next.</Paragraph>
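    <Paragraph> Automatic pattern generation from an execution trace might be sketched as below. The trace format (a list of recursion-level and arc pairs), the arc labels, and the function name `automatic_pattern` are assumptions for illustration; the actual trace kept by the ATN interpreter is not described in this section.

```python
def automatic_pattern(trace, level):
    """Build a pattern from the arcs traversed at one recursion level.

    `trace` is a list of (level, arc) pairs in traversal order. Every arc
    in the resulting pattern is marked optional (flag True).
    """
    return [(True, arc) for arc_level, arc in trace if arc_level == level]


# Hypothetical trace for a simple sentence: level 1 arcs lie inside a PUSH.
trace = [
    (0, "PUSH NP"),
    (1, "CAT DET"),
    (1, "CAT N"),
    (0, "CAT V"),
    (0, "PUSH NP"),
]
print(automatic_pattern(trace, 0))
```

Arcs traversed inside the pushed subnetwork are left out, which corresponds to the restriction to the current level of recursion.</Paragraph>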
  </Section>
  <Section position="8" start_page="20" end_page="20" type="metho">
    <SectionTitle>
AUTOMATIC PRODUCTION OF ARCS
</SectionTitle>
    <Paragraph position="0"> Pattern arcs enter the grammar in two ways. They are manually written into the grammar in those cases where the ungrammaticalities are common and they are added to the grammar automatically in those cases where the ungrammaticality is dependent on context. Pattern arcs produced dynamically enter the grammar through one of two devices. They may be constructed as needed by special macro arcs or they may be constructed for future use through an expectation mechanism.</Paragraph>
    <Paragraph position="1"> As the expectation-based parsing efforts clearly show, syntactic elements, especially words, contain important clues on processing. Indeed, we also have found it useful to make the ATN mechanism more &quot;active&quot; by allowing it to produce new arcs based on such clues.</Paragraph>
    <Paragraph position="2"> To achieve this, the CAT, MEM, TST, and WRD arcs have been generalized and four new &quot;macro&quot; arcs, known as CAT*, MEM*, TST*, and WRD*, have been added to the ATN formalism. These are similar in every way to their counterparts, except that as a final action, instead of indicating the state to which the traversal leads, a new arc is constructed dynamically and immediately executed.</Paragraph>
    <Paragraph position="3"> The difference in the form that the new arc takes is seen in the following pair, where &lt;creat act&gt; is used to define the dynamic arc: (CAT &lt;cat&gt; &lt;test&gt; &lt;act&gt;* &lt;term&gt;) (CAT* &lt;cat&gt; &lt;test&gt; &lt;act&gt;* &lt;creat act&gt;) Arcs computed by macro arcs can be of any type permitted by the ATN, but one of the most useful arcs to compute in this manner is the PAT arc discussed above.</Paragraph>
  </Section>
  <Section position="9" start_page="20" end_page="22" type="metho">
    <SectionTitle>
EXPECTATIONS
</SectionTitle>
    <Paragraph position="0"> The macro arc forces immediate execution of an arc.</Paragraph>
    <Paragraph position="1"> Arcs may also be computed and temporarily added to the grammar for later execution through an &quot;expectation&quot; mechanism. Expectations are performed as actions within arcs (analogous to the HOLD action for parsing structures) or as actions elsewhere in the NLU system (e.g., during generation when particular types of responses can be foreseen). Two forms are allowed: (EXPECT &lt;creat act&gt; &lt;state&gt;) (EXPECT &lt;creat act&gt;) In the first case, the arc created is bound to a state as specified. When later processing leads to that state, the expected arc will be attempted as one alternative at that state. In the second case, where no state is specified, the effect is to attempt the arc at every state visited during the parse.</Paragraph>
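    <Paragraph> The two forms of EXPECT can be pictured as a small registry of pending arcs. The class `Expectations`, its method names, and the string arc labels below are hypothetical; the sketch only shows the difference between binding an arc to one state and making it available at every state.

```python
from collections import defaultdict


class Expectations:
    """Arcs computed during processing and held for later execution."""

    def __init__(self):
        self.by_state = defaultdict(list)  # form with a state argument
        self.everywhere = []               # form without a state argument

    def expect(self, arc, state=None):
        """Record an expected arc, optionally bound to a single state."""
        if state is None:
            self.everywhere.append(arc)
        else:
            self.by_state[state].append(arc)

    def alternatives(self, state):
        """Expected arcs to attempt as alternatives on reaching `state`."""
        return self.by_state.get(state, []) + self.everywhere


expectations = Expectations()
expectations.expect("yes-no-answer-arc", state="S")
expectations.expect("interjection-arc")
print(expectations.alternatives("S"))
print(expectations.alternatives("NP"))
```

An arc registered without a state is offered at every state visited, while a state-bound arc is offered only when the parse reaches that state.</Paragraph>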
    <Paragraph position="2"> The range of an expectation produced during parsing is ordinarily limited to a single sentence, with the arc disappearing after it has been used; however, the start state, S*, is reserved for expectations intended to be active at the beginning of the next sentence. These will disappear in turn at the end of processing for that sentence. III.4 Patterns, Ellipsis, and Extraneous Forms The Pattern arc is proposed as the primary mechanism for handling ellipsis and extraneous forms. A Pattern arc can be seen as capturing a single path through a network. The matcher gives some freedom in how that path relates to a string. We propose that the appropriate parsing path through a network relates to an elliptical sentence or one with extra words in the same way. With contextual ellipsis, the relationship will be in having some of the arcs on the correct path not satisfied. In Pattern arcs, these will be represented by arcs marked as optional. With contextual ellipsis, dialogue context will provide the defaults for the missing components. With Pattern arcs, the deviance notes will show what was left out and the other components in the NLU system will be responsible for supplying the values.</Paragraph>
    <Paragraph position="3"> The source of patterns for contextual ellipsis is important. In Lifer [HEN77], the previous user input can be seen as a pattern for elliptical processing of the current input. The automatic pattern generator developed here, along with the expectation mechanism, will capture this level of processing. But with the ability to construct arbitrary patterns and to add them to the grammar from other components of the NLU system, our approach can accomplish much more. For example, a question generation routine could add an expectation of a yes/no answer in front of a transformed rephrasing of a question, as in Did Amy kiss anyone? Yes, Jimmy was kissed.</Paragraph>
    <Paragraph position="4"> Patterns for telegraphic ellipsis will have to be added to the grammar manually. Generally, patterns of usage must be identified, say in a study like that of Malhotra, so that appropriate patterns can be constructed. Patterns for extraneous forms will also be added in advance. These will either use the UNANCHOR option in order to skip false starts, or dynamically produced patterns to catch repetitions for emphasis. In general, only a limited number of these patterns should be required. The value of the pattern mechanism here, especially in the case of telegraphic ellipsis, will be in connecting the ungrammatical to grammatical forms.</Paragraph>
    <Paragraph position="5"> III.5 Conjunction and Macro Arcs Pattern arcs are also proposed as the primary mechanism for handling conjunction. The rationale for this is the often noted connection between conjunction and ellipsis; see, for example, Halliday and Hasan [HAL75]. This is clear with gapping, as in the following, where the parentheses show the missing component: John loves Mary and Mary (loves) John.</Paragraph>
    <Paragraph position="6"> But it also can be seen with other forms, as in John loves Mary and (John) hates Sue.</Paragraph>
    <Paragraph position="7"> John loves Mary, (John loves) Sue, (John loves) Nancy, and (John loves) Bill.</Paragraph>
    <Paragraph position="8"> Whenever a conjunction is seen, a pattern is developed from the already identified elements and matched against the remaining segments of input. The heuristics for deciding from which level to produce the pattern force the most general interpretation in order to encourage an elliptical reading.</Paragraph>
    <Paragraph position="9"> All of the forms of conjunction described above are treated through a globally defined set of &quot;conjunction arcs&quot; (some restricted cases, such as &quot;and&quot; following &quot;between&quot;, have the conjunction built into the grammar). In general, this set will be made up of macro arcs which compute Pattern arcs. The automatic pattern mechanism is heavily used. With simple conjunctions, the rightmost elements in the patterns are matched.</Paragraph>
    <Paragraph position="10"> Internal elements in patterns are skipped with gapping. The list form of conjunction can also be handled through the careful construction of dynamic patterns which are then expected at a later point. Correlatives are treated similarly, with expectations based on the dynamic building of patterns.</Paragraph>
    <Paragraph position="11"> There are a number of details in our proposal which will not be presented. There are also visible limits.</Paragraph>
    <Paragraph position="12"> It is instructive to compare the proposal to the SYSCONJ facility of Woods [WOO73]. It treats conjunction as showing alternative ways of continuing a sentence. This allows for sentences such as He drove his car through and broke a plate glass window.</Paragraph>
    <Paragraph position="13"> which at best we will accept with a misleading deviance note. However, it cannot handle the obvious elliptical cases, such as gapping, or the tightly constrained cases, such as correlatives. We expect to continue investigating the pattern approach.</Paragraph>
    <Paragraph position="14"> III.6 Interaction of Techniques As grammatical processing proceeds, ungrammatical possibilities are continually being suggested from the various mechanisms we have implemented. To coordinate all of these activities, the backtracking mechanism has been improved to keep track of these alternatives. All paths in the original grammar are attempted first. Only when these all fail are the conjunction alternatives and the manually added and dynamically produced ungrammatical alternatives tried. All of the alternatives of these sorts connected with a single state can be thought of as a single possibility. A selection mechanism is used to determine which backtrack point among the many potential alternatives is worth exploring next. Currently, we use a method also used by Weischedel and Black [WEI79] of selecting the alternative with the longest path length.</Paragraph>
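    <Paragraph> The ordering of alternatives can be sketched as a selection function over pending backtrack points. The record layout, the function name `next_backtrack_point`, and the newest-first ordering among grammatical alternatives are assumptions made for this example; only the rule that grammatical paths come first and that the longest path is preferred among the rest is taken from the description above.

```python
def next_backtrack_point(points):
    """Pick the next alternative to explore, or None if none remain.

    Each point is a dict with a boolean 'grammatical' flag and an integer
    'path_length' (arcs traversed before the point was created).
    """
    grammatical = [p for p in points if p["grammatical"]]
    if grammatical:
        # Assumed ordering: ordinary depth-first backtracking, newest first.
        return grammatical[-1]
    if not points:
        return None
    # Only ungrammatical alternatives remain: prefer the longest path.
    return max(points, key=lambda p: p["path_length"])


pending = [
    {"name": "relaxed-test", "grammatical": False, "path_length": 3},
    {"name": "pattern-arc", "grammatical": False, "path_length": 5},
]
print(next_backtrack_point(pending)["name"])
```

Because ungrammatical alternatives are consulted only when no grammatical one is pending, grammatical inputs follow exactly the paths they would in the unmodified grammar.</Paragraph>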
    <Paragraph position="15"> IV. Conclusion and Open Questions These results are significant, we believe, because they extend the state of the art in several ways. Most obvious are the following: The use of the category hierarchy to handle arc type failures; The use of the pattern mechanism to allow for contextual ellipsis and gapping; More generally, the use of patterns to allow for many sorts of ellipsis and conjunctions; and Finally, the orchestration of all of the techniques in one coherent system, where because all grammatical alternatives are tried first and no modifications are made to the original grammar, its inherent efficiency and structure are preserved.</Paragraph>
  </Section>
  <Section position="10" start_page="22" end_page="22" type="metho">
    <SectionTitle>
IV.1 Open Problems
</SectionTitle>
    <Paragraph position="0"> Various questions for further research have arisen during the course of this work. The most important of these are discussed here.</Paragraph>
    <Paragraph position="1"> Better control must be exercised over the selection of viable alternatives when ungrammatical possibilities are being attempted. The longest-path heuristic is somewhat weak. The process that decides this would need to take into consideration, among other things, whether to allow relaxation of a criterion applied to the subject or to the verb in a case where the subject and verb do not agree. The current path length heuristic would always relax the verb, which is clearly not always correct.</Paragraph>
    <Paragraph position="2"> No consideration has been given to the possible connection of one error with another. In some cases, one error can lead to or affect another.</Paragraph>
    <Paragraph position="3"> Several other types of ill-formedness have not been considered in this study, for example, idioms, metaphors, incorrect word order, run together sentences, incorrect punctuation, misspelling, and presuppositional failure. Either little is known about these processes or they have been studied elsewhere independently. In either case, work remains to be done.</Paragraph>
    <Paragraph position="4"> V. Acknowledgments We wish to acknowledge the comments of Ralph Weischedel and Marc Fogel on previous drafts of this paper. Although we would like to blame them, any shortcomings are clearly our own fault.</Paragraph>
  </Section>
</Paper>