<?xml version="1.0" standalone="yes"?>
<Paper uid="J82-1002">
  <Title>Generalized Augmented Transition Network Grammars For Generation From Semantic Networks 1</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. The Generalization
</SectionTitle>
    <Paragraph position="0"> The following sub-sections show the generalized syntax of the ATN formalism, and assume a knowledge of the standard formalism (Bates 1978 is an excellent introduction). Syntactic structures already familiar to ATN users but not discussed here remain unchanged. Parentheses and terms in upper case letters are terminal symbols. Lower case terms in angle brackets are non-terminals. Terms enclosed in square brackets are optional. Terms followed by &quot;...&quot; may occur zero or more times in succession. (American Journal of Computational Linguistics, Volume 8, Number 1, January-March 1982; Stuart C. Shapiro, Generalized Augmented Transition Network Grammars.)</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Terminal Actions
</SectionTitle>
      <Paragraph position="0"> Successful traversal of an ATN arc might or might not consume an input symbol. When parsing, such consumption normally occurs; when generating it normally does not, but if it does, the next symbol (semantic node) must be specified. To allow for these choices, we have returned to the technique of Woods 1970 of having two terminal actions, TO and JUMP, and have added an optional second argument to TO.</Paragraph>
      <Paragraph position="1"> The syntax is: (TO &lt;state&gt; [&lt;form&gt;]) or (JUMP &lt;state&gt;).</Paragraph>
      <Paragraph position="3"> Both cause the parser to enter the given state. JUMP never consumes the input symbol; TO always does. If the &lt;form&gt; is absent in the TO action, the next symbol to be scanned will be the next one in the input buffer. If &lt;form&gt; is present, its value will be the next symbol to be scanned. All traditional ATN arcs except JUMP and POP end with a terminal action.</Paragraph>
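      <Paragraph> The buffer effect of the two terminal actions can be sketched in Python. This is a minimal illustration, not the interpreter described in the paper: the input buffer is assumed to be a list whose first element is the current symbol, the function names jump and to are hypothetical, and passing None stands in for an absent &lt;form&gt;.

```python
from typing import Optional


def jump(buffer: list) -> list:
    """JUMP: enter the next state without consuming the current symbol."""
    return list(buffer)


def to(buffer: list, form_value: Optional[object] = None) -> list:
    """TO: consume the current symbol; an optional form names the next one."""
    rest = buffer[1:]              # the current symbol is always consumed
    if form_value is None:         # assumption: None models an absent form
        return rest
    return [form_value] + rest     # the value of the form is scanned next


words = ["YOUNG", "LUCY", "SAW"]
after_jump = jump(words)           # buffer unchanged
after_to = to(words)               # next buffered word becomes current
after_to_form = to(["M1"], "B2")   # hypothetical node names M1 and B2
```

Under these assumptions, TO without a form advances to the next buffered word, while TO with a form replaces the consumed symbol by the value of the form.</Paragraph>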
      <Paragraph position="4"> The explanation given in Burton 1976 for the replacement of the JUMP terminal action by the JUMP arc was that, &quot;since POP, PUSH and VIR arcs never advance the input, to decide whether or not an arc advanced the input required knowledge of both the arc type and termination action. The introduction of the JUMP arc ... means that the input advancement is a function of the arc type alone.&quot; That our reintroduction of the JUMP terminal action does not bring back the confusion is explained in Section 4.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Arcs
</SectionTitle>
      <Paragraph position="0"> We retain a JUMP arc as well as a JUMP terminal action. The JUMP arc provides a place to make an arbitrary test and perform some actions without consuming an input symbol. For symmetry, we introduce a TO arc: (TO (&lt;state&gt; [&lt;form&gt;]) &lt;test&gt; &lt;action&gt;... ) If &lt;test&gt; is successful, the &lt;action&gt;s are performed and transfer is made to &lt;state&gt;. The input symbol is consumed. The next symbol to be scanned is the value of &lt;form&gt; if it is present, or the next symbol in the input buffer if &lt;form&gt; is missing.</Paragraph>
      <Paragraph position="1"> Neither the JUMP arc nor the TO arc is really required if the TST arc is retained (Bates 1978, however, does not mention it), since they are equivalent to the TST arc with the JUMP or TO terminal action, respectively. However, they require less typing and provide clearer documentation. They are used in the example in Section 6.</Paragraph>
      <Paragraph position="2"> The PUSH arc makes two assumptions: 1) the first symbol to be scanned in the subnetwork is the current contents of the * register; 2) the current input symbol will be consumed by the subnetwork, so the contents of * can be replaced by the value returned by the subnetwork. We need an arc that causes a recursive call to a subnetwork, but makes neither of these two assumptions, so we introduce the CALL arc: (CALL &lt;state&gt; &lt;form&gt; &lt;test&gt; &lt;preaction or action&gt;... &lt;register&gt; &lt;action&gt;... &lt;terminal action&gt; )</Paragraph>
      <Paragraph position="4"> where &lt;preaction or action&gt; is &lt;preaction&gt; or &lt;action&gt;. If the &lt;test&gt; is successful, all the &lt;action&gt;s of &lt;preaction or action&gt; are performed and a recursive push is made to the state &lt;state&gt;, where the next symbol to be scanned is the value of &lt;form&gt; and registers are initialized by the &lt;preaction&gt;s. If the subnetwork succeeds, its value is placed into &lt;register&gt; and the &lt;action&gt;s and &lt;terminal action&gt; are performed.</Paragraph>
      <Paragraph position="5"> Just as the normal TO terminal action is the generalized TO terminal action without the optional form, the PUSH arc (which we retain) is equivalent to the following CALL arc: (CALL &lt;state&gt; * &lt;test&gt; &lt;preaction&gt;... * &lt;action&gt;... &lt;terminal action&gt; )</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Forms
</SectionTitle>
      <Paragraph position="0"> The generalized TO terminal action, the generalized TO arc, and the CALL arc all include a form whose value is to be the next symbol to be scanned. If this next symbol is a semantic network node, the primary way of identifying it is as the node at the end of a directed arc with a given label from a given node. This identification mechanism requires a new form: (GETA &lt;arc&gt; [&lt;node form&gt;]) where &lt;node form&gt; is a form that evaluates to a semantic node. If absent, &lt;node form&gt; defaults to *. The value of GETA is the node at the end of the arc labelled &lt;arc&gt; from the specified node, or a list of such nodes if there is more than one.</Paragraph>
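      <Paragraph> The behaviour of GETA can be illustrated with a short Python sketch. Everything concrete here is an assumption made for illustration: the network is modelled as a dictionary from node to arc label to a list of target nodes, the node and arc names are hypothetical, and returning None when no such arc exists is a choice the text does not specify.

```python
def geta(network, registers, arc, node=None):
    """Follow the arc labelled `arc` from `node` (default: contents of *)."""
    if node is None:
        node = registers["*"]          # an absent node form defaults to *
    targets = network.get(node, {}).get(arc, [])
    if len(targets) == 1:
        return targets[0]              # a single node is returned as itself
    # Several nodes come back as a list; None for "no arc" is an assumption.
    return list(targets) if targets else None


# Hypothetical fragment of a semantic network.
network = {
    "M1": {"AGENT": ["B2"]},
    "B2": {"ADJ": ["SWEET", "YOUNG"]},
}
registers = {"*": "M1"}

agent = geta(network, registers, "AGENT")
adjectives = geta(network, registers, "ADJ", "B2")
missing = geta(network, registers, "OBJECT")
```

A list result such as the one bound to adjectives is the case in which a CALL arc can hand several nodes to a subnetwork at once.</Paragraph>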
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Tests, Preactions, and Actions
</SectionTitle>
      <Paragraph position="0"> The generalization of the ATN formalism to one that allows for writing grammars which generate surface strings from semantic networks, yet can be interpreted by the same interpreter which handles parsing grammars, requires no changes other than the ones described above. Specifically, no new tests, preactions, or actions are required. Of course each implementation of an ATN interpreter contains slight differences in the set of tests and actions implemented beyond the basic ones.</Paragraph>
      <Paragraph position="1"> 4. The Input Buffer. Input to the ATN parser can be thought of as being the contents of a stack, called the input buffer. If the input is a string of words, the first word will be at the top of the input buffer and successive words will be in successively deeper positions of the input buffer. If the input is a graph, the input buffer might contain only a single node of the graph.</Paragraph>
      <Paragraph position="2"> Adequate treatment of the * register is crucial for the correct operation of a grammar interpreter that does both parsing and generation. This is dealt with in the present section.</Paragraph>
      <Paragraph position="3"> On entering an arc, the * register is set to the top element of the input buffer, which must not be empty. The only exceptions to this are the CAT, VIR, and POP arcs. On a CAT arc, * is the root form of the top element of the input buffer. (Since the CAT arc is treated as a &quot;bundle&quot; of arcs, one for each sense of the word being scanned, and is the only arc so treated, it is the only arc on which (GETF &lt;feature&gt; *) is guaranteed to be well-defined.) VIR sets * to an element of the HOLD register. POP leaves * undefined since * is always the element to be accounted for by the current arc, and a POP arc is not trying to account for any element. The input buffer is not changed between the time a PUSH arc is entered and the time an arc emanating from the state pushed to is entered, so the contents of * on the latter arc will be the same as on the former. A CALL arc is allowed to specify the contents of * on the arcs of the called state. This is accomplished by replacing the top element of the input buffer by that value before transfer to the called state. If the value is a list of elements, we push each element individually onto the input buffer. This makes it particularly easy to loop through a set of nodes, each of which will contribute the same syntactic form to the growing sentence (such as a string of adjectives).</Paragraph>
      <Paragraph position="4"> While on an arc (except for POP), i.e. during evaluation of the test and the acts, the contents of * and the top element of the input buffer remain the same.</Paragraph>
      <Paragraph position="5"> This requires special processing for VIR, PUSH, and CALL arcs. Since a VIR arc gets the value of * from HOLD, rather than from the input buffer, after setting * the VIR arc pushes the contents of * onto the input buffer. The net effect is to replace the held constituent in a new position in the string. When a PUSH arc resumes, and the lower level has successfully returned a value, the value is placed into * and also pushed onto the input buffer. The net effect of this is to replace a sub-string by its analysis. When a CALL resumes, and the lower level has successfully returned a value, the value is placed into the specified register, and the contents of * is pushed onto the input buffer.</Paragraph>
      <Paragraph position="6"> (Recall that it was replaced before the transfer. See the previous paragraph.) The specified register might or might not be *. In either case the contents of * and the top of the input buffer are the same.</Paragraph>
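      <Paragraph> The buffer handling of a CALL arc described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the subnetwork is modelled as a hypothetical Python function that takes a buffer and returns its value together with the unconsumed remainder, registers are a dictionary, and a list value is spliced in so that its first element ends up on top (the text does not fix this order).

```python
def call_arc(buffer, registers, form_value, register, subnetwork):
    """Run `subnetwork` on the form's value, then restore the * invariant."""
    # Replace the top of the buffer by the value of the form; the elements
    # of a list value are placed on the buffer individually.
    values = form_value if isinstance(form_value, list) else [form_value]
    sub_buffer = values + buffer[1:]
    value, remaining = subnetwork(sub_buffer)
    registers = dict(registers)
    registers[register] = value        # the register may or may not be *
    # Push the contents of * so that * and the top of the buffer agree.
    return registers, [registers["*"]] + remaining


def adjectives(sub_buffer):
    """Hypothetical subnetwork: consume adjective nodes, return a string."""
    rest = list(sub_buffer)
    words = []
    while rest and rest[0] in {"SWEET", "YOUNG"}:
        words.append(rest.pop(0).lower())
    return " ".join(words), rest


regs_a, buf_a = call_arc(["M1"], {"*": "M1"}, ["SWEET", "YOUNG"],
                         "ADJS", adjectives)
regs_b, buf_b = call_arc(["M1", "X"], {"*": "M1"}, "SWEET", "*", adjectives)
```

In the first call the result goes to a separate register and the original contents of * return to the top of the buffer; in the second the register is * itself, so the returned value is what gets pushed.</Paragraph>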
      <Paragraph position="7"> There are two possible terminal acts, JUMP and TO. JUMP does not affect the input buffer, so the contents of * will be the same on the successor arcs (except for POP and VIR) as at the end of the current arc. TO pops the input buffer, but if provided with an optional form, also pushes the value of that form onto the input buffer.</Paragraph>
      <Paragraph position="8"> POPping from the top level is only legal if the input buffer is empty. POPping from any level should mean that a constituent has been accounted for. Accounting for a constituent should entail removing it from the input buffer. From this we conclude that every path within a level from an initial state to a POP arc must contain at least one TO transfer, and in most cases, it is proper to transfer TO rather than to JUMP to a state that has a POP arc emanating from it. TO will be the terminal act for most VIR and PUSH arcs.</Paragraph>
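      <Paragraph> The requirement that every path to a POP arc contain at least one TO transfer can be checked mechanically. The sketch below assumes a deliberately simplified, hypothetical grammar representation (each state maps to a list of pairs of terminal action and target state, with POP arcs having no target) and reports the states with a POP arc that are reachable from the initial state through JUMP transfers alone.

```python
def pop_reachable_without_to(grammar, initial):
    """Return states with a POP arc reachable from `initial` by JUMPs only."""
    seen = set()
    stack = [initial]
    offenders = []
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        arcs = grammar.get(state, [])
        if any(kind == "POP" for kind, _ in arcs):
            offenders.append(state)    # a POP reached with no TO on the path
        # Only JUMP transfers are followed; a TO transfer satisfies the rule.
        stack.extend(target for kind, target in arcs if kind == "JUMP")
    return sorted(offenders)


# Hypothetical level of a grammar with initial state S.
grammar = {
    "S": [("JUMP", "S1")],
    "S1": [("TO", "S2"), ("JUMP", "S3")],
    "S2": [("POP", None)],
    "S3": [("POP", None)],
}
violations = pop_reachable_without_to(grammar, "S")
```

Here S3 is flagged because it can be reached from S without consuming anything, whereas S2 is only entered by a TO transfer.</Paragraph>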
      <Paragraph position="9"> In any ATN interpreter having the operational characteristics given in this section, advancement of the input is a function of the terminal action alone, in the sense that, at any state JUMPed to, the top of the input buffer will be the last value of *, and, at any state jumped TO, it will not be.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5. The Lexicon
</SectionTitle>
    <Paragraph position="0"> Parsing and generating require a lexicon - a file of words giving their syntactic categories and lexical features, as well as the inflectional forms of irregularly inflected words. Parsing and generating require different information, yet we wish to avoid duplication as much as possible. This section discusses how a lexicon might be organized when it is to be used both for parsing and for generation. Figure 1 shows the lexicon used for the example in Section 6.</Paragraph>
    <Paragraph position="1"> During parsing, morphological analysis is performed. The analyzer is given an inflected form and must segment it, find the root in the lexicon, and modify the lexical entry of the root according to its analysis of the original form. Irregularly inflected forms, such as &quot;seen&quot; in Figure 1, must have their own entries in the lexicon. An entry in the lexicon may be lexically ambiguous, such as &quot;saw&quot; in Figure 1, so each entry must be associated with a list of one or more lexical feature lists. Each such list, whether stored in the lexicon or constructed by the morphological analyzer, must include a syntactic category and a root, as well as other features needed by the grammar.</Paragraph>
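    <Paragraph> Lexical lookup during parsing might be sketched as below. The feature names (CTGY, ROOT, TENSE, NUM), the entries, and the single plural rule are assumptions made for illustration and are not taken from Figure 1; the sketch only shows an ambiguous entry returning several feature lists, the root defaulting to the lexeme itself, and a regular inflection being segmented and looked up by its root.

```python
# Hypothetical lexicon: each word maps to a list of lexical feature lists.
LEXICON = {
    "SEE": [{"CTGY": "V"}],
    "SAW": [{"CTGY": "N", "ROOT": "SAW1"},
            {"CTGY": "V", "ROOT": "SEE", "TENSE": "PAST"}],
    "DOG": [{"CTGY": "N"}],
}


def lookup(word):
    """Return every lexical feature list for `word`, with defaults filled in."""
    if word in LEXICON:
        # Default: the root is the lexeme itself unless the entry overrides it.
        return [{"ROOT": word, **features} for features in LEXICON[word]]
    # Crude morphological analysis (assumption): only a regular plural "S".
    root = word[:-1]
    if word.endswith("S") and root in LEXICON:
        return [{"ROOT": root, **features, "NUM": "PLUR"}
                for features in LEXICON[root]
                if features.get("CTGY") == "N"]
    return []


saw_senses = lookup("SAW")
dogs = lookup("DOGS")
dog = lookup("DOG")
```

The two feature lists returned for SAW correspond to the two arcs in the bundle that a CAT arc would try for an ambiguous word.</Paragraph>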
    <Paragraph position="2"> The lexical routines we use supply certain default features if they are not supplied explicitly. These are as follows: the root is the lexeme itself; nouns have  In the semantic network, some nodes are associated with lexical entries. In Figure 3, nodes SWEET, YOUNG, LUCY, BE, SEE, and SAW1 are. During generation, these entries, along with other information from the semantic network, are used by a morphological synthesizer to construct an inflected word. We assume that all such entries are unambiguous roots, and so contain only a single lexical feature list. This feature list must contain any irregularly inflected forms. For example, the feature list for &quot;see&quot; in Figure 1 lists &quot;saw&quot; as its past tense and &quot;seen&quot; as its past participle. SAW1 represents the unambiguous sense of &quot;saw&quot; as a noun. It is used in that way in Figure 3. In Figure 1, SAW1 is given as the ROOT of the noun sense of SAW, but for purposes of morphological synthesis, the ROOT of SAW1 is given as SAW.</Paragraph>
    <Paragraph position="3"> In summary, a single lexicon may be used for both parsing and generating under the following conditions.</Paragraph>
    <Paragraph position="4"> The entry of an unambiguous root can be used for both parsing and generating if its one lexical feature list contains features required for both operations. An ambiguous lexical entry (such as SAW) will only be used during parsing. Each of its lexical feature lists must contain a unique but arbitrary &quot;root&quot; (SEE and SAW1) for connection to the semantic network and for holding the lexical information required for generation. Every lexical feature list used for generating must contain the proper natural language spelling of its root (SAW for SAW1) as well as any irregularly inflected forms. Lexical entries for irregularly inflected forms will only be used during parsing. In the lexicon of Figure 1, the entries for A, DOG, LUCY, SEE, SWEET, and YOUNG are used during both parsing and generation. Those for BE, IS, SAW, SEEN, and WAS are only used during parsing. The entry for SAW1 is only used during generation. Our morphological synthesizer recognizes &quot;be&quot; as a special case, and computes its inflected forms without referring to the lexicon.</Paragraph>
    <Paragraph position="5"> For the purposes of this paper, it should be irrelevant whether the &quot;root&quot; connected to the semantic network is an actual surface word like &quot;give&quot;, a deeper sememe such as that underlying both &quot;give&quot; and &quot;take&quot;, or a primitive such as &quot;ATRANS&quot;.</Paragraph>
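    <Paragraph> The generation side of these conditions can be illustrated with a small morphological synthesizer. The sketch is hypothetical throughout: the feature names (CTGY, ROOT, PAST, PASTP), the entries, and the regular suffix table are assumptions, and the special handling of &quot;be&quot; is omitted. It shows only that the entry attached to a semantic node supplies both the surface spelling of its root and any irregular forms.

```python
# Hypothetical generation entries: one unambiguous feature list per root.
GEN_LEXICON = {
    "SEE": {"CTGY": "V", "PAST": "SAW", "PASTP": "SEEN"},
    "SAW1": {"CTGY": "N", "ROOT": "SAW"},   # spelling of the noun sense
    "DOG": {"CTGY": "N"},
}

# Crude regular inflection (assumption): plain suffixation only.
REGULAR_SUFFIX = {"PAST": "ED", "PASTP": "ED", "PLUR": "S"}


def synthesize(entry_name, form="ROOT"):
    """Spell the requested inflected form of the entry named `entry_name`."""
    features = GEN_LEXICON[entry_name]
    spelling = features.get("ROOT", entry_name)  # default root is the lexeme
    if form == "ROOT":
        return spelling
    if form in features:
        return features[form]                    # irregular form listed
    return spelling + REGULAR_SUFFIX[form]       # otherwise inflect regularly


noun = synthesize("SAW1")
past = synthesize("SEE", "PAST")
plural = synthesize("DOG", "PLUR")
```

The entry SAW1 never appears in the input text; it exists so that the node for the noun sense can be spelled as SAW when a sentence is generated.</Paragraph>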
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6. Example
</SectionTitle>
    <Paragraph position="0"> In this section, we discuss an example of natural language interaction (in a small fragment of English) using an ATN parsing-generating grammar and SNePS, the Semantic Network Processing System [Shapiro 1979]. The purpose of the example is to demonstrate the use of the generalized ATN formalism for writing a parsing-generating grammar for which the &quot;parse&quot; of an input sentence is a generated sentence response, using a knowledge representation and reasoning system as the sentence is processed. Both the fragment of English and the semantic network representation technique have been kept simple to avoid obscuring the use of the generalized ATN formalism.</Paragraph>
    <Paragraph position="1"> Figure 2 shows an example interaction using SNePSUL, the SNePS User Language. The numbers in the left margin are for reference in this section. The string &quot;**&quot; is the SNePSUL prompt. The rest of each line so marked is the user's input. The following line is the result returned by SNePSUL. The last line of each interaction is the CPU time in milliseconds taken by the interaction. (The system is running as compiled LISP on a CDC CYBER 170/730. The ATN grammar is interpreted.) Figure 3 shows the semantic network built as a result of the sentences in Figure 2.</Paragraph>
    <Paragraph position="2"> The first interaction creates a new semantic network node, shown as B1 in Figure 3, to represent the instant of time &quot;now&quot;. The symbol &quot;#&quot; represents a SNePSUL function to create this node and make it the value of the variable NOW.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 MSECS
</SectionTitle>
    <Paragraph position="0"> From then on, the expression *NOW evaluates to B1. We will see *NOW used on some arcs of the grammar.</Paragraph>
    <Paragraph position="1"> The rest of the user inputs are calls to the SNePSUL function &quot;:&quot;. This function passes its argument list to the parser as the input buffer. The parser starts in state S. The form popped by the top level ATN grammar is returned as the value of the call to :, and is then printed as mentioned above. Thus, the line following the call to : may be viewed as the &quot;parse&quot; of the sentence passed to :.</Paragraph>
    <Paragraph position="2"> We will trace the first example sentence through the ATN grammar, referring to the other example sentences at various points. The parse starts in state S (Figure 4) with the input buffer being (YOUNG</Paragraph>
  </Section>
</Paper>