<?xml version="1.0" standalone="yes"?>
<Paper uid="J89-1002">
  <Title>SYNTACTIC GRAPHS: A REPRESENTATION FOR THE UNION OF ALL AMBIGUOUS PARSE TREES</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SYNTACTIC GRAPHS: A REPRESENTATION FOR
THE UNION OF ALL
AMBIGUOUS PARSE TREES
</SectionTitle>
    <Paragraph position="0"> In this paper, we present a new method of representing the surface syntactic structure of a sentence.</Paragraph>
    <Paragraph position="1"> Trees have usually been used in linguistics and natural language processing to represent the syntactic structure of a sentence. A tree structure shows only one possible syntactic parse of a sentence, so in order to choose a correct parse, we need to examine all possible tree structures one by one. Syntactic graph representation makes it possible to represent all possible surface syntactic relations in one directed graph (DG). Since a syntactic graph is expressed in terms of a set of triples, higher level semantic processes can access any part of the graph directly without navigating the whole structure. Furthermore, since a syntactic graph represents the union of all possible syntactic readings of a sentence, it is fairly easy to focus on the syntactically ambiguous points. In this paper, we introduce the basic idea of syntactic graph representation and discuss its various properties. We claim that a syntactic graph carries the complete syntactic information provided by a parse forest, that is, the set of all possible parse trees.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> In natural language processing, we use several rules and various items of knowledge to understand a sentence.</Paragraph>
    <Paragraph position="1"> Syntactic processing, which analyzes the syntactic relations among constituents, is widely used to determine the surface structure of a sentence, because it is effective to show the functional relations between constituents and is based on well-developed linguistic theory.</Paragraph>
    <Paragraph position="2"> Tree structures, called parse trees, represent syntactic structures of sentences.</Paragraph>
    <Paragraph position="3"> In a natural language understanding system in which syntactic and semantic processes are separated, the semantic processor usually takes the surface syntactic structure of a sentence from the syntactic analyzer as input and processes it for further understanding. Since there are many ambiguities in natural language parsing, syntactic processing usually generates more than one parse tree. Therefore, the higher level semantic processor should examine the parse trees one by one to choose a correct one. Since possible parse trees of sentences in ordinary expository text often number in the hundreds, it is impractical to check parse trees one by one without knowing where the ambiguous points are. We have tried to reduce this problem by introducing a new structure, the syntactic graph, that can represent all possible parse trees effectively in a compact form for further processing. As we will show in the rest of this paper, since all syntactically ambiguous points are kept in a syntactic graph, we can easily focus on those points for further disambiguation.</Paragraph>
    <Paragraph position="4"> Furthermore, syntactic graph representation can be naturally implemented in efficient, parallel, all-path parsers. One-path parsing algorithms, like the DCG (Pereira and Warren 1980), which enumerates all possible parse trees one by one with backtracking, usually have exponential complexity. All-path parsing algorithms explore all possible paths in parallel without backtracking (Earley 1970; Kay 1980; Chester 1980; Tomita 1985), and so generate all possible parse trees efficiently. This kind of algorithm has complexity O(N^3) (Aho and Ullman 1972; Tomita 1985).</Paragraph>
    <Paragraph position="5"> We use an all-path parsing algorithm to parse a sentence. Triples, each of which consists of two nodes and an arc name, are generated while parsing a sentence. The parser collects all correct triples and constructs an exclusion matrix, which shows co-occurrence constraints among arcs, by navigating all possible parse trees in the parse forest. Copyright 1989 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission.</Paragraph>
    <Paragraph position="6"> In the next section, we motivate this work with an example. Then, in Section 3, we briefly introduce X-bar theory with head projection, which provides the basis of the graph representation, and the notation of the graph representation. The properties of a syntactic graph are detailed in Section 4. In Section 5, we introduce the idea of an exclusion matrix to limit possible tree interpretations of a graph representation.</Paragraph>
    <Paragraph position="7"> In Section 6, we will present the definition of completeness and soundness of the syntactic graph representation compared to parse trees by showing an algorithm that enumerates all syntactic readings using the exclusion matrix from a syntactic graph. We claim that those readings include all the possible syntactic readings of the corresponding parse forest. Finally, after discussing related work, we will suggest future research and draw some conclusions.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 MOTIVATIONAL EXAMPLE
</SectionTitle>
    <Paragraph position="0"> We are currently investigating a model of natural language text understanding in which syntactic and semantic processors are separated. Ordinarily, in this model, a syntactic processor constructs a surface syntactic structure of an input sentence, and then a higher level semantic processor processes it to understand the sentence; that is, syntactic and semantic processors are pipelined. If the semantic processor fails to understand the sentence with a given parse tree, the semantic processor should ask the syntactic processor for another possible parse tree. This cycle of processing will continue until the semantic processor finds the correct parse tree with which it succeeds in understanding the sentence.</Paragraph>
    <Paragraph position="1"> Let us consider the following sentences, from Waltz (1982): I saw a man on the hill with a telescope.</Paragraph>
    <Paragraph position="2"> I cleaned the lens to get a better view.</Paragraph>
    <Paragraph position="3"> When we read the first sentence, we cannot determine whether the man has a telescope or the telescope is used to see the man. This is known as the PP-attachment problem, and many researchers have proposed various ways to solve it (Frazier and Fodor 1979; Schubert 1984, 1986; Wilks et al. 1985). In this sentence, however, it is impossible to choose a correct syntactic reading in syntactic processing, even with commonsense knowledge. The ambiguities must remain until the system extracts more contextual knowledge from other input sentences.</Paragraph>
    <Paragraph position="4"> The problems of tree structure representation in the pipelined, natural language processing model are the following: First, since the number of parse trees of a typical sentence in real text easily grows to several hundreds, and it is impossible to resolve syntactic ambiguities by the syntactic processor itself, a semantic processor must check all possible parse trees one by one until it is satisfied by some parse tree. Second, since there is no information about where the ambiguous points are in a parse tree, the semantic processor should check all possibilities before accepting the parse tree.</Paragraph>
    <Paragraph position="5"> Third, although the semantic processor might be satisfied with a parse tree, the system should keep the status of the syntactic processor for a while, because there is a fair chance that the parse tree may become unsatisfactory after the system processes several more sentences. For example, attaching the prepositional phrase (PP) &quot;with a telescope&quot; to &quot;hill&quot; or &quot;man&quot; would be fine for the semantic processor, since there is nothing semantically wrong with these attachments. However, these attachments become unsatisfactory after the system understands the next sentence. Then, the semantic processor would have to backtrack and request from the syntactic processor another possible parse tree for the earlier sentence. (Computational Linguistics, Volume 15, Number 1, March 1989. Jungyun Seo and Robert F. Simmons, Syntactic Graphs: A Representation for the Union of All Ambiguous Parse Trees.)</Paragraph>
    <Paragraph position="6"> We propose the syntactic graph as the output structure of a syntactic processor. The syntactic graph of the first sentence in the previous example is shown in Figure 1.</Paragraph>
    <Paragraph position="7"> In this graph, nodes consist of the positions, the root forms, and the categories of words in the sentence.</Paragraph>
    <Paragraph position="8"> Each node represents a constituent whose head word is the word in the node. Each arc shows a dominator-modifier relationship between two nodes. The name of each arc is uniquely determined according to the grammar rule used to generate the arc. For example, the snp arc is generated from the grammar rule SNT → NP VP, vpp from the rule VP → VP PP, and ppn from the rule PP → Prep NP.</Paragraph>
    <Paragraph position="9"> As we can see in Figure 1, all syntactic readings are represented in a directed graph in which every ambiguity is kept: lexical ambiguities from words with multiple syntactic categories, and structural ambiguities from the ambiguous grammar. The nodes which are pointed to by more than one arc show the ambiguous points in the sentence, so the semantic processor can focus on those points to resolve the ambiguities. Furthermore, since a syntactic graph is represented by a set of triples, a semantic processor can directly access any part of a graph without traversing the whole. Finally, syntactic graph representation is compact enough to be kept in memory for a while.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="24" type="metho">
    <SectionTitle>
3 X-BAR THEORY AND SYNTACTIC GRAPHS
</SectionTitle>
    <Paragraph position="0"> X-bar theory was proposed by Chomsky (1970) to explain various linguistic structural properties, and has been widely accepted in linguistic theories. In this notation, the head of any phrase is termed X, the phrasal category containing X is termed X' (X-bar), and the phrasal category containing X' is termed X'' (X-double-bar). For example, the head of a noun phrase is N (noun), N' is an intermediate category, and N'' corresponds to noun phrase (NP). The general form of the phrase structure rules for X-bar theory is roughly as follows:</Paragraph>
    <Paragraph position="2"> Y is the phrase that specifies X, and Z is the phrase that modifies X. The properties of the head word of a phrase are projected onto the properties of the phrase.</Paragraph>
    <Paragraph position="3"> We can express a grammar with X-bar conventions to cover a wide range of English.</Paragraph>
    <Paragraph position="4"> Since, in X-bar theory, a syntactic phrase consists of the head of the phrase and the specifiers and modifiers of the head, if there are more than two constituents in the right-hand side of a grammar rule, then there are dominator-modifier (DM) relationships between the head word and the specifier or modifier words in the phrase. Tsukada (1987) discovered that the DM relationship is effective for keeping all the syntactic ambiguities in a compact and handy structure without enumerating all possible syntactic parse trees. His representation, however, is too simple to maintain some important information about syntactic structure that will be discussed in detail in this paper, and hence fails to take full advantage of the DM-relationship representation. We use a slightly different representation to maintain more information in head-modifier relations. Each head-modifier relation is kept in a triple that is equivalent to an arc between two nodes (i.e., words) in a syntactic graph. The first element of a triple is the arc name, which represents the relation between the head and modifier nodes. The second element is the lexical information of the head node, and the third element is that of the modifier node. The direction of an arc is always from a head to a modifier node. For example, the triple [snp, [1,see,v], [0,I,n]] represents the arc snp between the two nodes [1,see,v] and [0,I,n] in Figure 1.</Paragraph>
    <Paragraph position="5"> Since many words have more than one lexical entry, we have to keep the lexical information of each word in a triple so that we can distinguish different usages of a word in higher level processing. The triples corresponding to some common grammar rules are as follows:
      1. N → Det N ⇔ [det, [[n1,R1]|L1], [[n2,R2]|L2]]
      2. N → Adj N ⇔ [mod, [[n3,R3]|L3], [[n4,R4]|L4]]
      3. N → N Prep ⇔ [npp, [[n5,R5]|L5], [[n6,R6]|L6]]
    Each ni represents the position, each Ri represents the root form, and each Li represents a list of the lexical information including the syntactic category of each word in a sentence. Parentheses signify optionality and the asterisk (*) allows repetition.</Paragraph>
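    <Paragraph> The triple notation above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the grammar rules in this paper are written in Prolog), the names Node, Triple, and index_triples are hypothetical, and the category n for the word at position 0 is an assumption. The sketch shows how a set of triples can be indexed so that a later process can reach any arc directly, without traversing a tree.

```python
from collections import defaultdict, namedtuple

# Hypothetical names for illustration; they do not come from the paper.
Node = namedtuple("Node", ["position", "root", "category"])
Triple = namedtuple("Triple", ["arc", "head", "modifier"])

see = Node(1, "see", "v")
pronoun = Node(0, "I", "n")      # the category "n" is an assumption
man = Node(3, "man", "n")

# An arc always runs from the head node to the modifier node.
triples = [Triple("snp", see, pronoun), Triple("vnp", see, man)]


def index_triples(triples):
    """Index triples by head and by modifier for direct lookup."""
    by_head = defaultdict(list)
    by_modifier = defaultdict(list)
    for t in triples:
        by_head[t.head].append(t)
        by_modifier[t.modifier].append(t)
    return by_head, by_modifier


by_head, by_modifier = index_triples(triples)
print([t.arc for t in by_head[see]])      # arcs leaving [1,see,v]
print([t.arc for t in by_modifier[man]])  # arcs entering [3,man,n]
```

    Keeping the position and the category inside each node is what lets two uses of the same word stay distinct in the index.</Paragraph>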
    <Paragraph position="6"> Figure 2 shows the set of triples representing the syntactic graph in Figure 1 and the grammar rules used to parse the sentence. The sentence in Figure 2 has five possible parse trees in accordance with the grammar rules. All of the dependency information in those five parses is represented in the 12 triples. Those 12 triples represent all possible syntactic readings of the sentence with the grammar rules. Not all triples can co-occur in one syntactic reading in the case of an ambiguous sentence.</Paragraph>
    <Paragraph position="7"> The pointers of each triple are the list of indices that point to that triple. For example, Triple 2 in Figure 2 has a list of three indices as its pointers. Each of those indices can be used as a pointer to access the triple; in effect, the indices serve as names of the triple. One triple may have more than one index. The issues of why and how to produce indices of triples will be discussed later in this section.</Paragraph>
    <Paragraph position="8"> Triple 3 in Figure 2 represents the vnp arc in Figure 1 between two nodes, [1,see,v] and [3,man,n]. The node [1,see,v] represents a VP with head word [1,see,v], and the node [3,man,n] represents an NP with head word [3,man,n]. [1,see,v] becomes the head word, and [3,man,n] becomes the modifier word, of this triple. The number 1 in [1,see,v] is the position of the word &quot;see&quot; in the sentence, and v (verb) is the syntactic category of the word. Since a word may appear in several positions in a sentence, and one word may have multiple categories, the position and the category of a word must be recorded to distinguish the same word in different positions or with different categories.</Paragraph>
    <Paragraph position="9"> Grammar rules of Figure 2 (rule; arc name; head; modifier):
       1. SNT → NP VP;  snp;  head of VP;  head of NP
       2. NP → art NP;  det;  head of NP;  art
       3. NP → N';      -;    head of N';  -
       4. N' → N' PP;   npp;  head of N';  head of PP
       5. N' → noun;    -;    noun;        -
       6. PP → prep NP; ppn;  prep;        head of NP
       7. VP → V';      -;    head of V';  -
       8. V' → V' NP;   vnp;  head of V';  head of NP
       9. V' → V' PP;   vpp;  head of V';  head of PP
      10. V' → verb;    -;    verb;        -</Paragraph>
    <Paragraph position="10"> A meaningful relation name is assigned to each pair of head and modifier constituents in a grammar rule.</Paragraph>
    <Paragraph position="11"> Some of these are shown at the top of Figure 2. Rules for generating triples augment each corresponding grammar rule. Some grammar rules in Prolog syntax used to build syntactic graphs are shown in Figure 3.</Paragraph>
    <Paragraph position="12"> An informal description of the algorithm for generating triples of a syntactic graph using the grammar rules in Figure 3 is the following: The basic algorithm of the parser is an all-path, bottom-up, chart parser that constructs a shared, packed-parse forest. Unlike an ordinary chart parser, the parser uses two charts, one for constituents and the other for triples. Whenever the parser builds a constituent and its triple, the parser generates an index for the triple and records the triple on the chart of triples using the index. Then it records the constituent with the index of the triple on the chart of constituents.</Paragraph>
    <Paragraph position="13">
% 1. snt --&gt; np + vp
gr([snt, Vhd],                 % category and head of LHS of rule.
   [[np, Nhd], [vp, Vhd]],     % categories and heads of RHS.
   ( true ),                   % constraints, in this case, none
   [[snp, Vhd, Nhd]]).         % list of triples generated:
                               % Vhd is head word, Nhd is modifier.

% 2. np --&gt; article + npl
gr([np, Nhd],                  % Nhd, the head of npl, becomes new head
   [[art, Det], [npl, Nhd]],
   ( true ),
   [[det, Nhd, Det]]).         % Nhd is head and Det is modifier.

% 3. np --&gt; npl
gr([np, Nhd],
   [[npl, Nhd]],
   ( true ),                   % since there is only one constituent here,
   [ ]).                       % no triple will be generated

% 4. vp --&gt; be_aux + vp        % in this rule (be + vp) either passive or progressive
gr([vp, Aux],
   [[be_aux, Aux], [vp, Vhd]],
   ( mempr([inflection, INFL], Vhd),
     ( INFL = paprt            % if inflection of vp is passive participle, then
       -&gt; Triples = [[be_aux, Aux, Vhd], [voice, Vhd, passive]]
     ;                         % otherwise,
       ( INFL = prprt          % if inflection is present participle, then
         -&gt; Triples = [[be_aux, Aux, Vhd], [progressive, Vhd, yes]]
       ;                       % otherwise,
         fail ) ) ),           % this rule cannot be applied.
   Triples).

    Figure 3 Augmented grammar rules for triple generation.</Paragraph>
    <Paragraph position="25"> We use Rule 4 in Figure 3 to illustrate the parser. Rule 4 states that if there are two adjacent constituents, a be_aux followed by a vp, the parser executes the procedure in the third argument position of the rule. The procedure contains the constraints that must be satisfied for the rule to fire. If the procedure succeeds, the parser records a new constituent, [vp,Aux] (the first argument of the rule), on the chart. Before the parser records the constituent, it must check the triples for the constituent. The procedure in the third argument position also contains the processes to produce the triples for the constituent.</Paragraph>
    <Paragraph position="26"> The fourth argument of a grammar rule is a list of triples produced by executing the augmenting procedure at the third argument position of the rule. If the constraints in the procedure are satisfied, the triples are also produced. The parser generates a unique index for each triple, records the triples on the chart of triples, and adds the indices of the new triples to the new constituent. Then, the new constituent is recorded on the chart of constituents. In this example, the head of the new constituent is the same as that of be_aux; i.e., the be_aux dominates the vp.</Paragraph>
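    <Paragraph> The bookkeeping described for Rule 4 can be sketched as follows. This is an illustrative Python rendering, not the parser itself: the function names, the dictionary-based charts, the index values, and the example words are assumptions, and only the inflection constraint and the generated triples follow Rule 4 of Figure 3.

```python
import itertools

# Hypothetical charts: one for triples, one for constituents.
triple_chart = {}
constituent_chart = []
next_index = itertools.count(1)   # index values are arbitrary here


def rule4_triples(aux, vhd, inflection):
    """Triples for vp --> be_aux + vp, or None if the constraint fails."""
    if inflection == "paprt":      # passive participle
        return [("be_aux", aux, vhd), ("voice", vhd, "passive")]
    if inflection == "prprt":      # present participle
        return [("be_aux", aux, vhd), ("progressive", vhd, "yes")]
    return None                    # the rule cannot be applied


def apply_rule4(aux, vhd, inflection):
    """Record the triples first, then the constituent with their indices."""
    triples = rule4_triples(aux, vhd, inflection)
    if triples is None:
        return None
    indices = []
    for triple in triples:
        index = next(next_index)
        triple_chart[index] = triple
        indices.append(index)
    # The head of the new vp is the head of be_aux.
    constituent = {"category": "vp", "head": aux, "triples": indices}
    constituent_chart.append(constituent)
    return constituent


aux = (1, "be", "aux")     # made-up words, for illustration only
verb = (2, "clean", "v")
vp = apply_rule4(aux, verb, "paprt")
print(vp["triples"], [triple_chart[i] for i in vp["triples"]])
print(apply_rule4(aux, verb, "infinitive"))   # constraint fails
```

    Storing only indices on the constituent keeps the two charts separate, which is what allows several constituents to point at the same triple later on.</Paragraph>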
    <Paragraph position="26"> After finishing the construction of the shared, packed-parse forest of an input sentence, the parser navigates the parse forest to collect the triples that participate in each correct syntactic analysis of the sentence. The collecting algorithm is explained in detail in Section 5.2.</Paragraph>
    <Paragraph position="27"> The representation of the shared, packed-parse forest for the example in Figure 2 is in Figures 4 and 5. It is important to notice that the shared, packed-parse forest generated in this parser is different from that of other parsers. In the shared, packed-parse forest defined by Tomita (1985), any constituents that have the same category and span the same terminal nodes are regarded as the same constituent and packed into one node. In the parser for syntactic graphs, the packing condition is slightly different in that each constituent is identified by the head word of the constituent as well as the category and the terminals it spans. Therefore, although two nodes might have the same category and span the same terminals, if the nodes have different head words, then they cannot be packed together.
    Figure 4 Shared, Packed-Parse Forest.</Paragraph>
    <Paragraph position="28"> A packed node contains several nodes, each of which contains the category of the node, its head word, and the list of the pointers to its child nodes and the indices of the triples of the node. Node 1045 in Figure 4 is a packed node in which three different constituents are packed. Those three constituents have the same category, vpl, span the same terminals (from [1,see,v] to [9,telescope,n]), with the same head word ([1,see,v]), but with different internal substructures. Note that several constituents may have different indices that point to the same triple.</Paragraph>
    <Paragraph position="29"> There are different types of triples that do not have head-modifier relations. These types of triples are for syntactic characteristics of a sentence such as mood and voice of verbs. For example, grammar rule 4 in Figure 3 generates not only triples of head-modifier relations, but also triples that have the information about the voice or progressiveness of the head word of the VP, depending on the type of inflection of the word. This kind of information can be determined in syntactic processing and is used effectively in higher level semantic processing.</Paragraph>
    <Paragraph position="30"> 4 PROPERTIES OF SYNTACTIC GRAPHS
    We first define several terms used frequently in the rest of the paper.</Paragraph>
    <Paragraph position="31"> Definition 1: An in-arc of a node in a syntactic graph is an arc which points to the node, and an out-arc of a node points away from the node.</Paragraph>
    <Paragraph position="32"> Since, in the syntactic graph representation, arcs point from dominator to modifier nodes, a node with an in-arc is the modifier node of the arc, and a node with an out-arc is the dominator node of the arc.</Paragraph>
    <Paragraph position="33"> Definition 2: A reading of the syntactic graph of a sentence is one syntactic interpretation of the sentence in the syntactic graph.</Paragraph>
    <Paragraph position="34"> Since a syntactic graph is a union of syntactic analyses of a sentence, one reading of a syntactic graph is analogous to one parse tree of a parse forest. Definition 3: A root node of one reading of a syntactic graph is a node which has no in-arc in the reading. In most cases, the root node of a reading of the syntactic graph of a sentence is the head verb of the sentence in that syntactic interpretation. In a syntactically ambiguous sentence, different syntactic analyses of the sentence may have different head verbs; thus there may be more than one root node in a syntactic graph. For example, in the syntactic graph of one famous and highly ambiguous sentence, &quot;Time flies like an arrow&quot;, shown in Figure 6, there are three different root nodes. These roots are [0,time,v], [1,fly,v], and [2,like,v].
    Figure 5 Shared, Packed-Parse Forest: A Diagram.</Paragraph>
    <Paragraph position="35"> Definition 4: The position of a node is the position of the word which is represented by the node, in a sentence.</Paragraph>
    <Paragraph position="36"> Since a word may have several syntactic categories, there may be more than one node with the same position in a syntactic graph. For example, since the word &quot;time&quot; in Figure 6, which appeared as the first word in the sentence, has two syntactic categories, noun and verb, there are two nodes, [0,time,n] and [0,time,v], in the syntactic graph, and the position of the two nodes is 0.</Paragraph>
    <Paragraph position="37"> One of the most noticeable features of a syntactic graph is that ambiguities are explicit, and can be easily detected by semantic routines that may use further knowledge to resolve them. The following property explains how syntactically ambiguous points can be easily determined in a syntactic graph.</Paragraph>
    <Paragraph position="38"> Property 1: In a parse tree, every constituent except the root must by definition be dominated by a single constituent. Since a syntactic graph is the union of all syntactic trees that the grammar derives from a sentence, some graph nodes may be dominated by more than one node; such nodes with multiple dominators have multiple in-arcs in the syntactic graph and show points at which the node participates in more than one syntactic tree interpretation of a sentence. In a graph resulting from a syntactically unambiguous sentence, no node has more than a single in-arc, and the graph is a tree with the head verb as its root.</Paragraph>
    <Paragraph position="39"> According to Property 1, no pair of arcs which point to the same node can co-occur in any one syntactic reading, because each node can be a modifier node only once in one reading. Therefore, we can focus on the arcs pointing to the same node as ambiguous points. In terms of triples, any two triples with identical modifier terms reveal a point of ambiguity, where a modifier term is dominated by more than one node.</Paragraph>
    <Paragraph position="41"> Figure 6 Graph Representation and Parse Trees of a Highly Ambiguous Sentence. Sentence: Time flies like an arrow.</Paragraph>
    <Paragraph position="43"> In the example in Figure 1, the syntactic ambiguities are found in two arcs pointing to [4,on,p] and in three arcs pointing to [7,with,p]. The PP with head [4,on] modifies the VP whose head is [1,see] and it also modifies the NP with head [3,man]. Similarly, three different in-arcs to the node [7,with] show that there are three possible choices to which Node 7 can be attached. The semantic processor can focus on these three possibilities (or on the earlier set of two possibilities), using semantic information, to choose one dominator.</Paragraph>
    <Paragraph position="44"> Lacking semantic information, the ambiguities will remain in the graph until they can be resolved by additional knowledge from the context.</Paragraph>
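    <Paragraph> Property 1 suggests a simple way to locate the ambiguous points, sketched below in Python for illustration only. The sketch groups the arcs by modifier node and reports every node with more than one in-arc. The twelve triples are reconstructed from the arcs named in the text for Figure 1, so their exact form, the function name, and the category of the word at position 0 are assumptions.

```python
from collections import defaultdict

# Nodes are (position, root form, category); triples are (arc, head, modifier).
PRON, SEE, A1, MAN = (0, "I", "n"), (1, "see", "v"), (2, "a", "art"), (3, "man", "n")
ON, THE, HILL = (4, "on", "p"), (5, "the", "art"), (6, "hill", "n")
WITH, A2, TEL = (7, "with", "p"), (8, "a", "art"), (9, "telescope", "n")

TRIPLES = [
    ("snp", SEE, PRON), ("vnp", SEE, MAN), ("det", MAN, A1),
    ("vpp", SEE, ON), ("npp", MAN, ON), ("ppn", ON, HILL),
    ("det", HILL, THE), ("vpp", SEE, WITH), ("npp", MAN, WITH),
    ("npp", HILL, WITH), ("ppn", WITH, TEL), ("det", TEL, A2),
]


def ambiguous_points(triples):
    """Map each node with two or more in-arcs to its candidate heads."""
    in_arcs = defaultdict(list)
    for arc, head, modifier in triples:
        in_arcs[modifier].append((arc, head))
    # Every key has at least one in-arc, so "not one" means "two or more".
    return {node: arcs for node, arcs in in_arcs.items() if len(arcs) != 1}


for node, arcs in sorted(ambiguous_points(TRIPLES).items()):
    print(node, [head[1] for _, head in arcs])
```

    With these triples the only nodes reported are [4,on,p] and [7,with,p], matching the two ambiguous points discussed above.</Paragraph>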
    <Paragraph position="45"> Property 2: Since every word of the sentence must be used in every syntactic interpretation of the sentence and no word can have multiple categories in one interpretation, one and only one node from each position must participate in every reading of a syntactic graph. In other words, each syntactic reading derived from a syntactic graph must contain one and only one node from every position.</Paragraph>
    <Paragraph position="46"> Since every node, except the root node, must be attached to another node as a modifier node, we can conclude the following property from Properties 1 and 2.</Paragraph>
    <Paragraph position="47"> Property 3: In any one reading of a syntactic graph, the following facts must hold:
      1. No two triples with the same modifier node can co-occur.
      2. One and only one node from each position, except the root node of the reading, must appear as a modifier node.</Paragraph>
    <Paragraph position="48"> Another advantage of the syntactic graph representation is that we can easily extract the intersection of all possible syntactic readings from it. Since one node from each position must participate in every syntactic reading of a syntactic graph, every node which is not a root node and has only one in-arc must always be included in every syntactic reading. Such unambiguous nodes are common to the intersections of all possible readings. When we know the exact locations of several pieces in a jigsaw puzzle, it is much easier to place the other pieces. Similarly, if a semantic processor knows which arcs must hold in every reading, it can use these arcs to constrain inferences to understand and disambiguate.</Paragraph>
    <Paragraph position="49"> Property 4: There is no information in a syntactic graph about the range of terminals spanned by each triple, so one triple may represent several constituents which have the same head and modifying terms, with the same relation name, but which span different ranges of terminals.</Paragraph>
    <Paragraph position="50"> The compactness and handiness of a graph representation is based on this property. One arc between two nodes in a syntactic graph can replace several complicated structures in the tree representation, and multiple dominating arcs can replace a parse forest.</Paragraph>
    <Paragraph position="51"> For example, the arc vnp from [1,see,v] to [3,man,n] in Figure 1 represents three different constituents. Those constituents have the same category, vpl, the same head, [1,see,v], and the same modifier, [3,man,n], but have different inside structures of the modifying constituent, np, whose head is [3,man,n]. The modifying constituent, np, may span from [2,a] to [3,man], from [2,a] to [6,hill], or from [2,a] to [9,telescope]. Actually, in the exclusion matrix described below, each triple with differing constituent structure is represented by multiple subscripts to avoid the generation of trees that did not occur in the parse forest.</Paragraph>
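    <Paragraph> Property 4 can be pictured as a many-to-one mapping from constituents to triples. In the Python sketch below, which is illustrative only, the record layout and the name triple_of are hypothetical; the three spans of the modifying np are the ones listed in the example above.

```python
from collections import defaultdict

SEE, MAN = (1, "see", "v"), (3, "man", "n")

# Three vpl constituents that differ only in the span of the modifying np.
# Spans are (first position, last position); the layout is illustrative.
constituents = [
    {"category": "vpl", "head": SEE, "modifier": MAN, "np_span": (2, 3)},
    {"category": "vpl", "head": SEE, "modifier": MAN, "np_span": (2, 6)},
    {"category": "vpl", "head": SEE, "modifier": MAN, "np_span": (2, 9)},
]


def triple_of(constituent, arc="vnp"):
    """Project a constituent onto its triple; the span is dropped."""
    return (arc, constituent["head"], constituent["modifier"])


spans_by_triple = defaultdict(list)
for c in constituents:
    spans_by_triple[triple_of(c)].append(c["np_span"])

# One triple stands for all three constituents.
print(len(spans_by_triple), spans_by_triple[("vnp", SEE, MAN)])
```

    Because the projection discards the span, the graph alone cannot tell the three constituents apart; that is the information the subscripts in the exclusion matrix restore.</Paragraph>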
    <Paragraph position="52"> Another characteristic of a syntactic graph is that the number of nodes in a graph is not always the same as that of the words in a sentence. Since some words may have several syntactic categories, and each category may lead to a syntactically correct parse, one word may require several nodes. For example, there are eight nodes in the syntactic graph in Figure 6, while there are only five words in the sentence. [Computational Linguistics, Volume 15, Number 1, March 1989. Jungyun Seo and Robert F. Simmons, Syntactic Graphs: A Representation for the Union of All Ambiguous Parse Trees.]</Paragraph>
  </Section>
  <Section position="5" start_page="24" end_page="24" type="metho">
    <SectionTitle>
5 EXCLUSION MATRIX
</SectionTitle>
    <Paragraph position="0"> A syntactic graph is clearly more compact than a parse forest and provides a good way of representing all possible syntactic readings with an efficient focusing mechanism for ambiguous points. However, since one triple may represent several constituents, and there is no information about the relationships between triples, it is possible to lose some important syntactic information. This section consists of two parts. In Section 5.1, we investigate a co-occurrence problem of arcs in a syntactic graph and suggest the exclusion matrix to avoid the problem. The algorithms to collect triples of a syntactic graph and to construct an exclusion matrix are presented in Section 5.2.</Paragraph>
    <Section position="1" start_page="24" end_page="24" type="sub_section">
      <SectionTitle>
5.1 CO-OCCURRENCE PROBLEM BETWEEN ARCS
</SectionTitle>
      <Paragraph position="0"> One of the most important syntactic displays in a tree structured parse, but not in a syntactic graph, is the co-occurrence relationship between constituents. Since one parse tree represents one possible syntactic reading of a sentence, we can see whether any two constituents can co-occur in some reading by checking all parse trees one by one. However, since the syntactic graph keeps all possible constituents as a set of triples, it is sometimes difficult to determine whether two triples can co-occur.</Paragraph>
      <Paragraph position="1"> If a syntactic graph does not carry the information about exclusive arcs, its representation of all possible syntactic structures may include interpretations not allowed by the grammar and cause extra overhead. For example, after a syntactic processor generates the triples, a semantic processor will focus on the ambiguous points such as triples 4 and 5, and triples 8, 9, and 10 in Figure 2 to resolve the ambiguities. In this case, if the semantic processor has a strong clue to choose triple 4 over triple 5, it should not consider triple 10 as a competing triple with triples 8 and 9, since triple 10 is exclusive with triple 4.</Paragraph>
      <Paragraph position="2"> Some of the co-occurrence problems can be detected easily. For example, due to Property 1, since there can be only one in-arc to any node in any one reading of a syntactic graph, the arcs that point to the same node cannot co-occur in any reading. Triples including these arcs are called exclusive triples. The following properties of the syntactic graph representation show several cases when arcs cannot co-occur. These cases, however, are not exhaustive.</Paragraph>
      <Paragraph position="3"> Property 5: No two crossing arcs can co-occur. More formally, if an arc has the $n_1$-th and $n_2$-th words as a head and a modifier, and another arc has the $m_1$-th and $m_2$-th words as a head and a modifier, then, if $n_1 &lt; m_1 &lt; n_2 &lt; m_2$ or $m_1 &lt; n_1 &lt; m_2 &lt; n_2$, the two arcs cannot co-occur.</Paragraph>
      <Paragraph position="5"> In the syntactic graph in Figure 1, the arcs vpp from \[1,see,v\] to \[4,on,p\] and npp from \[3,man,n\] to \[7,with,p\] cannot co-occur in any legal parse tree because they violate the rule that branches in a parse tree cannot cross each other.</Paragraph>
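      <Paragraph> The crossing test of Property 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an arc is reduced to a pair of word positions (head, modifier), the function name arcs_cross is hypothetical, and the endpoints are sorted first so that arcs pointing leftward are handled as well, which goes slightly beyond the literal wording of the property.

```python
def arcs_cross(arc_a, arc_b):
    """Return True if two arcs, each given as (head_pos, modifier_pos), cross.

    Assumption: only word positions matter; relation names and categories
    are ignored in this sketch.
    """
    lo_a, hi_a = sorted(arc_a)
    lo_b, hi_b = sorted(arc_b)
    # Two arcs cross when their spans interleave: one span starts strictly
    # inside the other span and ends strictly beyond it.
    b_starts_inside_a = lo_b in range(lo_a + 1, hi_a) and hi_b > hi_a
    a_starts_inside_b = lo_a in range(lo_b + 1, hi_b) and hi_a > hi_b
    return b_starts_inside_a or a_starts_inside_b


# The Figure 1 example: vpp from position 1 to 4, npp from position 3 to 7.
print(arcs_cross((1, 4), (3, 7)))  # True
```

      Nested arcs such as (1, 9) and (3, 7), and arcs that merely share an endpoint, are not reported as crossing by this check.</Paragraph>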
      <Paragraph position="6"> The following property shows another case of exclusive arcs which cross each other.</Paragraph>
      <Paragraph position="7"> Property 6: In a syntactic graph, any modifier word which is on the right side of its head word cannot be modified by any word which is on the left side of the head word in a sentence. More formally, let an arc have a head word $W_1$ and a modifier word $W_2$ whose positions are $n_1$ and $n_2$ respectively, and $n_1 &lt; n_2$. Then if another arc has $W_2$ as a head word and a modifier word with position $n_3$ where $n_3 \le n_1$, then those two arcs cannot co-occur.</Paragraph>
      <Paragraph position="8"> Assume that there are two arcs: one is \[npp, \[5,W1,noun\], \[9,W2,conj\]\], and the other is \[conjpp, \[9,W2,conj\], \[3,W3,prep\]\]. The first arc says that the phrase with head word W2 is attached to the noun in position 5. The other triple says that the phrase with head word W3 is attached to the conjunction. This attachment causes crossing branches. The corresponding parse tree for these two triples is in Figure 7. As we can see, since there is a crossing branch, these two arcs cannot co-occur in any parse tree.</Paragraph>
      <Paragraph position="9"> The following property shows the symmetric case of Property 6.</Paragraph>
      <Paragraph position="11"> Property 7: In a syntactic graph, any modifier word which is on the left side of its head word cannot be modified by any word which is on the right side of the head word in a sentence.</Paragraph>
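      <Paragraph> Properties 6 and 7 can be checked together, as in the following minimal Python sketch. It assumes that arcs are pairs of word positions (head, modifier), that the boundary case where the second modifier sits at the head position itself is also excluded, and that the function names are hypothetical; lexical categories are ignored here.

```python
def blocks(outer, inner):
    """True if `inner` hangs off the modifier of `outer` on the forbidden side.

    Each arc is a (head_pos, modifier_pos) pair (an assumption of this sketch).
    """
    head_a, mod_a = outer
    head_b, mod_b = inner
    if head_b != mod_a:
        return False              # inner does not modify outer's modifier
    if mod_a > head_a:            # Property 6: modifier lies right of its head
        return head_a >= mod_b    # inner's modifier at or left of that head
    return mod_b >= head_a        # Property 7: the mirror image


def exclusive_by_side(arc_a, arc_b):
    """Apply the check in both directions, since either arc may be the outer one."""
    return blocks(arc_a, arc_b) or blocks(arc_b, arc_a)


# The example with positions 5, 9 and 3: npp (5 to 9) and conjpp (9 to 3).
print(exclusive_by_side((5, 9), (9, 3)))  # True
```

      An arc from position 9 to position 7 would be accepted alongside the arc from 5 to 9, because its modifier stays on the right side of position 5.</Paragraph>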
      <Paragraph position="12"> Other exclusive arcs are due to lexical ambiguity. Definition 5: If two nodes, $W_i$ and $W_j$, in a syntactic graph have the same word and the same position but different categories, $W_i$ is in conflict with $W_j$, and we say the two nodes are conflicting nodes.</Paragraph>
      <Paragraph position="13"> Property 8: Since words cannot have more than one syntactic category in one reading, any two arcs which have conflicting nodes as either a head or a modifier cannot co-occur.</Paragraph>
      <Paragraph position="14"> An example of exclusive arcs involves the vpp arc from \[1,fly,v\] to \[2,like,p\] and the vnp arc from \[0,time,v\] to \[1,fly,n\] in the graph in Figure 6. Since the two arcs contain nodes with the same word in the same position but with different categories, they cannot co-occur in any syntactic reading. By examination of Figure 6, we can determine that there are 25 pairwise combinations of exclusive arcs in the syntactic graph of that five-word sentence.</Paragraph>
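      <Paragraph> Definition 5 and Property 8 translate directly into a small check. The Python sketch below is illustrative only: it assumes a node is a tuple (position, word, category) and a triple is a tuple (relation, head, modifier), and the function names are hypothetical. The node labels follow the Figure 6 example as read above.

```python
def in_conflict(node_a, node_b):
    """Definition 5: same position and word, but different categories."""
    pos_a, word_a, cat_a = node_a
    pos_b, word_b, cat_b = node_b
    return pos_a == pos_b and word_a == word_b and cat_a != cat_b


def exclusive_by_category(triple_a, triple_b):
    """Property 8: the triples clash if any of their nodes are conflicting."""
    _, head_a, mod_a = triple_a
    _, head_b, mod_b = triple_b
    return any(in_conflict(x, y)
               for x in (head_a, mod_a)
               for y in (head_b, mod_b))


vpp = ("vpp", (1, "fly", "v"), (2, "like", "p"))
vnp = ("vnp", (0, "time", "v"), (1, "fly", "n"))
print(exclusive_by_category(vpp, vnp))  # True
```

      Here the clash comes from the word in position 1, which is a verb in one triple and a noun in the other.</Paragraph>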
      <Paragraph position="15"> The above properties show cases of exclusive arcs but are not exhaustive. Since the number of pairs of exclusive arcs is often very large in real text (syntactically ambiguous sentences), if we ignore the co-occurrence information among triples, the overhead cost to the semantic processor may outweigh the advantage gained from syntactic graph representation. Therefore we have to constrain the syntactic graph representation to include co-occurrence information.</Paragraph>
      <Paragraph position="16"> We introduce the exclusion matrix for triples (arcs) to record constraints so that any two triples which cannot co-occur in any syntactic tree cannot co-occur in any reading of a syntactic graph. The exclusion matrix provides an efficient tool to decide which triples should be discarded when higher level processes choose one among ambiguous triples. For an exclusion matrix (Ematrix), we make an N x N matrix, where N is the number of indices of triples. If Ematrix(i,j) = 1, then the triples with the indices i and j cannot co-occur in any syntactic reading. If Ematrix(i,j) = 0, then the triples with the indices i and j can co-occur in some syntactic reading.</Paragraph>
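      <Paragraph> To make the use of the Ematrix concrete, the following Python sketch stores it as a list of lists and filters candidate triples once one triple has been chosen. The function names and the particular exclusive pairs are assumptions made for illustration; they loosely mirror the earlier discussion of triples 4, 5, 8, 9, and 10 and are not taken from the actual figure.

```python
def build_ematrix(n, exclusive_pairs):
    """Return an n x n matrix with 1 where two triples cannot co-occur."""
    ematrix = [[0] * n for _ in range(n)]
    for i, j in exclusive_pairs:
        ematrix[i][j] = 1
        ematrix[j][i] = 1   # exclusion is symmetric
    return ematrix


def compatible_with(chosen, candidates, ematrix):
    """Keep only the candidates that may co-occur with the chosen triple."""
    return [t for t in candidates if ematrix[chosen][t] == 0]


# Assumed exclusive pairs, for illustration only.
pairs = [(4, 5), (4, 10), (8, 9), (8, 10), (9, 10)]
ematrix = build_ematrix(11, pairs)
print(compatible_with(4, [5, 8, 9, 10], ematrix))  # [8, 9]
```

      Once triple 4 is chosen, triples 5 and 10 drop out with a single row lookup, leaving only triples 8 and 9 for the semantic processor to decide between.</Paragraph>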
    </Section>
  </Section>
  <Section position="6" start_page="24" end_page="28" type="metho">
    <SectionTitle>
5.2 AN ALGORITHM TO CONSTRUCT THE EXCLUSION
MATRIX
</SectionTitle>
    <Paragraph position="0"> Since the several cases of exclusive arcs shown in the previous section are not exhaustive, they are not sufficient to construct a complete exclusion matrix from a syntactic graph. A complete exclusion matrix can be guaranteed by navigating the parse forest when the syntactic processor collects the triples in the forest to construct a syntactic graph.</Paragraph>
    <Paragraph position="1"> As we have briefly described in Section 3, when the parser constructs a shared, packed forest, triples are also produced, and their indices are kept in the corresponding nonterminal nodes in the forest. 12 The parser navigates the parse forest to collect the triples--in fact, pointers pointing to the triples--and to build an exclusion matrix.</Paragraph>
    <Paragraph position="2"> As we can see in the parse forest in Figure 5, there may be several nonterminal nodes in one packed node.</Paragraph>
    <Paragraph position="3"> For each packed node, the parser collects all indices of triples in the subforests whose root nodes are the nonterminal nodes in the packed node, and then records those indices to the packed node. After the parser finishes collecting the indices of the triples in the parse forest, each packed node in the forest has a pointer to the list of collected indices from its subforest. Therefore, the root node of a parse forest has a pointer to the list of all indices of all possible triples in the whole forest, and those triples represent the syntactic graph of the forest.</Paragraph>
    <Paragraph position="5"> [Figure 8: Parse Forest Augmented with Triples. Legend: packed node; list of all triples below this node; triples of this node; pointer to the list of pointers pointing to triples.]</Paragraph>
    <Paragraph position="6"> Figure 8 shows the upper part of the parse forest in Figure 5 after collecting triples. A hooked arrow of each nonterminal node points to the list of the indices of the triples that were added to the node in parsing. For example, pointer 2 contains the indices of the triples added to the node snt by the grammar rule snt → np + vp. A simple arrow for each packed node points to the list of all indices of the triples in the forest of which it is the root. This list is generated and recorded after the processor collects all indices of triples in its subnodes. Therefore the arrow of the root node of the whole forest, pointer 1, contains the list of all indices of the triples in the whole forest.</Paragraph>
    <Paragraph position="7"> Since several indices may represent the same triple, after collecting all the indices of the triples in the parse forest, the parser removes duplicate triples from the final representation of the syntactic graph of a sentence. Collecting pointers to triples in the subforest of a packed node and constructing the Ematrix are done recursively as follows: First, Ematrix(i,j) is initialized to 1, which means all arcs are marked exclusive of each other. Later, if any two arcs indexed with i and j co-occur in some parse tree, then the value of Ematrix(i,j) is set to 0. For each nonterminal node in a packed node, the parser collects every index appearing below the nonterminal node, i.e., the indices of the triples of its subnodes. If a subnode of the nonterminal node was previously visited, and its indices were already collected, then the subnode already has the pointer to the list of collected indices. Therefore the parser does not need to navigate the same subforests again, but takes the indices using the pointer. The algorithm in pseudo-PASCAL code is in Figure 9.</Paragraph>
    <Paragraph position="8">
function collect_triple(Packed_node)
  if Packed_node.Collected                                 /* if the indices of triples are already collected */
  then return(Packed_node.TripleIndex)                     /* then return the collected indices */
  else Packed_node.TripleIndex := collect1(Packed_node)    /* else collect them */
       Packed_node.Collected := true                       /* set flag Collected */
       return(Packed_node.TripleIndex)                     /* return collected indices */

function collect1(Packed_node)
  Triple_Indices := { }
  for each Node in Packed_node do
  [...]

  for each index i in Trip1 do
    for each index j in Trip2 do
      Ematrix(i, j) := 0
      Ematrix(j, i) := 0                                   /* Ematrix is symmetric */

function fully_cooccur(Triples)
  for each pair of indices i and j in Triples do
    Ematrix(i, j) := 0
    </Paragraph>
    <Paragraph position="9"> After the parser collects the indices of the triples from the subnodes of the nonterminal node, it adjusts the values in the exclusion matrix according to the following cases: 1. If the nonterminal node has one child node, its own triples can co-occur with each other, and with every collected triple from its subforest.</Paragraph>
    <Paragraph position="10"> 2. If the nonterminal node has two child nodes, its own triples can co-occur with each other and with the triples collected from both left and right child nodes, and the triples from the left child node can co-occur with the triples from the right one.</Paragraph>
    <Paragraph position="11"> This algorithm is described in Figure 10.</Paragraph>
    <Paragraph position="12"> For example, the process starts to collect the indices of the triples from the snt node in Figure 8. Then, it collects the indices in the left subforest whose root is np. After all indices of triples in the subforest of np are collected, those indices and the indices of the triples of the node in 6 are recorded in 5. Similarly, all indices in 7 and 4 are recorded in 3 as the indices of the triples in the right subforest of the snt node. The indices in 5 and 3 and the indices in 2 are recorded in 1 as the indices of the triples of the whole parse forest. In packed nodes with more than one nonterminal node, like vpl, all indices of the triples in the three subforests of vpl and the indices in 8, 9, and 10 are collected and recorded in 7.</Paragraph>
    <Paragraph position="13"> By the first case in the above rule, every triple represented by the indices in 4 can co-occur with each other, and every triple represented by the indices in 4 can co-occur with every triple represented by the indices in 7. One example of the second case is that every triple represented by the indices in 2 can co-occur with each other, and every triple represented by the indices in 2 can co-occur with every triple represented by the indices in 5 and 3. Every triple represented by the indices in 5 can co-occur with the triples represented by the indices in 3. Whenever the process finds a pair of co-occurring triples it adjusts the value of Ematrix appropriately.</Paragraph>
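    <Paragraph> The collection procedure and the two co-occurrence cases can be put together in one short Python sketch. It is a reconstruction under stated assumptions, not the authors' code: the class names Nonterminal and PackedNode, their fields, and the function names are hypothetical, each nonterminal node is assumed to have at most two children, and a triple is treated as co-occurring with itself, so the diagonal is also set to 0.

```python
from dataclasses import dataclass, field


@dataclass
class Nonterminal:
    triples: set                                   # indices of this node's own triples
    children: list = field(default_factory=list)   # 0, 1 or 2 PackedNode objects


@dataclass
class PackedNode:
    alternatives: list                             # Nonterminal nodes packed together
    collected: set = None                          # cached indices; None until visited


def mark_cooccur(ematrix, left, right):
    """Record that every triple in `left` can co-occur with every one in `right`."""
    for i in left:
        for j in right:
            ematrix[i][j] = 0
            ematrix[j][i] = 0                      # keep the matrix symmetric


def collect_triples(packed, ematrix):
    """Return all triple indices below `packed`, updating `ematrix` on the way."""
    if packed.collected is not None:               # already visited: reuse the cache
        return packed.collected
    indices = set()
    for node in packed.alternatives:
        below = [collect_triples(child, ematrix) for child in node.children]
        mark_cooccur(ematrix, node.triples, node.triples)
        for child_indices in below:
            mark_cooccur(ematrix, node.triples, child_indices)
        if len(below) == 2:                        # left and right subforests
            mark_cooccur(ematrix, below[0], below[1])
        indices |= node.triples.union(*below)
    packed.collected = indices
    return indices
```

    The matrix passed in is expected to be filled with 1 beforehand, as the text prescribes. Triples that only ever appear in different alternatives of the same packed node are never marked, so they remain exclusive of each other.</Paragraph>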
  </Section>
</Paper>