<?xml version="1.0" standalone="yes"?> <Paper uid="H89-1026"> <Title>TINA: A PROBABILISTIC SYNTACTIC PARSER FOR SPEECH UNDERSTANDING SYSTEMS*</Title> <Section position="4" start_page="168" end_page="170" type="metho"> <SectionTitle> GENERAL DESCRIPTION </SectionTitle> <Paragraph position="0"> TINA is basically a context-free grammar, implemented by expansion at run-time into a network structure, and augmented with flags/parameters that activate certain filtering operations. The grammar is built from a set of training sentences, using a bootstrapping procedure. Initially, each sentence is translated by hand into a list of the rules invoked to parse it. After the grammar has built up a substantial knowledge of the language, many new sentences can be parsed automatically, or with minimal intervention to add a few new rules incrementally. The arc probabilities can be incrementally updated after the successful parse of each new sentence.</Paragraph> <Paragraph position="1"> of grammar nodes. All rules with the same LHS are combined to form a structure describing possible interconnections among children of a parent node associated with the left-hand category. A probability matrix connecting each possible child with each other child is constructed by counting the number of times a particular sequence of two siblings occurred in the RHS's of the common rule set, and normalizing by counting all pairs from the particular left-sibling to any right sibling. Two distinguished nodes, a START node and an END node, are included among the children of every grammar node. A subset of the grammar nodes are terminal nodes whose children are a list of vocabulary words.</Paragraph> <Paragraph position="2"> A functional block diagram of the control strategy is given in Figure 2. At any given time, a distinguished subset of &quot;active&quot; parse nodes are arranged on a priority queue. Each parse node contains a pointer to a grammar node of the same name, and has access to all the information needed to pursue its partial theory. The top node is popped from the queue, and it then creates a number of new nodes (either children or right siblings depending on its state), and inserts them into the queue according to their probabilities. If the node is an END node, it collects up all subparses from its sequence of left siblings, back to the START node, and passes the information up to the parent node, giving that node a completed subparse. The process can terminate on the first successful completion of a sentence, or the Nth successful completion if more than one hypothesis is desired.</Paragraph> <Paragraph position="3"> A parse in TINA is begun by creating a single parse node linked to the grammar node SENTENCE, and entering it on the queue with probability 1.0. This node creates new parse nodes with categories like STATEMENT, QUESTION, and REQUEST, and places them on the queue, prioritized. If STATEMENT is the most likely child, it gets popped from the queue, and returns nodes indicating SUBJECT, IT, etc., to the queue. When SUBJECT reaches the top of the queue, it activates units such as NOUN-GROUP (for noun phrases and associated post-modifiers), GERUND, and NOUN-CLAUSE. Each node, after instantiating first-children, becomes inactive, pending the return of a successful subparse from a sequence of children. Eventually, the cascade of first-children reaches the terminal-node ARTICLE, which proposes the words &quot;the,&quot; &quot;a,&quot; and &quot;an,&quot; testing these hypotheses against the input stream. 
<Section position="5" start_page="170" end_page="170" type="metho"> <SectionTitle> NATURAL LANGUAGE ISSUES </SectionTitle> <Paragraph position="0"> This section describes how TINA handles agreement constraints and long-distance movement, issues that are usually considered to be part of the task of a syntactic parser. Movement concerns a phenomenon of displacing a unit from its natural position in a phrase, usually to a preceding position. Such &quot;gaps&quot; occur commonly, for instance, in questions and passive voice, as in &quot;(Which article)_i do you think I should read (t_i)?&quot; TINA is particularly effective in handling gaps. Complex cases of nested or chained gaps are handled correctly, and most ill-formed gaps are rejected. The mechanism resembles the &quot;hold&quot; register idea of ATNs [1] and the treatment of bounded domination metavariables in LFGs ([2], p. 235 ff.), but seems to be more straightforward than either of these.</Paragraph> <Paragraph position="1"> Each parse node comes equipped with a number of slots for holding information that is relevant to the parse. Included are person and number, verb-form (root, finite, etc.), and two special slots, the current-focus and the float-object, that are concerned with long-distance movement. This information is passed along from node to node: from parent to child, child to parent, and left sibling to right sibling. Certain nodes have the power to adjust the values of these features. The adjustment may take the form of an unconditional override, or it may be a constraint that must have a non-null intersection with the value for that feature passed to the node from its relative, as will become clear in the next section.</Paragraph> </Section> <Section position="6" start_page="170" end_page="170" type="metho"> <SectionTitle> VERB-FORM AND AGREEMENT </SectionTitle> <Paragraph position="0"> Certain nodes have special powers to set the verb-form either for their children or for their right siblings. Thus, for example, HAVE as an auxiliary verb sets verb-form to past-participle for its right siblings. The category GERUND sets the verb-form to present-participle for its children. Whenever a PREDICATE node is invoked, the verb-form has always been set by a predecessor.</Paragraph> <Paragraph position="1"> Certain nodes specify person/number restrictions which then propagate up to higher levels and back down to later terminal nodes. Thus, for example, a NOUN-PL node sets the number to [PL], but only if the left sibling passes to it a description for number that includes [PL] as a possibility (otherwise it dies, as in &quot;each boats&quot;). This value then propagates up to the SUBJECT node, across to the PREDICATE node, and down to the verb, which then must agree with [PL], unless its verb-form is marked as non-finite. A more complex example is a compound noun phrase, as in &quot;Both John and Mary have decided to go.&quot; Here, each individual noun is singular, but the subject expects a plural verb (have rather than has). TINA deals with this by making use of a node category AND-NOUN-PHRASE, which sets the number constraint to [PL] for its parents, and blocks the transfer of number information to its children. The OBJECT node blocks the transfer of any predecessor person/number information to its children, reflecting the fact that verbs agree in person/number with their subject but not with their object.</Paragraph> </Section>
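Read as set intersection, the number-agreement check is small. Below is a minimal sketch under that reading; the feature sets and the helper name are illustrative, not TINA's actual representation.

```python
def unify_number(inherited, constraint):
    """Agreement as intersection: a node's number constraint must
    overlap the value passed in from its relatives, or the parse dies."""
    allowed = inherited & constraint
    return allowed or None

# "each" passes {SG} to its right sibling, so a plural noun dies:
print(unify_number({"SG"}, {"PL"}))         # None, as in "each boats"
# "the" leaves number unrestricted, so a plural noun survives:
print(unify_number({"SG", "PL"}, {"PL"}))   # {'PL'}
```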
<Section position="7" start_page="170" end_page="172" type="metho"> <SectionTitle> GAPS </SectionTitle> <Paragraph position="0"> The mechanism to deal with gaps involves four special types of grammar nodes, identified as generators, activators, blockers, and absorbers. Generators are parse nodes whose grammatical category allows them to fill the current-focus slot with the subparse returned to them by their children. The current-focus is passed on to right siblings and their descendants, but not to parents, and thus effectively reaches nodes that are c-commanded by the generator and its descendants [5,6]. Activators are nodes that move the current-focus into the float-object slot. They also require that the float-object be absorbed somewhere among their descendants. Blockers (such as SUBJECT) are nodes that block the transmission of the float-object to their children. Finally, absorbers are allowed to use the float-object as their subparse.</Paragraph> <Paragraph position="1"> A simple example will help explain how this works. For the sentence &quot;(How many pies)_i did Mike buy (t_i)?&quot; as illustrated by the parse tree in Figure 3, the Q-SUBJECT &quot;how many pies&quot; is a generator, so it fills the current-focus with its subparse. The DO-QUESTION is an activator; it moves the current-focus into the float-object position. Finally, the object of &quot;buy,&quot; an absorber, takes the Q-SUBJECT as its subparse. The DO-QUESTION refuses to accept any solutions from its children if the float-object has not been absorbed.</Paragraph> <Paragraph position="2"> Thus, the sentence &quot;How many pies did Mike buy the pies?&quot; would be rejected. Furthermore, the same DO-QUESTION node deals with the sentence &quot;Did Mike buy the pies?,&quot; except in this case there is no current-focus and hence no gap.</Paragraph> <Paragraph position="3"> More complicated sentences involving nested or chained traces are handled straightforwardly by this scheme. For instance, the sentence &quot;(Which hospital)_j was (Jane)_i taken (t_i) to (t_j)?&quot; can be parsed correctly by TINA, identifying &quot;Jane&quot; as the object of &quot;taken&quot; and &quot;which hospital&quot; as the object of &quot;to.&quot; This works because the VERB-PHRASE-P-O, an activator, writes over the float-object &quot;Which hospital&quot; with the new entry &quot;Jane,&quot; but only for its children. The original float-object is still available to fill the OBJECT slot in the following prepositional phrase.</Paragraph>
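The four node roles amount to four operations on a small feature record that flows to children and right siblings. A sketch under that reading; the types and names are illustrative, not TINA's actual data structures.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class GapState:
    current_focus: Optional[str] = None   # filled by generators
    float_object: Optional[str] = None    # filled by activators

def generate(state, subparse):
    """Generator: fill the current-focus with this node's subparse."""
    return replace(state, current_focus=subparse)

def activate(state):
    """Activator: move the current-focus into the float-object slot."""
    return replace(state, float_object=state.current_focus,
                   current_focus=None)

def block(state):
    """Blocker (e.g. SUBJECT): hide the float-object from children."""
    return replace(state, float_object=None)

def absorb(state):
    """Absorber: consume the float-object as this node's subparse."""
    if state.float_object is None:
        return None, state             # no gap to fill here
    return state.float_object, replace(state, float_object=None)

# "(How many pies)_i did Mike buy (t_i)?"
s = generate(GapState(), "how many pies")  # Q-SUBJECT generates
s = activate(s)                            # DO-QUESTION activates
subparse, s = absorb(s)                    # OBJECT of "buy" absorbs
print(subparse)                            # how many pies
print(s.float_object)                      # None: the gap was consumed
```

The DO-QUESTION's rejection of unabsorbed gaps corresponds to checking that the float-object slot has been emptied by the time its children's solutions come back.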
<Paragraph position="4"> The example used to illustrate the power of ATNs [1], &quot;John was believed to have been shot,&quot; also parses correctly, because the OBJECT node following the verb &quot;believed&quot; acts as both an absorber and a (re)generator. Cases of crossed traces, which are blocked by the Strict Cycle Condition and the Subjacency Condition in the Government/Binding rule system [7], are automatically blocked here because the second current-focus gets moved into the float-object position at the time of the second activator, overriding the preexisting float-object set up by the earlier activator. The wrong float-object is available at the position of the first trace, and the parse dies.</Paragraph> <Paragraph position="5"> The current-focus slot is not restricted to nodes that represent nouns. Some of the generators are adverbial or adjectival parts-of-speech (POS). An absorber checks for agreement in POS before it can accept the float-object as its subparse. As an example, the question &quot;(How oily)_i do you like your salad dressing (t_i)?&quot; contains a Q-SUBJECT &quot;how oily&quot; that is an adjective. The absorber PRED-ADJECTIVE accepts the available float-object as its subparse, but only after confirming that its POS is adjective, as shown in the parse tree in Figure 4.</Paragraph> <Paragraph position="6"> The current-focus has a number of other uses besides its role in movement. It plays an important part in identifying the subject of verbs and in establishing the references for pronouns. For a complete description of these and other topics, the interested reader is referred to [4].</Paragraph> </Section> <Section position="8" start_page="172" end_page="173" type="metho"> <SectionTitle> EVALUATION MEASURES </SectionTitle> <Paragraph position="0"> This section addresses several distinct performance measures for a grammar, including coverage, overgeneration, portability, perplexity, and trainability. Coverage and overgeneration are concerned with the degree to which the grammar is able to capture appropriate generalities while rejecting ill-formed sentences. Perplexity, roughly defined as the geometric mean of the number of alternative word hypotheses that may follow each word in the sentence, is of particular concern in spoken language tasks. Portability and trainability concern the ease with which an existing grammar can be ported to a new task, as well as the amount of training data necessary before the grammar is able to generalize well to unseen data.</Paragraph> <Paragraph position="1"> To address these issues, we used two sets of sentences. The first set is the 450 compact acoustic-phonetic sentences of the TIMIT database [8]. These sentences represent a fairly complex syntax, including questions, passive voice, compound and complex sentences, relative clauses, subjunctive form, comparatives, etc. They represent a semantically unrestricted space, which makes it hard to use them for tests of constraint reduction due to semantic filtering. The second set of sentences has become popular in the DARPA speech research community for both speech recognition and natural language processing applications. These sentences concern a naval Resource Management (RM) task, and are fairly restrictive semantically. A particular subset of 791 designated training sentences and 200 designated test sentences from this task has been selected by researchers at Bolt Beranek and Newman, Inc. for studies in natural language. We have used these two sets for testing portability, perplexity, and coverage.</Paragraph>
<Paragraph position="2"> One of the unique aspects of TINA is that a grammar can be acquired automatically from a set of parsed sentences. A typical procedure is to gradually build up the rule system by parsing new sentences one by one, introducing new arcs as needed. Once a full set of sentences has been parsed in this fashion, the parse trees from the sentences are automatically converted to the set of rules used to generate each sentence. The training of both the rule set and the probability assignments is established directly from the provided set of parsed sentences; i.e., the parsed sentences are the grammar.</Paragraph> <Paragraph position="3"> We took advantage of this feature to test the system's capability of generalizing to unseen data from a small set of sentence examples. Since there were only 450 sentences in the TIMIT task, we were unwilling to set aside a portion of these as designated test sentences. Instead, we built a grammar that could parse all of the sentences, and then generated a subset grammar from 449 of the sentences, testing this grammar for coverage on the remaining one. We cycled through all 450 sentences in this fashion.</Paragraph> <Paragraph position="4"> Our experiments were conducted as follows. We first built a grammar from the 450 TIMIT sentences. We tested coverage on these sentences using the above jackknifing strategy. We assessed overgeneration by generating sentences at random from the TIMIT grammar, checking whether these sentences were well-formed syntactically. We then tested portability by beginning with the grammar established from the TIMIT task and then deriving a grammar for the 791 designated training sentences of the RM task. Within this task it was possible to make use of semantic categories, particularly within noun phrases, in order to reduce perplexity. We measured both perplexity and coverage on the remaining 200 test sentences of this task, using a grammar built automatically from the 791 parsed training sentences.</Paragraph> </Section>
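The jackknifing strategy above is a leave-one-out loop over the sentence set. A schematic sketch, where `build_grammar` and `parses` are hypothetical stand-ins for TINA's actual training and parsing procedures:

```python
def jackknife_coverage(sentences, build_grammar, parses):
    """Leave-one-out coverage: for each sentence, train a grammar on
    the other N-1 sentences and test whether the held-out one parses."""
    covered = 0
    for i, held_out in enumerate(sentences):
        training = sentences[:i] + sentences[i + 1:]
        grammar = build_grammar(training)
        if parses(grammar, held_out):
            covered += 1
    return covered / len(sentences)   # 0.75 is reported for TIMIT below
```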
<Section position="9" start_page="173" end_page="174" type="metho"> <SectionTitle> COVERAGE WITHIN TIMIT </SectionTitle> <Paragraph position="0"> The result of the jackknifing experiment was that 75% of the unseen sentences were successfully parsed based on structures seen in the remaining 449 sentences. In most cases where the system failed, a single unique form occurred somewhere in the unseen sentence that had not appeared in any of the other sentences, as illustrated in Table 1. We do not mean to suggest that a grammar should not be able to handle such forms; however, we are encouraged that three quarters of the sentences could parse based on such a small amount of training data. It suggests that the system can learn generalizations fairly quickly.</Paragraph> <Paragraph position="1"> Table 1. Unique forms that caused parse failures:
Withdraw only as much money as you need.
rather x than y: I'd rather not buy these shoes than be overcharged.
why predicate: Why buy oil when you always use mine?
triple verb: Laugh, dance, and sing, if fortune smiles on you.
&quot;are both&quot;: The patient and the surgeon are both recuperating from the lengthy operation.
adjunct as subject: Right now may not be the best time for business mergers.</Paragraph> <Paragraph position="2"> An example of a sentence that succeeded is given in Table 2, along with a list of sentences that could be used to build the portions of the parse tree necessary for the successful parse. The parse of this sentence is given in Figure 4. Because of the sharing of structures among rules with the same LHS, the system is capable of synthesizing new rules from pieces of other rules, which allows it to learn more quickly.</Paragraph> </Section> <Section position="10" start_page="174" end_page="174" type="metho"> <SectionTitle> OVERGENERATION </SectionTitle> <Paragraph position="0"> The issue of overgeneration is extremely important for spoken language tasks, because of the need to keep perplexity as low as possible. TINA can be run in generation mode, where, instead of proposing all alternatives at a decision point, a random number generator is used to select a particular decision.</Paragraph> <Paragraph position="1"> Generation mode is an extremely useful tool for discovering errors in the grammar. Randomly generated sentences are typically syntactically correct but semantically anomalous. Occasionally an ill-formed sentence is generated, due to inappropriate generalities in the rules. Usually the situation can be corrected through rule modification.</Paragraph> <Paragraph position="2"> Since all of the arcs have assigned probabilities, the parse tree is traversed by generating a random number at each node and deciding which arc to take based on the outcome, using the arc probabilities to weight the alternatives. Some examples of sentences generated in this way are given in Table 3. While many of these sentences are clearly nonsense, due to the complete absence of semantic constraint, they all appear to be syntactically representative of the language. It is clear that proper semantic constraint would greatly decrease the perplexity, although, given the rich semantic base of the 450 sentences, it is not reasonable to expect to build a suitable semantic component for them. Applying semantic constraint appropriate to a sublanguage that a natural language system can interpret, however, is a much more feasible undertaking.</Paragraph> <Paragraph position="3"> Table 3. Sentences generated at random from the TIMIT grammar:
Wash, of course, but puree the high hats and article colleges straight ahead.
How did the income of gold open the algebraic hit?
A child of execution stole the previous attitude.
Which tunafish would lots of the muscles smash?
Make a scholastic marriage; then only get ski fangs under a medical film.
Whenever a simple perfume must diminish near stew, enter.
It is fun to pledge to break down.
Hyenas might be used to eat every coach.
The screen is surely blistered occasionally.</Paragraph> </Section>
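Generation mode can be read as a weighted random walk over the same sibling-transition tables used in parsing. A minimal sketch, assuming probability tables shaped like those in the earlier training sketch:

```python
import random

def sample_child(probs, parent, left_sibling):
    """Weighted random choice among the arcs leaving `left_sibling`
    under `parent`, using the trained arc probabilities."""
    followers = probs[(parent, left_sibling)]
    children = list(followers)
    weights = [followers[child] for child in children]
    return random.choices(children, weights=weights)[0]

def sample_rhs(probs, parent):
    """Walk START -> ... -> END, sampling one right sibling at a time;
    recursing on each sampled child (omitted here) would yield a full
    randomly generated sentence."""
    rhs, node = [], "START"
    while True:
        node = sample_child(probs, parent, node)
        if node == "END":
            return rhs
        rhs.append(node)
```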
<Section position="11" start_page="174" end_page="175" type="metho"> <SectionTitle> PORTABILITY </SectionTitle> <Paragraph position="0"> We tested ease of portability for TINA by beginning with a grammar built from the 450 TIMIT sentences and then deriving a grammar for the RM task. These two tasks represent very different sentence types. For instance, the overwhelming majority of the TIMIT sentences are statements, whereas there are no statements in the RM task, which is made up exclusively of questions and requests. The process of conversion to a new grammar involves parsing the new sentences one by one, and adding context-free rules whenever a parse fails. The person entering the rules must be very familiar with the grammar structure, but for the most part it is straightforward to identify and incrementally add missing arcs. The parser identifies where in the sentence it fails, and also maintains a record of the successful partial parses. These pieces of information are usually adequate to pinpoint the missing arcs. It required less than one person-month to convert the grammar from TIMIT to the RM task.</Paragraph> </Section> <Section position="12" start_page="175" end_page="175" type="metho"> <SectionTitle> PERPLEXITY AND COVERAGE WITHIN RM TASK </SectionTitle> <Paragraph position="0"> We built a subset grammar from the 791 parsed RM training sentences, and then used this grammar to test coverage and perplexity on the unseen 200 test sentences. The grammar could parse all of the training sentences and 78.5% of the test sentences. We are unwilling to examine the test sentences, as we may be using them for further evaluations in the future. Therefore, we cannot yet assess why a particular sentence failed, or whether the parse found by the grammar was actually the correct parse.</Paragraph> <Paragraph position="1"> A formula for the test set perplexity is [9]:

$$\mathrm{Perplexity} = 2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i \mid w_{i-1}, \ldots, w_1)}$$

where the $w_i$ are the sequence of all words in all sentences, $N$ is the total number of words, including an &quot;end&quot; word after each sentence, and $P(w_i \mid w_{i-1}, \ldots, w_1)$ is the probability of the ith word given all preceding words. If all words are assumed equally likely, then $P(w_i \mid w_{i-1}, \ldots, w_1)$ can be determined by counting all the words that could follow each word in the sentence, along all workable partial theories. If the grammar contains probability estimates, then these can be used in place of the equally-likely assumption. If the grammar's estimates reflect reality, the estimated probabilities will result in a reduction in the total perplexity.</Paragraph> <Paragraph position="2"> An average perplexity for the 157 test sentences that were parsable was computed for the two conditions, without (Case 1) and with (Case 2) the estimated probabilities. The result was a perplexity of 374 for Case 1, but only 41.7 for Case 2. This is with a total vocabulary size of 985 words, and with a grammar that included several semantically restricted classes such as SHIP-NAME and READINESS-CATEGORY. The incorporation of arc probabilities reduced the perplexity by a factor of nine, a clear indicator that a proper mechanism for utilizing probabilities in a grammar can help significantly.</Paragraph> </Section>
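A direct transcription of the perplexity formula, assuming the parser can supply one conditional probability per word of the test set, including the "end" word after each sentence:

```python
import math

def test_set_perplexity(word_probs):
    """Perplexity = 2 ** (-(1/N) * sum of log2 P(w_i | w_1..w_{i-1})),
    where `word_probs` holds one conditional probability for each of
    the N words across all test sentences."""
    n = len(word_probs)
    return 2 ** (-sum(math.log2(p) for p in word_probs) / n)

# If four words were equally likely at every point, perplexity is 4:
print(test_set_perplexity([0.25] * 12))   # 4.0
```
</Paper>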