<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1422">
  <Title>Fully Lexicalized Head-Driven Syntactic Generation</Title>
  <Section position="3" start_page="0" end_page="211" type="metho">
    <SectionTitle>
4 Off-Line Preprocessing: HPSG to TAG Compilation
</SectionTitle>
    <Paragraph position="0"> The subtasks in a direct syntactic generator based on an HPSG grammar will always include the application of schemata (the HPSG equivalent of phrase structure rules) such that all syntactic constraints introduced by a lexical item (especially its SUBCAT list)are fulfilled. This results in a constant repetition of, e.g., building up the projection of a verb in a declarative sentence.</Paragraph>
    <Paragraph position="1"> In preprocessing the HPSG grammar we aim at computing all possible partial phrase structures which can be derived from the information in a lexicon entry. Given such sets of possible syntactic realization together with a set of selected lexicon entries for an utterance and finally their dependencies, the task of a syntactic generator is simplified considerably. Instead of exploring all  possible, computationally expensive applications of HPSG schemata, it merely has to find suitable precomputed syntactic structures for each lexical item and combine them appropriately.</Paragraph>
    <Paragraph position="2"> For this preprocessing of the HPSG grammar, we adapted the 'HPSG to TAG compilation' process described in \[Kasper et al. 1995\]. The basis for the compilation is an identification of syntactically relevant selector features which express subcategorization requirements of a lexical item, e.g. the VALENCE features. In general, a phrase structure is complete when these selector features are empty.</Paragraph>
    <Paragraph position="3"> Starting from the feature structure for a lexical item, HPSG schemata are applied such that the current structure is unified with a daughter feature of the schema. The resulting structure is again subject to this process: This compilation process stops when certain termination criteria are met, e.g., when all selector features are empty. Thus, all projections from the lexical item are collected as a set of minimally complete phrase structures which can also be interpreted as elementary trees of a Tree-Adjoining Grammar (TAG).</Paragraph>
    <Paragraph position="4"> Instead of actually applying this compilation *process to all lexical items, certain abstractions over the lexical entries are specified in the HPSG grammar. In fact, the needs of the compilation process have led to a clear-cut separation of lexica! types and lexical entries as shown in Figure 1. A typical lexical entry is shown in Figure 2 and demonstrates that only three kinds of information are stored: the lexical type MV_NP_TRANS_LE 4, the semantic contribution (th e relation _SUIT_REL) and morphological information (the stem and potentiallyirregular forms): By expanding the lexical type, the full feature structure can be obtained.</Paragraph>
    <Paragraph position="5">  Some of the trees which result from the preprocessing of the lexical type MV_NP_TRANS_LE are shown in Figure 3. The figure Shows only the phrase structure and an abstraction of the 4MV_NP_TRANS_LE is an abbreviation for &amp;quot;Main Verb, NP object, TRANSitive Lexical Entry&amp;quot; used in sentences like  MV_NP_TRANS_LE.2 of Figure 3 represents an imperative clause. As a consequence PERSON has the value SECOND and CL-MODE is set to IMPERATIVE. Note that the compilation process stopped at this node since the selector features are empty.</Paragraph>
    <Paragraph position="7"> type MV_NP_TRANS_/E as defined in the HPSG grammar. Trees 3 and 4 differ only with respect to their feature structures which are not shown in this figure.</Paragraph>
    <Paragraph position="8"> From these trees, two kinds of knowledge bases are built. For the microplanner, the relation between the lexical and syntactic realization and the semantic representation (encoded in the SYNSEM LOCAl CONT feature) is extracted as a constraint. For the syntactic generator, the relevant syntactic information is extracted in the form of a Feature-Based Lexicalized TAG (FB-LTAG) grammar, see \[Joshi 1987, Vijay-Shanker and Joshi 1991, Schabes, Abeill4, and Joshi 1988\]. This includes the phrase structure and a selected part of the feature structure (mainly the SYNSEM LOCAL CAT and SYNSEM NON-LOCAL features). Figure 4 shows the bottom feature structure extracted from the root node of MV_NP_TRANSJE.2. Note that some of the feature paths are abbreviated, e.g. 5LCI stands for SYNSEM LOCAL CONT INDEX. The elementary TAG trees which are built from the compilation result have so-called restricted *feature structures which can be exploited for an efficient, specialized unification algorithm.</Paragraph>
    <Paragraph position="9"> The node names shown in the figures represent a disjunction of possible categories, e.g. NP.S.COMP in tree MV_NP_TRANS_LE.3 implies that the subject of a transitive verb may be a nominal or sentential phrase.</Paragraph>
  </Section>
  <Section position="4" start_page="211" end_page="214" type="metho">
    <SectionTitle>
5 The Syntactic Generator VM-GIFT
</SectionTitle>
    <Paragraph position="0"> The task of the syntactic generator is the construction of a sentence (or phrase, given the often incomplete utterances in spoken dialogs) from the microplanning result which is then sent to a speech-synthesis component. It proceeds in three major steps which are also depicted in Fig. 5.</Paragraph>
    <Paragraph position="1"> * A tree selection phase determines the set of relevant TAG trees. A first tree retrieval step maps every object of the dependency tree into a set of applicable elementary TAG trees. The main tree selection phase uses information from the microplanner output to further refine the set of retrieved trees.</Paragraph>
    <Paragraph position="2"> * A combination phase finds a successful combination of trees to build a (derived) phrase structure tree.</Paragraph>
    <Paragraph position="3"> * An inflection phase uses the information in the feature structures of the leaves (i.e. the words) to apply appropriate morphological functions, including the use of irregular forms as provided by the HPSG lexiconand regular inflection function as supplied (as LISP code) by the HPSG grammar.</Paragraph>
    <Paragraph position="4"> An initial preprocessing phase computes: the necessary auxiliary verbs from the tense, aspect, and sentence mood information. It also rearranges the dependency tree accordingly (e.g. subject arguments are moved from the main verb to become dependents of the inflected auxiliary verb). The two core phases are the tree selection and the tree combination phase. The tree selection phase consists of two steps. First, a set of possible trees is retrieved and then appropriate trees are selected from this set. The retrieval is driven by the HPSG instance or word class that is supplied by the microplanner. It is mapped to a lexical type by a lexicon that is automatically compiled from the HPSG grammar. The lexical types are then mapped to a tree family, i.e., a set of elementary TAG trees representing all possible minimally complete phrase structures that can be build from the instance. The additional information in the dependency tree is then used to add further feature</Paragraph>
    <Paragraph position="6"> values to the trees. This additional information acts as a filter for selecting appropriate trees in two stages: * Some values are incompatible with values already present in the trees. These trees can therefore be filtered immediately from the set. E.g., a syntactic structure for an imperative clause is marked as such by a feature and can be discarded if a declarative sentence is to be generated.</Paragraph>
    <Paragraph position="7"> * Additional features can prevent the combination with other trees during the combination phase. This is the case, for example with agreement features.</Paragraph>
    <Paragraph position="8"> The combination phase explores the search space of all possible combinations of trees from the candidate sets for each lexical item (instance). An inefficient combination phase is a potential drawback of using the precomputed TAG trees. However, there is sufficient information available fl'om the microplanner result and from the trees such that a well:guided best-first search strategy can be employed in the current system. The difference in run-time can be as dramatic as 24 seconds (comprehensive breadth-first) versus 1.5 seconds (best-first).</Paragraph>
    <Paragraph position="9"> As part of the tree selection phase, based on the rich annotation of the input structure, the tree sets are sorted locally. Then a backtracking algorithm traverses the dependency tree in a bottom-up fashion s. At each node, and for each subtree in the dependency tree, a candidate for the phrase structures of the subtree is constructed. Then all possible adjunction or substitution sites are computed, possibly sorted (e.g. allowing for preferences in word order) and the best candidate for a combined phrase structure is returned. Since the combination of two partial phrase structures by adjunction or substitution might fail due to incompatible feature structures , a backtracking  algorithm must be used. A partial phrase structure for a subtree of the dependency is finally checked for completeness. These tests include the unifiability of all top and bottom feature structures and the satisfaction of all other constraints (e.g. obligatory adjunctions or open substitution nodes) since no further adjunctions or substitutions will occur in this subtree.</Paragraph>
    <Paragraph position="10"> The necessity of a spoken dialog translation system to produce output robustly calls for some relaxations in these tests. E.g., 'obligatory' arguments may be missing in the utterance and the tests in the syntactic generator must accept a sentence with a missing obligatory object if no other complete phrase can be generated.</Paragraph>
    <Paragraph position="11"> Figure 6 shows an example of the input of from the microplanner after the preprocessing phase has inserted the entity LGV1 for the auxiliary will.</Paragraph>
    <Paragraph position="12">  In the tree retrieval phase for L5-WORK_ACCEPTABLE, first the HEAD information is used to determine the lexical types of the possible realizations SUIT_V1 and SUIT_V2, namely MV_NP_TRANS_LE and MV_EXPL_PREP_TRANSIE respectively. These types are then mapped to their respective sets of elementary trees, a total of 25 trees. In the tree selection phase (as described above), this number is reduced to six. For example, the tree MV_NP_TRANS_LPS.2 in Figure 3 has a feature CL-MODPS with the value IMPERATIVE Now, the microplanner output for the root entity LGV1 contains the information (INTENTION WH-QUESTION) The NTENTION information is unified with all appropriate Ck-MODPS features, which in this case fails. Thereforethe tree MV_NP_TRANS_kPS.2 can be discarded in the tree selection phas e .</Paragraph>
    <Paragraph position="13">  The combination phase uses the best-first bottom-up algorithm described above to determine one suitable tree for every entity and also a target node in the tree that is selected for the governing entity. For the above example, the selected trees and their combination nodes are shown in Figure  connect to suitable substitution or adjunction nodes. They correspond to the dependency tree. The inflection function finally uses attribute values like verb-form, number and person from the final tree to derive the correct inflections. Information about the sentence mode WH-QUESTION can be used to annotate the resulting string for the speech-synthesis module.</Paragraph>
  </Section>
class="xml-element"></Paper>