<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1017"> <Title>An Efficient Kernel for Multilingual Generation in Speech-to-Speech Dialogue Translation</Title> <Section position="3" start_page="110" end_page="112" type="metho"> <SectionTitle> 2 The Microplanner </SectionTitle> <Paragraph position="0"> A generation system for target language utterances in an approach to speech-to-speech translation has to work on input elements representing intermediate results of recognition, analysis, and transfer components. In that setting, several of the tasks of a complete natural language generation system, such as selection and organization of the contents to be expressed, are outside of the control of our generator. They have been decided by the human user of the translation system or they have been negotiated and computed by a transfer component.</Paragraph> <Paragraph position="1"> Nevertheless, there remain a number of different but highly interrelated subtasks of the generation process where decisions have to be made in order to determine and realize the translation result to be sent to a speech synthesis component. The diverse subtasks -- often collectively denoted as microplanning (cf. (Levelt, 1989; Hovy, 1996)) -- comprise the planning of a rough structure of the target language utterance, the determination of sentence borders, sentence type, topicalization, theme-rheme organization of sentential units, focus control, and the use of nominalized or infinitival style, as well as triggering the generation of anaphora and lexical choice. In addition, they have to address the problem of expressibility of the selected contents in a text realization component, i.e., bridging the generation gap (see (Meteer, 1990)).</Paragraph> <Paragraph position="2"> The input to our microplanning component consists of semantic representations encoded in a minimal recursive structure following a variant of UDRT. 
Each individual indicated by some input utterance is formally represented by a discourse referent. Information about the individual is encoded within the DRS-conditions.</Paragraph> <Paragraph position="3"> Relations between descriptions of different discourse referents lead to a hierarchical semantic structure (see Figure 1 for a graphical representation of fragments of an example input to the generator). Discourse referents are depicted as boxes headed by individual names i_n; conditions are illustrated within those boxes.</Paragraph> <Paragraph position="4"> Besides these input terms from the transfer component, the generator may access knowledge about the dialogue act, the dialogue history, and some prosodic information about the user's utterance.</Paragraph> <Paragraph position="5"> The output of the microplanner is a sentence plan that serves as input for the syntactic realization component. It describes a dependency tree over lexical items annotated with the syntactic, semantic, and pragmatic information needed to produce an acceptable utterance and to guide the speech synthesis component.</Paragraph> <Section position="1" start_page="110" end_page="112" type="sub_section"> <SectionTitle> 2.1 Design of the Microplanning Kernel </SectionTitle> <Paragraph position="0"> An important design principle of our generator is the demand to cope with multidirectional dependencies among decisions of the diverse subtasks of microplanning without preferring one order of decisions over others. For example, the choice of an interrogative sentence requires an (at least elliptical) verbal phrase as a major constituent of the sentence; nominalization or the choice of passive voice depends on the result of word choice, etc. Therefore, we conceived microplanning as a constraint-satisfaction problem (Kumar, 1992) representing undirected relations between variables. Variables are created for elements in the input to the generator. 
They are connected by means of weighted constraints.</Paragraph> <Paragraph position="1"> The domains of the variables correspond to abstractions of possible alternatives for syntactic realizations of the semantic elements, including sets of specifications of lexical items and syntactic features. A solution of the constraint system is a globally consistent instantiation of the variables and is guaranteed to be a valid input for the syntactic generation module. Since there might be locally optimal mappings that lead to contradiction on a global level, the microplanner generally uses these weighted constraints to direct a backtracking or propagation process.</Paragraph> <Paragraph position="2"> On the one hand, the advantages of utilizing a constraint system lie in the declarativity of the knowledge sources, which allows for an easier adaptation of the system to other domains and languages. We benefited from this design decision and realized microplanning for English and German merely by establishing new rule sets for lexical and syntactic choice. The core engine for constraint processing was reused without modification. On the other hand, having defined a suitable representation of the problem to be solved, a constraint-based approach also establishes a testbed for examining the pros and cons of different evaluation methods, including backtracking, constraint propagation, and heuristics for the order of the instantiation of variable values, to name a few means of dealing with competition among alternatives and finding a solution.</Paragraph> <Paragraph position="3"> The microplanner makes use of the minimal recursive structure of its semantic input term (see Fig. 1) by triggering activities by bundles of conditions, discourse referents, and holes representing underspecified scope relations in the input. These three input categories are reflected by different microplanning rule sets that are applied conjointly during the process of microplanning. 
The rules are represented as pattern-condition-action triples. A pattern is to be matched with part of the input, a condition describes additional context-dependent requirements to be fulfilled by the input, and the action part describes a bundle of syntactic features realizing lexical entities and their relations to complements and modifiers.</Paragraph> <Paragraph position="4"> Figure 2 shows a microplanning rule for the combination of the semantic predicates WORK_ACCEPTABLE, ARG3, and PERSPECTIVE, which are realized together as a finite verb, i.e., a 3:1 mapping of semantic predicates to a syntactic specification.</Paragraph> <Paragraph position="5"> ;; standard finite verb with 2 complements</Paragraph> <Paragraph position="7"> In the condition part of the verbal mapping, the existence of a NOM-condition within the semantic input information is tested. Such a condition would forbid the verbal form by demanding a nominalized form. The action part describes the result of lexical selection (the lemma &quot;suit&quot;) plus generic functions for computing relevant syntactic features like tense and voice. I2, which stands for the ARG3 of WORK_ACCEPTABLE and is defined by a database of linking information as the semantic agent, is characterized as allowing neither masc(uline) nor fem(inine) gender, which prevents &quot;he suits&quot; in the sense of &quot;he is okay&quot;. Entries starting with KEY define identifiers used for computing the preference value of a microplanning rule with respect to the given situation.</Paragraph> <Paragraph position="8"> In an additional database, KEYs are associated with weights for predefined situation characteristics such as time pressure or register. 
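
As a rough illustration of the rule format and the KEY-based preference just described, the following Python sketch models a rule as a pattern-condition-action triple and scores it by summing the weights that a situation table assigns to its KEYs. The names Rule, applies, and preference, the KEY identifier, the feature names in the action, and all weight values are assumptions made for illustration only; the scoring function of the actual system is not specified in this section.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical pattern-condition-action triple (all names illustrative)."""
    pattern: frozenset    # semantic predicates the rule must match
    forbidden: frozenset  # condition: predicates that must be absent
    action: dict          # syntactic features realized by the rule
    keys: tuple = ()      # identifiers used to compute a preference

    def applies(self, predicates):
        # Pattern: every predicate of the pattern occurs in the input.
        # Condition: none of the forbidden predicates occurs.
        present = set(predicates)
        return self.pattern.issubset(present) and self.forbidden.isdisjoint(present)


def preference(rule, situation, weights):
    # Assumed scoring: add up the weight of each KEY under each active
    # situation characteristic; unknown combinations contribute 0.
    return sum(weights.get((key, s), 0.0) for key in rule.keys for s in situation)


# 3:1 mapping of three predicates to one finite verb, blocked by NOM.
finite_verb = Rule(
    pattern=frozenset({"WORK_ACCEPTABLE", "ARG3", "PERSPECTIVE"}),
    forbidden=frozenset({"NOM"}),
    action={"lemma": "suit", "cat": "finite-verb"},  # feature names assumed
    keys=("KEY-VERBAL",),                             # hypothetical KEY
)

# Invented weights, for illustration only.
weights = {("KEY-VERBAL", "time-pressure"): 0.8, ("KEY-VERBAL", "formal"): 0.3}

print(finite_verb.applies(["WORK_ACCEPTABLE", "ARG3", "PERSPECTIVE"]))
print(finite_verb.applies(["WORK_ACCEPTABLE", "ARG3", "PERSPECTIVE", "NOM"]))
print(preference(finite_verb, ["time-pressure"], weights))
```

Keeping the weights in a table separate from the rules mirrors the separation between rules and the KEY database: the same rule can be preferred or dispreferred in a different situation without being rewritten.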
The microplanning content rules are not directly entered by a rule writer but are compiled off-line from several knowledge sources for lexical choice rules, rules for syntactic decisions, and linking rules, so that contradictory combinations are filtered out without consuming on-line runtime.</Paragraph> <Paragraph position="9"> Regarding the sets of alternatives that result from the application of the microplanning rules, the most direct way of realizing a constraint net seems to be the definition of one variable for each condition, discourse referent, and hole, leading to a variable net as shown in Figure 3.</Paragraph> <Paragraph position="10"> For our task, it is not enough to define binary matching constraints between each pair of variables that purely test the compatibility of the described syntactic features. Some syntactic specifications may contain identifications of further entities, e.g., discourse referents and syntactic identifiers, which influence the result of the compatibility test between a pair of variables referring to these identifiers. Thus, the constraint net is not easily subdivided into subnets that can be efficiently evaluated. The large number of combinations of alternative values is handled by known CSP techniques such as uniting variables with 1-value domains and applying matching mechanisms to their values, computing 2-consistency by matching value pairs and filtering out inconsistent ones, storing and reusing knowledge about binary incompatibility, and performing intelligent backtracking. The result of the constraint solving process for the input shown in Fig. 1 is given in Fig. 4.</Paragraph> <Paragraph position="12"/> </Section> </Section> <Section position="4" start_page="112" end_page="114" type="metho"> <SectionTitle> 3 The Realizer </SectionTitle> <Paragraph position="0"> The syntactic realizer proceeds from the microplanning result as shown in Figure 5. 
It produces a derived phrase structure from which the output string is read off. The realizer is based on a fully lexicalized grammar in the sense that every lexical item selects for a finite set of possible phrase structures (called elementary trees).</Paragraph> <Paragraph position="1"> In particular, we use a Feature-Based Lexicalized Tree-Adjoining Grammar (FB-LTAG, see (Vijay-Shanker and Joshi, 1988; Schabes et al., 1988)) that is derived from an HPSG grammar (see section 4 for some more details). The elementary trees (see Figure 9) can be seen as maximal partial projections. A derivation of an utterance is constructed by combining appropriate elementary trees with the two elementary TAG operations of adjunction and substitution.</Paragraph> <Paragraph position="2"> For each node (i.e., lexical item) in the dependency tree, the tree selection phase determines the set of relevant TAG trees. A first tree retrieval step maps every object of the dependency tree into a set of applicable elementary TAG trees. The main tree selection phase uses information from the microplanner output to further refine the set of retrieved trees. The combination phase finds a successful combination of trees to build a (derived) phrase structure tree. The final inflection phase uses the information in the feature structures of the leaves (i.e., the words) to apply appropriate morphological functions. An initial pre-processing phase is needed to accommodate the handling of auxiliaries, which are not determined in microplanning. They are derived from the tense, aspect, and sentence mood information as supplied by microplanning.</Paragraph> <Paragraph position="3"> The two core phases are the tree selection and the combination phase. The tree selection is driven by the HPSG instance or word class that is supplied by the microplanner. It is mapped to a lexical type by a lexicon that is automatically compiled from the HPSG grammar. 
The lexical types are then mapped to a tree family, i.e., a set of elementary TAG trees representing all possible minimally complete phrase structures that can be built from the instance. The additional information in the dependency tree is then used to add further feature values to the trees. This additional information acts as a filter for selecting appropriate trees in two stages: Some values are incompatible with values already present in the trees. These trees can therefore be filtered immediately from the set.</Paragraph> <Paragraph position="4"> E.g., a syntactic structure for an imperative clause is marked as such by a feature and can be discarded if a declarative sentence is to be generated. Additional features can prevent the combination with other trees during the combination phase. This is the case, e.g., with agreement features.</Paragraph> <Paragraph position="5"> The combination phase belongs completely to the core machinery. It can be exchanged with more efficient algorithms without change of the grammar or lexicon. It explores the search space of all possible combinations of trees from the candidate sets for each lexical item (instance).</Paragraph> <Paragraph position="6"> Since there is sufficient information available from the microplanner result and from the trees, a well-guided best-first search strategy can be employed in the current system.</Paragraph> <Paragraph position="7"> As part of the tree selection phase, based on the rich annotation of the input structure, the tree sets are sorted locally such that preferred trees are tested first. Then a modified backtracking algorithm traverses the dependency tree in a bottom-up fashion. At each node and for each subtree in the dependency tree, a candidate for the phrase structure of the subtree is constructed. 
Then all possible adjunction or substitution sites are computed, possibly sorted (e.g., allowing for preferences in word order), and the best candidate for a combined phrase structure is returned. Since the combination of two partial phrase structures by adjunction or substitution might fail due to incompatible feature structures, a backtracking algorithm must be used. (The algorithm stores intermediate results with a memoization technique.)</Paragraph> <Paragraph position="8"> A partial phrase structure for a subtree of the dependency tree is finally checked for completeness. These tests include the unifiability of all top and bottom feature structures and the satisfaction of all other constraints (e.g., obligatory adjunctions or open substitution nodes), since no further adjunctions or substitutions will occur in this subtree.</Paragraph> <Paragraph position="9"> Because a spoken dialogue translation system must produce output robustly, some of these tests have to be relaxed. E.g., 'obligatory' arguments may be missing in the utterance. This can be caused by ellipsis in sentences such as &quot;Ok, we postpone.&quot; or by false segmentations in the analysis, such as segmenting &quot;Wir sollten (we should) das Treffen verschieben (the meeting postpone).&quot; into two segments &quot;Wir sollten&quot; and &quot;das Treffen verschieben&quot;. In order to generate &quot;postpone the meeting&quot; for the second segment, the tests in the syntactic generator must accept a phrase with a missing subject if no other complete phrase can be generated.</Paragraph> <Paragraph position="10"> Figure 6 shows a combination of the tree retrieval and the tree selection phases. In the tree retrieval phase for L5-WORK_ACCEPTABLE, first the HEAD information is used to determine the lexical types of the possible realizations SUIT_V1 and SUIT_V2, namely MV_NP_TRANS_LE and MV_EXPL_PREP_TRANS_LE, respectively. 
These types are then mapped to their respective sets of elementary trees, a total of 25 trees. In the tree selection phase, this number is reduced to six.</Paragraph> <Paragraph position="11"> For example, the tree MV_NP_TRANS_LE.2 in Figure 9 has a feature CL-MODE with the value IMPERATIVE. Now, the microplanner output for the root entity LGVI contains the information (INTENTION WH-QUESTION). The INTENTION information is unified with all appropriate CL-MODE features, which in this case fails. Therefore the tree MV_NP_TRANS_LE.2 is discarded in the tree selection phase.</Paragraph> <Paragraph position="12"> The combination phase uses the best-first bottom-up algorithm described above to determine one suitable tree for every entity and also a target node in the tree that is selected for the governing entity. For the above example, the selected trees and their combination nodes are [figure text, partially recovered: ;; traverse for: L5-WORK_ACCEPTABLE returned MV_NP_TRANS_LE returned MV_EXPL_PREP_TRANS_LE total: 6 trees ;; traverse for: L13-PRON returned PERS_PRO_LE total: 1 tree ;; traverse for: L10-PRON returned PERS_PRO_LE total: 1 tree ;; traverse for: L6-TEMP_LOC returned WH_ADVERB_WORD_LE total: 2 trees ;; traverse for: L15-TEMP_LOC returned NP_ADV_WORD_LE total: 5 trees ;; traverse for: LGVI returned WILL_AUX_POS_LE; caption fragment: ...ties of the example sentence.]</Paragraph> <Paragraph position="13"> Figure 8 shows the final phrase structure for the example. The inflection function selects the base form of &quot;suit&quot; according to the BSE value of the VFORM feature and correctly uses &quot;will.&quot; Information about the sentence mode WH-QUESTION can be used to annotate the resulting string for the speech-synthesis module.</Paragraph> </Section> </Paper>