<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1041"> <Title>Language Generation for Multimedia Healthcare Briefings</Title> <Section position="3" start_page="0" end_page="278" type="metho"> <SectionTitle> 2 System Overview </SectionTitle> <Paragraph position="0"> MAGIC's architecture is shown in Figure 1.</Paragraph> <Paragraph position="1"> MAGIC exploits the extensive online data available at Columbia-Presbyterian Medical Center (CPMC) as the source of content for its briefings. Operative events during surgery are monitored through the LifeLog database system (Modular Instruments Inc.), which polls medical devices (ventilators, pressure monitors, and the like) every minute from the start of the case to the end, recording information such as vital signs. In addition, physicians (anesthesiologists and anesthesia residents) enter data throughout the course of the patient's surgery, including the start and end of cardiopulmonary bypass, as well as subjective clinical factors, such as heart sounds and breath sounds, that cannot be captured by medical devices. Finally, CPMC's main databases provide information from the online patient record (e.g., medical history).</Paragraph> <Paragraph position="2"> From this large body of information, the data filter selects information that is relevant to the bypass surgery and patient care in the ICU.</Paragraph> <Paragraph position="3"> MAGIC's content planner then uses a multimedia plan to select and partially order information for the presentation, taking into account the caregiver the briefing is intended for (nurse or physician).</Paragraph> <Paragraph position="4"> The media allocator allocates content to media, and finally, the media-specific generators realize content in their respective media (see (Zhou and Feiner, 1997) for details on the graphics generator). 
A media coordinator is responsible for ensuring that spoken output and animated graphics are temporally coordinated.</Paragraph> <Paragraph position="5"> Within this context, the speech generator receives as input a partially ordered conceptual representation of information to be communicated.</Paragraph> <Paragraph position="7"> The generator includes a micro-planner, which is responsible for ordering and grouping information into sentences. Our approach to micro-planning integrates a variety of different types of operators for aggregating information within a single sentence. Aggregation using semantic operators is enabled through access to the underlying domain hierarchy, while aggregation using linguistic operators (e.g., hypotactic operators, which add information using modifiers such as adjectives, and paratactic operators, which create, for example, conjunctions) is enabled through lookahead to the lexicon used during realization.</Paragraph> <Paragraph position="8"> The speech generator also includes a realization component, implemented using the FUF/SURGE sentence generator (Elhadad, 1992; Robin, 1994), which produces the actual language to be spoken as well as textual descriptions that are used as labels in the visual presentation. It performs lexical choice and syntactic realization. Our version of the FUF/SURGE sentence generator produces sentences annotated with prosodic information and pause durations. This output is sent to a speech synthesizer in order to produce final speech. (Currently, we are using AT&T Bell Laboratories' Text To Speech System.)</Paragraph> <Paragraph position="9"> Our use of speech as an output medium provides an eyes-free environment that allows caregivers the opportunity to turn away from the display and continue carrying out tasks involving patient care. Speech can also clarify graphical conventions without requiring the user to look away from the graphics to read an associated text. 
Currently, communication between OR caregivers and ICU caregivers is carried out orally in the ICU when the patient is brought in. Thus, the use of speech within MAGIC models current practice.</Paragraph> <Paragraph position="10"> Planned future evaluations will examine caregiver satisfaction with the spoken medium versus text.</Paragraph> </Section> <Section position="4" start_page="278" end_page="278" type="metho"> <SectionTitle> 3 Issues for Language Generation </SectionTitle> <Paragraph position="0"> In the early stages of system development, a primary constraint on the language generation process was identified during an informal evaluation with ICU nurses and residents (Dalal et al., 1996a). Because of the time constraints they face in carrying out their tasks, nurses in particular noted that speech takes time; spoken output should therefore be brief and to the point, while text, which is used to annotate the graphical illustration, can provide unambiguous references to the equipment and drugs being used. In the following sections, we show how we meet this constraint both in the speech content planner, which organizes the content as sentences, and in the speech sentence generator, which produces actual language.</Paragraph> <Paragraph position="1"> In all of the language generation components, the fact that the output medium is spoken rather than written language influences how generation is carried out. We note this influence on the generation process throughout the section.</Paragraph> <Paragraph position="2"> An example showing the spoken output for a given patient, along with a screen shot at a single point in the briefing, is shown in Figure 3.</Paragraph> <Paragraph position="3"> In actual output, sentences are coordinated with the corresponding part of the graphical illustration using highlighting and other graphical actions. 
In this paper, we show the modifications to the language generator that were necessary to allow the media coordinator to synchronize speech with changing graphics.</Paragraph> <Section position="1" start_page="278" end_page="278" type="sub_section"> <SectionTitle> 3.1 Speech Micro-Planner </SectionTitle> <Paragraph position="0"> The speech micro-planner is given as input a set of information that must be conveyed. In order to ensure that speech is brief and yet still conveys the necessary information, the speech micro-planner attempts to fit more information into individual sentences, thereby using fewer words.</Paragraph> <Paragraph position="1"> Out of the set of propositions given as input, the micro-planner selects one proposition to start with. It attempts to include as many other propositions as it can as adjectives or other modifiers of information already included. To do this, from the remaining propositions, it selects a proposition that is related, via its arguments, to one of the propositions already selected. It then checks whether it can be lexicalized as a modifier by looking ahead Voice: Ms. Jones is an 80 year old, hypertensive, diabetic, female patient of Dr. Smith undergoing CABG. Presently, she is 30 minutes post-bypass and will arrive in the unit shortly. The existing infusion lines are two IVs, an arterial line, and a Swan-Ganz with Cordis. The patient has received massive vasotonic therapy, massive cardiotonic therapy, and massive-volume blood-replacement therapy. Drips in protocol concentrations are nitroglycerin, levophed, dobu-</Paragraph> </Section> </Section> <Section position="5" start_page="278" end_page="280" type="metho"> <SectionTitle> MAGIC </SectionTitle> <Paragraph position="0"> to the lexicon used by the lexical chooser to determine if such a choice exists. 
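The lexicon lookahead just described can be sketched as follows. This is an illustrative sketch, not MAGIC's implementation; the lexicon entries, predicate names, and data layout are invented:

```python
# Hypothetical sketch of the micro-planner's lexicon lookahead: before
# merging a proposition into the current sentence, check whether the
# lexicon offers a modifier (e.g., adjective) realization for it.
# All entries below are invented for illustration.

LEXICON = {
    # predicate: possible realizations, keyed by syntactic category
    "history=hypertension": {"adjective": "hypertensive",
                             "noun_phrase": "a history of hypertension"},
    "history=peptic_ulcers": {"noun_phrase": "a history of peptic ulcers"},
}

def modifier_realization(predicate):
    """Return an adjective realization if the lexicon has one, else None."""
    entry = LEXICON.get(predicate, {})
    return entry.get("adjective")

print(modifier_realization("history=hypertension"))   # an adjective exists
print(modifier_realization("history=peptic_ulcers"))  # only a noun phrase
```

A proposition with no modifier entry would be deferred to a later sentence, as in the peptic-ulcers example discussed below.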
The syntactic constraint is recorded in the intermediate form, but the lexical chooser may later decide to realize the proposition by any word of the same syntactic category, or to transform a modifier and a noun into a semantically equivalent noun or noun phrase.</Paragraph> <Paragraph position="1"> The micro-planner uses information from the lexicon to determine how to combine the propositions while satisfying grammatical and lexical constraints. Semantic aggregation is the first category of operators applied to the set of related propositions in order to produce concise expressions, as shown in the lower portion of Fig. 1. Using ontological and lexical information, it can reduce the number of propositions by replacing them with fewer propositions with equivalent meanings.</Paragraph> <Paragraph position="2"> When applying hypotactic aggregation operators, the system selects a current central proposition, searches through the un-aggregated propositions for those that can be realized as adjectives, prepositional phrases, or relative clauses, and merges them in. 
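The two aggregation stages described above can be sketched as follows, assuming a `can_be_modifier` predicate that encapsulates the lexicon lookahead. All function names and propositions here are hypothetical, not MAGIC's code:

```python
# Illustrative sketch (not MAGIC's implementation) of two-stage
# aggregation: hypotactic merging of propositions that can surface as
# modifiers of the central proposition, followed by paratactic handling
# of whatever remains.

def aggregate(central, others, can_be_modifier):
    """Attach modifier-realizable propositions to the central proposition;
    the remainder is left for coordination or a separate sentence."""
    modifiers, leftover = [], []
    for prop in others:
        (modifiers if can_be_modifier(prop) else leftover).append(prop)
    return {"head": central, "modifiers": modifiers, "coordinated": leftover}

props = ["age=80", "history=hypertension", "history=peptic_ulcers"]
plan = aggregate("patient=Jones", props,
                 can_be_modifier=lambda p: p in {"age=80", "history=hypertension"})
print(plan["modifiers"])    # merged hypotactically into the head
print(plan["coordinated"])  # realized paratactically, or in a new sentence
```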
After hypotactic aggregation, the remaining un-aggregated propositions are combined using paratactic operators, such as apposition or coordination.</Paragraph> <Paragraph position="3"> X is a patient.</Paragraph> <Paragraph position="4"> X has property last name = Jones.</Paragraph> <Paragraph position="5"> X has property age = 80 years old.</Paragraph> <Paragraph position="6"> X has property history = hypertension property.</Paragraph> <Paragraph position="7"> X has property history = diabetes property.</Paragraph> <Paragraph position="8"> X has property gender = female.</Paragraph> <Paragraph position="9"> X has property surgery = CABG.</Paragraph> <Paragraph position="10"> X has property doctor = Y.</Paragraph> <Paragraph position="11"> Y has property last name = Smith.</Paragraph> <Paragraph position="12"> In the first sentence of the example output in Figure 3, the micro-planner has combined the nine input propositions shown above into a single sentence: Ms. Jones is an 80 year old hypertensive, diabetic female patient of Dr. Smith undergoing CABG. This is possible in part because the patient's medical history (diabetes and hypertension) can be realized as adjectives. In another example, &quot;Mr. Smith is a 60 year old male patient of Dr. Jordan undergoing CABG.</Paragraph> <Paragraph position="13"> He has a medical history of transient ischemic attacks, pulmonary hypertension, and peptic ulcers.&quot;, the medical history can only be realized as noun phrases, thus requiring a second sentence and, necessarily, more words.</Paragraph> <Section position="1" start_page="279" end_page="279" type="sub_section"> <SectionTitle> 3.2 Speech Sentence Generator </SectionTitle> <Paragraph position="0"> The speech sentence generator also contributes to the goal of keeping spoken output brief but informative. 
In particular, through its lexical choice component, it selects references to medical concepts that are shorter and more colloquial than their text counterparts. As long as the text label on the screen is generated using the full, unambiguous reference, speech can use an abbreviated expression. For example, when referring to the devices that have been implanted, speech can use the term &quot;pacemaker&quot; so long as the textual label specifies it as &quot;ventricular pacemaker&quot;. Similarly, MAGIC uses &quot;balloon pump&quot; in speech instead of &quot;intra-aortic balloon pump&quot;, which is already shown on the screen.</Paragraph> <Paragraph position="1"> In order to do this, lexical choice in both media must be coordinated. Lexical choice for text always selects the full reference, but lexical choice for speech must check what expression the text generator is using. Basically, the speech lexical chooser must check what attributes the text generator includes in its reference and omit those.</Paragraph> <Paragraph position="2"> Finally, we suspect that the syntactic structure of sentences generated for spoken output should be simpler than that of written language.</Paragraph> <Paragraph position="3"> This hypothesis is in conflict with our criterion of generating as few sentences as possible, which often results in more complex sentences. This is acceptable in part because MAGIC's output is closer to formal speech, such as one might hear in a radio broadcast, than to informal conversation. It is, after all, a planned one-way presentation. In order to make the generated sentences more comprehensible, however, we have modified the lexical chooser and syntactic generator to produce pauses at complex constituents to increase the intelligibility of the output. 
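One way to realize this coordination can be sketched as follows, under the assumption that a reference is built from an attribute list plus a head noun; the function names and data representation are invented, not MAGIC's:

```python
# Hypothetical sketch of speech/text reference coordination: text always
# carries the full, unambiguous label, and the speech lexical chooser
# omits the attributes that the text label already shows on screen.

def text_reference(attributes, head):
    """Full reference for the on-screen text label."""
    return " ".join(attributes + [head])

def speech_reference(attributes, head, attrs_in_text_label):
    """Abbreviated spoken reference: drop attributes already on screen."""
    spoken = [a for a in attributes if a not in attrs_in_text_label]
    return " ".join(spoken + [head])

attrs, head = ["ventricular"], "pacemaker"
label = text_reference(attrs, head)               # full reference on screen
spoken = speech_reference(attrs, head, set(attrs))  # shortened for speech
print(label)
print(spoken)
```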
Currently, we use a pause prediction algorithm that draws on the sentence's semantic structure and syntactic structure, as well as a linear phrase-length constraint, to predict pause positions and relative strengths. Our current work involves modifying the FUF/SURGE language generation package so that it can produce the prosodic and pause information needed as input to a speech synthesizer, yielding a generic spoken-language sentence generator.</Paragraph> </Section> <Section position="2" start_page="279" end_page="280" type="sub_section"> <SectionTitle> 3.3 Producing Information for Media Coordination </SectionTitle> <Paragraph position="0"> Language generation in MAGIC is also affected by the fact that language is used in the context of other media. While specific modules in MAGIC are dedicated to coordinating multiple media, media coordination also affects the language generation process itself. In particular, in order to produce a coordinated presentation, MAGIC must temporally coordinate spoken language with animated graphics, both of which are temporal media. This means that spoken references must be coordinated with graphical references to the same information. Graphical references may include highlighting the portion of the illustration that refers to the same information as the speech, or the appearance of new information on the screen.</Paragraph> <Paragraph position="1"> Temporal coordination involves two problems: ensuring that the ordering of spoken references to information is compatible with the spatial ordering of the graphical actions, and synchronizing the durations of spoken and graphical references (Dalal et al., 1996b).</Paragraph> <Paragraph position="2"> In order to achieve this, language generation must provide a partial ordering of spoken references at a fairly early point in the generation process. 
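The phrase-length component of the pause placement strategy described above might be sketched as follows; the threshold, the pause marker, and the word-count length measure are assumptions for illustration, and the actual algorithm also exploits semantic and syntactic structure:

```python
# Rough sketch (assumptions, not the paper's algorithm) of length-based
# pause placement: walk the major syntactic constituents and insert a
# prosodic break whenever the running phrase grows too long in words.

MAX_PHRASE_WORDS = 6  # illustrative linear phrase-length constraint

def place_pauses(constituents):
    """constituents: a list of word lists, one per major syntactic unit.
    Returns the word stream with '#PAUSE#' markers at long-run breaks."""
    out, run = [], 0
    for words in constituents:
        out.extend(words)
        run += len(words)
        if run >= MAX_PHRASE_WORDS:  # break up an overly long stretch
            out.append("#PAUSE#")
            run = 0
    return out

stream = place_pauses([["the", "patient"], ["has", "received"],
                       ["massive", "vasotonic", "therapy"]])
print(stream)
```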
This ordering indicates the language components' preference for how spoken references should appear in the linear speech output. For example, in the first sentence of the example shown in Figure 3, the speech components prefer that medical history (i.e., &quot;hypertensive, diabetic&quot;) be presented before information about the surgeon, as this allows for more concise output. It would be possible to present the medical history after all other information in the sentence by generating a separate sentence (e.g., &quot;She has a history of hypertension and diabetes.&quot;), but this is less preferable from the language point of view. In our work, we have modified the structure of the lexical chooser so that it can record its decisions about ordering as a partial order, accommodating any grammatical variation that may occur later, when the final syntactic structure of the sentence is generated.</Paragraph> <Paragraph position="3"> These orderings are then sent to the media coordinator, which negotiates with graphics an ordering that is compatible with both media. Details on the implementation of this negotiation are presented in (Dalal et al., 1996b) and (Pan and McKeown, 1996).</Paragraph> <Paragraph position="4"> In order to synchronize the durations of the spoken and graphical references, the lexical chooser invokes the speech synthesizer to calculate the duration of each lexical phrase that it generates. By maintaining a correspondence between each generated referential string and the concept it refers to, negotiation with graphics has a common basis for communication. To provide for more flexible synchronization, the speech sentence generator includes facilities for modifying pauses if conflicts with graphics durations arise (see (Pan and McKeown, 1996) for details). 
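The duration-based synchronization just described could be sketched as follows; the duration table stands in for a real synthesizer query, and all names and numbers are invented for illustration:

```python
# Hypothetical sketch of duration synchronization: query a synthesizer
# for each phrase's spoken duration, and compute any extra pause needed
# so that speech spans the paired graphical action. The duration table
# below is invented; a real system would ask the synthesizer directly.

PHRASE_DURATIONS_MS = {"two IVs": 600, "an arterial line": 900}

def synth_duration(phrase):
    """Stand-in for invoking the speech synthesizer on one phrase."""
    return PHRASE_DURATIONS_MS.get(phrase, 500)

def pad_to_graphics(phrase, graphics_ms):
    """Extra pause (ms) needed so the speech covers the graphical action."""
    speech_ms = synth_duration(phrase)
    return max(0, graphics_ms - speech_ms)

print(pad_to_graphics("two IVs", 1000))          # speech shorter: pad 400 ms
print(pad_to_graphics("an arterial line", 800))  # speech longer: no padding
```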
</Paragraph> </Section> </Section> <Section position="6" start_page="280" end_page="280" type="metho"> <SectionTitle> 4 Related Work </SectionTitle> <Paragraph position="0"> There is considerable interest in producing fluent and concise sentences. EPICURE (Dale, 1992), PLANDOC (Kukich et al., 1994; Shaw, 1995), and the systems developed by Dalianis and Hovy (Dalianis and Hovy, 1993) all use various forms of conjunction and ellipsis to generate more concise sentences. In (Horacek, 1992), aggregation is performed at the text-structure level. In addition to conjoining VPs and NPs, FlowDoc (Passonneau et al., 1996) uses ontological generalization to combine descriptions of a set of objects into a more general description. Based on a corpus analysis in the basketball domain, Robin (1994) catalogued a set of revision operators, such as adjoin and nominalization, in his system STREAK. Unlike STREAK, MAGIC does not use revision to combine information in a sentence.</Paragraph> <Paragraph position="1"> Generating spoken language from meanings or concepts (Meaning to Speech, MTS) is a relatively new topic, and only a few such systems have been developed in recent years. Prevost (1995) and Steedman (1996) explore ways to generate spoken language with accurate contrastive stress based on information structure and carefully modeled domain knowledge. In (Davis and Hirschberg, 1988), spoken directions are generated with richer intonation features. Both of these systems take advantage of the richer and more precise semantic information that is available during Meaning-to-Speech production.</Paragraph> </Section> </Paper>