<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2011"> <Title>Towards an Optimal Lexicalization in a Natural-Sounding Portable Natural Language Generator for Dialog Systems</Title> <Section position="3" start_page="61" end_page="64" type="intro"> <SectionTitle> 2 System Architecture </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="61" end_page="61" type="sub_section"> <SectionTitle> 2.1 Three-Stage Pipeline Architecture </SectionTitle> <Paragraph position="0"> Our natural language generator follows the three-stage pipeline architecture described in Reiter & Dale (2000). In this architecture, the generation component of a text generation system consists of the following subcomponents: * The document planner determines, on an abstract level, what the actual content of the output will be and decides how pieces of content should be grouped together.</Paragraph> <Paragraph position="1"> * The microplanner carries out the lexicalization, aggregation, and referring expression generation tasks.</Paragraph> <Paragraph position="2"> * The surface realizer takes the information constructed by the microplanner and generates a syntactically correct sentence in a natural language.</Paragraph> </Section> <Section position="2" start_page="61" end_page="61" type="sub_section"> <SectionTitle> 2.2 Lexical Resources </SectionTitle> <Paragraph position="0"> The use of FrameNet and WordNet in our system is critical to its success. The FrameNet database (Baker et al., 1998) is a machine-readable lexicographic database, available at http://framenet.icsi.berkeley.edu/. It is based on the principles of Frame Semantics (Fillmore, 1985).</Paragraph> <Paragraph position="1"> The following quote explains the idea behind Frame Semantics: &quot;The central idea of Frame Semantics is that word meanings must be described in relation to semantic frames - schematic representations of the conceptual structures and patterns of beliefs, practices, institutions, images, etc. that provide a foundation for meaningful interaction in a given speech community.&quot; (Fillmore et al., 2003, p. 235). In FrameNet, lexical units are grouped into frames; for each frame, FrameNet provides frame hierarchy information together with a list of semantically annotated corpus sentences and syntactic valence patterns.</Paragraph> <Paragraph position="2"> WordNet is a lexical database that uses conceptual-semantic and lexical relations to group lexical items and link them to other groups (Fellbaum, 1998).</Paragraph> </Section> <Section position="3" start_page="61" end_page="62" type="sub_section"> <SectionTitle> 2.3 System Overview </SectionTitle> <Paragraph position="0"> Our system, called LEGEND (LExicalization in natural language GENeration for Dialog systems), adapts the pipeline architecture presented in section 2.1 by replacing the document planner with the dialog manager. This makes it more suitable for use in dialog systems, where it is the dialog manager that decides on the actual content of the output. Figure 1 below shows an overview of the system. As figure 1 shows, the dialog manager provides the generator with a dialog manager meaning representation (DM MR), which contains the content information for the answer.</Paragraph> <Paragraph position="1"> Our research focuses on the lexicalization subcomponent of the microplanner (number 1 in figure 1). Lexicalization is further divided into two processes: lexical choice and lexical search; a schematic sketch of this division is given below.
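To make the division of labor concrete, the following is a minimal, hypothetical skeleton of the adapted pipeline in Python (the language of the prototype mentioned in section 2.4). The function names and placeholder bodies are illustrative only and are not LEGEND's actual code.

    def lexical_choice(dm_mr):
        """Mine FrameNet and WordNet for a set of output candidates (placeholder)."""
        return [dm_mr]

    def lexical_search(candidates):
        """Choose the most natural-sounding candidate from the set (placeholder)."""
        return candidates[0]

    def microplanner(dm_mr):
        """Lexicalization, split into lexical choice and lexical search.
        Aggregation and referring expression generation would follow here."""
        return lexical_search(lexical_choice(dm_mr))

    def surface_realizer(f_structure):
        """Stand-in for the XLE generator component, which takes an F-structure."""
        return str(f_structure)

    def generate(dm_mr):
        """DM MR from the dialog manager in, natural-language answer out."""
        return surface_realizer(microplanner(dm_mr))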
Based on the DM MR, the lexical choice process (number 2 in figure 1) constructs a set of all potential output candidates. Section 2.5 describes the lexical choice process in detail. Lexical search (number 3 in figure 1) consists of the decision algorithm that decides which candidate from this set is most appropriate in a given situation. Lexical search is also responsible for packaging up the most appropriate candidate information in an adapted F-structure, which is subsequently processed through aggregation and referring expression generation, and finally sent to the surface realizer. Section 2.6 describes the details of the lexical search process.</Paragraph> </Section> <Section position="4" start_page="62" end_page="62" type="sub_section"> <SectionTitle> 2.4 Implementation Details </SectionTitle> <Paragraph position="0"> Given time and resource constraints, our implementation will consist of a prototype (written in Python) of only the lexical choice and lexical search processes of the microplanner. We take a DM MR as our input. Aggregation and referring expression generation requirements are hard-coded for each example; algorithm identification, development, and implementation for these modules are beyond the scope of this research.</Paragraph> <Paragraph position="1"> Our system uses the LFG-based XLE system's generator component as a surface realizer. For more information, refer to Shemtov (1997) and Kaplan & Wedekind (2000).</Paragraph> </Section> <Section position="5" start_page="62" end_page="63" type="sub_section"> <SectionTitle> 2.5 Lexical Choice </SectionTitle> <Paragraph position="0"> The task of the lexical choice process is to take the meaning representation presented by the dialog manager (refer to figure 1) and to construct a set of output candidates. We will illustrate this by taking a simple example through the entire dialog system. The example question and answer are deliberately kept simple in order to focus on the workings of the system rather than the specifics of the example.</Paragraph> <Paragraph position="1"> Assume this is a dialog system that helps the consumer buy camping equipment. The user says to the dialog system: &quot;Where can I buy a tent?&quot; The speech recognizer recognizes the utterance and feeds this information to the parser. The semantic parser parses the input and builds the meaning representation shown in figure 2. The main event (main verb) is identified as the lexical item buy. The parser looks up this lexical item in FrameNet and identifies it as belonging to the commerce_buy frame. This frame is defined in FrameNet as &quot;... describing a basic commercial transaction involving a buyer and a seller exchanging ...&quot;. This meaning representation is then sent to the dialog manager. The dialog manager consults the domain model for help in the query resolution, and subsequently composes a meaning representation consisting of the answer to the user's question (figure 3). For our example, the domain model presents the query resolution as &quot;Camping World&quot;, the name of a (fictitious) store selling tents. The DM MR also shows that the Agent and the Patient have been identified by their frame element names.</Paragraph> <Paragraph position="2"> This DM MR serves as the input to the microplanner, where the first task is that of lexical choice. In order to construct the set of output candidates, the lexical choice process mines the FrameNet and WordNet databases to find acceptable generation possibilities.
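For concreteness, the DM MR of figure 3 might be rendered roughly as the following Python structure. This is an illustrative reconstruction based on the description above; the attribute names and the assignment of &quot;Camping World&quot; to the Seller element are assumptions, not taken verbatim from the figure.

    dm_mr = {
        "frame": "commerce_buy",        # frame evoked by the main event, via FrameNet
        "elements": {                   # frame elements filled in by the dialog manager
            "Buyer": "user",            # the Agent, identified by its frame element name
            "Goods": "tent",            # the Patient, identified by its frame element name
            "Seller": "Camping World",  # query resolution supplied by the domain model
        },
    }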
Constructing this set is done in several steps: * In step 1, lexicalization variations of the main Event within the same frame are identified. * Step 2 consists of the investigation of lexical variation in the frames that are one link away in the hierarchy, namely the frame the current frame inherits from, and the subframes, if any exist.</Paragraph> <Paragraph position="3"> * Step 3 is concerned with special relations within FrameNet, such as the 'use' relation; the lexical variation within the frames linked by these relations is investigated.</Paragraph> <Paragraph position="4"> We return to our example in figure 3 to clarify these three steps.</Paragraph> <Paragraph position="5"> In step 1, appropriate lexical variation within the same frame is identified. This is done by listing all lexical units of the same syntactic category as the original word. The following verbs are lexical units in commerce_buy: buy, lease, purchase, rent.</Paragraph> <Paragraph position="6"> These verbs are not necessarily synonyms or near-synonyms of each other, but they do belong to the same frame. In order to determine which of these lexical items are synonyms or near-synonyms, we turn to WordNet and look at the entry for buy. The only lexical item that is also listed in one of the senses of buy is purchase. We thus conclude that buy and purchase are both good verb candidates.</Paragraph> <Paragraph position="7"> Step 2 investigates the lexical items in the frames that are one link away from the commerce_buy frame. Commerce_buy inherits from getting and has no subframes. The lexical items of the getting frame are: acquire, gain, get, obtain, secure. For each entry, WordNet is consulted as a first pruning mechanism. This results in the following: * Acquire: get * Gain: acquire, win * Get: acquire * Obtain: get, find, receive, incur * Secure: no items on the list. How exactly lexical choice determines that get and acquire are possible candidates, while the others are not (because they are not suitable in the context in which we use them), is as yet an open issue. It is also an open issue whether WordNet is the most appropriate resource for this goal; we must consider other options, such as a thesaurus.</Paragraph> <Paragraph position="8"> In step 3 we investigate the other relations that FrameNet presents. To date, we have only investigated the 'use' relation. Other relations available are the inchoative and causative relations. At this point, it is not yet clear whether those relations will prove to be of value to our task. The commerce_buy frame uses commerce_goods_transfer, which is also used by commerce_sell. We find our frame elements Goods and Buyer in the commerce_sell frame as well.</Paragraph> <Paragraph position="9"> Lexical choice concludes that the lexical items in this frame might be valuable and repeats step 1 on them.</Paragraph> <Paragraph position="10"> After all three steps are completed, we assume our set of output candidates to be complete. The set of output candidates is presented to the lexical search process, whose task it is to choose the most appropriate candidate. For the example we have been using throughout this section, this yields a set of five output candidates.
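As an illustration of step 1 (and of the WordNet pruning also used in step 2), the following sketch checks which lexical units of a frame WordNet also lists among the senses of the original verb. The frame-to-lexical-unit table is hard-coded from the verbs cited above; an actual implementation would query the FrameNet database instead. The sketch assumes NLTK with the WordNet corpus installed.

    from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

    # Lexical units (verbs) per frame, copied from the text; a real system
    # would read these from the FrameNet database.
    FRAME_VERBS = {
        "commerce_buy": ["buy", "lease", "purchase", "rent"],
        "getting": ["acquire", "gain", "get", "obtain", "secure"],
    }

    def wordnet_lemmas(verb):
        """All lemma names occurring in any verb sense of the given verb."""
        return {lemma.name()
                for synset in wn.synsets(verb, pos=wn.VERB)
                for lemma in synset.lemmas()}

    def prune_with_wordnet(original_verb, frame):
        """Keep the frame's lexical units that WordNet also lists for the original verb."""
        lemmas = wordnet_lemmas(original_verb)
        return [verb for verb in FRAME_VERBS[frame] if verb in lemmas]

    # For the running example this keeps 'buy' and 'purchase' from commerce_buy.
    print(prune_with_wordnet("buy", "commerce_buy"))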
As mentioned at the beginning of this section, this example is very simple. For this reason, one can argue that the first four output possibilities could be constructed in much simpler ways than the method used here, e.g. by simply taking the question and turning it into an affirmative sentence through a simple rule. However, it should be pointed out that the last possibility on the list would not be covered by this simple method.</Paragraph> <Paragraph position="11"> While user studies would be needed to back up this assumption, we feel that possibility 5 is a very good example of natural-sounding output and thus demonstrates the value of our method, even for simple examples.</Paragraph> </Section> <Section position="6" start_page="63" end_page="64" type="sub_section"> <SectionTitle> 2.6 Lexical Search </SectionTitle> <Paragraph position="0"> The set of output candidates for the example above contains five possibilities. The main task of the lexical search process is to choose the optimal candidate, that is, the most natural-sounding one (or at least one of the most natural-sounding candidates, if more than one candidate fits that criterion). There are a number of directions we can take for this implementation.</Paragraph> <Paragraph position="1"> One option is to implement a rule-based system.</Paragraph> <Paragraph position="2"> Every output candidate is matched against the rules, and the most appropriate one comes out at the top. The problems with rule-based systems are well known: they must be handcrafted, which is very time-consuming; constructing the rule base so that the desired rules fire in the desired circumstances is something of a &quot;black art&quot;; and a rule base is highly domain-dependent.</Paragraph> <Paragraph position="3"> Extending and maintaining such a rule base is also laborious.</Paragraph> <Paragraph position="4"> Next we can look at a corpus-based technique.</Paragraph> <Paragraph position="5"> One suggestion is to construct a language model of the corpus data and use this model to statistically determine the most suitable candidate. Langkilde (2000) uses this approach. However, the main problem here is that one needs a large corpus in the domain of the application. Rambow (2001) agrees that, most often, no suitable corpora are available for dialog system development.</Paragraph> <Paragraph position="6"> Another possibility is to use machine learning to train the microplanner. Walker et al. (2002) use this approach in the SPOT sentence planner. Their ranker's main purpose is to choose between different aggregation possibilities. The authors suggest that many generation problems can successfully be treated as ranking problems. The advantage of this approach is that no domain-dependent handcrafted rules need to be constructed and no corpus is required.</Paragraph> <Paragraph position="7"> Our current research idea is somewhat related to the second option. A relatively small domain-independent corpus of spoken dialogue is semi-automatically labeled with frames and semantic roles. For each frame, all the occurrences in the corpus are ordered according to their frequency for each separate valence pattern. This model is then used as a comparator for all output candidates, and the optimal one (the most frequent one) is selected.</Paragraph> <Paragraph position="8"> This approach is not yet implemented; further work is needed to determine its viability. A sketch of such a frequency-based comparator is given below.</Paragraph>
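The following is a minimal, hypothetical sketch of such a comparator. The Candidate structure, the valence pattern notation, and the counts are invented for illustration, standing in for a model derived from a semantically labeled dialog corpus.

    from collections import Counter
    from dataclasses import dataclass

    @dataclass
    class Candidate:
        text: str
        frame: str
        valence_pattern: str  # e.g. "NP.Ext V NP.Obj"

    # Toy frequency model: (frame, valence pattern) -> count in the labeled corpus.
    # Unseen combinations simply count as zero.
    PATTERN_COUNTS = Counter({
        ("commerce_sell", "NP.Ext V NP.Obj"): 17,
        ("commerce_buy", "NP.Ext V NP.Obj PP[at].Dep"): 9,
    })

    def lexical_search(candidates):
        """Select the candidate whose (frame, valence pattern) is most frequent."""
        return max(candidates,
                   key=lambda c: PATTERN_COUNTS[(c.frame, c.valence_pattern)])

    candidates = [
        Candidate("You can buy a tent at Camping World.", "commerce_buy",
                  "NP.Ext V NP.Obj PP[at].Dep"),
        Candidate("Camping World sells tents.", "commerce_sell",
                  "NP.Ext V NP.Obj"),
    ]
    print(lexical_search(candidates).text)   # -> "Camping World sells tents."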
<Paragraph position="9"> Independent of the method used to find the most suitable candidate, the output must be packaged up to be sent to the surface realizer. The XLE system expects a fairly detailed syntactic description of the utterance's argument structure. We construct this using FrameNet and its valence pattern information. Returning to our example, let us assume the selected candidate is &quot;Camping World sells tents.&quot; Its meaning representation is shown in figure 4 (&quot;Camping World sells tents.&quot;). FrameNet provides an overview of the frame elements a given frame requires (&quot;core elements&quot;) and those that are optional (&quot;peripheral elements&quot;). For the commerce_sell frame, the two core elements are Goods and Seller. FrameNet also provides an overview of the valence patterns that were found in the annotated sentences for this frame. FrameNet does not include frequency information for each annotation, so we essentially need to pick a valence pattern arbitrarily. One way of doing this is to find a pattern that includes all (here, both) frame elements in our utterance, and then use the (non-statistical) frequency information over the annotated sentences. Figure 5 shows that, for our example above, this results in the valence pattern FE_Seller sell FE_Goods. Thus our output to the surface realizer indicates that the Seller frame element fills the subject role and consists of an NP, while the Goods frame element fills the object role and consists of an NP. Given this syntactic pattern information gathered from FrameNet, we are able to construct an F-structure that is suitable as input to the surface realizer.</Paragraph> </Section> </Section> </Paper>