<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1064">
  <Title>Text Generation from Keywords. Kiyotaka Uchimoto, Satoshi Sekine</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Text generation is an important technique used for applications like machine translation, summarization, and human/computer dialogue. In recent years, many corpora have become available, and have been used to generate natural surface sentences. For example, corpora have been used to generate sentences for language model estimation in statistical machine translation. In such translation, given a source language text, S, the translated text, T,inthe target language that maximizes the probability P(T|S) is selected as the most appropriate translation, T best , which is represented as (Brown et al., 1990)</Paragraph>
    <Paragraph position="2"> In this equation, P(S|T) represents the model used to replace words or phrases in a source language with those in the target language. It is called a translation model. P(T)representsa language model that is used to reorder translated words or phrases into a natural order in the target language. The input of the language model is a &amp;quot;bag of words,&amp;quot; and the goal of the model is basically to reorder the words. At this point, there is an assumption that natural sentences can be generated by merely reordering the words given by a translation model. To give such a complete set of words, however, a translation model needs a large number of bilingual corpora. If we could automatically complement the words needed to generate natural sentences, we would not have to collect the large number of bilingual corpora required by a translation model. In this paper, we assume that the role of the translation model is not to give a complete set of words that can be used to generate natural sentences, but to give a set of headwords or center words that a speaker might want to express, and describe a model that can provide the complementary information needed to generate natural sentences by using a target language corpus when given a set of headwords.</Paragraph>
    <Paragraph position="3"> If we denote a set of headwords in a target language as K, we can express Eq. (1) as</Paragraph>
    <Paragraph position="5"> that gives a set of headwords in the target language when given a source-language text sentence. P(T|K) represents a model that generates text sentence T when given a set of headwords, K. We call the model represented by P(T|K)atext-generation model.Inthispaper, we describe a text-generation model and a generation system that uses the model. Given a set of headwords or keywords, our system outputs the text sentence that maximizes P(T|K)asan appropriate text sentence, T best</Paragraph>
    <Paragraph position="7"> In this equation, we call the model represented by P(K|T)akeyword-production model.This equation is equal to Eq. (1) when a source-text sentence is replaced with a set of keywords. Therefore, this model can be regarded as a model that translates keywords into text sentences. The model represented by P(T)in Eq. (3) is a language model used in statistical machine translation. The n-gram model is the most popular one used as a language model.</Paragraph>
    <Paragraph position="8"> We assume that there is one extremely probable ordered set of morphemes and dependencies between words that produce keywords, and we express P(K|T)as</Paragraph>
    <Paragraph position="10"> In this equation, M denotes an ordered set of morphemes and D denotes an ordered set of dependencies in a sentence. P(K|M,D,T)represents a keyword-production model. To estimate the models represented by P(D|M,T) and P(M|T), we use a dependency model and a morpheme model, respectively, for the dependency analysis and morphological analysis.</Paragraph>
    <Paragraph position="11"> Statistical machine translation and example-based machine translation require numerous high-quality bilingual corpora. Interlingual machine translation and transfer-based machine translation require a parser with high precision.</Paragraph>
    <Paragraph position="12"> Therefore, these approaches to translation are not practical if we do not have enough bilingual corpora or a good parser. This is especially so if the source text-sentences are incomplete or have errors like those often found in OCR and speech-recognition output. In these cases, however, if we translate headwords into words in the target language and generate sentences from the translated words by using our method, we should be able to generate natural sentences from which we can grasp the meaning of the source-text sentences. null The text-generation model represented by P(T|K) in Eq. (2) can be applied to various tasks besides machine translation.</Paragraph>
    <Paragraph position="13"> * Sentence-generation support system for people with aphasia: About 300,000 people are reported to suffer from aphasia in Japan, and 40% of them can select only a few words to describe a picture. If candidate sentences can be generated from these few words, it would help these people communicate with their families and friends.</Paragraph>
    <Paragraph position="14"> * Support system for second language writing: Beginners writing in second language usually fined it easy to produce center words or headwords, but often have difficulty generating complete sentences. If several possible sentences could be generated from those words, it would help beginners communicate with foreigners or study second-language writing.</Paragraph>
    <Paragraph position="15"> These are just two examples. We believe that there are many other possible applications.</Paragraph>
  </Section>
class="xml-element"></Paper>