<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1064">
  <Title>Text Generation from Keywords Kiyotaka Uchimoto, Satoshi Sekine</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Overview of the Text-Generation System
</SectionTitle>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <Paragraph position="0"> In this section, we give an overview of our system for generating text sentences from given keywords. As shown in Fig. 1, this system consists of three parts: generation-rule acquisition,  tem.</Paragraph>
      <Paragraph position="1"> Given keywords, text sentences are generated as follows.</Paragraph>
      <Paragraph position="2">  1. During generation-rule acquisition, generation rules for each keyword are automatically acquired.</Paragraph>
      <Paragraph position="3"> 2. Candidate-text sentences are constructed during candidate-text construction by applying the rules acquired in the first step. Each candidate-text sentence is represented by a graph or dependency tree. 3. Candidate-text sentences are ranked according to their scores assigned during evaluation. The scores are calculated as a probability estimated by using a keyword-production model and a language model that are trained with a corpus.</Paragraph>
      <Paragraph position="4"> 4. The candidate-text sentence that maxi null mizes the score or the candidate-text sentences whose scores are over a threshold are selected as output. The system can also output candidate-text sentences that are ranked within the top N sentences.</Paragraph>
      <Paragraph position="5"> In this paper, we assume that the target language is Japanese. We define a keyword as the headword of a bunsetsu.Abunsetsu is a phrasal unit that usually consists of several content and function words. We define the headword of a bunsetsu as the rightmost content word in the bunsetsu, and we define a content word as a word whose part-of-speech is a verb, adjective, noun, demonstrative, adverb, conjunction, attribute, interjection, or undefined word. We define the other words as function words. We define formal nouns and auxiliary verbs &amp;quot;SURU (do)&amp;quot; and &amp;quot;NARU (become)&amp;quot; as function words, except when there are no other content words in the same bunsetsu. Part-of-speech categories follow those in the Kyoto University text corpus (Version 3.0) (Kurohashi and Nagao, 1997), a tagged corpus of the Mainichi newspaper.</Paragraph>
      <Paragraph position="6">  words.</Paragraph>
      <Paragraph position="7"> For example, given the set of keywords &amp;quot;kanojo (she),&amp;quot; &amp;quot;ie (house),&amp;quot; and &amp;quot;iku (go),&amp;quot; as shown in Fig. 2, our system retrieves sentences including each word, and extracts each bunsetsu that includes each word as a headword of the bunsetsu. If there is no tagged corpus such as the Kyoto University text corpus, each bunsetsu can be extracted by using a morphologicalanalysis system and a dependency-analysis system such as JUMAN (Kurohashi and Nagao, 1999) and KNP (Kurohashi, 1998). Our system then acquires generation rules as follows.</Paragraph>
      <Paragraph position="8">  The system next generates candidate bunsetsus for each keyword and candidate-text sentences in the form of dependency trees, such as &amp;quot;Candidate 1&amp;quot; and &amp;quot;Candidate 2&amp;quot; in Fig. 2, with the assumption that there are dependencies between keywords. Finally, the candidate-text sentences are ranked by their scores, calculated by a text-generation model, and transformed into surface sentences.</Paragraph>
      <Paragraph position="9"> In this paper, we focus on the keyword-production model represented by Eq. (4) and assume that our system outputs sentences in the form of dependency trees.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="3" type="metho">
    <SectionTitle>
3 Candidate-Text Construction
</SectionTitle>
    <Paragraph position="0"> We automatically acquire generation rules from a monolingual target corpus at the time of generating candidate-text sentences. Generation rules are restricted to those that generate bunsetsus, and the generated bunsetsus must include each input keyword as a headword in the bunsetsu. We then generate candidate-text sentences in the form of dependency trees by simply combining the bunsetsus generated by the rules.</Paragraph>
    <Paragraph position="1"> The simple combination of generated bunsetsus may produce semantically or grammatically inappropriate candidate-text sentences, but our goal in this work was to generate a variety of text sentences rather than a few fixed expressions with high precision</Paragraph>
    <Paragraph position="3"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Generation-Rule Acquisition
</SectionTitle>
      <Paragraph position="0"> Let us denote a set of keywords as KS, and a set of rules, each of which generates a bunsetsu when given keyword k (∈ KS), as R_k. Each rule takes the form
k → h_k m*,   (5)
where h_k represents the head morpheme whose word is equal to keyword k, and m* represents zero, one, or a series of morphemes that are connected to h_k in the same bunsetsu. Here, we define a morpheme as consisting of a word and its morphological information or grammatical attribute, such as part-of-speech, and we define a head morpheme as consisting of a headword and its grammatical attribute. By applying these rules, we generate bunsetsus from input keywords.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="3" type="sub_section">
      <SectionTitle>
3.2 Construction of Dependency Trees
</SectionTitle>
      <Paragraph position="0"/>
      <Paragraph position="2"> setsus are generated by applying the generation rules described in Section 3.1. Next, by assuming dependency relationships between the bunsetsus, candidate dependency trees are constructed. Dependencies between the bunsetsus are restricted in that they must have the following characteristics of Japanese dependencies:  Note that 83.33% (3,973/4,768) of the headwords in the newspaper articles appearing on January 17, 1995 were found in those appearing from January 1st to 16th. However, only 21.82% (2,295/10,517) of the headword dependencies in the newspaper articles appearing on January 17th were found in those appearing from January 1st to 16th.</Paragraph>
      <Paragraph position="3">  (i) Dependencies are directed from left to right.</Paragraph>
      <Paragraph position="4"> (ii) Dependencies do not cross.</Paragraph>
      <Paragraph position="5"> (iii) All bunsetsus except the rightmost one de- null pend on only one other bunsetsu.</Paragraph>
      <Paragraph position="6"> For example, when three keywords are given and candidate bunsetsus including each keyword are generated as b  ) if we do not reorder keywords, but 16 trees result if we consider the order of keywords to be arbitrary.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="3" end_page="3" type="metho">
    <SectionTitle>
4 Text-Generation Model
</SectionTitle>
    <Paragraph position="0"> We next describe the model represented by Eq.</Paragraph>
    <Paragraph position="1"> (4); that is, a keyword-production model, a morpheme model that estimates how likely a string is to be a morpheme, and a dependency model. The goal of this model is to select optimal sets of morphemes and dependencies that can generate natural sentences. We implemented these models within an maximum entropy framework (Berger et al., 1996; Ristad, 1997; Ristad, 1998).</Paragraph>
    <Section position="1" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
4.1 Keyword-Production Models
</SectionTitle>
      <Paragraph position="0"> This section describes five keyword-production models which are represented by P(K|M,D,T) in Eq. (4). In these models, we define the set of headwords whose frequency in the corpus is over a certain threshold as a set of keywords, KS, and we restrict the bunsetsus to those generated by the generation rules represented in form (5).</Paragraph>
      <Paragraph position="1"> We assume that all keywords are independent and that k</Paragraph>
      <Paragraph position="3"> depends only on the two anterior words w</Paragraph>
      <Paragraph position="5"> 2. posterior trigram model We assume that k</Paragraph>
      <Paragraph position="7"> depends only on the two posterior words w</Paragraph>
      <Paragraph position="9"/>
      <Paragraph position="11"/>
      <Paragraph position="13"> 5. dependency trigram model We assume that k</Paragraph>
      <Paragraph position="15"> depends only on the two rightmost words w</Paragraph>
      <Paragraph position="17"> in the right-most bunsetsu that modifies the bunsetsu, and on the two rightmost words w</Paragraph>
      <Paragraph position="19"> in the leftmost bunsetsu that modifies the bunsetsu including k</Paragraph>
      <Paragraph position="21"/>
    </Section>
    <Section position="2" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
4.2 Morpheme Model
</SectionTitle>
      <Paragraph position="0"> Let us assume that there are l grammatical attributes assigned to morphemes. We call a model that estimates the likelihood that a given string is a morpheme and has the grammatical attribute j(1 [?] j [?] l)amorpheme model.</Paragraph>
      <Paragraph position="1"> Let us also assume that morphemes in the ordered set of morphemes M depend on the preceding morphemes. We can then represent the probability of M,giventextT;namely,P(M|T) in Eq. (4):</Paragraph>
      <Paragraph position="3"> can be one of the grammatical attributes assigned to each morpheme.</Paragraph>
    </Section>
    <Section position="3" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
4.3 Dependency Model
</SectionTitle>
      <Paragraph position="0"> Let us assume that dependencies d</Paragraph>
      <Paragraph position="2"> in the ordered set of dependencies D are independent. We can then represent P(D|M,T)in Eq. (4) as</Paragraph>
      <Paragraph position="4"/>
    </Section>
  </Section>
</Paper>