File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/i05-5012_metho.xml
Size: 15,748 bytes
Last Modified: 2025-10-06 14:09:42
<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-5012">
<Title>Information and Communication Technologies</Title>
<Section position="3" start_page="88" end_page="89" type="metho">
<SectionTitle> 2 An Overview of our Approach to Statistical Sentence Generation </SectionTitle>
<Section position="1" start_page="88" end_page="88" type="sub_section">
<Paragraph position="0"> One could characterise the search space as being a series of nested sets. The outermost set would contain all possible word sequences. Within this, a smaller set of strings exhibiting some semblance of grammaticality might be found, though many of these might be gibberish. Further nested sets are those that are grammatical, and, within those, the set of paraphrases that are entailed by the input text.</Paragraph>
<Paragraph position="1"> However, given that we limit ourselves to statistical techniques and avoid symbolic logic, we cannot make any claim of strict entailment.</Paragraph>
</Section>
<Section position="2" start_page="88" end_page="89" type="sub_section">
<SectionTitle> Original Text </SectionTitle>
<Paragraph position="0"> A military transporter was scheduled to take off in the afternoon from Yokota air base on the outskirts of Tokyo and fly to Osaka with 37,000 blankets.</Paragraph>
<Paragraph position="1"> Mondale said the United States, which has been flying in blankets and is sending a team of quake relief experts, was prepared to do more if Japan requested.</Paragraph>
<Paragraph position="2"> United States forces based in Japan will take blankets to help earthquake survivors Thursday, in the U.S. military's first disaster relief operation in Japan since it set up bases here.</Paragraph>
<Paragraph position="3"> Our approach with Dependencies:
6: united states forces based in blankets
8: united states which has been flying in blankets
11: a military transporter was prepared to osaka with 37,000 blankets
18: mondale said the afternoon from yokota air base on the united states which has been flying in blankets
20: mondale said the outskirts of tokyo and is sending a military transporter was prepared to osaka with 37,000 blankets
23: united states forces based in the afternoon from yokota air base on the outskirts of tokyo and fly to osaka with 37,000 blankets
27: mondale said the afternoon from yokota air base on the outskirts of tokyo and is sending a military transporter was prepared to osaka with 37,000 blankets
29: united states which has been flying in the afternoon from yokota air base on the outskirts of tokyo and is sending a team of quake relief operation in blankets
31: united states which has been flying in the afternoon from yokota air base on the outskirts of tokyo and is sending a military transporter was prepared to osaka with 37,000 blankets
34: mondale said the afternoon from yokota air base on the united states which has been flying in the outskirts of tokyo and is sending a military transporter was prepared to osaka with 37,000 blankets
36: united states which has been flying in japan will take off in the afternoon from yokota air base on the outskirts of tokyo and is sending a military transporter was prepared to osaka with 37,000 blankets
Figure 2: Example output. Sentences are prefixed by their length.</Paragraph>
<Paragraph position="4"> We thus propose an intermediate set of sentences which conserve the content of the source text without necessarily being entailed.
These are referred to as the set of verisimilitudes, of which properly entailed sentences are a subset. The aim of our choice of features and of our algorithm extension is to reduce the search space from gibberish strings to the set of verisimilitudes. While generating verisimilitudes is our end goal, in this paper we are concerned principally with the generation of grammatical sentences.</Paragraph>
<Paragraph position="5"> To do so, the extension adds an extra feature propagation mechanism to the Viterbi algorithm such that features are passed along a word sequence path in the search space whenever a new word is appended to it. Propagated features are used to influence the choice of subsequent words suitable for appending to a partially generated sentence. In our case, the feature is a dependency structure of the word sequence corresponding to the search path. Our present dependency representation is based on that of Kittredge and Mel'čuk (1983); however, it contains only the head and modifier of a relation, ignoring relationship labels for the present.</Paragraph>
<Paragraph position="6"> Algorithmically, after appending a word to a path, a dependency structure of the partially generated string is obtained probabilistically. Along with bigram information, the long-distance context provided by the dependency heads of the preceding word sequence is useful in generating better sentences, as it filters out words that might, at a particular position in the string, lead to a spurious dependency relation in the final sentence. Example output is presented in Figure 2.</Paragraph>
<Paragraph position="7"> As the dependency "parsing" mechanism is linear and is embedded within the Viterbi algorithm, the result remains a polynomial-time algorithm.</Paragraph>
<Paragraph position="8"> By examining surface-syntactic dependency structure at each step in the search, the resulting sentences are likely to be more grammatical. This marriage of models has been tested successfully in other fields such as speech recognition (Chelba and Jelinek, 1998). Although it is an impoverished representation of semantics, considering dependency features in our application context may also serendipitously assist verisimilitude generation.</Paragraph>
</Section>
</Section>
<Section position="4" start_page="89" end_page="91" type="metho">
<SectionTitle> 3 The Extended Viterbi Algorithm: Propagating Dependency Structure </SectionTitle>
<Section position="1" start_page="89" end_page="89" type="sub_section">
<Paragraph position="0"> In this section, we present an overview of the main features of our algorithm extension. We direct the interested reader to our technical paper (Wan et al., 2005) for full details.</Paragraph>
<Paragraph position="1"> The Viterbi algorithm (for a comprehensive overview, see Manning and Schütze (1999)) is used to search for the best path across a network of nodes, where each node represents a word in the vocabulary. The best sentence is a string of words, each one emitted by the corresponding visited node on the path.</Paragraph>
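As a rough illustration of the extension described above, the following sketch shows a Viterbi-style search over a word network in which each partial path carries a propagated feature (a placeholder for the head stack introduced below). It is a minimal sketch under assumed interfaces, not the authors' implementation: score_transition and update_feature are hypothetical stand-ins for the combined bigram/dependency model and the head-stack update, and the beam pruning is an approximation rather than the exact dynamic program.

    import math

    # Viterbi-style generation with per-path feature propagation (sketch).
    # score_transition(prev_word, word, feature) -> probability of appending word
    # update_feature(feature, words, word)       -> feature for the extended path
    def generate(vocab, max_len, score_transition, update_feature,
                 start="<s>", end="</s>", beam=10):
        # Each path is (log score, word sequence, propagated feature).
        paths = [(0.0, [start], None)]
        finished = []
        for _ in range(max_len):
            candidates = []
            for log_score, words, feature in paths:
                for word in list(vocab) + [end]:
                    p = score_transition(words[-1], word, feature)
                    if p <= 0.0:
                        continue  # prune words with no supporting evidence
                    new_feature = update_feature(feature, words, word)
                    new_path = (log_score + math.log(p), words + [word], new_feature)
                    (finished if word == end else candidates).append(new_path)
            # Keep only the most probable partial paths; because the propagated
            # feature breaks the simple Markov assumption, this beam is an
            # approximation of the full search.
            paths = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam]
            if not paths:
                break
        return max(finished, key=lambda c: c[0], default=None)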
<Paragraph position="2"> Arcs between nodes are weighted using a combination of two pieces of information: a bigram probability corresponding to that pair of words, and a probability corresponding to the likelihood of a dependency relation between that pair of words. Specifically, the transition probability defining these weights is the average of the dependency transition probability and the bigram probability.</Paragraph>
<Paragraph position="3"> To simplify matters in this evaluation, we assume that the emission probability is always one. The emission probability is interpreted as being a Content Selection mechanism that chooses words that are likely to be in a summary. Thus, in this paper, each word has an equally likely chance of being selected for the sentence. The second function, the dependency transition probability, is the focus of this paper and is discussed in Section 3.1.</Paragraph>
<Paragraph position="4"> In the remaining subsections, we present an example-based discussion of how dependency-based transitions are used, and a discussion of how the dependency structure of the unfolding path is maintained and propagated within the search process.</Paragraph>
</Section>
<Section position="2" start_page="89" end_page="90" type="sub_section">
<SectionTitle> 3.1 Word Selection Using Dependency Transitions </SectionTitle>
<Paragraph position="0"> Given two input sentences, "The relief workers distributed food to the hungry." and "The UN workers requested medicine and blankets.", the task is to generate a single sentence that contains material from these two sentences. As in (Barzilay et al., 1999), we assume that the sentences stem from the same event and that references can thus be fused together.</Paragraph>
<Paragraph position="1"> Imagine also that bigram frequencies have been collected from a relevant UN Humanitarian corpus. Figure 3 presents bigram probabilities and two sample paths through the lattice; the probabilities are assumed to be taken from a relevant corpus, so that the bigram probability of distributed blankets is not zero. The path could follow one of two forks after encountering the word distributed, since the corpus may have examples of the word pairs distributed food and distributed blankets. Since both food and blankets can reach the end-of-sentence state, both might conceivably be generated by considering just n-grams. However, only one is consistent with the input text.</Paragraph>
<Paragraph position="2"> To encourage the generation of verisimilitudes, we check for a dependency relation between blankets and distributed in the input sentence. As no evidence is found, we score this transition with a low weight. In contrast, there is evidence for the alternative path, since the input text does contain a dependency relation between food and distributed.</Paragraph>
<Paragraph position="3"> In reality, multiple words might still conceivably be modified by future words, not just the immediately preceding word. In this example, distributed is the root of a dependency tree structure representing the preceding string. However, any node along the rightmost root-to-leaf branch of the dependency tree (that represents the partially generated string) could be modified. This dependency structure is determined statistically using a probabilistic model of dependency relations.</Paragraph>
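To make the transition weighting concrete, the sketch below shows one way the averaged bigram/dependency score might be computed, with the dependency probability estimated by simple relative frequency in the spirit of Collins (1996). The counters, function names, and unsmoothed estimates are illustrative assumptions, not the exact formulation used in the paper.

    from collections import Counter

    # Hypothetical counts collected from the parsed input text / corpus.
    bigram_counts = Counter()    # (prev_word, word) -> frequency
    unigram_counts = Counter()   # word -> frequency
    dep_counts = Counter()       # (head, modifier) -> frequency (unlabelled)
    cooccur_counts = Counter()   # (word_a, word_b) -> co-occurrence frequency

    def bigram_prob(prev, word):
        # Maximum-likelihood bigram estimate (no smoothing, for illustration).
        return bigram_counts[(prev, word)] / unigram_counts[prev] if unigram_counts[prev] else 0.0

    def dependency_prob(head, modifier):
        # Relative frequency of a dependency holding between the pair, given
        # that the pair co-occurs -- loosely in the spirit of Collins (1996).
        seen = cooccur_counts[(head, modifier)]
        return dep_counts[(head, modifier)] / seen if seen else 0.0

    def transition_weight(prev, word, head_candidates):
        # The transition weight is the average of the bigram probability and
        # the best dependency transition probability over the words the new
        # word could still attach to; the emission probability is fixed at 1.
        best_dep = max((dependency_prob(h, word) for h in head_candidates), default=0.0)
        return 0.5 * (bigram_prob(prev, word) + best_dep)

For the fork in Figure 3, both distributed food and distributed blankets may have non-zero bigram probabilities, but only the pair (distributed, food) has dependency evidence in the input text, so the path through food receives the higher combined weight.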
<Paragraph position="4"> To represent the rightmost branch, we use a stack data structure (referred to as the head stack) whereby older stack items correspond to nodes closer to the root of the dependency tree. The probability of the dependency-based transition is estimated using a function that is inspired by, and closely resembles, the probabilistic functions in (Collins, 1996).</Paragraph>
<Paragraph position="5"> After selecting and appending a new word, we update this representation containing the governing words of the extended string that can yet be modified. The new path is then annotated with this updated stack.</Paragraph>
</Section>
<Section position="3" start_page="90" end_page="91" type="sub_section">
<SectionTitle> 3.2 Maintaining the Head Stack </SectionTitle>
<Paragraph position="0"> There are three possible outcomes of the head stack update mechanism. Given a head stack representing the dependency structure of the partially generated sentence and a new word to append to the search path, the first possibility is that the new word has no dependency relation to any of the existing stack items, in which case we simply push the new word onto the stack. For the second and third cases, we check each item on the stack and keep a record only of the best probable dependency between the new word and the appropriate stack item. The second outcome, then, is that the new word is the head of some item on the stack; all items up to and including that stack item are popped off, and the new word is pushed on. The third outcome is that the new word modifies some item on the stack; all stack items up to (but not including) that item are popped off, and the new word is pushed on.</Paragraph>
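The three outcomes can be summarised in a short sketch; this is an illustrative rendering under assumed helper functions (prob_head_of and prob_modifier_of stand in for the dependency model), not the authors' code.

    # Head-stack update (sketch): stack[0] is closest to the root of the
    # dependency tree; stack[-1] is the most recently pushed governing word.
    def update_head_stack(stack, new_word, prob_head_of, prob_modifier_of):
        best = None  # (probability, index, relation) of the best stack item
        for i, item in enumerate(stack):
            for p, rel in ((prob_head_of(new_word, item), "head"),
                           (prob_modifier_of(new_word, item), "modifier")):
                if p > 0.0 and (best is None or p > best[0]):
                    best = (p, i, rel)

        stack = list(stack)  # copy: competing paths must not share stacks
        if best is None:
            # Outcome 1: no dependency evidence -- just push the new word.
            pass
        elif best[2] == "head":
            # Outcome 2: the new word is the head of stack[i]; pop everything
            # up to and including that item, then push the new word.
            stack = stack[:best[1]]
        else:
            # Outcome 3: the new word modifies stack[i]; pop the items above
            # it (but not the item itself), then push the new word.
            stack = stack[:best[1] + 1]
        stack.append(new_word)
        return stack

In the walkthrough that follows, for example, appending workers pops the, UN and relief (outcome 2), leaving [workers] on the stack, while appending food after distributed leaves distributed in place beneath food (outcome 3).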
<Paragraph position="1"> We now step through the generation of the sentence "The UN relief workers distributed food to the hungry", which is produced by the exploration of one path in the search process. Figure 4 shows how the head stack mechanism updates and propagates the stack of governing words as we append words to the path to produce this string.</Paragraph>
<Paragraph position="2"> We first append the determiner the to the new string and push it onto the empty stack. As dictated by a high n-gram probability, the word UN follows. However, there is no evidence of a relation with the preceding word, so we simply push it onto the stack. Similarly, relief is appended and also pushed onto the stack.</Paragraph>
<Paragraph position="3"> When we encounter the word workers, we find evidence that it governs each of the preceding three words. The modifiers are popped off and workers is pushed on. Skipping ahead, the transition distributed food has a high bigram probability, and evidence for a dependency relation exists. This results in a strong overall path probability, as opposed to the alternative fork in Figure 3. Since distributed can still be modified by future words, it is not popped off. The word food is pushed onto the stack as it too can still be modified.</Paragraph>
<Paragraph position="4"> The sentence could end there. Since we multiply path, transition and emission probabilities together, longer sentences will have a lower probability and will be penalised. However, we can choose to continue the generation process to produce a longer sentence. The word to modifies distributed. To prevent crossing dependencies, food is popped off the stack before pushing to. Appending the rest of the words is straightforward.</Paragraph>
</Section>
</Section>
<Section position="5" start_page="91" end_page="91" type="metho">
<SectionTitle> 4 Related Work </SectionTitle>
<Paragraph position="0"> In recent years, there has been a steady stream of research in statistical text generation (see Langkilde and Knight (1998), and Bangalore and Rambow (2000)). These approaches begin with a representation of sentence semantics that has been produced by a content planning stage. Competing realisations of the semantic representation are ranked using an n-gram model. Our approach differs in that we do not start with a semantic representation. Rather, we paraphrase the original text, searching for the best word sequence and dependency tree structure concurrently.</Paragraph>
<Paragraph position="1"> Summarization researchers have also studied the problem of generating non-verbatim sentences: see (Jing and McKeown, 1999), (Barzilay et al., 1999) and, more recently, (Daumé III and Marcu, 2004). Jing uses an HMM for learning alignments between summary and source sentences. Daumé III and Marcu also provide a mechanism for sub-sentential alignment but allow for alignments between multiple sentences. Both approaches provide models for later recombining sentence fragments. Our work differs primarily in granularity. Using words as a basic unit potentially offers greater flexibility in pseudo-paraphrase generation; however, like any approach that recombines text fragments, it incurs additional problems in ensuring that the generated sentence reflects the information in the input text.</Paragraph>
<Paragraph position="2"> In work describing summarisation as translation, Knight and Marcu (2002) also combine syntax models to help rank the space of possible candidate translations. Their work differs primarily in that they search over a space of trees representing the candidate translations, whereas we search over a space of word sequences which are annotated with corresponding trees.</Paragraph>
</Section>
</Paper>