<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1039">
  <Title>A Fast and Portable Realizer for Text Generation Systems</Title>
  <Section position="4" start_page="265" end_page="265" type="metho">
    <SectionTitle>
* The Surface-Syntactic Component linearizes the
</SectionTitle>
    <Paragraph position="0"> nodes of the SSyntS, which yields the deep-morphological structure, or DMorphS. It draws on the SSynt grammar, which states rules of linear precedence according to arc labels.</Paragraph>
  </Section>
  <Section position="5" start_page="265" end_page="266" type="metho">
    <SectionTitle>
* The Deep-Morphological Component inflects the
</SectionTitle>
    <Paragraph position="0"> items of the DMorphS, yielding the Surface-Morphological Structure (SMorphS). It draws on information from the lexicon, as well as on a default inflection mechanism (currently hard-coded in C++).</Paragraph>
    <Paragraph position="1"> * The Graphical Component adds abstract punctuation and formatting instructions to the SMorphS (including &quot;point absorption&quot;; see (White, 1995)), yielding the Deep-Graphical Structure (DGraphS).</Paragraph>
    <Paragraph position="2"> * Ad-hoc formatters transform the DGraphS into formatting instructions for the targeted output medium. Currently, REALPRO supports ASCII, HTML, and RTF output.</Paragraph>
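    <Paragraph position="3"> The linearization step performed by the Surface-Syntactic Component can be illustrated with a minimal sketch. It assumes a toy precedence table that maps each arc label to a position before or after the governor. The labels (det, mod, subj, obj, adv), the table, and the Node class are hypothetical and are not taken from REALPRO's SSynt grammar.

```python
from dataclasses import dataclass, field

# Hypothetical precedence table: does a dependent reached through this
# arc label go before or after its governor? Not REALPRO's real grammar.
PRECEDENCE = {
    "det": "before",
    "mod": "before",
    "subj": "before",
    "obj": "after",
    "adv": "after",
}


@dataclass
class Node:
    lexeme: str
    # Each dependent is an (arc_label, Node) pair, kept in input order.
    deps: list = field(default_factory=list)


def linearize(node):
    """Return the lexemes of the subtree rooted at node in surface order."""
    before, after = [], []
    for label, child in node.deps:
        # Unknown labels default to "after" in this sketch.
        side = PRECEDENCE.get(label, "after")
        target = before if side == "before" else after
        target.extend(linearize(child))
    return before + [node.lexeme] + after


tree = Node("likes", [
    ("subj", Node("Mary")),
    ("obj", Node("wine", [("mod", Node("red"))])),
])
print(" ".join(linearize(tree)))
```

A real precedence grammar would also have to order several dependents that fall on the same side of the governor; the sketch simply keeps them in input order.</Paragraph>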
  </Section>
  <Section position="6" start_page="266" end_page="266" type="metho">
    <SectionTitle>
4 Linguistic Knowledge Bases
</SectionTitle>
    <Paragraph position="0"> As mentioned in Section 3, REALPRO is configured by specifying several LKBs. The system comes with LKBs for English; French is currently under development. Normally, the user need not change the two grammar LKBs (the DSynt and SSynt grammars), unless the grammar of the target sublanguage is not a subset of English (or French). However, the user may want to extend the lexicon if a lexeme with irregular morphology is not in it yet. (Recall that not all words in the input representation need be in the lexicon.) For example, in order to generate saw (rather than the default seed) for the past tense of to see, the following entry would be added to the</Paragraph>
    <Paragraph position="2"> The user may also want to change the defaults.</Paragraph>
    <Paragraph position="3"> For example, if all sentences in the user's application must be in the past tense, the user can set the default tense to past rather than present as follows:
DEFAULT: verb [ tense:past mood:ind ]
5 Coverage of the English Grammar
The English grammar currently covers a wide range of syntactic phenomena:
* Full range of verbal forms (such as compound tenses, aspects, passive voice, and so on), including negation and questions, as well as subject-verb agreement.
* Coordination of both nouns and clauses.</Paragraph>
    <Paragraph position="4"> * Relative clauses (both on subject and object).
* Default word order; certain word order variations (including so-called &quot;topicalization&quot;, i.e. fronting of adjuncts or non-subject complements) are controlled through features.</Paragraph>
    <Paragraph position="5"> * Full English morphology, including a full range of pronominal forms (personal pronouns, possessive pronouns, relative pronouns).</Paragraph>
    <Paragraph position="6"> * Full range of punctuation, such as commas around descriptive relative clauses.</Paragraph>
    <Paragraph position="7"> Most of these points are illustrated by the input in Figure 2. Phenomena currently not handled automatically include certain types of &quot;fancy syntax&quot; such as clefts and it-clefts (though these can be generated by specifying the surface structure in the input), as well as long-distance dependencies such as &quot;These are books which I think you should buy&quot; (where &quot;which&quot; is an argument of &quot;buy&quot;).</Paragraph>
  </Section>
  <Section position="7" start_page="266" end_page="266" type="metho">
    <SectionTitle>
6 Interfaces
</SectionTitle>
    <Paragraph position="0"> REALPRO is currently distributed with a socket interface which allows it to be run as a standalone server. It has an application programming interface (API), available in C++ and Java, which can be used to integrate REALPRO in applications. For training, debugging, and demonstration purposes, REALPRO can also be used in interactive mode to realize sentences from ASCII files containing syntactic specifications. The following ASCII-based specification corresponds to the DSyntS of sentence (2):  In this definition, parentheses () are used to specify the scope of dependency, while square brackets [] are used to specify features associated with a lexeme.</Paragraph>
    <Paragraph position="1"> REALPRO can output text formatted as ASCII, HTML, or RTF. In addition, it can output an ASCII representation of the DGraphS that a user application can format in application-specific ways.</Paragraph>
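    <Paragraph position="2"> To make the parenthesis and bracket notation concrete, the sketch below serializes a small dependency tree so that parentheses delimit the scope of dependency and square brackets hold a lexeme's features. The exact layout, the DNode class, and the relation names I, II, and ATTR are assumptions made for illustration; this is not claimed to be REALPRO's actual input syntax.

```python
from dataclasses import dataclass, field


@dataclass
class DNode:
    lexeme: str
    features: dict = field(default_factory=dict)
    # Each dependent is a (relation, DNode) pair.
    deps: list = field(default_factory=list)


def serialize(node):
    """Render a node as: lexeme [ features ] ( relation dependent ... )."""
    feats = " ".join(f"{key}:{value}" for key, value in node.features.items())
    text = f"{node.lexeme} [ {feats} ]" if feats else f"{node.lexeme} [ ]"
    if node.deps:
        inner = " ".join(f"{rel} {serialize(child)}" for rel, child in node.deps)
        text += f" ( {inner} )"
    return text


tree = DNode("like", {"tense": "past"}, [
    ("I", DNode("Mary")),
    ("II", DNode("wine", deps=[("ATTR", DNode("red"))])),
])
print(serialize(tree))
```

Because each node carries its own bracketed feature list and its own parenthesized dependents, the nesting of the text mirrors the nesting of the tree.</Paragraph>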
  </Section>
  <Section position="8" start_page="266" end_page="267" type="metho">
    <SectionTitle>
7 System Performance
</SectionTitle>
    <Paragraph position="0"> The following table shows the runtime for sentences of different lengths. These sentences are all of the form &quot;This small girl often claims that that boy often claims that Mary likes red wine&quot;, where the middle clause &quot;that that boy often claims&quot; is iterated for the longer sentences. The row labeled &quot;Length&quot; refers to the length of the output string in words. Note that the number of output words is equal to the number of nodes in the SSyntS (because it is a dependency tree), and furthermore the number of nodes in the SSyntS is greater than or equal to the number of nodes in the DSyntS. (In our case, the number of nodes in the input DSyntS is equal to the number of words in the output string.) The row labeled &quot;Sec&quot; represents average execution time (over several test runs) for the sentence of the given input length, in seconds, on a PC with a 150MHz Pentium processor and 32 MB of RAM.</Paragraph>
    <Paragraph position="1">
Length | 5   | 10  | 15  | 20  | 30  | 40  | 50
Sec    | .11 | .17 | .20 | .28 | .44 | .58 | .72

We also tested the system on the syntactically rather varied and complex input of Figure 2 (which is made up of 20 words). The average runtime for this input is 0.31 seconds, which is comparable to the runtime reported above for the 20-word sentence. We conclude that the uniformity of the syntactic constructions found in the sentences used in the above test sequence does not influence the results.

The complexity of the generation algorithm derives primarily from the tree traversals which must be performed twice, when passing from DSyntS to SSyntS, and from SSyntS to the DMorphS. Let n be the length of the output string (and hence an upper bound on the size of both DSyntS and SSyntS).</Paragraph>
    <Paragraph position="2"> At each node, each rule in the appropriate grammar (deep- or surface-syntactic) must be checked against the subtree rooted at that node. This tree matching is in the general case exponential in n. However, in fact it is dependent on two variables, the maximal size of grammar rules in the grammar (or n, whichever is greater), and the branching factor (maximum number of daughter nodes for a node) of the input representation. Presumably because of deeper facts about language, the grammar rules are quite small. The current grammar does not have any rules with more than three nodes. This reduces the tree matching algorithm to polynomial in n. Furthermore, while the branching factor of the input tree can in theory be n - 1, in practice it will be much smaller. For example, all the input trees used in the tests discussed above have branching factors of no more than 5. We thus obtain de facto linear performance, which is reflected in the numbers given above.</Paragraph>
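    <Paragraph position="3"> As a rough check on the claim of de facto linear performance, the sketch below fits a straight line to the reported runtimes by ordinary least squares. It assumes the sentence lengths in the table read 5, 10, 15, 20, 30, 40, and 50 words; the function name least_squares is illustrative only.

```python
# Runtimes from the table above; lengths are in output words.
lengths = [5, 10, 15, 20, 30, 40, 50]
seconds = [0.11, 0.17, 0.20, 0.28, 0.44, 0.58, 0.72]


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(lengths, seconds)
print(f"{slope * 1000:.1f} ms per word, {intercept:.3f} s fixed cost")

# Largest deviation of any measurement from the fitted line, in seconds.
worst = max(abs(y - (slope * x + intercept)) for x, y in zip(lengths, seconds))
print(f"largest residual: {worst:.3f} s")
```

Under that reading of the table, the fit comes out at roughly 14 ms per additional word with a small fixed cost, and the residuals are small relative to the measured times, which is consistent with the linear behaviour described above.</Paragraph>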
  </Section>
</Paper>