<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0328">
  <Title>Toward a Multidimensional Framework to Guide the Automated Generation of Text Types</Title>
  <Section position="2" start_page="230" end_page="233" type="metho">
    <SectionTitle>
NEGOTIATION OF SPEECH ROLES, the other is concerned with the SPEECH MODALITIES. Figure 2
</SectionTitle>
    <Paragraph position="0"> contains some of these options in a systemic network.</Paragraph>
    <Paragraph position="1"> The mode of discourse. The mode of discourse has traditionally been seen as composed of selections from three simultaneous parameters: the LANGUAGE ROLE, the MEDIUM, and the CHANNEL OF DISCOURSE. The LANGUAGE ROLE is a continuum whose two ends are constitutive and ancillary language (the language in a face-to-face service encounter is ancillary, since it accompanies an activity and is not the sole meaningful activity, whereas the language of a physics research paper is constitutive, since the text creates the entire exchange). The MEDIUM OF DISCOURSE deals with the process of text creation, specifically with the degree to which that process is shared between the interlocutors. The CHANNEL OF DISCOURSE is the modality through which the language is received, typically including the options GRAPHIC and PHONIC. Early work on register (e.g., [Gregory &amp; Carroll 78]) often glossed medium as being congruent with the option between speaking and writing, but we can now go further and adopt more abstract characterizations as suggested by [Martin 92]. This is also necessary given the range of substantial empirical work (e.g., [Redeker 84, Biber 89] and others) showing that the spoken/written distinction per se is not a simple parameter. The lexicogrammatical consequences of the features shown in Figure 3 are discussed in [Martin 92].</Paragraph>
    <Paragraph position="3"> Figure 3: Mode systems: speaking and writing focus; Martin (1992)</Paragraph>
    <Paragraph position="4"> 3 Using the multidimensional analysis of texts for generation
As discussed in [Matthiessen 94], register can be interpreted (and therefore implemented in a sentence generator) in three ways:
* Probability variations of choices within systems: Each register imposes its idiosyncratic probability distribution upon the choice preferences within appropriate systems, so that while the grammar remains the same throughout, the generator's traversal of the grammar will vary according to registerial probabilities;
* Core system with extensions for variation: Each register adds some idiosyncratic systems at appropriate points of the grammar while leaving the remainder unchanged;
* Completely separated system networks: Each register has a distinct subgrammar, and no common core exists. This is the approach taken in [Patten 88, Bateman &amp; Paris 91]. In this sense, register-specific language is treated like a sublanguage [Kittredge &amp; Lehrberger 81].</Paragraph>
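    <Paragraph>The first interpretation can be illustrated with a minimal sketch, assuming a system is simply a named list of options and a register is a table of weights over those options. The system name MOOD-TYPE, the register names, and all weights below are hypothetical and chosen only for illustration; they are not taken from any actual grammar.

```python
import random

# Hypothetical system: one choice point in the grammar with its options.
SYSTEM_OPTIONS = {"MOOD-TYPE": ["imperative", "indicative"]}

# Hypothetical registerial weights; the numbers are illustrative only.
REGISTER_WEIGHTS = {
    "recipe": {"MOOD-TYPE": {"imperative": 0.9, "indicative": 0.1}},
    "research-paper": {"MOOD-TYPE": {"imperative": 0.05, "indicative": 0.95}},
}


def choose(system, register, rng):
    """Pick one option of `system`, weighted by the register's distribution."""
    options = SYSTEM_OPTIONS[system]
    weights = [REGISTER_WEIGHTS[register][system][opt] for opt in options]
    return rng.choices(options, weights=weights, k=1)[0]


# The grammar (SYSTEM_OPTIONS) is shared; only the weights differ by register.
rng = random.Random(0)
sample = [choose("MOOD-TYPE", "recipe", rng) for _ in range(1000)]
print(sample.count("imperative") / len(sample))
```

The grammar itself is the same for every register in this sketch; switching the register argument changes only how often each option is selected during traversal.</Paragraph>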
    <Section position="1" start_page="232" end_page="233" type="sub_section">
      <SectionTitle>
7th International Generation Workshop * Kennebunkport, Maine * June 21-24, 1994
</SectionTitle>
      <Paragraph position="0"> We follow the first approach. In this section, we outline a method of semi-automatically determining probability distributions for each register, taking as an example the instruction stage of a recipe:
Remove fruit and 2tbs of juice from the can, then discard the rest. Put all ingredients into a saucepan and slowly bring to the boil. When hot, pour into a food processor and process to a smooth sauce. For extra texture reserve 1-2 pieces of fruit, mash, then add this to the finished sauce. (SHE Magazine, June 1993)
What are the lexicogrammatical features that express the features of field, tenor, and mode? For fully worked out systems, tracing them through the labyrinthine networks is tedious at best. For partially worked out systems, the connections between the higher level networks such as field and the lower level networks of the grammar often do not exist, and so another method is required for identifying the register-determining features at the lower levels.</Paragraph>
      <Paragraph position="1"> One such method, suggested in [Bateman &amp; Paris 91], is to perform grammatical (and presumably lexical) analyses of sample texts by hand. While (as they nicely illustrate) this is possible for small samples, the problem of ensuring coverage and consistency for larger samples can quickly become daunting. For this reason, we propose a &quot;bottom-up&quot; abductive method, using the generator as a tool, that is considerably easier, since it is semi-automatic. The method involves the following steps:
1. For each sentence in the sample text type under consideration, create an input specification for the generator.</Paragraph>
      <Paragraph position="2"> 2. Run the generator on each input specification and check that the output sentences are correct. Collect the lexicogrammatical features for each sentence.</Paragraph>
      <Paragraph position="3"> 3. Classify the features for each sentence according to register type (field, tenor, or mode) and constituent type (clause complex, clause, noun phrase, lexical, etc.).</Paragraph>
      <Paragraph position="4"> 4. Count the number of times each feature appears in the whole test sample as a percentage of the number of times its constituent type appeared. For example, if the NP feature DETERMINED appears 9 times for 10 noun phrases in a sample, then we say the involvement of this feature is 90%. Graph or tabulate the distribution of feature involvement as number of features vs. percentile.</Paragraph>
      <Paragraph position="5"> 5. Through inspection of the resulting table, determine the register-determinate cutoff point, that is, the point after which features appear too seldom to be indicative of the text type. This point will appear at the 'knee' at which the curve begins to rise rapidly for small increases of involvement.</Paragraph>
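      <Paragraph>Steps 3 to 5 can be sketched as follows. This is a minimal illustration, assuming the generator's output has already been reduced to (constituent type, feature set) pairs; the function names, the data layout, and the PLURAL feature are hypothetical and do not reflect Penman's actual output format. The cutoff itself is still chosen by inspection, as in step 5.

```python
from collections import Counter


def involvement(constituents):
    """Map (constituent_type, feature) to the percentage of constituents
    of that type in which the feature appears.

    `constituents` is a list of (constituent_type, set_of_features) pairs,
    one pair per generated constituent.
    """
    type_counts = Counter(ctype for ctype, _ in constituents)
    feature_counts = Counter(
        (ctype, feat) for ctype, feats in constituents for feat in set(feats)
    )
    return {
        key: 100.0 * n / type_counts[key[0]]
        for key, n in feature_counts.items()
    }


def above_cutoff(inv, cutoff):
    """List the (type, feature) pairs whose involvement reaches the cutoff."""
    return sorted(key for key, pct in inv.items() if pct >= cutoff)


# The example from step 4: DETERMINED appears in 9 of 10 noun phrases.
# PLURAL is a made-up second feature, present in 2 of the 10.
sample = (
    [("np", {"DETERMINED", "PLURAL"})] * 2
    + [("np", {"DETERMINED"})] * 7
    + [("np", set())]
)
inv = involvement(sample)
print(inv[("np", "DETERMINED")])
print(above_cutoff(inv, 80.0))
```

On this toy sample the involvement of DETERMINED comes out as 90.0, matching the worked example in step 4, and only DETERMINED survives a cutoff of 80%.</Paragraph>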
      <Paragraph position="6"> We used the sentence generator Penman to generate the sentences in the sample text we selected, and collected the features it needed. The total number of features (including duplication) came to 543. Of these, 48 features appeared every time they could (i.e., were present every time a syntactic constituent of the appropriate type was generated: 10 at the clause complex level, 19 at the clause level, and 19 at the NP level). That is, 48 features had an involvement of 100%. We then graphed the distribution of feature involvements. Notwithstanding the small sample size, we found a striking regularity: the involvement distribution was bimodal, with some features appearing very often (over 80%) and almost all the remainder appearing infrequently (under 30% for the clause and NP levels, and under 60% for the clause complex level). That is,</Paragraph>
    </Section>
    <Section position="2" start_page="233" end_page="233" type="sub_section">
      <Paragraph position="0"> the middle range between 80% and 30% involvement contained significantly fewer features than either of the extremes. This we interpret as follows: when features appear often, they appear very often, and thus specify the genre characteristics. On the other hand, if features do not appear often, they appear seldom, only as needed to produce the particular clause(s) in which they appear. The degree to which features with high involvements appear can be thought of as the degree to which they co-specify the genre, and thus the &quot;strength&quot; of their propensity for selection during the text and sentence planning processes.</Paragraph>
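      <Paragraph>The bimodality observation can be checked mechanically by sorting involvement values into three bands. The sketch below assumes the 80% and 30% boundaries mentioned in the text as defaults; the function names and the list of involvement values are hypothetical and serve only to show the shape of the check.

```python
def band(percentage, high=80.0, low=30.0):
    """Assign an involvement percentage to the high, mid-range, or low band."""
    if percentage >= high:
        return "high"
    if low >= percentage:
        return "low"
    return "mid-range"


def band_counts(involvements, high=80.0, low=30.0):
    """Count how many involvement values fall into each band."""
    counts = {"high": 0, "mid-range": 0, "low": 0}
    for pct in involvements:
        counts[band(pct, high, low)] += 1
    return counts


# Made-up involvement values, one per feature, for illustration only.
hypothetical = [100, 100, 95, 85, 55, 20, 10, 5, 5]
counts = band_counts(hypothetical)

# The distribution is treated as bimodal here when the middle band holds
# fewer features than either extreme.
is_bimodal = min(counts["high"], counts["low"]) > counts["mid-range"]
print(counts, is_bimodal)
```

For the clause complex level, where the text reports the lower group as falling under 60%, the same functions can be called with low=60.0.</Paragraph>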
      <Paragraph position="1"> The following table summarizes the results (full information appears in the long version of this paper).</Paragraph>
      <Paragraph position="2">
% feature    | Clause-complex level   | Clause level           | NP level
involvement  | features | % of total  | features | % of total  | features | % of total
100%         | 10       | 62.5%       | 19       | 15.4%       | 19       | 21.3%
&gt;=80%        | 10       | 62.5%       | 34       | 27.6%       | 43       | 48.3%
mid-range    | 6        | 37.5%       | 24       | 19.5%       | 12       | 13.5%
&lt;=30%        | 0        | 0%          | 65       | 52.8%       | 34       | 38.2%
A look at the genre-defining clause level features may prove instructive; as expected from looking at the text, features such as IMPERATIVE, IMPERATIVE-INTERACTANT, and NONFINITIVE-VOICE</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="233" end_page="234" type="metho">
    <SectionTitle>
4 Conclusion
</SectionTitle>
    <Paragraph position="0"> The abductive method for text characterization presented here has several advantages, in our opinion. An important advantage is that it focuses human effort not on text analysis (which is difficult and prone to error and inconsistency) but rather on generator input creation (which can easily be checked). Also, the graphed distribution of feature involvements provides an immediate visual clue as to which features are indeed register-determinate and to what degree they are so. In turn, this allows the register-grammarian to express grammar decision rules (or system network options, in the case of SFL) in terms of probabilities with some empirical confidence. Another benefit is that the method assists with text type characterization, by pointing out (through dramatically lower involvement values) when different text types or stages are mixed.</Paragraph>
  </Section>
</Paper>