<?xml version="1.0" standalone="yes"?> <Paper uid="A00-3008"> <Title>Multiple Discourse Marker Occurrence: Creating Hierarchies for Natural Language Generation</Title> <Section position="3" start_page="0" end_page="41" type="metho"> <SectionTitle> 2 Defining Discourse Markers </SectionTitle> <Paragraph position="0"> Although precise definitions of discourse markers differ between studies, it is generally accepted that their role is to signal how one proposition should be interpreted given the other(s) in the discourse (Millis et al., 1995; Moore and Pollack, 1992). Most researchers in this field also agree that the relation between these propositions may exist regardless of whether a discourse marker is used (Scott and de Souza, 1990; Knott, 1995): a discourse marker is simply an explicit signal of a specific relation between two or more propositions. The absence of a marker does not mean that no discourse relation holds: (1) no marker, 1 relation: The museum does not intend to sponsor a particular aspect of modern art; it intends to make a report to the public by offering material for study and comparison.</Paragraph> <Paragraph position="1"> By the same token, the presence of more than one discourse marker does not always signal more than one relation: (2) 2 markers, 1 relation: The museum does not intend to sponsor a particular aspect of modern art, but rather to make a report to the public by offering material for study and comparison. (BNC) Previous studies have accounted for a wide range of phenomena, from choosing between similar discourse markers (Fraser, 1998; Sanders et al., 1992) to abstracting away from discourse markers and using syntax to signal underlying discourse relations (Delin et al., 1996). However, the issue of multiple markers, like those in the example above, is only now beginning to be addressed. 
Recent work in computational linguistics has provided possible solutions for the use of correlative markers (Webber and Joshi, 1998) and embedded clauses (Power et al., 1999). However, these solutions are incomplete, and further research is needed to account for all examples of multiple discourse markers.</Paragraph> </Section> <Section position="4" start_page="41" end_page="41" type="metho"> <SectionTitle> 3 Multiple Markers </SectionTitle> <Paragraph position="0"> The present project covers all cases of multiple discourse markers, that is, all cases where more than one marker occurs within two spans of text that are expressed either (a) within the same text sentence (Nunberg, 1990), covering one or more discourse relations (e.g., examples 3 and 4): (3) Having said that, if you weigh only 60 kg (132 lb) and yet still manage to sit your 90 kg (198 lb) opponent down with a solid thump to his mid-section, then the refereeing panel may well applaud your fervour with a full point. (BNC) (4) Since the question turns on the meaning of the word &quot;appropriate&quot; in section 1(1) of the Act of 1968, the problem is therefore one of statutory interpretation. (BNC) or (b) in different text sentences but covering only one relation, the so-called correlative markers (Quirk et al., 1985) (e.g., example 5): (5) The job of being an Acorn Project leader is an unenviable one. For a start, they don't get paid, though they do receive a petrol allowance; for another thing, it's a bit like being in a group of unruly children for the week... 
(BNC) The work described here focuses solely on multiple discourse markers cueing a single relation, paying attention, where possible, to embedded discourse relations and their markers.</Paragraph> </Section> <Section position="5" start_page="41" end_page="41" type="metho"> <SectionTitle> 4 Single Relations Multiple </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="41" end_page="41" type="sub_section"> <SectionTitle> Markers </SectionTitle> <Paragraph position="0"> Preliminary tests using the British National Corpus (BNC) and Knott's (1995) taxonomy of discourse markers suggested that the order of multiple markers cueing a single relation is affected by their position in the taxonomy: those higher in the taxonomy always precede those lower in the taxonomy (see figure 1 and examples 6 and 7): (6) This blood-line was particularly helpful to the early breeders because the line was in-bred, his parents being brother and sister of excellent breeding and so consequently true to type. (BNC) (7) The difficulty is that the sites which have been extensively excavated, and so produced the largest quantities of pottery, such as Corbridge and Newstead, are multi-period, and the stratification of the excavations early in the century, consequently suspect. (BNC) However, since Knott's taxonomy only represents hierarchies of markers for a single relation, it had to be extended to account for multiple markers. Using the BNC, a list of at least 350 English discourse markers and Mann and Thompson's (1988) original 23 rhetorical relations, we created a database recording the number and type of relations each marker can cue (see figure 2). From this database we built a hierarchy similar to Knott's (1995), but covering a wider range of markers and allowing a marker to express more than one relation, thus reducing the redundancy present in Knott's taxonomy. 
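The database and the hierarchy built from it can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Marker` record and the functions `hierarchy`, `candidates` and `order_markers` are hypothetical, and the relation sets are invented, except that 'but' cues four relations without constraint and 'notwithstanding that' cues only concession in formal or legal text, as described in the section on strong and weak markers. The sketch takes a marker's weakness to be the number of relations it can cue, groups markers into levels with the weakest at the top, filters candidates by relation and text style, and orders the chosen markers weakest first.

```python
from dataclasses import dataclass


# Hypothetical record for one database entry; not the authors' actual format.
@dataclass(frozen=True)
class Marker:
    form: str
    relations: frozenset  # relations this marker can cue
    styles: frozenset = frozenset()  # empty set means no stylistic constraint

    @property
    def weakness(self) -> int:
        # More relations cued means a weaker marker; exactly one means strong.
        return len(self.relations)


# Toy entries. Only the count of four relations for "but" and the formal/legal
# constraint on "notwithstanding that" follow the text; all relation names and
# the sets for "however" and "whereas" are invented for illustration.
DATABASE = [
    Marker("but", frozenset({"antithesis", "concession", "contrast", "otherwise"})),
    Marker("however", frozenset({"concession", "contrast"})),
    Marker("whereas", frozenset({"contrast"})),
    Marker("notwithstanding that", frozenset({"concession"}),
           frozenset({"formal", "legal"})),
]


def hierarchy(db):
    """Group marker forms into levels, weakest (most relations) at the top."""
    levels = {}
    for m in db:
        levels.setdefault(m.weakness, []).append(m.form)
    return [sorted(levels[w]) for w in sorted(levels, reverse=True)]


def candidates(db, relation, style):
    """Markers that can cue `relation` and whose style constraints admit `style`."""
    return [m for m in db
            if relation in m.relations and (not m.styles or style in m.styles)]


def order_markers(markers):
    """Weakest first, so weaker markers precede stronger ones (stable for ties)."""
    return sorted(markers, key=lambda m: m.weakness, reverse=True)


if __name__ == "__main__":
    print(hierarchy(DATABASE))
    # [['but'], ['however'], ['notwithstanding that', 'whereas']]
    by_form = {m.form: m for m in DATABASE}
    print([m.form for m in order_markers([by_form["whereas"], by_form["however"]])])
    # ['however', 'whereas']
```

With these toy entries, ordering 'whereas' and 'however' puts 'however' first, the same order as in example (9). A fuller generator would also weigh the other relations already present and the content of the related spans, which this sketch ignores.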
Furthermore, unlike Knott's study, in which the examples were invented, all examples of discourse marker usage in our database are taken from the British National Corpus (BNC). Our examples therefore come from real, naturally occurring texts and are representative of discourse marker occurrence in natural language.</Paragraph> </Section> </Section> <Section position="6" start_page="41" end_page="41" type="metho"> <SectionTitle> 5 Constructing the Hierarchy </SectionTitle> <Paragraph position="0"> Our hierarchies are constructed on the assumption that (a) some discourse markers may be used to cue more than one relation and (b) when more than one marker is needed, the number of relations a marker can cue will affect the choice and position of that marker. In our hierarchy, discourse markers that can cue many relations appear at the top, and those marking only a single relation appear at the bottom.</Paragraph> <Paragraph position="1"> Markers may also be subject to additional usage constraints depending on the text style, the other relations being marked simultaneously, and the content of the related propositions.</Paragraph> </Section> <Section position="7" start_page="41" end_page="44" type="metho"> <SectionTitle> 6 Strong ~ Weak Markers </SectionTitle> <Paragraph position="0"> Figure 3 is an example of our hierarchy for the family of contrastive relations. Here we see that 'but' can mark four discourse relations without constraint. [Figure: BNC examples of the marker 'so']</Paragraph> <Paragraph position="2"> When discourse markers can be used for a large number of relations, we refer to them as 'weak' markers, since there is only a weak correlation between the marker and the relation being signalled. In contrast, when a discourse marker can cue only a single relation, we refer to it as a 'strong' marker, since there is a strong correlation between the relation and the explicit lexical cue. In the hierarchy, 'notwithstanding that' is a highly constrained, strong discourse marker: it can mark only one relation (concession) and can occur only when the text is formal, legal or both.</Paragraph> <Paragraph position="3"> Our tests on the BNC show that the choice and placement of a marker are affected by its strength or weakness: the weakest markers always precede the stronger ones. We find that this rule applies not only to single relations cued by multiple markers: (8) [...] us by themselves, but yet (b) their presence in the skin can be deduced from sweat. (BNC) but also, to a certain extent, to embedded relations cued by two markers in the same text span. In the following example, there are two relations and two markers of contrast. 
The superordinate relation, marked by 'however', holds between proposition (a) and propositions (b) and (c), whilst the subordinate relation, marked by 'whereas', holds between propositions (b) and (c): (9) Indeed, (a) so strong have the differential views on advantageous locations become that one recent assessment of the total stock of foreign capital in developing countries suggests that it is less today than it was in 1900. However, whereas (b) the G-5 countries now account for 75 per cent of the world's FDI flow, (c) their position as the five major exporters is a much less concentrated 45 per cent. (BNC) In both examples, the weaker marker precedes the stronger one, and in neither case could the two be reversed and remain grammatical. Thus, by working through the hierarchy from the weakest to the strongest markers, a generation system can determine which discourse marker should occur in a particular position, on the basis that the weakest markers always precede the stronger ones. These decisions are based on the relation(s) to be marked, any other relation(s) already present, the style of the text, the content of the text spans, and the strength or weakness of any other discourse markers present.</Paragraph> </Section> </Paper>