<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2027">
  <Title>Information structure and pauses in a corpus of spoken Danish</Title>
  <Section position="3" start_page="191" end_page="191" type="metho">
    <SectionTitle>
3 Pauses in earlier studies
</SectionTitle>
    <Paragraph position="0"> The material annotated so far already allows us to investigate whether there is a significant relation between pauses and information structure. Earlier studies (Jensen, 2005; Hansen et al., 1993) investigated the effect of syntactic boundaries (clausal as well as phrasal) on the placement of pauses in spoken Danish. The first study finds that more than 55% of the pauses co-occur with clause boundaries, 12% with phrase boundaries, and the remaining 33% occur within phrases or in conjunction with repairs, interjections and enumerations. It also notes that pauses falling within a syntactic phrase tend to be placed in the final part of the sentence. The second study confirms this observation by showing that 60% of the pauses that do not co-occur with syntactic boundaries occur within the last 40% of the sentence (measured in number of syllables). The authors of both investigations hypothesize that information structure may have an effect on the occurrence of pauses within clauses. However, the empirical material used in those works is not annotated for information structure, and therefore no conclusive claim could be made. In addition, the data used in Hansen et al. (1993) come from news reading, and are thus essentially written language, although delivered orally.</Paragraph>
  </Section>
  <Section position="4" start_page="191" end_page="192" type="metho">
    <SectionTitle>
4 Pauses and focusing in Danish
</SectionTitle>
    <Paragraph position="0"> The purpose of this pilot study is to determine, on the basis of the annotated DanPass corpus, (i) to what degree pauses tend to be associated with focus and topic, and (ii) where in the focus domain pauses tend to occur, in particular whether pauses are used to mark the left-hand focus boundary.</Paragraph>
    <Paragraph position="1"> Since we already know from the studies cited above that there is a strong tendency for pauses to coincide with clause boundaries, we decided to exclude pauses at clause boundaries from the study, and to look only at pauses that occur within clauses. So far, the investigation has been carried out for the network description part of the corpus, and only for the data produced by one of the coders.</Paragraph>
    <Paragraph position="2"> The first question, whether pauses relate to words coded as either focus or topic, was investigated by counting, out of a total of 3659 words, how many words tagged as either F or T, or bearing no tag, are preceded by a pause (silent or non-silent). The results, shown in Tables (2), seem to disconfirm the hypothesis that there is a correlation between pauses and information structure categories, or at least that such a correlation, if it exists, can be captured by the frequency with which pauses precede focus or topic words. In fact, over 65% of the intra-clausal pauses in the material precede untagged words, and the observed frequency of a pause before a focus or a topic word is lower than the baseline average of 28.34%.</Paragraph>
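The counting procedure described in the preceding paragraph can be illustrated with a minimal Python sketch. It is not the authors' script: the token format (word, tag, preceded-by-pause flag), the function name pause_statistics and the toy data are assumptions made for illustration, and the numbers it produces are unrelated to the DanPass figures.

```python
from collections import Counter


def pause_statistics(tokens):
    """Summarise pauses per information structure tag.

    tokens: iterable of (word, tag, preceded_by_pause), where tag is
    "F" (focus), "T" (topic) or None (untagged). Hypothetical format.
    Returns (rates, shares, baseline):
      rates    - per tag, proportion of words preceded by a pause
      shares   - per tag, proportion of all pauses that precede that tag
      baseline - proportion of all words preceded by a pause
    """
    totals = Counter()
    paused = Counter()
    for _word, tag, preceded_by_pause in tokens:
        key = tag or "untagged"
        totals[key] += 1
        if preceded_by_pause:
            paused[key] += 1

    n_words = sum(totals.values())
    n_pauses = sum(paused.values())
    rates = {key: paused[key] / totals[key] for key in totals}
    shares = {key: (paused[key] / n_pauses if n_pauses else 0.0) for key in totals}
    baseline = n_pauses / n_words if n_words else 0.0
    return rates, shares, baseline


# Invented toy data, not taken from the corpus.
tokens = [
    ("der", None, False),
    ("er", None, True),
    ("en", "F", False),
    ("rød", "F", True),
    ("firkant", "F", False),
    ("den", "T", False),
]
rates, shares, baseline = pause_statistics(tokens)
```

Comparing each entry of rates with baseline corresponds to the comparison against the 28.34% average, while shares corresponds to the statement about which category most intra-clausal pauses precede.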
    <Paragraph position="3"> Since we know that topics often occur sentence-initially, the results in the tables are misleading in that they take only intra-clausal pauses into consideration. We therefore also looked at the percentage of topic words that are followed rather than preceded by a pause, and found it to be 33.50%. This figure is interesting, but needs further investigation.</Paragraph>
    <Paragraph position="4"> Now we zoom in on the focus domain. First of all, we look at pause distribution across different part-of-speech categories, again by inspecting the pauses preceding words. Table (3) shows the frequency with which different part-of-speech categories occurring in the focus domain (i.e. tagged "F") are preceded by a pause. The total number of words considered is 1661.</Paragraph>
    <Paragraph position="5"> The interesting fact that emerges is that adjectives have a remarkably higher probability of being preceded by a pause than any other category, and also a clearly higher probability than the average 28.34%.</Paragraph>
    <Paragraph position="6"> We then looked at the first pause in the focus domain. The first pause falls before the first focus word in only 30% of the cases. In other words, it does not seem to mark the left-hand boundary of the focus domain. By running a decision tree generator (Witten and Eibe, 2005) on the data, we found that the strongest rule learnt by the system was one that places the first pause in the focus domain between a determiner and an adjective (2). Another rule predicts that a pause will fall between an adjective and a noun (3).</Paragraph>
    <Paragraph position="7">
      (2) tilbage er der... en/F + rød/F firkant/F
          'left there is... [F a PAUSE red square]'
      (3) til venstre... lægger du en/F rød/F + firkant/F
          'to the left... you put [F a red PAUSE square]'
      The two rules reflect a strong characteristic of the monologues under investigation, where the speakers have to draw the listener's attention to the various geometrical figures in the network they are describing. To tell the figures apart from each other, they use either the colour of the figure or its shape. In other words, the pauses occurring in the focus domain tend to precede the word that expresses what Dik (1989) calls selecting focus, here an adjective that, by defining a selecting property or type, helps distinguish the object in focus from other similar ones. From the point of view of accentuation, however, the adjective is not more prominent than the noun, and is therefore not annotated as the only word in focus.</Paragraph>
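The check on where the first pause falls in the focus domain can be sketched in Python as follows. This is an illustrative assumption rather than the procedure actually used: it presumes one whitespace-separated clause per string, a "/F" suffix on focus words and "+" for a pause, as in examples (2) and (3), and a single focus domain per clause. The function names are hypothetical, and the decision tree step is not reproduced.

```python
def first_pause_position(annotated):
    """Return the 0-based position, within the focus domain, of the first
    focus word that is immediately preceded by a pause.

    0 means the pause marks the left-hand focus boundary; None means that
    no pause precedes any focus word in the clause.
    """
    pause_pending = False
    focus_index = 0
    for token in annotated.split():
        if token == "+":
            pause_pending = True
            continue
        if token.endswith("/F"):
            if pause_pending:
                return focus_index
            focus_index += 1
        # A pause only counts for the word that directly follows it.
        pause_pending = False
    return None


def left_boundary_share(clauses):
    """Share of clauses whose first focus-domain pause sits at the left
    boundary, among clauses that have such a pause at all."""
    positions = [first_pause_position(clause) for clause in clauses]
    with_pause = [p for p in positions if p is not None]
    if not with_pause:
        return None
    return sum(1 for p in with_pause if p == 0) / len(with_pause)


clauses = [
    "tilbage er der en/F + rød/F firkant/F",        # pattern of example (2)
    "til venstre lægger du en/F rød/F + firkant/F",  # pattern of example (3)
    "der er + en/F rød/F firkant/F",                 # invented: pause at boundary
]
share = left_boundary_share(clauses)
```

On real data, the value returned by left_boundary_share would be the counterpart of the 30% figure reported above; the three clauses here are only meant to exercise the three possible pause positions.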
  </Section>
</Paper>