File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/04/c04-1020_metho.xml

Size: 7,683 bytes

Last Modified: 2025-10-06 14:08:43

<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1020">
  <Title>Representing discourse coherence: A corpus-based analysis</Title>
  <Section position="3" start_page="1" end_page="2" type="metho">
    <SectionTitle>
4 Statistics
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
4.1 Crossed dependencies
</SectionTitle>
      <Paragraph position="0"> An important question is how frequent the phenomena discussed in the previous sections are.</Paragraph>
      <Paragraph position="1"> The more frequent they are, the more urgent the need for a data structure that can adequately represent them.</Paragraph>
      <Paragraph position="2"> This section reports counts of crossed dependencies in the annotated database of 135 texts. To track the frequency of crossed dependencies in the coherence structure graph of each text, we counted the minimum number of arcs that would have to be deleted to make the graph free of crossed dependencies (i.e. the minimum number of arcs that participate in crossed dependencies). The example graph in Figure 10 illustrates this process. This graph contains the following crossed dependencies: {1, 3} crosses with {0, 2} and {2, 4}. By deleting {1, 3}, both crossed dependencies can be eliminated. The crossed dependency count for the graph in Figure 5 is thus &quot;one&quot;. On average for the 135 annotated texts, 12.5% of arcs in a coherence graph have to be deleted to make the graph free of crossed dependencies (min.: 0%; max.: 44.4%; median: 10.9%).</Paragraph>
      <Paragraph position="4"> Seven texts out of 135 had no crossed dependencies. The mean number of arcs for the coherence graphs of these texts was 36.9 (min.: 8; max.: 69; median: 35). The mean number of arcs for the other 128 coherence graphs (those with crossed dependencies) was 125.7 (min.: 20; max.: 293; median: 115.5). Thus, the graphs with no crossed dependencies have significantly fewer arcs than the graphs that have crossed dependencies (χ² = 15330.35; p &lt; 10  ). Text length is hence a likely explanation for why these seven texts had no crossed dependencies.</Paragraph>
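      <Paragraph> The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that each arc connects two single nodes identified by their position in the text, that two arcs cross when their endpoints strictly interleave, and it uses an exhaustive search that is only practical for very small graphs. The names crosses, crossing_free and min_deletions are hypothetical.

```python
from itertools import combinations


def crosses(arc1, arc2):
    """Return True if two arcs over linearly ordered nodes cross.

    Assumption: an arc is a pair of node positions, and two arcs cross
    when their endpoints strictly interleave (a, c, b, d or c, a, d, b).
    Arcs that merely share an endpoint do not cross.
    """
    a, b = sorted(arc1)
    c, d = sorted(arc2)
    return (d > b > c > a) or (b > d > a > c)


def crossing_free(arcs):
    """Return True if no pair of arcs in the collection crosses."""
    return not any(crosses(x, y) for x, y in combinations(arcs, 2))


def min_deletions(arcs):
    """Smallest number of arcs whose removal leaves no crossing arcs.

    Brute force: tries every subset of size 0, 1, 2, ... in turn, so the
    running time grows exponentially with the number of arcs.
    """
    arcs = list(arcs)
    for k in range(len(arcs) + 1):
        for removed in combinations(range(len(arcs)), k):
            kept = [arc for i, arc in enumerate(arcs) if i not in removed]
            if crossing_free(kept):
                return k
    return len(arcs)


# The three arcs named in the text: {0, 2}, {1, 3} and {2, 4}.
example = [(0, 2), (1, 3), (2, 4)]
print(min_deletions(example))  # prints 1: deleting (1, 3) removes both crossings
```

Dividing the returned count by the total number of arcs gives the per-text percentage of the kind reported above. For graphs with hundreds of arcs the exhaustive search is not feasible, and a dedicated minimum vertex cover method on the graph of crossing pairs would be needed instead.</Paragraph>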
      <Paragraph position="5"> Linear regressions show that the more arcs a graph has, the higher the number of crossed dependencies.</Paragraph>
      <Paragraph position="6"> Another important question is whether certain types of coherence relations participate more or less frequently in crossed dependencies than other types of coherence relations. In other words, the question is whether the frequency distribution over types of coherence relations is different for arcs participating in crossed dependencies compared to the overall frequency distribution over types of coherence relations in the whole database. Results from our database indicate that the overall distribution over types of coherence relations participating in crossed dependencies is not different from the distribution over types of coherence relations overall. This is confirmed by a linear regression, which shows a significant correlation between the two distributions of percentages (R² = 0.84; p &lt; .0001). Notice that the overall distribution includes only arcs with length greater than one, since arcs of length one could not participate in crossed dependencies.</Paragraph>
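      <Paragraph> As a rough illustration of the comparison between two distributions of percentages, the sketch below computes the coefficient of determination for a simple linear regression, which equals the squared Pearson correlation. It assumes that the statistic reported in the text is of this kind. The function name r_squared is hypothetical, and the two lists are made-up toy values, not figures from the database.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of ys on xs.

    For a regression with a single predictor this equals the squared
    Pearson correlation between the two lists.
    """
    if len(xs) != len(ys):
        raise ValueError("both distributions need the same number of entries")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Toy percentages per relation type (illustrative only, not the paper's data).
overall_pct = [40.0, 30.0, 20.0, 10.0]
crossed_pct = [42.0, 31.0, 18.0, 9.0]
print(r_squared(overall_pct, crossed_pct))
```

A value close to 1 means that relation types that are frequent overall are about equally frequent among arcs in crossed dependencies. The significance test reported in the text is not reproduced here.</Paragraph>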
      <Paragraph position="7"> However, some types of coherence relations occur considerably less frequently in crossed dependencies than overall in the database. The proportion of same relations is 15.21 times greater, and the percentage of condition relations is 5.93 times greater overall than in crossed dependencies. We do not yet understand the reason for these differences, and plan to address this question in future research.</Paragraph>
      <Paragraph position="8"> Another question is how great the distance, or arc length, typically is between sentences that participate in crossed dependencies. It is possible, for instance, that crossed dependencies primarily involve long-distance arcs and that more local crossed dependencies are disfavored. However, the distribution over arc lengths is practically identical for the overall database and for coherence relations participating in crossed dependencies, with short-distance relations being more frequent than long-distance relations for coherence relations overall as well as for those participating in crossed dependencies.</Paragraph>
      <Paragraph position="10"> The arc lengths are normalized to take into account the length of a text: the absolute length of an arc is divided by the maximum length that the arc could have, given its position in the text. Furthermore, we exclude arcs of (absolute) length 1 from the overall distribution, since such arcs cannot participate in crossed dependencies.</Paragraph>
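      <Paragraph> The normalization can be sketched as follows. The text does not spell out how the maximum possible length is determined, so this sketch makes an assumption: segments are numbered from 0, the earlier endpoint of the arc is held fixed, and the longest possible arc from there reaches the last segment of the text. The function names are hypothetical.

```python
def normalized_arc_length(arc, n_segments):
    """Absolute arc length divided by the longest length the arc could have.

    Assumption (not stated in the source): the earlier endpoint stays fixed
    and the longest possible arc runs from it to the last segment.
    """
    start, end = sorted(arc)
    if start == end:
        raise ValueError("an arc must connect two different segments")
    if start not in range(n_segments) or end not in range(n_segments):
        raise ValueError("arc endpoints must lie inside the text")
    max_length = (n_segments - 1) - start
    return (end - start) / max_length


def arcs_longer_than_one(arcs):
    """Drop arcs of absolute length 1, which cannot take part in a crossing."""
    return [arc for arc in arcs if abs(arc[0] - arc[1]) > 1]


# Toy text with five segments (0 to 4).
print(normalized_arc_length((0, 4), 5))  # prints 1.0: the arc spans the whole text
print(arcs_longer_than_one([(0, 1), (1, 3), (2, 4)]))  # prints [(1, 3), (2, 4)]
```

Under this reading every normalized length lies between 0 (exclusive) and 1 (inclusive), so arcs from texts of different lengths can be pooled into one distribution.</Paragraph>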
      <Paragraph position="11"> Taken together, statistical results on crossed dependencies suggest that crossed dependencies are too frequent to be ignored by accounts of coherence. Furthermore, the results suggest that any type of coherence relation can participate in a crossed dependency. However, there are some cases where knowing the type of coherence relation that an arc represents can be informative as to how likely that arc is to participate in a crossed dependency. The statistical results reported here also suggest that crossed dependencies occur primarily locally, as evidenced by the distribution over lengths of arcs participating in crossed dependencies.</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.2 Nodes with multiple parents
</SectionTitle>
      <Paragraph position="0"> Above we provided examples of coherence structure graphs that contain nodes with multiple parents. Nodes with multiple parents are another reason why trees are inadequate for representing natural language coherence structures. The mean in-degree (= mean number of parents) of all nodes in the investigated database of 135 texts is 1.6 (min.: 1; max.: 12; median: 1). 41% of all nodes in the database have an in-degree greater than 1. This suggests that even if a mechanism could be derived for representing crossed dependencies in (augmented) tree graphs, nodes with multiple parents present another significant problem for trees representing coherence structures. Results from our database indicate that the overall distribution over types of coherence relations ingoing to nodes with multiple parents is significantly correlated with the distribution over types of coherence relations overall.</Paragraph>
      <Paragraph position="2"> As for crossed dependencies, we also compared arc lengths. Here, we compared the length of arcs that are ingoing to nodes with multiple parents to the overall distribution of arc length. Again, we compared normalized arc lengths. In contrast to the comparison for crossed dependencies, we included arcs of (absolute) length 1 because such arcs can be ingoing to nodes with either single or multiple parents. The distribution over arc lengths is practically identical for the overall database and for arcs ingoing to nodes with multiple parents, suggesting a strong locality bias for coherence relations overall as well as for those ingoing to nodes with multiple parents.</Paragraph>
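      <Paragraph> The in-degree statistics can be illustrated with a short sketch. It assumes that each arc is stored as a (parent, child) pair and that only nodes with at least one ingoing arc are counted, which is one reading consistent with the reported minimum in-degree of 1. The function name in_degree_summary and the toy arcs are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def in_degree_summary(arcs):
    """Summarize the number of parents per node.

    Assumption: arcs are (parent, child) pairs, and only nodes that have
    at least one ingoing arc are included in the summary.
    """
    in_degree = Counter(child for _parent, child in arcs)
    degrees = list(in_degree.values())
    if not degrees:
        raise ValueError("at least one arc is required")
    multi_parent = sum(1 for degree in degrees if degree > 1)
    return {
        "mean": mean(degrees),
        "median": median(degrees),
        "min": min(degrees),
        "max": max(degrees),
        "share_multi_parent": multi_parent / len(degrees),
    }


# Toy graph: node 1 has one parent, node 2 has two, node 3 has three.
toy_arcs = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 3), (0, 3)]
print(in_degree_summary(toy_arcs))
```

In a tree every node has at most one parent, so any share of multi-parent nodes above zero indicates structure that a tree cannot represent directly.</Paragraph>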
      <Paragraph position="3"> In sum, statistical results on nodes with multiple parents suggest that they are a frequent phenomenon, and that they are not limited to certain kinds of coherence relations. Additionally, the statistical results reported here suggest that ingoing arcs to nodes with multiple parents are primarily local.</Paragraph>
    </Section>
  </Section>
</Paper>