<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1015">
  <Title>Combining Multiple Knowledge Sources for Discourse Segmentation</Title>
  <Section position="7" start_page="113" end_page="113" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have presented two methods for developing segmentation hypotheses using multiple linguistic features. The first method hand tunes features and algorithms based on analysis of training errors. The second method, machine learning, automatically induces decision trees from coded corpora. Both methods rely on an enriched set of input features compared to our previous work. With each method, we have achieved marked improvements in performance compared to our previous work and are approaching human performance. Note that quantitatively, the machine learning results are slightly better than the hand tuning results. The main difference on average performance is the higher precision of the automated algorithm. Furthermore, note that the machine learning algorithm used the changes to the coding features that resulted from the tuning methods. This suggests that hand tuning is a useful method for understanding how to best code the data, while mschine learning provides an effective (and automatic) way to produce an algorithm given a good feature representation.</Paragraph>
    <Paragraph position="1"> Our results lend further support to the hypothesis that linguistic devices correlate with discourse structure (cf. section 2.1), which itself has practical import. Understanding systems could infer segments as a step towards producing summaries, while generation systems could signal segments to increase comprehensibility/Our results also suggest that to best identify or convey segment boundaries, systems will need to exploit multiple signals simultaneously.</Paragraph>
    <Paragraph position="2"> We plan to continue our experiments by further merging the automated and analytic techniques, and evaluating new algorithms on our final test corpus.</Paragraph>
    <Paragraph position="3"> Because we have already used cross-validation, we do not anticipate significant degradation on new test narratives. An important area for future research is to develop principled methods for identifying distinct speaker strategies pertaining to how they signal segments. Performance of individual speakers varies widely as shown by the high standard deviations in our tables. The original NP, hand tuned, and machine learning algorithms all do relatively poorly on narrative 16 and relatively well on 11 (both in the test set) under all conditions. This lends support to the hypothesis that there may be consistent differences among speakers regarding strategies for signaling shifts in global discourse structure.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML