<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1007"> <Title>Combining Hierarchical Clustering and Machine Learning to Predict High-Level Discourse Structure</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we proposed a machine learning approach for predicting inter-paragraph structure. Inferring inter-paragraph structure can be seen as a subtask of discourse parsing. While low-level discourse parsing relies to a large extent on cue phrases as predictors for rhetorical structure, these are less useful for high-level structure. As an alternative, word co-occurrence measures have been suggested.</Paragraph> <Paragraph position="1"> We took a different approach and used machine learning to build a complex model of segment relatedness, which was then combined with a clustering algorithm. The use of machine learning enabled us to combine contextual cues from several areas, such as word co-occurrence, lexical chains, changes in tense patterns, and punctuation. Our model outperformed a word co-occurrence measure as well as left- or right-branching trees.</Paragraph> <Paragraph position="2"> In future work, we plan to extend our approach to predict rhetorical relations between paragraphs.</Paragraph> <Paragraph position="3"> While an empirical analysis revealed that one can achieve relatively high accuracy by just predicting the most frequent relation, it is still worthwhile to investigate how much better one can do with more sophisticated methods. There is also clearly a relationship between structure and relation. For example, non-binary structures are more likely to be joined by a List relation than by an Explanation relation. 
Hence, inferring structure and predicting relations should be interleaved.</Paragraph> <Paragraph position="4"> It would also be interesting to investigate whether it would be useful to relax the constraint that inter-paragraph structure is a tree with non-crossing branches. Some researchers have suggested that higher-level discourse structure may be better represented if one allows crossing branches (Knott et al., 2001). In principle, the approach suggested here could be used to generate such structures if one removed the constraint that only adjacent segments can be merged.</Paragraph> <Paragraph position="5"> Finally, it remains to be seen to what extent our results carry over to other domains. So far, the RST-DT remains the only publicly available data set annotated with discourse structure, but a larger corpus is currently being annotated as part of the Penn Discourse Treebank project. It would be interesting to apply our methods to this data set as well.</Paragraph> </Section> </Paper>