<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0405">
  <Title>Feature-Based Segmentation of Narrative Documents</Title>
  <Section position="7" start_page="38" end_page="38" type="concl">
    <SectionTitle>
6 Discussion and Summary
</SectionTitle>
    <Paragraph position="0">Based on properties of narrative text, we proposed and investigated a set of features for segmenting narrative text. We posed the problem of segmentation as a feature-based classification problem, which presented a number of challenges: many different feature sources, generalization from outside resources for sparse data, and feature extraction from non-traditional information sources.</Paragraph>
    <Paragraph position="1">Feature selection and the analysis of feature interactions are crucial for this type of application. The paragraph feature has perfect recall in that all boundaries occur at paragraph boundaries. Surprisingly, for certain train/test splits of the data, the algorithm performed better without the paragraph feature than with it. We hypothesize that noise in the data causes the classifier to learn incorrect correlations.</Paragraph>
    <Paragraph position="2">In addition to feature selection issues, posing the problem as a classification problem loses the sequential nature of the data. This can produce very unlikely segment lengths, such as a single sentence.</Paragraph>
    <Paragraph position="3">We alleviated this by selecting features that capture properties of the sequence. For example, the entity chain features represent some of this type of information. However, models for complex sequential data should be examined as potentially better methods.</Paragraph>
    <Paragraph position="4">We evaluated our algorithm on two books and encyclopedia articles, observing significantly better performance than both a baseline that randomly selects the correct number of segmentation points and two popular previous approaches, PLSA and TextTiling.</Paragraph>
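    <Paragraph position="5">The random baseline used for comparison can be illustrated with a minimal sketch. It assumes that candidate boundaries are indexed by sentence position and that the true number of boundaries is known. The function name random_baseline and its parameters are hypothetical, chosen only for illustration, and are not taken from the evaluation code.

```python
import random


def random_baseline(num_sentences, num_boundaries, seed=None):
    """Choose num_boundaries distinct boundary positions uniformly at random.

    Assumption: a boundary at position i means a new segment starts
    just before sentence i, so valid positions are 1 .. num_sentences - 1.
    """
    rng = random.Random(seed)
    candidates = range(1, num_sentences)
    # random.sample draws without replacement and raises ValueError
    # if more boundaries are requested than there are candidate positions.
    return sorted(rng.sample(candidates, num_boundaries))


if __name__ == "__main__":
    # Hypothetical document: 20 sentences with 4 true segment boundaries.
    print(random_baseline(20, 4, seed=0))
```

Because the baseline is given the correct number of boundaries, any gain over it reflects better placement of boundaries rather than a better estimate of how many there are.</Paragraph>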
  </Section>
</Paper>