<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1210">
  <Title>Finding Structure via Compression</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
4 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> We have shown that a statistical language model may discover high-level structure in a data sequence by thresholding its instantaneous entropy. When this structure is used to augment the model, its compression performance improves. Although the example presented in this paper used a natural language corpus, we stress that these techniques are suited to the analysis of all kinds of data.</Paragraph>
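To make the chunking step concrete, here is a minimal sketch (ours, not the authors' code) of entropic chunking: a lightly smoothed bigram model supplies a per-position predictive entropy, and the sequence is cut wherever that entropy exceeds a threshold. The function names, the smoothing constant, and the toy threshold are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

EPS = 0.01  # light smoothing so unseen transitions get a little probability mass

def bigram_entropies(seq):
    """Per-position predictive entropy H(next | current) under a
    bigram model estimated from the sequence itself."""
    alphabet = sorted(set(seq))
    counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    entropies = []
    for a in seq[:-1]:
        total = sum(counts[a].values()) + EPS * len(alphabet)
        h = 0.0
        for s in alphabet:
            p = (counts[a][s] + EPS) / total
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies

def entropic_chunks(seq, threshold):
    """Cut the sequence wherever the model's instantaneous entropy
    exceeds the threshold (high uncertainty suggests a unit boundary)."""
    hs = bigram_entropies(seq)
    chunks, start = [], 0
    for i, h in enumerate(hs, start=1):
        if h > threshold:
            chunks.append(seq[start:i])
            start = i
    chunks.append(seq[start:])
    return chunks

text = "the cat sat on the mat the cat sat"
print(entropic_chunks(text, threshold=1.5))
# cuts fall after spaces, yielding word-like chunks: 'the ', 'cat ', 'sat ', ...
```

On this toy corpus the entropy is low inside words (the next character is nearly determined) and high after a space (many words may follow), so the threshold cuts the stream into word-like units.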
    <Paragraph position="1"> We plan to investigate how much structure can be learned by the most trivial of language models.</Paragraph>
    <Paragraph position="2"> The upwrite process provides scaffolding which allows high-level structure to be found: we believe that a low-order language model which uses the binary alphabet may be able to find characters, then words, and eventually larger scale structures in natural language corpora.</Paragraph>
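The following sketch illustrates the upwrite idea under the same assumptions as the previous example; it is not the authors' implementation. Each chunk discovered at one level is promoted to a fresh symbol, and the rewritten sequence is chunked again, so larger-scale units can emerge from lower-level ones. `entropic_chunks` refers to the hypothetical function sketched above.

```python
def upwrite(seq, threshold, levels=2):
    """Chunk the sequence, promote each distinct chunk to a new symbol,
    and repeat, so later passes can find larger units over the new alphabet.
    Reusing one threshold across levels is a simplification."""
    lexicon = {}  # chunk (as tuple) -> high-level symbol id
    for _ in range(levels):
        chunks = entropic_chunks(seq, threshold)
        for c in chunks:
            lexicon.setdefault(tuple(c), len(lexicon))
        seq = [lexicon[tuple(c)] for c in chunks]  # rewrite over the new alphabet
    return seq, lexicon
```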
    <Paragraph position="3"> Methods for selecting appropriate entropic thresholds remain to be investigated, and the application of entropic chunking to adaptive data compression systems is being explored and shows promise.</Paragraph>
  </Section>
</Paper>