<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1210">
  <Title>Finding Structure via Compression</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>4 Conclusion and Future Work</SectionTitle>
    <Paragraph position="0">We have shown that a statistical language model may discover high-level structure in a data sequence by thresholding its instantaneous entropy. When this structure is used to augment the model, its compression performance improves. Although the example presented in this paper used a natural language corpus, we stress that these techniques are suited to the analysis of all kinds of data.</Paragraph>
    <Paragraph position="1">We plan to investigate how much structure can be learned by the most trivial of language models.</Paragraph>
    <Paragraph position="2">The upwrite process provides scaffolding which allows high-level structure to be found: we believe that a low-order language model which uses the binary alphabet may be able to find characters, then words, and eventually larger scale structures in natural language corpora.</Paragraph>
    <Paragraph position="3">Methods to select appropriate entropic thresholds need to be investigated, and the application of entropic chunking to adaptive data compression systems is being explored and looks promising.</Paragraph>
  </Section>
</Paper>
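<!--
A minimal sketch of the entropic chunking idea described above, assuming a
character-level bigram model with add-one smoothing, per-symbol surprisal as
the instantaneous entropy, and a mean-plus-one-standard-deviation threshold.
All of these choices are illustrative assumptions; the paper does not fix a
model here, and it explicitly leaves threshold selection as future work.

import math
from collections import defaultdict

def train_bigram(text):
    """Count character bigram frequencies over the training text."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(text, text[1:]):
        counts[prev][cur] += 1
    return counts

def surprisal(counts, prev, cur, alphabet_size=256):
    """-log2 P(cur | prev) under add-one smoothing (an assumed choice)."""
    row = counts[prev]
    total = sum(row.values()) + alphabet_size
    return -math.log2((row[cur] + 1) / total)

def entropic_chunk(text, counts, threshold):
    """Place a chunk boundary wherever the instantaneous entropy
    (approximated here by per-symbol surprisal) exceeds the threshold."""
    chunks, start = [], 0
    for i in range(1, len(text)):
        if surprisal(counts, text[i - 1], text[i]) > threshold:
            chunks.append(text[start:i])
            start = i
    chunks.append(text[start:])
    return chunks

corpus = "the cat sat on the mat and the cat sat on the hat"
model = train_bigram(corpus)

# One plausible threshold policy (purely illustrative): mean surprisal
# plus one standard deviation, measured over the same corpus.
values = [surprisal(model, a, b) for a, b in zip(corpus, corpus[1:])]
mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5

print(entropic_chunk(corpus, model, mean + std))

An upwrite-style iteration would then replace each discovered chunk with a
fresh symbol and re-run the same procedure on the rewritten sequence, which
is how the conclusion envisages finding characters, then words, and
eventually larger-scale structure from a binary-alphabet starting point.
-->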