<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1026">
<Title>Entropy Rate Constancy in Text</Title>
<Section position="2" start_page="0" end_page="2" type="intro">
<SectionTitle>1 Introduction</SectionTitle>
<Paragraph position="0">It is well known from Information Theory that the most efficient way to send information through noisy channels is at a constant rate. If humans try to communicate in the most efficient way, then they must obey this principle. The communication medium we examine in this paper is text, and we present some evidence that this principle holds here.</Paragraph>
<Paragraph position="1">Entropy is a measure of information first proposed by Shannon (1948). Informally, the entropy of a random variable is proportional to the difficulty of correctly guessing the value of this variable (when the distribution is known). Entropy is highest when all values are equally probable, and lowest (equal to 0) when one of the choices has probability 1, i.e., when the value is deterministically known in advance.</Paragraph>
<Paragraph position="2">In this paper we are concerned with the entropy of English as exhibited through written text, though these results can easily be extended to speech as well. The random variable we deal with is therefore a unit of text (a word, for our purposes) that a random person who has produced all the previous words in the text stream is likely to produce next. We have as many random variables as we have words in a text. The distributions of these variables are obviously different and depend on all previous words produced. We claim, however, that the entropy of these random variables is on average the same.</Paragraph>
</Section>
</Paper>
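One standard way to make these notions precise (a sketch in conventional notation, with $X_i$ standing for the random variable over the $i$-th word, a symbol assumed here rather than taken from the excerpt) is:

\[
H(X) \;=\; -\sum_{x} p(x)\,\log_2 p(x)
\]
\[
H(X_i \mid X_1, \ldots, X_{i-1}) \;=\; -\sum_{x_1,\ldots,x_i} p(x_1,\ldots,x_i)\,\log_2 p(x_i \mid x_1,\ldots,x_{i-1})
\]

The first expression is Shannon's entropy of a single random variable, which is maximal for a uniform distribution and zero for a deterministic one; the second is the per-word conditional entropy given all preceding words. In these terms, the claim of the last paragraph is that this conditional entropy, averaged over texts, does not depend on the position $i$ of the word.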