File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/96/c96-2100_abstr.xml

Size: 951 bytes

Last Modified: 2025-10-06 13:48:36

<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2100">
  <Title>Good Bigrams</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> A desired property of a measure of connective strength in bigrams is that the measure should be insensitive to corpus size. This paper investigates the stability of three measures across text genres and under expansion of the corpus. The measures are (1) the commonly used mutual information, (2) the difference in mutual information, and (3) raw occurrence. Mutual information is further compared with an approach that uses knowledge about genres to remove the overlap between them. This last approach considers the difference between two products of the same process (human text-generation) constrained by different genres. The cancellation of overlap seems to provide the most specific word pairs for each genre.</Paragraph>
  </Section>
</Paper>