<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2031">
  <Title>Why Are They Excited? Identifying and Explaining Spikes in Blog Mood Levels</Title>
  <Section position="4" start_page="207" end_page="208" type="metho">
    <SectionTitle>
3 Detecting spikes
</SectionTitle>
    <Paragraph position="0"> Our first task is to identify spikes in moods reported in blog posts. Many of the moods reported by LiveJournal users display cyclic behavior.</Paragraph>
    <Paragraph position="1"> Some moods show an obvious daily cycle.</Paragraph>
    <Paragraph position="2"> For instance, people feel awake in the morning and tired in the evening (Figure 2). Other moods show a weekly cycle; for instance, people drink more on weekends (Figure 3).</Paragraph>
    <Paragraph position="3"> Our approach to detecting spikes accounts for these cyclic patterns and aims to find global changes. Let POSTS(mood, date, hour) be the number of posts labeled with a given mood and created within a one-hour interval on the specified date. Similarly, ALLPOSTS(date, hour) is the number of all posts created within the interval specified by the date and hour. The ratio of posts labeled with a given mood to all posts can be expressed for every day of the week (Sunday, ..., Saturday) and every one-hour interval (0, ..., 23) using the formula:</Paragraph>
    <Paragraph position="5"> where day = 0, ..., 6 and DW(date) is a day-of-the-week function that returns 0, ..., 6 depending on the date argument.</Paragraph>
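To make the per-weekday, per-hour ratio concrete, here is a minimal Python sketch. It assumes that the ratio for a (day, hour) cell pools POSTS and ALLPOSTS over all dates whose DW(date) equals day before dividing, rather than averaging per-date ratios; the helper names `day_of_week` and `baseline_ratios` and the dictionary input format are hypothetical.

```python
from collections import defaultdict
from datetime import date


def day_of_week(d: date) -> int:
    """DW(date): 0 for Sunday through 6 for Saturday."""
    # isoweekday() is 1 (Monday) .. 7 (Sunday); modulo 7 maps Sunday to 0.
    return d.isoweekday() % 7


def baseline_ratios(posts: dict, allposts: dict) -> dict:
    """Hypothetical helper: mood/all-posts ratio for every (day, hour) cell.

    posts:    {(date, hour): POSTS(mood, date, hour)} for a single mood
    allposts: {(date, hour): ALLPOSTS(date, hour)}
    Assumption: counts are pooled over all dates with the same weekday
    before dividing.
    """
    mood_sum = defaultdict(int)
    all_sum = defaultdict(int)
    for (d, hour), total in allposts.items():
        key = (day_of_week(d), hour)
        all_sum[key] += total
        mood_sum[key] += posts.get((d, hour), 0)
    # Skip cells with no posts at all to avoid division by zero.
    return {key: mood_sum[key] / n for key, n in all_sum.items() if n > 0}
```

For example, two Sundays (1 and 8 January 2006) at hour 9, with 2 of 10 and 4 of 10 posts labeled with the mood, give a pooled ratio of 6/20 = 0.3 for the cell (0, 9).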
    <Paragraph position="6"> The level of a given mood has changed within a one-hour interval of a day if the ratio of posts labeled with that mood to all posts created within the interval is significantly different from the ratio observed at the same hour on the same day of the week. Formally:</Paragraph>
    <Paragraph position="8"> If |D| (the absolute value of D) exceeds a threshold, we conclude that a spike has occurred; the sign of D distinguishes positive from negative spikes. The absolute value of D expresses the degree of the peak.</Paragraph>
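The thresholding step can be sketched as follows. This sketch assumes D is the difference between the observed ratio for one (mood, date, hour) and the baseline ratio for the same weekday and hour; the function names and the threshold value used below are hypothetical, since no threshold is given here.

```python
def spike_degree(mood_posts: int, all_posts: int, baseline: float) -> float:
    """Hypothetical D for one (mood, date, hour): observed ratio minus the
    baseline ratio for the same weekday and hour (assumed difference form)."""
    if all_posts == 0:
        return 0.0  # no posts in this hour: treat as no change
    return mood_posts / all_posts - baseline


def classify_spike(d: float, threshold: float) -> str:
    """|D| above the threshold signals a spike; the sign of D gives its direction."""
    if abs(d) > threshold:
        return "positive" if d > 0 else "negative"
    return "none"
```

With 30 of 100 posts labeled with the mood against a baseline of 0.1, D is about 0.2, which an illustrative threshold of 0.05 classifies as a positive spike of degree 0.2.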
    <Paragraph position="9"> This method of identifying spikes also lets us look at a period of a few hours instead of a single hour, which acts as an effective smoothing method, especially when too few posts are observed for a given mood.</Paragraph>
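One way to look at a few hours instead of one is to pool counts over a short window before taking the ratio. The sketch below is a hypothetical illustration: the trailing window, its default width of 3 hours, and the function name `windowed_ratio` are assumptions, and only the observed ratio is pooled here.

```python
def windowed_ratio(mood_counts: list, all_counts: list, end_hour: int,
                   width: int = 3) -> float:
    """Ratio of mood-labeled posts to all posts, pooled over `width` consecutive
    hours ending at `end_hour` (hours indexed along a continuous timeline).

    Hypothetical smoothing sketch: window width and placement are assumptions.
    """
    start = max(0, end_hour - width + 1)  # clip the window at the series start
    mood_total = sum(mood_counts[start:end_hour + 1])
    all_total = sum(all_counts[start:end_hour + 1])
    return mood_total / all_total if all_total else 0.0
```

With hourly mood counts [1, 0, 2, 5] and 10 posts per hour, hour 1 on its own has a ratio of 0, while the three-hour window ending there (clipped to hours 0 and 1) gives (1 + 0) / (10 + 10) = 0.05, so a single quiet hour no longer dominates.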
  </Section>
  <Section position="5" start_page="208" end_page="208" type="metho">
    <SectionTitle>
4 Explaining peaks
</SectionTitle>
    <Paragraph position="0"> Our next task is to explain the peaks identified by the method described in Section 3. We proceed in two steps. First, we identify features in the peaking interval whose language usage differs significantly from the general language associated with the mood. Then we form queries from these &quot;overused&quot; words and the date(s) of the peaking interval, and run them against a news corpus.</Paragraph>
    <Section position="1" start_page="208" end_page="208" type="sub_section">
      <SectionTitle>
4.1 Overused words
</SectionTitle>
      <Paragraph position="0"> To discover the reasons underlying mood changes, we use corpus-based techniques to identify changes in language usage.</Paragraph>
      <Paragraph position="1"> We compare two corpora: (1) the full set of blog posts, referred to as the standard corpus, and (2) a corpus associated with the peaking interval, referred to as the sample corpus.</Paragraph>
      <Paragraph position="2"> To compare word frequencies across the two corpora, we apply the log-likelihood statistical test (Dunning, 1993). Let $O_i$ be the observed frequency of a term in corpus $i$, $N_i$ the total number of terms in corpus $i$, and $E_i = N_i \frac{\sum_i O_i}{\sum_i N_i}$ the expected frequency of the term in corpus $i$, where $i$ takes the values 1 and 2 for the standard and sample corpus, respectively.</Paragraph>
      <Paragraph position="4"> The log-likelihood value is then calculated according to the following formula:</Paragraph>
      <Paragraph position="6"> $-2\ln\lambda = 2\sum_i O_i \ln\left(\frac{O_i}{E_i}\right)$</Paragraph>
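The log-likelihood formula translates directly into code. In this minimal sketch, `log_likelihood` follows the formula above; `is_overused` is a hypothetical rule that treats a term as overused when its value exceeds an assumed cut-off of 3.84 (the chi-square critical value for p = 0.05 with one degree of freedom) and its relative frequency is higher in the sample corpus than in the standard corpus.

```python
import math


def log_likelihood(o_standard: int, o_sample: int,
                   n_standard: int, n_sample: int) -> float:
    """-2 ln(lambda) = 2 * sum_i O_i * ln(O_i / E_i), E_i = N_i * sum(O) / sum(N).

    O_i: frequency of the term in corpus i; N_i: total number of terms in corpus i.
    Corpus 1 is the standard corpus, corpus 2 the sample (peaking-interval) corpus.
    """
    observed = (o_standard, o_sample)
    sizes = (n_standard, n_sample)
    total_o = sum(observed)
    total_n = sum(sizes)
    ll = 0.0
    for o, n in zip(observed, sizes):
        expected = n * total_o / total_n
        if o > 0:  # O_i = 0 contributes nothing (x ln x tends to 0)
            ll += o * math.log(o / expected)
    return 2 * ll


def is_overused(o_standard: int, o_sample: int, n_standard: int, n_sample: int,
                critical: float = 3.84) -> bool:
    """Hypothetical rule: significant value and higher relative frequency in
    the sample corpus. The 3.84 cut-off is an assumption."""
    value = log_likelihood(o_standard, o_sample, n_standard, n_sample)
    return value > critical and o_sample / n_sample > o_standard / n_standard
```

The test itself is two-sided, so the direction check is what separates overused from underused words: a term seen 5 times in a 100-word sample and never in a 100-word standard corpus scores 10 ln 2 (about 6.93) and counts as overused, while the mirror case scores the same but does not.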
    </Section>
    <Section position="2" start_page="208" end_page="208" type="sub_section">
      <SectionTitle>
4.2 Finding explanations
</SectionTitle>
      <Paragraph position="0"> Given the start and end dates of a peaking interval and a list of overused words from this period, we form a query and run it against the headlines of a news corpus. A headline is retrieved if it contains at least one of the overused words and is dated within the peaking interval or on the day before the peak begins. The hits are ranked by the number of overused terms each headline contains.</Paragraph>
    </Section>
  </Section>
</Paper>