<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0905">
  <Title>Verb Subcategorization Frequency Differences between Business-News and Balanced Corpora: The Role of Verb Sense. Douglas Roland, Daniel Jurafsky, Lise Menn, Susanne Gahl, Elizabeth Elder and Chris</Title>
  <Section position="3" start_page="28" end_page="28" type="intro">
    <SectionTitle>
2 Verb Frequency
</SectionTitle>
    <Paragraph position="0"> Because word frequency is known to vary with corpus genre, we used the frequency differences for our target verbs as a measure of corpus difference. We would expect factors such as corpus genre (Business for WSJ vs. mixed for BNC and Brown), American vs. British English, and the era the corpus sample was taken in to influence word frequency.</Paragraph>
    <Paragraph position="1"> We calculated the frequencies for each verb, and used Chi Square to test whether the difference in frequency was significant for each corpus pairing. We then counted the number of verbs that showed a significant difference using p = 0.05 as a cut-off point. This result is shown in Table 2. Although there were verbs that had a significant difference in distribution between the two mixed genre corpora (BNC, Brown), there were more differences in word frequency between the general corpora and the business corpus. The difference between the BNC/Brown comparison and the BNC and Brown vs. WSJ comparison is significant (Chi Square, p &lt; .01).</Paragraph>
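The per-verb test described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text does not say how the contingency table was built, so the sketch assumes a 2x2 table (occurrences of the verb vs. all other tokens, in each of two corpora) and a Pearson chi-square without continuity correction. The function names, the corpus sizes, and all counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def verb_differs(count_a, size_a, count_b, size_b, alpha=0.05):
    """True if the verb's frequency differs significantly between two corpora."""
    _, p = chi_square_2x2(count_a, size_a - count_a, count_b, size_b - count_b)
    return alpha > p


# Hypothetical data: (occurrences of the verb, total tokens in the corpus).
counts = {
    "rise": {"general": (120, 1_000_000), "business": (950, 1_000_000)},
    "walk": {"general": (400, 1_000_000), "business": (60, 1_000_000)},
    "stand": {"general": (300, 1_000_000), "business": (310, 1_000_000)},
}

# Count the verbs whose frequency difference passes the p = 0.05 cut-off.
significant = [
    verb for verb, c in counts.items()
    if verb_differs(*c["general"], *c["business"])
]
print(f"{len(significant)}/{len(counts)} verbs differ:", significant)
```

Repeating the count for each corpus pairing would give one "significant/total" figure per pairing, which is the shape of the result reported in Table 2.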
    <Paragraph position="2"> BNC vs Brown: 30/64
BNC vs WSJ: 46/64
Brown vs WSJ: 46/64
Table 2 - Number of verbs showing a significant difference in frequency between corpora.</Paragraph>
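As a rough check on the reported significance, one can compare the BNC/Brown figure (30 of 64 verbs) with a general-vs-WSJ figure (46 of 64 verbs). The sketch below assumes this simple two-proportion reading and a Pearson chi-square without continuity correction; the text does not state how the authors pooled the two WSJ comparisons, and the function name is hypothetical.

```python
import math


def proportion_chi_square(hits_a, hits_b, total):
    """Pearson chi-square (1 df, no continuity correction) comparing the
    proportions hits_a/total and hits_b/total via expected counts."""
    observed = [[hits_a, total - hits_a], [hits_b, total - hits_b]]
    grand = 2 * total
    col_totals = [hits_a + hits_b, grand - hits_a - hits_b]
    stat = 0.0
    for row in observed:
        for obs, col_total in zip(row, col_totals):
            expected = total * col_total / grand  # row total * column total / grand total
            stat += (obs - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


stat, p = proportion_chi_square(30, 46, 64)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Under these assumptions the statistic is about 8.29 with one degree of freedom, which exceeds the .01 critical value of 6.635 and so agrees with the significance level reported in the text.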
    <Paragraph position="3"> Table 3 shows the list of words that were significantly more frequent in both of the general corpora than they were in the business-oriented corpus. Notice that most of the verbs describe leisure activities.</Paragraph>
    <Paragraph position="4"> amuse, boil, burst, dance, disturb, entertain, frighten, hang, harden, hurry, impress, knit, lean, paint, play, race, sail, stand, tempt, walk, wander, wash, watch
Table 3 - Verbs which BNC and Brown both have more of than WSJ.
Alternatively, when one looks at the words that had a significantly higher frequency in the WSJ corpus than in either of the other corpora (Table 4), one finds predominantly verbs that can describe stock price changes and business transactions.</Paragraph>
    <Paragraph position="5"> adjust, advance, crumble, drop, elect, fall, grow, jump, merge, quote, rise, shrink, shut, slip
Table 4 - Verbs which WSJ has more of than both Brown and BNC.
We are currently examining the nature of the differences between the British and American corpora.</Paragraph>
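The selection behind Tables 3 and 4 can be pictured as a filter over per-verb results: keep a verb only if it is significantly different in every relevant pairing and the difference points the same way each time. The sketch below is one hypothetical way to express that; the data layout, the function name, and all rates and significance flags are invented for illustration and are not the paper's data.

```python
def pair(corpus_a, corpus_b):
    """Order-independent key for a corpus pairing."""
    return frozenset((corpus_a, corpus_b))


# Hypothetical per-million rates and significance flags for three verbs.
verbs = {
    "walk": {
        "rate": {"BNC": 210.0, "Brown": 190.0, "WSJ": 25.0},
        "significant": {pair("BNC", "WSJ"), pair("Brown", "WSJ")},
    },
    "rise": {
        "rate": {"BNC": 80.0, "Brown": 70.0, "WSJ": 400.0},
        "significant": {pair("BNC", "WSJ"), pair("Brown", "WSJ")},
    },
    "stand": {
        "rate": {"BNC": 300.0, "Brown": 150.0, "WSJ": 140.0},
        "significant": {pair("BNC", "WSJ")},
    },
}


def higher_in(verbs, high_corpora, low_corpora):
    """Verbs significantly more frequent in every corpus of high_corpora
    than in every corpus of low_corpora."""
    selected = []
    for verb, info in verbs.items():
        keep = all(
            pair(high, low) in info["significant"]
            and info["rate"][high] > info["rate"][low]
            for high in high_corpora
            for low in low_corpora
        )
        if keep:
            selected.append(verb)
    return selected


general_verbs = higher_in(verbs, ["BNC", "Brown"], ["WSJ"])   # Table 3 style
business_verbs = higher_in(verbs, ["WSJ"], ["BNC", "Brown"])  # Table 4 style
print(general_verbs, business_verbs)
```

In this toy data "stand" is excluded from both lists because its Brown vs. WSJ difference is not flagged as significant, which mirrors the requirement that a verb differ from WSJ in both general corpora.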
  </Section>
</Paper>