<?xml version="1.0" standalone="yes"?>
<Paper uid="W93-0310">
  <Title>Computation of Word Associations Based on the Co-Occurrences of Words in Large Corpora</Title>
  <Section position="7" start_page="86" end_page="633" type="evalu">
    <SectionTitle>
6 Results
</SectionTitle>
    <Paragraph position="0"> Table 1 compares a few sample association lists predicted by our system with the associative responses given by the subjects in the Russell &amp; Jenkins experiment. A complete list of the predicted and observed responses is given in table 2. For all 100 stimulus words used in the association experiment conducted by Russell &amp; Jenkins, it shows a) their corpus frequency, b) the primary response, i.e. the most frequent response given by the subjects, c) the number of subjects who gave the primary response, d) the predicted response and e) the number of subjects who gave the predicted response.</Paragraph>
    <Paragraph position="1"> The evaluation of the predictions has to take into account that association norms are conglomerates of the answers of different subjects, which differ considerably from each other.</Paragraph>
    <Paragraph position="2"> A prediction would count as satisfactory if the difference between the predicted and the observed responses were about equal to the difference between an average subject and the rest of the subjects. The following interpretations look for such correspondences.</Paragraph>
    <Paragraph position="3"> For 17 out of the 100 stimulus words the predicted response is equal to the observed primary response. This compares to an average of 37 primary responses given by a subject in the Russell &amp; Jenkins experiment.  Table 1. Predicted and the ten most frequent observed responses for four stimulus words. r_ij was computed according to formula 6.</Paragraph>
    <Paragraph position="4"> A slightly better result is obtained for the correspondence between the predicted and the observed associations when one considers how many subjects gave the predicted response: averaged over all stimulus words and all subjects, a predicted response was given by 12.6% of the subjects. By comparison, an associative response of an arbitrary subject was given by 21.9% of the remaining subjects.</Paragraph>
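The two percentages compared above can be illustrated with a minimal sketch. It assumes that "given by x% of the remaining subjects" means the expected share of the other subjects who gave the same response as a randomly drawn subject; this reading, the function names and the toy counts are assumptions made for illustration and are not taken from the Russell and Jenkins norms.

```python
def predicted_share(counts, predicted):
    """Fraction of subjects whose response equals the predicted word."""
    n_subjects = sum(counts.values())
    return counts.get(predicted, 0) / n_subjects


def arbitrary_subject_share(counts):
    """Expected fraction of the remaining subjects who gave the same
    response as one randomly drawn subject (assumed interpretation)."""
    n = sum(counts.values())
    agreeing_pairs = sum(c * (c - 1) for c in counts.values())
    return agreeing_pairs / (n * (n - 1))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical toy norms: response counts of 10 imaginary subjects per stimulus.
toy_norms = {
    "long": {"short": 6, "road": 3, "term": 1},
    "music": {"song": 5, "piano": 4, "folk": 1},
}
toy_predictions = {"long": "term", "music": "folk"}

pred = mean(predicted_share(toy_norms[s], toy_predictions[s]) for s in toy_norms)
base = mean(arbitrary_subject_share(c) for c in toy_norms.values())
print(f"predicted response: {pred:.3f}, arbitrary subject: {base:.3f}")
```

Both figures are averaged over stimulus words, so the prediction is measured against the same scale as an average human subject.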
    <Paragraph position="5"> When only those 27 stimulus words are considered whose primary response was given by at least 500 subjects, an arbitrary response was given by 45.5% of the subjects on average. By comparison, the predicted response to one of these 27 stimulus words was given by 32.6% of the subjects. This means that the predictions improve for stimulus words where the variation among subjects is small.</Paragraph>
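The restriction to stimulus words with little response variation amounts to a threshold on the count of the primary response. A small sketch, with hypothetical function names and made-up counts for 10 imaginary subjects (the study itself uses a threshold of 500 subjects):

```python
def primary_response(counts):
    """Return the most frequent response and the number of subjects who gave it."""
    word = max(counts, key=counts.get)
    return word, counts[word]


def low_variation_stimuli(norms, min_primary_count):
    """Keep only stimuli whose primary response reaches the given count."""
    return [
        stimulus
        for stimulus, counts in norms.items()
        if primary_response(counts)[1] >= min_primary_count
    ]


# Hypothetical toy norms, not taken from the published association norms.
toy_norms = {
    "long": {"short": 6, "road": 3, "term": 1},
    "music": {"song": 5, "piano": 4, "folk": 1},
    "bed": {"sleep": 8, "pillow": 1, "rest": 1},
}

selected = low_variation_stimuli(toy_norms, min_primary_count=6)
print(selected)
```

The agreement figures would then be recomputed over the selected stimuli only.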
    <Paragraph position="6"> On the other hand, 35 of the predicted responses were given by no subject at all, whereas an average subject gives only 5.9 out of 100 responses that are given by no other subject. In about half of the cases we attribute this poor performance to the lack of representativeness of the corpus. For example, the predictions combustion to the stimulus bed or brokerage to house can be explained by specific verbal usage in the DOE scientific abstracts and in the Wall Street Journal, respectively.</Paragraph>
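The comparison between predictions given by no subject and the idiosyncratic responses of an average subject can be sketched as follows. The second function assumes that a response "given by no other subject" is one that occurs exactly once in the norms; the counts are invented for illustration, and only the stimulus and prediction words are borrowed from the examples in the text.

```python
def unseen_predictions(norms, predictions):
    """Stimuli whose predicted response was given by no subject at all."""
    return [s for s, word in predictions.items() if norms[s].get(word, 0) == 0]


def expected_unique_responses(norms):
    """Expected number of stimuli for which a randomly drawn subject gives
    a response that no other subject gives (assumed interpretation)."""
    total = 0.0
    for counts in norms.values():
        n_subjects = sum(counts.values())
        singletons = sum(1 for c in counts.values() if c == 1)
        total += singletons / n_subjects
    return total


# Hypothetical toy norms with 10 imaginary subjects per stimulus.
toy_norms = {
    "bed": {"sleep": 8, "pillow": 1, "rest": 1},
    "house": {"home": 7, "door": 3},
    "long": {"short": 6, "road": 3, "term": 1},
}
toy_predictions = {"bed": "combustion", "house": "brokerage", "long": "term"}

print(unseen_predictions(toy_norms, toy_predictions))
print(round(expected_unique_responses(toy_norms), 3))
```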
    <Paragraph position="7"> In most other cases, syntagmatic associations (words that are often used together) are predicted instead of paradigmatic associations (words that are used in similar contexts). Examples are the prediction of term to the stimulus long, where most subjects answered with short, or the prediction of folk to music, where most subjects responded with song.</Paragraph>
    <Paragraph position="8">  Table 2, part 1. Observed and predicted associative responses to stimulus words 1 to 50. The abbreviations in the header mean: stim = stimulus word; freq = corpus frequency of stimulus word; par = primary associative response; f(par) = number of subjects who gave the primary associative response; pred = predicted associative response; f(pred) = number of subjects who gave the predicted associative response.  Using the corpora listed in section 4, the same simulation as described above was conducted for German. For the computation of the associative strengths, formula 6 was used again. For optimal results, only a small adjustment had to be made to parameter α (from 0.66 to 0.68). However, a significant change was necessary for parameters β and γ, which were again assumed to be identical for ease of parameter optimization. β and γ had to be reduced by a factor of approximately four, from a value of 0.00002 to a value of 0.000005. Apart from these parameters, nothing was changed in the algorithm.</Paragraph>
    <Paragraph position="9"> Table 3 compares the quantitative results as given above for both languages. The figures can be interpreted as follows: With an average of 21.9% of the other subjects giving the same response as an arbitrary subject, the variation among subjects is much smaller in English than it is in German (8.7%). This is reflected in the simulation results, where both figures (12.6% and 6.9%) have a similar ratio, although at a lower level. This observation is confirmed when only stimuli with low variation of the associative responses are considered. In both languages, the decrease in variation is of about the same order of magnitude for experiment and simulation. Overall, the simulation results are somewhat better for German than they are for English. This may be surprising, since with a total of 33 million words the English corpus is larger than the German one with 21 million words. However, a closer look at the texts makes it clear that the German corpus, by incorporating popular newspapers and spoken language, is more representative of everyday language.</Paragraph>
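The ratios referred to in this comparison can be recomputed directly from the four percentages quoted in the text. The sketch below uses only those figures; the helper name is hypothetical.

```python
# Percentages quoted in the text: share of subjects giving the predicted
# response (simulation) and share of other subjects giving the response of
# an arbitrary subject (subjects).
figures = {
    "English": {"simulation": 12.6, "subjects": 21.9},
    "German": {"simulation": 6.9, "subjects": 8.7},
}


def relative_agreement(simulation, subjects):
    """Simulation agreement as a fraction of inter-subject agreement."""
    return simulation / subjects


for language, row in figures.items():
    rel = relative_agreement(row["simulation"], row["subjects"])
    print(f"{language}: simulation reaches {rel:.2f} of inter-subject agreement")

subject_ratio = figures["English"]["subjects"] / figures["German"]["subjects"]
simulation_ratio = figures["English"]["simulation"] / figures["German"]["simulation"]
print(f"English to German ratio, subjects: {subject_ratio:.2f}")
print(f"English to German ratio, simulation: {simulation_ratio:.2f}")
```

The simulation reaches about 0.58 of the inter-subject agreement for English and about 0.79 for German, which is one way to read the statement that the German results are somewhat better. The English to German ratio is about 2.52 for the subjects and about 1.83 for the simulation.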
    <Paragraph position="10"> Table 3. Description:
 percentage of subjects who give the predicted associative response;
 percentage of other subjects who give the response of an arbitrary subject;
 percentage of subjects who give the predicted associative response for stimuli with little response variation*;
 percentage of other subjects who give the response of an arbitrary subject for stimuli with little response variation*;
 percentage of cases where the predicted response is identical to the observed primary response;
 percentage of cases where the response of an arbitrary subject is identical to the observed primary response;
 percentage of cases where the predicted response is given by no subject**;
 percentage of cases where the response of an arbitrary subject is given by no other subject**.
 Notes: *) Little response variation is defined slightly differently for English and German: in the English study, only those 27 stimulus words are considered whose primary response is given by at least 500 out of 1008 subjects. In the German study, only those 25 stimulus words are taken into account whose primary response is given by at least 100 out of 331 subjects. **) For comparison of English and German experimental figures, it should be kept in mind that the American experiment was conducted with 1008 subjects, but the German experiment with only 331.</Paragraph>
  </Section>
</Paper>