<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1030">
  <Title>Automated Japanese Essay Scoring System based on Articles Written by Experts</Title>
  <Section position="4" start_page="234" end_page="235" type="intro">
    <SectionTitle>
2 Rhetoric
</SectionTitle>
    <Paragraph position="0"> As metrics to portray rhetoric, Jess uses (1) ease of reading, (2) diversity of vocabulary, (3) percentage of big words (long, difficult words), and (4) percentage of passive sentences, in accordance with Maekawa (1995) and Nagao (1996). These metrics are broken down further into various statistical quantities in the following sections. The distributions of these statistical quantities were obtained from the editorials and columns stored on the Mainichi Daily News CD-ROMs.</Paragraph>
    <Paragraph position="1"> Though most of these distributions are asymmetrical (skewed), each is treated as the distribution of an ideal essay. If a score (an obtained statistical quantity) turns out to be an outlier with respect to such an ideal distribution, that score is judged to be &quot;inappropriate&quot; for that metric. The points originally allotted to the metric are then reduced, and a comment to that effect is output. An &quot;outlier&quot; is an item of data lying more than 1.5 times the interquartile range beyond the quartiles, that is, below the first quartile or above the third. (In a box-and-whisker plot, whiskers are drawn up to the maximum and minimum data points within 1.5 times the interquartile range.)</Paragraph>
    <Paragraph position="2"> In scoring, the relative weights of the broken-down metrics are equal, with the exception of &quot;diversity of vocabulary,&quot; which is given twice the weight of the others because we consider it an index contributing not only to &quot;rhetoric&quot; but to &quot;content&quot; as well.</Paragraph>
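    <Paragraph position="3"> The outlier test described above can be sketched as follows. This is a minimal illustration, not the Jess implementation: the function names, the "inclusive" quartile method, and the sample reference values are all assumptions made for the example, since the text specifies only the factor of 1.5 times the interquartile range.

```python
from statistics import quantiles


def tukey_fences(reference, k=1.5):
    """Return (lower, upper) fences: k * IQR beyond the first and third quartiles."""
    # Assumption: 'inclusive' quartiles; the text does not name a quartile method.
    q1, _median, q3 = quantiles(reference, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, reference, k=1.5):
    """True when value falls outside the fences of the reference distribution."""
    lower, upper = tukey_fences(reference, k)
    return value > upper or lower > value


# Hypothetical reference values standing in for a statistic measured on expert articles.
reference_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(tukey_fences(reference_values))
print(is_outlier(14, reference_values))
```

A score flagged by is_outlier would be the one judged inappropriate for its metric; how many points are then deducted is not stated in this section, so the sketch stops at the flag.</Paragraph>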
    <Section position="1" start_page="234" end_page="235" type="sub_section">
      <SectionTitle>
2.1 Ease of reading
</SectionTitle>
      <Paragraph position="0"> The following items are considered indexes of &quot;ease of reading.&quot; 1. Median and maximum sentence length. Shorter sentences are generally assumed to make for easier reading (Knuth et al., 1988). Many books on writing in the Japanese language, moreover, state that a sentence should be no longer than 40 or 50 characters. Median and maximum sentence length can therefore be treated as an index. The median is used rather than the average because sentence-length distributions are skewed in most cases. The relative weight used in the evaluation of median and maximum sentence length is equal to that of the indexes described below. Sentence length is also known to be quite effective for determining style.</Paragraph>
      <Paragraph position="1"> 2. Median and maximum clause length. In addition to periods (.), commas (,) can also contribute to ease of reading. Here, text between commas is called a &quot;clause.&quot; The number of characters in a clause is also an evaluation index.</Paragraph>
      <Paragraph position="2"> 3. Median and maximum number of phrases in clauses. A human being cannot understand many things at one time. The limit of human short-term memory is generally said to be seven items, and that is thought to limit the length of clauses. Indeed, on surveying the number of phrases in clauses from editorials in the Mainichi Daily News, we found a median of four, which is consistent with the short-term memory maximum of seven items.</Paragraph>
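      <Paragraph position="3"> The first two indexes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Jess implementation: the function names and the delimiter sets (ASCII and Japanese periods and commas) are hypothetical, and lengths are counted in characters without the delimiter itself. The third index, phrases per clause, would need a morphological analyzer and is not covered here.

```python
import re
from statistics import median

# Assumed delimiter sets: ASCII and Japanese full stops and commas.
SENTENCE_DELIMS = "。."
CLAUSE_DELIMS = "、,"


def split_units(text, delimiters):
    """Split text on any of the delimiter characters, dropping empty pieces."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [unit.strip() for unit in re.split(pattern, text) if unit.strip()]


def length_stats(units):
    """Return (median, maximum) of the character lengths of the units."""
    if not units:
        raise ValueError("no units to measure")
    lengths = [len(unit) for unit in units]
    return median(lengths), max(lengths)


def ease_of_reading(text):
    """Median and maximum lengths of sentences and of comma-delimited clauses."""
    sentences = split_units(text, SENTENCE_DELIMS)
    clauses = [c for s in sentences for c in split_units(s, CLAUSE_DELIMS)]
    return {
        "sentence": length_stats(sentences),
        "clause": length_stats(clauses),
    }


print(ease_of_reading("aaaa, bb. cccccc."))
```

The median is used here for the same reason given in the text: length distributions are typically skewed, so a few very long sentences would distort an average.</Paragraph>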
    </Section>
  </Section>
</Paper>