<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1030">
  <Title>Automated Japanese Essay Scoring System based on Articles Written by Experts</Title>
  <Section position="5" start_page="235" end_page="236" type="metho">
    <SectionTitle>
4. Kanji/kana ratio
</SectionTitle>
    <Paragraph position="0"> To simplify text and make it easier to read, a writer will generally reduce kanji (Chinese characters) intentionally. In fact, an appropriate range for the kanji/kana ratio in essays is thought to exist, and this range is taken to be an evaluation index. The kanji/kana ratio is also thought to be one aspect of style.</Paragraph>
    <Paragraph position="1"> 5. Number of attributive declined or conjugated words (embedded sentences): The declined or conjugated forms of attributive modifiers indicate the existence of &quot;embedded sentences,&quot; and their quantity is thought to affect ease of understanding.</Paragraph>
    <Paragraph position="2"> 6. Maximum number of consecutive infinitive-form or conjunctive-particle clauses: Consecutive infinitive-form or conjunctive-particle clauses, if many, are also thought to affect ease of understanding. Note that it is the &quot;maximum number&quot; of such consecutive clauses, not their &quot;average size,&quot; that holds significant meaning as an indicator of the depth of dependency affecting ease of understanding.</Paragraph>
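The kanji/kana ratio of item 4 can be sketched as a simple character count. The following is a minimal illustration, not the Jess implementation: it assumes that kanji are the characters of the Unicode CJK Unified Ideographs block and that kana are the hiragana and katakana letters plus the prolonged sound mark. The function name kanji_kana_ratio is hypothetical.

```python
# Assumed character classes (Unicode code point ranges).
KANJI = range(0x4E00, 0xA000)      # CJK Unified Ideographs block
HIRAGANA = range(0x3041, 0x3097)   # hiragana letters
KATAKANA = range(0x30A1, 0x30FB)   # katakana letters
PROLONGED_MARK = 0x30FC            # long-vowel mark, counted as kana here


def kanji_kana_ratio(text):
    """Return (number of kanji) / (number of kana), or None if there is no kana."""
    kanji = 0
    kana = 0
    for ch in text:
        code = ord(ch)
        if code in KANJI:
            kanji += 1
        elif code in HIRAGANA or code in KATAKANA or code == PROLONGED_MARK:
            kana += 1
    if kana == 0:
        return None
    return kanji / kana


print(kanji_kana_ratio("日本語のテキスト"))
```

An appropriate range for this ratio would then have to be estimated from model texts, which the sketch does not attempt.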
    <Section position="1" start_page="235" end_page="235" type="sub_section">
      <SectionTitle>
2.2 Diversity of vocabulary
</SectionTitle>
      <Paragraph position="0"> Yule (1944) used a variety of statistical quantities in his analysis of writing. The most famous of these is an index of vocabulary concentration called the $K$ characteristic value. The value of $K$ is non-negative, increases as vocabulary becomes more concentrated, and conversely, decreases as vocabulary becomes more diversified. The median values of $K$ for editorials and columns in the Mainichi Daily News were found to be 87.3 and 101.3, respectively. Incidentally, other characteristic values indicating vocabulary concentration have been proposed. See Tweedie et al. (1998), for example.</Paragraph>
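As an illustration of how such a concentration index behaves, the sketch below computes Yule's characteristic using the commonly cited definition $K = 10^4 \, (\sum_m m^2 V(m) - N) / N^2$, where $N$ is the total number of tokens and $V(m)$ is the number of distinct words occurring exactly $m$ times. The section itself does not state the formula, so this definition and the function name yule_k are assumptions; tokenization is taken as already done.

```python
from collections import Counter


def yule_k(tokens):
    """Yule's characteristic K for a list of word tokens (assumed definition)."""
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    word_freq = Counter(tokens)              # word -> frequency m
    spectrum = Counter(word_freq.values())   # m -> V(m), words used exactly m times
    second_moment = sum(m * m * v for m, v in spectrum.items())
    return 1e4 * (second_moment - n) / (n * n)


# Fully diversified vocabulary versus fully concentrated vocabulary.
print(yule_k(["a", "b", "c", "d"]))
print(yule_k(["a", "a", "a", "a"]))
```

Under this definition a text in which every word occurs once yields 0, and repetition of the same words drives the value up, which matches the behavior described above.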
    </Section>
    <Section position="2" start_page="235" end_page="235" type="sub_section">
      <SectionTitle>
2.3 Percentage of big words
</SectionTitle>
      <Paragraph position="0"> It is thought that the use of big words, to whatever extent, cannot help but impress the reader.</Paragraph>
      <Paragraph position="1"> When investigating big words in Japanese, however, care must be taken because simply measuring the length of a word may lead to erroneous conclusions. While &quot;big word&quot; in English is usually synonymous with &quot;long word,&quot; a word expressed in kanji becomes longer when expressed in kana characters. That is to say, a &quot;small word&quot; in Japanese may become a big word simply due to notation. The number of characters in a word must therefore be counted after converting it to kana characters (i.e., to its &quot;reading&quot;) to judge whether that word is big or small. In editorials from the Mainichi Daily News, the median number of characters in nouns after conversion to kana was found to be 4, with 5 being the 3rd quartile (upper 25%). We therefore assumed for the time being that nouns having readings of 6 or more characters were big words, and with this as a guideline, we again measured the percentage of nouns in a document that were big words. Since the number of characters in a reading is an integer value, this percentage would not necessarily be 25%, but a distribution that takes a value near that percentage on average can be obtained.</Paragraph>
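The big-word criterion can be sketched as follows. This is only an illustration under stated assumptions: the nouns are assumed to have been extracted and converted to their kana readings beforehand, and the function name big_word_percentage is hypothetical. The threshold of 6 characters is the one given in the text.

```python
def big_word_percentage(noun_readings, threshold=6):
    """Percentage of nouns whose kana reading has at least `threshold` characters."""
    if not noun_readings:
        return 0.0
    big = sum(1 for reading in noun_readings if len(reading) >= threshold)
    return 100.0 * big / len(noun_readings)


# Readings of four nouns, already converted to kana (example data).
readings = ["けいざい", "せいじ", "こくさいかんけい", "ほん"]
print(big_word_percentage(readings))
```

Counting on the reading rather than on the written form is what keeps a short kanji compound from being misjudged as a small word.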
    </Section>
    <Section position="3" start_page="235" end_page="236" type="sub_section">
      <SectionTitle>
2.4 Percentage of passive sentences
</SectionTitle>
      <Paragraph position="0"> It is generally felt that text should be written in active voice as much as possible, and that text with many passive sentences is poor writing (Knuth et al., 1988). For this reason, the percentage of passive sentences is also used as an index of rhetoric.</Paragraph>
      <Paragraph position="1"> Grammatically speaking, passive voice is distinguished from active voice in Japanese by the auxiliary verbs &quot;reru&quot; and &quot;rareru&quot;. In addition to passivity, however, these two auxiliary verbs can also indicate respect, possibility, and spontaneity. In fact, they may be used to indicate respect even in the case of active voice. This distinction, however, while necessary in analysis at the semantic level, is not used in morphological analysis and syntactic analysis. For example, in the case that the object of respect is &quot;teacher&quot; (sensei) or &quot;your husband&quot; (goshujin), the use of &quot;reru&quot; and &quot;rareru&quot; auxiliary verbs here would certainly indicate respect. This meaning, however, belongs entirely to the world of semantics. We can assume that such an indication of respect would not be found in essays required for tests, and consequently, that the use of &quot;reru&quot; and &quot;rareru&quot; in itself would indicate the passive voice in such an essay.</Paragraph>
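Under the assumption stated above, that any occurrence of &quot;reru&quot; or &quot;rareru&quot; marks a passive sentence, the percentage of passive sentences reduces to a count over morphologically analyzed sentences. The sketch below is hypothetical in its details: each sentence is assumed to be a list of (base form, tag) pairs, and the tag name &quot;aux&quot; is an invented placeholder rather than the tag set of any particular analyzer.

```python
PASSIVE_AUXILIARIES = {"れる", "られる"}  # reru, rareru


def passive_sentence_percentage(sentences):
    """Percentage of sentences containing a reru/rareru auxiliary."""
    if not sentences:
        return 0.0
    passive = sum(
        1
        for morphemes in sentences
        if any(tag == "aux" and base in PASSIVE_AUXILIARIES
               for base, tag in morphemes)
    )
    return 100.0 * passive / len(sentences)


# Two analyzed sentences (example data with placeholder tags).
sentences = [
    [("本", "noun"), ("が", "particle"), ("読む", "verb"), ("れる", "aux")],
    [("私", "noun"), ("は", "particle"), ("走る", "verb")],
]
print(passive_sentence_percentage(sentences))
```

No attempt is made to separate passive from honorific or potential readings, which is exactly the simplification the text argues is acceptable for test essays.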
    </Section>
  </Section>
  <Section position="6" start_page="236" end_page="237" type="metho">
    <SectionTitle>
3 Organization
</SectionTitle>
    <Paragraph position="0"> Comprehending the flow of a discussion is essential to understanding the connection between various assertions. To help the reader catch this flow, the frequent use of conjunctive expressions is useful. In Japanese writing, however, the use of conjunctive expressions tends to alienate the reader, and such expressions, if used at all, are preferably vague. At times, in fact, presenting multiple descriptions or posing several questions steeped in ambiguity can produce interesting effects and result in a beautiful passage (Noya, 1997). In essay tests, however, examinees are not asked to come up with &quot;beautiful passages.&quot; They are required, rather, to write logically while making a conscious effort to use conjunctive expressions. We therefore attempt to determine the logical structure of a document by detecting the occurrence of conjunctive expressions. In this effort, we use a method based on cue words as described in Quirk et al. (1985) for measuring the organization of a document. This method, which is also used in e-rater, the basis of our system, looks for phrases like &quot;in summary&quot; and &quot;in conclusion&quot; that indicate summarization, and words like &quot;perhaps&quot; and &quot;possibly&quot; that indicate conviction or thinking when examining a matter in depth, for example.</Paragraph>
    <Paragraph position="1"> Now, a conjunctive relationship can be broadly divided into &quot;forward connection&quot; and &quot;reverse connection.&quot; &quot;Forward connection&quot; has a rather broad meaning indicating a general conjunctive structure that leaves discussion flow unchanged. In contrast, &quot;reverse connection&quot; corresponds to a conjunctive relationship that changes the flow of discussion. These logical structures can be classified as follows according to Noya (1997). The &quot;forward connection&quot; structure comes in the following types.</Paragraph>
    <Paragraph position="2"> Addition: A conjunctive relationship that adds emphasis. A good example is &quot;in addition,&quot; while other examples include &quot;moreover&quot; and &quot;rather.&quot; Abbreviation of such words is not infrequent.</Paragraph>
    <Paragraph position="3"> Explanation: A conjunctive relationship typified by words and phrases such as &quot;namely,&quot; &quot;in short,&quot; &quot;in other words,&quot; and &quot;in summary.&quot; It can be broken down further into &quot;summarization&quot; (summarizing and clarifying what was just described), &quot;elaboration&quot; (in contrast to &quot;summarization,&quot; begins with an overview followed by a detailed description), and &quot;substitution&quot; (saying the same thing in another way to aid in understanding or to make a greater impression).</Paragraph>
    <Paragraph position="4"> Demonstration: A structure indicating a reason-consequence relation. Expressions indicating a reason include &quot;because&quot; and &quot;the reason is,&quot; and those indicating a consequence include &quot;as a result,&quot; &quot;accordingly,&quot; &quot;therefore,&quot; and &quot;that is why.&quot; Conjunctive particles in Japanese like &quot;node&quot; (since) and &quot;kara&quot; (because) also indicate a reason-consequence relation.</Paragraph>
    <Paragraph position="5"> Illustration: A conjunctive relationship most typified by the phrase &quot;for example&quot; having a structure that either explains or demonstrates by example.</Paragraph>
    <Paragraph position="6"> The &quot;reverse connection&quot; structure comes in the following types.</Paragraph>
    <Paragraph position="7"> Transition: A conjunctive relationship indicating a change in emphasis from A to B expressed by such structures as &quot;A ..., but B ...&quot; and &quot;A ...; however, B ...&quot;.</Paragraph>
    <Paragraph position="8"> Restriction: A conjunctive relationship indicating a continued emphasis on A. Also referred to as a &quot;proviso&quot; structure, it is typically expressed by &quot;though in fact&quot; and &quot;but then.&quot; Concession: A type of transition that takes on a conversational structure in the case of concession or compromise. Typical expressions indicating this relationship are &quot;certainly&quot; and &quot;of course.&quot; Contrast: A conjunctive relationship typically expressed by &quot;at the same time,&quot; &quot;on the other hand,&quot; and &quot;in contrast.&quot; We extracted all phrases indicating conjunctive relationships from editorials of the Mainichi Daily News, and classified them into the above four categories for forward connection and the four for reverse connection, for a total of eight exclusive categories. In Jess, the system attaches labels to conjunctive relationships and tallies them to judge the strength of the discourse in the essay being scored. As in the case of rhetoric, Jess learns what an appropriate number of conjunctive relationships should be from editorials of the Mainichi Daily News, and deducts from the initially allotted points in the event of an outlier value in the model distribution.</Paragraph>
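The labeling and tallying of conjunctive relationships can be sketched with a cue-phrase table. The example below is a simplified stand-in, not the Jess dictionary: it uses the English glosses quoted above instead of the Japanese phrases extracted from the editorials, and the names CUES, tally_connections, and direction are hypothetical.

```python
import re
from collections import Counter

# Cue phrases per category, taken from the English examples in the text.
CUES = {
    "addition": ["in addition", "moreover", "rather"],
    "explanation": ["namely", "in short", "in other words", "in summary"],
    "demonstration": ["because", "the reason is", "as a result",
                      "accordingly", "therefore", "that is why"],
    "illustration": ["for example"],
    "transition": ["but", "however"],
    "restriction": ["though in fact", "but then"],
    "concession": ["certainly", "of course"],
    "contrast": ["at the same time", "on the other hand", "in contrast"],
}
FORWARD = {"addition", "explanation", "demonstration", "illustration"}

PHRASE_TO_CATEGORY = {p: c for c, phrases in CUES.items() for p in phrases}
# Longest phrases first, so that "but then" is preferred over "but".
_ALTERNATION = "|".join(
    re.escape(p) for p in sorted(PHRASE_TO_CATEGORY, key=len, reverse=True)
)
_PATTERN = re.compile(r"\b(?:" + _ALTERNATION + r")\b", re.IGNORECASE)


def tally_connections(text):
    """Count cue phrases per category in the given text."""
    return Counter(
        PHRASE_TO_CATEGORY[m.group(0).lower()] for m in _PATTERN.finditer(text)
    )


def direction(category):
    """Map one of the eight categories to forward or reverse connection."""
    return "forward" if category in FORWARD else "reverse"


sample = ("Certainly the plan is cheap. However, it is slow; "
          "in addition, it is risky, but then nothing is perfect.")
print(tally_connections(sample))
```

Because the eight categories are exclusive, each phrase maps to exactly one label, and the per-category counts can be compared with the distribution learned from the model editorials.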
    <Paragraph position="9"> In the scoring, we also determined whether the pattern in which these conjunctive relationships appeared in the essay was singular compared to that in the model editorials. This was accomplished by considering a trigram model (Jelinek, 1991) for the appearance patterns of forward and reverse connections. In general, an $n$-gram model can be represented by a stochastic finite automaton, and in a trigram model, each state of an automaton is labeled by a symbol sequence of length 2. The set of symbols here is $\{a: \text{forward-connection},\ b: \text{reverse-connection}\}$. Each state transition is assigned a conditional output probability as shown in Table 1. The symbol $\#$ here indicates no (prior) relationship. The initial state is shown as $\#\#$. For example, the expression [...] In this way, the probability of occurrence of certain $a$ (forward-connection) and $b$ (reverse-connection) patterns can be obtained by taking the product of appropriate conditional probabilities listed in Table 1. For example, the probability of occurrence $p$ of the pattern $\{a, b, a, a\}$ turns out to be $0.44 \times 0.52 \times 0.55 \times 0.28 = 0.035$. Furthermore, given that the probability of $a$ appearing without prior information is 0.47 and that of $b$ appearing without prior information is 0.53, the probability $q$ that a forward connection occurs three times and a reverse connection once under the condition of no prior information would be $0.47^3 \times 0.53 = 0.055$. As shown by this example, an occurrence probability that is greater under no prior information ($q$ greater than $p$) would indicate that the forward-connection and reverse-connection appearance pattern is singular, in which case the points initially allocated to conjunctive relationships in a discussion would be reduced.
The trigram model may overcome the restriction that the essay should be written in a pyramid structure or its reversal.</Paragraph>
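The comparison between the trigram probability and the no-prior-information probability can be sketched as below. Table 1 is not reproduced in this section, so the conditional probabilities in TRIGRAM, and their assignment to particular state transitions, are illustrative stand-ins rather than the published table; only the values 0.47 and 0.53 for the symbols without prior information are stated in the text. All function names are hypothetical.

```python
from math import prod

# Illustrative conditional probabilities P(next | previous two symbols).
# "#" means no prior relationship; these entries are assumptions.
TRIGRAM = {
    ("#", "#", "a"): 0.44,
    ("#", "a", "b"): 0.52,
    ("a", "b", "a"): 0.55,
    ("b", "a", "a"): 0.28,
}
# Probabilities without prior information, as given in the text.
UNIGRAM = {"a": 0.47, "b": 0.53}


def trigram_probability(pattern):
    """Product of conditional probabilities along the pattern, starting at ##."""
    history = ("#", "#")
    p = 1.0
    for symbol in pattern:
        p *= TRIGRAM[history + (symbol,)]
        history = (history[1], symbol)
    return p


def unigram_probability(pattern):
    """Probability of the same symbols under no prior information."""
    return prod(UNIGRAM[symbol] for symbol in pattern)


def is_singular(pattern):
    """A pattern is flagged when the no-prior probability exceeds the trigram one."""
    return unigram_probability(pattern) > trigram_probability(pattern)


pattern = "abaa"  # forward, reverse, forward, forward
print(round(trigram_probability(pattern), 3))
print(round(unigram_probability(pattern), 3))
print(is_singular(pattern))
```

A full implementation would need one entry for every state and symbol, estimated from the model editorials; the four entries here cover only the single pattern being evaluated.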
  </Section>
  <Section position="7" start_page="237" end_page="237" type="metho">
    <SectionTitle>
4 Content
</SectionTitle>
    <Paragraph position="0"> A technique called latent semantic indexing can be used to check whether the content of a written essay responds appropriately to the essay prompt.</Paragraph>
    <Paragraph position="1"> The usefulness of this technique has been stressed at the Text REtrieval Conference (TREC) and elsewhere. Latent semantic indexing begins after performing singular value decomposition on a $t \times d$ term-document matrix $X$ ($t$ = number of words; $d$ = number of documents) indicating the frequency of words appearing in a sufficiently large number of documents. Matrix $X$ is generally a huge sparse matrix, and SVDPACK (Berry, 1992) is known to be an effective software package for performing singular value decomposition on a matrix of this type. This package allows the use of eight different algorithms, and Ishioka and Kameda (1999) give a detailed comparison and evaluation of these algorithms in terms of their applicability to Japanese documents. Matrix $X$ must first be converted to the Harwell-Boeing sparse matrix format (Duff et al., 1989) in order to use SVDPACK. This format can store the data of a sparse matrix in an efficient manner, thereby saving disk space and significantly decreasing data read-in time.</Paragraph>
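The space saving comes from storing only the nonzero entries, column by column. As a rough illustration of that compressed-column idea only (the actual Harwell-Boeing file layout, with its header lines and index conventions, is not reproduced here), the sketch below converts a small dense term-document matrix into three arrays. The function name to_compressed_columns is hypothetical, and 0-based indices are used for simplicity.

```python
def to_compressed_columns(dense):
    """Convert a dense matrix (list of rows) to compressed-column arrays.

    Returns (col_ptr, row_idx, values): the nonzeros of column j are
    values[col_ptr[j]:col_ptr[j + 1]], located in rows
    row_idx[col_ptr[j]:col_ptr[j + 1]].
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    col_ptr, row_idx, values = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                row_idx.append(i)
                values.append(dense[i][j])
        col_ptr.append(len(values))
    return col_ptr, row_idx, values


# Rows are terms, columns are documents (example word frequencies).
term_document = [
    [1, 0, 2],
    [0, 0, 3],
    [4, 0, 0],
]
print(to_compressed_columns(term_document))
```

For a matrix in which most word-document pairs have zero frequency, the three arrays grow with the number of nonzeros rather than with the product of the two dimensions.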
  </Section>
  <Section position="8" start_page="237" end_page="238" type="metho">
    <SectionTitle>
5 Application
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="237" end_page="238" type="sub_section">
      <SectionTitle>
5.1 An E-rater Demonstration
</SectionTitle>
      <Paragraph position="0"> An e-rater demonstration can be viewed at www.ets.org by clicking &quot;Products → e-rater Home → Demo.&quot; In this demonstration, seven response patterns (seven essays) are evaluated.</Paragraph>
      <Paragraph position="1"> The scoring breakdown, given a perfect score of six, was one each for scores of 6, 5, 4, and 2 and three for a score of 3.</Paragraph>
      <Paragraph position="2"> We translated essays A to G on that Web site into Japanese and then scored them using Jess, as shown in Table 2.</Paragraph>
      <Paragraph position="3"> The second and third columns show e-rater and Jess scores, respectively, and the fourth column shows the number of characters in each essay.</Paragraph>
      <Paragraph position="4">  A perfect score in Jess is 10 with 5 points allocated to rhetoric, 2 to organization, and 3 to content as standard. For purposes of comparison, the Jess score converted to e-rater's 6-point system is shown in parentheses. As can be seen here, essays given good scores by e-rater are also given good scores by Jess, and the two sets of scores show good agreement. However, e-rater (and probably human raters) tends to give more points to longer essays despite similar writing formats. Here, a difference appears between e-rater and Jess, which uses the point-deduction system for scoring. Examining the scores for essay C, for example, we see that e-rater gave a perfect score of 6, while Jess gave only a score of 5 after converting to e-rater's 6-point system. In other words, the length of the essay could not compensate for various weak points in the essay under Jess's point-deduction system. The fifth column in Table 2 shows the processing time (CPU time) for Jess. The computer used was Plat'Home Standard System 801S using an 800-MHz Intel Pentium III running RedHat 7.2. The Jess program is written in C shell script, jgawk, jsed, and C, and comes to just under 10,000 lines. In addition to the ChaSen morphological analysis system, Jess also needs the kakasi kanji/kana converter program (http://kakasi.namagu.org/) to operate. At present, it runs only on UNIX. Jess can be executed on the Web at http://coca.rd.dnc.ac.jp/jess/.</Paragraph>
    </Section>
    <Section position="2" start_page="238" end_page="238" type="sub_section">
      <SectionTitle>
5.2 An Example of using a Web Entry Sheet
</SectionTitle>
      <Paragraph position="0"> Four hundred eighty applicants who were eager to be hired by a certain company entered their essays using a Web form without a time restriction, with the size of the text restricted implicitly by the Web screen to about 800 characters. The theme of the essay was &quot;What does working mean in your life.&quot; Table 3 summarizes the correlation coefficients between the Jess score, average score of expert raters, and score of the linguistic understanding test (LUT), developed by Recruit Management Solutions Co., Ltd. The LUT is designed to measure the ability to grasp the correct meaning of words that are the elements of a sentence, and to understand the composition and the summary of a text. Five expert raters rated the essays, and three of these scored each essay independently.</Paragraph>
      <Paragraph position="1">  We found that the correlation between the Jess score and the average of the expert raters' scores is not small (0.57), and is larger than the correlation coefficient between the expert raters' scores of 0.48. That means that Jess is superior to the expert raters on average, and is substitutable for them. Note that the restriction of the text size (800 characters in this case) caused the low correlation owing to the difficulty in evaluating the organization and the development of the arguments; even the expert raters' scores tend to be dispersed. We also found that neither the expert raters nor Jess had much correlation with LUT, which shows that LUT does not reflect features indicating writing ability. That is, LUT measures aspects quite different from writing ability.</Paragraph>
      <Paragraph position="2"> Another experiment using 143 university students' essays collected at the National Institute for Japanese Language shows a similar result: for the essays on &quot;smoking,&quot; the correlation between Jess and the expert raters was 0.83, which is higher than the average correlation of expert raters (0.70); for the essays on &quot;festivals in Japan,&quot; the former is 0.84, the latter, 0.73. Three of four raters graded each essay independently.</Paragraph>
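The correlation coefficients reported in this subsection are assumed here to be ordinary Pearson correlations between score vectors, which the text does not state explicitly. A minimal sketch of that computation follows; the function name pearson and the score lists are invented for illustration and are not data from the experiments.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or 2 > len(xs):
        raise ValueError("need two lists of equal length with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Made-up scores for five essays: system score and mean rater score.
system_scores = [6.0, 7.5, 5.0, 8.0, 9.0]
rater_means = [5.5, 7.0, 6.0, 7.5, 9.5]
print(round(pearson(system_scores, rater_means), 2))
```

The same function applied to pairs of individual raters would give the rater-to-rater coefficients against which the system-to-rater value is compared.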
    </Section>
  </Section>
</Paper>