<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0208">
  <Title>Automatic Evaluation of Students' Answers using Syntactically Enhanced LSA</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 LSA in Intelligent Tutoring Systems
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 A Brief Introduction to LSA
</SectionTitle>
      <Paragraph position="0"> LSA is a statistical-algebraic technique for extracting and inferring the contextual usage of words in documents (Landauer et al., 1998). A document can be a sentence, a paragraph or an even larger unit of text. The method consists of first constructing a word-document co-occurrence matrix, scaling and normalizing it in order to discriminate the importance of words across documents, and then approximating it in R dimensions using singular value decomposition (SVD) (Bellegarda, 2000). It is this dimensionality reduction step through SVD that captures mutual implications of words and documents and allows us to project any text unit, whether a word, a sentence or a paragraph, as a vector in the latent &amp;quot;semantic&amp;quot; space. Any two documents can then be compared by calculating the cosine measure between their projection vectors in this space.</Paragraph>
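A minimal sketch of this pipeline, assuming a toy three-document corpus and numpy's SVD (the documents, the rank R and the lack of entropy weighting are illustrative simplifications, not the paper's setup):

```python
import numpy as np

docs = [
    "the cpu executes instructions",
    "the cpu processes data and instructions",
    "students write essays about the internet",
]

# Build the word-document co-occurrence matrix.
vocab = sorted({w for d in docs for w in d.split()})
X = np.zeros((len(vocab), len(docs)))
for k, d in enumerate(docs):
    for w in d.split():
        X[vocab.index(w), k] += 1

# Dimensionality reduction: keep only the R largest singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
R = 2
proj = np.diag(s[:R]) @ Vt[:R, :]   # R-dimensional projection of each document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 1 share CPU-related vocabulary, so they end up closer
# in the latent space than documents 0 and 2.
sim_01 = cosine(proj[:, 0], proj[:, 1])
sim_02 = cosine(proj[:, 0], proj[:, 2])
```

The cosine between latent projection vectors is the comparison measure used throughout the paper.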
      <Paragraph position="1"> LSA has been applied to model various ITS-related phenomena in cognitive science, e.g. judging essay quality (Landauer et al., 1998), assessing student knowledge by evaluating their answers to questions (Graesser et al., 2000), and deciding tutoring strategy (Lemaire, 1999). It has also been used to derive a statistical language model for large-vocabulary continuous speech recognition (Bellegarda, 2000).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 LSA-based ITSs
</SectionTitle>
      <Paragraph position="0"> Researchers have long been attempting to develop a computer tutor that can interact naturally with students to help them understand a particular subject. Unfortunately, language and discourse have constituted a serious barrier in these efforts. However, recent technological advances in the areas of latent semantic processing of natural language, world knowledge representation and multimedia interfaces have made it possible for various teams of researchers to develop ITSs that approach human performance. Some of these are briefly reviewed below.</Paragraph>
      <Paragraph position="1">  AutoTutor (Graesser et al., 1999) was developed by the Tutoring Research Group at the University of Memphis.</Paragraph>
      <Paragraph position="2"> AutoTutor is a fully automated computer tutor that assists students in learning about hardware, operating systems and the Internet in an introductory computer literacy course. AutoTutor presents questions and problems from a curriculum script, attempts to comprehend learner contributions that are entered by keyboard, formulates dialog moves that are sensitive to the learner's contributions (such as prompts, elaborations, corrections and hints), and delivers the dialog moves with a talking head. LSA is a major component of the mechanism that evaluates the quality of student contributions in the tutorial dialog. It was found that the performance of LSA in terms of evaluating answers from college students was equivalent to an intermediate expert human evaluator.</Paragraph>
      <Paragraph position="3">  Intelligent Essay Assessor (Foltz et al., 1999) uses LSA for the automatic scoring of short essays in any kind of content-based course. Student essays are characterized by LSA representations of the meaning of their contained words and compared with pre-graded essays on degree of conceptual relevance and amount of relevant content by means of two kinds of scores: (1) the holistic score, the score of the closest pre-graded essay, and (2) the gold standard, the LSA proximity between the student essay and a standard essay.</Paragraph>
      <Paragraph position="4">  Summary Street (Kintsch et al., 2000) is also built on top of LSA. It helps students to write good summaries.</Paragraph>
      <Paragraph position="5"> First, the student is given general advice on how to write a summary; then the student selects a topic, reads the text and writes a summary. LSA procedures are then applied to give the summary a holistic grade.</Paragraph>
      <Paragraph position="6">  Apex (Dessus et al., 2000) is a web-based learning environment which manages student productions, assessments and courses. Once connected to the system, a student selects a topic or a question that he or she wishes to work on. The student then types a text about this topic into a text editor. At any time, the student can get a three-part evaluation of the essay based on content, outline and coherence. At the content level, the system identifies how well the notions are covered by asking LSA to measure the semantic similarity between the student text and each notion of the selected topic, and provides a corresponding message to the student.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Syntactically Enhanced LSA (SELSA)
</SectionTitle>
    <Paragraph position="0"> LSA is based on word-document co-occurrence, a so-called 'bag-of-words' approach. It is therefore blind to word order and other syntactic information. This limits LSA's ability to capture the meaning of a sentence, which depends upon both syntax and semantics.</Paragraph>
    <Paragraph position="1"> The syntactic information in a text can be characterized in various ways, e.g. as a full parse tree, a shallow parse or a POS tag sequence. In an effort to generalize LSA, we present here the concept of a word-tag-document structure, which captures the behavior of a word within each syntactic context across various semantic contexts. The idea behind this is that the syntactic-semantic sense of a word is specified by the syntactic neighborhood in which it occurs. Representing each such variation in an LSA-like space therefore gives a finer resolution of a word's behavior than the average behavior captured by LSA. This allows us to compare two text documents based on their syntactic-semantic regularity and not on semantics only, so it can be used in high-quality text evaluation applications.</Paragraph>
    <Paragraph position="2"> This approach is quite similar to tagged LSA (Wiemer-Hastings and Zipitria, 2001), which considered a word along with its POS tag in order to discriminate multiple syntactic senses of the word. Our approach extends this work towards a more general framework in which a word together with the syntactic context specified by its adjacent words is taken as the unit of knowledge representation. We define the syntactic context as the POS tag information around a focus word. In particular, we look at the POS tag of the preceding word, called the prevtag for convenience. The motivation for this comes from the statistical language modeling and left-to-right parsing literature, where a word is predicted or tagged using its preceding words and their POS tags. Moreover, the prevtag serves as an approximation to the notion of a preceding parse tree characterizing the word sequence before the focus word. In general, we can also use syntactic information from the words following the current word, e.g. the posttag, the POS tag of the next word. However, one concern when incorporating syntactic information into LSA is the sparse-data estimation problem, so it is very important to choose a robust characterization of the syntactic neighborhood and to apply smoothing either at the matrix formation level or at the time of projecting a document into the latent space.</Paragraph>
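The word-prevtag unit can be illustrated on a toy tagged sentence (the tags and the sentence-start marker `<s>` are assumptions for illustration, not part of the paper's tagset):

```python
# Each token is represented together with the POS tag of the
# preceding word ("prevtag"), giving one row index of the matrix.
tagged = [("the", "DT"), ("cpu", "NN"), ("executes", "VBZ"),
          ("instructions", "NNS")]

units = []
prevtag = "<s>"                     # hypothetical sentence-start placeholder
for word, tag in tagged:
    units.append((word, prevtag))   # (w_i, p_j) pair = one matrix row
    prevtag = tag
```

The same word (e.g. "executes" after a noun vs. after a modal) would occupy different rows, which is what gives the finer syntactic resolution described above.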
    <Paragraph position="3"> The approach consists of first identifying a sufficiently large corpus representing the tutoring domain. A POS tagger is then used to convert it into a POS-tagged corpus.</Paragraph>
    <Paragraph position="4"> The next step is to construct a matrix whose rows correspond to word-prevtag pairs and whose columns correspond to documents in the corpus. Again, a document can be a sentence, a paragraph or a larger unit of text. If the vocabulary size is I, the POS tag vocabulary size is J and the number of documents in the corpus is K, then the matrix is IJ x K. Let c_{i_j,k} denote the frequency of word w_i with prevtag p_j in document d_k. The subscript notation i_j (i underscore j) is used for convenience and indicates word w_i with prevtag p_j, i.e., the ((i-1)J + j)-th row of the matrix. Then, as in LSA (Bellegarda, 2000), we find the entropy \varepsilon_{i_j} of each word-prevtag pair and scale the corresponding row of the matrix by (1 - \varepsilon_{i_j}). Document length normalization is also applied to each column of the matrix by dividing the entries of the k-th document by n_k, the number of words in document d_k. Let t_{i_j} be the frequency of the i_j-th word-prevtag pair in the whole corpus, i.e. t_{i_j} = \sum_{k=1}^{K} c_{i_j,k}. Then \varepsilon_{i_j} and the matrix element x_{i_j,k} are given by</Paragraph>
    <Paragraph position="5"> \varepsilon_{i_j} = -\frac{1}{\log K} \sum_{k=1}^{K} \frac{c_{i_j,k}}{t_{i_j}} \log \frac{c_{i_j,k}}{t_{i_j}} \quad (1) \qquad\qquad x_{i_j,k} = (1 - \varepsilon_{i_j}) \frac{c_{i_j,k}}{n_k} \quad (2)</Paragraph>
    <Paragraph position="6"> Once the matrix X is obtained, we perform its singular value decomposition (SVD) and approximate it by keeping the largest R singular values and setting the rest to zero. Thus,</Paragraph>
    <Paragraph position="7"> X \approx \hat{X} = U S V^T \quad (3)</Paragraph>
    <Paragraph position="8"> where U (IJ x R) and V (K x R) are orthonormal matrices and S (R x R) is a diagonal matrix. It is this dimensionality reduction step through SVD that captures the major structural associations between word-prevtag pairs and documents, removes 'noisy' observations and allows the same dimensional representation of word-prevtag pairs and documents (albeit in different bases). This R-dimensional space can be called either a syntactically enhanced latent semantic space or a latent syntactic-semantic space.</Paragraph>
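The entropy-based row scaling by (1 - \varepsilon_{i_j}) and the length normalization by n_k described above can be sketched on a toy count matrix (the counts are invented purely for illustration):

```python
import numpy as np

# Toy counts: rows index word-prevtag pairs (i_j), columns index documents k.
c = np.array([[3.0, 0.0, 1.0],
              [1.0, 1.0, 1.0],
              [0.0, 4.0, 0.0]])
K = c.shape[1]
n = c.sum(axis=0)            # n_k: number of words in each document
t = c.sum(axis=1)            # t_{i_j}: corpus frequency of each pair

# Normalized entropy of each row: 0 for a pair concentrated in one
# document, 1 for a pair spread evenly over all K documents.
p = c / t[:, None]
with np.errstate(divide="ignore", invalid="ignore"):
    plogp = np.where(p > 0, p * np.log(p), 0.0)
eps = -plogp.sum(axis=1) / np.log(K)

# Scale each row by (1 - eps) and divide each column by the length n_k.
X = (1.0 - eps)[:, None] * c / n[None, :]
```

A pair that occurs evenly everywhere (the middle row) gets entropy 1 and is weighted down to zero, while a pair concentrated in one document keeps its full weight, which is exactly the discriminative effect the weighting is meant to achieve.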
    <Paragraph position="9"> After the knowledge is represented in the latent syntactic-semantic space, we can project any new document as an R-dimensional vector \hat{d}_L in this space. Let d be the IJ x 1 vector representing this document, whose elements d_{i_j} are the frequency counts, i.e. the number of times word w_i occurs with prevtag p_j, weighted by the corresponding entropy measure (1 - \varepsilon_{i_j}). It can be thought of as an additional column of the matrix X, and therefore as having a corresponding vector v in the matrix V. Then d = U S v^T, and</Paragraph>
    <Paragraph position="10"> \hat{d}_L = S v^T = U^T d \quad (4)</Paragraph>
    <Paragraph position="11"> which is an R x 1 dimensional vector representation of the document in the latent space.</Paragraph>
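Folding a new document into the latent space can be sketched as follows; the matrix is random toy data, and since d = U S v^T, the latent representation works out to U^T d:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((12, 6))          # toy word-prevtag x document matrix
R = 3

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, S = U[:, :R], np.diag(s[:R])  # keep the R largest singular values

# Project a new (pseudo-)document: from d = U S v^T, the R-dimensional
# representation is S v^T = U^T d.
d = rng.random(12)               # entropy-weighted count vector of the new doc
d_hat = U.T @ d

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Training documents live in the same space as columns of S V^T, so the
# new document can be compared with any of them via the cosine measure.
doc_vecs = S @ Vt[:R, :]
sim = cosine(d_hat, doc_vecs[:, 0])
```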
    <Paragraph position="12"> We can also define a syntactic-semantic similarity measure between any two text documents as the cosine of the angle between their projection vectors in the latent syntactic-semantic space. With this measure we can address the problems to which LSA has been applied, namely natural language understanding, cognitive modeling, statistical language modeling, etc.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Experiment - Evaluating Students' Answers
</SectionTitle>
    <Paragraph position="0"> We have studied the performance of SELSA and compared it with LSA on the AutoTutor task (section 2.2.1) in terms of natural language understanding and cognitive modeling performance. The details of the experiment are presented below.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Corpus
</SectionTitle>
      <Paragraph position="0"> The Tutoring Research Group at the University of Memphis developed both the training and the testing corpus for the AutoTutor task. The training corpus consisted of two complete computer literacy textbooks and ten articles on each of the tutoring topics, viz. hardware, operating systems and the Internet. The test corpus was formed in the following manner: eight questions from each of the three topics were asked to a number of students, and eight answers per question, 192 in total, were selected as the test database. There were also around 20 good answers per question, which were used in training and testing. Using this corpus, we implemented both LSA and SELSA.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Human Evaluation of Answers
</SectionTitle>
      <Paragraph position="0"> To compare the performance of SELSA and LSA with humans, we selected four human evaluators from computer-related areas. Three of them were doctoral candidates and one had completed a doctorate, so they were expert human evaluators. Each of them was given the 192 student answers and a set of good answers to each question. They were asked to evaluate the answers in terms of a compatibility score, i.e. the fraction of the sentences in a student answer that match any of the good answers. Thus the score for each answer ranged between 0 and 1. The evaluators were not told what constitutes a &amp;quot;match&amp;quot;, but were left to decide for themselves.</Paragraph>
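The compatibility score can be sketched as below; the sentence matcher here is a hypothetical stand-in (a word-overlap threshold), since the human raters decided for themselves what counts as a match:

```python
# Compatibility score: fraction of sentences in a student answer
# that match any of the good answers.

def matches(sentence, good_answer, threshold=0.5):
    # Hypothetical matcher: enough word overlap with the good answer.
    s, g = set(sentence.lower().split()), set(good_answer.lower().split())
    return len(s & g) / max(len(s), 1) >= threshold

def compatibility(student_sentences, good_answers):
    hit = sum(1 for s in student_sentences
              if any(matches(s, g) for g in good_answers))
    return hit / len(student_sentences)

good = ["the cpu executes program instructions",
        "ram stores data temporarily"]
answer = ["the cpu executes instructions", "the internet is a network"]
score = compatibility(answer, good)   # one of two sentences matches
```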
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Syntactic Information
</SectionTitle>
      <Paragraph position="0"> We approximated the syntactic neighborhood by the POS tag of the preceding word. POS tagging was performed with the LTPOS software from the Language Technology Group at the University of Edinburgh. We also mapped the 45 tags of the Penn Treebank tagset down to 12 tags, both to concentrate on major syntactic categories and to keep the size of the resulting matrix manageable.</Paragraph>
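A coarsening of this kind might look as follows; the paper does not specify its 12-tag mapping, so the groupings below are purely hypothetical:

```python
# Hypothetical coarsening of Penn Treebank tags into major categories.
COARSE = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
}

def coarsen(tag):
    # Tags outside the listed groups fall into a catch-all category.
    return COARSE.get(tag, "OTHER")
```

Collapsing the tagset shrinks J, and hence the IJ rows of the matrix, which directly eases the sparse-data problem noted in section 3.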
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.4 LSA and SELSA Training
</SectionTitle>
      <Paragraph position="0"> We considered a paragraph to be the unit of document. After removing very small documents consisting of fewer than four words, we had 5596 documents. The vocabulary size, after removing words with frequency less than two and some stopwords, was 9194. The densities of the LSA and SELSA matrices were 0.27% and 0.025%, respectively.</Paragraph>
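Density here means the fraction of nonzero entries in the matrix, as in this small sketch (the matrix is invented for illustration):

```python
import numpy as np

X = np.array([[0.0, 1.2, 0.0],
              [0.0, 0.0, 0.0],
              [0.4, 0.0, 0.0]])
density = np.count_nonzero(X) / X.size   # fraction of nonzero entries
```

SELSA's matrix has IJ rows instead of I, so with the same corpus its counts are spread over roughly J times as many rows, which is why its density is about an order of magnitude lower than LSA's.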
      <Paragraph position="1"> SVD was performed using the MATLAB sparse matrix toolbox. We performed SVD with dimensions R varying from 200 to 400 in steps of 50.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.5 Evaluation Measure
</SectionTitle>
      <Paragraph position="0"> In order to evaluate the performance of SELSA and LSA on the AutoTutor task, we need to define appropriate measures. Earlier studies on this task used the correlation coefficient between LSA's ratings and the human ratings of the 192 answers. We have used this as one of three measures for comparison. But for a task with a small sample size the correlation coefficient is not reliably estimated, so we defined two new performance measures. The first is the mean absolute difference between the human and the SELSA (correspondingly LSA) evaluations. For the other measure, we compared how many answers were correctly evaluated versus how many were falsely evaluated by SELSA (LSA) relative to the human evaluations. A detailed explanation of these measures is given in the following section.</Paragraph>
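The first two measures can be sketched as follows, on hypothetical ratings in [0, 1] rather than the actual 192-answer data:

```python
import numpy as np

# Hypothetical human and system ratings for a handful of answers.
human  = np.array([0.8, 0.5, 1.0, 0.2, 0.6])
system = np.array([0.7, 0.4, 0.9, 0.3, 0.8])

# Measure 1: Pearson correlation between system and human ratings.
r = float(np.corrcoef(human, system)[0, 1])

# Measure 2: mean absolute difference, which avoids the unreliable
# estimation of a correlation coefficient on a small sample.
mad = float(np.abs(human - system).mean())
```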
    </Section>
  </Section>
</Paper>