<?xml version="1.0" standalone="yes"?> <Paper uid="C02-2023"> <Title>A reliable approach to automatic assessment of short answer free responses</Title> <Section position="4" start_page="0" end_page="3" type="metho"> <SectionTitle> 2 Using WebLAS </SectionTitle> <Paragraph position="0"> Just as a scoring rubric for short answer scoring cannot be created in a vacuum, it would be difficult for us to discuss the scoring process without describing the task creation process.</Paragraph> <Paragraph position="1"> Task development consists of all the efforts that lead up to the test administration. The task development portion of WebLAS consists of three modules: task creation, task modification, and lexicon modification. These are explained below.</Paragraph> <Section position="1" start_page="0" end_page="3" type="sub_section"> <SectionTitle> 2.1 Using WebLAS </SectionTitle> <Paragraph position="0"> WebLAS is written mostly in Perl. Perl's capacity for regular expressions (regexes) makes it well suited for natural language processing (NLP) tasks, and its scripting abilities enable dynamic, interactive content deliverable over the web.</Paragraph> <Paragraph position="1"> There is also a comprehensive repository of open source Perl modules available, eliminating the need to reinvent the wheel.</Paragraph> <Paragraph position="2"> One of the tools WebLAS incorporates is Wordnet, an English lexicon under development at Princeton with foundations in cognitive psychology (Fellbaum 1998). A second tool WebLAS uses is the Link Grammar Parser, a research prototype available from Carnegie Mellon University (Grinberg et al. 1995). Both Wordnet and Link Grammar are written in C/C++. To interface with these systems, we use two Perl modules developed by Dan Brian, which provide fast access and allow modifications to the lexicon.</Paragraph> <Paragraph position="3"> Linguana::LinkGrammar interfaces with the Link Grammar for part-of-speech (POS) tagging and syntactic parsing. 
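As a toy illustration of the kind of lexicon lookup these modules support, the sketch below generates alternative wordings for an answer element. It is a Python approximation rather than WebLAS's Perl, and the synonym table is a hypothetical fragment, not the system's real lexicon.

```python
# Hypothetical synonym fragment standing in for a Wordnet lookup.
SYNONYMS = {
    "next to": ["near", "beside"],
    "restaurant": ["eatery"],
}

def alternatives(element):
    """Propose alternative wordings for an answer element by
    substituting each sub-phrase that has known synonyms."""
    results = []
    for target, subs in SYNONYMS.items():
        if target in element:
            for s in subs:
                results.append(element.replace(target, s))
    return results
```

For the element "next to the post office" this proposes "near the post office" and "beside the post office", each of which would still be confirmed with the instructor.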
For our web server we use the Apache Advanced Extranet web server. To run Perl scripts via the web, we use mod_perl, which enables us to run unmodified scripts. Our database is MySQL.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 2.2 Task Development </SectionTitle> <Paragraph position="0"> WebLAS is organized into four major components relative to the test event itself: test development, test delivery, response scoring, and test analysis. Two of these are relevant to NLP: task development and test scoring. The task creation module is somewhat of a misnomer. By the time this module is used, the task has already been specified according to language assessment requirements. Rather than creating the task itself, the module guides the instructor through storing the task in the database and preprocessing it for automatic scoring. This process is shown in the flowchart in Figure 1.</Paragraph> <Paragraph position="1"> a. The module requests from the instructor the task name, task type, input prompt, response prompt, and model answer. This information is stored in the database for later retrieval.</Paragraph> <Paragraph position="2"> b. WebLAS sends the model answer to Link Grammar, which returns it with POS tags and a syntactic parse. From the parsed answer, WebLAS then identifies the important elements that are necessary to receive full credit and confirms each one with the instructor. Elements are generally phrases, such as &quot;the sushi restaurant&quot; or &quot;next to the post office&quot;, but they can also be singletons, such as &quot;Yamada-san&quot;.</Paragraph> <Paragraph position="3"> c. After each element is confirmed, WebLAS searches Wordnet for possible alternatives to the elements and their individual words. 
For example, it may deem &quot;near the post office&quot; a possible alternative to &quot;next to the post office.&quot; Once alternatives are found, it again asks the instructor for confirmation. Additionally, the instructor is prompted for other possibilities that were not found.</Paragraph> <Paragraph position="4"> d. The task creator dictates a rating scale.</Paragraph> <Paragraph position="5"> Point values assigned to elements deriving directly from the model answer are assumed to be maximum values, i.e., full credit for the given element. Alternatives to the model answer elements can be assigned scores less than or equal to the maximum value. Thus an answer with numerous elements can be scored with a partial-credit schema.</Paragraph> <Paragraph position="6"> e. WebLAS takes the input (model answer, elements, alternatives, score assignments) and creates a scoring key. The scoring key employs regular expressions for pattern matching. For example, &quot;(next|near)&quot; indicates that either &quot;next&quot; or &quot;near&quot; is an acceptable answer. Along with each regex is a point value, which is added to a test taker's final score if the regex matches the student response.</Paragraph> <Paragraph position="7"> The task modification module allows instructors to go back and modify tasks they have created, as well as tasks created by others. The database tracks each change, recording the modifier, the date and time of the modification, the evolving versions of the task, and any comments on the reasons for the change. The database also supports synchronization, so that two instructors cannot change the same task simultaneously. 
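The scoring key produced in step e can be sketched as follows. This is a Python approximation (WebLAS itself is Perl), and the patterns and point values are illustrative assumptions, not the system's actual key format.

```python
import re

# Hypothetical scoring key: one (pattern, points) pair per element,
# with confirmed alternatives folded into a single alternation.
SCORING_KEY = [
    (r"(next|near) (to )?the post office", 2),
    (r"the sushi restaurant", 2),
    (r"Yamada-san", 1),
]

def score_response(response, key=SCORING_KEY):
    """Add each regex's point value to the total when it matches
    the student response."""
    total = 0
    for pattern, points in key:
        if re.search(pattern, response, re.IGNORECASE):
            total += points
    return total
```

A response matching some elements but not others receives partial credit, e.g. an answer mentioning only the post office location scores 2 of the 5 available points.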
Should the model answer be changed, the scoring-key creation step of the task creation module is reactivated and the instructor is guided through the process again.</Paragraph> <Paragraph position="8"> The WebLAS lexicon is based on Wordnet.</Paragraph> <Paragraph position="9"> Wordnet is by no means complete, however, and instructors may find the need to add to its knowledge. The lexicon is automatically updated with the input provided during scoring-key creation.</Paragraph> <Paragraph position="10"> One can also modify the lexicon manually through a guided process. The system prompts for the word and its part of speech and returns all possible word senses. The user chooses a word sense and is then given a choice of the relation type to modify (e.g., synonyms, antonyms, hyponyms). The final step is the modification and confirmation of the change to the relation type.</Paragraph> </Section> <Section position="3" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 2.3 Test Scoring </SectionTitle> <Paragraph position="0"> Once the task creation module has created the regexes, task scoring becomes trivial: WebLAS simply pattern-matches the regexes to score each element. Additionally, WebLAS is quite flexible in its scoring. It tolerates a wide range of test-taker answers, incorporating adapted soundex, edit distance, and word stemming algorithms to handle phonetic, typographic, and morphological deviations from the model answers.</Paragraph> </Section> </Section> <Section position="5" start_page="3" end_page="3" type="metho"> <SectionTitle> 3 Lexicon Modification </SectionTitle> <Paragraph position="0"> There are advantages to the WebLAS system.</Paragraph> <Paragraph position="1"> The first is computational efficiency.</Paragraph> <Paragraph position="2"> The system is not (yet) a learning system. 
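The typographic tolerance described in Section 2.3 can be approximated with a standard edit (Levenshtein) distance. The Python sketch below, and in particular its distance threshold, is our illustration and not WebLAS's actual adapted algorithm.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def tolerant_match(word, target, max_dist=1):
    """Accept small typos, e.g. restaurent still matches restaurant
    (the threshold of 1 is an assumed value)."""
    return not edit_distance(word.lower(), target.lower()) > max_dist
```

Applied before exact regex matching fails a response outright, such a check lets a single-character typo still earn the element's points.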
If the automatic scoring module did not use preprocessed regexes, it would perform the same search for every student response; this repeated search is redundant and unnecessary. By preprocessing the search, we reduce its time complexity from linear, O(n), to constant, O(1), with respect to the number of student responses.</Paragraph> <Paragraph position="3"> Second, partial scoring reduces the arbitrariness of scoring. Rather than a simple credit/no-credit schema, each element contributes individually to the final score tabulation.</Paragraph> <Paragraph position="4"> Reliability also increases. Because the scores produced are repeatable and do not change from one scoring run to the next, WebLAS has perfect intra-rater reliability. Because the instructor confirms all scoring decisions beforehand, the scores are also explainable and justifiable, and can withstand criticism.</Paragraph> </Section> </Paper>