File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/99/w99-0410_metho.xml
Size: 14,470 bytes
Last Modified: 2025-10-06 14:15:32
<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0410"> <Title>A Web.Based System for Automatic Language Skill Assessment: EVALING</Title> <Section position="2" start_page="62" end_page="68" type="metho"> <SectionTitle> 1. The Developer side </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="62" end_page="68" type="sub_section"> <SectionTitle> 1.1 The Main Program and the Databases </SectionTitle> <Paragraph position="0"> From a technical point of view, EVALING uses a Multithreaded Automation Object. This technology allows multiple threads of execution (Apartment-model threading).</Paragraph> <Paragraph position="1"> Thus, the EVALING kernel is an 'Active X DLL'. In the DLL, all exercises are designed on a single module which makes it easier to develop and manage the whole system. In fact, in order to add or remove exercises, we just have to add or remove modules and to change a global variable in the main module which indicates to the system the number of exercises and their order in the session. The system is developed to run with Microsoft Information Server and works as a server application.</Paragraph> <Paragraph position="2"> Sets of exercises passed down to a client are dynamically composed by the system at the time of the user's request. All exercises are stored in tables of the Exercise database. For each questionnaire, the system extracts a set of 4 INTEX is a parsing system based on wide coverage dictionaries and local grammars (represented by Finite State Automata) which applies to lage corpora, that is 100Mo. Cf. Silberztein (1993).</Paragraph> <Paragraph position="3"> exercises from a table. Each record in the table corresponds to one sentence, or one short text. Therefore, several records are retrieved by the system to constitute the questionnaire that will be sent to the client.</Paragraph> <Paragraph position="4"> Corrections of the user's answers are provided by JavaScfipt programs that use data of hidden fields in HTML form or by more complex programs running on the server when necessary. When we make use of JavaScfipt programs to correct a form, we never display answers clearly in the JavaScript source (functions that take ASCII values or binary values evaluate the validity of answers).</Paragraph> <Paragraph position="5"> Answers are therefore not readable in HTML source code. This practice is a second level of protection, because the whole test session occurs in a Browser Window that does not contain the menu bar that allows the view source action.</Paragraph> <Paragraph position="6"> It is imperative that sets of exercises have the same level of difficulty from one client to another. To signal this stability, a field in each record indicates the level of difficulty. When the EVALING system composes a questionnaire, records are chosen in such a way that the sum of the difficulty marks is always the same (this sum is given by the administrator who can choose the global level of difficulty). Grading occurs at the development time. A script, which uses linguistic tools (INTEX programs, local grammars, dictionaries and linguistic descriptions) evaluates the sentences. For French, the difficulty of a sentence depends on: -length in words, -presence/absence of negation, conjunction, relative pronoun, -lexical difficulty: to measure this parameter, we use an electronic dictionary: the DELAF dictionary for French that contains more than 900.000 inflected forms. It is divided into three levels of difficulty: common and easy words constitute the first level, technical and unusual words the third level and in between, we have a second level consisting of words whose understanding is hazy.</Paragraph> <Paragraph position="7"> The dynamic design of a questionnaire requires large databases, with a large number of sentences for each exercise. It is costly to produce all of them manually. To avoid such a difficulty, we designed a set of search tools that apply to large corpora and retrieve sentences or short texts that match a given linguistic pattern. These tools constitute a full-fledged parser based on the use of Finite State Transducers (FST) 5. They use morpho-syntactic information provided by wide coverage electronic dictionaries 6, allowing us to retrieve accurate linguistic information 7.</Paragraph> <Paragraph position="8"> The software INTEX includes a graphic editor (FSGraph) which allows ergonomic designing of FSTs (which are graphically represented as graphs). Linguists or other non computer scientists can easily design and maintain FST libraries. Graphs can be used in three modesa: 'simple mode' for retrieval purpose, 'fusion mode' for inserting patterns in the recognized sequence and 'replace mode' for substituting a pattern to the one matched by the graph. For example, the following graph, when used in 'simple mode', locates in a text the patterns displayed in the boxes (inputs) and, when used in 'replace mode', it substitutes to the input the text displayed below the boxes (outputs).</Paragraph> </Section> <Section position="2" start_page="68" end_page="68" type="sub_section"> <SectionTitle> 1.2 Types of exercises </SectionTitle> <Paragraph position="0"> J We built two types of exercises: First, we have a set of exercises whose correction consists of comparing the user s For a simple description, see Silberztein (1998). answers to the solutions registered in the database. This elementary way of sconng is used for exercises where the number of different correct answers is limited (generally to one or two possibilities). User answers are evaluated as either right or wrong, with no intermediate scores. On the screen, the exercise can take the appearance of a multiple choice test (if we add distractors) or of a form to complete (if we remove from the original sentence patterns to be tested).</Paragraph> <Paragraph position="1"> Our methodology produces this kind of exercise easily and economically. Since we extract the items for these exercises from corpora, all items we put in the database are well-formed items 9. Thus, the validity check of the users answers can be a simple stnng comparison. The example below underline this simplicity (cf. 1.3 Sample of exercise development).</Paragraph> <Paragraph position="2"> Second, we are working on the conception of exercises where expected answers are more free, but not completely. In this case, scoring is performed by means of specialized grammars.</Paragraph> <Paragraph position="3"> Technically, these grammars are also Finite State Transducers. They describe all possible structures of a given context. Because these grammars are specialized to the descriptions of limited contexts, they are called 'local grammars'.</Paragraph> <Paragraph position="4"> Local grammars introduce a lot of flexibility for rating exercises. We are testing two ways of sconng: binary scoring (fight/wrong) and multivalue scales. In the first case, we use a grammar in 'simple mode' to verify an answer: answer matched by the grammar is valid, otherwise it is considered wrong. In the second case, the grammar is used in 'replace mode': if a path of the grammar matches the answer, the output combined to the path is produced. This output can be a score or a formal information processed later by a sconng program. It is thus possible to produce different outputs (scores), and even outputs for each path of the graph. For example, this system allows us to rate an answer according to the linguistic register used by the 9 If we assume that the corpora is error free, but it is not. A human reader has to verify the items of the database.</Paragraph> <Paragraph position="5"> testee or according to the amount of details given.</Paragraph> <Paragraph position="6"> In figure 2, words between brackets are lemmas: <Stre> refers to all the forms of the verb ~tre and <se> to the contracted form s' and se (this information is provided by dictionaries). In this way, the followings utterances are matched by the graph: s'est fair avoir, se faisait avoir, etc. The output number refers to the linguistic register: 1 indicates the best forms, 2 a spoken form and 3 slang terms.</Paragraph> </Section> <Section position="3" start_page="68" end_page="68" type="sub_section"> <SectionTitle> 1.3 Sample of exercise development </SectionTitle> <Paragraph position="0"> We present a simple case of exercise building. The purpose of this exercise is to test the ability to choose the right form of the French word tout ('all') that can be a noun, pronoun or adverb and that can be spelled tout, tous, route or toutes.</Paragraph> <Paragraph position="1"> 1.3.1 Retrieving sentences with a graph We design a graph and apply it to a corpus to retrieve sentences that match the sequences described by the graph. Then the graph is stored in the 'tool folder'. It will be reused later, when we will need to update the Exercises database.</Paragraph> <Paragraph position="2"> The graph is read from left to right (from the Initial node to the terminal one). If a sequence of words in the text is matched by one of the paths of the graph, the sequence is saved. An output is associated to this graph (represented below two boxes: &quot;\[&quot; and &quot;\]&quot;). We apply the graph in merge mode to insert this output in the text.</Paragraph> <Paragraph position="3"> Inserted signs are helpful to import sentences in the database.</Paragraph> <Paragraph position="4"> Our corpus is preprocessed by INTEX. Since preprocessing segments the text into sentences, it is possible to refer to the 'beginning of a sentence' (<^> in our graph) or to the 'end of a sentence' (<$> in our graph). The box containing 'MOT' is a link to another graph that represents a string of words (max. 10 words) and optional delimiters (apostrophe, comma, hyphen). This string is optional, that is why there is an alternative path under the box 'MOT'.</Paragraph> <Paragraph position="5"> The following sentences are retrieved from a novel of Agatha Christie: Son air tr~s anglais avait \[tout\] pour stduire quelqu'un qui, comme moi, n'avait pas revu sa patrie depuis trois ans. Je voudrais que, de retour chez vous, vous observiez le monde nouveau de rapr~s-guerre et que vous dtcidiez, en \[route\] libert6 et indtpendance, ce que vous en attendez. J'ai fait \[tout\] mon possible pour ne pas vous dire que je vous aime...</Paragraph> <Paragraph position="6"> Et ils vtcurent \[tous\] ensemble dans une petite maison biscornue.</Paragraph> <Paragraph position="7"> 1.3.2 Estimation of the sentence difficulty level The tool is still under development and a study is going on to refine the criteria we are using. At this time, we consider three levels of difficulty: easy, medium and difficult. The level assessment depends on the length in number of words, presence of a modality (interrogative or not), lexical complexity and presence or absence of a negation, conjunction, relative pronoun. For example, the graph 'value.grf' below detects negation, conjunction and relative pronouns. In this graph, GN is a link to another graph that describes summarily noun phrases. It guarantees that the pronoun which comes after GN is effectively a relative pronoun. NCR is a link to a graph that repeats the possibility to have a negation, conjunction or relative pronoun. In fact, a sentence can contain more than one pattern. <CONJC> and <CONJS> are tags that come from dictionaries and local grammars built and the equality of difliculty between questionnaires is tested by a function that refers to a key value. This key value is accessible to the Administrator who can decide to raise it or to reduce the global key value that gives the system the level of difficulty to compute. Since all marks are stored in a database, it is possible to perform some statistical analysis. Through the Administrator Interface, one can handle the usual numerical data: who got highest/lowest marks, where are the weaknesses/strengths of the group, etc. Statistical information is then displayed graphically and in tables. Example of application: In October 99, at the beginning of the academic year, EVALING will be used to help a language teacher with information about the specificity of this group of students. A statistical analysis of the marks obtained by about two hundred students is already a rich information. The administrator can detect common weaknesses and adapt his course to these difficulties.</Paragraph> <Paragraph position="8"> returns the next exercise (e.g. exercises on past participles) to the client. At the present time, the test session is not 'adaptative'ldeg: all clients answer the same questions (layouts used to compose sets of questions are the same for all users). At the end of the session, the client receives a report, in the case the Administrator has configured the system to do so. The report mentions only the marks registered by each exercise, without any other comments. Marks are also represented graphically by a Java Applet.</Paragraph> <Paragraph position="9"> We were careful to avoid interferences between language skills and computer skills.</Paragraph> <Paragraph position="10"> Computerization should not interfere with language skill assessment. Our prerequirement was to create a system that could easily be used by someone not that familiar with computer handling. In fact, recent scientific studies</Paragraph> </Section> </Section> <Section position="3" start_page="68" end_page="68" type="metho"> <SectionTitle> 10 Principles of Computer-Adaptative language Tests </SectionTitle> <Paragraph position="0"> (CAT) are described in Tung (1986).</Paragraph> <Paragraph position="1"> showed that there is little or no evidence of adverse effects on the computer-based tests due to lack of prior computer experience 11. To ensure that, we decided to rule out technical manipulations like ' drag and drop ' and to make every effort to build a simple and intuitive interface. A tutorial is available from the start page on. Before getting into the test, the user can discover the different types of questionnaires and try each element of the answering system (in fact, any item in HTML Forms: pop-up menu, text field, button, radio button, etc). During the test a help button always displayed at the same location provides contextual information by means of a simple click.</Paragraph> </Section> class="xml-element"></Paper>