<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3705"> <Title>Language Engineering and the Pathway to Healthcare: A user-oriented view</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"> MT evaluation is notoriously difficult, and SLT evaluation even more so. Most researchers agree that measures of translation fidelity in comparison with a gold-standard translation, as seen in text MT evaluation, are largely irrelevant: a task-based evaluation is more appropriate. In the case of medical SLT this presumably means simulating the typical situation that the technology will be used in, which involves patients with medical problems seeking assistance.</Paragraph> <Paragraph position="1"> Since SLT is a pipeline technology, the individual components could be evaluated separately, and indeed the effects of the contributing technologies assessed (cf. Somers and Sugita 2003). Once again, literacy issues will cloud any evaluation of speech recognition accuracy that relies on its speech-to-text function, and evaluation of speech synthesis must simulate a realistic task (cf. comments on SUS, above).</Paragraph> <Paragraph position="2"> Evaluations that have been reported suggest using real medical professionals and actors playing the part of patients: this scenario is well established in the medical world, where &quot;standardized patients&quot; (SPs) - actors trained to behave like patients - have been used since the 1960s. One problem with SPs for systems handling &quot;low density&quot; languages is that the actors need to know English in order to be trained as an SP, in conflict with the need for them to not understand English in order to give the system a realistic test. Ettelaie et al.
(2005) for example report that their evaluation was somewhat compromised by the fact that two of their patient roleplayers did speak some English, while a third participant did not adequately understand what they were supposed to do.</Paragraph> <Paragraph position="3"> Another problem is that there is no obvious baseline against which evaluations can be assessed. One could set up &quot;with and without&quot; trials, and measure how much and how accurately information was elicited in either mode. But this would be a waste of effort: it is widely, although anecdotally, reported that when patients with limited English arrive for a consultation where no provision for interpretation has been made, the consultations simply halt. It is also reported, as already mentioned, that human interpreters are not 100% reliable (Flores, 2005). Often, an untrained interpreter is used, whether a family member or friend that the patient has brought with them, or even another health-seeker who happens to be sitting in the waiting room. The potential for an unreliably interpreted consultation (or worse) is massive.</Paragraph> <Paragraph position="4"> Ettelaie et al. (2005) mention a number of metrics that were used in their evaluation, but unfortunately do not have space for a full discussion. The principal metric is task completion, but they also mention an evaluation of a scripted dialogue, with translations evaluated against model translations using a modified version of BLEU, and SR evaluated with word-error rate. These do not seem to me to be extremely valuable evaluation techniques.</Paragraph> <Paragraph position="5"> Starlander et al. (2005) report an evaluation in which the translations were judged for acceptability by native speakers. Given the goal-based nature of the task, rating for intelligibility rather than acceptability might have been more appropriate, though it is widely understood that the two features are closely related. On the positive side, Starlander et al.
used only a three-point rating (&quot;good&quot;, &quot;ok&quot; or &quot;bad&quot;): evaluations of other target languages might be subject to the problem, reported by Johnson et al. (in prep.) and by ADD REF, that rating scales are highly culture-dependent, so that for example Somali participants in an evaluation of the suitability of symbols in doctor-patient communication mostly used only points 1 and 7 of a 7-point scale.</Paragraph> <Paragraph position="6"> Another evaluation method is to assess the number and type of translation or interpretation errors made, including whether there was any potential or actual error of clinical consequence. (Footnote 4: Thanks to the anonymous reviewer for pointing this out.)</Paragraph> <Paragraph position="7"> As Starlander et al. (2005) say: &quot;In the long-term, the real question we would like to answer when evaluating the prototype is whether this system is practically useful for doctors&quot;, to which we can only add, reiterating our comments in Section 2, &quot;. . . and for patients&quot;.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 The Pathway to Healthcare </SectionTitle> <Paragraph position="0"> Let us move on finally to a more wide-ranging issue. &quot;Medical SLT&quot; is often assumed to focus on doctor-patient consultations or, as we have seen in the case of systems developed under the CAST programme, interactions between medical professionals and affected persons in the field.</Paragraph> <Paragraph position="1"> Away from that scenario, although it is natural to think of &quot;going to the doctor&quot; as involving chiefly an interview with a doctor, and while everything in medical practice arguably derives from this consultation, the pathway to healthcare in normal circumstances involves several other processes, all of which involve language-based encounters that present a barrier to patients with limited English.
None of the medical SLT systems that have been reported in the literature address this variety of scenarios, although the website for the Phraselator (which is of course not an SLT system as such) does list a number of different scenes, such as the front desk, labour ward and so on.</Paragraph> <Paragraph position="2"> In this section, we would like to survey the pathway to healthcare, and note the range of language technologies - not always speech or translation oriented - that might be appropriate at any point. The purpose of this is both to make a plea to widen our vision of what &quot;medical SLT&quot; covers, and to note that SLT is not necessarily the most appropriate technology in every case.</Paragraph> <Paragraph position="3"> The pathway might begin with a person suspecting that there may be something wrong with them. Many people nowadays would in this situation first try to find out something about their condition on their own, typically on the Web, though of course there is still a major &quot;digital divide&quot; for racial and ethnic minorities, and the poor, partly due to the language barriers this research is addressing. If you need this information in your own language, and you have limited literacy skills, the technologies implied are multilingual information extraction and MT, perhaps coupled with text simplification, with synthesized speech output. For specific conditions which may be treated at specialist clinics (our own experience is based on Somalis with respiratory difficulties) it may be possible to identify a series of frequently asked questions and set up a pre-consultation computer-mediated help-desk and interview (cf. Osman et al. 1994). See Somers and Lovel (2003) for more details.</Paragraph> <Paragraph position="4"> Having decided that a visit to the doctor is indicated, the next step is to make an appointment.
Appointment scheduling is the classical application of SLT, as seen in most of the early work in the field, and is a typical case of a task-oriented cooperative dialogue. Note that the &quot;practitioner&quot; - the receptionist in the clinic - does not necessarily have any medical expertise, nor possibly the high level of education and openness to new technology that is often assumed in the literature on medical SLT which talks of the &quot;doctor&quot; controlling the device.</Paragraph> <Paragraph position="5"> If this is the patient's first encounter with this particular healthcare institution, there may be a process of gathering details of the patient's medical history and other details, done separately from the main doctor-patient consultation, to save the doctor's time. This might be a suitable application for computer-based interviewing (cf. Bachman 2003).</Paragraph> <Paragraph position="6"> The next step might be the doctor-patient consultation, which has been the focus of much attention.</Paragraph> <Paragraph position="7"> No doubt for practical purposes, some medical SLT developers have assumed that the patient's role in this can be reduced to yes/no responses, gestures and perhaps, at the limit, a limited vocabulary of simple answers. This view unfortunately ignores current clinical theory. Patient-centred medicine (cf. Stewart et al. 2003) is widely promoted nowadays. The session will see the doctor eliciting information in order to make a diagnosis as foreseen, but also explaining the condition and the treatment, and exploring the patient's feelings about the situation. While it may be unrealistic at present to envisage fully effective support for all these aspects of the doctor-patient consultation, we feel that its purpose should be explicitly appreciated, and the limitations of current technology in this respect acknowledged.
After the initial consultation, the next step may involve a trip to the pharmacist to get some drugs or equipment. Apart from the human interaction, the drugs (or whatever) will come with written instructions and information: frequency and amount of use, contraindications, warnings and so on. This is an obvious application for controlled language MT: drug dose instructions are of the same order of complexity as weather bulletins. For non-literate patients, &quot;talking pill boxes&quot; are already available: why can't they talk in a variety of languages? (Footnote 5: Marketed by MedivoxRx. See Orlovsky (2005).)</Paragraph> <Paragraph position="8"> Another outcome might involve another practitioner - a nurse or a therapist - and a series of meetings where the condition may be treated or managed.</Paragraph> <Paragraph position="9"> Apart from more scheduling, this will almost certainly involve explanations and demonstrations by the practitioner, and typically also elicitation of further information from the patient. Hospital treatment would involve interaction with a wide range of staff, again not all medical experts. If a communication device is to be used, it makes more sense for it to be under the control and &quot;ownership&quot; of the person who is going to be using it regularly: the patient.</Paragraph> </Section> </Paper>