<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0104"> <Title>Evangelising Language Technology: A Practically-Focussed Undergraduate Program</Title> <Section position="4" start_page="2" end_page="4" type="metho"> <SectionTitle> 3 The Program </SectionTitle> <Paragraph position="0"> Given the above, our goal was to construct a set of courses covering a broad range of material that students might be able to use in their subsequent careers. To emphasise the practical orientation of what we wanted to do, we deliberately pitched the program as being concerned with Language Technology, rather than as a program in either Natural Language Processing or Computational Linguistics.</Paragraph> <Paragraph position="1"> There is clearly something of an evangelical element to this: we wanted to make students aware of a broad range of techniques that we would label Language Technology, with the goal that, over time and as these students enter the workforce, an awareness would start to spread that these techniques are widely usable.</Paragraph> <Paragraph position="2"> This is not a short-term strategy: it takes several years for the results of these efforts to permeate the system to the point where they can be evaluated, but it is essential to get started.</Paragraph> <Paragraph position="3"> In this section, we summarise the material delivered in the courses that make up our program. More detail on each of these courses, and on the program as a whole, can be found at http://www.clt.mq.edu.au/Teaching.</Paragraph> <Paragraph position="4"> The program consists of four courses that focus principally on Language Technology, and an additional course that looks more broadly at technologies for working with the web.
Figure 1 shows the prerequisite structure that currently holds between these courses.</Paragraph> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 3.1 Comp248: An Introduction to Natural Language Processing </SectionTitle> <Paragraph position="0"> Taught in the second half of second year, this is the course in our program that most closely matches the typical undergraduate NLP course.</Paragraph> <Paragraph position="1"> The design of this course was driven by a desire to show students that they could build a useful, functioning application using NLP techniques; to this end, we felt it was important to teach not only computational syntax but also something about semantics. Our position here is that syntactic processing is only a means to an end, and we felt it important to get students quickly to the stage where they could see some practical import of what they were doing. Accordingly, in the first half of the course we take a fairly standard approach to teaching Prolog, whereby the students do some rudimentary morphological processing, build some Definite Clause Grammars, and learn about parsing techniques. In the second half of the course, we add semantics to the mix: although we teach an introduction to lambda calculus at this stage, for the practical work we focus on a much shallower approach to semantics (effectively semantic grammars), and the students build a natural language database query system that allows them to ask questions of a database of flights. Along the way they learn about unification-based grammar, case frames, lexical resources, WordNet, and semantic networks.
The guiding principle throughout is relevance to building a practical application.</Paragraph> </Section> <Section position="2" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 3.2 Comp249: Web Technology </SectionTitle> <Paragraph position="0"> Although this course is part of our Language Technology program, it does not contain a significant language technology element (at least as the term is currently construed). However, the background material taught here has proven very useful in other courses we teach, so we are considering binding this course more tightly to the others. The course covers Perl programming, web design, client-server computing, search engines, XML and related technologies, database integration, privacy and security, VoiceXML, and content management; inevitably, with such broad coverage, most topics are treated relatively briefly.</Paragraph> <Paragraph position="1"> Our goal for this course is to reach students who have little awareness of what NLP is and to get them to see Language Technology (LT) in a wider perspective. The success of this course, which is by far the most popular unit in the program, has led us to explore better ways of leveraging this interest.</Paragraph> </Section> <Section position="3" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 3.3 Comp348: Intelligent Text Processing </SectionTitle> <Paragraph position="0"> At the third-year level, we offer two courses that build on the second-year material. We noted earlier that we viewed the job market as consisting of two relatively distinct sectors, one concerned with voice processing and one concerned with document processing.
This division is deliberately reflected in the orientation of the two third-year offerings: Comp348 addresses the needs of document processing, whereas Comp349, discussed below, leans more towards voice processing.</Paragraph> <Paragraph position="1"> The course on intelligent text processing covers the basics of text processing using Perl; tokenisation and sentence segmentation; text summarisation; information retrieval; corpus-based approaches; part-of-speech tagging; word sense disambiguation; information extraction; and machine translation. Again, this is a lot of material to cover, and inevitably we only skim the surface of many topics. However, in the first offering of the course, students did significant assignments in both text summarisation (using sentence extraction) and information extraction.</Paragraph> <Paragraph position="2"> The information extraction assignment was run roughly along the lines of the Message Understanding Conferences: students were given a training set of conference announcements, on the basis of which they built an information extraction system; this system was then tested against unseen data and scored automatically. Now that the course is in its second offering, we intend to make anaphor resolution the focus of an assignment.</Paragraph> <Paragraph position="3"> Our goal in this course is to provide students with a toolset for text processing from a language technology perspective.
We focus on relatively shallow methods, since these are the methods students are most likely to use in their subsequent careers.</Paragraph> <Paragraph position="4"> Our driving aim here is for our alumni to recognise that LT provides practical solutions.</Paragraph> </Section> <Section position="4" start_page="2" end_page="4" type="sub_section"> <SectionTitle> 3.4 Comp349: Interactive Natural Language Systems </SectionTitle> <Paragraph position="0"> As already indicated, this course aims to provide the knowledge that students need in order to be effective in the voice processing sector of the industry.</Paragraph> <Paragraph position="1"> The focus here is, effectively, on text- and speech-based dialog systems. In the first half of the course, we cover a significant amount of relatively theoretical material, including question answering systems, database interfaces, and answer extraction. Students build a fairly sophisticated text-based natural language query system.</Paragraph> <Paragraph position="2"> In the second half of the course, we attempt to apply these theoretical ideas in the very practical context of building spoken language dialog systems. We begin with the CSLU Toolkit (see http://cslu.cse.ogi.edu/toolkit/), which students use to build a voice banking application; this toolkit provides an excellent environment for teaching students to think about issues such as dialog flow, as well as introducing them to many other aspects of spoken language dialog systems. We then introduce VoiceXML in some detail; using a PC-based development environment, students build a simple flight reservations system. We have experimented with a number of different VoiceXML development environments which are freely available over the web; each has its advantages and disadvantages. So far we have had most success with Motorola's MADK (see http://developers.motorola.com/developers/); at the time of writing, however, this does not support the new VoiceXML 2.0 standard, so we are considering alternatives.</Paragraph> <Paragraph position="3"> We place a heavy emphasis here on aspects of voice user interface (VUI) design; in the practical half of the course, the materials we use take a similar approach to that of vendor courses that aim to train dialog designers and grammar writers. At the same time, an important aim is to set out clearly the relationship between the ideas explored in research systems and those in commercially deployed systems; in practice it can be very hard to see a path from the former to the latter.
We make clear to students that our goal is to teach them how to build practical dialog applications now, but also to get them to think about what the next generations of such applications might be in the light of the results coming out of research laboratories.</Paragraph> </Section> <Section position="5" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 3.5 Comp448: Advanced Topics in Natural Language Processing </SectionTitle> <Paragraph position="0"> For those students who stay on for a fourth year, we run a course that is driven more by a selection of specific research topics. At the time of writing, the first offering of this course is being delivered.</Paragraph> <Paragraph position="1"> We are using the course to cover in more depth core topics that are only touched upon in earlier courses, with more detailed exploration of word sense disambiguation, anaphora resolution, discourse structure and natural language generation.</Paragraph> <Paragraph position="2"> The course is seminar-based, with a high proportion of student presentations, and an assignment in anaphor resolution.</Paragraph> <Paragraph position="3"> Interest among students at this level is such that we expect to offer additional honours-level courses later in the current academic year.</Paragraph> </Section> </Section> <Section position="5" start_page="4" end_page="4" type="metho"> <SectionTitle> 4 Outcomes and Issues </SectionTitle> <Paragraph position="0"> The program has been operating since the second half of 2000. Since that time, we have taught Comp248 twice and Comp349 once; Comp249 and Comp348 are currently being taught for the second time; and Comp448 is being taught for the first time.</Paragraph> <Paragraph position="1"> It is too early to establish to what extent the material we have taught is affecting graduates' work practices: the first students to complete degrees that incorporate our courses are only now graduating. However, we have made use of a number of feedback and review mechanisms over the last 18 months, and these have already given us new ideas for improving what we do.</Paragraph> <Section position="1" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.1 Evaluating Course Content </SectionTitle> <Paragraph position="0"> We make use of the usual evaluation infrastructure: student-staff liaison committees and formal questionnaires, together with a significant amount of informal feedback from discussions with students. We also have a management advisory board with representation from industry; this meets twice a year to review the development of the program and to comment on its industrial relevance.</Paragraph> <Paragraph position="1"> Generally, the courses have been very well received by the students who take them.
Our advisory board is very comfortable with the material we teach, but here we face the problem that the voice recognition industry is better represented on the board than the hard-to-define document processing industry alluded to earlier. So we have strong evidence that students find the material interesting, challenging and informative, and our industry partners think we are going in the right direction; but we have yet to demonstrate that the wider industry community will see a benefit from students who have grasped this material.</Paragraph> </Section> <Section position="2" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.2 Course Materials </SectionTitle> <Paragraph position="0"> We have faced a significant problem in finding appropriate materials for these courses, with the consequence that we have had to develop most things from scratch. For the first offering of Comp248, the introductory NLP course, we used Allen [1995]; in the second offering, we found Covington [1994] more useful. Although Covington's book is technically out of print, Prentice Hall has a technology for producing short print runs on demand.</Paragraph> <Paragraph position="1"> The materials problem was more severe in our third-year courses, since there are no even vaguely adequate textbooks for the material we wanted to cover. We provide students with a comprehensive reading packet, but it is not easy to find appropriate survey or introductory readings in the various topic areas we cover. As a consequence, we are exploring the possibility of writing a textbook that covers the material in each of these courses.</Paragraph> </Section> </Section> <Section position="6" start_page="4" end_page="4" type="metho"> <SectionTitle> 5 Lessons Learned and Future Directions </SectionTitle> <Paragraph position="0"> Eighteen months after the start of the program, we are reasonably confident that we are going in the right direction, although some things inevitably require fine-tuning.
We note here some key lessons from our experience so far.</Paragraph> <Section position="1" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 5.1 Voice Captures the Imagination </SectionTitle> <Paragraph position="0"> Perhaps not surprisingly, it is the study of voice recognition that has really captured students' imaginations. The level of enthusiasm generated in a laboratory full of students wearing headsets and talking to their machines is wonderful to watch (although the working environment does little for speech recognizer accuracy). With this in mind, we are reworking our second-year course, Comp248, so that it will contain some of the voice material currently used in third year. We are also considering placing more emphasis in this course on technology that students might meet outside the curriculum, such as chatterbots. Our strategy here is to entice students into the area with appealing content, and then draw them into the more theoretically challenging material in later courses.</Paragraph> </Section> <Section position="2" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 5.2 Document Processing as a Theme </SectionTitle> <Paragraph position="0"> It has become clear that our Web Technology course could play a more coherent role in our program. One direction we are pursuing is to consolidate the two strands identified earlier by treating the Web Technology course specifically as a precursor to the Intelligent Text Processing course.
At the same time, we are considering broadening the third-year course to cover Document Processing more generally, as a way of making its relevance more apparent; a shift of this kind might also permit the inclusion of more material on information retrieval and related technologies, which are of some significance from an industry perspective.</Paragraph> </Section> <Section position="3" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 5.3 Linguistic Background </SectionTitle> <Paragraph position="0"> We have met the common, and not unexpected, problem that some students do not have a sufficient grasp of linguistic matters to perform satisfactorily in this area. To address this, we have begun introducing a first-year course that covers basic aspects of linguistics, logic and computation, taught by ourselves in conjunction with the University's Departments of Philosophy and Linguistics.</Paragraph> </Section> <Section position="4" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 5.4 Conclusions </SectionTitle> <Paragraph position="0"> So far, our program has been seen as very successful from an academic perspective, and has generated significant interest amongst students. Our next challenge is to persuade the wider industry that students with this training are valuable assets. We have set up an alumni program that will attempt to track these students, and we expect some preliminary feedback to be available by the end of the calendar year.</Paragraph> </Section> </Section> </Paper>