<?xml version="1.0" standalone="yes"?>
<Paper uid="E89-1016">
  <Title>User studies and the design of Natural Language Systems</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
4 Conclusions
</SectionTitle>
    <Paragraph position="0"> This paper had two objectives: the first was to evaluate the use of the WOZ technique for assessing NL systems and the second was to investigate the effect of task on language use.</Paragraph>
    <Paragraph position="1"> One criticism we made of both test suites and tasks using pen and paper, was that they may attempt to evaluate systems against inadequate criteria. Specifically they may not evaluate the adequacy of NL systems when users are carrying out tasks with specific software systems. The unknown words analysis seems to bear this out: we found 3 classes of unknown words which occurred only because our users were doing a task. Firstly our users wanted to carry out operations - 120 involving the selection and permutation of answer sets and make explicit reference to their properties. Secondly, we found that our subjects wanted to use complex reference to refer back to previous queries in order to refine those queries, or to exclude answers to previous queries from their current query. Finally, we found that users attempted to use the structure of the information source, in this case the database, in order to access information. Together these 3 classes accounted for 45% of all unknown words. We believe that whatever the task and software, there will always be instances of operators, context use and reference to the information source. It would therefore seem that coverage of these 3 sets of phenomena is an important requirement for any NL interface to an application. The fact that other evaluation techniques may not have detected this requirement is, we believe, a vindication of our approach. An exception to this is the work of Cohen et al. \[CPA82\] who point to the need for retaining and tracking context in this type of application.</Paragraph>
    <Paragraph position="2"> Of course there are still problems with the WOZ technique. One such problem concerns the task representativeness and a difficulty in designing this study lay in the selection of a task which we felt to be typical of database access. Clearly more information from field studies would be useful in helping to identify prototypical database access tasks.</Paragraph>
    <Paragraph position="3"> A second problem lies in the interpretation of the results with respect to the classification and frequency of the unknown word errors: how frequently must an error occur if it is to warrant system modification? For example, references to the information source accounted for only 5% of the errors and yet we believe this is an interesting class of error because exploiting the structure of the database was a useful retrieval tactic for some users. The frequency problem is not specific to this study, but is an instance of a general problem in computational linguistics concerning the coverage and the range of phenomena to which we address our research. In the past, the field has focussed on the explanation of theoretically interesting phenomena without much attention to their frequency in naturally occurring speech or text. It is clear, however, that if we are to be successful in designing working systems, then we cannot afford to ignore frequently occurring but theoretically uninteresting phenomena such as punctuation or dates. This is because such phenomena will probably have to be treated in whatever application we design. Frequency data may also be of real use in determining priorities for system improvement.</Paragraph>
    <Paragraph position="4"> As a result of using our technique, we have identified a number of unknown words. How should these words be treated? Some of the unknown words are synonyms of words already in the system. Here the obvious strategy is to modify the NL system by adding these. In other cases, system modification may not be possible because linguistic theory does not have a treatment of these words.h In these circumstances, there are three possible strategies for finessing the problem. The first two involve encouraging users to avoid these words, either by generating co-operative error messages to enable the user to rephrase the query and so avoid the use of the problematic word \[Adg88, Ste88\] or by user training.</Paragraph>
    <Paragraph position="5"> The third strategy for finessing the analysis of such words is to supplement the NL interface with another medium such as graphics, and we will describe an example of this below.</Paragraph>
    <Paragraph position="6"> We believe that the use of such finessing strategies will be important if NL systems are to be usable in the short term. Our data suggests that certain words are used frequently by subjects in doing this task.</Paragraph>
    <Paragraph position="7"> It is also clear that computational linguistics has no treatment of these words. If we wish to build a system which will enable our users to carry out the task, we must be able to respond in some way to such inputs.</Paragraph>
    <Paragraph position="8"> The above techniques may provide the means to do this, although the use of such strategies is still an under-researched area.</Paragraph>
    <Paragraph position="9"> For the unknown words encountered in this study, of the operators, many can be dealt with by simple system modification because they are synonyms of list or show. Within the class of operators, however, it would seem that new semantic interpretation procedures would have to be defined for verbs like arrange or order. These would involve two operations, the first would be the generation of a set, and the second the sorting of that set in terms of some attribute such as age or date. The unknown words relating to explicit reference to set properties would not be difficult to add to the system, given that they can be paraphrased as relative clauses. For example, the sentence Find Van Gogh paintings to include four different themes can be paraphrased as Find Van Gogh paintings that have different themes.</Paragraph>
    <Paragraph position="10"> The context words present a much more serious problem. Current linguistic theory does not have treatments of words like previously or already, in terms of how these scope in dialogues. On some occasions, these are used to refer to the immediately prior query only, whereas on other occasions they - 121 might scope back to the beginning of the dialogue.</Paragraph>
    <Paragraph position="11"> In addition, words like more or another present new problems for discourse theory in that they require extensional representations of answers: Given the query Give me 10 paintingsfollowedby Now give me 5 more paintings, the system has to retain an extensional representation of the answer set generated to the first query, if it is to respond appropriately to the second one. Otherwise it will not have a record of precisely which 10 paintings were originally selected, so that these can be excluded from the second set. This extensional record would have to be incorporated into the discourse model.</Paragraph>
    <Paragraph position="12"> One solution to the dual problems presented by context words is again to either finesse the use of such words or to use a mixed media interface of NL and graphics. If users had the answers to previous queries presented on screen, then the problems of determining the reference set for phrases like the paintings al. ready mentioned could be solved by allowing the users to click on previous answer sets using a mouse, thus avoiding the need for reference resolution.</Paragraph>
    <Paragraph position="13"> For the references to the information source, it would not be difficult to modify the system so it could analyse the majority of the the specific instances recorded here, but it is not clear that all of them could have been solved in this way, especially those that require some form of inferencing based on the database structure.</Paragraph>
    <Paragraph position="14"> There are also a number of unknown words in the data that have not been discussed here, because these did not directly arise from the fact that our users were carrying out a task. Nevertheless, the set of strategies given above is also relevant to these. Just as with the task specific words, there are a number of words which can be added to the system with relatively little effort. The system can be modified to cope with the majority of the open class unknown words, e.g. common nouns, adjectives, and verbs, many of which are simple omissions from the domain-specific lexicon. Some of the closed class words such as prepositions and personal pronouns may also prove straightforward to add.</Paragraph>
    <Paragraph position="15"> There are also a number of these words which did not arise from the task, which are more difficult to add to the system. This is true for a few the open class words domain-independent words, including adjectives like same and different. The majority of the closed class words, may also be difficult to add to the system, including superlatives and various logical connectives, then, neither, some quantifiers, e.g. only, as well as words which relate to the control of dialogue such as right and o.k.. These words indicate genuine gaps in the coverage of the system. For these difficult words, it might necessary to finesse the problem of direct analysis.</Paragraph>
    <Paragraph position="16"> In conclusion, the WOZ technique proved successful for NL evaluation. We identified 3 classes of task based language use which have been neglected by other evaluation methodologies. We believe that these classes exist across applications and tasks: For any combination of application and task, specific operators will emerge, and support will have to be provided to enable reference to context and information structure. In addition, we were able to suggest a number of strategies for dealing with unknown words. For certain words, NL system modification can be easily achieved. For others, different strategies have to be employed which avoid direct analysis of these words.</Paragraph>
    <Paragraph position="17"> These finessing strategies are important if NL systems are to usable in the short term.</Paragraph>
  </Section>
class="xml-element"></Paper>