<?xml version="1.0" standalone="yes"?> <Paper uid="E89-1039"> <Title>EMPIRICAL STUDIES OF DISCOURSE REPRESENTATIONS FOR NATURAL LANGUAGE INTERFACES</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> Natural language interfaces will, for the foreseeable future, be able to handle only a subset of natural language. The usability of this type of interface therefore depends on finding subsets of natural language that can be used without the user experiencing inexplicable &quot;holes&quot; in the system's performance, i.e. finding subsets for which complete linguistic and conceptual coverage can be handled computationally. This points to the need for theories of the 'sublanguage' or 'sublanguages' used when communicating with computers (Kittredge and Lehrberger, 1982). But unfortunately, &quot;we have no well-developed linguistics of natural-language man-machine communication&quot; (von Hahn, 1986, p. 523). One way of tackling this problem is to simulate the man-machine dialogue by letting users communicate with a background system through an interface which they have been told is a natural language interface, but which in reality is a person simulating such a device (sometimes called a Wizard of Oz experiment; see Guindon, Shuldberg, and Conner, 1987). Although the technique is not new (early examples are Malhotra (1975, 1977), Thomas (1976), and Tennant (1979, 1981)), only a limited number of such studies have been conducted so far. A considerably larger number of similar studies have been conducted in which the users knew that they were communicating with a person. This is unfortunate, since those researchers who have considered the issue have noted that the language used when communicating with a real or simulated natural language interface differs from the language used in teletyped dialogues between humans, although it has been difficult to pin down the exact nature of these differences. 
The language used has been described as 'formal' (Grosz, 1977), 'telegraphic' (Guindon et al., 1987), or 'computerese' (Reilly, 1987).</Paragraph> <Paragraph position="1"> Only a few Wizard of Oz studies have been run, using different background systems and differing in the questions asked and the methods of analysis used. It is therefore premature to draw any far-reaching conclusions. With some caution, however, perhaps the following can be accepted as a summary of the pattern of results obtained so far: The syntactic structure is not too complex (Guindon et al., 1987; Reilly, 1987), and presumably within the capacity of current parsing technology. Only a limited vocabulary is used (Richards and Underwood, 1984), and even with a generous number of synonyms in the lexicon, the size of the lexicon will not be a major stumbling block in the development of an interface (Good, Whiteside, Wixon, and Jones, 1984).</Paragraph> <Paragraph position="2"> However, it is unclear how much of this vocabulary is common across different domains and different tasks, and the possibility of porting such a module from one system to another is an open question. Spelling correction is an important feature of any natural language based system. So-called ill-formed input (fragmentary sentences, ellipsis, etc.) is very frequent, but the use of pronouns seems limited (Guindon et al., 1987; Jönsson and Dahlbäck, 1988).</Paragraph> <Paragraph position="3"> However, the results concerning ill-formedness are difficult to evaluate, mainly because they are often presented without an explicit description of the linguistic representation used. An utterance can obviously only be ill-formed relative to a formal specification of well-formedness. With some hesitation, the omission of such a specification can perhaps be accepted as far as syntax is concerned. 
Both linguistic theory and our linguistic intuitions are adequately developed to guarantee some consensus on what counts as ungrammatical (though the written language bias in linguistics (Linell, 1982), i.e. the tendency to regard the written language as the norm and to view other forms as deviations from it, has in our opinion led to an overestimation of the ill-formedness of the input to natural language interfaces in this area as well). But when it comes to dialogue aspects of language use, we lack both theory and intuitions. What can be said without hesitation, however, is that the use of a connected dialogue, where the previous utterances set the context for the interpretation of the current one, is very common.</Paragraph> <Paragraph position="4"> It is therefore necessary to supplement previous and ongoing linguistic and computational research on discourse representations with empirical studies of different man-computer dialogue situations where natural language seems to be a useful interaction technique. Not doing so would be as sensible as developing syntactic parsers without knowing anything about the language they should parse.</Paragraph> <Paragraph position="5"> Other researchers have proposed the use of field evaluations, as they are more realistic. However, doing so requires a natural language interface advanced enough to handle the users' language; otherwise the evaluation will only test the NLI's already known limitations, as shown by Jarke, Turner, Stohr, Vassiliou, and Michielsen (1985).</Paragraph> </Section> </Paper>