<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0811">
  <Title>References</Title>
  <Section position="4" start_page="0" end_page="0" type="ackno">
    <SectionTitle>
2 Lexicalisation
</SectionTitle>
    <Paragraph position="0"> Lexicalisation amounts mainly to searching and choosing: one has to find lemmata matching a given conceptual chunk, and then one has to choose among them. While much emphasis has been given to the notion of choice, far less attention has been paid to the search mechanisms (or access strategies). During my talk I will present some preliminary results concerning a system meant to help people overcome the tip-of-the-tongue problem, a well-known stumbling block in real-time processing: we know what we want to say, we know that we do know the word, yet we cannot access it (Brown and McNeill, 1966).</Paragraph>
    <Paragraph position="1"> While the fundamental role of a dictionary in NLG is obvious, the principles governing its compilation are less evident. A good dictionary contains a lot of information, structured so that the relevant information is easily accessible when needed. In other words, what counts is 'what is in the dictionary' (content) and 'how the information is organized' (meaning, form, sound). These two factors are not sufficient, though: access depends not only on the structure of the lexicon (organisation), but also on the efficiency of the search strategies, an issue not addressed at all by the generation community. As a matter of fact, from a strictly computational-linguistic point of view, the whole matter may be a non-issue.</Paragraph>
    <Paragraph position="2"> However, the problem does become relevant when we look at generation as a machine-mediated process (people using a word processor for writing) or from a psycholinguistic point of view: word access in writing or in spontaneous discourse.</Paragraph>
    <Paragraph position="3"> 1 While in Moore &amp; Paris (1993) the messages are not given, the goal is: it cannot emerge as a side effect. 2 What shall we do if not all the data can be integrated, or if we lack data for filling all the slots of a chosen structure? Shall we keep the structure and look for more data, or use a different structure that integrates more of the data?</Paragraph>
    <Paragraph position="4"> 3 One of the reasons for this is that we do not have a clear understanding of the mapping between different conceptual configurations and their corresponding rhetorical effect(s). If we did, we could use this mapping bidirectionally (for analysis and generation).</Paragraph>
    <Paragraph position="5"> * The speaker's problem: choosing words, finding them, or both? Obviously, there is more to lexicalisation than just choosing words: one has to find them to begin with. No matter how rich a lexical database may be, it is of little use if one cannot access the relevant information in time. Access is probably THE major problem that we have to cope with when trying to produce language in real time (in spoken or written form). As I will show during my talk, this is precisely a point where computers can be of considerable help.</Paragraph>
    <Paragraph position="6"> Work on memory has shown that access depends crucially on the way information is organized, yet this organization can vary to a great extent. From the speech error literature we learn that ease of access depends not only on meaning relations (i.e. the way words are organized in our mind) but also on linguistic form (letters, phonemes). Researchers collecting speech errors have offered countless examples of phonological errors in which segments (phonemes, syllables or words) are added, deleted, anticipated or exchanged (Fromkin, 1993). The data clearly show that knowing the meaning of a word does not guarantee access to it.</Paragraph>
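To make the four error types above concrete, here is a minimal Python sketch that applies each of them to a word coded as a list of syllables. The function names and the example word are hypothetical illustrations, not part of the system described in the talk. Anticipation is modeled here as a substitution only, and real errors of this kind often span several words rather than a single one.

```python
# Hypothetical helpers (illustration only): the four segment-level error
# types named above, applied to a word coded as a list of syllables.
# Each function returns a new list and leaves its input unchanged.


def addition(segments, i, extra):
    """A segment that does not belong is inserted before position i."""
    return segments[:i] + [extra] + segments[i:]


def deletion(segments, i):
    """The segment at position i is dropped."""
    return segments[:i] + segments[i + 1:]


def anticipation(segments, early, late):
    """A later segment is produced too early, overwriting an earlier one."""
    result = list(segments)
    result[early] = segments[late]
    return result


def exchange(segments, i, j):
    """Two segments swap places."""
    result = list(segments)
    result[i], result[j] = result[j], result[i]
    return result


if __name__ == "__main__":
    word = ["e", "le", "phant"]
    print(addition(word, 2, "le"))   # ['e', 'le', 'le', 'phant']
    print(deletion(word, 1))         # ['e', 'phant']
    print(anticipation(word, 0, 2))  # ['phant', 'le', 'phant']
    print(exchange(word, 0, 1))      # ['le', 'e', 'phant']
```

The exchanged form ['le', 'e', 'phant'] is exactly the kind of syllable scrambling a retrieval aid would need to undo.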
    <Paragraph position="7"> The work on speech errors also reveals that words are stored in at least two modes, by meaning and by form (written, spoken), and it is often the latter that hinders finding the right token: having inadvertently recombined the components of a given word (syllable scrambling), one may end up producing a word that either does not exist or is simply different from the one in mind. This kind of recombination, resulting from bookkeeping problems (due to time pressure), parallel processing and information overload, may disturb or prevent access to the right word. Hence the usefulness of a tool that allows the process to be reversed. To make this possible, words must be represented not only in terms of their meaning, but also in terms of their written and spoken form. The fact that words are indexed both by meaning and by sound can then be used to our advantage. The phonetic coding of words allows their segments (syllables) to be recombined, which yields new candidates among which the user should find the one s/he is looking for (footnote 4). The semantic coding of words keeps the number of candidates to be presented small.</Paragraph>
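The combination of form-based recombination and meaning-based filtering can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system presented in the talk: the toy lexicon, its semantic tags and the function names are hypothetical, orthographic syllables stand in for a phonetic coding, only reorderings (exchanges) of the produced syllables are undone, and the user is assumed to supply a single semantic hint.

```python
from itertools import permutations

# Hypothetical toy lexicon (illustration only): each word is coded twice,
# by form (a syllable sequence standing in for a phonetic transcription)
# and by meaning (a small set of semantic tags).
LEXICON = {
    "houseboat": {"syllables": ("house", "boat"), "tags": {"vessel", "dwelling"}},
    "boathouse": {"syllables": ("boat", "house"), "tags": {"building", "storage"}},
    "elephant": {"syllables": ("e", "le", "phant"), "tags": {"animal"}},
    "telephone": {"syllables": ("te", "le", "phone"), "tags": {"device"}},
}


def build_form_index(lexicon):
    """Map each syllable sequence to the word it spells (access by form)."""
    return {entry["syllables"]: word for word, entry in lexicon.items()}


def candidates(syllables, semantic_hint=None, lexicon=LEXICON):
    """Undo a possible syllable scrambling.

    Combinatorics: try every ordering of the syllables the user produced.
    Filtering: keep orderings that spell an existing word and, if a
    semantic hint is given, whose tags contain that hint.
    """
    form_index = build_form_index(lexicon)
    found = set()
    for order in permutations(syllables):
        word = form_index.get(order)
        if word is None:
            continue  # this ordering is not a word of the lexicon
        if semantic_hint is None or semantic_hint in lexicon[word]["tags"]:
            found.add(word)
    return sorted(found)


if __name__ == "__main__":
    # A scrambled form such as "le-e-phant" is mapped back to "elephant".
    print(candidates(["le", "e", "phant"], "animal"))
    # Both orderings of these syllables are words; meaning picks one.
    print(candidates(["boat", "house"]))
    print(candidates(["boat", "house"], "vessel"))
```

Running the script prints ['elephant'], then ['boathouse', 'houseboat'], then ['houseboat']: the permutations alone yield two real words for the second input, and the semantic hint narrows them to one. Trying every ordering grows factorially with the number of syllables, which is tolerable for single words but would call for pruning in a larger system.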
    <Paragraph position="8"> Conclusion. I have tried to illustrate briefly to what extent we have neglected the human factor in our work. I have also attempted to show how a simple computational method (combinatorics and filtering) can be used to bridge (one of) the gap(s) between TGBC (text generation by computers) and TGBP (text generation by people).</Paragraph>
  </Section>
</Paper>