File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/87/e87-1012_intro.xml
Size: 5,430 bytes
Last Modified: 2025-10-06 14:04:32
<?xml version="1.0" standalone="yes"?> <Paper uid="E87-1012"> <Title>A TOOL FOR THE AUTOMATIC CREATION, EXTENSION AND UPDATING OF LEXICAL KNOWLEDGE BASES</Title> <Section position="2" start_page="0" end_page="70" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> Despite efforts in the development of tools for the collection, sorting and editing of lexical information (see Kipfer, 1985 for an overview), the compilation of lexical knowledge bases (LKBs, lexical databases, machine readable dictionaries) is still an expensive and time-intensive drudgery. In the worst case, an LKB has to be built up from scratch, and even if one is available, it often does not meet the requirements of a particular application. In this paper we propose an architecture for a tool which helps both in the construction (extension and updating) of LKBs and in creating new LKBs on the basis of existing ones. Our work is in line with recent insights about the organisation of LKBs.</Paragraph> <Paragraph position="1"> The main idea is to distinguish two representation levels: a static storage level and a dynamic knowledge level. At the storage level, lexical entries are represented simply as records (with fields for spelling, phonetic transcription, lexical representation, syntactic category, case frames, frequency counts, definitions etc.) stored in text files for easy portability. The knowledge level is an object-oriented environment, representing linguistic and lexicographic knowledge in a number of objects with attached information and procedures, organised in generalisation hierarchies. Records at the storage level are lexical objects in a 'frozen' state.
When accessed from the knowledge level, these records 'come to life' as structured objects at some position in one or more generalisation hierarchies (record fields are interpreted as slot fillers).</Paragraph> <Paragraph position="2"> This way, a number of procedures become accessible (through inheritance) to these lexical objects.</Paragraph> <Paragraph position="3"> For the creation and updating of dictionaries, constructors are defined: objects at the knowledge level which compute new lexical objects (corresponding to new records at the storage level) and new information attached to already existing lexical objects (corresponding to new fields of existing records). To achieve this, constructor objects make use of information already existing in the LKB and of the linguistic knowledge represented at the knowledge level. Few constructors can be developed which are complete, i.e. which can operate fully automatically without checking of the output by the user. Therefore, a central part of our system is a cooperative user interface, whose task is to keep the initiative required from the user to a minimum.</Paragraph> <Paragraph position="4"> Filters are another category of objects. They use an existing LKB to create a new one automatically. During this transformation, specified fields and entries are kept, and others are omitted. The storage strategy used may be changed as well. For example, an indexed-sequential file of phoneme representations could be derived from a dictionary containing this as well as other information, and stored in another way (e.g. as a sequential text file). We call the derived lexical knowledge base a daughter dictionary (DD) and the source LKB a mother dictionary (MD).</Paragraph> <Paragraph position="5"> Filters use the lexicographic knowledge specified at the knowledge level. In principle, one MD for each language should be sufficient. It should contain as much information as possible (see Byrd, 1983 for a similar opinion).
Constructors can be developed to assist in creating, extending and updating such an MD, thereby reducing its cost, while LKBs for specific applications or purposes could be derived from it by means of filters. The basic architecture of our system is given in Figure 1.</Paragraph> <Paragraph position="6"> Current and forthcoming storage and search technology (optical disks, dictionary chips) allows us to store enormous amounts of lexical data in external memory, and retrieve them quickly. In view of this, the traditional storage versus computation debate (should linguistic information be retrieved or computed?) becomes irrelevant in the context of language technology.</Paragraph> <Paragraph position="7"> Natural Language Processing systems should exhibit enough redundancy to have it both ways. For instance, at the level of morphology, derived and inflected forms should be stored, but at the same time enough linguistic knowledge should be available to compute them if necessary (e.g. for new entries). We think the proper place for this linguistic knowledge is the dictionary system.</Paragraph> <Paragraph position="8"> There is some evidence that this redundancy is psychologically relevant as well. The duplication of information (co-existing rules and stored forms) could be part of the explanation for the fuzzy results in most psycho-linguistic experiments aimed at resolving the concrete versus abstract controversy about the organisation of the mental lexicon (Henderson, 1985). The concrete hypothesis states that it is possible to produce and interpret word forms without resorting to morphological rules, while the abstract hypothesis claims that rules are routinely used in production and comprehension.</Paragraph> </Section> </Paper>