<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-2019">
  <Title>LDVLIB(LEM): A SYSTEM FOR INTERACTIVE LEMMATIZING AND ITS APPLICATION</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
LDVLIB(LEM): A SYSTEM FOR INTERACTIVE LEMMATIZING AND ITS
APPLICATION
</SectionTitle>
    <Paragraph position="0"> R. Drewek, M. Erni, Seminar of Romance Languages, University of Zurich, Switzerland. A concrete project like our &quot;Concordanza lemmatizzata delle 'Operette morali' di G. Leopardi&quot; (a lemmatized concordance of an Italian text of the 18th century with some archaic phenomena, comprising about 70,000 tokens and 9,500 types) is a good opportunity to introduce a new software package for linguistic data processing, not as a mere accumulation of routines or statements but as a comfortable tool already in use.</Paragraph>
    <Paragraph position="1"> LDVLIB is not an experimental, single-language, fragile collection of algorithms. It tries to provide fast and reliable standard procedures for everyday jobs in linguistic and literary research, and sometimes even a bit more. The package consists of 34 programs and 41 modules, mainly written in PL/1. They have been carefully developed over the last seven years and have been tested in various research projects since then. The programs can be grouped by purpose:
 - text preparation (editing, correcting and printing)
 - text corpus handling
 - lexical text analysis, lexicostatistics
 - statistical string description (length phenomena)
 - machine dictionary management
 - production of indices, frequency dictionaries and concordances
 - lemmatization
 - analysis of spoken language texts
 - content analysis
 - utilities for bibliographies, document preparation, graphics and graphemes
Whereas the programs can be used by the non-programming researcher, who communicates with the program through a keyword-oriented and largely unformatted command language, a set of modules is intended to support the programming linguist in the fields of string manipulation, word and word list manipulation, dictionary handling, VDU fullscreen communication, print plotting and other purposes.</Paragraph>
    <Paragraph position="2"> All programs that produce numerical output from statistical analysis provide a data interface for input to well-known statistical software such as SPSS or SAS. The text coding rules follow the printed original, with a few restrictions that can easily be learned even by untrained personnel. The character set can accommodate any roman transliteration of languages using different graphemes; even old Egyptian hieroglyphic texts have been analyzed by LDVLIB programs.</Paragraph>
    <Paragraph position="3"> The complex task of producing a concordance draws on many of the facilities provided by LDVLIB programs. The &quot;crucial point&quot; of lemmatization must be discussed in order to define an appropriate interface for man-machine interaction that yields reasonable philological results. Our design of an interactive lemmatizer may be useful to show not only man-machine interaction but also the interaction between computational linguist and literary expert. It might also reveal the lack of linguistically reliable algorithms for a fully automatic approach to this problem.</Paragraph>
    <Paragraph position="4"> LDVLIB(LEM) does not lemmatize automatically, but it supports lemmatization as follows. It allows work on single portions of a text, and one or more users can access the on-line machine dictionary at the same time. The screen presents to the user:
 - in the upper part, from the KWIC concordance: every token to be lemmatized, with context and references (page, line)
 - in the lower part, from the machine dictionary: lemmatization proposals for the type shown in the upper part.</Paragraph>
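The two-part screen described above can be pictured with a minimal Python sketch. It is not the LDVLIB(LEM) implementation (which the text says was written mainly in PL/1); the names kwic_lines, numbered_proposals and toy_dictionary are hypothetical, the dictionary content is toy data, and the token position merely stands in for the page and line references.

```python
def kwic_lines(tokens, width=3):
    """Return one (keyword, left context, right context, position) tuple per token."""
    lines = []
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        lines.append((token, left, right, i))
    return lines


def numbered_proposals(dictionary, word_type):
    """Number the lemma proposals stored for a type, starting at 1."""
    return list(enumerate(dictionary.get(word_type.lower(), []), start=1))


# Hypothetical toy dictionary: type -> candidate lemmata (not LDVLIB data).
toy_dictionary = {"amo": ["amare", "amo"], "morali": ["morale"]}

tokens = "io amo le operette morali".split()
for keyword, left, right, pos in kwic_lines(tokens, width=2):
    # "Upper part": the token in its context, with a reference.
    print(f"{pos:3d}  {left:>12} [{keyword}] {right}")
    # "Lower part": numbered proposals for the type, if any exist.
    for number, lemma in numbered_proposals(toy_dictionary, keyword):
        print(f"       {number}: {lemma}")
```

In this sketch a type without proposals simply prints no numbered lines, which corresponds to the case where the user has to supply a new entry.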
    <Paragraph position="5"> Interactive lemmatizing therefore consists in recording the (automatically generated) number of the appropriate proposal in the line of the token. If the machine dictionary yields no proposal, or no appropriate one, the user immediately inserts the appropriate dictionary entry and records its proposal number in the upper part of the screen. Such a new proposal is stored in an additional dictionary that is transferred periodically into the main dictionary. The steadily growing machine dictionary is based on a national-language frequency vocabulary of about 25,000 types, including about 5,000 lemmata. A lot of care has been put into the design of the information codes. The machine dictionary entries consist of 4 fields: type (inflected word form), lemma (deflected keyword), lemma information and type information. The lemma information includes the following segments:
 - word class and additional information
 - additional lemmata (enclitic article, pronouns)
 - disambiguation of homography
 - cross-reference to the standard lemma (to be generated in the printed output):
   - graphic variant of the lemma (archaic writing)
   - alteration of the lemma (diminutive by suffixation)
 - short paraphrase in case of homonymy, where disambiguation is default (in case of polysemy, where disambiguation is optional)
The type information includes the following segments:
 - morphological information (gender, number, person, mood, tense, case, gradation)
 - morphological variants (archaic inflexion)
 - graphic variants (elision, short form)
 - special, i.e. idiomatic, use
 - relation to a distinct vocabulary (e.g. frequency vocabulary)
The users of concordances (lemmatized or not) have different interests. In literary research one may study the single types, or even merely the single tokens, of a lemma in the order of occurrence in a work. In linguistic research one may be interested in the alphabetic order of the types and in the subsequent alphabetic order of the right context of the single tokens. These two examples of ordered concordances do not need the type information. But the type information as provided in our machine dictionary will allow a more sophisticated internal order of the lemmata: e.g. singular precedes plural, positive precedes comparative and superlative, present precedes past, morphological and graphic variants are distinguished or not, idiomatic uses are ordered separately or not. Access to a lemmatized concordance will resemble access to a database, and the linguist interested in certain phenomena may use options to select, e.g., the substantives and adjectives only, or all verbs in passive construction. LDVLIB(LEM) always allows the user to obtain a full printout of the lemmatized concordance or a reduced printout of a list of 1 to n lemmata. It will be shown that supporting the philologist's work with a large dictionary is not only useful in concordance making, but also accumulates a lot of material for subsequent lexicographic work. Looking ahead, two questions must be considered: the integration of a dictionary database and the productive use of grammatical procedures like ATNs to shift the balance between intellectual work and machine support in the direction of &quot;a little bit more automatic&quot;.</Paragraph>
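The four-field dictionary entry and the internal ordering by type information can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the class DictionaryEntry, the rank tables, the key function and the sample entries are hypothetical and do not reproduce the actual LDVLIB information codes.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    word_type: str                                   # inflected word form
    lemma: str                                       # keyword the type is filed under
    lemma_info: dict = field(default_factory=dict)   # e.g. word class
    type_info: dict = field(default_factory=dict)    # e.g. number, tense, gradation


# Hypothetical rank tables: a lower rank sorts first.
NUMBER_RANK = {"singular": 0, "plural": 1}
GRADATION_RANK = {"positive": 0, "comparative": 1, "superlative": 2}
TENSE_RANK = {"present": 0, "past": 1}


def internal_order_key(entry):
    """Sort key: lemma, then number, gradation, tense, then the type itself.

    A missing feature is treated as the unmarked value (rank 0).
    """
    info = entry.type_info
    return (
        entry.lemma,
        NUMBER_RANK.get(info.get("number"), 0),
        GRADATION_RANK.get(info.get("gradation"), 0),
        TENSE_RANK.get(info.get("tense"), 0),
        entry.word_type,
    )


# Toy entries for one lemma (illustrative only).
entries = [
    DictionaryEntry("belle", "bello", {"class": "adjective"}, {"number": "plural"}),
    DictionaryEntry("bellissima", "bello", {"class": "adjective"},
                    {"number": "singular", "gradation": "superlative"}),
    DictionaryEntry("bella", "bello", {"class": "adjective"}, {"number": "singular"}),
]

ordered = [e.word_type for e in sorted(entries, key=internal_order_key)]
print(ordered)
```

The order of the components in the key tuple decides which feature dominates; here number outranks gradation, which is only one of several plausible choices.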
  </Section>
</Paper>