<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2112"> <Title>R{j}ecnik.com: English--Serbo-Croatian Electronic Dictionary</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Dictionaries, whether monolingual, bilingual, or multilingual, are the standard way of collecting and presenting lexicographic knowledge about one or more languages. Electronic dictionaries (EDs) are not merely a straightforward extension of their printed counterparts; they also raise additional, purely computational problems.</Paragraph> <Paragraph position="1"> ED as marked-up text. An ED may be seen simply as a long, marked-up text. The important computational issues concern efficient keyword search and appropriate presentation of the dictionary data. The search is performed in the context of a markup scheme, such as SGML or XML, and the query model has to be expressive enough to state search queries within this scheme, e.g., searching for a keyword within a certain text region. An example of such research is the OED project, conducted from 1987 through 1994 (Tompa and Gonnet, 1999; OED, 2004). One of the achievements of the OED project was search software able to retrieve all occurrences of words and phrases within the 570 MB dictionary corpus in less than a second (Tompa and Gonnet, 1999).</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Knowledge-base Structure of an ED. </SectionTitle> <Paragraph position="0"> The second aspect of EDs is the structure of the information represented in them. This structure is of interest to linguists, lexicographers, and various dictionary users, but it is of chief interest to computational linguists. A major computational challenge is how to design the dictionary structure so that its maintenance is manageable and efficient.
Various lexical resources developed in the last few decades have become invaluable in Natural Language Processing (NLP), most notably WordNet. Another reason why efficient dictionary maintenance is important is that natural languages change dynamically, and a good ED should track these lexical innovations. Different domains need to be covered, and the parts of the dictionary that are becoming old and archaic need to be time-stamped and archived as such.</Paragraph> <Paragraph position="1"> In this paper, we present a bilingual, bidirectional on-line Serbo-Croatian (SC)-English dictionary that has been available on the Internet since 1999. This is the first published report describing this resource. The dictionary's internal structure is motivated by the WordNet structure, and it provides a way of producing a monolingual SC wordnet and a bilingual SC-English wordnet.</Paragraph> </Section> </Section> </Paper>