<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0906">
  <Title>A Computational Approach to Deciphering Unknown Scripts</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> With surprising frequency, archaeologists dig up documents that no modern person can read.</Paragraph>
    <Paragraph position="1"> Sometimes the written characters are familiar (say, the Phoenician alphabet), but the language is unknown. Other times, it is the reverse: the written script is unfamiliar but the language is known. Or, both script and language may be unknown.</Paragraph>
    <Paragraph position="2"> Cryptanalysts also encounter unreadable documents, but they try to read them anyway.</Paragraph>
    <Paragraph position="3"> With patience, insight, and computer power, they often succeed. Archaeologists and linguists known as epigraphers apply analogous techniques to ancient documents. Their decipherment work can draw on many resources, not all of which will be present in a given case: (1) monolingual inscriptions, (2) accompanying pictures or diagrams, (3) bilingual inscriptions, (4) the historical record, (5) physical artifacts, (6) bilingual dictionaries, (7) informal grammars, etc.</Paragraph>
    <Paragraph position="4"> In this paper, we investigate computational approaches to deciphering unknown scripts and report experimental results. We concentrate on the following case: the written script is unfamiliar, but the language is known. This situation has arisen in many famous cases of decipherment, for example in the Linear B documents from Crete (which turned out to be a &quot;non-Greek&quot; script for writing ancient Greek) and in the Mayan documents from Mesoamerica. Both of these cases lay unsolved until the latter half of the 20th century (Chadwick, 1958; Coe, 1993).</Paragraph>
    <Paragraph position="5"> In computational linguistic terms, this decipherment task is not really translation, but rather text-to-speech conversion. The goal of the decipherment is to &quot;make the text speak,&quot; after which it can be interpreted, translated, etc. Of course, even after an ancient document is phonetically rendered, it will still contain many unknown words and strange constructions. Making the text speak is therefore only the beginning of the story, but it is a crucial step.</Paragraph>
    <Paragraph position="6"> Unfortunately, current text-to-speech systems cannot be applied directly, because they require a clearly specified connection between sounds and writing up front. For example, a system designer may create a large pronunciation dictionary (for English or Chinese) or a set of manually constructed character-based pronunciation rules (for Spanish or Italian). But in decipherment, this connection is unknown! It is exactly what we must discover through analysis. There are no rule books, and literate informants are long since dead.</Paragraph>
  </Section>
</Paper>