<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1910">
  <Title>Bootstrapping Parallel Treebanks</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Previous Work on Parallel Treebanks
</SectionTitle>
    <Paragraph position="0"> Parallel treebanks are only now evolving into a research field. Cmejrek et al. (2003) at Charles University in Prague have built a treebank for the specific purpose of machine translation, the Czech-English Penn Treebank with tectogrammatical dependency trees. They asked translators to translate part of the Penn Treebank into Czech with the clear directive to translate every English sentence with one Czech sentence and to stay as close as possible to the original.</Paragraph>
    <Paragraph position="1"> This directive seems strange at first sight, but it makes sense with regard to their objective.</Paragraph>
    <Paragraph position="2"> Since they specifically construct the treebank for training and evaluating machine translation systems, a close human translation is a valid starting point to get good automatic translations. At the University of Münster (Germany), Cyrus et al. (2003) have started working on FuSe, a syntactically analyzed parallel corpus.</Paragraph>
    <Paragraph position="3"> The goal is a treebank with English and German texts (currently with examples from the Europarl corpus). The annotation is multi-layered in that they use PoS tags, constituent structure, functional relations, predicate-argument structure and alignment information. However, their focus is on the predicate-argument structure.</Paragraph>
    <Paragraph position="4"> The Nordic Treebank Network has started an initiative to syntactically annotate the first chapter of "Sophie's World" in the Nordic languages. This text was chosen since it has been translated into a vast number of languages and since it includes interesting linguistic properties such as direct speech. Currently a prototype of this parallel treebank with the first 50 sentences in Swedish, Norwegian, Danish, Estonian and German has been finished. The challenge in this project is that all involved researchers annotate the Sophie sentences of their language in their format of choice (ranging from dependency structures for Danish and Swedish to constituency structures for Estonian and German).</Paragraph>
    <Paragraph position="5"> In order to make the results exchangeable and comparable, all results have been converted into TIGER-XML so that TIGERSearch can be used to display and search the annotated sentences monolingually. The alignment across languages is still open.</Paragraph>
  </Section>
</Paper>