<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2717">
<Title>XML-based Phrase Alignment in Parallel Treebanks</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Combining research on treebanks with research on parallel corpora has recently given rise to parallel treebanks. A parallel treebank consists of syntactically annotated sentences in two or more languages, taken from translated (i.e. parallel) documents. In addition, the syntax trees of two corresponding sentences are aligned at a sub-sentential level. Such alignment may hold at the word, phrase, and clause level; we refer to all of these as phrase alignment, since that term best captures the idea. Parallel treebanks can be used as training or evaluation corpora for word and phrase alignment, as input for example-based machine translation (EBMT), as training corpora for transfer rules, or in translation studies.</Paragraph>
<Paragraph position="1"> We are developing an English-German-Swedish parallel treebank. In this paper we focus on the representation of the treebank and of the alignment. We briefly explain the steps involved in building the parallel treebank and describe our new alignment tool. This paper is a follow-up to, and a revision of, (Samuelsson and Volk, 2005), based on new insights gained from this tool.</Paragraph>
</Section>
</Paper>