<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1014"> <Title>An Annotation Scheme for Free Word Order Languages</Title> <Section position="8" start_page="92" end_page="92" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> As the annotation scheme described in this paper focusses on annotating argument structure rather than constituent trees, it differs from existing treebanks in several aspects. These differences can be illustrated by a comparison with the Penn Treebank annotation scheme. The following features of our formalism are then of particular importance: * simpler (i.e. 'flat') representation structures * complete absence of empty categories * no special mechanisms for handling discontinuous constituency The current tagset comprises only 16 node labels and 34 function tags, yet a finely grained classification will take place in the near future.</Paragraph> <Paragraph position="1"> We have argued that the selected approach is better suited for producing high quality interpreted corpora in languages exhibiting free constituent order.</Paragraph> <Paragraph position="2"> In general, the resulting interpreted data also are closer to semantic annotation and more neutral with respect to particular syntactic theories.</Paragraph> <Paragraph position="3"> As modern linguistics is also becoming more aware of the importance of larger sets of naturally occurring data, interpreted corpora are a valuable resource for theoretical and descriptive linguistic research.
In addition, the approach provides empirical material for psycholinguistic investigation, since preferences for the choice of certain syntactic constructions, linearizations, and attachments that have been observed in online experiments of language production and comprehension can now be put in relation with the frequency of these alternatives in larger amounts of texts.</Paragraph> <Paragraph position="4"> Syntactically annotated corpora of German have been missing until now. In the second phase of the project Verbmobil a treebank for 30,000 German spoken sentences as well as for the same amount of English and Japanese sentences will be created. We will closely coordinate the further development of our corpus with the annotation work in Verbmobil and with other German efforts in corpus annotation.</Paragraph> <Paragraph position="5"> Since the combinatorics of syntactic constructions creates a demand for very large corpora, efficiency of annotation is an important criterion for the success of the developed methodology and tools. Our annotation tool supplies efficient manipulation and immediate visualization of argument structures. Partial automation included in the current version significantly reduces the manual effort. Its extension is subject to further investigations.</Paragraph> </Section> </Paper>