File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/a97-1014_intro.xml
Size: 4,681 bytes
Last Modified: 2025-10-06 14:06:15
<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1014"> <Title>An Annotation Scheme for Free Word Order Languages</Title> <Section position="3" start_page="0" end_page="88" type="intro"> <SectionTitle> 2 Motivation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Linguistically Interpreted Corpora </SectionTitle> <Paragraph position="0"> Combining raw language data with linguistic information offers a promising basis for the development of new efficient and robust NLP methods. Real-world texts annotated with different strata of linguistic information can be used for grammar induction. The data-drivenness of this approach presents a clear advantage over the traditional, idealised notion of competence grammar.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Existing Treebank Formats </SectionTitle> <Paragraph position="0"> Corpora annotated with syntactic structures are commonly referred to as treebanks. Existing treebank annotation schemes exhibit a fairly uniform architecture, as they all have to meet the same basic requirements, namely: Descriptivity: Grammatical phenomena are to be described rather than explained.</Paragraph> <Paragraph position="1"> Theory-independence: Annotations should not be influenced by theory-specific considerations. Nevertheless, different theory-specific representations shall be recoverable from the annotation, cf. (Marcus et al., 1994).</Paragraph> <Paragraph position="2"> Multi-stratal representation: Clear separation of different description levels is desirable.</Paragraph> <Paragraph position="3"> Data-drivenness: The scheme must provide representational means for all phenomena occurring in texts. Disambiguation is based on human processing skills (cf. (Marcus et al., 1994), (Sampson, 1995), (Black et al.
, 1996)).</Paragraph> <Paragraph position="4"> The typical treebank architecture is as follows: Structures: A context-free backbone is augmented with trace-filler representations of non-local dependencies. The underlying argument structure is not represented directly, but can be recovered from the tree and trace-filler annotations.</Paragraph> <Paragraph position="5"> Syntactic category is encoded in node labels.</Paragraph> <Paragraph position="6"> Grammatical functions constitute a complex label system (cf. (Bies et al., 1995), (Sampson, 1995)).</Paragraph> <Paragraph position="7"> Part-of-Speech is annotated at word level.</Paragraph> <Paragraph position="8"> Thus the context-free constituent backbone plays a pivotal role in the annotation scheme. Due to the substantial differences between existing models of constituent structure, the question arises of how the theory-independence requirement can be satisfied. At this point the importance of the underlying argument structure is emphasised (cf. (Lehmann et al., 1996), (Marcus et al., 1994), (Sampson, 1995)).</Paragraph> </Section> <Section position="3" start_page="0" end_page="88" type="sub_section"> <SectionTitle> 2.3 Language-Specific Features </SectionTitle> <Paragraph position="0"> Treebanks of the format described in the above section have been designed for English. Therefore, the solutions they offer are not always optimal for other language types. As for free word order languages, the following features may cause problems: * local and non-local dependencies form a continuum rather than clear-cut classes of phenomena; * there exists a rich inventory of discontinuous constituency types (topicalisation, scrambling, clause union, pied piping, extraposition, split NPs and PPs); * word order variation is sensitive to many factors, e.g.
category, syntactic function, focus; * the grammaticality of different word permutations does not fit the traditional binary 'right-wrong' pattern; it rather forms a gradual transition between the two poles.</Paragraph> <Paragraph position="1"> In light of these facts, serious difficulties can be expected arising from the structural component of the existing formalisms. Due to the frequency of discontinuous constituents in non-configurational languages, the filler-trace mechanism would be used very often, yielding syntactic trees fairly different from the underlying predicate-argument structures.</Paragraph> <Paragraph position="2"> Consider the German sentence (1) daran wird ihn Anna erkennen, daß er weint at-it will him Anna recognise that he cries 'Anna will recognise him at his cry' A sample constituent structure is given below:</Paragraph> </Section> </Section> </Paper>