<?xml version="1.0" standalone="yes"?> <Paper uid="W93-0304"> <Title>Towards a cross-linguistic tagset</Title> <Section position="2" start_page="36" end_page="36" type="metho"> <SectionTitle> 4. Adjective </SectionTitle> <Paragraph position="0"> Subclassification of adjectives seems to be a complicated matter, since many of the subdivisions we found are determined by the research context. For example, we found attributive adjectives (main, chief), nominal adjectives and even semantically superlative adjectives.</Paragraph> <Paragraph position="1"> So at this stage we are not able to provide a consistent subclassification.</Paragraph> <Paragraph position="2"> Again, additional features such as number, case, gender, ditto (as described above) and form (for example English: -ed, -ing) can be added, for example:</Paragraph> </Section> <Section position="3" start_page="36" end_page="37" type="metho"> <SectionTitle> ADJ#gen#pos#acc#masc# </SectionTitle> <Paragraph position="0"> refers to a definite general positive accusative masculine adjective, for example German intelligenten. Subcat: different forms of transitivity such as intransitive, copular, transitive, etc.</Paragraph> <Paragraph position="1"> Again, features such as number, person and compounding can be added.</Paragraph> <Paragraph position="2"> For example: V#l#3#past# refers to a lexical verb (third person, past tense), for example went. Features for compounding can be added. A general subclassification of adverbs is very hard to establish because of the semantic nature of such subdivisions. For example: ADV#pos# refers to a positive adverb, for example Features indicating the case a preposition requires for its complement, as well as ones for compounding, can be added. For example: PRP#phra#dat# refers to a phrasal preposition (required by a prepositional verb) that combines with a complement with dative case.
Interjections are normally referred to as words that do not enter into syntactic relations and that do not have a clear morphological structure. Very often they are of an onomatopoeic nature. Examples of interjections are: aha, hm, wow, psst, oops.</Paragraph> <Paragraph position="3"> Formulaic Expressions are fixed expressions used as formulaic reactions in certain dialogue contexts. Examples are: all the best; excuse me; dank u wel; Danke, gut.</Paragraph> <Paragraph position="4"> Particles are morphologically fixed words that do not belong to any of the word classes described above and that can function in many ways in a sentence, for example as the introducing element of the subject of an infinitival clause (for example: I am waiting for the meeting to begin), or as fixed answers to questions (for example: yes, no, ja) (see note 14). Ditto-tags can be applied to the different elements of the tagged item. For example: Good FOR#1/2# Morning FOR#2/2#. Note 14: For more detailed information about particles, see Engel (1988). 11. Open Wordclass: O (open). The subclasses to be distinguished within this wordclass category may vary, depending on the specific language the tagset is used for. For English the genitive marker belongs in this category; the same goes for the German verb particle. For example: O#GM# refers to a genitive marker.</Paragraph> <Paragraph position="5"> Part V: Conclusion. In this paper we have sketched the way in which linguistic enrichment of corpora could be standardized. We have reported on our efforts to standardize the word class tags. In addition, we compared the tag sets of a number of prominent corpora. The differences between these sets encouraged us to proceed towards a standardized cross-linguistic tagset. This set could contribute to improved access to and exchange of analyzed corpora.
In addition to a standardized tagset it might be interesting to determine if and how a standard annotation of linguistic information on higher levels of description (syntax, semantics, pragmatics) can be established.</Paragraph> <Paragraph position="6"> We are working on these issues and we hope to encourage other (academic and industrial) researchers in the field of corpus linguistics to participate in the discussion about common guidelines for the linguistic annotation of corpora in the future.</Paragraph> </Section> </Paper>