<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1017"> <Title>Probabilistic and Rule-Based Tagger of an Inflective Language - a Comparison</Title> <Section position="3" start_page="111" end_page="112" type="intro"> <SectionTitle> 2 STATISTICAL EXPERIMENTS 2.1 CZECH EXPERIMENTS 2.1.1 CZECH TAGSET </SectionTitle> <Paragraph position="0"> The Czech experiment is based upon ten basic POS classes, and the tags describe the possible combinations of morphological categories for each POS class. In most cases, the first letter of the tag denotes the part of speech; the letters and numbers which follow it describe combinations of morphological categories (for a detailed description, see Table 2.1 and Table ...). Note especially that Czech nouns are divided into four classes according to gender (Sgall, 1967) and into seven classes according to case.</Paragraph> <Paragraph position="1"> verbs, infinitives: VTa; verbs, transgressives: VWntsga; verbs, common: Vpnstmga; pronouns, personal: PPpnc; pronouns, 3rd person: PP3gnc; pronouns, possessive: PRgncpgn; &quot;svůj&quot; (&quot;his&quot;, referring to the subject): PSgnc; reflexive particle &quot;se&quot;: PEc; pronouns, demonstrative: PDgnca. Not all possible combinations of morphological categories are meaningful, however. In addition to these usual tags, we have used special tags for sentence boundaries, punctuation, and a so-called &quot;unknown tag&quot;. In the experiments, we used only those tags which occurred at least once in the training corpus. To illustrate the form of the tagged text, we present the following examples from our training data, with comments:</Paragraph> </Section> </Paper>