<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2103">
  <Title>A Quantitative Approach to Preposition-Pronoun Contraction in Polish</Title>
  <Section position="3" start_page="17" end_page="17" type="metho">
    <SectionTitle>
2 Corpus Distribution of Pronouns and Prepositions within PPCs
</SectionTitle>
    <Paragraph position="0"> For the corpus-based investigation of the distribution of pronouns and prepositions within Polish PPCs, the IPI PAN Corpus of Polish was used.4 Because of their very low frequency, the PPCs were searched for in the largest of the available IPI PAN subcorpora, i.e., the automatically annotated wstępny corpus (over 70 million segments).</Paragraph>
    <Paragraph position="1"> PPCs had to be identified manually, as they were not recognized in the wstępny corpus as consisting of multiple segments, instead being identified as unknown forms (tagged by ign).</Paragraph>
    <Paragraph position="2"> Thus, in the first instance, a search was performed for all unknown forms ending in -(e)ń.5 Next, a total of 1193 PPCs were manually extracted from 3308 result matches. Later, an interpretation in terms of grammatical features was assigned to each contracted pronoun by identifying its antecedent. The antecedent identification proceeded manually as well. Finally, the set of the acquired PPCs was verified by querying the corpus for all potential contractions of unaccented postprepositional pronouns with each particular Polish preposition. As a result, genitive and accusative masculine human plural, locative masculine inanimate singular, genitive and accusative masculine inanimate plural, genitive and accusative neuter singular, genitive, accusative and locative neuter plural, genitive and accusative feminine singular, and genitive, accusative and locative feminine plural pronominal forms within PPCs were recorded in addition to the masculine human, masculine animate and masculine inanimate singular pronominal forms.</Paragraph>
    <Paragraph position="3"> 4 [...]ments), morphosyntactically annotated corpus of Polish, developed at the Institute of Computer Science at the Polish Academy of Sciences (cf. Przepiórkowski, 2004). The corpus web page is located at http://korpus.pl. For quantitative information about the corpus, see Przepiórkowski (to appear).</Paragraph>
    <Paragraph position="4"> 5 Note that all TPPPs contracting with prepositions are realized by the syncretic form -(e)ń.</Paragraph>
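The first, automatic step of the search described above can be sketched as a simple filter. This is a minimal illustration only: the segment list, the non-ign tag labels, and the function name are hypothetical, and the actual corpus query interface is not reproduced here. The filter is deliberately over-inclusive, since the result matches were subsequently checked by hand.

```python
import re

# Hypothetical (form, tag) pairs standing in for corpus segments; "ign" marks
# forms that the automatic annotation did not recognise.
SEGMENTS = [
    ("dlań", "ign"),
    ("przezeń", "ign"),
    ("koń", "subst"),    # known form, so it is not tagged ign
    ("Gdań", "ign"),     # unknown form that is not a contraction
    ("niego", "ppron3"),
]

# Any word form whose final letter is ń; this also covers forms ending in -eń.
CANDIDATE = re.compile(r"\w*ń\Z")


def candidate_forms(segments):
    """Return unknown forms ending in -(e)ń, to be verified manually afterwards."""
    return [form for form, tag in segments
            if tag == "ign" and CANDIDATE.match(form)]


print(candidate_forms(SEGMENTS))
```

On the toy data the filter keeps the two contractions together with one false positive, which mirrors the need for the manual extraction step.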
    <Paragraph position="5"> A further observation based on the corpus data is that the set of prepositions found in contractions with unaccented postprepositional pronouns is very small, namely dla 'for', do 'to', na 'on', od 'from', po 'after', przez 'by', w 'in', za 'behind', z 'with', and przed 'in front of'. No occurrences of contractions containing other prepositions were found in the corpus. While the absence of contractions involving secondary prepositions, such as ponad 'above', poprzez 'through', między 'between', etc., corresponds to dictionary data, the non-appearance of contractions containing prepositions such as bez 'without', o 'about', nad 'above', or pod 'under', which are listed in Polish dictionaries such as (Dubisz, 2003) or (Bańko, 2000), does not.6 Figure 1 on the next page presents an overview of the distribution of all unaccented postprepositional pronouns and prepositions within PPCs found in the IPI PAN Corpus. For each pronoun form, the context in which it occurs is specified, i.e., the contraction of that form with a particular preposition, and the total number of times this form occurred is recorded together with its percentage of the total frequency of all unaccented postprepositional forms. In addition, the total of all occurrences of each contraction found in the corpus is indicated, as well as its percentage of the total frequency of all preposition-pronoun contractions occurring in the corpus.7</Paragraph>
  </Section>
  <Section position="4" start_page="17" end_page="21" type="metho">
    <SectionTitle>
3 Quantitative Interpretation
</SectionTitle>
    <Paragraph position="0"> To determine whether the distribution of the unaccented postprepositional pronouns and prepositions within PPCs found in the IPI PAN Corpus may be considered linguistically significant and, in consequence, may establish the basis for a revision of the traditionally assumed inflectional paradigms, a number of quantitative procedures must be performed.</Paragraph>
    <Paragraph position="1"> First of all, it must be determined whether the frequency of each unaccented postprepositional pronoun form in the corpus is statistically significant. For this purpose, the distribution of all accented postprepositional pronouns must be compiled. On the basis of the total frequency of accented and unaccented postprepositional pronouns, the statistical significance can be calculated using the χ² test, for instance. If one determines that the frequency of unaccented postprepositional pronouns in the corpus is statistically significant, ratios of the total number of particular accented postprepositional pronouns to the total number of their unaccented counterparts can be ascertained. These ratios can then be compared.8 If the ratios of accented postprepositional pronouns to their unaccented counterparts not included in the traditionally assumed inflectional paradigms correlate with the ratios of accented postprepositional pronouns to their unaccented counterparts contained in the traditionally assumed inflectional paradigms, the distribution of the unaccented postprepositional pronouns in the corpus may be considered linguistically important.
    6 Note, however, that in spite of the fact that contractions such as oń 'for_TPPP' or weń 'in_TPPP' are included in dictionaries of contemporary Polish, these expressions are not [...]
    Figure 1. Columns: dlań 'for_TPPP', doń 'to_TPPP', nań 'on_TPPP', weń 'in_TPPP', zeń 'with_TPPP' / 'from_TPPP', odeń 'from_TPPP', przezeń 'by_TPPP', poń 'after_TPPP', zań 'behind_TPPP', przedeń 'in front of_TPPP', Total, Percentage.
    Rows (counts per contraction as listed; total; percentage of all unaccented postprepositional forms):
    nom, m1, sg: total 0 (0.00%)
    gen, m1, sg: 74, 72, 17, 12, 0; total 175 (14.68%)
    dat, m1, sg: 0; total 0 (0.00%)
    acc, m1, sg: 207, 39, 140, 0, 4, 0; total 390 (32.70%)
    instr, m1, sg: 0, 0, 0; total 0 (0.00%)
    loc, m1, sg: 0, 0, 0; total 0 (0.00%)
    nom, m1, pl: total 0 (0.00%)
    gen, m1, pl: 2, 1, 0, 0, 0; total 3 (0.25%)
    dat, m1, pl: 0; total 0 (0.00%)
    acc, m1, pl: 3, 0, 2, 0, 0, 0; total 5 (0.42%)
    instr, m1, pl: 0, 0, 0; total 0 (0.00%)
    loc, m1, pl: 0, 0, 0; total 0 (0.00%)
    nom, m2, sg: total 0 (0.00%)
    gen, m2, sg: 2, 2, 1, 0, 0; total 5 (0.42%)
    dat, m2, sg: 0; total 0 (0.00%)
    acc, m2, sg: 10, 0, 0, 0, 0, 0; total 10 (0.84%)
    instr, m2, sg: 0, 0, 0; total 0 (0.00%)
    loc, m2, sg: 0, 0, 0; total 0 (0.00%)
    nom, m2, pl: total 0 (0.00%)
    gen, m2, pl: 0, 0, 0, 0, 0; total 0 (0.00%)
    dat, m2, pl: 0; total 0 (0.00%)
    acc, m2, pl: 0, 0, 0, 0, 0, 0; total 0 (0.00%)
    instr, m2, pl: 0, 0, 0; total 0 (0.00%)
    loc, m2, pl: 0, 0, 0; total 0 (0.00%)
    nom, m3, sg: total 0 (0.00%)
    gen, m3, sg: 14, 102, 49, 8, 0; total 173 (14.51%)
    dat, m3, sg: 0; total 0 (0.00%)
    acc, m3, sg: 134, 48, 62, 1, 20, 1; total 266 (22.31%)
    instr, m3, sg: 0, 0, 0; total 0 (0.00%)
    loc, m3, sg: 1, 0, 0; total 1 (0.08%)
    nom, m3, pl: total 0 (0.00%)
    gen, m3, pl: 0, 5, 4, 0, 0; total 9 (0.75%)
    dat, m3, pl: 1, 0; total 1 (0.08%)
    acc, m3, pl: 1, 2, 1, 0, 1, 0; total 5 (0.42%)
    instr, m3, pl: 0, 0, 0; total 0 (0.00%)
    loc, m3, pl: 0, 0, 0; total 0 (0.00%)
    nom, neut, sg: total 0 (0.00%)
    gen, neut, sg: 3, 16, 16, 1, 0; total 36 (3.02%)
    dat, neut, sg: 0; total 0 (0.00%)
    acc, neut, sg: 13, 6, 32, 0, 2, 0; total 53 (4.45%)
    instr, neut, sg: 0, 0, 0; total 0 (0.00%)
    loc, neut, sg: 0, 0, 0; total 0 (0.00%)
    nom, neut, pl: total 0 (0.00%)
    gen, neut, pl: 0, 5, 0, 0, 0; total 5 (0.42%)
    dat, neut, pl: 0; total 0 (0.00%)
    acc, neut, pl: 0, 1, 1, 0, 0, 0; total 2 (0.17%)
    instr, neut, pl: 0, 0, 0; total 0 (0.00%)
    loc, neut, pl: 0, 1, 0; total 1 (0.08%)
    nom, fem, sg: total 0 (0.00%)
    gen, fem, sg: 5, 15, 4, 1, 0; total 25 (2.06%)
    dat, fem, sg: 0; total 0 (0.00%)
    acc, fem, sg: 5, 4, 10, 0, 0, 0; total 19 (1.59%)
    instr, fem, sg: 0, 0, 0; total 0 (0.00%)
    loc, fem, sg: 0, 0, 0; total 0 (0.00%)
    nom, fem, pl: total 0 (0.00%)
    gen, fem, pl: 1, 1, 2, 1, 0; total 5 (0.42%)
    dat, fem, pl: 0; total 0 (0.00%)
    acc, fem, pl: 2, 0, 1, 0, 0, 0; total 3 (0.25%)
    instr, fem, pl: 0, 0, 0; total 0 (0.00%)
    </Paragraph>
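The significance test mentioned above can be illustrated with a Pearson χ² statistic on a 2×2 table (one degree of freedom, no continuity correction). The sketch below is an assumption about how such a test might be set up, not a description of the procedure actually used in the study: the table layout (one pronoun form against all other forms, for unaccented and accented pronouns) is chosen for illustration, and the overall total of accented pronouns is not reported in this paper, so it is backed out approximately from the 11.41% share given for the genitive feminine singular forms.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df=1, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Unaccented forms: total number of PPCs and the genitive feminine singular
# count, both as reported for the corpus.
unaccented_total = 1193
unaccented_gen_fem_sg = 25

# Accented forms: 5664 is the reported genitive feminine singular total.
# ASSUMPTION: the overall accented total is not reported; it is estimated
# here from the stated 11.41% share of that form.
accented_gen_fem_sg = 5664
accented_total = round(accented_gen_fem_sg / 0.1141)

chi2 = chi_square_2x2(
    unaccented_gen_fem_sg, unaccented_total - unaccented_gen_fem_sg,
    accented_gen_fem_sg, accented_total - accented_gen_fem_sg,
)
print(f"chi-square = {chi2:.2f}")
```

Under these assumptions the statistic should come out in the neighbourhood of the value reported for the genitive forms; small deviations are to be expected because the accented total is only an estimate.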
    <Paragraph position="2"> In our ongoing study, the distribution of accented postprepositional pronouns combining with the prepositions dla 'for', do 'to', na 'on', w 'in', z 'with', od 'from', przez 'by', po 'after', za 'behind', and przed 'in front of' has been ascertained. These pronouns correspond to their unaccented counterparts occurring as parts of the contractions dlań 'for_TPPP', doń 'to_TPPP', nań 'on_TPPP', weń 'in_TPPP', zeń 'with_TPPP' / 'from_TPPP', odeń 'from_TPPP', przezeń 'by_TPPP', poń 'after_TPPP', zań 'behind_TPPP', and przedeń 'in front of_TPPP', respectively. Note that assigning interpretations to pronouns must proceed manually on the basis of their antecedents, as a vast number of pronouns in the IPI PAN Corpus are resolved incorrectly. Figure 2 on the next page provides the current results.9
    8 Alternatively, the percentage of occurrences of each unaccented postprepositional pronoun out of the total number of occurrences of unaccented postprepositional pronouns and the percentage of occurrences of each accented postprepositional pronoun out of the total number of occurrences of accented postprepositional pronouns can be ascertained and the results compared.</Paragraph>
    <Paragraph position="3"> 9 Note that in some cases, assigning an interpretation to a given pronoun was impossible, which is indicated in Figure 2 by the question mark (?). In some cases, identification of an antecedent was not possible, more than one antecedent candidate bearing different features came into question, or some features provided by an antecedent and a given pronoun were inconsistent with one another. In the majority of cases, morphosyntactic features clashed with contextual / pragmatic / natural features.</Paragraph>
    <Paragraph position="4"> Currently, only the distributional characterization of genitive and accusative feminine singular postprepositional pronouns is available for analysis. It has been ascertained that genitive unaccented postprepositional feminine pronouns are used significantly less frequently in the IPI PAN Corpus than are genitive accented postprepositional feminine pronouns (χ² = 101.76 (df=1), p&lt;0.001), and accusative unaccented postprepositional feminine pronouns are used significantly less frequently in the IPI PAN Corpus than are accusative accented postprepositional feminine pronouns [...]</Paragraph>
    <Paragraph position="6"> The percentage of genitive unaccented postprepositional feminine singular pronouns of the total of all unaccented postprepositional pronouns amounted to 2.06%, while the percentage of genitive accented postprepositional feminine singular pronouns amounted to 11.41%. The percentage of accusative unaccented postprepositional feminine singular pronouns of the total of all unaccented postprepositional pronouns was 1.59%, while the percentage of accusative accented postprepositional feminine singular pronouns was 5.68%. The ratios of the totals of genitive and accusative accented postprepositional feminine singular pronouns to the totals of their unaccented counterparts are given in Figure 3. Additionally, Figure 3 provides the ratio of the total of all accented plural pronouns occurring in the contexts indicated in Figure 2 to the total of the unaccented forms.</Paragraph>
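The two feminine singular ratios can be recomputed directly from the reported totals. The following lines are a small worked check rather than part of the original procedure; the dictionary and function names are introduced here for illustration only.

```python
# Totals as reported: accented forms (Figure 2) and unaccented forms (Figure 1).
ACCENTED = {"gen, fem, sg": 5664, "acc, fem, sg": 2820}
UNACCENTED = {"gen, fem, sg": 25, "acc, fem, sg": 19}


def accented_to_unaccented_ratio(form):
    """Ratio of accented to unaccented occurrences for one pronoun form."""
    return ACCENTED[form] / UNACCENTED[form]


for form in ACCENTED:
    print(f"{form}: {accented_to_unaccented_ratio(form):.2f}")
# gen, fem, sg: 226.56
# acc, fem, sg: 148.42
```

The quotient is undefined for forms whose unaccented count is zero, so such forms would have to be left out of any ratio comparison.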
    <Paragraph position="7"> For the final conclusions, however, the distribution patterns of particular plural pronouns must be described.</Paragraph>
    <Section position="1" start_page="19" end_page="21" type="sub_section">
      <SectionTitle>
Ratio
</SectionTitle>
      <Paragraph position="0"> gen, fem, sg: 226.56; acc, fem, sg: 148.42; pl: 759.60
      Figure 3: [...] pronouns to their unaccented counterparts
      In the next step, the remaining accented postprepositional pronoun forms will be identified in the corpus and totaled.10 Then, the ratios of the totals of these pronouns to the totals of their unaccented forms will be calculated. Finally, all ratios will be compared. If there are any significant differences between particular ratios, an attempt will be made to ascertain possible reasons for these differences (e.g., ungrammaticality, production errors, metadata, etc.) and conclusions will be drawn. If there are no significant differences between the particular ratios, it will be concluded that the distribution patterns of pronouns and prepositions within PPCs found in the corpus are also linguistically significant and that the traditionally assumed inflectional paradigms of TPPPs, as well as previous dictionary specifications of PPCs, may have to be revised.
      10 Note that the total frequency of accented postprepositional forms corresponding to unaccented forms with zero frequency will, in fact, not affect the analysis.
      Figure 2. Columns: dla TPPP 'for TPPP', do TPPP 'to TPPP', na TPPP 'on TPPP', w TPPP 'in TPPP', z TPPP 'with TPPP' / 'from TPPP', od TPPP 'from TPPP', przez TPPP 'by TPPP', po TPPP 'after TPPP', za TPPP 'behind TPPP', przed TPPP 'in front of TPPP', Total, Percentage.
      Rows with entries so far (all other rows are still empty):
      gen, m1, sg: 1141, 1902
      acc, m1, sg: 192
      instr, m1, sg: 699
      gen, m1, pl: 1207, 987
      acc, m1, pl: 126
      instr, m1, pl: 310
      gen, m2, sg: 8, 24
      acc, m2, sg: 1
      instr, m2, sg: 25
      gen, m2, pl: 14, 12
      instr, m2, pl: 9
      gen, m3, sg: 128, 1066
      acc, m3, sg: 99
      instr, m3, sg: 183
      gen, m3, pl: 166, 808
      acc, m3, pl: 16
      instr, m3, pl: 75
      gen, neut, sg: 80, 336
      acc, neut, sg: 14
      instr, neut, sg: 41
      gen, neut, pl: 170, 429
      acc, neut, pl: 7
      instr, neut, pl: 29
      gen, fem, sg: 872, 2619, 0, 0, 1514, 659, 0, 0, 0, 0; total 5664 (11.41%)
      acc, fem, sg: 0, 0, 1401, 264, 0, 0, 830, 74, 251, 0; total 2820 (5.68%)
      instr, fem, sg: 580
      gen, fem, pl: 319, 914
      acc, fem, pl: 9
      instr, fem, pl: 123
      </Paragraph>
    </Section>
  </Section>
</Paper>