<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1056"> <Title>The grapho-phonological system of written French: Statistical analysis and empirical validation</Title> <Section position="2" start_page="0" end_page="437" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> The processes through which readers evoke mental representations of phonological forms from print constitute a hotly debated issue in current psycholinguistics. In this paper we present a computational analysis of the grapho-phonological system of written French, and an empirical validation of some of the obtained descriptive statistics. The results provide direct evidence that both grapheme frequency and grapheme entropy influence performance on pseudoword naming. We discuss the implications of these findings for current models of phonological coding in visual word recognition.</Paragraph> <Paragraph position="1"> Introduction One central characteristic of alphabetic writing systems is the existence of a direct mapping between letters or letter groups and phonemes. In most languages, although to a varying extent, the mapping from print to sound can be characterized as quasi-systematic (Plaut, McClelland, Seidenberg, & Patterson, 1996; Chater & Christiansen, 1998). Thus, descriptively, in addition to a large body of regularities (e.g., the grapheme CH in French regularly maps onto /ʃ/), one generally observes isolated deviations (e.g., CH in CHAOS maps onto /k/) as well as ambiguities. In some cases, but not always, these difficulties can be alleviated by considering higher-order regularities such as local orthographic environment (e.g., C maps onto /k/ or /s/ as a function of the following letter), phonotactic and phonological constraints, as well as morphological properties (cf. PH in PHASE vs.
SHEPHERD).</Paragraph> <Paragraph position="2"> One additional difficulty stems from the fact that graphemes, the orthographic counterparts of phonemes, can consist either of single letters or of letter groups, as the previous examples illustrate. Psycholinguistic theories of visual word recognition have taken the quasi-systematicity of writing into account in two opposite ways. In one framework, generally known as dual-route theories (e.g., Coltheart, 1978; Coltheart, Curtis, Atkins, & Haller, 1993), it is assumed that dominant mapping regularities are abstracted to derive a tabulation of grapheme-phoneme correspondence rules, which may then be looked up to derive a pronunciation for any letter string. Because the rule table only captures the dominant regularities, it needs to be complemented by lexical knowledge to handle deviations and ambiguities (e.g., CHAOS, SHEPHERD). The opposite view, based on the parallel distributed processing framework, assumes that the whole set of grapho-phonological regularities is captured through differentially weighted associations between letter coding and phoneme coding units of varying sizes (Seidenberg & McClelland, 1989; Plaut, McClelland, Seidenberg, & Patterson, 1996).</Paragraph> <Paragraph position="3"> These opposing theories have nourished a complex empirical debate for a number of years. This controversy constitutes one instance of a more general issue in cognitive science, which bears upon the proper explanation of rule-like behavior. Is the language user's capacity to exploit print-sound regularities, for instance to generate a plausible pronunciation for a new, unfamiliar string of letters, best explained by knowledge of abstract all-or-none rules, or of the statistical structure of the language?
We believe that, in the field of visual word processing, the lack of precise quantitative descriptions of the mapping system is one factor that has impeded resolution of these issues.</Paragraph> <Paragraph position="4"> In this paper, we present a descriptive analysis of the grapheme-phoneme mapping system of French orthography, and we further explore the sensitivity of adult human readers to some characteristics of this mapping. The results indicate that human naming performance is influenced by the frequency of graphemic units in the language and by the predictability of their mapping to phonemes. We argue that these results implicate the availability of graded knowledge of grapheme-phoneme mappings and hence that they are more consistent with a parallel distributed approach than with the abstract rules hypothesis.</Paragraph> <Section position="1" start_page="436" end_page="437" type="sub_section"> <SectionTitle> 1. Statistical analysis of grapho-phonological correspondences of French 1.1. Method </SectionTitle> <Paragraph position="0"> Tables of grapheme-phoneme associations (henceforth, GPA) were derived from a corpus of 18,510 French one-to-three-syllable words from the BRULEX Database (Content, Mousty, & Radeau, 1990), which contains orthographic and phonological forms as well as word frequency statistics. As noted above, given that graphemes may consist of several letters, the segmentation of letter strings into graphemic units is a non-trivial operation. A semi-automatic procedure similar to the rule-learning algorithm developed by Coltheart et al. (1993) was used to parse words into graphemes.</Paragraph> <Paragraph position="1"> First, grapheme-phoneme associations are tabulated for all trivial cases, that is, words which have exactly the same number of letters and phonemes, so that each letter is a grapheme (e.g., PAR, /paR/). Then a segmentation algorithm is applied to the remaining unparsed words in successive passes.
The aim is to select words for which the addition of a single new GPA would resolve the parsing. After each pass, the new hypothesized associations are manually checked before inclusion in the GPA table.</Paragraph> <Paragraph position="2"> The segmentation algorithm proceeds as follows.</Paragraph> <Paragraph position="3"> Each unparsed word in the corpus is scanned from left to right, starting with larger letter groups, in order to find a parsing based on tabulated GPAs which satisfies the phonology. If this fails, a new GPA will be hypothesized if there is only one unassigned letter group and one unassigned phoneme and their positions match. For instance, the single-letter grapheme-phoneme associations tabulated at the initial stage would be used to mark the P-/p/ and R-/R/ correspondences in the word POUR (/puR/) and isolate OU-/u/ as a new plausible association.</Paragraph> <Paragraph position="4"> When all words were parsed into graphemes, a final pass through the whole corpus computed grapheme-phoneme association frequencies, based both on a type count (the number of words containing a given GPA) and a token count (the number of words weighted by word frequency). [Table caption fragment: entropy (H) values, by type and by token, for French polysyllabic words.]</Paragraph> <Paragraph position="5"> Several statistics were then extracted to provide a quantitative description of the grapheme-phoneme system of French. (1) Grapheme frequency, the number of occurrences of the grapheme in the corpus, independently of its phonological value.</Paragraph> <Paragraph position="6"> (2) Number of alternative pronunciations for each grapheme. (3) Grapheme entropy as measured by H, the information statistic proposed by Shannon (1948) and previously used by Treiman, Mullennix, Bijeljac-Babic, & Richmond-Welty (1995). This measure is based on the probability distribution of the phoneme set for a given grapheme and reflects the degree of predictability of its pronunciation.
H is minimal and equals 0 when a grapheme is invariably associated with one phoneme (as for J and /ʒ/). H is maximal and equals log₂ n when there is total uncertainty. In this particular case, n would correspond to the total number of phonemes in the language (thus, since there are 46 phonemes, max H = 5.52). (4) Grapheme-phoneme association probability, which is the GPA frequency divided by the total grapheme frequency. (5) Association dominance rank, which is the rank of a given grapheme-phoneme association among the phonemic alternatives for a grapheme, ordered by decreasing probability.</Paragraph> </Section> </Section> </Paper>
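<!--
Editorial illustration of the five statistics defined in the Method subsection. This is a minimal sketch, not the authors' implementation: the function name gpa_statistics, the dictionary layout, and the toy counts are assumptions introduced here, and ASCII letters stand in for phoneme symbols.

```python
import math
from collections import defaultdict


def gpa_statistics(gpa_counts):
    """Compute per-grapheme statistics from GPA counts.

    gpa_counts maps (grapheme, phoneme) pairs to a frequency, which may be
    a type count or a token count. The layout is hypothetical.
    """
    by_grapheme = defaultdict(dict)
    for (grapheme, phoneme), count in gpa_counts.items():
        by_grapheme[grapheme][phoneme] = by_grapheme[grapheme].get(phoneme, 0) + count

    stats = {}
    for grapheme, phonemes in by_grapheme.items():
        total = sum(phonemes.values())  # (1) grapheme frequency
        # (4) association probability: GPA frequency / grapheme frequency
        probs = {p: c / total for p, c in phonemes.items()}
        # (3) Shannon entropy in bits over the phoneme distribution
        entropy = sum(-q * math.log2(q) for q in probs.values() if q > 0)
        # (5) dominance rank: 1 for the most probable pronunciation
        ranked = sorted(probs, key=probs.get, reverse=True)
        stats[grapheme] = {
            "frequency": total,
            "n_pronunciations": len(phonemes),  # (2)
            "entropy": entropy,
            "probability": probs,
            "rank": {p: i + 1 for i, p in enumerate(ranked)},
        }
    return stats


# Toy counts, invented for illustration only ("Z" and "S" are ASCII
# stand-ins for the phonemes written with IPA symbols in the text).
toy_counts = {
    ("J", "Z"): 120,
    ("CH", "S"): 90,
    ("CH", "k"): 10,
    ("C", "k"): 50,
    ("C", "s"): 50,
}
toy_stats = gpa_statistics(toy_counts)
```

With these toy counts, J has H = 0 (a single pronunciation) and C has H = 1 bit (two equally likely pronunciations). The ceiling quoted in the text corresponds to math.log2(46), which is about 5.52.
-->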