<?xml version="1.0" standalone="yes"?>
<Paper uid="P91-1035">
  <Title>A STOCHASTIC PROCESS FOR WORD FREQUENCY DISTRIBUTIONS</Title>
  <Section position="6" start_page="275" end_page="277" type="concl">
    <SectionTitle>
MORPHOLOGY
</SectionTitle>
    <Paragraph position="0"> The Mandelbrot-Simon model has a single parameter α that allows new words to enter the distribution. Since the present theory is phonological rather than morphological in nature, this parameter models only the (occasional) appearance of new simplex words in the language, and cannot be used to model the influx of morphologically complex words.</Paragraph>
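    The role of the new-word parameter can be illustrated with a minimal sketch of a basic Simon-type process. This is an assumption-laden illustration, not the paper's exact formulation: the function name simulate_simon, the integer encoding of word types, and the seed argument are all hypothetical. With probability alpha the next token is a new type; otherwise an earlier token is repeated, so existing types are reused in proportion to their current frequency.

```python
import random
from collections import Counter


def simulate_simon(n_tokens, alpha, seed=0):
    """Generate n_tokens tokens from a basic Simon-type process.

    Word types are encoded as integers 0, 1, 2, ... (an assumption made
    purely for illustration). n_tokens is expected to be at least 1.
    """
    rng = random.Random(seed)
    tokens = [0]      # the first token is necessarily a new type
    next_type = 1
    for _ in range(n_tokens - 1):
        if alpha > rng.random():
            # Birth: a new word type enters the distribution.
            tokens.append(next_type)
            next_type += 1
        else:
            # Reuse: copying a uniformly chosen earlier token selects an
            # existing type with probability proportional to its frequency.
            tokens.append(rng.choice(tokens))
    return tokens


if __name__ == "__main__":
    sample = simulate_simon(10000, alpha=0.05)
    spectrum = Counter(Counter(sample).values())
    print("types:", len(set(sample)), "hapaxes:", spectrum[1])
```

    Raising alpha in this sketch increases the number of types and, in particular, the number of types that occur only once.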
    <Paragraph position="1"> First, morphological word formation processes may give rise to consonant clusters that are permitted when they span morpheme boundaries but are inadmissible within single morphemes. This difference in phonotactic patterning within and across morphemes already reveals that morphologically complex words have a different source than monomorphemic words.</Paragraph>
    <Paragraph position="2"> Second, each word formation process, whether compounding or affixation of suffixes like -ness and -ity, is characterized by its own degree of productivity. Quantitatively, differences in the degree of productivity amount to differences in the birth rates at which complex words appear in the vocabulary. Typically, such birth rates, which can be expressed as E[n_1]/N, where n_1 and N denote the number of types occurring once only and the number of tokens in the frequency distributions of the corresponding morphological categories (Baayen 1989), assume values that are significantly higher than the birth rate α of monomorphemic words. Hence it is impossible to model the complete lexical distribution without a worked-out morphological component that specifies the word formation processes of the language and their degrees of productivity.</Paragraph>
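    As a rough illustration of the birth rate just described (the number of types occurring once only, divided by the number of tokens in a morphological category), the following sketch computes the observed ratio from a list of tokens. The function name birth_rate and the toy sample of -ness formations are hypothetical and are not data from the paper; the observed ratio serves only as a plug-in estimate of the expectation.

```python
from collections import Counter


def birth_rate(tokens):
    """Return the observed ratio of hapax types to tokens for one category."""
    if not tokens:
        raise ValueError("at least one token is required")
    counts = Counter(tokens)
    n_tokens = sum(counts.values())                       # N: token count
    n_hapax = sum(1 for c in counts.values() if c == 1)   # types seen once
    return n_hapax / n_tokens


if __name__ == "__main__":
    # Invented toy sample of -ness formations (illustration only).
    ness_tokens = [
        "happiness", "happiness", "darkness", "kindness",
        "kindness", "kindness", "fuzziness", "awareness",
    ]
    print(birth_rate(ness_tokens))  # 3 hapaxes over 8 tokens
```

    Under this measure, a category in which most tokens belong to a few frequent types yields a low value, whereas a category that keeps producing new, rare formations yields a high one.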
    <Paragraph position="3"> While actual modelling of the complete distribution is beyond the scope of the present paper, we may note that adding birth rates for word formation processes to the model, as required by the large additional numbers of rare words that appear in the complete distribution, ties in nicely with the fact that the frequency distributions of productive morphological categories are prototypical LNRE distributions, which are characterized by large numbers of types occurring only once or twice.</Paragraph>
    <Paragraph position="4"> With respect to the effect of morphological structure on the lexical similarity effects, we finally note that in the empirical data longer words show sharply diminished neighborhood density. However, it appears that those longer words which do have neighbors are morphologically complex. Morphological structure raises lexical density where the phonotaxis fails to do so: for long monomorphemic words the huge space of possible word types is sampled too sparsely for the lexical similarity effects to emerge.</Paragraph>
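    The notion of neighborhood density can be made concrete with a small sketch. It assumes one common definition, which may differ from the one used for the empirical data: two words are neighbors when they have the same length and differ in exactly one segment. Orthographic strings stand in for phonemic transcriptions, and the function names and the toy lexicon are hypothetical.

```python
def is_neighbor(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def neighborhood_density(word, lexicon):
    """Count the entries of lexicon that are neighbors of word."""
    return sum(is_neighbor(word, other) for other in lexicon)


if __name__ == "__main__":
    # Invented toy lexicon (illustration only).
    lexicon = ["cat", "bat", "cot", "cut", "dog",
               "walked", "talked", "balked", "elephant"]
    for w in ("cat", "walked", "elephant"):
        print(w, neighborhood_density(w, lexicon))
```

    In this toy lexicon the long monomorphemic word has no neighbors, while the long suffixed words are neighbors of one another because they share their affix, which mirrors the pattern described above.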
  </Section>
</Paper>