<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0115">
<Title>Learning a Radically Lexical Grammar</Title>
<Section position="5" start_page="127" end_page="128" type="evalu">
<SectionTitle> 4.4. Results and Evaluation </SectionTitle>
<Paragraph position="0"> L's initial corpus was books 1a and 1b of the Ladybird Key Words Reading Scheme. They contain 351 word tokens, drawn from a vocabulary of about 20 different words. This corpus was completely processed in 17 passes.</Paragraph>
<Paragraph position="1"> Processing later books in the series has brought L's current vocabulary up to some 55 words. This is still small, of course, but the nature of the induction process means that growth should 'snowball' as each known word helps in the categorization of further new words. (And see Shieber, quoted in Ritchie (1987), for a revealing discussion of the vocabulary sizes of most NLP research prototypes.)</Paragraph>
<Paragraph position="2"> Within this limited vocabulary, L has 'correctly' induced examples of the following categories: determiners, adjectives, prepositions, conjunctions, intransitive, transitive and ditransitive verbs, imperatives, and some auxiliaries. Furthermore, L discovers and represents ambiguity of the following types: adjective vs noun; sentence co-ordination vs noun-phrase co-ordination; prepositional form; noun-phrase vs determiner; and verbs of quotation. Examples of the latter are the four structural forms that L induces for says (as in 'Rhubarb rhubarb says Jane' or 'Jane says rhubarb rhubarb').</Paragraph>
<Paragraph position="3"> L has also inadvertently re-invented type-raising, assigning the category (S/(S\NP))/N to a sentence-initial determiner: the system's exact method of exploiting parametric neutrality told it that this word needed a following noun to form a function into a sentence from a following 'verb phrase'. A slightly different algorithm would have given the more standard NP/N, and reduction mechanisms could easily be implemented to find simpler equivalents, where possible, of highly complex proposed categories.</Paragraph>
<Paragraph position="4"> (Indeed they will almost certainly be needed, as witness L's assignment of the category S\NP/(S\NP\S)/(S\NP\NP) to you as the last uncategorized word in the sentence 'Here you are Jane says Peter'.) Evaluation of the results took two forms: 1) Use of the lexicon for generation. Many different lexicons could have been produced that would account for the training corpus alone. Using the lexicon to generate new text provided evidence that it was more general. The text generated was in character with the corpus - for example, 'Peter you are in it says r'. This is an important result; we have evidence that the grammar created is general but does not over-generate.</Paragraph>
<Paragraph position="5"> 2) Inspection. L produced cognitively plausible results - i.e. as well as producing categories that enable the entire corpus to reduce to a sequence of Ss, the results reflect the categories traditionally (manually) assigned to each word - for a wide range of syntactic constructions, providing further evidence that the lexicon produced is not just corpus-specific.</Paragraph>
</Section>
</Paper>
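
The type-raising point in paragraph 3 can be made concrete with a few lines of code. What follows is a minimal sketch in Python, not the paper's implementation; the representation (Atom, Func) and the function names are illustrative assumptions. It checks that L's induced determiner category (S/(S\NP))/N, applied to a following noun by forward application, yields the type-raised subject S/(S\NP), which then takes a following 'verb phrase' S\NP to give S - the same sentence that the more standard NP/N category would derive via NP + S\NP.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Cat:
    """Base class for categorial-grammar categories."""
    pass

@dataclass(frozen=True)
class Atom(Cat):
    name: str                 # an atomic category, e.g. "S", "NP", "N"

@dataclass(frozen=True)
class Func(Cat):
    result: Cat
    slash: str                # "/" seeks its argument to the right, "\\" to the left
    arg: Cat

def show(c: Cat) -> str:
    """Render a category in the usual slash notation."""
    if isinstance(c, Atom):
        return c.name
    return "(" + show(c.result) + c.slash + show(c.arg) + ")"

def apply_fwd(fn: Cat, arg: Cat) -> Optional[Cat]:
    """Forward application: X/Y followed by Y reduces to X."""
    if isinstance(fn, Func) and fn.slash == "/" and fn.arg == arg:
        return fn.result
    return None

S, NP, N = Atom("S"), Atom("NP"), Atom("N")
vp = Func(S, "\\", NP)                  # S\NP, the 'verb phrase' category
det = Func(Func(S, "/", vp), "/", N)    # L's (S/(S\NP))/N for a sentence-initial determiner

raised = apply_fwd(det, N)              # consume the following noun
print(show(raised))                     # -> (S/(S\NP)), the type-raised subject
print(show(apply_fwd(raised, vp)))      # -> S, same result as NP/N would licence

The frozen dataclasses give structural equality for free, which is exactly what apply_fwd needs to match argument categories; a reduction mechanism of the kind the authors suggest would amount to rewriting categories such as S/(S\NP) to simpler equivalents like NP wherever the two are interchangeable in derivations.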