<?xml version="1.0" standalone="yes"?> <Paper uid="A88-1031"> <Title>MORPHOLOGICAL PROCESSING IN THE NABU SYSTEM</Title> <Section position="24" start_page="232" end_page="232" type="concl"> <SectionTitle> ANALYZERS </SectionTitle> <Paragraph position="0"> The English analyzer is complete with respect to inflection; it has been successfully tested on, among other things, the entire collection of inflectional variants presented in</Paragraph> <Section position="1" start_page="232" end_page="232" type="sub_section"> <SectionTitle> Webster's Seventh New Collegiate Dictionary </SectionTitle> <Paragraph position="0"> (ca. 42,500 nouns, 8,750 verbs, and 13,250 adjectives). It also accounts for the great bulk of English derivation, as determined by various word frequency lists, and is undergoing gradual, evolutionary extension to the missing (low-frequency) affixes and their combinations. A first version of this grammar was delivered to MCC shareholders in mid-1985, followed by upgrades in 1986 and 1987. The current analyzer numbers 20 nodes and 60 arcs.</Paragraph> <Paragraph position="1"> As mentioned earlier, a complete Arabic morphological analyzer exists; so far as we are aware, it accounts for all morphological phenomena in the language -- no mean feat, for a language in which a single root form could in theory be realized as over 200,000 surface forms, and in which morphemes are frequently discontinuous (i.e., cannot be described by simple affixation models) [Aristar, 1987]. This 371-node, 1133-arc analyzer was delivered to MCC shareholders in mid-1986, and may represent the first complete analyzer ever produced for Arabic.</Paragraph> <Paragraph position="2"> The French and German analyzers are complete with respect to inflection (highly irregular forms, like sein in German, naturally excepted).</Paragraph> <Paragraph position="3"> The former numbers 71 nodes and 121 arcs; the latter, 54 nodes and 79 arcs.
The 19-node, 17-arc Spanish analyzer is nearly complete with respect to inflection; adjectives remain a temporary exception. With respect to verbs, for example, it has been tested on an extensive list of conjugated verbs [Noble and Lacasa, 1980], comprising over 6,000 surface forms, and in the first such test it was 97% accurate.</Paragraph> </Section> </Section> <Section position="27" start_page="233" end_page="233" type="concl"> <SectionTitle> CONCLUSIONS </SectionTitle> <Paragraph position="0"> Morphological grammars in Nabu are able to account for all compositional readings of arbitrarily-complex surface-forms in a wide range of languages. Furthermore, the formalism and development environment are reasonably comfortable. These claims are supported by our implementation and large-scale testing of several diverse grammars.</Paragraph> <Paragraph position="1"> For philosophical reasons, we are opposed to the idea that grammars (as opposed to individual rules) must be reversible: even if it were not for the need of five-fold rather than merely dual functionality, the need for fault-tolerance in a practical system, without consequent fault-exhibition, argues for separate analysis and synthesis grammars. We also point out that, in our implementations, the [non-reversible] control graphs tend to be much smaller in size than the hierarchies of [reversible] rules, hence the storage penalty for &quot;redundancy&quot; is inconsequential.</Paragraph> </Section> </Paper>