<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0902">
  <Title>Computing Declarative Prosodic Morphology</Title>
  <Section position="6" start_page="89" end_page="123" type="evalu">
    <SectionTitle>
4 Parsing and generation
</SectionTitle>
    <Paragraph position="0"> The preceding paragraph described how to compute surface forms given roots and categories. However, this generation procedure amounts to an inefficient generate-and-minimize mechanism which must compute otherwise useless suboptimal candidates as a byproduct of optimization. More importantly, due to the nonmonotonicity of optimization it is not obvious how to invert the procedure for efficient parsing in order to derive root and category given a surface form.</Paragraph>
    <Paragraph position="1"> A first solution which comes to mind is to implement parsing as analysis-by-synthesis. A goal like ParseString&amp;verbform (Root, Category) is submitted to a first run of the MicroCUF constraint solver, resulting in instantiations for Root and Category iff a proof consistent with the grammar was found. With these instantiations, a second run of MicroCUF uses the full generate-and-minimize mechanism to compute optimal strings OptStringl ..... OptStringN. The parse is accepted iff ParseString&amp;(OptStringl; ... ;OptStringN) is consistent. Note that for this solution to be feasible it is essential that constraints are inviolable, hence their evaluation in the first run can disregard optimization. The main drawbacks of analysis-by-synthesis are that two runs are required and that the inefficiencies of generate-and-minimize are not avoided.</Paragraph>
    <Paragraph position="2"> The new solution recognizes the fact that bidirectional processing of DPM would be easy without optimization. We therefore seek to perform all optimization at compile time. The idea is this: exploiting the finiteness of natural language paradigms we compute - using generate-and-minimize - each paradigm cell of e.g. the verbal paradigm of MH for a suitable root. However, while doing so we record the proof sequence of relational clause invocations employed in the derivation of each optimal form, using the fact that each clause has a unique index in internal representation. Such proof sequences have  two noteworthy properties. By definition they first of all record just clause applications, therefore naturally abstracting over all non-relational parameter fillings of top-level goals. In particular, proving a goal like verbform ( \[g, m, r\] , bl ; b2 ) normally looses the information associated with the root and category parameters in the proof sequence representation (although these parameters could indirectly influence the proof if relationally encoded choices in the grammar were dependent on it). Secondly, we can profitably view each proof sequence as a linear finite state automaton (FSAc~u). Since a paradigm is the union of all its cells, a complete abstract paradigm can therefore be represented by a unique * minimal deterministic FSAp=r~ which is computed as the union of all FSAcett followed by determinization and minimization. At runtime we just need to run FSAp~,.~ as afinite-state oracle in parallel with the MicroCUF constraint solver. This means that each proof step that uses a clause k must be sanctioned by a corresponding k-labelled FSA transition. With this technique we are now able to efficiently restrict the search space to just the optimal proofs; the need for run-time optimization in DPM processing has been removed. However, a slight caveat is necessary: to apply the technique it must be possible to partition the data set into a finite number of equivalence classes. This condition is e.g. automatically fulfilled for all phenomena which exhibit a paradigm structure.</Paragraph>
    <Paragraph position="3"> What are the possible advantages of this hybrid FSA-guided constraint processing technique? First of all, it enables a particularly simple treatment of ttnkaowtl words for root-and-pattern morphologies, surely a necessity in the face of ever-incomplete lexicons. If the grammar is set up properly to abstract from segmental detail of the Root segments as much as possible, then these details are also absent in the proof sequences. Hence a single FSApara merging these sequences in effect represents an abstract paradigm which can be used for a large number of concrete instantiations. We thus have a principled way of parsing words that contain roots not listed in the lexicon. However, we want the system not to overgenerate, mistakenly analyzing known roots as unknown. Rather, the system should return the semantics of known roots and also respect their verbal class affiliations as well as other idiosyncratic properties. This is the purpose of the root_letter_tree clauses in (96-123).</Paragraph>
    <Paragraph position="5"> cat :sem: 'UNKNOWN' .</Paragraph>
    <Paragraph position="6"> For each level in the letter tree a new terminal branch is added that covers the complement of all attested root segments at that level (99,106,112,123). This terminal branch is assigned an 'UNKNOWN' semantics, whereas known terminal branches record a proper semantics and categorial restrictions. During off-line creation of the proof sequences we now simply let the system backtrack over all choices in the root_letter_tree by feeding it a totally underspecified Root parameter. The resulting FSApar= represents both the derivations of all known roots and of all possible unknown root types covered by the grammar. While this treatment results in a homogenous grammar integrating lexical and grammatical aspects, it considerably enlarges FSApara. It might therefore be worthwhile to separate lexical access from the grammar, running a separate proof of root_letter_tree (Root) to enforce root-specific restrictions after parsing with the abstract paradigm alone. It remains to be seen which approach is more promising w.r.t, overall space and time efficiency.</Paragraph>
    <Paragraph position="7"> A second advantage of separating FSA guidance from constraint processing, as compared to pure finite-state transducer approaches, is that we are free to build sufficient expressivity into the constraint language. For example it seems that one needs token identity, i.e. structure sharing, in phonology to cover  instances of antigemination, assimilation, dissimilation and reduplication in an insightful way. It is well-known that token identity is not finite-state representable and cumbersome to emulate in practice (cf. Antworth 1990, 157 on a FST attempt at reduplication vs the DPM treatment of infixal reduplication in Tigrinya verbs described in Walther 1997, 238247). Also, it would be fascinating to extend the constraint-based approach to phonetics. However, a pilot study reported in Walther &amp; Krrger (1994) has found it necessary to use arithmetic constraints to do so, again transcending finite-state power. Finally, to the extent that sign-based approaches to grammar like HPSG are on the right track, the smooth integration of phonology and morphology arguably is better achieved within a uniform formal basis such as MicroCUF which is expressive enough to cover the recursive aspects of syntax and semantics as well.</Paragraph>
    <Paragraph position="8"> In conclusion, some notes on the pilot implementation. The MicroCUF system was modified to produce two new incarnations of the MicroCUF interpreter, one to record proof sequences, the other to perform FSA-guided proofs. FSApara was created with the help of finite-state tools from AT&amp;T's freely availablefsm package (http: //www. research.</Paragraph>
    <Paragraph position="9"> art. com /sw /tools /fsm/).Ihavemeasured speedups of more than 102 for the generation of MH forms (&lt; l second with the new technique), although parse times in the range of 1... 4 seconds on a Pentium 200 MHz PC with 64 M_Byte indicate that the current prototype is still too slow by a factor of more than l02. However, there is ample room for future improvements. Besides drawing from the wealth of optimizations found in the logic programming literature to generally accelerate MicroCUF (e.g., term encoding of feature structures, memoization) we can also analyze the internal structure of FSAvara to gain some specific advantages. This is due to the fact that each maximal linear sub-FSA of length k &gt; i corresponds to a deterministic proof subsequence whose clauses should be partially executable at compile time, subsequently saving k - 1 proof steps at runtime. null</Paragraph>
  </Section>
class="xml-element"></Paper>