<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1061">
<Title>Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training</Title>
<Section position="8" start_page="68" end_page="68" type="concl">
<SectionTitle>6 Conclusion</SectionTitle>
<Paragraph position="0"> We have presented a new approach to stochastic modeling of constraint-based grammars. Our experimental results show that EM training can in fact be very helpful for accurate stochastic modeling in natural language processing. We conjecture that this result is due partly to the fact that the space of parses produced by a constraint-based grammar is only mildly incomplete, i.e. the ambiguity rate can be kept relatively low. Another reason may be that EM is especially useful for log-linear models, where the search space in maximization can be kept under control. Furthermore, we have introduced a new class-based grammar lexicalization, which again uses EM training and incorporates a pre-disambiguation routine into log-linear models. An impressive gain in performance could also be demonstrated for this method. Clearly, a central task of future work is a further exploration of the relation between complete-data and incomplete-data estimation for larger, manually disambiguated treebanks. An interesting question is whether a systematic variation of training data size along the lines of the EM experiments of Nigam et al. (2000) for text classification will show similar results, namely a systematic dependence of the relative gain due to EM training on the relative sizes of unannotated and annotated data. Furthermore, it is important to show that EM-based methods can also be applied successfully to other statistical parsing frameworks.</Paragraph>
</Section>
</Paper>
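To make the kind of incomplete-data estimation discussed above concrete, here is a minimal sketch of training a log-linear parse model from partially labeled data: each sentence comes with a set of candidate parses, of which only a subset is marked as consistent with the annotation. The gradient of the incomplete-data log-likelihood is the difference between the expected feature vector over the consistent parses and over all parses, which is the quantity an EM-style E-step computes. All data, feature vectors, and names here are hypothetical illustrations, and plain gradient ascent stands in for the paper's actual iterative maximization algorithm.

```python
# Toy sketch of incomplete-data estimation for a log-linear parse model.
# Hypothetical data; this is NOT the paper's algorithm, only an illustration
# of the general EM-style objective for partially labeled parse sets.
import numpy as np

# Each "sentence" holds one feature vector per candidate parse; `consistent`
# lists the indices of parses compatible with the (partial) annotation.
sentences = [
    {"feats": np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
     "consistent": [0, 2]},
    {"feats": np.array([[2.0, 0.0], [0.0, 2.0]]),
     "consistent": [1]},
]

def log_softmax(scores):
    """Numerically stable log-probabilities over a set of candidate parses."""
    scores = scores - scores.max()
    return scores - np.log(np.exp(scores).sum())

def grad_log_likelihood(lam, sentences):
    """Gradient of the incomplete-data log-likelihood:
    E[f | consistent parses] - E[f | all parses], summed over sentences."""
    grad = np.zeros_like(lam)
    for s in sentences:
        feats = s["feats"]
        cons = s["consistent"]
        p_all = np.exp(log_softmax(feats @ lam))        # posterior over all parses
        p_cons = np.exp(log_softmax(feats[cons] @ lam)) # posterior over consistent parses
        grad += p_cons @ feats[cons] - p_all @ feats
    return grad

lam = np.zeros(2)
for step in range(200):          # plain gradient ascent, fixed step size
    lam += 0.1 * grad_log_likelihood(lam, sentences)
print("estimated weights:", lam)
```

Note that the search space stays under control here in exactly the sense the conclusion describes: every expectation is taken only over the finite candidate-parse set of a single sentence, never over an unbounded space of structures.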