<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1042">
  <Title>Joint and conditional estimation of tagging and parsing models</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper has investigated the difference between maximum likelihood estimation and maximum conditional likelihood estimation for three kinds of models: PCFG parsers, HMM taggers and shift-reduce parsers. The results for the PCFG parsers suggested that conditional estimation might provide a slight performance improvement, although the results were not statistically significant, since the computational difficulty of conditional estimation for a PCFG made it necessary to perform the experiment on a tiny training and test corpus. To avoid this computational difficulty, we compared closely related (but not identical) HMM tagging and shift-reduce parsing models, for some of which the maximum likelihood estimates were easy to compute and for others of which the maximum conditional likelihood estimates were easy to compute. In both cases, the joint models outperformed the conditional models by quite large margins. This suggests that it may be worthwhile to investigate maximum (joint) likelihood estimation for model classes for which only maximum conditional likelihood estimators are currently used, such as Maximum Entropy models and MEMMs; if the results of the experiments presented here extend to those models, one might expect a modest performance improvement.</Paragraph>
    <Paragraph position="1"> As explained in the introduction, because maximum likelihood estimation exploits not just the conditional distribution of the hidden variable (e.g., the tags or the parse) conditioned on the visible variable (the terminal string) but also the marginal distribution of the visible variable, it is reasonable to expect that it should outperform maximum conditional likelihood estimation. Yet it is counter-intuitive that joint tagging and shift-reduce parsing models, which predict the next tag or parsing move on the basis of what seems to be less information than the corresponding conditional model, should nevertheless outperform that conditional model, as the experimental results presented here show.</Paragraph>
    <Paragraph position="2"> The recent theoretical and simulation results of Lafferty et al. (2001) suggest that conditional models may suffer from label bias (the discovery of which Lafferty et al. attribute to Bottou (1991)), which may provide an insightful explanation of these results.</Paragraph>
    <Paragraph position="3"> None of the models investigated here are state-of-the-art; the goal here is to compare two different estimation procedures, and for that reason this paper concentrated on simple, easily implemented models. However, it would also be interesting to compare the performance of joint and conditional estimators on more sophisticated models.</Paragraph>
  </Section>
</Paper>