
<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1027">
  <Title>Supervised and unsupervised PCFG adaptation to novel domains</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> What we have demonstrated in this paper is that maximum a posteriori (MAP) estimation can make out-of-domain training data beneficial for statistical parsing. In the most likely scenario - porting a parser to a novel domain for which there is little or no annotated data - the improvements can be quite large. Like active learning, model adaptation can reduce the amount of annotation required to converge to a best level of performance. In fact, MAP coupled with active learning may reduce the required amount of annotation further.</Paragraph>
    <Paragraph position="1"> There are a couple of interesting future directions for this  adaptation. For all trials, the base training is Brown;T, the held out is Brown;H plus the parser output for WSJ;24, and the mixing parameter A is 0.20ec(A).</Paragraph>
    <Paragraph position="2"> research. First, a question that is not addressed in this paper is how to best combine both supervised and unsupervised adaptation data. Since each in-domain resource is likely to have a different optimal mixing parameter, since the supervised data is more reliable than the unsupervised data, this becomes a more difficult, multi-dimensional parameter optimization problem. Hence, we would like to investigate automatic methods for choosing mixing parameters, such as EM. Also, an interesting question has to do with choosing which treebank to use for out-of-domain data. For a new domain, is it better to choose as prior the balanced Brown corpus, or rather the more robust Wall St. Journal treebank? Perhaps one could use several out-of-domain treebanks as priors. Most generally, one can imagine using k treebanks, some in-domain, some out-of-domain, and trying to find the best mixture to suit the particular task.</Paragraph>
    <Paragraph position="3"> The conclusion in Gildea (2001), that out-of-domain tree-banks are not particularly useful in novel domains, was premature. Instead, we can conclude that, just as in other statistical estimation problems, there are generalizations to be had from these out-of-domain trees, providing more robust estimates, especially in the face of sparse training data.</Paragraph>
  </Section>
class="xml-element"></Paper>