<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1615">
  <Title>Domain Adaptation with Structural Correspondence Learning</Title>
  <Section position="10" start_page="125" end_page="126" type="relat">
    <SectionTitle>
8 Related Work
</SectionTitle>
    <Paragraph position="0"> Domain adaptation is an important and well-studied area in natural language processing. Here we outline a few recent advances. Roark and Bacchiani (2003) use a Dirichlet prior on the multinomial parameters of a generative parsing model to combine a large amount of training data from a source corpus (WSJ), and small amount of training data from a target corpus (Brown). Aside from Florian et al. (2004), several authors have also given techniques for adapting classification to new domains. Chelba and Acero (2004) first train a classifier on the source data. Then they use maximum a posteriori estimation of the weights of a  maximum entropy target domain classifier. The prior is Gaussian with mean equal to the weights of the source domain classifier. Daum'e III and Marcu (2006) use an empirical Bayes model to estimate a latent variable model grouping instances into domain-specific or common across both domains. They also jointly estimate the parameters of the common classification model and the domain specific classification models. Our work focuses on finding a common representation for features from different domains, not instances. We believe this is an important distinction, since the same instance can contain some features which are common across domains and some which are domain specific.</Paragraph>
    <Paragraph position="1"> The key difference between the previous four pieces of work and our own is the use of unlabeled data. We do not require labeled training data in the new domain to demonstrate an improvement over our baseline models. We believe this is essential, since many domains of application in natural language processing have no labeled training data.</Paragraph>
    <Paragraph position="2"> Lease and Charniak (2005) adapt a WSJ parser to biomedical text without any biomedical treebanked data. However, they assume other labeled resources in the target domain. In Section 7.3 we give similar parsing results, but we adapt a source domain tagger to obtain the PoS resources.</Paragraph>
    <Paragraph position="3"> To the best of our knowledge, SCL is the first method to use unlabeled data from both domains for domain adaptation. By using just the unlabeled data from the target domain, however, we can view domain adaptation as a standard semisupervised learning problem. There are many possible approaches for semisupservised learning in natural language processing, and it is beyond the scope of this paper to address them all. We chose to compare with ASO because it consistently outperforms cotraining (Blum and Mitchell, 1998) and clustering methods (Miller et al., 2004). We did run experiments with the top-k version of ASO (Ando and Zhang, 2005a), which is inspired by cotraining but consistently outperforms it. This did not outperform the supervised method for domain adaptation. We speculate that this is because biomedical and financial data are quite different.</Paragraph>
    <Paragraph position="4"> In such a situation, bootstrapping techniques are likely to introduce too much noise from the source domain to be useful.</Paragraph>
    <Paragraph position="5"> Structural correspondence learning is most similar to that of Ando (2004), who analyzed a situation with no target domain labeled data.</Paragraph>
    <Paragraph position="6"> Her model estimated co-occurrence counts from source unlabeled data and then used the SVD of this matrix to generate features for a named entity recognizer. Our ASO baseline uses unlabeled data from the target domain. Since this consistently outperforms unlabeled data from only the source domain, we report only these baseline results. To the best of our knowledge, this is the first work to use unlabeled data from both domains to find feature correspondences.</Paragraph>
    <Paragraph position="7"> One important advantage that this work shares with Ando (2004) is that an SCL model can be easily combined with all other domain adaptation techniques (Section 7.2). We are simply inducing a feature representation that generalizes well across domains. This feature representation can then be used in all the techniques described above.</Paragraph>
  </Section>
</Paper>