<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2004">
  <Title>The Effect of Corpus Size in Combining Supervised and Unsupervised Training for Disambiguation</Title>
  <Section position="8" start_page="30" end_page="30" type="relat">
    <SectionTitle>
6 Related Work
</SectionTitle>
    <Paragraph position="0"> Other work combining supervised and unsupervised learning for parsing includes (Charniak, 1997), (Johnson and Riezler, 2000), and (Schmid, 2002). These papers present integrated formal frameworks for incorporating information learned from unlabeled corpora, but they do not explicitly address PP and RC attachment. The same is true of the uncorrected co-training in (Hwa et al., 2003).</Paragraph>
    <Paragraph position="1"> Conversely, no previous work on PP and RC attachment has integrated specialized ambiguity resolution into parsing. For example, (Toutanova et al., 2004) present one of the best results achieved so far on the WSJ PP set, 87.5%, and they too combine supervised and unsupervised learning. But to our knowledge, the relationship to parsing has not been explored before, even though application to parsing is the stated objective of most work on PP attachment.</Paragraph>
    <Paragraph position="2"> Footnote 5: However, the baseline is similarly high for the PP problem if the most likely attachment is chosen per preposition: 72.2% according to (Collins and Brooks, 1995).</Paragraph>
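The per-preposition baseline mentioned in the footnote can be sketched in a few lines: for each preposition, always predict the attachment seen most often in training, falling back to the global majority for unseen prepositions. This is a minimal illustration, not the evaluation setup of (Collins and Brooks, 1995); the data format is assumed.

```python
from collections import Counter, defaultdict

def per_preposition_baseline(train, test):
    """Majority-class baseline: for each preposition, predict the
    attachment (e.g. 'noun' or 'verb') seen most often in training.
    `train` and `test` are lists of (preposition, attachment) pairs.
    Returns accuracy on `test`."""
    counts = defaultdict(Counter)
    for prep, attach in train:
        counts[prep][attach] += 1
    # Global majority attachment as a fallback for unseen prepositions.
    global_major = Counter(a for _, a in train).most_common(1)[0][0]
    predict = {p: c.most_common(1)[0][0] for p, c in counts.items()}
    correct = sum(predict.get(p, global_major) == a for p, a in test)
    return correct / len(test)
```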
    <Paragraph position="3"> With the exception of (Hindle and Rooth, 1993), most unsupervised work on PP attachment is based on superficial analysis of the unlabeled corpus without the use of partial parsing (Volk, 2001; Calvo et al., 2005). We believe that dependencies offer a better basis for reliable disambiguation than cooccurrence and fixed-phrase statistics. The difference from (Hindle and Rooth, 1993) was discussed above with respect to analysing the unlabeled corpus. In addition, the decision procedure presented here differs from Hindle and Rooth's: LBD uses more context and can, in principle, accommodate arbitrarily large contexts. However, an evaluation comparing the performance of the two methods is still necessary.</Paragraph>
    <Paragraph position="4"> The LBD model can be viewed as a back-off model that combines estimates from several &quot;backoffs&quot;. In a typical backoff model, there is a single, more general model to back off to. (Collins and Brooks, 1995) also present a model with multiple backoffs; one of its variants computes the estimate in question as the average of three backoffs. In addition to the maximum used here, it would be desirable to test other strategies for combining the MI values in the lattice (e.g., average, sum, frequency-weighted sum). To our knowledge, MI has not previously been used in a backoff model.</Paragraph>
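The combination strategies named above (maximum, average, sum, frequency-weighted sum) can be sketched as follows. This is an illustrative stub only: it takes a flat list of MI estimates, one per backoff context, and does not model the actual LBD lattice structure; the function name and interface are assumptions.

```python
def combine_backoff_estimates(mi_values, strategy="max", freqs=None):
    """Combine mutual-information estimates from several backoffs
    into a single score. `mi_values` is one MI estimate per backoff
    context; `freqs` gives optional frequency weights for the
    frequency-weighted variant."""
    if strategy == "max":           # the combination used for LBD here
        return max(mi_values)
    if strategy == "average":       # cf. the averaging variant of
        return sum(mi_values) / len(mi_values)  # (Collins and Brooks, 1995)
    if strategy == "sum":
        return sum(mi_values)
    if strategy == "freq-weighted":
        return sum(m * f for m, f in zip(mi_values, freqs)) / sum(freqs)
    raise ValueError(f"unknown strategy: {strategy}")
```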
    <Paragraph position="5"> Previous work on relative clause attachment has been supervised (Siddharthan, 2002a; Siddharthan, 2002b; Yeh and Vilain, 1998). (Siddharthan, 2002b) reports an accuracy of 76.5% for RC attachment.</Paragraph>
  </Section>
</Paper>