<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1306">
  <Title>Sample Selection for Statistical Grammar Induction</Title>
  <Section position="3" start_page="45" end_page="46" type="intro">
    <SectionTitle>
2 Sample Selection
</SectionTitle>
    <Paragraph position="0"> Unlike traditional learning systems that receive training examples indiscriminately, a learning system that uses sample selection actively influences its progress by choosing new examples to incorporate into its training set. Sample selection works with two types of learning systems: a committee of learners or a single learner. The committee-based selection algorithm works with multiple learners, each maintaining a different hypothesis (perhaps pertaining to different aspects of the problem). The candidate examples that led to the most disagreements among the different learners are considered to have the highest TUV (Cohn et al., 1994; Freund et al., 1997). For computationally intensive problems such as grammar induction, maintaining multiple learners may be an impracticality. In this work, we explore sample selection with a single learner that keeps just one working hypothesis at all times.</Paragraph>
    <Paragraph position="1"> Figure 1 outlines the single-learner sample selection training loop in pseudo-code. Initially, the training set, L, consists of a small number of labeled examples, based on which the learner proposes its first hypothesis of the target concept, C. Also available to the learner is a large pool of uulabeled training candidates, U. In each training iteration, the selection algorithm, Select(n, U, C, f), ranks the candidates of U according to their expected TUVs and returns the n candidates with the highest values. The algorithm computes the expected TUV of each candidate, u E U, with an evaluation function, f(u, C).</Paragraph>
    <Paragraph position="2"> This function may possibly rely on the hypothesis concept C to estimate the utility of a candidate u. The set of the n chosen candidates are then labeled by human and added to the existing training set. Rnnning the learning algorithm~ Train(L), on the updated training set, the system proposes a new hypothesis consistent with all the examples seen thus far. The loop continues until one of three stopping conditions is met: the hypothesis is considered close enough to the target concept, all candidates are labeled, or all human resources are exhausted.</Paragraph>
    <Paragraph position="3"> Sample selection may be beneficial for many learning tasks in natural language processing. Although there exist abundant collections of raw text, the high expense of manually annotating the text sets a severe limitation for many learning algorithms in nat- null lection learning algorithm ural language processing. Sample selection presents an attractive solution to offset this labeled data sparsity problem. Thus far, it has been successfully applied to several classification applications. Some examples include text categorization (Lewis and Gale, 1994), part-of-speech tagging (Engelson and Dagan, 1996), word-sense disambiguation (Fujii et al., 1998), and prepositional-phrase attachment (Hwa, 2000).</Paragraph>
    <Paragraph position="4"> More difficult are learning problems whose objective is not classification, but generation of complex structures. One example in this direction is applying sample selection to semantic parsing (Thompson et al., 1999), in which sentences are paired with their semantic representation using a deterministic shift-reduce parser. Our work focuses on another complex natural language learning problem: inducing a stochastic context-free grammar that can generate syntactic parse trees for novel test sentences.</Paragraph>
    <Paragraph position="5"> Although abstractly, parsing with a grammar can be seen as a classification task of determining the structure of a sentence by selecting one tree out of a set of possible parse trees, there are two major distinctions that differentiate it from typical classification problems. First, a classifier usually chooses from a fixed set of categories, but in our domain, every sentence has a different set of possible parse trees. Second, for most classification problems, the the number of the possible categories is relatively small, whereas the number of potential parse trees for a sentence is exponential with respect to the sentence length.</Paragraph>
  </Section>
class="xml-element"></Paper>