<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-1005">
  <Title>Squibs and Discussions The DOP Estimation Method Is Biased and Inconsistent</Title>
  <Section position="3" start_page="0" end_page="72" type="metho">
    <SectionTitle>
2. DOP1 Models
</SectionTitle>
    <Paragraph position="0"> For simplicity, this note focuses on DOP1 or Tree-DOP models, in which linguistic representations are phrase structure trees, but the results carry over to more complex models that use attribute-value feature structure representations such as LFG-DOP.</Paragraph>
    <Paragraph position="1"> The fragments used in DOP1 are multinode trees whose leaves may be labeled with nonterminals as well as terminals. A derivation starts with a fragment whose root is labeled with the start symbol, and it proceeds by substituting a fragment for the leftmost nonterminal leaf under the constraint that the fragment's root node and the leaf node have the same label. The derivation terminates when there are no nonterminal leaves. Figure 1 depicts three different derivations that yield the same tree. The fragments used in these derivations could have been obtained from a training corpus of trees that contains trees for examples such as Sasha likes motorcycles, Alex eats pizza, and so on.</Paragraph>
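The derivation procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of trees, the function names, and the toy fragments are hypothetical and are not the fragments of Figure 1. A nonterminal node is a pair (label, children), a terminal is a string, and a nonterminal leaf that is still open for substitution has children set to None.

```python
def substitute_leftmost(tree, fragment):
    """Return (new_tree, done); fragment replaces the leftmost open nonterminal leaf."""
    if isinstance(tree, str):            # terminal leaf: nothing to substitute
        return tree, False
    label, children = tree
    if children is None:                 # open nonterminal leaf
        if fragment[0] != label:
            raise ValueError("fragment root does not match the leaf label")
        return fragment, True
    new_children, done = [], False
    for child in children:
        if not done:                     # only the leftmost open leaf is replaced
            child, done = substitute_leftmost(child, fragment)
        new_children.append(child)
    return (label, tuple(new_children)), done


def has_open_leaf(tree):
    """True if some nonterminal leaf is still waiting for a substitution."""
    if isinstance(tree, str):
        return False
    _, children = tree
    if children is None:
        return True
    return any(has_open_leaf(c) for c in children)


def derive(fragments, start="S"):
    """Combine fragments left to right by leftmost substitution."""
    tree = fragments[0]
    if tree[0] != start:
        raise ValueError("first fragment must be rooted in the start symbol")
    for frag in fragments[1:]:
        tree, done = substitute_leftmost(tree, frag)
        if not done:
            raise ValueError("no nonterminal leaf left to substitute")
    if has_open_leaf(tree):
        raise ValueError("derivation is incomplete")
    return tree


# Hypothetical toy fragments (not taken from the paper's figures).
f_a = ("S", (("NP", None), ("VP", (("V", ("likes",)), ("NP", None)))))
f_b = ("NP", ("Alex",))
f_c = ("NP", ("pizza",))
f_d = ("S", (("NP", ("Alex",)), ("VP", None)))
f_e = ("VP", (("V", ("likes",)), ("NP", ("pizza",))))

tree_1 = derive([f_a, f_b, f_c])   # three fragments
tree_2 = derive([f_d, f_e])        # two different fragments
```

In this toy case both fragment sequences yield the same tree, which mirrors the point that distinct derivations can produce one and the same representation.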
    <Paragraph position="2"> In a DOP model, each fragment is associated with a real-valued weight, and the weight of a derivation is the product of the weights of the tree fragments involved.</Paragraph>
    <Paragraph position="3"> The weight of a representation is the sum of the weights of its derivations, and a probability distribution over linguistic representations is obtained by normalizing the representations' weights.</Paragraph>
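The weighting scheme in the two preceding paragraphs can be illustrated with a minimal sketch. The fragment names, weights, and derivation lists below are made up for illustration, and the helper names are hypothetical; only the product, sum, and normalization steps come from the text.

```python
from math import prod


def derivation_weight(derivation, weights):
    """Weight of a derivation: product of the weights of its fragments."""
    return prod(weights[f] for f in derivation)


def representation_weight(derivations, weights):
    """Weight of a representation: sum of the weights of its derivations."""
    return sum(derivation_weight(d, weights) for d in derivations)


def normalize(rep_weights):
    """Turn representation weights into a probability distribution."""
    total = sum(rep_weights.values())
    return {rep: w / total for rep, w in rep_weights.items()}


# Hypothetical toy data: names and numbers are illustrative assumptions.
weights = {"f1": 0.5, "f2": 0.25, "f3": 0.5, "f4": 0.25}
derivations_by_tree = {
    "t1": [["f1"], ["f2", "f3"]],   # t1 has two derivations
    "t2": [["f4"]],                 # t2 has one derivation
}
rep_weights = {
    tree: representation_weight(ds, weights)
    for tree, ds in derivations_by_tree.items()
}
distribution = normalize(rep_weights)
```

Here t1 receives weight 0.5 + 0.25 * 0.5 = 0.625 and t2 receives 0.25, so normalization divides both by 0.875.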
    <Paragraph position="4"> Given a combinatory operation and a fixed set of fragments, a DOP model is a parametric model where the fragment weights are the parameters. In DOP1 and DOP models based on it, the weight associated with a fragment is estimated as follows (Bod 1998). For each tree fragment f, let n(f) be the number of times it appears in the training corpus, and let F be the set of all tree fragments with the same root as f. Then the weight w(f) associated with f is $w(f) = n(f) / \sum_{f' \in F} n(f')$.</Paragraph>
    <Paragraph position="6"> This relative-frequency estimation method has the advantage of simplicity, but as shown in the following sections, it is biased and inconsistent.</Paragraph>
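As a minimal sketch, the relative-frequency estimate divides each fragment count n(f) by the total count of the fragments that share its root. The function name, the counts, and the root labels below are hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict


def dop1_weights(fragment_counts, root_of):
    """w(f) = n(f) divided by the summed counts of fragments with the same root."""
    totals = defaultdict(int)
    for f, n in fragment_counts.items():
        totals[root_of[f]] += n
    return {f: n / totals[root_of[f]] for f, n in fragment_counts.items()}


# Hypothetical counts and root labels, chosen only for illustration.
counts = {"f1": 3, "f2": 1, "f3": 2, "f4": 2}
root_of = {"f1": "S", "f2": "S", "f3": "NP", "f4": "NP"}
w = dop1_weights(counts, root_of)
```

With these counts the two S-rooted fragments receive weights 0.75 and 0.25 and the two NP-rooted fragments receive 0.5 each, so the weights of fragments sharing a root sum to one.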
    <Paragraph position="7"> Footnote 1: In DOP1 and similar models, it is not necessary to normalize the representations' weights if the fragments' weights are themselves appropriately normalized.</Paragraph>
    <Paragraph position="8"> Johnson DOP Is Biased and Inconsistent 3. Bias and Inconsistency  Bias and inconsistency are usually defined for parametric estimation procedures in terms that are not quite appropriate for evaluating the DOP estimation procedure, but their standard definitions (see Shao [1999] for a textbook exposition) will serve as the basis for the definitions adopted below. Let $\Theta$ be a vector space of real-valued parameters, so that $P_\theta$, $\theta \in \Theta$, is a probability distribution. In the DOP1 case, $\Theta$ would be the space of all possible weight assignments to fragments. An estimator $\phi$ is a function from a vector x of n samples to a parameter value $\phi(x) \in \Theta$, and an estimation procedure specifies an estimator $\phi_n$ for each sample size n.</Paragraph>
    <Paragraph position="9"> Let X be a vector of n independent random variables distributed according to $P_{\theta^*}$ for some $\theta^* \in \Theta$. [...] An estimation procedure is consistent if and only if the limit of the risk of $\phi_n$ is 0 as $n \to \infty$.</Paragraph>
    <Paragraph position="11"> (There are various different notions of consistency depending on how convergence is defined; however, the DOP1 estimator is not consistent with respect to any of the standard definitions of consistency.) Strictly speaking, the standard definitions of bias and loss function are not applicable to DOP estimation because there can be two distinct parameter vectors $\theta_1 \neq \theta_2$ for which $P_{\theta_1} = P_{\theta_2}$ (such a case is presented in the next section).</Paragraph>
    <Paragraph position="12"> Thus it is more natural to define bias and loss in terms of the probability distributions that the parameters specify, rather than in terms of the parameters themselves. In this paper, an estimator $\phi$ is unbiased iff $P_{E_{\theta^*}(\phi(X))} = P_{\theta^*}$; that is, its expected parameter estimate specifies the same distribution as the true parameters $\theta^*$.</Paragraph>
    <Paragraph position="14"> Similarly, the loss function is the mean squared difference between the "true" and estimated distributions; that is, if $\Omega$ is the event space (in DOP1, the space of all phrase structure trees), then the loss of an estimate $\theta$ is $L(\theta^*, \theta) = \sum_{\omega \in \Omega} P_{\theta^*}(\omega) (P_{\theta^*}(\omega) - P_\theta(\omega))^2$.</Paragraph>
    <Paragraph position="16"> As before, the risk of an estimator is its expected loss, and an estimation procedure is consistent iff the limit of the expected loss is 0 as $n \to \infty$.</Paragraph>
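A loss of this kind can be computed directly once both distributions are available as tables. The sketch below assumes that the mean is taken with respect to the true distribution, which is one reading of "mean squared difference"; the function name and the two toy distributions are hypothetical.

```python
def distribution_loss(p_true, p_est):
    """Squared difference between two distributions, averaged under p_true.

    Assumption: each event is weighted by its probability under p_true.
    """
    events = set(p_true) | set(p_est)
    return sum(
        p_true.get(e, 0.0) * (p_true.get(e, 0.0) - p_est.get(e, 0.0)) ** 2
        for e in events
    )


# Hypothetical two-tree event space, for illustration only.
p_true = {"t1": 0.5, "t2": 0.5}
p_est = {"t1": 0.75, "t2": 0.25}
loss = distribution_loss(p_true, p_est)
```

The loss is zero exactly when the two distributions agree on every event that has positive probability under the true distribution, regardless of which parameter vectors produced them.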
  </Section>
  <Section position="4" start_page="72" end_page="73" type="metho">
    <SectionTitle>
4. A DOP1 Example
</SectionTitle>
    <Paragraph position="0"> This section presents a simple DOP1 model that generates only two trees, with probability p and 1 − p, respectively. The DOP relative frequency estimator is applied to a random sample of size n drawn from this population to estimate the tree weight parameters for the model. The bias and inconsistency of the estimator follow from the fact that these estimated parameters generate the trees with probabilities different from p and 1 − p. The trees used and their DOP1 fragments are shown in Figure 2.</Paragraph>
    <Paragraph position="1"> [Figure 2: The trees used and their fragments in the DOP1 model.]</Paragraph>
    <Paragraph position="2"> Suppose the "true" weights for the fragments f [...] is n(1 − p). Thus the expected number of occurrences of the fragments in a sample of size n is</Paragraph>
    <Paragraph position="4"> [...] $P_{E(\hat{w})} \neq P_{w^*}$; that is, the DOP1 estimator is biased.</Paragraph>
    <Paragraph position="5"> Further, note that the estimated distribution $P_{E(\hat{w})}$ does not approach $P_{w^*}$ as the sample size n increases, so the expected loss does not converge to 0. Thus the DOP1 estimator is also inconsistent.</Paragraph>
  </Section>
</Paper>