<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1045">
  <Title>Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling</Title>
  <Section position="3" start_page="364" end_page="365" type="intro">
    <SectionTitle>
3 A Conditional Random Field Model
</SectionTitle>
    <Paragraph position="0"> Our basic CRF model follows that of Lafferty et al.</Paragraph>
    <Paragraph position="1"> (2001). We choose a CRF because it represents the state of the art in sequence modeling, allowing both discriminative training and the bi-directional flow of probabilistic information across the sequence. A CRF is a conditional sequence model which represents the probability of a hidden state sequence given some observations. In order to facilitate obtaining the conditional probabilities we need for Gibbs sampling, we generalize the CRF model in a  entity recognition (NER) and template filling (TF).</Paragraph>
    <Paragraph position="2"> way that is consistent with the Markov Network literature (see Cowell et al. (1999)): we create a linear chain of cliques, where each clique, c, represents the probabilistic relationship between an adjacent pair of states2 using a clique potential phc, which is just a table containing a value for each possible state assignment. The table is not a true probability distribution, as it only accounts for local interactions within the clique. The clique potentials themselves are defined in terms of exponential models conditioned on features of the observation sequence, and must be instantiated for each new observation sequence. The sequence of potentials in the clique chain then defines the probability of a state sequence (given the observation sequence) as</Paragraph>
    <Paragraph position="4"> where phi(si[?]1,si) is the element of the clique potential at position i corresponding to states si[?]1 and si.3 Although a full treatment of CRF training is beyond the scope of this paper (our technique assumes the model is already trained), we list the features used by our CRF for the two tasks we address in  nential models with a quadratic prior and used the quasi-Newton method for parameter optimization. As is customary, we used the Viterbi algorithm to infer the most likely state sequence in a CRF.</Paragraph>
    <Paragraph position="5">  The clique potentials of the CRF, instantiated for some observation sequence, can be used to easily compute the conditional distribution over states at a position given in Equation 1. Recall that at position i we want to condition on the states in the rest of the sequence. The state at this position can be influenced by any other state that it shares a clique with; in particular, when the clique size is 2, there are 2 such cliques. In this case the Markov blanket of the state (the minimal set of states that renders a state conditionally independent of all other states) consists of the two neighboring states and the observation sequence, all of which are observed. The conditional distribution at position i can then be computed simply as</Paragraph>
    <Paragraph position="7"> where the factor tables F in the clique chain are already conditioned on the observation sequence.</Paragraph>
  </Section>
class="xml-element"></Paper>