File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/n06-2047_metho.xml
Size: 5,134 bytes
Last Modified: 2025-10-06 14:10:13
<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2047"> <Title>Engineering Management The Chinese University of Hong Kong</Title> <Section position="4" start_page="185" end_page="185" type="metho"> <SectionTitle> 3 The Maximum Entropy Framework </SectionTitle> <Paragraph position="0"> Suppose a story S contains n sentences, C0,... ,Cn, the objective of an RC system can be described as: A = arg maxCi[?]S P(Ci answers Q|Q). (1) Let x be the question (Q) and y be the answer sentence Ci that answers x . Equation 1 can be computed by the ME method (Zhou et al., 2003):</Paragraph> <Paragraph position="2"> factor, fj(x,y) is the indicator function for feature fj; fj occurs in the context x, lj is the weight of fj. For a given question Q, the Ci with the highest probability is selected. If multiple sentences have the maximum probability, the one that occurs the earliest in the passage is returned. We used the selective gain computation (SGC) algorithm (Zhou et al., 2003) to select features and estimate parameters for its fast performance.</Paragraph> <Paragraph position="3"> Question: Who wrote the &quot;Pledge of Allegiance&quot; Answer sentence: The pledge was written by Frances Bellamy.</Paragraph> <Paragraph position="5"> question and a candidate answer sentence.</Paragraph> </Section> <Section position="5" start_page="185" end_page="186" type="metho"> <SectionTitle> 4 Features Used in the Deep Linguistic </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="185" end_page="185" type="sub_section"> <SectionTitle> Analysis </SectionTitle> <Paragraph position="0"> A feature in the ME approach typically has binary values: fj(x,y) = 1 if the feature j occurs; otherwise fj(x,y) = 0. This section describes two types of deep linguistic features to be integrated in the ME framework in two subsections.</Paragraph> </Section> <Section position="2" start_page="185" end_page="186" type="sub_section"> <SectionTitle> 4.1 POS Tags of Matching Words and Dependencies </SectionTitle> <Paragraph position="0"> Consider the following question Q and sentence C, Q: Who wrote the Pledge of Allegiance C: The pledge was written by Frances Bellamy.</Paragraph> <Paragraph position="1"> The set of words and POS tags2 are: Q: {write/VB, pledge/NN, allegiance/NNP} C: {write/VB, pledge/NN, by/IN, Frances/NNP, Bellamy/NNP}.</Paragraph> <Paragraph position="2"> Two matching words between Q and C (i.e. write and pledge ) activate two POS tag features: fV B(x,y)=1 and fNN(x,y)=1.</Paragraph> <Paragraph position="3"> We extracted dependencies from lexicalized syntactic parse trees, which can be obtained according to the head-rules in (Collins, 1999) (e.g. see Figure 1). In a lexicalized syntactic parse tree, a dependency can be de ned as: < hc - hp > or < hr - TOP >, where hc is the headword of the child node, hp is the headword of the parent node (hc negationslash= hp), hr is the headword of the root node. Sample 2We used the MXPOST toolkit downloaded from ftp://ftp.cis.upenn.edu/pub/adwait/jmx/ to generate POS tags. Stop words including who, what, when, where, why, be, the, a, an, and of are removed in all questions and story sentences. All plural noun POS tags are replaced by their single forms (e.g. NNS-NN); all verb POS tags are replaced by their base forms (e.g. VBN-VB) due to stemming. PAR for a question and a candidate answer sentence. 
</Section> <Section position="5" start_page="185" end_page="186" type="metho"> <SectionTitle> 4 Features Used in the Deep Linguistic Analysis </SectionTitle> <Paragraph position="0"> A feature in the ME approach typically has binary values: f_j(x,y) = 1 if feature f_j occurs; otherwise f_j(x,y) = 0. This section describes two types of deep linguistic features to be integrated in the ME framework, one per subsection.</Paragraph> <Section position="2" start_page="185" end_page="186" type="sub_section"> <SectionTitle> 4.1 POS Tags of Matching Words and Dependencies </SectionTitle> <Paragraph position="0"> Consider the following question Q and sentence C:
Q: Who wrote the Pledge of Allegiance
C: The pledge was written by Frances Bellamy.</Paragraph> <Paragraph position="1"> The sets of words and POS tags are:
Q: {write/VB, pledge/NN, allegiance/NNP}
C: {write/VB, pledge/NN, by/IN, Frances/NNP, Bellamy/NNP}.
(We used the MXPOST toolkit, downloaded from ftp://ftp.cis.upenn.edu/pub/adwait/jmx/, to generate POS tags. Stop words, including who, what, when, where, why, be, the, a, an, and of, are removed from all questions and story sentences. All plural noun POS tags are replaced by their singular forms, e.g. NNS becomes NN, and all verb POS tags are replaced by their base forms, e.g. VBN becomes VB, due to stemming.)</Paragraph> <Paragraph position="2"> Two matching words between Q and C (i.e. write and pledge) activate two POS tag features: f_VB(x,y) = 1 and f_NN(x,y) = 1.</Paragraph> <Paragraph position="3"> We extracted dependencies from lexicalized syntactic parse trees, which can be obtained according to the head-rules in (Collins, 1999) (e.g. see Figure 1). In a lexicalized syntactic parse tree, a dependency can be defined as <hc - hp> or <hr - TOP>, where hc is the headword of a child node, hp is the headword of its parent node (hc ≠ hp), and hr is the headword of the root node. Sample dependencies in C (see Figure 1) are <write-TOP> and <pledge-write>.</Paragraph> <Paragraph position="4"> The dependency features are represented by the combined POS tags of the modifiers and headwords of (identical) matching dependencies. The matching dependency between Q and C, <pledge-write>, activates a dependency feature: f_NN-VB(x,y) = 1. In total, we obtained 169 and 180 word dependency features from the Remedia and ChungHwa training sets, respectively.</Paragraph> </Section> <Section position="3" start_page="186" end_page="186" type="sub_section"> <SectionTitle> 4.2 Matching Grammatical Relationships (GR) </SectionTitle> <Paragraph position="0"> We extracted grammatical relationships from the dependency trees produced by MINIPAR (Lin, 1998), which covers 79% of the dependency relationships in the SUSANNE corpus with 89% precision. In a MINIPAR dependency relationship (word1 CATE1:RELATION:CATE2 word2), CATE1 and CATE2 represent grammatical categories such as nouns, verbs, and adjectives; RELATION represents a grammatical relationship such as subject, object, or modifier. Figure 2 shows the dependency trees of Q and C produced by MINIPAR. Sample grammatical relationships in C are (pledge N:det:Det the) and (write V:by-subj:Prep by).</Paragraph> <Paragraph position="1"> GR features are extracted from identical matching relationships between questions and candidate sentences. The only identical matching relationship between Q and C, (write V:obj:N pledge), activates a grammatical relationship feature: f_obj(x,y) = 1. In total, we extracted 44 and 45 GR features from the Remedia and ChungHwa training sets, respectively.</Paragraph> </Section>
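<Paragraph> To make the three feature families concrete, the following minimal Python sketch activates POS-tag, dependency, and GR features from pre-computed word/POS pairs, head-modifier dependencies, and MINIPAR-style relations; the data structures and function name are hypothetical stand-ins for the toolkit outputs described above, not the authors' implementation:

from typing import Set, Tuple

WordPos = Tuple[str, str]             # e.g. ("write", "VB")
Dependency = Tuple[WordPos, WordPos]  # (modifier, headword), e.g. (("pledge", "NN"), ("write", "VB"))
Relation = Tuple[str, str, str]       # (word1, relation, word2), e.g. ("write", "obj", "pledge")

def extract_features(q_words: Set[WordPos], c_words: Set[WordPos],
                     q_deps: Set[Dependency], c_deps: Set[Dependency],
                     q_rels: Set[Relation], c_rels: Set[Relation]) -> Set[str]:
    """Return the names of the binary features f_j that fire for a (Q, C) pair."""
    features: Set[str] = set()
    # Section 4.1: POS tags of matching words, e.g. write/VB and pledge/NN
    # activate f_VB and f_NN.
    for _word, pos in q_words & c_words:
        features.add("pos_" + pos)
    # Section 4.1: combined POS tags of the modifier and headword of matching
    # dependencies, e.g. <pledge-write> activates f_NN-VB.
    for mod, head in q_deps & c_deps:
        features.add("dep_" + mod[1] + "-" + head[1])
    # Section 4.2: relation label of identical matching grammatical
    # relationships, e.g. (write obj pledge) activates f_obj.
    for _w1, rel, _w2 in q_rels & c_rels:
        features.add("gr_" + rel)
    return features

# Worked example with the paper's Q/C pair:
q_words = {("write", "VB"), ("pledge", "NN"), ("allegiance", "NNP")}
c_words = {("write", "VB"), ("pledge", "NN"), ("by", "IN"), ("Frances", "NNP"), ("Bellamy", "NNP")}
q_deps = {(("pledge", "NN"), ("write", "VB"))}
c_deps = {(("pledge", "NN"), ("write", "VB"))}
q_rels = {("write", "obj", "pledge")}
c_rels = {("write", "obj", "pledge"), ("pledge", "det", "the")}
print(extract_features(q_words, c_words, q_deps, c_deps, q_rels, c_rels))
# -> {'pos_VB', 'pos_NN', 'dep_NN-VB', 'gr_obj'} (in some order)
</Paragraph> </Section> </Paper>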