<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1635">
  <Title>Protein folding and chart parsing (Sydney, July 2006. ©2006 Association for Computational Linguistics)</Title>
  <Section position="6" start_page="297" end_page="298" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="297" end_page="297" type="sub_section">
      <SectionTitle>
5.1 Folding accuracy
</SectionTitle>
      <Paragraph position="0"> With our current pruning strategy, CKY finds the native state of 96.7% of all 24,900 unique-folding 20mers, confirming our hypothesis that the hierarchical greedy search implemented in CKY is a viable strategy. With exhaustive search, the &quot;conformational search number&quot; (CSN), i.e., the total number of conformations searched per sequence (summed over all cells), corresponds on average to 2.5% of all possible conformations of a sequence of length 20. We have also explored restricting initial contacts to pairs of H monomers whose distance along the backbone is at most a given threshold. With a threshold of 7, accuracy drops slightly to 95.2%, but the number of searched conformations corresponds to only 1% of the search space.</Paragraph>
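The threshold restriction amounts to a filter on which H-H pairs may seed the first contact. The sketch below is a minimal illustration, not the authors' implementation: it assumes the 2D square-lattice HP model, in which two monomers can only be lattice neighbors when their backbone separation is odd and at least 3, and the function name `initial_contacts` and its `max_sep` parameter are hypothetical.

```python
def initial_contacts(seq, max_sep=None):
    """Return H-H pairs (i, j) that could form a first contact.

    Hypothetical helper, assuming the 2D square-lattice HP model:
    the lattice is bipartite, so monomers i and j can only be lattice
    neighbors if j - i is odd, and j - i >= 3 excludes chain neighbors.
    If max_sep is given, pairs farther apart than max_sep along the
    backbone are skipped (the threshold restriction).
    """
    n = len(seq)
    pairs = []
    for i in range(n):
        for j in range(i + 3, n, 2):  # odd separations 3, 5, 7, ...
            if max_sep is not None and j - i > max_sep:
                break  # separations only grow from here on
            if seq[i] == "H" and seq[j] == "H":
                pairs.append((i, j))
    return pairs
```

For `"HPPHPPPH"`, both (0, 3) and (0, 7) qualify without a threshold, while `max_sep=5` keeps only (0, 3); in `"HPHPH"` no pair qualifies because every H-H separation is even.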
    </Section>
    <Section position="2" start_page="297" end_page="298" type="sub_section">
      <SectionTitle>
5.2 The chart landscape
</SectionTitle>
      <Paragraph position="0"> Since we employ a beam search strategy, all conformations that remain in a cell after pruning have the same energy level. Therefore, CKY identifies the substring or chart energy landscape of each sequence: a function $f(i,j)$ that maps each substring $(i,j)$ to its lowest accessible energy level. Since the energy of a conformation in the HP model is determined by the number of HH contacts, $f(i,j) \geq f(i',j')$ for all $i' \leq i$ and $j \leq j'$. That is, unlike standard energy functions, $f$ has no local minima. As shown in Figure 3 (where the size of each cell is adjusted to reflect the length of the corresponding substring), the &quot;slope&quot; of $f$ determines the amount of search required to fold a sequence. Sequences that require little search have a steep funnel, whereas sequences that require a lot of search have a flat, golf-course-like landscape.</Paragraph>
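For very short chains, the landscape f(i, j) can also be computed by brute force, which makes its monotonicity easy to check. This is a minimal sketch, not the paper's CKY parser. It assumes the 2D square-lattice HP model with energy -1 per non-bonded H-H contact and enumerates every conformation of every substring, so it is only feasible for a handful of monomers. All function names are hypothetical.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def walks(n):
    """Yield every self-avoiding walk of n monomers on the 2D square lattice.

    The first bond is fixed to point right, which removes rotated copies
    but leaves the set of reachable energies unchanged.
    """
    if n == 1:
        yield [(0, 0)]
        return
    path = [(0, 0), (1, 0)]
    occupied = set(path)

    def extend():
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend()
                path.pop()
                occupied.remove(nxt)

    yield from extend()


def hp_energy(seq, coords):
    """HP energy: -1 for each H-H pair that are lattice but not chain neighbors."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == seq[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    e -= 1
    return e


def chart_landscape(seq):
    """Brute-force f(i, j): lowest energy reachable by seq[i..j] (inclusive)."""
    n = len(seq)
    f = {}
    for i in range(n):
        for j in range(i, n):
            sub = seq[i:j + 1]
            f[(i, j)] = min(hp_energy(sub, c) for c in walks(len(sub)))
    return f


def is_monotone(f):
    """Check f(i, j) >= f(i2, j2) whenever span (i2, j2) contains span (i, j)."""
    return all(
        f[(i, j)] >= f[(i2, j2)]
        for (i, j) in f
        for (i2, j2) in f
        if i >= i2 and j2 >= j
    )
```

For example, `chart_landscape("HPPH")[(0, 3)]` is -1, because the two terminal H monomers can touch in a U-shaped conformation, while every shorter substring of `"HPPH"` has f = 0.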
      <Paragraph position="1"> HH contacts impose constraints on the number of conformations; therefore, a cell with lower energy will also have fewer entries than a cell with higher energy that spans a substring of the same length. This is analogous to standard energy landscapes (Dill and Chan, 1997), where a plateau corresponds to an entropic barrier, which requires a lot of search.</Paragraph>
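The entropy argument can be illustrated by counting how many conformations a chain has at each energy level. This brute-force sketch assumes a toy 2D square-lattice HP model (energy -1 per non-bonded H-H contact, first bond fixed to remove rotated copies). The function names are hypothetical, and exhaustive enumeration is only practical for very short chains.

```python
from collections import Counter

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def walks(n):
    """Yield all self-avoiding walks of n monomers, first bond pointing right."""
    if n == 1:
        yield [(0, 0)]
        return
    path = [(0, 0), (1, 0)]
    occupied = set(path)

    def extend():
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend()
                path.pop()
                occupied.remove(nxt)

    yield from extend()


def hp_energy(seq, coords):
    """-1 for each H-H pair that are lattice neighbors but not chain neighbors."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == seq[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    e -= 1
    return e


def energy_histogram(seq):
    """Number of conformations of seq at each energy level."""
    return Counter(hp_energy(seq, c) for c in walks(len(seq)))
```

For `"HPPH"`, only 2 of the 9 enumerated conformations reach energy -1, against 7 at energy 0, so the lower-energy set is the smaller one.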
      <Paragraph position="2"> 5.3 The &quot;constituent structure&quot; of proteins
We can extract the set of all folding routes (all trees that lead to the native state) from the chart and visualize the ensemble-averaged &quot;constituent structure&quot; of a chain by coloring each cell in the (adjusted) chart by the posterior probability that native routes go through it (here, black: p = 1 and white: p = 0). A probability of one corresponds to a structure that has to be formed by all routes, whereas a probability of zero represents a set of misfolded structures. Misfolding arises if the lowest-energy structures contain non-native (incorrect) contacts. Since these contacts have to be broken before the native state can be reached, which requires an uphill step in energy, they correspond to energetic barriers.</Paragraph>
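Computing this shading reduces to counting, for every cell, the fraction of native routes that use it. A minimal sketch, assuming each route has already been extracted from the chart as a list of the cells (i, j) in its tree and that all routes are weighted equally. The paper's ensemble weighting may differ, and the function name and data layout are hypothetical.

```python
from collections import Counter


def cell_route_probabilities(routes):
    """Fraction of native routes that pass through each chart cell.

    Hypothetical representation: each route is a collection of the cells
    (i, j) used by one tree deriving the native state, and every route gets
    equal weight. Cells used by no route are absent (probability 0).
    """
    if not routes:
        return {}
    counts = Counter()
    for route in routes:
        counts.update(set(route))  # count each cell at most once per route
    return {cell: c / len(routes) for cell, c in counts.items()}


# Toy example: four hypothetical routes over an 8-monomer chain.
routes = [
    [(0, 7), (0, 3), (4, 7)],
    [(0, 7), (0, 5), (0, 3), (4, 5), (6, 7)],
    [(0, 7), (0, 5), (0, 1), (2, 5), (6, 7)],
    [(0, 7), (0, 3), (4, 7)],
]
probs = cell_route_probabilities(routes)
print(probs[(0, 3)])  # 0.75
```

The root cell (0, 7) gets probability 1 because every route ends there, matching the black cells in the description above.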
      <Paragraph position="3"> Figure 4 shows the &quot;constituent structure&quot; of the conformation shown in Figure 1, together with one of its corresponding folding routes. Many sequences show very specific patterns of folding routes, as in the example given here, where the β-strands 7-10 and 11-16 and the α-helix 17-24 &quot;grow&quot; onto the hairpin 1-5.</Paragraph>
      <Paragraph position="4"> A number of proteins are known to form so-called &quot;foldons&quot; (Maity et al., 2005). These are substrings of the chain that can be found in their near-native conformation before the entire chain is completely folded. In our parsing perspective on protein folding, these foldons correspond to nodes that are shared by sufficiently many native routes that they can be detected experimentally.</Paragraph>
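Under this reading, flagging foldon candidates amounts to thresholding per-cell route probabilities. A minimal sketch, assuming such probabilities are already available as a dict keyed by cell (i, j). The text gives no fixed value for "sufficiently many" routes, so the cut-offs `min_prob` and `min_len` are illustrative assumptions, and the function name is hypothetical.

```python
def candidate_foldons(probs, chain_len, min_prob=0.8, min_len=4):
    """Return cells shared by at least min_prob of the native routes.

    Hypothetical criterion: a cell (i, j) is reported if it spans at least
    min_len monomers and its route probability reaches min_prob. The
    whole-chain cell is excluded because every native route ends there.
    Results are sorted longest first, then by start position.
    """
    root = (0, chain_len - 1)
    hits = [
        cell
        for cell, p in probs.items()
        if cell != root and p >= min_prob and cell[1] - cell[0] + 1 >= min_len
    ]
    return sorted(hits, key=lambda c: (c[0] - c[1], c[0]))


# Toy probabilities for an 8-monomer chain (hypothetical values).
probs = {(0, 7): 1.0, (0, 3): 0.75, (0, 5): 0.9, (4, 7): 0.5, (2, 5): 0.85, (6, 7): 0.95}
print(candidate_foldons(probs, 8))  # [(0, 5), (2, 5)]
```

Lowering `min_prob` to 0.7 would also admit (0, 3); the cell (6, 7) is never reported here because it is shorter than `min_len`.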
    </Section>
  </Section>
</Paper>