<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1635"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics Protein folding and chart parsing</Title> <Section position="7" start_page="298" end_page="299" type="concl"> <SectionTitle> 6 Conclusions and future work </SectionTitle> <Paragraph position="0"> This paper has demonstrated that an adaptation of the CKY chart parsing algorithm can be successfully applied to protein folding in the 2D HP model, a commonly used simplified lattice model that captures essential physical and computational properties of the real folding process. Both syntactic parsing and protein folding algorithms search for the globally optimal structure for a given input string. Any given sentence has a large number of possible interpretations, just as any amino acid sequence has an astronomical number of possible spatial conformations. It is therefore not surprising that similar techniques can be applied to both tasks. In both cases, it seems possible to exploit locally available information with a greedy, hierarchical search strategy, which starts with local, independent searches for small substrings (to first determine which small phrases might make sense, or to find partially stable peptide structures) and then either: (a) 'grows' one substring into a larger substring, or (b) 'assembles' two substrings together into a larger substring. More interestingly, in the protein folding case, such recursive hierarchical search strategies, which imply tree-shaped folding routes, have been postulated independently for biological and biophysical reasons.
This may indicate a deeper, natural connection between these two processes.</Paragraph> <Paragraph position="1"> Given that hierarchical search strategies for protein folding have been proposed in the biological literature, our primary interest here has been the question of whether a greedy, hierarchical search as implemented in CKY is able to identify the native state of proteins in the HP model.</Paragraph> <Paragraph position="2"> The research presented here aims to verify these predictions with an explicit computational model.</Paragraph> <Paragraph position="3"> Therefore, we were less concerned with improving efficiency, and more concerned with the properties of this algorithm, which we consider a baseline method upon which more sophisticated techniques such as best-first parsing (Caraballo and Charniak, 1998) or A* search (Klein and Manning, 2003) may well be able to improve.</Paragraph> <Paragraph position="4"> We also plan to adapt this technique to other, more realistic, representations of proteins, and to longer sequences. For longer sequences, we will take advantage of the fact that CKY is easily parallelizable, since any operation which combines the entries of two cells, chart[i][k] and chart[k+1][j], is completely independent of other parts of the chart.</Paragraph> <Paragraph position="5"> If the routes by which proteins fold really are trees, a dynamic programming technique such as CKY is inherently suited to model this process, since it is the most efficient way to search all possible trees. This distinguishes it from more established techniques such as Monte Carlo, which can only follow one trajectory at a time, and require multiple runs to sample the underlying landscape to a sufficient degree. What CKY by itself does not give us is an accurate prediction of the rates that govern the folding process, including misfolding and unfolding events.
However, we believe that it is possible to obtain this information from the chart by extracting all tree cuts (which correspond to the states of the chain at different stages during the folding process) and calculating folding rates between them.</Paragraph> <Paragraph position="6"> Our work is only the beginning of a larger research program: eventually we would like to be able to model the folding process of real proteins. One aim of this paper was therefore to point out the fundamental similarities between statistical parsing and protein folding. We believe that this is a fertile area for future work where other natural language processing techniques may also prove to be useful.</Paragraph> </Section></Paper>