Recently, we showed that a variational autoencoder with convolutional filters, trained on contact maps derived from protein folding trajectories, can be used to cluster conformations based on their degree of folding. This method, which we refer to as the CVAE, lets us compactly embed simulations in a low-dimensional manifold in an unsupervised manner. We also demonstrated that the CVAE can cluster equilibrium simulations to identify a small number of biophysically relevant potential reaction coordinates (collective variables). This naturally led us to ask: if the reaction coordinates determined by the CVAE are meaningful, can they also help us fold proteins faster?
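To make the CVAE input concrete, here is a minimal sketch of how a binary contact map might be computed from one frame of a trajectory. It is an illustration, not the code used in the papers: the function names, the use of one coordinate per residue (for example the C-alpha atom), and the 8 Å cutoff are all assumptions.

```python
import numpy as np


def contact_map(coords, cutoff=8.0):
    """Return a binary residue-residue contact map for one frame.

    coords: (n_residues, 3) array of positions in angstroms,
            one point per residue (assumed here: C-alpha atoms).
    cutoff: distance threshold in angstroms (8.0 is an assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise displacement vectors via broadcasting: shape (n, n, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    # Euclidean distance between every pair of residues: shape (n, n).
    dist = np.linalg.norm(diff, axis=-1)
    # 1 where two residues are closer than the cutoff, else 0.
    return (dist < cutoff).astype(np.uint8)


def trajectory_to_maps(frames, cutoff=8.0):
    """Stack one contact map per frame: shape (n_frames, n, n)."""
    return np.stack([contact_map(frame, cutoff) for frame in frames])
```

Each map is a symmetric image-like matrix, which is why convolutional filters are a natural fit for the encoder.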
We answer this question, in part, through a workflow built with the Ensemble Toolkit (EnTK), part of the RADICAL-Cybertools platform from Shantenu Jha’s group at Rutgers University. We rely on a workflow system because ensemble simulations of this kind need specialized support to orchestrate hundreds to thousands of jobs on a supercomputer. Furthermore, because AI/ML approaches control which simulations run next, selecting and launching each new batch requires precise timing and synchronization, which calls for effective middleware such as EnTK and RADICAL-Cybertools.
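The selection step that the workflow has to schedule can be sketched as follows. This is a hypothetical heuristic, not the selection rule from the papers and not an EnTK API: given latent-space embeddings of sampled frames, it picks the frames that sit in the sparsest regions (largest distance to their k-th nearest neighbor) as starting points for the next batch of simulations. The function name and the parameters `n_select` and `k` are assumptions made for illustration.

```python
import numpy as np


def select_restart_frames(latent, n_select, k=5):
    """Pick frames in sparsely sampled regions of the latent space.

    latent:   (n_frames, latent_dim) array of embeddings.
    n_select: number of frames to return as restart candidates.
    k:        neighbor rank used as a simple sparsity score (assumed value).
    Returns the indices of the selected frames, sparsest first.
    """
    latent = np.asarray(latent, dtype=float)
    n_frames = latent.shape[0]
    if n_frames <= k:
        raise ValueError("need more frames than neighbors (n_frames > k)")
    # Pairwise Euclidean distances between embeddings: shape (n, n).
    diff = latent[:, None, :] - latent[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # After sorting each row, column 0 is the zero self-distance,
    # so column k holds the distance to the k-th nearest neighbor.
    kth_dist = np.sort(dist, axis=1)[:, k]
    # A larger k-th neighbor distance means a less explored region.
    order = np.argsort(-kth_dist, kind="stable")
    return order[:n_select]
```

In a loop of this kind, each iteration would run the ensemble, embed the new frames, call a selector like this one, and launch the next batch from the chosen frames. Coordinating those stages across many concurrent jobs is the part that the middleware handles.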
The take-home message from our evolving story is that deep learning approaches such as the CVAE can indeed be used to steer MD simulations towards sampling folded states of a protein. Here is a movie that shows the states we sample from trajectories guided by the CVAE.
Take a look at the papers: (1) Deep Generative Model Driven Protein Folding Simulation, published as part of Parco’19 (Prague), and (2) DeepDriveMD: Deep-Learning Driven Adaptive Molecular Simulations for Protein Folding, which will be published in TCHPC as part of the Third Workshop for Deep Learning on Supercomputers (SC’19).
We also developed a visual analytics system for interacting with simulation data, based on our deep learning techniques. The paper has been accepted to IEEE Big Data 2019 (Government and Industry track).