End-to-end biological programming — updates

We have new projects developing machine learning and artificial intelligence (ML/AI) tools for end-to-end biological programming. These projects are supported by the US Department of Energy (DOE) Biological and Environmental Research (BER) program and by a lab-wide initiative on infusing AI into the effective design of biological experiments.

Current lab automation techniques are geared towards optimizing single processes repeatedly and are not well suited for implementing AI-driven experiments. To fully utilize automation in the context of novel, high-throughput biological applications, we need (1) robust programming constructs that can reliably and repeatedly execute experimental protocols, and (2) scalable ML/AI techniques that can automatically capture data, process and analyze it, and guide experimental design based on partially available (or prior) information. We posit that designing a ‘digital twin’ for executable synthetic biology will accelerate and facilitate the abstraction, exchange, and reuse of experimental protocols; enable integration of process descriptions and their executable forms (i.e., experiments); simplify representations of protocols; and offer practical ways to test, verify, and validate such data generation processes.

For synthetic biology applications, we are interested in mining gene parts (proteins, circuits, etc.) and in automating the construction of such parts with the aid of AI. This project continues the efforts led by Dr. Carla Mann on ways to build CRISPR-Cas9 probes targeting specific E. coli functions.

For protein design applications, we are interested in fast methods to build materials out of disordered peptides and proteins. We now have collaborations with both industry and academic partners where we are looking to build these applications.

If these ideas sound interesting, please do get in touch with me through the official Argonne website.

408954: Postdoctoral Appointee – Data Science and Learning for End-to-End Biological Programming: https://bit.ly/3jfVKJw
408953: Postdoctoral Appointee – Biologist for Rapid Design and Engineering of Microbiological Systems: https://bit.ly/35hX8WT

Lab progress report (Dec 2020)

What a year it has been! From starting the year with research on the novel coronavirus (SARS-CoV-2) to ending it with the ACM Gordon Bell Special Prize for HPC-based COVID-19 research, this has been an exhilarating ride!

Our team, in collaboration with Drs. Rommie Amaro (UCSD), Lillian Chong (University of Pittsburgh), and John Stone (UIUC), used AI-driven simulations to probe how the SARS-CoV-2 Spike protein interacts with the human ACE2 receptor. Our video presentation is embedded below and you can also watch it on YouTube. This was a massive effort by Anda Trifan, Alex Brace, Austin Clyde, and Heng Ma in the group, working with 25 other researchers from various institutions. We are also grateful for our collaboration with NVIDIA (Thorsten Kurth, Tom Gibbs, Abe Stern), which enabled the development of the adversarial autoencoder models for characterizing protein conformational changes.

Our review on the use of AI methods for studying disordered proteins was also accepted to Current Opinion in Structural Biology. In it, we outline how AI methods are applied to study the structure-function relationships of intrinsically disordered proteins (IDPs). Kudos to Heng Ma, Akash Parvatikar and Chakra Chennubhotla for making it happen!

COVID-19 work at the lab

If you were wondering what happened to us in the meantime, don’t despair… the lab has been busy with COVID-19 research and has been pushing hard to find novel molecules that can help address this pandemic.

On the biophysics side, we are now collaborating with a number of groups, including Shantenu Jha (Rutgers/Brookhaven), Carlos Simmerling (Stony Brook University), Rommie Amaro (UCSD), Peter Coveney (University College London), Shozeb Haider (University College London), and Jeremy Smith (University of Tennessee/Oak Ridge National Laboratory). See the stories published by Intel (Jan Rowell) and by the Texas Advanced Computing Center (TACC).

In addition, we have started to collaborate with chemistry groups at the University of Chicago and the University of Michigan, looking at ways to leverage some of our AI capabilities in designing small molecules that can target various viral proteins.

A special shout-out to Drs. Heng Ma and Carla Mann, who have worked really hard on this problem, including sleepless nights setting up simulations on the supercomputers at the lab as well as across the entire supercomputing ecosystem.

I also have to thank one of my graduate interns, Anda Trifan (student of Emad Tajkhorshid, UIUC), for taking the time, in spite of a baby and all, to plow through the hardship of scaling our deep learning code on emerging supercomputers.

Austin Clyde, who is Rick Stevens’ PhD student at the University of Chicago, has been instrumental in developing novel machine learning tools to study how small-molecule interactions may alter the behavior of viral proteins.

Also, special thanks are due to a number of team members (Matteo Turilli, Hyungro Lee, Li Tan, Andre Merzky, M.A. Titov) for their help in standing up and running systems on all of the supercomputing resources.

Two new articles showing how deep learning driven adaptive molecular simulations can fold small proteins

Recently, we showed that a variational autoencoder with convolutional filters, applied to contact maps derived from protein folding trajectories, can be used to cluster conformations based on their degree of folding. This method, which we refer to as the CVAE, enables us to compactly place simulations in a low-dimensional manifold in an unsupervised manner. We also demonstrated that the CVAE can cluster equilibrium simulations to identify a small number of biophysically relevant potential reaction coordinates (collective variables). This naturally led us to ask: if the reaction coordinates determined by the CVAE are meaningful, can they also help us fold proteins faster?
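To make the input representation concrete, here is a minimal sketch of how a contact map can be computed from one frame of a trajectory. It is an illustration only and not the code from our papers: the 8 angstrom cutoff, the use of one coordinate per residue, and the helper names contact_map and contact_fraction are all assumptions made for this example.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Binary contact map for one frame.

    coords: list of (x, y, z) tuples, one per residue (assumed: C-alpha atoms).
    cutoff: contact distance in angstroms (8.0 is an assumed, commonly used value).
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]

def contact_fraction(cmap, min_separation=3):
    """Fraction of residue pairs, at least min_separation apart in sequence,
    that are in contact. A crude proxy for how compact the frame is."""
    n = len(cmap)
    pairs = [(i, j) for i in range(n) for j in range(i + min_separation, n)]
    if not pairs:
        return 0.0
    return sum(cmap[i][j] for i, j in pairs) / len(pairs)

# Toy frames (hypothetical coordinates, not real protein data).
# An extended chain: residues 3.8 angstroms apart along a straight line.
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]
# A compact arrangement: six corners of a 3.8 angstrom cube.
compact = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0), (0.0, 3.8, 3.8), (0.0, 0.0, 3.8),
]

extended_fraction = contact_fraction(contact_map(extended))
compact_fraction = contact_fraction(contact_map(compact))
```

Each frame of a trajectory yields one such square matrix, which is why image-style convolutional filters are a natural fit for this kind of data.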

We answer this question, in part, through a workflow that we set up using the Ensemble Toolkit (EnTK), part of the RADICAL-Cybertools platform from Shantenu Jha’s group at Rutgers University. We rely on a workflow system mainly because such ensemble simulations need specialized support to orchestrate O(100s)-O(1000s) of jobs on a supercomputer. Furthermore, since AI/ML approaches control which simulations run next, selecting and launching the next set of simulations requires precise timing and synchronization, which calls for effective middleware such as EnTK/RADICAL-Cybertools.
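The control flow that such a workflow has to orchestrate can be sketched in a few lines. The example below is a toy under stated assumptions, not our EnTK pipeline and not the EnTK API: a thread pool stands in for the middleware, a 1-D random walk stands in for an MD run, and picking frames closest to a target value stands in for the ML-driven selection step. All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_simulation(start, n_steps, seed):
    """Toy stand-in for one MD run: a seeded 1-D random walk."""
    rng = random.Random(seed)
    x, trajectory = start, []
    for _ in range(n_steps):
        x += rng.uniform(-1.0, 1.0)
        trajectory.append(x)
    return trajectory

def select_restarts(frames, n_select, target):
    """Toy stand-in for the ML step: keep the frames closest to the target."""
    return sorted(frames, key=lambda x: abs(x - target))[:n_select]

def adaptive_sampling(n_iterations=5, n_walkers=8, n_steps=50, target=10.0):
    """Alternate between an ensemble of simulations and a selection step."""
    starts = [0.0] * n_walkers
    best_per_iteration = []
    for iteration in range(n_iterations):
        seeds = [iteration * n_walkers + w for w in range(n_walkers)]
        # Stage 1: run the whole ensemble concurrently. Collecting the results
        # acts as the synchronization barrier before the selection stage.
        with ThreadPoolExecutor(max_workers=n_walkers) as pool:
            trajectories = list(
                pool.map(run_simulation, starts, [n_steps] * n_walkers, seeds)
            )
        frames = [x for trajectory in trajectories for x in trajectory]
        # Stage 2: choose where the next round of simulations starts. The old
        # starting points stay in the pool so the best distance never gets worse.
        starts = select_restarts(frames + starts, n_walkers, target)
        best_per_iteration.append(abs(starts[0] - target))
    return best_per_iteration

progress = adaptive_sampling()
```

On a real machine, each stage consists of hundreds to thousands of jobs with very different resource needs (simulation versus model training), which is where dedicated middleware earns its keep over a simple loop like this one.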

A take-home message from our evolving story is that deep learning approaches such as the CVAE can indeed be used to steer MD simulations towards sampling folded states of a protein. Here is a movie that shows the states that we sample from our trajectories guided by the CVAE.

Take a look at the papers: (1) Deep Generative Model Driven Protein Folding Simulation — published as part of Parco’19 (Prague), (2) DeepDriveMD: Deep-Learning Driven Adaptive Molecular Simulations for Protein Folding — will be published in TCHPC as part of the Third Workshop for Deep Learning on Supercomputers (SC’19).

We also developed a visual analytics system for interacting with simulation data based on our deep learning techniques. The paper was accepted to IEEE Big Data 2019 (Government and Industry track).

New article in Frontiers Journal on using machine learning to tune force field parameters for disordered protein systems

Congrats to Dr. Omar Demerdash on his first-author paper, to be published in the journal Frontiers in Molecular Biosciences (Biomolecular modeling and simulations).

Our paper applies the ForceBalance approach (source code link: https://github.com/leeping/forcebalance), developed by Lee-Ping Wang and colleagues, to refine force field parameters for intrinsically disordered protein (IDP) ensembles.

The paper is a collaboration with the Center for Molecular Biophysics at Oak Ridge National Laboratory (Drs. Julie Mitchell, Jeremy Smith and Loukas Petridis).

This research was inspired by the fact that traditional experimental techniques such as NMR or X-ray crystallography struggle to resolve the full structural diversity of IDPs.

Biophysical Journal cover!

Congrats to Dr. Blake A. Wilson @ Vanderbilt for making the cover of the Biophysical Journal! Working with Dr. Carlos Lopez, we used microsecond-timescale simulations of cardiolipin membranes (at various concentrations) to understand how cardiolipin affects membrane structure and dynamics. You can check out the cool image here: https://www.cell.com/biophysj/issue?pii=S0006-3495(18)X0017-4

Our article was also highlighted on the Biophysical Society (BPS) Blog. Here is a link to the cover and its description on the BPS Blog.

Code from the paper is available as part of the PyBILT toolkit: http://pybilt.readthedocs.io/en/latest.

Paper in Journal of Parallel and Distributed Computing (JPDC) is accepted!

Congrats to Michael T. Young (Todd) on getting his first-author paper accepted to the Journal of Parallel & Distributed Computing (JPDC). The paper focuses on HyperSpace, our distributed Bayesian hyperparameter optimization (HPO) approach, and its use for optimizing reinforcement learning algorithms. This is joint work with M. T. Young, R. Kannan and J. D. Hinkle (all at Oak Ridge National Lab). The final copy of the paper will be available shortly!
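For readers new to distributed HPO, here is a rough sketch of one way to spread a search over many workers: split each hyperparameter range into two overlapping halves, form every combination of halves, and search each resulting subspace independently. This is a simplified illustration and not the HyperSpace API. The 25% overlap, the toy objective, and all function names are assumptions, and plain random search stands in for the Bayesian optimizer to keep the example dependency-free.

```python
import itertools
import random

def overlapping_halves(low, high, overlap=0.25):
    """Split [low, high] into a lower and an upper half that share a margin."""
    mid = (low + high) / 2.0
    margin = overlap * (high - low) / 2.0
    return [(low, mid + margin), (mid - margin, high)]

def make_subspaces(bounds, overlap=0.25):
    """Every combination of per-dimension halves: 2**D subspaces for D dimensions."""
    halves = [overlapping_halves(low, high, overlap) for low, high in bounds]
    return list(itertools.product(*halves))

def random_search(objective, subspace, n_trials, seed):
    """Stand-in optimizer (NOT Bayesian): uniform random search in one subspace."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_trials):
        x = [rng.uniform(low, high) for low, high in subspace]
        y = objective(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

def toy_objective(x):
    """Hypothetical loss surface with its minimum at (0.3, -0.5)."""
    return (x[0] - 0.3) ** 2 + (x[1] + 0.5) ** 2

bounds = [(-1.0, 1.0), (-1.0, 1.0)]
subspaces = make_subspaces(bounds)

# In a distributed run each subspace would go to its own worker; here the
# searches simply run one after another.
results = [
    random_search(toy_objective, subspace, n_trials=200, seed=i)
    for i, subspace in enumerate(subspaces)
]
best_x, best_y = min(results, key=lambda result: result[1])
```

The overlap means an optimum sitting near the middle of a range is not cut in half by a partition boundary, at the cost of some redundant sampling.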

Check out the code at: http://www.github.com/yngtodd/hyperspace for HyperSpace.

You can check out the repo: http://www.github.com/yngtodd/hyperpoints for the experiments run in our paper.

Hello from Argonne National Laboratory!

After nearly 8 years at Oak Ridge National Laboratory, TN, the Arvind Ramanathan Lab has moved to Argonne National Laboratory, IL. We are looking forward to expanding our research in computational biology through close-knit interactions with both biologists and computer scientists. Using AI, we plan to advance our knowledge of how intrinsically disordered proteins (IDPs) function.

We have various openings in the lab. If you are interested, please leave me a message through the blog's contact page.

Thank you.