NORA’s largest event to date – about a 50-year-old grand challenge in biology
If you haven’t heard it before, make sure to memorize the term Protein Folding! It is one of the great mysteries of life, and Artificial Intelligence just solved the problem – in what has been described as one of Artificial Intelligence’s greatest achievements to date. The tools used to solve this mystery are accessible to researchers, which will contribute to new and important research and discoveries in this field. And NORA was the first to hold a large-scale open event about the discovery, with over 600 participants from around the world!
Why are Proteins important?
A protein’s shape is closely linked with its function, and the ability to predict this structure unlocks a greater understanding of what it does and how it works. Many of the world’s greatest challenges, like developing treatments for diseases or finding enzymes that break down industrial waste, are fundamentally tied to proteins and the role they play.
This has been a focus of intensive scientific research for many years, using a variety of experimental techniques to examine and determine protein structures, such as nuclear magnetic resonance and X-ray crystallography. These techniques, as well as newer methods like cryo-electron microscopy, depend on extensive trial and error, which can take years of painstaking and laborious work per structure, and require the use of multi-million dollar specialised equipment.
In his acceptance speech for the 1972 Nobel Prize in Chemistry, Christian Anfinsen famously postulated that, in theory, a protein’s amino acid sequence should fully determine its structure. This hypothesis sparked a five-decade quest to computationally predict a protein’s 3D structure based solely on its 1D amino acid sequence, as a complementary alternative to these expensive and time-consuming experimental methods. A major challenge, however, is that the number of ways a protein could theoretically fold before settling into its final 3D structure is astronomical.
In 1969 Cyrus Levinthal noted that it would take longer than the age of the known universe to enumerate all possible configurations of a typical protein by brute force calculation – Levinthal estimated 10^300 possible conformations for a typical protein. Yet in nature, proteins fold spontaneously, some within milliseconds – a dichotomy sometimes referred to as Levinthal’s paradox.
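To get a feel for the scale of this estimate, here is a back-of-the-envelope calculation in Python. Only the 10^300 figure comes from the text above. The sampling rate of 10^13 conformations per second and the age of the universe of roughly 13.8 billion years are assumptions chosen for illustration. The arithmetic is done on exponents (base-10 logarithms) to keep the numbers readable.

```python
import math

CONFORMATIONS_LOG10 = 300        # from the text: about 10^300 conformations
SAMPLES_PER_SECOND_LOG10 = 13    # assumption: 10^13 conformations tested per second
AGE_OF_UNIVERSE_YEARS = 13.8e9   # assumption: commonly cited approximate value
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def search_time_log10_years(conformations_log10, rate_log10):
    """Return log10 of the years needed to visit every conformation once."""
    seconds_log10 = conformations_log10 - rate_log10
    return seconds_log10 - math.log10(SECONDS_PER_YEAR)


log10_years = search_time_log10_years(CONFORMATIONS_LOG10, SAMPLES_PER_SECOND_LOG10)
# Ratio to the age of the universe, again expressed as a power of ten.
log10_ratio = log10_years - math.log10(AGE_OF_UNIVERSE_YEARS)

print(f"Brute-force search time: roughly 10^{log10_years:.1f} years")
print(f"Ratio to the age of the universe: roughly 10^{log10_ratio:.1f}")
```

Even if the assumed sampling rate were wrong by many orders of magnitude, the exponent would barely move, which is the heart of the paradox: proteins cannot be finding their shape by exhaustive search.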
Results from the CASP14 assessment
In 1994, Professor John Moult and Professor Krzysztof Fidelis founded CASP as a biennial blind assessment to catalyse research, monitor progress, and establish the state of the art in protein structure prediction. It is both the gold standard for assessing predictive techniques and a unique global community built on shared endeavour. Crucially, CASP chooses protein structures that have only very recently been experimentally determined (some were still awaiting determination at the time of the assessment) to be targets for teams to test their structure prediction methods against; they are not published in advance. Participants must blindly predict the structure of the proteins, and these predictions are subsequently compared to the ground truth experimental data when they become available. We’re indebted to CASP’s organisers and the whole community, not least the experimentalists whose structures enable this kind of rigorous assessment.
The main metric used by CASP to measure the accuracy of predictions is the Global Distance Test (GDT), which ranges from 0 to 100. In simple terms, GDT can be approximately thought of as the percentage of amino acid residues (beads in the protein chain) within a threshold distance from the correct position. According to Professor Moult, a score of around 90 GDT is informally considered to be competitive with results obtained from experimental methods.
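The idea behind GDT can be sketched in a few lines of Python. This is a simplified illustration, not CASP's actual scoring code: it assumes the predicted and experimental structures are already superimposed and compares one point per residue, whereas the real procedure searches over many superpositions. The function name gdt_ts and the toy coordinates are made up for this example. The four cutoffs of 1, 2, 4 and 8 Angstroms are the ones conventionally averaged in the GDT_TS variant of the score.

```python
import math

GDT_TS_THRESHOLDS = (1.0, 2.0, 4.0, 8.0)  # Angstrom cutoffs averaged in GDT_TS


def gdt_ts(predicted, reference, thresholds=GDT_TS_THRESHOLDS):
    """Simplified GDT_TS for two already-superimposed lists of (x, y, z) points."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Distance between each predicted residue and its experimental position.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, then the mean over all cutoffs.
    fractions = [sum(d <= t for d in distances) / len(distances) for t in thresholds]
    return 100.0 * sum(fractions) / len(thresholds)


# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 10.0 Angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
offsets = [0.5, 1.5, 3.0, 10.0]
predicted = [(x, off, z) for (x, _, z), off in zip(reference, offsets)]

print(gdt_ts(predicted, reference))  # 56.25
print(gdt_ts(reference, reference))  # 100.0
```

In the toy example, one, two, three and three of the four residues fall within the 1, 2, 4 and 8 Angstrom cutoffs, so the score is the mean of 25, 50, 75 and 75.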
In the results from the 14th CASP assessment, released in December 2020, DeepMind’s latest AlphaFold system achieves a median score of 92.4 GDT overall across all targets. This means that AlphaFold’s predictions have an average error (RMSD) of approximately 1.6 Angstroms, which is comparable to the width of an atom (or 0.1 of a nanometer). Even for the very hardest protein targets, those in the most challenging free-modelling category, AlphaFold achieves a median score of 87.0 GDT (data available here).
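For readers unfamiliar with RMSD (root-mean-square deviation), the following Python sketch shows how such an error figure is computed and converted between units. It assumes the two structures are already superimposed and uses made-up coordinates. The function name rmsd is chosen for this example and is not taken from any AlphaFold or CASP code.

```python
import math

ANGSTROM_TO_NM = 0.1  # 1 Angstrom = 0.1 nanometer


def rmsd(predicted, reference):
    """Root-mean-square deviation between two already-superimposed point lists."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(p, r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: four atoms displaced by 1, 1, 2 and 2 Angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 2.0, 0.0), (11.4, 2.0, 0.0)]

error_angstrom = rmsd(predicted, reference)
print(f"RMSD: {error_angstrom:.2f} Angstroms = {error_angstrom * ANGSTROM_TO_NM:.3f} nm")
```

Because the deviations are squared before averaging, a few badly placed atoms dominate RMSD, whereas a threshold-based score such as GDT counts every residue equally.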
These exciting results open up the potential for biologists to use computational structure prediction as a core tool in scientific research. DeepMind’s methods may prove especially helpful for important classes of proteins, such as membrane proteins, that are very difficult to crystallise and therefore challenging to experimentally determine.
The outstanding performance of DeepMind’s AlphaFold2 attracted great attention, and the scientific community asked whether the code would be made publicly available and whether such a result could be achieved in academia. After all, DeepMind’s resources are far beyond the reach of any single academic institution.
These questions were answered on July 15, 2021. On this date, DeepMind published a paper about AlphaFold v2.0 in Nature with accompanying open code on GitHub. Notably, RoseTTAFold, another deep learning-based algorithm for protein folding, was published the same day in Science. The latter was developed collaboratively by several universities, with lead researchers from the University of Washington. RoseTTAFold performs nearly as well as AlphaFold v2.0 and is much more efficient in terms of computing power needed.
NORA AlphaFold and RoseTTAFold workshop
As Norway’s leading and largest AI network, NORA took it upon itself to organize a two-day workshop about these recent significant scientific discoveries on August 30 and September 1, 2021. The workshop attracted world-leading experts in Artificial Intelligence, Protein Folding and related topics, who among other things presented their tools and methods. Among the keynote speakers were:
- Minkyung Baek from the University of Washington, first author of the RoseTTAFold article.
- Sameer Velankar, Team Leader for Protein Data Bank in Europe.
- Randy John Read, Cambridge University.
- Jim Brase, Co-lead of The ATOM consortium and Deputy Associate Director for Computing, Lawrence Livermore National Laboratory.
The workshop was hosted for participants who wanted to get hands-on experience with the tools and methods. Both algorithms were pre-installed on Norwegian supercomputers, and workshop participants were tutored on how to run AlphaFold v2.0 and RoseTTAFold.
The goal of the workshop was not only to boost Norwegian and international research on protein folding and function using advanced AI methods, but also to inspire the development of AI-powered biotech in Norway. Interest was so great that this became NORA’s largest event to date, with over 600 participants from across the world.
“NORA wants to show our gratitude to DeepMind and the RoseTTAFold team for opening and publishing their tools. These examples of open science will surely boost research and development in the field, ultimately to the benefit of patients and humanity,” says Klas H. Pettersen, CEO of NORA.
Please follow this link to watch a recording of the workshop.
Unlocking new possibilities
AlphaFold and RoseTTAFold are among the most significant advances in Artificial Intelligence to date but, as with all scientific research, there are still many questions to answer. Not every structure will be predicted perfectly. There’s still much to learn, including how multiple proteins form complexes, how they interact with DNA, RNA, or small molecules, and how one can determine the precise location of all amino acid side chains. There’s also much to learn about how best to use these scientific discoveries in the development of new medicines, ways to manage the environment, and more.
For everybody working on computational and machine learning methods in science, systems like AlphaFold and RoseTTAFold demonstrate the stunning potential for AI as a tool to aid fundamental discovery. Just as 50 years ago Anfinsen laid out a challenge far beyond science’s reach at the time, there are many aspects of our universe that remain unknown. The progress announced today gives everybody further confidence that AI will become one of humanity’s most useful tools in expanding the frontiers of scientific knowledge, and we’re looking forward to the many years of hard work and discovery ahead!
This article is based on a blog post by the AlphaFold team. Please read the entire post at Deepmind.com