A deep learning system can predict the structure of a protein from its genetic sequence more accurately than any previous modelling system, according to a study by researchers at DeepMind and UCL.
Nearly every function our body performs relies on proteins. Predicting the intricate 3D structure of a protein is important because its structure largely determines its function and, once the structure is known, scientists can develop drugs that target this unique shape.
Protein structure can be determined experimentally, using techniques such as cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography, but these are extremely costly and time-consuming.
The new study found that a deep learning system called AlphaFold could model protein structure from scratch - i.e., based only on genetic sequence - better than any previous modelling system and with a similar accuracy to systems drawing on templates of previously solved proteins.
Professor David Jones (UCL Computer Science), Head of the UCL Bioinformatics Group and study co-author, said: "The 3D structure of a protein is probably the single most useful piece of information scientists can obtain to help understand what the protein does and how it works in cells.
"Experimental techniques to determine protein structures are time consuming and expensive, so there’s a huge demand for better computer algorithms to calculate the structures of proteins directly from the gene sequences which encode them, and DeepMind’s work on applying AI to this long-standing problem in molecular biology is a definite advance.
"One eventual goal will be to determine accurate structures for every human protein, which could ultimately lead to new discoveries in molecular medicine."
Dr Andrew Senior, research scientist at DeepMind and lead author of the study, said: "Protein structure prediction is an extremely hard problem and our work builds on decades of progress by experts in the field. Although there is more work we need to do before we achieve consistently accurate protein predictions, we are excited about this step forward."
The research team trained a deep neural network - a set of algorithms modelled loosely on the human brain and designed to recognise patterns in unstructured data - to predict two specific properties of protein structure: the distances between pairs of amino acids and the angles between the chemical bonds that connect those amino acids. Another neural network predicted the distribution of distances between amino acid residues in a protein.
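To make the idea of a "distribution of distances" concrete, the sketch below shows one way such a prediction could be represented: for every pair of residues, a set of probabilities over distance ranges (bins). It is an illustration only, not the study's code. The bin range, the number of bins and the random scores standing in for a network's output are all assumptions.

```python
import numpy as np

def distance_distribution(logits):
    """Turn raw scores over distance bins into probabilities (softmax)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical setup: 4 residues, 8 distance bins covering 2 to 22 angstroms.
n_residues, n_bins = 4, 8
bin_edges = np.linspace(2.0, 22.0, n_bins + 1)
bin_centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])

# Random scores stand in for the output of a trained network (assumption).
rng = np.random.default_rng(0)
logits = rng.normal(size=(n_residues, n_residues, n_bins))
logits = 0.5 * (logits + logits.transpose(1, 0, 2))  # distance i-j equals distance j-i

probs = distance_distribution(logits)                   # shape: (4, 4, 8)
expected_distance = (probs * bin_centres).sum(axis=-1)  # mean distance per residue pair
most_likely_bin = probs.argmax(axis=-1)                 # most probable bin per residue pair
```

Keeping a full distribution, not just a single number, lets later steps take account of how confident the prediction is for each pair of residues.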
A third set of algorithms was then trained to estimate how close a proposed protein structure was to the correct answer. Researchers identified proteins that matched their predictions and then used algorithms to generate new fragments within a protein structure, bringing it closer to what the true structure may be.
Finally, a mathematical technique called gradient descent was used to improve the accuracy of the predicted structures.
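Gradient descent repeatedly nudges a set of values in whichever direction reduces an error score. The toy example below applies it to 3D coordinates so that the distances between points approach a set of target distances. It is a minimal sketch under stated assumptions, not the study's method: the squared-error score, the step size and the six randomly placed points are all invented for illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """Return the distance matrix and the coordinate differences x_i - x_j."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)  # small term avoids division by zero
    return dist, diff

def loss_and_grad(coords, target):
    """Squared error between current and target distances, with its gradient."""
    dist, diff = pairwise_distances(coords)
    err = dist - target
    np.fill_diagonal(err, 0.0)  # a point's distance to itself carries no information
    loss = 0.5 * (err ** 2).sum()
    # Each pair appears twice in the sum (i-j and j-i), hence the factor of 2.
    grad = 2.0 * ((err / dist)[:, :, None] * diff).sum(axis=1)
    return loss, grad

# Hypothetical target: distances taken from six randomly placed points.
rng = np.random.default_rng(0)
true_coords = rng.normal(size=(6, 3))
target, _ = pairwise_distances(true_coords)

coords = rng.normal(size=(6, 3))  # random starting guess
step = 0.01                       # step size chosen for illustration
initial_loss, _ = loss_and_grad(coords, target)

for _ in range(2000):
    loss, grad = loss_and_grad(coords, target)
    coords -= step * grad         # move against the gradient to reduce the error

final_loss, _ = loss_and_grad(coords, target)
```

Because only distances are compared, a mirror image of the original arrangement scores just as well, and gradient descent can also settle in a poorer local solution depending on where it starts.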
To date, only about half of the proteins in the human body have been mapped. It is hoped that being able to map the structure of a protein rapidly and economically may contribute to the understanding of disease and to the discovery of new treatments.