Understanding life from the inside out

Every living cell has thousands of different proteins inside that keep it alive and well.

Proteins carry out the labor within our cells, so it’s important to understand what they do and how they do it. A protein’s shape is crucial, because it determines its function. These shapes are called folds.

There is an immense number of possible amino acid sequences, and each can fold into a different shape in 3D space. Most of these shapes don’t do anything useful, and misfolded proteins such as prions can be very harmful, even spreading to other tissue and potentially to other organisms that consume them.
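To give a sense of that scale, here is an illustrative back-of-the-envelope calculation. It assumes only the 20 standard amino acids and picks an arbitrary chain length of 100 residues; real proteins range from a few dozen to thousands of residues.

```latex
N(n) = 20^{n}, \qquad
N(100) = 20^{100} = 10^{100 \log_{10} 20} \approx 10^{130.1} \approx 1.27 \times 10^{130}
```

Each added residue multiplies the number of possible sequences by 20, which is why exhaustively testing sequences in the lab is out of the question.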

A protein is a long chain of amino acids that folds into one of these structures. All of the information required for the fold is encoded by the amino acids and their order, which is in turn encoded by DNA. Therefore, if one knows only a DNA sequence, it should in theory be possible to work out not only the resulting protein but also its structure.
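The first half of that chain, from DNA to amino acid sequence, is mechanical. A minimal sketch follows. It assumes an in-frame coding sequence on the coding strand, the standard genetic code, and no introns, and the names `CODON_TABLE` and `translate` are illustrative. The second half, from sequence to 3D structure, is the hard part that no lookup table can do.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Map each three-letter codon to its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding DNA sequence into a protein sequence.

    Reads codons from the start, stops at the first stop codon, and ignores
    any incomplete trailing codon. Raises KeyError on non-ACGT characters.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # prints MAIVMGR
```

The output is only the primary structure, a string of letters. Predicting how that string folds is the problem discussed below.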

Resolving a protein structure experimentally is a painstaking blend of human and machine labor: researchers try to infer the properties of a protein from a snapshot of a single moment in time, captured in crystallized form.

Scientists have been researching protein folding for decades, working to map the three-dimensional shapes of the proteins responsible for a vast number of biological processes. Only a tiny portion of known proteins have ever been accurately modelled.

Google’s DeepMind claims to have created a machine learning system, AlphaFold 2, that can predict such structures in a matter of days, given only the primary structure (the sequence of amino acids in the polypeptide chain). This apparent approximate solution has arrived far sooner than many anticipated, and is something of a Sputnik moment for structural biology.

If one can predict the structure of a protein given a certain sequence, biology becomes an open book instead of a confetti of letters.

I can imagine this leading to more effective drugs with fewer side effects. We may also see cures for diseases, including cancer, as well as more nutritious crops, plastic-degrading enzymes, and perhaps even longer lifespans.

Advances in cultured protein, such as lab-grown meat, and in the bioprinting of organs may come more quickly. There’s even potential within evolutionary research: we could rewind the tape of evolution by experimenting with amino acids one at a time.

There are several limitations. The reported precision isn’t perfect, though it is generally a decent approximation of reality. Results still appear to depend strongly on existing training data and known reference structures. For example, the system appears to predict that certain exotic proteins fold like common ones, and reportedly only around two-thirds of DeepMind’s predictions matched the experimentally determined structures.

Knowing the structure of a protein alone doesn't tell you which ligands will bind to it (drugs are ligands). That is likely to be a significant next challenge, as we already have structures available for most proteins of particular interest.

Claims that protein folding has now been ‘solved’ overstate the case. Even so, this is one of the hardest and most important problems in computational biology, and AlphaFold appears to be a very significant advance. Right now we have experimentally determined structures for only a tiny percentage of all known proteins. Soon, we may have an educated guess about all of them.

The impact of work derived from this research is likely to be profound across a wide range of sectors, far beyond mere drug discovery.