I can remember the day in middle school I was taught that amino acids were "the building blocks of life." I was fascinated by the idea that our complex form, and the form of other living organisms, was like a tiny Lego set, constructed to make us who we were. Even then, in the early 1980s, researchers had already been trying for almost a decade to figure out how those amino acids told proteins what shape to take. Ever since, with ever more powerful computers and increasingly complex algorithms, researchers have applied machine learning techniques to answer the same biological question.
Google's (GOOG) (GOOGL) DeepMind has just provided an answer, and it is blowing the minds of researchers. For nearly 50 years, scientists have questioned how proteins know what shape to fold themselves into, and how they do it reliably time after time. In a modeling competition, DeepMind researchers just cracked the code, creating a model that translates amino acid chains into three-dimensional protein structures. To figure out how this could impact medicine (and investing), it's important to understand what this new knowledge will allow scientists to do. Further, what are the downstream effects? Which areas of biological research may be most affected? And which companies stand to gain -- or lose -- the most?
The competition
Tens of thousands of proteins exist in humans, and there are billions in other species, viruses, and bacteria. The way these proteins fold directly determines what they do. In fact, in molecular biology there is a saying that "structure is function." The folded shape is the key to the role that proteins perform -- like antibodies that fight infections or insulin that regulates blood sugar. That is why the Critical Assessment of Protein Structure Prediction (CASP), an event challenging teams to advance the accuracy of protein structure predictions, has been held since 1994.
AlphaFold, DeepMind's winning model, was trained on public data of 170,000 protein structures. Training it required 128 high-end cloud computing cores running for several weeks. In the end, about two-thirds of the model's predictions were off by less than the width of a single atom. DeepMind was head and shoulders above other participants in the event, which consisted mostly of academic teams but included entries from Microsoft (MSFT) and Chinese internet giant Tencent (TCEHY).
Why it's important
Most drugs being prescribed today were discovered either by chance or through time-intensive trial-and-error experiments. Understanding how amino acids direct proteins to twist and fold into their three-dimensional shape will create a better understanding of why each protein takes the form it does and how signals are transmitted across cell membranes. This could allow scientists to better design drugs that will be utilized by cells in a desired way, understand disease-causing misfolds, and identify the genetic variations that lead to disease.
In one example during the event, the AlphaFold model provided the structure of a bacterial protein in only 30 minutes. Researchers at the Max Planck Institute in Germany had been working on that very problem for more than a decade. Next, the team could begin tackling the thousands of unsolved proteins in the human genome and the hundreds of millions of proteins in nature that have not been modeled. This raises the question of when we can all get drugs designed for our own specific biology.
What to look out for
For now, applications to drug discovery will have to wait. It's unclear when or how DeepMind will share its model, and while impressive, it has limitations. For instance, the model struggled to predict protein complexes, or groups, where interactions between proteins can distort shapes. As more proteins are involved, the number of possible interactions grows so quickly that modeling them all becomes nearly impossible. This mathematical constraint -- known as combinatorial explosion -- is common in advanced modeling but could eventually be overcome with more computing power. It will be important to address this, as protein-protein interactions are one of the key mechanisms being targeted to discover new drugs.
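To get a feel for how quickly combinatorial explosion sets in, here is a rough back-of-the-envelope sketch in Python. It is only an illustration of the counting problem, not a description of how AlphaFold or any real modeling tool works. The function names are made up for this example, and it assumes the simplest possible picture: every pair of proteins might interact, and any group of two or more proteins might form a complex.

```python
from math import comb


def pairwise_interactions(n_proteins: int) -> int:
    """Number of distinct protein pairs that could interact (n choose 2)."""
    return comb(n_proteins, 2)


def possible_groupings(n_proteins: int) -> int:
    """Number of distinct groups of two or more proteins.

    All subsets (2**n) minus the empty set (1) and the single-protein
    subsets (n). This is a deliberately simplified, hypothetical count.
    """
    return 2 ** n_proteins - n_proteins - 1


if __name__ == "__main__":
    # Print how both counts grow as more proteins are involved.
    for n in (2, 5, 10, 20, 40):
        print(n, pairwise_interactions(n), possible_groupings(n))
```

Under these assumptions, the number of pairs grows steadily, but the number of possible groups roughly doubles with every protein added, which is why simply adding more proteins can overwhelm a model.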
Despite the caveats, the discovery promises to add fuel to the fire of scientific research into how the human body operates. A better understanding of how amino acids translate into proteins validates the potential impact of gene editing and companies like CRISPR Therapeutics (CRSP), Intellia Therapeutics (NTLA), and Editas Medicine (EDIT). Further, solving this problem should ultimately lead to less trial and error in the lab and make genome sequencing even more important, benefiting Illumina (ILMN), Thermo Fisher Scientific (TMO), and Agilent (A). After all, DNA carries the information for making proteins.
The benefits of DeepMind's discovery will stay largely behind the research curtain, appearing to most of us just as other medical breakthroughs have -- in the form of new or better medicines to treat diseases. But make no mistake about the importance. One CASP judge, a computational biologist at Columbia University, called it one of the most significant breakthroughs of his lifetime. Even the co-founder of CASP added, "I never thought I'd see this in my lifetime." I expect this is the first salvo in a new battle against human disease. Armed with a better understanding of the building blocks of life and once-unthinkable computing power, we may soon look back on our current drug discovery process as we now look at treating infections before penicillin was available, or monitoring pregnancy before ultrasounds -- both advancements of the mid-20th century. Seventy years from now, people may marvel at the effort drug discovery took and wonder how we were ever able to develop drugs with such a haphazard process.