The question has lingered in medical science circles since 2020, when the artificial intelligence (AI) model AlphaFold made it possible to predict protein structures: Would the technology open the drug discovery floodgates?
In the lock-and-key picture of drug design, the mutated protein is the lock, and the right drug (a protein designed to bind to the mutation, stopping its activity) is the key.
But proteins are fidgety and flexible.
"They're basically molecular springs," said Gabriel Monteiro da Silva, PhD, a computational chemistry research scientist at Genesis Therapeutics. "Your key can bend and alter the shape of the lock, and if you don't account for that, your key might fail."
This is the protein problem in drug development. Another issue making this challenge so vexing is that proteins don't act in isolation. Their interactions with other proteins, ribonucleic acid, and DNA can affect how they bind to molecules and the shapes they adopt.
Newer versions of AlphaFold, such as AlphaFold Multimer and AlphaFold 3 (the code for which was recently released for academic use), can predict many interactions among proteins and between proteins and other molecules. But these tools still have weak points that scientists are trying to overcome or work around.
"Those kinds of dynamics and multiple conformations are still quite challenging for the AI models to predict," said James Zou, PhD, associate professor of biomedical data science at Stanford University, Stanford, California.
"We're finding more and more that the only way we can make these structures useful for drug discovery is if we incorporate dynamics, if we incorporate more physics into the model," said Monteiro da Silva.
Monteiro da Silva spent 3 years during his PhD at Brown University, Providence, Rhode Island, running physics-based simulations in the lab, trying to understand why proteins carrying certain mutations are drug resistant. His results showed how "the changing landscape of shapes that a protein can take" prevented the drug from binding.
It took him 3 years to model just four mutations.
AI can do better -- and the struggle is fascinating. By developing models that build on the predictive power of AlphaFold, scientists are uncovering new details about protein activity -- insights that can lead to new therapeutics and reveal why existing ones stop working -- much faster than they could with traditional methods or AlphaFold alone.
By predicting protein structural details, AlphaFold models also made it possible to predict pockets where drugs could bind.
A notable step, "but that's just the starting point," said Pedro Beltrao, PhD, an associate professor at the Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland. "It's still very difficult, given a pocket, to actually design the drug or figure out what the pocket binds."
Going back to the lock-and-key analogy: While he was at Brown, with a team of researchers in the Rubenstein Group, Monteiro da Silva helped create a model to better understand how mutations affect "the shape and dynamics of the lock." They manipulated the amino acid sequences of proteins, guiding their evolution. This enabled them to use AlphaFold to predict "protein ensembles" and how frequently those ensembles appear. Each ensemble represents the many different shapes a protein can take under given conditions.
"Essentially, it tries to find the most common shapes that a protein will take over an arbitrary amount of time," Monteiro da Silva said. "If we can predict these ensembles at scale and fast, then we can screen many mutations that cause resistance and develop drugs that will not be affected by that resistance."
To evaluate their method, the researchers focused on ABL1, a well-studied kinase that causes leukemia. ABL1 can be drugged -- unless it carries or develops a mutation that causes drug resistance. Currently there are no drugs that work against proteins carrying those mutations, according to Monteiro da Silva. The researchers used their hybrid AI-meets-physics method to investigate how drugs bind to different ABL1 mutations, screening 100 mutations in just 1 month.
"It's not going to be perfect for every one of them. But if we have 100 and we get 20 with good accuracy, that's better than doing four over 3 years," Monteiro da Silva said.
A forthcoming paper will make their model publicly available in "an easy-to-use graphical interface" that they hope clinicians and medicinal chemists will try out. It can also complement other AI-based tools that dig into protein dynamics, according to Monteiro da Silva.
Another aspect of the protein problem is scale. One protein can interact with hundreds of other proteins, which in turn may interact with hundreds more, all of which make up the human interactome.
Feixiong Cheng, PhD, helped build PIONEER, a deep learning model that predicts the three-dimensional (3D) structure of interactions between proteins across the interactome.
Most disease mutations disrupt specific interactions between proteins, making their affinity stronger or weaker, explained Cheng. To treat a disease without causing major side effects, scientists need a precise understanding of those interactions.
"From the drug discovery perspective, we cannot just focus on single proteins. We have to understand the protein environment, in particular how the protein interacts with other proteins," said Cheng, director of Cleveland Clinic Genome Center, Cleveland.
PIONEER helps by blending AlphaFold's protein structure predictions with next-generation sequencing, a type of genomic research that identifies mutations in the human genome. The model predicts the 3D structure of the places where proteins interact -- the binding sites, or interfaces -- across the interactome.
"We tell you not only that a binds b, but where on a and where on b the two proteins interact," said Haiyuan Yu, PhD, director of the Center for Innovative Proteomics, Cornell University, and co-creator of PIONEER.
This can help scientists understand "why a mutation, protein, or even network is a good target for therapeutic discovery," Cheng said.
The researchers validated PIONEER's predictions in the lab, testing the impacts of roughly 3000 mutations on 7000 pairs of interacting proteins. Based on their findings, they plan to develop and test treatments for lung and endometrial cancer.
PIONEER can also help scientists home in on how a mutation causes a disease, such as by showing recurrent mutations.
"If you find cancer mutations hitting an interface again and again and again, it means that this is likely to be driving cancer progression," said Beltrao.
Beltrao's lab and others have looked for recurrent mutations by using AlphaFold Multimer and AlphaFold 3 to directly model protein interactions. It's a much slower approach (PIONEER is more than 5000 times faster than AlphaFold Multimer, according to Cheng). But it could allow scientists to model interfaces that are not shown by PIONEER.
"You will need many different things to try to come up with a structural modeling of the interactome, and all these will have limitations," said Beltrao. "Their method is a very good step forward, and there'll be other approaches that are complementary, to continue to add details."
Large language models, such as ChatGPT, are another way that scientists are adding details to protein structure predictions. Zou used GPT-4 to "fine tune" a protein language model, called evolutionary scale modeling (ESM-2), which predicts protein structures directly from a protein sequence.
First, they trained ChatGPT on thousands of papers and studies containing information about the functions, biophysical properties, and disease relevance of different mutations. Next, they used the trained model to "teach" ESM-2, boosting its ability "to predict which mutations are likely to have larger effects or smaller effects," Zou said. The same could be done for a model like AlphaFold, according to Zou.
"They are quite complementary in that the large language model contains a lot more information about the functions and the biophysics of different mutations and proteins as captured in text," he said, whereas "you can't give AlphaFold a piece of paper."
Exactly how AlphaFold makes its predictions is another mystery. "It will somehow learn protein dynamics phenomenologically," said Monteiro da Silva. He and others are trying to understand how that happens, in hopes of creating even more accurate predictive models. But for the time being, AI-based methods still need assistance from physics.
"The dream is that we achieve a state where we rely on just the fast methods, and they're accurate enough," he said. "But we're so far from that."