As we approach the end of 2024, First Opinion is publishing a series of essays on the state of AI in medicine and biopharma.
There's more to life than protein folding.
You'd be forgiven for believing otherwise based on the news lately. AlphaFold's creators just won a Nobel Prize, and almost daily another foundation model debuts to much fanfare and venture funding.
I get the hype. Predicting protein structure from sequence paves the way for everything from enzyme engineering to rational drug design. These goals were pipe dreams back when I would donate spare cycles on my Compaq PC to the Folding@home project. Thanks to AI, they're now far more attainable.
But by no means has biology been solved. AlphaFold can't answer every question. For example: Have you picked a safe and effective drug target? Well, where in a cell and where in the body does your protein of interest sit? What role does it play in signaling pathways? How does it drive tissue (dys)function, from fluid flow to fibrosis? Good luck getting a computer to tell you any of that.
For now, no foundation model can predict what a cell, a tissue, or a whole organism will do. What we call AI in biology today is mostly about chemistry -- how molecules bend into shape and bind to one another.
If you want to know which molecules matter in the first place, you'll need to answer that question for yourself. The data you need doesn't exist yet. Expect to pay for experiments -- maybe even pick up a pipette. I've seen enough people (including me) belatedly realize the limits of AI in biology that I thought I'd summarize the journey and save everyone some time. Let's call it the five stages of grief (techbio's version).
Allow me to set the scene. Our tragic hero is a numbers person -- a physicist? a programmer? -- full of hope and hubris about what computers can do for them.
Artificial intelligence requires real data. The unsung hero underlying AlphaFold is the Protein Data Bank, or PDB. Since 1971, postdocs the world over have painstakingly crystallized and cataloged the structures of nearly 250,000 proteins, in the process assembling the ideal training corpus for today's neural networks. Unfortunately, the PDB is very much the exception to the rule. And the further you go toward whole organs and organisms, the less likely there's a public database to piggyback on.
So, several startups have amassed whole databases themselves. Fauna Bio believes we have a lot to learn about obesity from animals that hibernate as they swing from feast to famine. Fauna has made multi-omic measurements across hundreds of species of mammals to uncover the molecular underpinnings of their remarkable resilience. By feeding this data into graph neural networks, Fauna predicts and pursues novel connections between diseases and drug targets. None of this AI would be possible if Fauna hadn't carefully characterized the metabolism of the 13-lined ground squirrel. I'm sure many people wrote that off as an overly academic money pit. Five hundred million dollars in biobucks from Eli Lilly begs to differ.
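Fauna hasn't published its architecture in detail, so treat what follows as a generic illustration of the idea rather than its actual pipeline: a toy graph neural network, written in PyTorch, that learns embeddings for gene and disease nodes from known edges and scores unseen pairs as candidate links. The graph, the node roles, and every number here are invented.

```python
# A toy GNN for disease-target link prediction. NOT Fauna's pipeline;
# the graph and all hyperparameters are invented for illustration.
import torch
import torch.nn as nn

# Hypothetical knowledge graph: nodes 0-2 are genes, nodes 3-4 are diseases.
edges = torch.tensor([[0, 1, 2, 0],
                      [3, 3, 4, 4]])          # shape [2, num_edges]
num_nodes, dim = 5, 16

# Symmetrically normalized adjacency with self-loops (standard GCN recipe).
A = torch.eye(num_nodes)
A[edges[0], edges[1]] = 1.0
A[edges[1], edges[0]] = 1.0
d = A.sum(1).rsqrt()
A_hat = d[:, None] * A * d[None, :]

class TinyGCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(num_nodes, dim)  # stand-in for omics features
        self.w1 = nn.Linear(dim, dim)
        self.w2 = nn.Linear(dim, dim)

    def embed(self):
        h = torch.relu(self.w1(A_hat @ self.emb.weight))  # message passing, layer 1
        return self.w2(A_hat @ h)                         # layer 2: node embeddings

    def score(self, u, v):
        z = self.embed()
        return (z[u] * z[v]).sum(-1)  # dot product = predicted link strength

model = TinyGCN()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
neg = torch.randint(0, num_nodes, edges.shape)  # random pairs as (rough) non-edges

for _ in range(200):
    logits = torch.cat([model.score(*edges), model.score(*neg)])
    targets = torch.cat([torch.ones(edges.shape[1]), torch.zeros(neg.shape[1])])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Rank an unseen gene-disease pair by its predicted probability of a link.
print(torch.sigmoid(model.score(torch.tensor([1]), torch.tensor([4]))))
```

The appeal of the setup is that every new omics measurement simply becomes more nodes and edges for the model to learn from -- which is why the squirrel data matters.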
Indeed, learning from nature appears to be a winning strategy. Enveda pairs artificial intelligence with folk wisdom to decipher the chemical contents of medicinal plants. Enveda's foundation model for chemistry, PRISM, builds on the language model BERT, with peaks in mass spectra taking the place of words in sentences. Enveda has never been under the illusion that it could train PRISM purely on public data. The company collected 1.2 billion mass spectra to feed its GPUs, generating 600 million of those training examples itself. That kind of data doesn't come cheap, but the investment appears to have paid off. Enveda has one drug in the clinic and nine development candidates on their way there -- remarkable productivity for a company that started from scratch five years ago.
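PRISM's internals aren't public beyond that analogy, but the recipe it gestures at is easy to sketch: bin each peak's mass-to-charge ratio into a token, treat the whole spectrum as a "sentence," and pretrain a BERT-style encoder by masking tokens and asking the model to recover them. Here's a minimal, purely illustrative version using Hugging Face's transformers library; the bin width, vocabulary, and model sizes are my inventions, not Enveda's.

```python
# Sketch of "peaks as words": bin each m/z value into a token, then pretrain
# a BERT-style encoder with masked-token prediction. Illustrative only.
import torch
from transformers import BertConfig, BertForMaskedLM

BIN_WIDTH = 0.1                          # Da per token bin (assumed)
MAX_MZ = 1000.0
MASK_ID = int(MAX_MZ / BIN_WIDTH) + 1    # last id reserved for [MASK]
VOCAB = MASK_ID + 1

def tokenize(peaks_mz):
    """One 'word' per peak: bin each m/z value into a token id (0 = padding)."""
    return torch.tensor([[int(mz / BIN_WIDTH) + 1 for mz in sorted(peaks_mz)]])

config = BertConfig(vocab_size=VOCAB, hidden_size=128, num_hidden_layers=4,
                    num_attention_heads=4, intermediate_size=256)
model = BertForMaskedLM(config)

# One fake spectrum; real pretraining streams hundreds of millions of them.
ids = tokenize([105.07, 212.12, 431.24, 655.33])

labels = torch.full_like(ids, -100)      # -100 = ignore position in the loss
labels[0, 2] = ids[0, 2]                 # ...except the peak we're about to hide
masked = ids.clone()
masked[0, 2] = MASK_ID                   # mask it, BERT-style

out = model(input_ids=masked, labels=labels)
print(out.loss)                          # objective: predict the missing peak
```

The pretraining needs no chemical labels at all -- which is exactly why raw spectra at the billion scale are such valuable fuel.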
(Side note: Botany is full of billion-dollar ideas. There would be no aspirin without the willow tree and no Alnylam without the purple petunia.)
At this point, you might feel like all hope is lost if you don't have hundreds of millions of dollars or data points. That's certainly the consensus among those in the know: Woefully constrained by data, AI in biology is destined to be more evolutionary than revolutionary.
Luckily, you've still got your brain. You don't need an oracular, spectacular foundation model if something simpler can help you run the right experiments to find your answer. Call it "augmented intelligence" -- a computer as your copilot.
That's what we mean when we say AI at my company, Tessel Bio. Our goal at Tessel is to reverse tissue remodeling and inflammatory memory in chronic disease. We prioritize predictive validity: We measure tissue function in patient-derived, "organotypic" cultures to model what's broken in the original organ -- and I mean bona fide biophysical phenotypes like tissue stiffness in the Crohn's intestine and mucus transport in the COPD lung. These sorts of assays aren't super high throughput. No existing foundation model can field our questions. But we can use our "active learning" platform, Tesselogic, to prioritize perturbations and save precious time, money, and material. (By one benchmark, it beat a brute-force screen with as little as 3% of the effort.) Simply put, Tesselogic learns from what we've already done to suggest what to test next.
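Tesselogic itself is proprietary, so here is only the generic shape of such an active-learning loop, sketched with scikit-learn: fit a surrogate model to the perturbations you've already assayed, then nominate the untested ones the model is most optimistic or most uncertain about. The candidate features, the toy "assay," and the batch sizes below are all placeholders.

```python
# Generic active-learning loop: fit a surrogate to finished experiments,
# then pick the next batch by predicted value plus uncertainty.
# Illustrative only -- not Tesselogic; the "assay" here is a toy function.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Hypothetical library: 200 candidate perturbations, each a 5-dim feature
# vector (dose, target class, pathway annotations, ...).
candidates = rng.uniform(size=(200, 5))

def run_assay(x):
    """Stand-in for a real organotypic readout: a toy function plus noise."""
    return np.sin(3 * x[:, 0]) - x[:, 1] ** 2 + 0.05 * rng.normal(size=len(x))

tested = list(rng.choice(200, size=8, replace=False))   # seed experiments
results = list(run_assay(candidates[tested]))

for _ in range(5):                                # 5 rounds of 4 wells each
    surrogate = GaussianProcessRegressor(normalize_y=True)
    surrogate.fit(candidates[tested], results)
    mean, std = surrogate.predict(candidates, return_std=True)
    ucb = mean + 1.5 * std                        # optimism + uncertainty
    ucb[tested] = -np.inf                         # never repeat an experiment
    batch = np.argsort(ucb)[-4:]                  # next 4 perturbations to run
    tested += list(batch)
    results += list(run_assay(candidates[batch]))

print("best perturbation so far:", tested[int(np.argmax(results))])
```

The point isn't the particular acquisition rule; it's that each round of wet-lab results sharpens the next round's picks, so you never have to run most of the wells.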
I'm bullish on human-AI hybrids to collect the right data at the right scale. Such approaches have emerged everywhere experiments are costly, from target discovery to small molecule design.
You don't always need to boil the ocean to distill the meaning of life.