From skeptic to builder: why I changed my mind about AI for biology

October 20, 2025

First published in The Pharma Letter on October 20, 2025

By: Albi Celaj, Head of ML Research, Deep Genomics

In 2014, I sat in a room at a “Genetic Networks” conference listening to Brendan Frey present his talk, Decoding the Genetic Determinants of Human Disease. Brendan presented data from his 2015 paper arguing that AI had “decoded biology”: it had solved part of the splicing problem and could replace many experiments; we just needed to go out and apply it. I remember the optimism, but I thought there was no way these efforts had confronted the realities of wet-lab work, let alone the harder constraints of drug discovery. Part of me wanted to dismiss the talk, but I couldn’t shake the feeling that there was something compelling underneath. More than a decade later, I can say with confidence that this moment was part of a broader movement in AI: what sounded like provocation in 2014 was just an early signal.

Before Deep Genomics: a practical skeptic

Before I joined Deep Genomics, I saw machine learning as a rigid, brittle tool to reach for only when no other approach would work. Too many times I had seen a published system fall apart, even when used as intended, under modest changes such as being tested on data from a lab other than the authors’. At the same time, I had plenty of technical problems I couldn’t solve, and generic machine learning systems weren’t a fit.

My perspective first shifted with the release of Keras. It let someone like me, uninterested in writing low-level GPU code, express a neural network at an abstract level; following best practices, the software would take care of fitting it. Its strength was flexibility: I handed it an abstract system of equations I had wrestled with for three years in graduate school but had never been able to optimize to fit my experimental data. The hand-made rules I had devised during many sleepless nights just weren’t cutting it.

Surprisingly, when I gave the problem to Keras, it just worked, within a few minutes. This was the first of several such moments, and it convinced me I had to join this field.
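That “just worked” moment came from handing an optimizer a declarative description of the problem. As a hypothetical illustration of the principle (this is not the original Keras code or my actual equations), here is a minimal pure-Python sketch of fitting the parameters of a simple equation to data by gradient descent, the job a framework like Keras automates at scale:

```python
import random

# Toy data generated from y = 3x + 2 with a little noise.
random.seed(0)
data = [(x, 3.0 * x + 2.0 + random.gauss(0, 0.05))
        for x in [i / 10 for i in range(-20, 21)]]

# Two free parameters; gradient descent does the "solving" for us,
# just as Keras fits the weights of a declared network.
a, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (a * x + b - y) for x, y in data) / len(data)
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 1), round(b, 1))  # recovers roughly a = 3.0, b = 2.0
```

The appeal is the division of labor: you state the model, the optimizer does the work; swap the one-line equation for a deep network and the loop for a framework, and the workflow is the same.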

Early Deep Genomics (2020): the splicing story and its limits

When I first arrived at Deep Genomics (DG), I worked with a direct successor to Brendan’s 2015 splicing model, DeepScanNext. I hadn’t built it; a small team of Brendan’s former students had. My first job was to ask it to predict the results of a splicing assay from a faraway lab, even though the model had only ever seen the human genome and knew nothing about their experiment.

Everything in my prior ML life told me this prediction task should be impossible, and I wasn’t expecting much. Instead, the model’s predictions correlated strongly with what the lab measured. I was happily surprised, but long‑time DG folks were unfazed. To them, this was the expected behavior of a model that had actually learned the logic of RNA splicing.

That period also produced a story that fit the “narrow AI” narrative perfectly: AI found a variant, predicted the splicing mechanism, and helped design the oligo - clean causality, clear mechanism. It was the first time I really understood why Brendan sounded so confident at that 2014 meeting.

But even a strong hammer makes you go hunting for nails. Splicing effects are relatively rare, and a good narrative isn’t the same as meeting unmet need. I kept learning a frustrating lesson: model performance wasn’t the sticking point. We had world‑class pioneers and models that beat the benchmarks, but the question was how to turn that into drug development programs.

From a great tool to too many tools

To better serve unmet needs, the next phase was moving “beyond splicing” into expression‑increasing oligos. That turned out not to be a simple generalization. Splicing is clean - you know the mechanism, you run the assay, and you roughly know if your drug behaves as you predict. Expression, on the other hand, is complicated and controlled by many mechanisms.

My job, and that of the ML team at the time, was to build on the tried-and-true AI formula and groundwork. I fine‑tuned DeepScanNext to catch retained introns and trained models of RNA‑binding proteins; others built microRNA models, and so on.

Using this approach, we saw the literature benchmarks fall one after another.

It was exciting, but it didn’t scale or give us any drugs. With a small team, we couldn’t maintain all of these models while deciding, program by program, which one to use first. Which of these forty mechanisms actually mattered for this gene? When you can “do everything”, work quickly degenerates into a scramble that looks a lot like doing nothing. We had lost the clean orchestration we had enjoyed with splicing.

BigRNA: one weekend, a shoestring TPU, and a broader idea

That orchestration pain is what pushed me toward the foundation model we now call BigRNA. Previously, groups at Calico and elsewhere had blended the kinds of convolutional models Brendan’s lab had pioneered in 2014 with transformer architectures to model gene‑scale regulation. But I wanted to take these ideas in a new direction and model RNA sequencing data.

The overall idea was for the model to learn the rules governing gene expression directly, with all of the relevant mechanisms represented at once. The model would then do the hard work of telling you which mechanisms actually matter for controlling expression, and exactly what your drug or mutation is doing at the molecular level.

In late 2021, I wrote a short proposal to prototype the idea. It seemed complicated, but many of the pieces were already laid out at Deep Genomics and elsewhere, and in my mind the path was straightforward. I spent the weekend reading and hacking together a minimal version on an older TPU (Google’s specialised AI chip) via Google Colab. Colab is meant for ideating rather than real model training, so I had the delightful constraint of a session that reset every 24 hours. The setup wasn’t well documented, so I stitched together code from several repos, figured out how to restart the run from my phone, and habitually scrolled to check whether the loss curve had dropped. By the time I returned to work on Monday, it was clear the dropping loss curves meant something: the model was already learning to predict RNA expression patterns in a way no other model had. This was the proof of concept that we could collapse many of the narrow tools into a single foundation model.
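The 24-hour resets forced the run to be restartable. A minimal sketch of that pattern (hypothetical file name and training loop, not the original BigRNA code): checkpoint the training state periodically, and on every start resume from the last checkpoint if one exists.

```python
import os
import pickle

CKPT = "model_state.pkl"  # hypothetical checkpoint path

def load_state():
    # Resume from the last checkpoint if a previous session was cut off.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "weights": [0.0]}  # fresh start

def save_state(state):
    # Write to a temp file, then rename atomically, so a mid-save
    # disconnect can never leave a corrupt checkpoint behind.
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT)

state = load_state()
while state["step"] < 100:       # stand-in for the real training loop
    state["weights"][0] += 0.01  # stand-in for a gradient update
    state["step"] += 1
    if state["step"] % 10 == 0:
        save_state(state)        # survive a session reset
```

With this shape, a killed session loses at most a few steps of work; restarting the notebook simply picks up where the last checkpoint left off.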

As BigRNA matured, it did more than help design expression‑increasing oligos. It subsumed a surprising amount of our earlier work, mine included, by directly modeling the RNA regulatory landscape across genes and conditions. In our own platform overview, we describe BigRNA as trained on over a trillion genomic signals and able to predict complex regulatory mechanisms, including the retained introns and RNA‑binding proteins I had worked on in the past. This made it a natural core for an integrated platform rather than yet another point tool.

What changed in the field (and what hasn’t)

From the outside, it can look as though the RNA foundation model field suddenly “went mainstream”: DeepMind, Calico, and others published related models, and talk of “foundation models for RNA biology” proliferated. From the inside, the fundamentals haven’t changed much, but our key focus now is scale - bigger, more relevant datasets and bigger models. We’re still training broad models on the right kinds of biological signals, then fine-tuning to the contexts that really matter for drug discovery, like a disease state or a specific tissue.

Despite a lot of work in the field, by ourselves and others, much remains unsolved. For example, inter‑individual differences remain hard to predict. Ironically, predicting benign variation across healthy people can be much tougher than predicting large disease mutations, because it is the result of many subtle effects rather than a single smoking gun. And most of the community’s models still learn from “healthy” data; pathological states change the rules in ways that matter for drug discovery, and that we are only starting to capture.

Beyond one model: platforms, culture, and pace

BigRNA taught us that solving orchestration once, inside the model, beats doing it many times by hand. But a platform is more than a single foundation model. In ours, BigRNA is complemented by specialist models like REPRESS (for miRNA-mRNA and other post-transcriptional regulation) and DeepADAR (for guide‑mediated RNA editing), and performance continues to improve with new data and larger backbones. We keep updating these models and compounding the platform, both with our own lab data and with datasets developed by the community.

Culture matters just as much as code. Keeping biologists and ML researchers in the same conversation ensures that the models and experiments evolve together. Good AI teams move quickly and can tolerate flexibility and chaos; good biology teams move at a pace that requires long-term planning and strict deadlines. If you don’t engage both sides, the models don’t get meaningfully used for drug development, and the data doesn’t flow back to the model.

Partnerships matter, too. The best outcomes won’t come from a clever solution that gives you a decreasing loss curve on a narrow task, and SaaS models will not work in this field. Model platforms need to be embedded in complex scientific workflows and decision-making pipelines: target identification, mechanism selection, molecule design, secondary effects, and so on.

Ultimate drug development performance is a lagging indicator of underlying model capability. The leading indicators, however (the pace of model improvement, cross‑task generalization, and the closed‑loop cycle time between model and lab), are already signaling that the field is accelerating.

What I’d tell my 2014 self

Back in that conference room, I wasn’t sure what to make of the AI optimism.  Today, I’d tell my younger self three things:

  1. Be picky about problems. Splicing was the right early bet because the data were right and the drug development story was clean. As new data sets mature, new domains become tractable. Keep moving.
  2. Build platforms, not piles. Many great models without orchestration are still a bottleneck. Consolidate capability into foundation models when possible, and learn how to meaningfully use them for the question at hand.
  3. Measure pace, not headlines. Ask how fast the model’s getting better and how quickly a lab result can update a decision. If those numbers look good, the clinic count will take care of itself.

We’re hiring and partnering with those who want to solve real problems, not chase trends. The field has long since moved on from its niche Toronto circle into an accelerating, mainstream global effort. We haven’t solved everything yet, but we’re improving quickly - and that is the most encouraging signal. If we keep building AI the right way - platforms embedded in real programs, multilingual teams, and partnerships that integrate AI into decisions - I like our chances.