The BioFM Platform

Biological Foundation Models (BioFMs) require coherent, robust platforms before their power can be leveraged in key drug discovery processes. Deep Genomics’ BioFM Platform generates differentiated predictions for molecular design and target biology discovery reliably, reproducibly, and at scale. As a result, scientists can rely on our platform to get work done quickly and correctly.

Our platform hinges on lab-in-the-loop workflows: we design and generate datasets that are fit for training models from scratch or for fine-tuning them for specific applications, such as those involving disease progression or off-target liabilities. These workflows validate predictive capabilities with orthogonal assays, including functional assays, and feed the results back into model training. This cycle allows Deep Genomics to advance rapidly on a range of drug discovery problems, both internally and in support of major pharma partners.
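To make the shape of this cycle concrete, here is a minimal Python sketch of a generic design, assay, retrain loop. It is not Deep Genomics' implementation: the toy assay (`run_assay`), the nearest-neighbour surrogate model, and the greedy "send the top-ranked designs to the lab" rule are all hypothetical stand-ins, chosen only to show how each round of assay results flows back into training.

```python
import random


def run_assay(design: float) -> float:
    """Hypothetical stand-in for a wet-lab assay.
    The best design (0.7) is hidden from the model."""
    return -(design - 0.7) ** 2


class NearestNeighbourModel:
    """Toy surrogate: predicts the measured score of the closest assayed design."""

    def __init__(self) -> None:
        self.data: list[tuple[float, float]] = []

    def fit(self, data: list[tuple[float, float]]) -> None:
        self.data = list(data)

    def predict(self, design: float) -> float:
        _, score = min(self.data, key=lambda pair: abs(pair[0] - design))
        return score


def lab_in_the_loop(rounds: int = 5, batch_size: int = 4, seed: int = 0):
    rng = random.Random(seed)
    # Round 0: an initial, model-free experiment.
    initial = [rng.random() for _ in range(batch_size)]
    training_set = [(d, run_assay(d)) for d in initial]
    model = NearestNeighbourModel()
    for _ in range(rounds):
        model.fit(training_set)                             # (re)train on all data so far
        candidates = [rng.random() for _ in range(100)]     # propose new designs
        candidates.sort(key=model.predict, reverse=True)    # rank by predicted score
        batch = candidates[:batch_size]                     # send the top designs to the "lab"
        training_set += [(d, run_assay(d)) for d in batch]  # feed results back in
    best = max(training_set, key=lambda pair: pair[1])
    return training_set, best


data, best = lab_in_the_loop()
print(len(data), best)
```

A real workflow would swap in an actual model, an experimental readout, and a selection strategy that balances exploration against exploitation; the point here is only the loop structure.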

Engineering

Underpinning everything in our Platform is a commitment to robust information and software systems. These systems enable scalable, workflow-integrated AI with the reliability, reproducibility, and correctness needed to ensure we advance the molecules with the best chances of helping patients.


[Figure: Engineering and architecture visualization]


Our Biological Foundation Models

BigRNA

BigRNA is a proprietary, powerful BioFM trained on over a trillion signals derived from high-throughput sequencing datasets, from which it has distilled the core principles of RNA biology. This allows it to serve as a versatile foundation for numerous downstream applications, such as predicting the effects of genetic variants, fine-tuning to predict on- and off-target effects of genetic medicines, and generating rich sequence embeddings for other models used in target biology discovery and molecular design. Our BigRNA workflows include Weights & Biases integration and sophisticated model evaluation frameworks, produce meaningful scalar outputs, and support fine-tuning on new datasets.

[Figure: BigRNA, Figure 1a]

Since its initial development in 2021, BigRNA has evolved significantly. Key enhancements include a shift to higher prediction resolution (from 128-bp to 1-bp), the integration of new training datasets (e.g., datasets relevant to disease contexts), and architectural improvements that increase predictive power and efficiency.
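What the move from 128-bp to 1-bp resolution means can be illustrated with a small Python sketch: a per-base signal track (for example, read coverage) is summed into fixed-width bins, and at 128-bp resolution any feature narrower than a bin is blurred into a single value. The `bin_track` helper and the toy track are hypothetical and say nothing about BigRNA's internals.

```python
def bin_track(track: list[float], bin_size: int) -> list[float]:
    """Sum a per-base signal into consecutive bins of `bin_size` bases.
    A trailing partial bin is kept and summed over the bases it contains."""
    if bin_size < 1:
        raise ValueError("bin_size must be a positive integer")
    return [sum(track[i:i + bin_size]) for i in range(0, len(track), bin_size)]


# Hypothetical 256-bp coverage track with a sharp 2-bp peak at positions 60-61.
track = [0.0] * 256
track[60] = track[61] = 10.0

print(bin_track(track, 1)[58:64])  # [0.0, 0.0, 10.0, 10.0, 0.0, 0.0]: peak position is exact
print(bin_track(track, 128))       # [20.0, 0.0]: the peak collapses into one 128-bp bin
```

At 1-bp resolution the output has one value per nucleotide, so a model can localize effects to individual bases; at 128-bp resolution it can only say that something happens somewhere within a bin.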


REPRESS

REPRESS is a deep learning foundation model that predicts cell-type-specific microRNA (miRNA) binding and mRNA degradation directly from RNA sequence. It has been shown to reveal biology missed by other state-of-the-art methods, such as identifying repressive non-canonical miRNA target sites and decoding the regulatory effects of sequence context and miRNA binding-site multiplicity.


REPRESS outperforms other advanced methods and neural architectures on a comprehensive suite of orthogonal tasks, including identifying genetic variants that affect miRNA binding, predicting out-of-distribution data from massively parallel reporter assays, and predicting canonical and non-canonical miRNA-mediated repression.
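To clarify what "canonical" means here, the Python sketch below applies the widely used seed-match site definitions (8mer, 7mer-m8, 7mer-A1 and 6mer, as popularized by TargetScan) to scan a 3' UTR and list every site it finds, which also illustrates binding-site multiplicity. Sites that satisfy none of these rules are what the text calls non-canonical. This is a classical rule-based baseline, not how REPRESS works, and the function names and UTR sequence are hypothetical; only Watson-Crick pairing is considered.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def canonical_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (start, site_type) for each canonical seed match in `utr`.

    Both strings are RNA written 5'->3'. The seed is miRNA nucleotides 2-8
    (1-based). Each site gets its best type: 8mer > 7mer-m8 > 7mer-A1 > 6mer.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # UTR bases pairing with miRNA positions 2-7
    pos8_partner = COMPLEMENT[mirna[7]]    # UTR base pairing with miRNA position 8
    sites = []
    start = utr.find(core)
    while start != -1:
        m8 = start > 0 and utr[start - 1] == pos8_partner     # 5' of the core
        a1 = start + 6 < len(utr) and utr[start + 6] == "A"   # A opposite position 1
        if m8 and a1:
            sites.append((start - 1, "8mer"))
        elif m8:
            sites.append((start - 1, "7mer-m8"))
        elif a1:
            sites.append((start, "7mer-A1"))
        else:
            sites.append((start, "6mer"))
        start = utr.find(core, start + 1)
    return sites


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "GGCUACCUCAGGGUACCUCUU"  # hypothetical UTR with two let-7 sites
print(canonical_sites(let7a, utr))  # [(2, '8mer'), (13, '6mer')]
```

A rule like this cannot score non-canonical sites or the influence of surrounding sequence context, which is exactly the gap a learned sequence model is meant to close.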

[Figure: REPRESS]


DeepADAR

DeepADAR can design guide RNAs (gRNAs) that induce ADAR-mediated editing across various trinucleotide contexts. The base DeepADAR model is trained to predict endogenous ADAR editing from the local sequence and structure around candidate editing sites. By observing endogenous editing at 16 million target sites, as well as a multitude of other sites where editing does not occur, the base model learned the subtle sequence and structural features that direct ADAR to edit specific sites. Using screening data from a custom dataset of synthetic gRNAs, we then fine-tuned this base model to predict gRNA-driven editing from the sequence and structural features that gRNAs create.
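For readers less familiar with ADAR editing, the Python sketch below shows two basic ideas behind this paragraph: the trinucleotide context of a candidate adenosine (its 5' neighbour, the A itself, and its 3' neighbour, e.g. UAG), and a naive antisense guide that is fully complementary to the target except for a C placed opposite the target A, the A-C mismatch commonly used in ADAR guide design. The helper names and the fixed `flank` length are hypothetical; DeepADAR learns far richer sequence and structural features than this rule.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def trinucleotide_contexts(rna: str) -> dict[int, str]:
    """Map each internal adenosine position to its 5'-N-A-N-3' context."""
    return {i: rna[i - 1:i + 2] for i in range(1, len(rna) - 1) if rna[i] == "A"}


def naive_guide(target: str, site: int, flank: int = 10) -> str:
    """Antisense guide (5'->3') spanning up to `flank` bases on each side of
    `site`, perfectly complementary except for a C opposite the target A."""
    if target[site] != "A":
        raise ValueError("site must point at an adenosine")
    start, end = max(0, site - flank), min(len(target), site + flank + 1)
    region = target[start:end]
    offset = site - start                 # target A's index within the region
    antisense = [COMPLEMENT[b] for b in region]
    antisense[offset] = "C"               # A-C mismatch at the editing site
    return "".join(reversed(antisense))   # reverse so the guide reads 5'->3'


target = "GGUAGCC"
print(trinucleotide_contexts(target))        # {3: 'UAG'}
print(naive_guide(target, site=3, flank=2))  # GCCAC
```

Editing efficiency varies strongly with the trinucleotide context and with the duplex structure the guide forms, which is why a learned model is used to choose among candidate guides rather than a fixed rule like this one.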

[Figure: DeepADAR]

Our Experimental Capabilities

We view data generation and experimental validation as crucial components of our BioFM Platform. To generate fit-for-purpose training and validation datasets, as well as datasets for therapeutic programs, we operate two experimental facilities with over 10,000 square feet of lab space between Toronto, Ontario, and Cambridge, Massachusetts. In both locations, our experimental scientists work hand in hand with their machine learning and computational counterparts to design experiments and workflows that accelerate our mission.



Publications

Our scientists regularly publish their work to benefit the field. Here is a selection of our contributions.

FlashRNA: An Efficient Model for Regulatory Genomics

Jung et al., bioRxiv, October 2025.

Transcriptome-wide off-target effects of steric-blocking oligonucleotides

Holgersen et al., Nucleic Acid Therapeutics, December 2021.

ATP7B Met645Arg causes Wilson disease by promoting exon 6 skipping

Merico et al., npj Genomic Medicine, April 2020.

Deep learning in biomedicine

Wainberg et al., Nature Biotechnology, September 2018.

Inference of the human polyadenylation code

Leung et al., RECOMB, April 2017.

Genome-wide characteristics of de novo mutations in autism

Yuen et al., npj Genomic Medicine, August 2016.

Automated analysis of high-content microscopy data with deep learning

Kraus et al., Molecular Systems Biology, April 2017.