Biological Foundation Model Platform

The BioFM Platform

Biological Foundation Models (BioFMs) need coherent, robust platforms before their power can be applied to key drug discovery processes. Deep Genomics’ BioFM Platform generates differentiated predictions for molecular design and target biology discovery reliably, reproducibly and at scale, so scientists can rely on it to get work done quickly and correctly.

Our Platform hinges on lab-in-the-loop workflows. We design and generate fit-for-purpose datasets for training models from scratch or fine-tuning them for specific applications, such as modeling disease progression or off-target liabilities. We then validate the models' predictions with orthogonal assays, including functional assays, and feed the results back into model training. This cycle allows Deep Genomics to advance rapidly on a range of drug discovery problems, both internally and in support of major pharma partners.
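As a rough illustration of the cycle described above, the following minimal Python sketch trains a model, predicts on new candidates, measures them, scores the predictions against the measurements, and folds the results back into training. Everything in it is a hypothetical stand-in: `run_assay` replaces a wet-lab experiment with a simple formula, `train` is an ordinary least-squares line fit rather than a foundation model, and the names and numbers are assumptions made for this example only, not a description of Deep Genomics' actual systems.

```python
from statistics import mean


def run_assay(x):
    """Hypothetical stand-in for a wet-lab measurement (not a real assay)."""
    return 3.0 * x + 2.0


def train(data):
    """Fit y = slope * x + intercept by least squares (stand-in for model training)."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in data) / var if var else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def lab_in_the_loop(initial, pool, rounds, batch_size):
    """Run train -> predict -> assay -> validate -> retrain for a few rounds."""
    data = list(initial)
    remaining = list(pool)
    errors = []
    for _ in range(rounds):
        if not remaining:
            break
        model = train(data)
        # Pick the next batch of candidates to test in the "lab".
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        measured = [(x, run_assay(x)) for x in batch]
        # Validate the model's predictions against the new measurements.
        errors.append(mean([abs(model(x) - y) for x, y in measured]))
        # Feed the results back in for the next round of training.
        data.extend(measured)
    return train(data), errors


model, errors = lab_in_the_loop(
    initial=[(0.0, 2.0)], pool=[1.0, 2.0, 3.0, 4.0], rounds=2, batch_size=2
)
print(errors)
```

In this toy setting the validation error drops after the first round of measurements is added to the training data, which is the behaviour the feedback loop is meant to produce.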

Engineering

Underpinning everything in our Platform is a commitment to robust information and software systems. They enable scalable, workflow-integrated AI and deliver reliability, reproducibility and correctness, helping ensure that we advance the molecules with the best chances of helping patients.

Our Biological Foundation Models

BigRNA is a proprietary, powerful BioFM that was trained on over a trillion signals derived from high-throughput sequencing datasets and has distilled the core principles of RNA biology. This allows it to serve as a versatile model for numerous downstream applications, such as predicting the effects of genetic variants, fine-tuning for predicting on- and off-target effects of genetic medicines, and generating rich sequence embeddings for other models used in target biology discovery and molecular design. Our BigRNA workflows include Weights and Biases integration and sophisticated model evaluation frameworks, provide meaningful scalar outputs, and enable fine-tuning on new datasets.

Since its initial development in 2021, BigRNA has evolved significantly. Key enhancements include a shift to higher prediction resolution (from 128-bp to 1-bp), the integration of new training datasets (e.g. relevant to disease contexts), and architectural improvements enabling higher predictive power and efficiency.
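To make the resolution change concrete, the short sketch below counts how many output positions a model would emit for one input window at 128-bp versus 1-bp resolution. The 4,096-bp window length is an assumed value chosen for illustration; the text above does not state BigRNA's input length.

```python
def num_output_bins(window_bp: int, resolution_bp: int) -> int:
    """Number of prediction bins needed to cover a window at a given resolution."""
    if window_bp <= 0 or resolution_bp <= 0:
        raise ValueError("window_bp and resolution_bp must be positive")
    # Ceiling division, so a partial final bin still counts as a bin.
    return -(-window_bp // resolution_bp)


WINDOW_BP = 4096  # assumed window length, for illustration only

coarse = num_output_bins(WINDOW_BP, 128)
fine = num_output_bins(WINDOW_BP, 1)
print(coarse, fine, fine // coarse)
```

For any window that is a multiple of 128 bp, moving to 1-bp resolution means 128 times as many predicted values per window.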

REPRESS is a deep learning foundation model that predicts cell-type-specific microRNA (miRNA) binding and mRNA degradation directly from RNA sequence. It has been shown to reveal biology missed by other state-of-the-art methods, such as identifying repressive non-canonical miRNA target sites and decoding the regulatory effects of sequence context and miRNA binding site multiplicity.

REPRESS outperforms other advanced methods and neural architectures on a comprehensive suite of orthogonal tasks, including identifying genetic variants that affect miRNA binding, predicting out-of-distribution data from massively parallel reporter assays, and predicting canonical and non-canonical miRNA-mediated repression.

DeepADAR can design guide RNAs (gRNAs) to induce ADAR-mediated editing across various trinucleotide contexts. The base DeepADAR model is trained to predict endogenous ADAR editing based on local sequence and structure around candidate editing sites. By observing endogenous editing at 16 million target sites, as well as a multitude of other sites where editing does not occur, the base DeepADAR model learned the subtle sequence and structural features that direct ADAR to edit specific sites. Using screening data from a custom dataset of synthetic gRNAs, we fine-tuned this base model to predict gRNA-driven editing, based on sequence and structural features created by gRNAs.
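ADAR enzymes convert adenosine to inosine. Assuming that "trinucleotide context" here means the target adenosine together with its immediate 5' and 3' neighbours, there are 4 × 4 = 16 such contexts. The sketch below enumerates them and extracts the context around a candidate site; the function names are made up for this example and are not part of DeepADAR.

```python
from itertools import product

BASES = "ACGU"


def editing_contexts():
    """All trinucleotides with a central A (the adenosine ADAR would edit)."""
    return [f"{five}A{three}" for five, three in product(BASES, repeat=2)]


def context_at(seq: str, pos: int) -> str:
    """Return the trinucleotide centred on the adenosine at 0-based position pos."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style input as well
    if pos <= 0 or pos >= len(rna) - 1:
        raise ValueError("site needs a neighbour on both sides")
    if rna[pos] != "A":
        raise ValueError("candidate editing site must be an adenosine")
    return rna[pos - 1 : pos + 2]


contexts = editing_contexts()
print(len(contexts), context_at("GCUAGC", 3))
```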

Our Experimental Capabilities

Deep Genomics views data generation and experimental validation as crucial components of our BioFM Platform. To develop fit-for-purpose training and validation datasets, as well as datasets for therapeutic programs, we operate two experimental facilities, with over 10,000 square feet of lab space between Toronto, Ontario and Cambridge, Massachusetts. In both locations, our experimental scientists work hand in hand with their machine learning and computational counterparts to design experiments and workflows that accelerate our mission.

Publications

Our scientists regularly release their work to benefit the field. Here is a selection of our contributions.

A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction

Bohn et al., Front. Mol. Biosci., September 2023.

Enigma: An Efficient Model for Deciphering Regulatory Genomics

FlashRNA: An Efficient Model for Regulatory Genomics

Sequence based prediction of cell type specific microRNA binding and mRNA degradation for therapeutic discovery

Assessing Hybridization-Dependent Off-Target Risk for Therapeutic Oligonucleotides: Updated Industry Recommendations

An RNA foundation model enables discovery of disease mechanisms and candidate therapeutics

Transcriptome-wide off-target effects of steric-blocking oligonucleotides

ATP7B Met645Arg causes Wilson disease by promoting exon 6 skipping

Deep learning in biomedicine

COSSMO: Predicting competitive alternative splice site selection using deep learning

Inference of the human polyadenylation code

Efficient in vivo correction of a splicing defect using an HDR-independent mechanism

Genome-wide characteristics of de novo mutations in autism

Automated analysis of high‐content microscopy data with deep learning

Machine learning in genomic medicine: A review of computational problems and data sets

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

The human splicing code reveals new insights into the genetic determinants of disease
