Our scientific and technological roadmap is to build an integrated computational system that can learn, predict and interpret how genetic variation, whether natural or therapeutic, alters crucial cellular processes. These processes include transcription, splicing, polyadenylation and translation, and their alteration can lead to disease or effective therapies.

The technology developed at Deep Genomics is based on machine learning, a powerful and practical form of artificial intelligence. We develop new machine learning methods that can find patterns in massive datasets and infer computer models of how cells read the genome and generate biomolecules. In this way, our technology provides a causal interpretation for genetic variation, not just the correlative information given by industry-standard techniques. We can even generate networks of known and unknown variants based on how they affect the same cellular processes, something that was not previously possible. Any variant. Any disease.

Our approach opens the door to a wide range of new techniques for classifying, prioritizing, interpreting and linking genetic variants, whether natural or therapeutic.




  • Efficient in vivo correction of a splicing defect using an HDR-independent mechanism
    Kemaladewi D., Maino E., Hyatt E., Hou H., Ding M., Place K., Zhu X., Bassi P., Baghestani Z., Deshwar A., Merico D., Xiong H., Frey B., Wilson M., Ivakine E., Cohn R.
    Under review (Apr 2017)

  • Inference of the Human Polyadenylation Code (Journal)
    Michael KK Leung, Andrew Delong, Brendan J Frey
    International Conference on Research in Computational Molecular Biology (RECOMB) 2017.

  • The human splicing code reveals new insights into the genetic determinants of disease (Journal) (PubMed)
    Hui Y Xiong, Babak Alipanahi, Leo J Lee, Hannes Bretschneider, Daniele Merico, Ryan KC Yuen, Yimin Hua, Serge Gueroussov, Hamed S. Najafabadi, Timothy R Hughes, Quaid Morris, Yoseph Barash, Adrian R Krainer, Nebojsa Jojic, Stephen W Scherer, Benjamin J Blencowe, Brendan J Frey
    Science Express, doi:10.1126/science.1254806, December 2014.
    Science, Vol. 347, No. 6218, January 2015.
    The original genome-wide human splicing index, SPIDEX, is available from Annovar.

  • Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning (Journal)
    Babak Alipanahi, Andrew Delong, Matthew T Weirauch, Brendan J Frey
    Nature Biotechnology, doi:10.1038/nbt.3300, August 2015.

  • Deep learning of the tissue-regulated splicing code (PDF) (Journal)
    Michael KK Leung, Hui Y Xiong, Leo J Lee, Brendan J Frey
    Proceedings of the 22nd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), June 2014.
    Bioinformatics, Vol. 30, No. 12, i121-i129.

  • Machine learning in genomic medicine: a review of computational problems and data sets (Journal)
    Michael KK Leung, Andrew Delong, Babak Alipanahi, Brendan J Frey
    Proceedings of the IEEE, Vol. 104, No. 1, January 2016.

  • Genome-wide characteristics of de novo mutations in autism (Journal)
    Ryan KC Yuen, Daniele Merico, Hongzhi Cao, Giovanna Pellecchia, Babak Alipanahi, Bhooma Thiruvahindrapuram, Xin Tong, Yuhui Sun, Dandan Cao, Tao Zhang, Xueli Wu, Xin Jin, Ze Zhou, Xiaomin Liu, Thomas Nalpathamkalam, Susan Walker, Jennifer L Howe, Zhuozhi Wang, Jeffrey R MacDonald, Ada JS Chan, Lia D’Abate, Eric Deneault, Michelle T Siu, Kristiina Tammimies, Mohammed Uddin, Mehdi Zarrei, Mingbang Wang, Yingrui Li, Jun Wang, Jian Wang, Huanming Yang, Matt Bookman, Jonathan Bingham, Samuel S Gross, Dion Loy, Mathew Pletcher, Christian R Marshall, Evdokia Anagnostou, Lonnie Zwaigenbaum, Rosanna Weksberg, Bridget A Fernandez, Wendy Roberts, Peter Szatmari, David Glazer, Brendan J Frey, Robert H Ring, Xun Xu & Stephen W Scherer
    npj Genomic Medicine, doi:10.1038/npjgenmed.2016.27, August 2016.