Deep learning meets genome biology

Featured in O'Reilly, April 27, 2016

Genome biology, as a field, is generating torrents of data. You will soon be able to sequence your genome using a cell-phone-sized device for less than the cost of a trip to the corner store. And yet, the genome is only part of the story: there exist huge amounts of data that describe cells and tissues. We, as humans, can’t quite grasp all this data: we don’t yet know enough biology. Machine learning can help solve the problem.

At the same time, others in the machine learning community recognize this need. At last year’s premier conference on machine learning, four panelists—Yann LeCun, director of AI at Facebook; Demis Hassabis, co-founder of DeepMind; Neil Lawrence, professor at the University of Sheffield; and Kevin Murphy from Google—identified medicine as the next frontier for deep learning.

To succeed, we need to bridge the “genotype-phenotype divide.” Genomic and phenotype data abound. Unfortunately, the state of the art in meaningfully connecting these data is a slow, expensive, and inaccurate process of literature searches and detailed wet-lab experiments. To close the loop, we need systems that can determine intermediate phenotypes called “molecular phenotypes,” which function as stepping stones from genotype to disease phenotype. For this, machine learning is indispensable.