Building A World-Class Genetics Center Based On Data Scalability
- by 7wData
The ability to accelerate drug discovery has become a necessity. I recently spoke with Jeffrey Reid, Head of Genomics and Data Engineering for Regeneron. Reid works in the Regeneron Genetics Center (RGC), a research initiative that seeks to improve patient care by using genomic approaches to speed drug discovery and development. The genetics center is a unit of Regeneron (NASDAQ: REGN), a leading biotechnology company that has been at the forefront of drug discovery for three decades. The firm’s focus on translating science into medicine has led to seven FDA-approved treatments. The Regeneron Genetics Center is engaged in one of the largest genetic sequencing efforts in the world.
Reid describes his role as existing at the intersection of science and data, noting that he is responsible for “taking raw data and turning it into usable facts about genomes”. His role in data engineering entails the deployment of algorithms that enable drug development. As part of building a large genetic sequencing center, Reid works with more than 80 industry and academic research partners to combine genetics data with electronic health record (EHR) data to understand how genetics impact health.
To enable these drug discovery efforts, Reid and his team have deployed the Databricks technology platform to help mine genomic data at scale. Reid remarks, “We bring to bear a lot of robotics in the lab and analysis automation”. He emphasizes the urgency of operating at scale, given the billions of combinations of genotypes and phenotypes that can be mined for drug development insights. “We need to identify every possible association between each genotype and phenotype. This requires us to analyze billions of cells of information”, says Reid.
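To make the scale Reid describes concrete, here is a minimal Python sketch of an exhaustive genotype-by-phenotype scan. It is an illustration only, not Regeneron's or Databricks' actual pipeline: the variant and trait names, the toy values, the counts passed to `pair_count`, and the use of a plain Pearson correlation as the association measure are all assumptions made for the example.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pair_count(n_variants, n_phenotypes):
    """Number of genotype-phenotype pairs an exhaustive scan must test."""
    return n_variants * n_phenotypes


def scan_associations(genotypes, phenotypes):
    """Score every (variant, trait) pair; returns {(variant, trait): r}."""
    return {
        (v, t): pearson(genotypes[v], phenotypes[t])
        for v, t in product(genotypes, phenotypes)
    }


# Hypothetical toy data: allele counts (0, 1, 2) per person for each variant,
# and one measured value per person for each trait.
genotypes = {
    "variant_a": [0, 1, 2, 1, 0, 2],
    "variant_b": [2, 2, 1, 0, 0, 1],
}
phenotypes = {
    "trait_x": [1.0, 2.1, 3.2, 2.0, 0.9, 3.1],
    "trait_y": [5.0, 4.8, 5.1, 5.2, 4.9, 5.0],
}

results = scan_associations(genotypes, phenotypes)
for (variant, trait), r in sorted(results.items()):
    print(f"{variant} vs {trait}: r = {r:.3f}")

# With illustrative (assumed) counts of one million variants and one thousand
# traits, an exhaustive scan already requires a billion separate tests.
print(pair_count(1_000_000, 1_000))
```

Because the number of tests is the product of the two counts, the work grows far faster than either dataset alone, which is why a scan like this is spread across a cluster rather than run in a single loop.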
Databricks provides Regeneron with a scalable solution for mining these vast amounts of data. Reid notes that in the past, there was no scalable approach to managing volumes of data this large, and research companies depended on home-built solutions based on antiquated approaches and technologies. According to Reid, Databricks delivers an enterprise platform that operates on the FAIR data principles of making data “findable, accessible, interoperable, and reusable” and helps drive scientific insights. Reid characterizes the technology environment at Regeneron as one of “tune up, deploy, tear down clusters” that supports collaborative research initiatives such as Project Glow, an open-source toolkit for large-scale genomic analysis that was jointly created by the Regeneron Genetics Center and Databricks.