Big Data Tamps Down HIV Outbreaks


One of the best ways to prevent the spread of HIV is to treat those at high risk with a daily prophylactic pill. Unfortunately, this week Stanford University health researchers concluded that it’s simply too expensive to pre-treat even a fraction of people at increased risk for HIV.

But what if healthcare providers could track a brewing outbreak in real time, and quickly help those at highest risk of infection? Thanks to big data and crackerjack new software, Canada’s westernmost province is doing just that.

In June 2014, a monitoring system operated by the British Columbia Centre for Excellence in HIV/AIDS (BC-CfE) detected a cluster of 11 new HIV cases in a town just outside Vancouver. The system, designed by bioinformatician Art Poon, analyzes massive amounts of HIV genetic data to detect outbreaks.

Such data is surprisingly easy to come by. In many developed countries, it is now routine for a doctor to sequence viral DNA from the blood of an HIV-positive patient. By doing so, the physician can identify which drugs, if any, the virus is resistant to and prescribe an optimal treatment.

In Canada, that DNA sequence data is regularly uploaded to BC-CfE’s secure Oracle database, home to more than 30,000 anonymized HIV genotypes. Every time new sequences are added, which happens almost every day, the entire database is downloaded to a secure workstation, where Poon’s software works its magic. During the download, all patient information is de-identified. “The system is designed to maintain patient privacy,” says Poon.
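
The article does not say how the de-identification step is implemented. As a rough illustration only, one common approach is to drop direct identifiers and replace them with a keyed pseudonym, so records from the same patient can still be linked without revealing who the patient is. The field names, the `deidentify` helper, and the key below are all hypothetical and are not taken from BC-CfE's system.

```python
import hashlib
import hmac

# Hypothetical field names; the real schema is not described in the article.
DIRECT_IDENTIFIERS = {"name", "health_number", "address"}


def deidentify(record, secret_key):
    """Return a copy of the record with direct identifiers removed.

    A keyed hash (HMAC-SHA256) of the health number serves as a stable
    pseudonym, so repeat samples from one patient still group together.
    """
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(
        secret_key, record["health_number"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    clean["pseudonym"] = digest[:16]  # shortened for readability
    return clean


# Made-up example record, for illustration only.
raw = {
    "name": "Jane Doe",
    "health_number": "0000-000-000",
    "address": "1 Example St",
    "region": "Region A",
    "sequence": "ACGTACGTAC",
}
safe = deidentify(raw, secret_key=b"example-key-not-for-real-use")
```

Here `safe` keeps the sequence and the coarse demographic field but carries none of the direct identifiers.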

Once the download is complete, the software analyzes the de-identified DNA and demographic information to determine where new infections have popped up, whether they carry drug-resistance mutations, and how they are related. HIV evolves very quickly, so if sequences from different infections are genetically similar, those infections are almost surely related by one or more recent transmissions.
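
To make the idea of linking infections by genetic similarity concrete, here is a minimal sketch. It is not Poon's software, whose method the article does not describe. It assumes the sequences are already aligned to the same length, measures distance as the fraction of positions that differ, and groups samples into a cluster whenever a chain of pairwise distances falls under a cutoff. The function names, the toy sequences, and the 0.15 threshold are all illustrative assumptions.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def find_clusters(seqs, threshold):
    """Group sample IDs connected by pairwise distances <= threshold.

    Uses a small union-find structure; singletons are not reported.
    """
    parent = {sid: sid for sid in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for sid in seqs:
        groups.setdefault(find(sid), set()).add(sid)
    return [g for g in groups.values() if len(g) > 1]


# Toy data, far shorter than real HIV sequences.
samples = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",  # one difference from s1
    "s3": "ACGTACGAAT",  # one difference from s2, two from s1
    "s4": "TTGCATGCCA",  # very different from the rest
}
clusters = find_clusters(samples, threshold=0.15)
```

With this toy data, `s1`, `s2`, and `s3` end up in one cluster even though `s1` and `s3` are not directly within the cutoff, because `s2` links them; `s4` is left out. Comparing every pair is fine for a toy example, but a database of tens of thousands of genotypes would call for something more efficient.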

Yves Mulkers

Yves Mulkers is the founder of 7wData and a widely followed voice in the data and AI community. He curates the 7wData and AI Beat newsletters, reaching hundreds of thousands of data and AI professionals, and writes on data strategy, analytics, AI, and the evolving data ecosystem.