By accessing Google Cloud Platform through Google Genomics, researchers at the National Institute on Aging can more securely store, process, explore, and share large biological datasets.
The National Institute on Aging works with the International Parkinson’s Disease Genomics Consortium, a broad collaboration of scientists striving to characterize the molecular changes associated with this debilitating disease. A recent study involved compiling information from thousands of exomes (an exome is the protein-coding portion of an individual’s genome), using data generated at various research institutes on different sequencing platforms over several years.
To support sound scientific discoveries from so many data sources, the data first had to be reanalyzed in a consistent way. To reduce the risk of technical artifacts, scientists needed to realign, recalibrate, and re-genotype the exomes. But there was a problem: none of the consortium members had enough local computational resources to process all 6,500 exomes.
The team decided to use Google Genomics, a fully managed service on Google Cloud Platform. Scientist Mike Nalls ran the Broad Institute’s GATK Best Practices pipeline using Google Genomics, processing the full 200 TB set of 6,500 exomes, from raw, unaligned sequence data through to a set of variant calls, in just three and a half weeks. The dataset was subsequently used to identify six new risk loci for Parkinson’s disease, helping scientists better understand genetic risks for the disease.
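To put those figures in perspective, the short Python sketch below works out the implied average throughput. It is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 1,000 GB) and a pipeline that ran continuously for the full three and a half weeks, neither of which the article states, and the function name is purely illustrative.

```python
# Back-of-the-envelope throughput from the figures quoted in the article.
# Assumptions (not stated in the source): decimal units (1 TB = 1000 GB)
# and continuous processing for the entire 3.5-week period.

TOTAL_DATA_TB = 200      # size of the full exome dataset
NUM_EXOMES = 6_500       # number of exomes processed
DURATION_WEEKS = 3.5     # reported wall-clock time


def throughput_summary(total_tb, num_exomes, weeks):
    """Return average per-exome size and daily processing rates."""
    days = weeks * 7
    return {
        "gb_per_exome": total_tb * 1000 / num_exomes,
        "exomes_per_day": num_exomes / days,
        "tb_per_day": total_tb / days,
    }


summary = throughput_summary(TOTAL_DATA_TB, NUM_EXOMES, DURATION_WEEKS)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, the run averaged roughly 31 GB of data per exome and about 265 exomes, or a little over 8 TB, per day.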
“Cloud computing allowed us to speed up discovery,” says Mike Nalls, PhD, Scientist at the National Institute on Aging. “We collaborated with Google Genomics to test varying implementations of the standard processing pipeline for exome sequence data on the cohort and population scale.”
Nalls could have run the analysis even faster, but he opted to limit the number of virtual machines and disks to take advantage of sustained use discounts and reduce costs. On local infrastructure, even if the hardware could have been procured, the effort would have taken months of compute time. With Google Genomics on Google Cloud Platform, the National Institute on Aging can now analyze massive datasets, giving scientists access to virtually unlimited compute resources for large-scale projects.
To learn more about how cloud computing enables new discoveries in weeks rather than months, download the paper "National Institute on Aging: Accelerating the fight against Parkinson’s Disease."