Whole genome sequencing holds great potential for enriching diagnoses and understanding hereditary risk factors for specific diseases. However, the sheer volume of data involved poses major technical challenges that limit the utility of this approach. For this reason, many clinical geneticists have turned to exome sequencing, which looks only at the small portion of the genome that codes for proteins.
A team from the University of Chicago have turned the spotlight back on whole genome sequencing by analyzing 240 full genomes in two days, using the computational muscle of Beagle, one of the world’s fastest supercomputers. Beagle is a Cray XE6 supercomputer housed at Argonne National Laboratory outside Chicago, and is used for computation, simulation, and data analysis by the biomedical research community.
Beagle's architecture allows highly efficient and rapid processing of parallel data streams. To give some idea of just how powerful Beagle is, the researchers estimate that the equivalent task carried out by a single 2.1 GHz CPU would take approximately 47.2 years to complete.
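To put those figures side by side, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted above (240 genomes, two days, 47.2 single-CPU years). The variable names and the 365.25-day year are assumptions made for illustration, not anything taken from the paper.

```python
# Figures quoted in the article
genomes = 240
wall_clock_days = 2.0
single_cpu_years = 47.2

# Assumption for illustration: an average calendar year of 365.25 days
DAYS_PER_YEAR = 365.25

# Convert the single-CPU estimate to days so both figures share a unit
single_cpu_days = single_cpu_years * DAYS_PER_YEAR

# Implied speedup of the supercomputer run over one 2.1 GHz CPU
speedup = single_cpu_days / wall_clock_days

# Single-CPU time that each genome would account for on average
cpu_days_per_genome = single_cpu_days / genomes

print(f"Implied speedup over one CPU: about {speedup:,.0f}x")
print(f"Single-CPU time per genome: about {cpu_days_per_genome:.0f} days")
```

On these assumptions the run works out to a speedup of roughly 8,600-fold, or a little over two months of single-CPU time per genome.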
According to one of the lead investigators, Professor Elizabeth McNally:
"Improving analysis through both speed and accuracy reduces the price per genome. With this approach, the price for analyzing an entire genome is less than the cost of looking at just a fraction of the genome. New technology promises to bring the costs of sequencing down to around $1,000 per genome. Our goal is to get the cost of analysis down into that range."
The team have published their results, in great technical depth, in the journal Bioinformatics. While we won’t see this kind of technology in clinics anytime soon, it should certainly enhance the pace and clinical utility of whole genome sequencing.
Bioinformatics: Supercomputing for the parallelization of whole genome analysis
Press release: Whole Genome Analysis, STAT