Researchers have developed ultra-fast software that cuts the time needed to search a person’s genome for disease-causing variations from weeks to hours.
"Now, even the smallest research groups can complete genomic sequencing in a matter of days. However, once you’ve generated all that data, that’s the point where many groups hit a wall.
"After a genome is sequenced, scientists are left with billions of data points to analyse before any truly useful information can be gleaned for use in research and clinical settings," White said.
To overcome the challenges of analysing that large amount of data, White and his team developed a computational pipeline called "Churchill."
By using novel computational techniques, Churchill allows efficient analysis of a whole genome sample in as little as 90 minutes.
The output of Churchill was validated using National Institute of Standards and Technology (NIST) benchmarks.
In comparison with other computational pipelines, Churchill was shown to have the highest sensitivity at 99.7 per cent, the highest accuracy at 99.99 per cent and the highest overall diagnostic effectiveness at 99.66 per cent.
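For readers unfamiliar with these measures, the sketch below shows how sensitivity and accuracy are conventionally computed when a pipeline's variant calls are compared against a benchmark truth set. The counts are hypothetical and chosen only for illustration; they are not figures from the Churchill validation, and the function names are assumptions made for this example. Diagnostic effectiveness is left out because its formula is not given here.

```python
# Standard definitions of two benchmark metrics.
# All counts below are HYPOTHETICAL, not results from the Churchill study.

def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of real variants that the pipeline actually found."""
    return true_pos / (true_pos + false_neg)

def accuracy(true_pos: int, true_neg: int, false_pos: int, false_neg: int) -> float:
    """Share of all calls (variant or non-variant) that were correct."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total

# Hypothetical comparison against a benchmark call set.
tp, fn, fp, tn = 997, 3, 1, 9_999

print(f"sensitivity: {sensitivity(tp, fn):.2%}")
print(f"accuracy:    {accuracy(tp, tn, fp, fn):.2%}")
```

Sensitivity penalises only missed variants, while accuracy also counts false alarms and correctly rejected sites, which is why the two figures can differ for the same pipeline.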
To demonstrate Churchill’s capability to perform population-scale analysis, White and his team used an award from the Amazon Web Services (AWS) in Education Research Grants programme to analyse the raw data generated in phase 1 of the 1000 Genomes Project.
The project is an international collaboration to produce an extensive public catalogue of human genetic variation, representing multiple populations from around the globe.
Using cloud-computing resources from AWS, Churchill completed the analysis of 1,088 whole-genome samples in seven days and identified millions of new genetic variants.
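A rough back-of-the-envelope calculation puts that throughput in perspective. The sketch below uses only the two figures reported (1,088 samples, seven days) plus the best-case 90 minutes per genome. The final step assumes every cloud run took exactly 90 minutes, which is an assumption made for illustration and not something the researchers reported.

```python
# Back-of-the-envelope throughput for the 1000 Genomes run.
SAMPLES = 1088                 # whole-genome samples analysed
DAYS = 7                       # reported wall-clock duration
BEST_CASE_MINUTES = 90         # "as little as 90 minutes" per genome

total_minutes = DAYS * 24 * 60                     # wall-clock minutes available
samples_per_day = SAMPLES / DAYS                   # average daily throughput
wall_minutes_per_sample = total_minutes / SAMPLES  # average wall-clock time per sample

# ASSUMPTION: each sample took the best-case 90 minutes on the cloud.
# Under that assumption, this many samples ran side by side on average.
implied_concurrency = BEST_CASE_MINUTES / wall_minutes_per_sample

print(f"samples per day:          {samples_per_day:.1f}")
print(f"wall minutes per sample:  {wall_minutes_per_sample:.1f}")
print(f"implied parallel samples: {implied_concurrency:.1f}")
```

If individual runs took longer than the best case, the number of genomes processed in parallel would have been correspondingly higher.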
The Churchill algorithm was licensed to Columbus-based GenomeNext LLC, which has built upon the Churchill technology to develop a secure and automated software-as-a-service platform.
The platform lets users upload raw whole-genome, exome or targeted-panel sequence data to the GenomeNext system and run an analysis that not only identifies genetic variants but also generates fully annotated datasets, which can then be filtered to identify pathogenic variants.