UK scientists designed world’s most sophisticated COVID-19 sequencing system - here’s how they did it

New bioinformatics software and cloud computing approaches developed at the University of Birmingham have enabled the UK’s COVID-19 genome sequencing effort to be the most sophisticated in the world.

The system, called CLIMB-COVID, was designed for the COVID-19 Genomics UK (COG-UK) consortium, set up in March 2020 to tackle the huge challenge of rapidly sequencing SARS-CoV-2 genomes. In a new paper, published this month in Genome Biology, the research team discusses the approach they took.

The first version of CLIMB-COVID was designed and built by researchers at the University of Birmingham and Cardiff University in under a month. It has been instrumental in processing the sequencing data of more than 675,000 coronavirus genomes, including identifying and tracking the Alpha and Delta variants that became dominant in the UK. CLIMB-COVID also integrates new software from collaborators at the University of Edinburgh and the Centre for Genomic Pathogen Surveillance.

CLIMB-COVID enables a distributed sequencing system, harnessing sequencing capability from universities, academic institutes and the UK’s four public health agencies. The CLIMB-COVID software and database infrastructure was able to receive all this data, process it and help turn it into interpretable outputs for public health analysts.

All the data from the project has been integrated and shared in real time. Not only has this enabled the UK’s public health agencies to work together more easily, but seamless access and collaboration with academics has also enabled the early detection and evaluation of new variants of the virus.

Understanding viral evolution is important for tracking how the virus is spreading in local, national and international settings. It provides valuable epidemiological information, revealing the chains of transmission that must be broken in order to stop outbreaks.

Dr Samuel Nicholls, lead author on the paper, says: "Building this kind of decentralised sequencing system has not been possible before now, because the software infrastructure has not been available. By designing that system, we have shown how genetic sequencing can be used as a vital tool in any public health response."

The COG-UK consortium benefited particularly from cloud computing resources established as part of the CLIMB-BIG-DATA project, in which the University of Birmingham and Cardiff University have played a pivotal role, as well as from the University of Birmingham’s BEAR high performance computing infrastructure, which provided additional capacity. This cloud infrastructure provides the computing and storage capacity required to analyse the large genome datasets produced by the consortium, as well as facilitating national and international research capabilities.

Dr Nicholls adds: "The CLIMB-COVID system is open source. That means anyone in the world can access our computer code and all genomic data, and can see how we work. We have never seen such a co-ordinated, sustained effort to generate real-time genomic surveillance data at this scale and pace, and this is why the UK is world-leading in the genomic sequencing of SARS-CoV-2."