Integrating standardized whole genome sequence analysis with a global Mycobacterium tuberculosis antibiotic resistance knowledgebase
10 Pages, 1 Figure, 3 Tables. Supplementary information: http://dx.doi.org/10.1038/s41598-018-33731-1
Drug-resistant tuberculosis poses a persistent public health threat. The ReSeqTB platform is a collaborative, curated knowledgebase designed to standardize and aggregate global Mycobacterium tuberculosis complex (MTBC) variant data from whole genome sequencing (WGS) together with phenotypic drug susceptibility testing (DST) and clinical data. We developed a unified analysis variant pipeline (UVP) ( https://github.com/CPTR-ReSeqTB/UVP ) to identify variants and assign lineage from MTBC sequence data. This open source tool incorporates stringent thresholds and quality control measures. The pipeline was validated using a well-characterized dataset of 90 diverse MTBC isolates with conventional DST and DNA Sanger sequencing data. The UVP exhibited 98.9% agreement with the variants identified using Sanger sequencing and was 100% concordant with conventional methods of assigning lineage. We analyzed 4636 publicly available MTBC isolates in the ReSeqTB platform, representing all seven major MTBC lineages. The variants detected predict drug resistance with greater than 94% accuracy, as judged against the accompanying DST results in the platform. The aggregation of variants over time in the platform will establish confidence-graded mutations statistically associated with phenotypic drug resistance. These tools serve as critical reference standards for future molecular diagnostic assay developers, researchers, public health agencies and clinicians working towards the control of drug-resistant tuberculosis.
This study was supported by the Bill & Melinda Gates Foundation under grant agreement OPP1115887 to C-Path for developing the ReSeqTB drug resistance data sharing platform and under grant agreement FIND OPP1115209 to address how to score mutations in the ReSeqTB data sharing platform initiative. The South African MRC and the EDCTP support K. Dheda. I. Comas is supported by the Ministerio de Economía y Competitividad (Spanish Government) research grant SAF2016-77346-R and the European Research Council (ERC) (638553-TB-ACCELERATE). L. Chindelevitch acknowledges support by NSERC, Genome Canada, and the Sloan Foundation. Use of trade names is for identification only and does not constitute endorsement by the US Department of Health and Human Services, the US Public Health Service, or the Centers for Disease Control and Prevention. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the funding agency.
Peer reviewed
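The agreement and accuracy figures quoted in the abstract come from comparing variant-based calls with phenotypic DST results. As a minimal sketch of that kind of comparison, and not the UVP or ReSeqTB implementation, the Python below tallies a confusion matrix for one drug. The `Isolate` record, its field names, and the example isolates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Isolate:
    """Hypothetical record pairing a genotype call with a phenotypic DST result."""
    isolate_id: str
    has_resistance_variant: bool  # assumed output of a variant-calling step
    dst_resistant: bool           # assumed phenotypic DST result for one drug


def concordance(isolates: List[Isolate]) -> Dict[str, Optional[float]]:
    """Compare genotype-based predictions with DST and return summary metrics."""
    if not isolates:
        raise ValueError("at least one isolate is required")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for i in isolates if i.has_resistance_variant and i.dst_resistant)
    tn = sum(1 for i in isolates if not i.has_resistance_variant and not i.dst_resistant)
    fp = sum(1 for i in isolates if i.has_resistance_variant and not i.dst_resistant)
    fn = sum(1 for i in isolates if not i.has_resistance_variant and i.dst_resistant)

    return {
        "accuracy": (tp + tn) / len(isolates),
        # Sensitivity and specificity are undefined when their denominator is zero.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Made-up example data, not drawn from the study.
example = [
    Isolate("iso-1", True, True),
    Isolate("iso-2", False, False),
    Isolate("iso-3", True, False),
    Isolate("iso-4", False, False),
]
metrics = concordance(example)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy is a design choice of this sketch: accuracy alone can look high when most isolates are susceptible.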