Real-time search of all bacterial and viral genomic data - Archive ouverte HAL
Journal article, Nature Biotechnology, Year: 2019

Real-time search of all bacterial and viral genomic data

Abstract

Genome sequencing of pathogens is becoming ubiquitous in microbiology, and automated diagnostics will soon appear. The genome sequence archives, already growing exponentially, are currently not searchable for arbitrary sequences. Such an ability would unlock this resource for science and could underpin global real-time genomic epidemiology and surveillance. We combine knowledge about bacterial genetic variation with ideas used in web search to build a DNA search engine for microbial data that can grow incrementally. We index the complete corpus of bacterial and viral whole-genome sequence data as of December 2016 (447,833 genomes, 176 terabytes) using four orders of magnitude less storage than previous methods, making the global archive accessible to search for the first time and scaling to millions of genomes. We demonstrate its usefulness with three applications: ultra-fast search for resistance genes MCR1-3, host-range determination for 2,827 plasmids, and quantification of the rise of antibiotic resistance prevalence in the archives.
Main file
NatureBiotech_bigsi_resubmission_ed.pdf (1.48 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02329753 , version 1 (23-10-2019)

Cite

Phelim Bradley, Henk C den Bakker, Eduardo P C Rocha, Gil McVean, Zamin Iqbal. Real-time search of all bacterial and viral genomic data. Nature Biotechnology, 2019, 37 (2), pp.152-159. ⟨10.1038/s41587-018-0010-1⟩. ⟨hal-02329753⟩