Journal articles

Simplitigs as an efficient and scalable representation of de Bruijn graphs

Abstract: de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. Using assemblies of model organisms and two bacterial pan-genomes as examples, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs substantially reduce both the cumulative sequence length and the number of sequences. When combined with the commonly used Burrows-Wheeler Transform index, simplitigs reduce memory usage as well as index loading and query times, as demonstrated with large-scale examples of GenBank bacterial pan-genomes.
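
To make the idea of the abstract concrete, the following is a minimal Python sketch of a greedy simplitig computation, assuming that simplitigs are strings whose k-mers together cover every k-mer of the input exactly once, with a k-mer and its reverse complement treated as the same k-mer. It is an illustration in the spirit of a greedy heuristic, not the ProphAsm implementation; the function names (`kmer_set`, `extend_forward`, `greedy_simplitigs`) are hypothetical and chosen for this example only.

```python
# Illustrative sketch only; names and structure are assumptions, not ProphAsm's code.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq[::-1].translate(_COMPLEMENT)


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def kmer_set(seqs, k):
    """Collect the canonical k-mers of all input sequences."""
    kmers = set()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmers.add(canonical(seq[i:i + k]))
    return kmers


def extend_forward(simplitig, kmers, k):
    """Append bases while some unused k-mer overlaps the last k-1 bases."""
    while True:
        suffix = simplitig[-(k - 1):]
        for base in "ACGT":
            candidate = canonical(suffix + base)
            if candidate in kmers:
                kmers.remove(candidate)  # each k-mer is used at most once
                simplitig += base
                break
        else:
            return simplitig  # no unused neighbour: extension stops


def greedy_simplitigs(seqs, k):
    """Greedily cover the k-mer set with strings that share no k-mer."""
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")
    kmers = kmer_set(seqs, k)
    simplitigs = []
    while kmers:
        seed = kmers.pop()  # arbitrary unused k-mer
        s = extend_forward(seed, kmers, k)
        # Extend in the other direction by working on the reverse complement.
        s = reverse_complement(extend_forward(reverse_complement(s), kmers, k))
        simplitigs.append(s)
    return simplitigs


if __name__ == "__main__":
    for s in greedy_simplitigs(["ACGTACGTTGCA", "TTGCAAGG"], k=4):
        print(s)
```

Because the seed k-mer is picked arbitrarily, the exact strings may differ between runs; what stays fixed in this sketch is that the output spells the same k-mer set as the input and that no k-mer is repeated across the output strings.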
Contributor: Gregory Kucherov
Submitted on: Wednesday, December 9, 2020 - 7:41:53 PM
Last modification on: Friday, April 1, 2022 - 3:54:12 AM
Long-term archiving on: Wednesday, March 10, 2021 - 8:06:54 PM


Files produced by the author(s)




Karel Brinda, Michael Baym, Gregory Kucherov. Simplitigs as an efficient and scalable representation of de Bruijn graphs. Genome Biology, BioMed Central, 2021, 22 (96), ⟨10.1101/2020.01.12.903443⟩. ⟨hal-03049400⟩


