Journal articles

Simplitigs as an efficient and scalable representation of de Bruijn graphs

Abstract: de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. Using assemblies of model organisms and two bacterial pan-genomes as examples, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs substantially reduce both the cumulative sequence length and the number of sequences. When combined with the commonly used Burrows-Wheeler Transform index, simplitigs reduce memory usage as well as index loading and query times, as demonstrated with large-scale examples of GenBank bacterial pan-genomes.
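To make the two measures named in the abstract (number of sequences and cumulative sequence length) concrete, the following is a minimal Python sketch of a greedy way to cover a k-mer set with strings in which every k-mer occurs exactly once. It is an illustration under simplifying assumptions, not ProphAsm itself: the function names `kmers_of` and `greedy_simplitigs` are hypothetical, k-mers are treated as single-stranded (reverse complements are ignored), and the seed k-mer is chosen as the lexicographically smallest one only to make the output reproducible.

```python
from typing import Iterable, List, Set

ALPHABET = "ACGT"


def kmers_of(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers occurring in seq (single strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_simplitigs(kmers: Iterable[str]) -> List[str]:
    """Greedily cover a k-mer set with strings that use each k-mer exactly once.

    Assumption of this sketch: all k-mers have the same length and reverse
    complements are ignored.
    """
    remaining = set(kmers)
    result: List[str] = []
    while remaining:
        # Deterministic seed choice (an assumption made for reproducibility).
        s = min(remaining)
        remaining.remove(s)
        k = len(s)

        # Extend to the right while some unused k-mer overlaps the last k-1 bases.
        extended = True
        while extended:
            extended = False
            suffix = s[len(s) - k + 1:]
            for c in ALPHABET:
                if suffix + c in remaining:
                    remaining.remove(suffix + c)
                    s += c
                    extended = True
                    break

        # Extend to the left while some unused k-mer overlaps the first k-1 bases.
        extended = True
        while extended:
            extended = False
            prefix = s[:k - 1]
            for c in ALPHABET:
                if c + prefix in remaining:
                    remaining.remove(c + prefix)
                    s = c + s
                    extended = True
                    break

        result.append(s)
    return result


if __name__ == "__main__":
    tigs = greedy_simplitigs(kmers_of("ACGTTGCA", 3))
    print(len(tigs), sum(len(t) for t in tigs))  # number of sequences, cumulative length
```

In this sketch, a set of N k-mers covered by m strings always has cumulative length N + m(k - 1), so producing fewer strings directly shortens the total sequence.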

https://hal-upec-upem.archives-ouvertes.fr/hal-03049400
Contributor: Gregory Kucherov
Submitted on: Wednesday, December 9, 2020 - 7:41:53 PM
Last modification on: Friday, April 1, 2022 - 3:54:12 AM
Long-term archiving on: Wednesday, March 10, 2021 - 8:06:54 PM

File

2020.01.12.903443v3.full.pdf
Files produced by the author(s)


Citation

Karel Brinda, Michael Baym, Gregory Kucherov. Simplitigs as an efficient and scalable representation of de Bruijn graphs. Genome Biology, BioMed Central, 2021, 22 (96), ⟨10.1101/2020.01.12.903443⟩. ⟨hal-03049400⟩
