
Simplitigs as an efficient and scalable representation of de Bruijn graphs

Abstract : De Bruijn graphs play an essential role in computational biology. However, despite their widespread use, they lack a universal scalable representation suitable for different types of genomic data sets. Here, we introduce simplitigs as a compact, efficient and scalable representation and present a fast algorithm for their computation. Using several model organisms and two bacterial pan-genomes as examples, we show that, compared to the best existing representation, simplitigs substantially reduce both the cumulative sequence length and the number of sequences, especially for graphs with many branching nodes. We demonstrate that this improvement grows as more data become available. Combined with the commonly used Burrows-Wheeler Transform index of genomic sequences, simplitigs substantially reduce memory usage as well as index loading and query times, as illustrated with large-scale examples of GenBank bacterial pan-genomes.
Document type :
Preprints, Working Papers, ...

https://hal-upec-upem.archives-ouvertes.fr/hal-03049400
Contributor : Gregory Kucherov
Submitted on : Wednesday, December 9, 2020 - 7:41:53 PM
Last modification on : Monday, December 14, 2020 - 8:39:32 AM
Long-term archiving on : Wednesday, March 10, 2021 - 8:06:54 PM

File

2020.01.12.903443v3.full.pdf
Files produced by the author(s)

Citation

Karel Brinda, Michael Baym, Gregory Kucherov. Simplitigs as an efficient and scalable representation of de Bruijn graphs. 2020. ⟨hal-03049400⟩
