Skip to Main content Skip to Navigation

Lossless Filter for Long Multiple Repetitions with Edit Distance

Abstract : Identifying local similarity between two or more sequences, or identifying repe- titions occurring at least twice in a sequence, is an essential part in the analysis of biologi- cal sequences and of their phylogenetic relationship. Finding fragments that are conserved among several given sequences, or inside a unique sequence, while allowing for a certain number of insertions, deletions, and substitutions, is however known to be a computation- ally expensive task, and consequently exact methods can usually not be applied in practice. The filter we introduce in this paper, called Ed'Nimbus, provides a possible solution to this problem. It can be used as a preprocessing step to any multiple alignment method, eliminating an important fraction of the input that is guaranteed not to contain any ap- proximate repetition. It consists in the verification of a strong necessary condition. This condition concerns the number and order of exactly repeated words shared by the approx- imate repetitions. The efficiency of the filter is due to this condition, that we show how to check in a fast way. The speed of the method is achieved thanks also to the use of a simple and efficient data structure, that we describe in this paper, as well as its linear time and space construction. Our results show that using Ed'Nimbus allows us to sensibly reduce the time spent in a local multiple alignment. The Ed'Nimbus software is freely available on the web:∼peterlon/officiel/ednimbus/
Complete list of metadata

Cited literature [32 references]  Display  Hide  Download
Contributor : Pierre Peterlongo Connect in order to contact the contributor
Submitted on : Thursday, September 29, 2011 - 4:28:21 PM
Last modification on : Saturday, January 15, 2022 - 3:56:20 AM
Long-term archiving on: : Friday, December 30, 2011 - 2:30:43 AM


Files produced by the author(s)


  • HAL Id : hal-00627831, version 1



Pierre Peterlongo, Nadia Pisanti, Alair Peirera Do Lago, Marie-France Sagot. Lossless Filter for Long Multiple Repetitions with Edit Distance. 2006. ⟨hal-00627831⟩



Record views


Files downloads