Lossless Filter for Long Multiple Repetitions with Edit Distance - Archive ouverte HAL Accéder directement au contenu
Rapport Année : 2006

Lossless Filter for Long Multiple Repetitions with Edit Distance

Nadia Pisanti
  • Fonction : Auteur
Alair Peirera Do Lago
  • Fonction : Auteur
Marie-France Sagot

Résumé

Identifying local similarity between two or more sequences, or identifying repe- titions occurring at least twice in a sequence, is an essential part in the analysis of biologi- cal sequences and of their phylogenetic relationship. Finding fragments that are conserved among several given sequences, or inside a unique sequence, while allowing for a certain number of insertions, deletions, and substitutions, is however known to be a computation- ally expensive task, and consequently exact methods can usually not be applied in practice. The filter we introduce in this paper, called Ed'Nimbus, provides a possible solution to this problem. It can be used as a preprocessing step to any multiple alignment method, eliminating an important fraction of the input that is guaranteed not to contain any ap- proximate repetition. It consists in the verification of a strong necessary condition. This condition concerns the number and order of exactly repeated words shared by the approx- imate repetitions. The efficiency of the filter is due to this condition, that we show how to check in a fast way. The speed of the method is achieved thanks also to the use of a simple and efficient data structure, that we describe in this paper, as well as its linear time and space construction. Our results show that using Ed'Nimbus allows us to sensibly reduce the time spent in a local multiple alignment. The Ed'Nimbus software is freely available on the web: http://igm.univ-mlv.fr/∼peterlon/officiel/ednimbus/
Fichier principal
Vignette du fichier
Hal.pdf (284.17 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-00627831 , version 1 (29-09-2011)

Identifiants

  • HAL Id : hal-00627831 , version 1

Citer

Pierre Peterlongo, Nadia Pisanti, Alair Peirera Do Lago, Marie-France Sagot. Lossless Filter for Long Multiple Repetitions with Edit Distance. 2006. ⟨hal-00627831⟩
115 Consultations
95 Téléchargements

Partager

Gmail Facebook X LinkedIn More