Regular expression constrained sequence alignment revisited

Gregory Kucherov; Tamar Pinhas; Michal Ziv-Ukelson

doi:10.1089/cmb.2010.0291

Journal article, Journal of Computational Biology, Year: 2011


Gregory Kucherov

Role: Author
PersonId: 14903
IdHAL: gregory-kucherov
ORCID: 0000-0001-5899-5424
IdRef: 093602189

Laboratoire d'Informatique Fondamentale de Lille

Tamar Pinhas

Role: Author

Department of Computer Science [Beer-Sheva]

Michal Ziv-Ukelson

Role: Author

Department of Computer Science [Beer-Sheva]

Abstract

Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, together with an O(n^2 t^4) time and O(n^2 t^2) space algorithm for solving it, where n is the length of the input strings and t is the number of states in the input nondeterministic automaton. A faster O(n^2 t^3) time algorithm for the same problem was subsequently proposed. In this article, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst-case time complexity bound to O(n^2 t^3 / log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve the worst-case bound, our simulations show that both approaches are efficient in practice, especially when the input automata are dense.
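To make the problem statement concrete, below is a minimal Python sketch of the straightforward dynamic program behind the O(n^2 t^4) time, O(n^2 t^2) space bound. It is not the authors' improved algorithm and uses neither Straight-Line Programs nor Steiner trees. Several assumptions are ours and not taken from the paper. We read the constraint as requiring some contiguous block of alignment columns whose first-string part and second-string part are both accepted by the automaton. The NFA has no epsilon moves and is encoded as a dictionary `(state, symbol) -> successor states`. The scoring is a hypothetical +2 match, -1 mismatch and -2 linear gap penalty. The names `constrained_align`, `sub`, `GAP` and `EXAMPLE_DELTA` are illustrative only. In this sketch, each cell can hold up to t^2 state pairs, and each pair can have up to t^2 successor pairs on a diagonal step. That product is where the t^4 factor comes from.

```python
from typing import Dict, List, Set, Tuple

NEG = float("-inf")
GAP = -2  # hypothetical linear gap penalty per gap column

# Hypothetical NFA encoding: (state, symbol) -> set of successor states, no epsilon moves.
Delta = Dict[Tuple[int, str], Set[int]]

# Example NFA for the single word "ab": 0 --a--> 1 --b--> 2, final state 2.
EXAMPLE_DELTA: Delta = {(0, "a"): {1}, (1, "b"): {2}}


def sub(a: str, b: str) -> int:
    """Hypothetical substitution score: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def _relax(cell: Dict[Tuple[int, int], float], key: Tuple[int, int], val: float) -> None:
    """Keep the maximum score seen for a pair of automaton states."""
    if val > cell.get(key, NEG):
        cell[key] = val


def constrained_align(A: str, B: str, delta: Delta, start: int, final: Set[int]) -> float:
    """Best global alignment score of A and B containing a contiguous block of columns
    whose A-part and B-part are both accepted by the NFA; -inf if none exists."""
    n, m = len(A), len(B)
    # Phase 0: plain Needleman-Wunsch prefix scores, block not yet opened.
    P = [[NEG] * (m + 1) for _ in range(n + 1)]
    P[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                P[i][j] = max(P[i][j], P[i - 1][j - 1] + sub(A[i - 1], B[j - 1]))
            if i > 0:
                P[i][j] = max(P[i][j], P[i - 1][j] + GAP)
            if j > 0:
                P[i][j] = max(P[i][j], P[i][j - 1] + GAP)
    # Phase 1: R[i][j][(p, q)] = best score with the block open, where the block's
    # A-part so far drives the NFA from start to p and its B-part drives it to q.
    R: List[List[Dict[Tuple[int, int], float]]] = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    # Phase 2: S[i][j] = best score with the block already closed.
    S = [[NEG] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            cur = R[i][j]
            _relax(cur, (start, start), P[i][j])  # open the block at (i, j)
            if i > 0 and j > 0:  # column (A[i-1], B[j-1]) inside the block
                a, b = A[i - 1], B[j - 1]
                for (p, q), v in R[i - 1][j - 1].items():
                    for p2 in delta.get((p, a), ()):
                        for q2 in delta.get((q, b), ()):
                            _relax(cur, (p2, q2), v + sub(a, b))
            if i > 0:  # column (A[i-1], -): only the A-side automaton moves
                for (p, q), v in R[i - 1][j].items():
                    for p2 in delta.get((p, A[i - 1]), ()):
                        _relax(cur, (p2, q), v + GAP)
            if j > 0:  # column (-, B[j-1]): only the B-side automaton moves
                for (p, q), v in R[i][j - 1].items():
                    for q2 in delta.get((q, B[j - 1]), ()):
                        _relax(cur, (p, q2), v + GAP)
            # Close the block here if both sides are in accepting states ...
            best = max((v for (p, q), v in cur.items() if p in final and q in final), default=NEG)
            # ... or extend an alignment whose block was closed earlier.
            if i > 0 and j > 0:
                best = max(best, S[i - 1][j - 1] + sub(A[i - 1], B[j - 1]))
            if i > 0:
                best = max(best, S[i - 1][j] + GAP)
            if j > 0:
                best = max(best, S[i][j - 1] + GAP)
            S[i][j] = best
    return S[n][m]
```

With `EXAMPLE_DELTA`, aligning "ab" with "ab" should score 4, because the whole alignment forms the block. Aligning "ab" with "ac" has no valid block and returns -inf. The table of state-pair dictionaries is what gives the O(n^2 t^2) space. The faster algorithms cited in the abstract speed up the inner maximization over predecessor pairs.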

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]; Data Structures and Algorithms [cs.DS]; Formal Languages and Automata Theory [cs.FL]; Document and Text Processing

Main file

CMB-2010-0291-Kucherov_1P.pdf (381.06 KB)

Origin: Publisher files allowed on an open archive


https://hal.science/hal-00790008

Submitted on: Tuesday, 19 February 2013, 11:23:31

Last modified on: Sunday, 14 January 2024, 11:56:04

Long-term archiving on: Monday, 20 May 2013, 04:00:47

Dates and versions

hal-00790008, version 1 (19-02-2013)

Identifiers

HAL Id: hal-00790008, version 1
DOI: 10.1089/cmb.2010.0291

Cite

Gregory Kucherov, Tamar Pinhas, Michal Ziv-Ukelson. Regular expression constrained sequence alignment revisited. Journal of Computational Biology, 2011, 18 (5), pp.771-781. ⟨10.1089/cmb.2010.0291⟩. ⟨hal-00790008⟩


Collections

ENPC UNIV-LILLE3 CNRS INRIA UNIV-MLV LIGM_ALGO LIFL PARISTECH LIGM
