A Sub-quadratic Sequence Alignment Algorithm for Unrestricted Cost Matrices

Abstract : The classical algorithm for computing the similarity between two sequences [36, 39] uses a dynamic programming matrix, and compares two strings of size n in O(n²) time. We address the challenge of computing the similarity of two strings in sub-quadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global alignment computations.The speed-up is achieved by dividing the dynamic programming matrix into variable sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n²/log n) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(hn²/log n) where h ≤ 1 is the entropy of the text.
Document type :
Conference papers
Complete list of metadatas

https://hal-upec-upem.archives-ouvertes.fr/hal-00619996
Contributor : Maxime Crochemore <>
Submitted on : Wednesday, March 20, 2013 - 6:43:25 PM
Last modification on : Wednesday, April 11, 2018 - 12:12:02 PM
Long-term archiving on: Friday, June 21, 2013 - 3:05:09 AM

File

RECDOC02-SODA-TR.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00619996, version 1

Collections

Citation

Maxime Crochemore, Gad M. Landau, Michal Ziv-Ukelson. A Sub-quadratic Sequence Alignment Algorithm for Unrestricted Cost Matrices. Proceedings of the Thirteen Annual ACM-SIAM Symposium on Discrete Algorithms, 2002, United States. pp.679-688. ⟨hal-00619996⟩

Share

Metrics

Record views

268

Files downloads

42