Boyer-Moore strategy to efficient approximate string matching

Abstract : We propose a simple but e cient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which involves representing by a bit number the current state of the search and uses the ability of programming languages to handle bit words. State representation should not, therefore, exceeds the word size w, that is, m(⌈log2(k+1)⌉+1 )≤w. This algorithm consists in a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step. Notions of shift and character skip found in the Boyer-Moore (BM) [9] approach, are introduced in this algorithm. Provided that the considered alphabet is large enough (compared to the Pattern length), the average number of operations performed by our algorithm during the searching step becomes n(2+(k+4)/(m-k)).
Document type :
Conference papers
Complete list of metadatas

https://hal-upec-upem.archives-ouvertes.fr/hal-00620020
Contributor : Maxime Crochemore <>
Submitted on : Tuesday, March 26, 2013 - 11:24:55 PM
Last modification on : Wednesday, April 11, 2018 - 12:12:02 PM
Long-term archiving on : Thursday, June 27, 2013 - 2:45:13 AM

File

cpm96.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00620020, version 1

Collections

Citation

Nadia El Mabrouk, Maxime Crochemore. Boyer-Moore strategy to efficient approximate string matching. Combinatorial Pattern Matching (Labuna Beach, California, 1996), 1996, France. pp.24-38. ⟨hal-00620020⟩

Share

Metrics

Record views

211

Files downloads

498