Fast practical multi-pattern matching

Abstract: The multi-pattern matching problem consists of finding all occurrences of the patterns from a finite set X in a given text T of length n. We present a new and simple algorithm that combines the ideas of the Aho-Corasick algorithm and directed acyclic word graphs. The algorithm runs in linear time in the worst case (it makes at most 2n symbol comparisons) and has good average-case time complexity provided the shortest pattern is sufficiently long. Let m denote the length of the shortest pattern and M the total length of all patterns. Assume that M is polynomial in m, that the alphabet contains at least 2 symbols, and that the text in which the patterns are sought is random: at each position, each letter occurs independently and with the same probability. Then the average number of comparisons is O((n/m) log m), which matches the lower bound for the problem. For sufficiently large values of m, the algorithm behaves well in practice.
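
For orientation, the following is a minimal sketch of the classical Aho-Corasick automaton, one of the two ingredients the abstract names. It is not the algorithm of the paper: the directed acyclic word graph component, which is what allows parts of the text to be skipped on average, is omitted. The function names build_automaton and find_all are illustrative assumptions and do not come from the article.

```python
from collections import deque


def build_automaton(patterns):
    """Build a classical Aho-Corasick automaton (baseline sketch only).

    goto[s] maps a symbol to the next trie state, fail[s] is the failure
    link of state s, and out[s] lists the patterns recognised at state s.
    """
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every pattern into a trie rooted at state 0.
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Phase 2: breadth-first traversal to set failure links.
    # States at depth 1 keep the root as their failure link.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def find_all(patterns, text):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    hits, s = [], 0
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we hit the root.
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

This baseline reads every text symbol once, so it cannot reach the O((n/m) log m) average case stated in the abstract; according to the abstract, that bound comes from combining it with directed acyclic word graphs.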
Document type: Journal articles

https://hal-upec-upem.archives-ouvertes.fr/hal-00619554
Contributor: Maxime Crochemore
Submitted on: Tuesday, March 19, 2013 - 5:14:07 PM
Last modification on: Thursday, May 9, 2019 - 8:02:09 PM
Long-term archiving on: Thursday, June 20, 2013 - 4:20:44 PM

File

ipl3.pdf
Files produced by the author(s)

Citation

Maxime Crochemore, Artur Czumaj, Leszek Gąsieniec, Thierry Lecroq, Wojciech Plandowski, et al. Fast practical multi-pattern matching. Information Processing Letters, Elsevier, 1999, 71 (3-4), pp.107-113. ⟨10.1016/S0020-0190(99)00092-7⟩. ⟨hal-00619554⟩
