A Basis for Repeated Motifs in Pattern Discovery and Text Mining

Abstract : We present a new notion of basis that is able to generate the repeated motifs (possibly exponential in number) that appear at least twice with don't care symbols in a string of length n over an alphabet . Our basis has some interesting features such as being (a) a subset of the previously de ned bases, (b) truly linear as its motifs are less than n in number and appear in the string for a total of 2n times at most; (c) symmetric as the basis of the reversed string is the reverse of the basis; (d) computable in polynomial time, namely, in O(n log n log jj) time.
Complete list of metadatas

https://hal-upec-upem.archives-ouvertes.fr/hal-00627828
Contributor : Maxime Crochemore <>
Submitted on : Thursday, September 29, 2011 - 4:25:02 PM
Last modification on : Thursday, November 21, 2019 - 1:14:17 PM
Long-term archiving on: Friday, December 30, 2011 - 2:30:31 AM

File

Hal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00627828, version 1

Collections

Citation

Nadia Pisanti, Maxime Crochemore, Roberto Grossi, Marie-France Sagot. A Basis for Repeated Motifs in Pattern Discovery and Text Mining. 2002. ⟨hal-00627828⟩

Share

Metrics

Record views

293

Files downloads

107