Skip to Main content Skip to Navigation
Conference papers

Computing discriminating and generic words

Abstract : We study the following three problems of computing generic or discriminating words for a given collection of documents. Given a pattern P and a threshold d, we want to report (i) all longest extensions of P which occur in at least d documents, (ii) all shortest extensions of P which occur in less than d documents, and (iii) all shortest extensions of P which occur only in d selected documents. For these problems, we propose efficient algorithms based on suffix trees and using advanced data structure techniques. For problem (i), we propose an optimal solution with constant running time per output word.
Complete list of metadatas

Cited literature [14 references]  Display  Hide  Download

https://hal-upec-upem.archives-ouvertes.fr/hal-00789986
Contributor : Gregory Kucherov <>
Submitted on : Tuesday, February 19, 2013 - 10:59:32 AM
Last modification on : Wednesday, May 6, 2020 - 8:18:02 PM
Long-term archiving on: : Monday, May 20, 2013 - 4:00:34 AM

File

main.pdf
Files produced by the author(s)

Identifiers

Citation

Gregory Kucherov, Yakov Nekrich, Tatiana Starikovskaya. Computing discriminating and generic words. String Processing and Information Retrieval, Oct 2012, Cartagena de Indias, Colombia. pp.307-317, ⟨10.1007/978-3-642-34109-0_32⟩. ⟨hal-00789986⟩

Share

Metrics

Record views

328

Files downloads

356