Efficient Tree-Structured Categorical Retrieval

Abstract : We study a document retrieval problem in a new framework where D text documents are organized in a category tree with a pre-defined number h of categories. This situation occurs, e.g., with taxonomic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (a level in the category tree), we wish to efficiently retrieve the t categorical units that contain this pattern and belong to the category. We propose several efficient solutions for this problem. One of them uses n(log σ(1 + o(1)) + log D + O(h)) + O(∆) bits of space and O(|p| + t) query time, where n is the total length of the documents, σ is the size of the alphabet used in the documents, and ∆ is the total number of nodes in the category tree. Another solution uses n(log σ(1 + o(1)) + O(log D)) + O(∆) + O(D log n) bits of space and O(|p| + t log D) query time. Finally, we propose other solutions that are more space-efficient at the expense of a slight increase in query time.
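To make the query in the abstract concrete, the following is a minimal brute-force sketch of the problem statement only, not of the data structures proposed in the paper. It assumes, purely for illustration, that each document is a leaf of the category tree and carries its path of categorical units, one per level; the function name `categorical_retrieval`, the taxon paths and the toy sequences are all hypothetical. It scans every document, so it does not achieve the O(|p| + t) query time quoted above.

```python
from typing import List, Set, Tuple

# Assumed toy representation: (path, text), where path[k] is the
# categorical unit of the document at level k (level 0 is the coarsest).
Document = Tuple[Tuple[str, ...], str]


def categorical_retrieval(docs: List[Document], pattern: str, level: int) -> Set[str]:
    """Return the units at `level` having at least one document containing `pattern`.

    Naive baseline: every document is scanned for the pattern.
    """
    units: Set[str] = set()
    for path, text in docs:
        if pattern in text:          # plain substring test
            units.add(path[level])   # report the unit once, however many hits
    return units


# Hypothetical taxonomy-like example (class, family, species) with toy strings.
DOCS: List[Document] = [
    (("Mammalia", "Felidae", "Felis catus"), "ACGTACGT"),
    (("Mammalia", "Felidae", "Panthera leo"), "TTACGA"),
    (("Mammalia", "Canidae", "Canis lupus"), "GGGTTT"),
    (("Aves", "Corvidae", "Corvus corax"), "CACGTT"),
]

if __name__ == "__main__":
    print(sorted(categorical_retrieval(DOCS, "ACG", 1)))
```

In this sketch, t is the size of the returned set: two documents under the same unit contribute a single result, which is what distinguishes categorical retrieval from listing matching documents.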
Document type :
Conference papers

https://hal-upec-upem.archives-ouvertes.fr/hal-03049420
Contributor : Gregory Kucherov
Submitted on : Wednesday, December 9, 2020 - 7:23:14 PM
Last modification on : Friday, June 18, 2021 - 4:02:02 PM
Long-term archiving on : Wednesday, March 10, 2021 - 8:04:01 PM

File

LIPIcs-CPM-2020-4.pdf
Publisher files allowed on an open archive

Citation

Djamal Belazzougui, Gregory Kucherov. Efficient Tree-Structured Categorical Retrieval. 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020), Jun 2020, Copenhagen, Denmark. ⟨10.4230/LIPIcs.CPM.2020.4⟩. ⟨hal-03049420⟩
