Efficient Tree-Structured Categorical Retrieval

Djamal Belazzougui; Gregory Kucherov

doi:10.4230/LIPIcs.CPM.2020.4

Conference paper, Year: 2020

Djamal Belazzougui

Role: Author
PersonId : 1085162

DTISI

Gregory Kucherov

Role: Author
PersonId : 14903
IdHAL : gregory-kucherov
ORCID : 0000-0001-5899-5424
IdRef : 093602189

Laboratoire d'Informatique Gaspard-Monge

Abstract

We study a document retrieval problem in a new framework in which D text documents are organized in a category tree with a predefined number h of categories. This situation occurs, for example, with taxonomic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (a level in the category tree), we wish to efficiently retrieve the t categorical units that contain this pattern and belong to the category. We propose several efficient solutions for this problem. One of them uses n(log σ(1 + o(1)) + log D + O(h)) + O(∆) bits of space and O(|p| + t) query time, where n is the total length of the documents, σ is the size of the alphabet used in the documents, and ∆ is the total number of nodes in the category tree. Another solution uses n(log σ(1 + o(1)) + O(log D)) + O(∆) + O(D log n) bits of space and O(|p| + t log D) query time. Finally, we propose other solutions that are more space-efficient at the expense of a slight increase in query time.
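To make the query concrete, the following is a naive sketch of the problem statement only. It is not one of the data structures proposed in the paper, and it makes no attempt to meet the space or time bounds quoted above: it simply scans every document. The names `Node`, `contains`, and `retrieve`, the convention that the root sits at level 0, and the toy taxonomy with made-up strings are all assumptions introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the category tree; only leaves (documents) carry text."""
    name: str
    children: list[Node] = field(default_factory=list)
    text: str | None = None  # hypothetical field: document content on leaves


def contains(node: Node, pattern: str) -> bool:
    """True if some document in the subtree of `node` contains `pattern`."""
    if node.text is not None:
        return pattern in node.text
    return any(contains(child, pattern) for child in node.children)


def retrieve(root: Node, pattern: str, level: int) -> list[str]:
    """Names of the nodes at depth `level` (root = 0, an assumed convention)
    whose subtree holds at least one document containing `pattern`."""
    frontier = [root]
    for _ in range(level):
        frontier = [child for node in frontier for child in node.children]
    return [node.name for node in frontier if contains(node, pattern)]


# Toy taxonomy with invented sequences, for illustration only.
tree = Node("root", [
    Node("Mammalia", [
        Node("Felis", [Node("cat_doc", text="ACGTTGCA")]),
        Node("Canis", [Node("dog_doc", text="TTGACCGT")]),
    ]),
    Node("Aves", [
        Node("Gallus", [Node("chicken_doc", text="GGGCCCAA")]),
    ]),
])

print(retrieve(tree, "TTG", 1))  # ['Mammalia']
print(retrieve(tree, "TTG", 2))  # ['Felis', 'Canis']
```

In this sketch the number of reported names plays the role of t in the abstract. Each query costs time proportional to the total document length n, which is precisely the cost the indexed solutions are designed to avoid.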

Keywords

document retrieval, category tree, space-efficient data structures

Domains

Data Structures and Algorithms [cs.DS]

Main file

LIPIcs-CPM-2020-4.pdf (444.31 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-03049420

Submitted on: Wednesday, December 9, 2020, 19:23:14

Last modified on: Friday, March 24, 2023, 14:53:19

Long-term archiving on: Wednesday, March 10, 2021, 20:04:01

Dates and versions

hal-03049420 , version 1 (09-12-2020)

Identifiers

HAL Id : hal-03049420 , version 1
DOI : 10.4230/LIPIcs.CPM.2020.4

Cite

Djamal Belazzougui, Gregory Kucherov. Efficient Tree-Structured Categorical Retrieval. 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020), Jun 2020, Copenhagen, Denmark. ⟨10.4230/LIPIcs.CPM.2020.4⟩. ⟨hal-03049420⟩

Collections

ENPC CNRS PARISTECH LIGM LIGM_MOA UNIV-EIFFEL LIGM_ADA

47 views

24 downloads
