文章基本信息

标题：Efficient Tree-Structured Categorical Retrieval
本地全文：下载
作者：Djamal Belazzougui ; Gregory Kucherov
期刊名称：LIPIcs : Leibniz International Proceedings in Informatics
电子版ISSN：1868-8969
出版年度：2020
卷号：161
页码：4:1-4:11
DOI：10.4230/LIPIcs.CPM.2020.4
出版社：Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
摘要：We study a document retrieval problem in the new framework where D text documents are organized in a category tree with a pre-defined number h of categories. This situation occurs e.g. with taxomonic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (level in the category tree), we wish to efficiently retrieve the t categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses n(logÏf(1+o(1))+log D+O(h)) + O(Î") bits of space and O( p +t) query time, where n is the total length of the documents, Ïf the size of the alphabet used in the documents and Î" is the total number of nodes in the category tree. Another solution uses n(logÏf(1+o(1))+O(log D))+O(Î")+O(Dlog n) bits of space and O( p +tlog D) query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time.
关键词：pattern matching; document retrieval; category tree; space-efficient data structures