首页    期刊浏览 2024年11月08日 星期五
登录注册

文章基本信息

  • 标题:Efficient Tree-Structured Categorical Retrieval
  • 本地全文:下载
  • 作者:Djamal Belazzougui ; Gregory Kucherov
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2020
  • 卷号:161
  • 页码:4:1-4:11
  • DOI:10.4230/LIPIcs.CPM.2020.4
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:We study a document retrieval problem in the new framework where D text documents are organized in a category tree with a pre-defined number h of categories. This situation occurs e.g. with taxomonic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (level in the category tree), we wish to efficiently retrieve the t categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses n(logÏf(1+o(1))+log D+O(h)) + O(Î") bits of space and O( p +t) query time, where n is the total length of the documents, Ïf the size of the alphabet used in the documents and Î" is the total number of nodes in the category tree. Another solution uses n(logÏf(1+o(1))+O(log D))+O(Î")+O(Dlog n) bits of space and O( p +tlog D) query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time.
  • 关键词:pattern matching; document retrieval; category tree; space-efficient data structures
国家哲学社会科学文献中心版权所有