首页    期刊浏览 2024年12月01日 星期日
登录注册

文章基本信息

  • 标题:Hidden Words Statistics for Large Patterns
  • 本地全文:下载
  • 作者:Svante Janson ; Wojciech Szpankowski
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2020
  • 卷号:159
  • 页码:17:1-17:15
  • DOI:10.4230/LIPIcs.AofA.2020.17
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:We study here the so called subsequence pattern matching also known as hidden pattern matching in which one searches for a given pattern w of length m as a subsequence in a random text of length n. The quantity of interest is the number of occurrences of w as a subsequence (i.e., occurring in not necessarily consecutive text locations). This problem finds many applications from intrusion detection, to trace reconstruction, to deletion channel, and to DNA-based storage systems. In all of these applications, the pattern w is of variable length. To the best of our knowledge this problem was only tackled for a fixed length m=O(1) [P. Flajolet et al., 2006]. In our main result Theorem 5 we prove that for m=o(n^{1/3}) the number of subsequence occurrences is normally distributed. In addition, in Theorem 6 we show that under some constraints on the structure of w the asymptotic normality can be extended to m=o(â^Sn). For a special pattern w consisting of the same symbol, we indicate that for m=o(n) the distribution of number of subsequences is either asymptotically normal or asymptotically log normal. We conjecture that this dichotomy is true for all patterns. We use Hoeffding’s projection method for U-statistics to prove our findings.
  • 关键词:Hidden pattern matching; subsequences; probability; U-statistics; projection method
国家哲学社会科学文献中心版权所有