
Article information

  • Title: Aggregated estimating equation estimation
  • Authors: Nan Lin; Ruibin Xi
  • Journal: Statistics and Its Interface
  • Print ISSN: 1938-7989
  • Electronic ISSN: 1938-7997
  • Publication year: 2011
  • Volume: 4
  • Issue: 1
  • Pages: 73-83
  • DOI: 10.4310/SII.2011.v4.n1.a8
  • Publisher: International Press
  • Abstract: Motivated by the recent active research on online analytical processing (OLAP), we develop a computation- and storage-efficient algorithm for estimating equation (EE) estimation in massive data sets using a “divide-and-conquer” strategy. In each partition of the data set, we compress the raw data into some low-dimensional statistics and then discard the raw data. Then, we obtain an approximation to the EE estimator, the aggregated EE (AEE) estimator, by solving an equation aggregated from the saved low-dimensional statistics in all partitions. These low-dimensional statistics are taken to be the EE estimates and the first-order derivatives of the estimating equations in each partition. We show that, under proper partitioning and some regularity conditions, the AEE estimator is strongly consistent and asymptotically equivalent to the EE estimator. A major application of the AEE technique is to support fast OLAP of EE estimations for data warehousing technologies such as data cubes and data streams. It can also be used to reduce computation time and overcome the memory constraints posed by massive data sets. Simulation studies show that the AEE estimator provides efficient storage and a remarkable reduction in computational time, especially in its applications to data cubes and data streams.
  • Keywords: massive data sets; estimating equation; data compression; aggregation; consistency; asymptotic normality; data cube
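
The compress-then-aggregate idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the aggregated equation, so the form used here is an assumption: each partition keeps its own estimate `beta_k` and the negative first-order derivative `A_k` of its estimating equation, and the combined estimate solves `(sum A_k) beta = sum A_k beta_k`. The example uses the least-squares estimating equation, where `A_k = X_k' X_k` and the aggregate happens to reproduce the full-data fit exactly; for nonlinear estimating equations it would only be an approximation. The function names `compress_partition` and `aggregate`, and the simulated data, are hypothetical.

```python
import numpy as np

def compress_partition(X, y):
    """Reduce one partition to (beta_k, A_k); the raw data can then be dropped.

    For the least-squares estimating equation sum_i x_i (y_i - x_i' beta) = 0,
    the negative derivative with respect to beta is A_k = X'X.
    """
    A_k = X.T @ X
    beta_k = np.linalg.solve(A_k, X.T @ y)
    return beta_k, A_k

def aggregate(summaries):
    """Assumed aggregation step: solve (sum A_k) beta = sum A_k beta_k."""
    A_total = sum(A_k for _, A_k in summaries)
    rhs = sum(A_k @ beta_k for beta_k, A_k in summaries)
    return np.linalg.solve(A_total, rhs)

# Simulated data (illustrative only).
rng = np.random.default_rng(0)
n, p, K = 6000, 3, 6
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Compress each of the K partitions, then combine only the saved summaries.
summaries = [compress_partition(Xk, yk)
             for Xk, yk in zip(np.array_split(X, K), np.array_split(y, K))]
beta_aee = aggregate(summaries)

# Full-data least-squares fit for comparison.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
```

Each partition stores only a length-`p` vector and a `p × p` matrix, so the storage cost does not grow with the partition size, and summaries from new partitions (for example, new cells of a data cube or new segments of a stream) can be added to the running sums without revisiting old data.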