Article information

Title: Aggregated estimating equation estimation
Authors: Nan Lin; Ruibin Xi
Journal: Statistics and Its Interface
Print ISSN: 1938-7989
Electronic ISSN: 1938-7997
Publication year: 2011
Volume: 4
Issue: 1
Pages: 73-83
DOI: 10.4310/SII.2011.v4.n1.a8
Publisher: International Press
Abstract: Motivated by the recent active research on online analytical processing (OLAP), we develop a computation- and storage-efficient algorithm for estimating equation (EE) estimation in massive data sets using a “divide-and-conquer” strategy. In each partition of the data set, we compress the raw data into some low-dimensional statistics and then discard the raw data. Then, we obtain an approximation to the EE estimator, the aggregated EE (AEE) estimator, by solving an equation aggregated from the saved low-dimensional statistics in all partitions. Such low-dimensional statistics are taken as the EE estimates and first-order derivatives of the estimating equations in each partition. We show that, under proper partitioning and some regularity conditions, the AEE estimator is strongly consistent and asymptotically equivalent to the EE estimator. A major application of the AEE technique is to support fast OLAP of EE estimations for data warehousing technologies such as data cubes and data streams. It can also be used to reduce the computation time and overcome the memory constraint problem posed by massive data sets. Simulation studies show that the AEE estimator provides efficient storage and a remarkable reduction in computational time, especially in its applications to data cubes and data streams.
Keywords: massive data sets; estimating equation; data compression; aggregation; consistency; asymptotic normality; data cube
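
Illustrative sketch (not part of the original record): the divide-and-conquer idea in the abstract can be tried on a small example. The code below assumes ordinary least squares as the estimating equation, so that each partition's first-order derivative matrix is `X_k.T @ X_k`, and assumes the aggregated equation reduces to a derivative-weighted average of the per-partition estimates. The function names `partition_summary` and `aggregate`, the simulated data, and the partition count are all hypothetical choices made for illustration; the paper's general method covers estimating equations beyond this special case.

```python
import numpy as np

def partition_summary(X, y):
    """Compress one partition into (EE estimate, derivative matrix).

    Assumption: the estimating equation is the least-squares score,
    so the (negated) first-order derivative is X.T @ X.
    """
    A = X.T @ X
    beta = np.linalg.solve(A, X.T @ y)
    return beta, A

def aggregate(summaries):
    """Combine saved partition summaries into one aggregated estimate.

    Solves (sum_k A_k) beta = sum_k A_k beta_k, i.e. a weighted average
    of the per-partition estimates with the derivative matrices as weights.
    """
    A_sum = sum(A for _, A in summaries)
    weighted = sum(A @ beta for beta, A in summaries)
    return np.linalg.solve(A_sum, weighted)

# Simulated data (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n, p, K = 6000, 3, 6
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Process each partition, keep only the low-dimensional summaries,
# and discard the raw partition data.
summaries = [
    partition_summary(Xk, yk)
    for Xk, yk in zip(np.array_split(X, K), np.array_split(y, K))
]

beta_aee = aggregate(summaries)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]  # full-data estimate for comparison
```

In this linear special case the aggregated estimate coincides with the full-data least-squares fit up to floating-point error, since the partition sums of `X_k.T @ X_k` and `X_k.T @ y_k` equal the full-data quantities. For nonlinear estimating equations the abstract claims only asymptotic equivalence, under proper partitioning and regularity conditions.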