Abstract: The smart-grid communication management system (TMS) generates large volumes of data that still need to be analyzed and summarized. These data suffer from mismatches between ledger records and physical assets, data-entry errors, and missing values. Building on a distributed support vector machine (SVM) training algorithm that runs on the Hadoop distributed cluster framework and the Spark general-purpose parallel computing platform, this paper proposes a method for detecting and analyzing abnormal data in the maintenance counts recorded for TMS sites. The method uses a set of attributes, with site type as the principal example, as features. It applies a model trained with the SVM algorithm to predict and rate each site and flags abnormal sites so that the relevant personnel can investigate them. The method is validated experimentally.
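The detection idea summarized above can be illustrated with a minimal single-machine sketch. It assumes that each site is described by numeric features and carries a binary rating derived from its recorded maintenance count. A linear SVM is trained on these labels, and any site whose recorded rating disagrees with the model's prediction is flagged for manual checking. Everything here is an assumption for illustration: the one-feature "site scale" encoding, the +1/-1 rating, the toy data, and the helper names `train_linear_svm`, `predict` and `flag_abnormal`. The solver is a plain Pegasos-style subgradient method with a cyclic sample order. It is not the paper's distributed Hadoop/Spark implementation.

```python
"""Hypothetical single-machine sketch: flag sites whose recorded maintenance
rating disagrees with a linear SVM's prediction (not the paper's Spark code)."""

from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the last weight acts as a bias term.
    return list(x) + [1.0]


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 5000) -> List[float]:
    """Pegasos-style subgradient training of a soft-margin linear SVM.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)) with step size
    1 / (lam * t). Samples are visited in a fixed cyclic order (instead of
    random sampling) so results are reproducible; the bias is regularized too.
    """
    data = [_augment(x) for x in xs]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, ys):
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is evaluated with the weights before this step's update.
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the L2 regularizer: shrink w towards zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge loss is active for this sample: move w towards y * x.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return the predicted rating (+1 or -1) for one site."""
    score = sum(wi * xi for wi, xi in zip(w, _augment(x)))
    return 1 if score >= 0.0 else -1


def flag_abnormal(w: Sequence[float], xs: Sequence[Sequence[float]],
                  ys: Sequence[int]) -> List[int]:
    """Indices of sites whose recorded rating disagrees with the model."""
    return [i for i, (x, y) in enumerate(zip(xs, ys)) if predict(w, x) != y]


if __name__ == "__main__":
    # Hypothetical data: one normalized "site scale" feature per site
    # (e.g. derived from the site type); rating +1 means the recorded
    # maintenance count is high, -1 means it is low. Site 6 is a large
    # site with a suspiciously low recorded count.
    sites = [[-1.0], [-0.8], [-0.6], [0.6], [0.8], [1.0], [0.9]]
    labels = [-1, -1, -1, 1, 1, 1, -1]
    weights = train_linear_svm(sites, labels)
    print("sites to check:", flag_abnormal(weights, sites, labels))
```

In a distributed setting, only the training loop would change. The flagging step compares each site's recorded rating with the model's prediction, so it parallelizes trivially over sites. A multi-level rating, rather than the binary one assumed here, could be handled with one-vs-rest SVMs.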