Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Backup software information is a potential source for data mining: not only the unstructured data stored from all the backed-up servers, but also the backup job metadata, which is stored in what is known as the catalog database. Mining this database in particular could improve backup quality, automation, and reliability; predict bottlenecks; identify risks and failure trends; and provide specific report information that cannot be fetched from the closed-format databases of proprietary stock backup software. Ignoring this data mining opportunity can be costly, leading to unnecessary human intervention, uncoordinated work, and pitfalls such as backup service disruption caused by insufficient planning. The specific goal of this practical paper is to use Knowledge Discovery in Databases, time series, stochastic models, and R scripts to predict backup storage data growth. This project could not be done with traditional closed-format proprietary solutions, since it is generally impossible to read their database data from third-party software because of deliberate obfuscation for vendor lock-in. It is, however, very feasible with Bacula, which is open source and currently the third most popular backup software worldwide. This paper focuses on the backup storage demand prediction problem, using the most popular prediction algorithms. Among them, the Holt-Winters model had the highest success rate for the tested data sets.
Keywords: Backup; Catalog; Data Mining; Forecast; R; Storage; Prediction; ARIMA; Holt-Winters