Optimization approach of large downloadable data files from SCADA systems.
Boca, Loredana ; Croitoru, Bogdan ; Ileana, Ioan 等
1. INTRODUCTION
Nowadays, process control performance monitoring software has
become an important tool in the control engineer's toolbox. Still,
the number of performance tests and statistics that can be calculated
for any given control loop can be overwhelming. One concerning problem
is the permanently increasing number of data. These data become even
more complex to be gathered and the management is also very difficult.
Even more difficult is to make results readable and meaningful. For
example a SCADA system produce more than 900 million data gathered from
less than 100 sensors in a few years (Widom, 2008).
The present paper describes the management of SCADA downloadable
data and how can be possible to optimize the process of recording data
from sensors. We propose an optimization method for data management that
gives us the possibility to monitor the evolution of a sensor for a long
period of time. Every sensor from the SCADA system is measuring
different environmental parameters and the recorded values are stored in
CSV or Excel files. These kinds of files are difficult to manage because
each file contains millions of data (usually around 1.5 -3 millions of
values). Also, each file contains data from many sensors (typically up
to 40 sensors). Sampling rate is the average value of 1000 readings/ min
(0.06 Hz), but we have some situations where 10 or 100 Hz sampling rates
are requested. According to this situation, in a month a sensor has
stored (produced) ~44000 records. The average number of values in every
file is ~1.5 million, resulting in a very large CSV file (~7 MB)
(McGowan, 2008).
To control the evolution process of one sensor on a period of time
is very difficult, because we need a graphic tool for this. Each file
contains sensor data for a month, but a sensor can be found in multiple
Excel files, so, it's even more difficult to monitor the evolution
of that sensor during months of working. In our case, each Excel file
consists in: a header containing sensor name, engineering units,
description, status, data type, size (bytes/value), number of values;
and the recorded sensors values, date, time and status for each value
(Fig. 1).
We have found a solution for data evolution management and for
monitoring sensors behavior. We chose to work with Access Databases,
because we considered that querying parameters are powerful tools for
extracting relevant data of any sensor and data monitoring.
[FIGURE 1 OMITTED]
The Excel files were converted in Access Databases following some
criteria. Because we are dealing with two types of Excel files, we are
forced to use different criteria for each type. MDB files offer the
possibility to use queries for extracting certain details that we need
(Ye & Heidemann, 2006).
2. RELATED WORK
Three study cases (Fernandez, 1994; Heidemann & Ye, 2006; Ye
& Heidemann, 2006), tried to solve, using different approaches, some
S.C.A.D.A problems regarding the management of large data files, data
which had been gathered through telemetry for system evaluation and
maintenance management. To handle the large volume of data collection,
evaluation, and report, they have expanded its existing software as part
of an S.C.A.D.A Wide Area Network that is operating on several mini
computers Module. This provides capability of quickly access in semi
real-time data mode.
Another related research describes the openness and support for
disclosure of future S.C.A.D.A data used to improve safety, reliability,
and security for networked embedded control of physical systems. Open
protocols, with new approaches to access control and in-network
processing are needed to make greater openness and viable data sharing (Widom, 2008; Youqiang, 2009).
3. IMPLEMENTATION
We have designed a test panel using Excel files and Access Database
files for measuring computing resources, because we want to understand
the need of building a high level application, which solves the problem
of computing resource and large data management (Tab. 1.).
We have implemented two VB.NET applications that convert Excel
files into MDB files for solving large data files management.
The first application (Tag Converter--Fig. 2) is creating, as a
first step, four working directories. The internal activity of the
application's algorithms is based on full paths of the files
involved in the conversion process (Fig. 2).
Non_Processed_Excel_Files directory will contain the non-processed
large data Excel files ready to be converted.
Processed_Excel_Files directory will contain Excel files that have
been processed. Every processed Excel file is moved and renamed from
Non_Processed_Excel_Files directory into Processed_Excel_Files
directory. The renaming files process is made by extracting the first
data (year and month) of the first record from every Excel file.
Archive_MDB directory contains a list of full paths of Excel files
that will be converted into MDB files.
Database_MDB directory contains the final MDB files as a result of
Excel files conversion process. The conversion process ratio is 1:1,
which means one MDB file is corresponding to one Excel file. Each MDB
file contains five fields of data corresponding to: tag name (or sensor
name), description of the sensor, data, value and status.
The conversion process takes at least one hour and a half to be
performed, but the advantage is the possibility to apply querying
parameters to MDB files to analyze the sensor evolution. A 70 MB Excel
file containing ~4 million of records was converted in ~2 hours.
The second VB.NET application is Tag_Graph (Fig. 3). This
application is loading an MDB database, previously created by
Tag_Converter (Fig. 2) application. By applying a complex querying
algorithm to database, the application displays a graph of sensor values
on a specified period of time (Fig. 4) for observing the sensor
evolution. The complex query consists in: starting date, ending date,
starting time, ending time, and tag name by description.
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
4. CONCLUSIONS
One concern of SCADA systems is the management of the millions of
gathered data. We fulfilled the main goal of the paper, which is an
approach in optimization and management of downloaded SCADA sensor data,
by implementing, in a reliable mode, two software applications that
helps to optimize the work with large data files. Also, with the
designed management system of data we are able to easily monitor the
evolution of a specified sensor on a period of time using a complex
graphic.
There are also disadvantages: converting data from Excel files to
MDB files is taking lot of time and powerful hardware resources are
needed; the conversion and querying operations are a hardware resources
consuming. We propose to be used parallel or multi-server computation to
reduce the processing time and speed up files conversion and querying
processes.
As a future work we propose two goals: one is to optimize the
number, efficiency and relevance of SCADA gathered data by using new
sensors sampling rates based on Nyquist principles, and the second goal
is the implementation of a software application for parallel computation
(multi server) to reduce data processing time.
5. REFERENCES
Fernandez, R. B. (1994). SCADA data integration with facilities
management in Miami-Dade, Available from:
http://libraries.maine.edu/Spatial/gisweb/spatdb/amfm/am9 4015.html
Accessed: 2009-04-25
Heidemann, J. & Ye, W. (2006). Towards Full-disclosure:
Broadening Access to SCADA Data to Improve Safety, Reliability, and
Security, Available from:
http://www.isi.edu/~johnh/PAPERS/Heidemann06e.pdf Accessed: 2009-03-14
McGowan R. (2008). AMPLA--Turning SCADA Data into Information,
Goulburn Valley Water, Available from:
http://www.wioa.org.au/conference_papers/08_vic/
documents/RyanMcGowan.pdf Accessed: 2009-05-20
Widom, J. (2008). TRIO--A system for data, uncertainty, and
lineage, Available from:
http://ilpubs.stanford.edu:8090/843/1/2008-27.pdf Accessed: 2009-05-10
Ye, W. & Heidemann, J. (2006). Enabling Interoperability and
Extensibility of Future SCADA Systems, Available from:
http://www.isi.edu/~johnh/PAPERS/Ye06c.pdf Accessed: 2009-04-28
Youqiang Guo, Zijun Zhang, Xuezhu Pei, A Research on Instability of
Small Flow in SCADA and an Optimizing Design for Control, In:
Computational Science and Its Applications--ICCSA 2009, Gervasi, O.,
653-663, Springer Berlin / Heidelberg, ISBN 978-3-642-02456-6, Berlin
Tab. 1. Allocated Memory & Required CPU usage regarding
different size of Excel and MDB files
Sensor DATA Allocated CPU Usage
Capacity [kBytes] Memory (KB)
200kB Excel 28.460 40%
Application
2 MB Excel 33.072 43%
Application
200MB Excel 375.845 99%
Application
200kB MDB File 10.520 13%
2 MB MDB File 11.080 14%
200 MB MDB File 11.112 19%