Fault detection of non-linear processes using kernel independent component analysis
Lee, Jong-Min; Qin, S. Joe; Lee, In-Beum
INTRODUCTION
Multivariate statistical process monitoring approaches based on
principal component analysis (PCA) have been developed to extract useful
information from a large amount of chemical process data and to detect
and identify various faults in an abnormal operating situation (Kresta
et al., 1991; Nomikos and MacGregor, 1995; Ku et al., 1995; Wise and
Gallagher, 1996; Dong and McAvoy, 1996; Dunia et al., 1996; Bakshi,
1998; Li et al., 2000; Chiang et al., 2001; Qin, 2003). PCA is a
dimensionality reduction technique that handles high dimensional, noisy
and highly correlated data. It divides the data into a systematic part that contains most of
the variation and a noisy part that contains the least variance. Two statistics, represented
by the Mahalanobis and Euclidean distances, are used for process monitoring to detect changes
of the process variation in the model and residual subspaces, respectively. However, PCA
performs poorly when applied to industrial chemical process data with non-Gaussian and
non-linear characteristics because it relies on second-order statistics and a linearity
assumption.
To solve the problem posed by non-linear data, some nonlinear PCA
approaches have been developed. Kramer (1991) developed a non-linear PCA
based on auto-associative neural networks having five layers (input,
mapping, bottleneck, demapping and output layers). Dong and McAvoy
(1996) proposed a non-linear PCA based on principal curves and neural
networks and applied it to non-linear process monitoring. After
obtaining the associated scores and the corrected data using the
principal curve method, they used a neural network model to map the
original data into the corresponding scores and to map these scores into
the original variables. Alternative non-linear PCA methods based on
genetic programming (Hiden et al., 1999) and input-training neural
networks (Tan and Mavrovouniotis, 1995) have also been developed.
However, most existing non-linear PCA approaches are based on neural
networks; thus, a non-linear optimization problem must be solved to
compute the principal components, and the number of principal components
must be specified in advance before training the neural networks.
Recently, Lee et al. (2004a) proposed a new non-linear process
monitoring technique using kernel PCA (KPCA) to monitor continuous
processes and demonstrated its superiority to the PCA monitoring method.
KPCA first maps the input space into a feature space via a non-linear
map, which makes the data structure more linear, and then extracts the
principal components in that feature space. By introducing a kernel
function, one avoids the need to perform the non-linear mappings and to
compute inner products in the feature space explicitly. Compared to
other non-linear PCA methods, KPCA has the main advantages that no
non-linear optimization is involved, it essentially requires only linear
algebra, and the number of principal components need not be specified
prior to modelling.
Besides the linearity assumption, conventional PCA is limited in its
ability to extract useful information from observed data because it is a
second-order method, considering only the mean and variance of the data.
It extracts only uncorrelated components, not independent components,
and therefore gives a limited representation of non-Gaussian data, which
are typical of industrial data. Hence, a method is needed that can take into
account all higher-order statistics of observed data and make the latent
components independent. Recently, several monitoring methods based on
independent component analysis (ICA) have been proposed in order to
improve monitoring performance (Kano et al., 2003, 2004; Lee et al.,
2003, 2004b, 2006; Albazzaz and Wang, 2004). ICA decomposes observed
data into linear combinations of statistically independent components.
In comparison to PCA, ICA involves higher-order statistics; it not
only decorrelates the data but also reduces higher-order statistical
dependencies, making the projected data as independent as possible and
thus extracting more useful information from the observed data.
However, ICA-based linear projection is also inadequate
to represent the data with a non-linear structure (Yang et al., 2005).
In this paper, a new non-linear process monitoring technique based on
kernel independent component analysis (KICA) is proposed. KICA is an
emerging non-linear feature extraction technique that formulates
independent component analysis (ICA) in the kernel-induced feature space
(Kocsor and Csirik, 2001; Bach and Jordan, 2002; Kocsor and Toth, 2004;
Yang et al., 2005). The basic idea of KICA is to non-linearly map the
data into a feature space using KPCA and to extract useful information
to further perform ICA in the KPCA feature space. In this paper, the
KICA algorithm is based on the formalism presented in Yang et al.
(2005). However, the algorithm is modified to make it suitable for the
purpose of process monitoring. It is composed of two steps: KPCA (kernel
centring and kernel whitening) and an iterative procedure of the modified
ICA suggested by Lee et al. (2006). The paper is organized as follows. The
KICA algorithm is explained in the next section, followed by its
application to non-linear process monitoring. Then, the superiority of
KICA for process monitoring is illustrated through a simple multivariate
process and the Tennessee Eastman process. Finally, conclusions are given.
KICA
In this section, KICA is derived to extract statistically
independent components that also capture the non-linearity of the
observed data. The idea is to non-linearly map the data into a feature
space where the data have a more linear structure. Then we perform the
modified ICA in the feature space to make the extracted components as
independent as possible. As in the central algorithm of Yang et al.
(2005), we use the "kernel trick" to extract whitened principal
components in the high-dimensional feature space and ultimately convert
the problem of performing the modified ICA in the feature space into a
problem of implementing the modified ICA in the KPCA-transformed space.
Detailed algorithms for KPCA and ICA are not covered in this paper; the
reader is referred to Scholkopf et al. (1998, 1999), Hyvarinen and Oja
(2000), and Hyvarinen et al. (2001).
ICA Model in Feature Space
A non-linear data structure in the original space is more likely to be
linear after a high-dimensional non-linear mapping (Haykin, 1999). This
higher-dimensional space is referred to as the feature space, $F$.
Consider a non-linear mapping

$$\Phi : \mathbb{R}^m \rightarrow F \qquad (1)$$
We first map the input space into the feature space via the non-linear
mapping and whiten the mapped data so that their covariance is the
identity matrix, which makes the problem of ICA estimation simpler and
better conditioned (the detailed algorithm is explained later). Our
objective is then to find a linear operator $W^F$ in the feature space
$F$ that recovers the independent components from $\Phi(x)$ by the
linear transformation

$$y = W^F \Phi(x) \qquad (2)$$

where $E\{\Phi(x)\Phi(x)^T\} = I$.
Whitening of Data in Feature Space Using KPCA
The goal of this step is to map the input space into a feature
space via non-linear mapping and then to extract whitened principal
components in that feature space such that their covariance structure is
an identity matrix. Consider the observed data $x_k \in \mathbb{R}^m$,
$k = 1, \ldots, N$, where $N$ is the number of observations. Using the
non-linear mapping $\Phi : \mathbb{R}^m \rightarrow F$, the observed data
in the original space are extended into the high-dimensional feature
space, $\Phi(x_k) \in F$. Then, the covariance matrix in the feature
space is

$$S^F = \frac{1}{N} \sum_{k=1}^{N} \Phi(x_k)\Phi(x_k)^T \qquad (3)$$

where $\Phi(x_k)$ for $k = 1, \ldots, N$ is assumed to be zero-mean and
of unit variance, which will be explained later. Let
$\Theta = [\Phi(x_1), \ldots, \Phi(x_N)]$; then $S^F$ can be expressed as
$S^F = \frac{1}{N}\Theta\Theta^T$. In principle, principal components in
the feature space can be obtained by finding the eigenvectors of $S^F$.
Instead of eigen-decomposing $S^F$ directly, however, we can find the
principal components using the "kernel trick." Defining an $N \times N$
Gram kernel matrix $K$ by

$$[K]_{ij} = K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j) \qquad (4)$$

we have $K = \Theta^T \Theta$. The use of a kernel function $k(x_i, x_j)$
allows us to compute inner products in $F$ without performing the
non-linear mappings. That is, we can avoid performing the non-linear
mappings and computing inner products in the feature space explicitly by
introducing a kernel function of the form
$k(x, y) = \langle \Phi(x), \Phi(y) \rangle$ (Scholkopf et al., 1998;
Romdhani et al., 1999). Some of the most widely used kernel functions are
the radial basis kernel $k(x, y) = \exp(-\|x - y\|^2 / c)$, the
polynomial kernel $k(x, y) = \langle x, y \rangle^r$, and the sigmoid
kernel $k(x, y) = \tanh(\beta_0 \langle x, y \rangle + \beta_1)$, where
$c$, $r$, $\beta_0$, and $\beta_1$ have to be specified. A specific
choice of kernel function implicitly determines the mapping $\Phi$ and
the feature space $F$. From the kernel matrix $K$, the mean centring and
variance scaling of $\Phi(x_k)$ in the high-dimensional space can be
performed as follows. The mean-centred kernel matrix $\bar{K}$ is
obtained from

$$\bar{K} = K - 1_N K - K 1_N + 1_N K 1_N \qquad (5)$$

where $1_N$ is the $N \times N$ matrix whose every element is $1/N$
(Scholkopf et al., 1998). The variance scaling of the kernel matrix can
then be done by the following equation (Lee et al., 2004c):

$$K_{\mathrm{scl}} = \frac{\bar{K}}{\mathrm{trace}(\bar{K})/N} \qquad (6)$$
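As an illustration, Equations (4)-(6) can be computed as in the following minimal sketch (NumPy is assumed, the radial basis kernel is used, and the function names are illustrative rather than part of the original formulation):

```python
import numpy as np

def rbf_kernel_matrix(X, c):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / c) for the rows of X, Equation (4)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-dist2 / c)

def center_and_scale_kernel(K):
    """Mean centring, Equation (5), and variance scaling, Equation (6)."""
    N = K.shape[0]
    one_N = np.full((N, N), 1.0 / N)
    K_bar = K - one_N @ K - K @ one_N + one_N @ K @ one_N
    K_scl = K_bar / (np.trace(K_bar) / N)
    return K_bar, K_scl
```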
If we apply the eigenvalue decomposition to $K_{\mathrm{scl}}$,

$$\lambda a = K_{\mathrm{scl}} a \qquad (7)$$

we obtain the orthonormal eigenvectors $a_1, a_2, \ldots, a_d$ of
$K_{\mathrm{scl}}$ corresponding to the $d$ largest positive eigenvalues
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$. Theoretically, the
number of non-zero eigenvalues is equal to the hyper-dimension. In this
paper, we empirically determine the hyper-dimension as the number of
eigenvalues satisfying $\lambda_i / \sum_j \lambda_j > 0.0001$. Then, the
$d$ largest positive eigenvalues of $S^F$ are
$\lambda_1/N, \lambda_2/N, \ldots, \lambda_d/N$, and the associated
orthonormal eigenvectors $v_1, v_2, \ldots, v_d$ can be expressed as

$$v_j = \frac{1}{\sqrt{\lambda_j}}\,\Theta a_j, \quad j = 1, \ldots, d \qquad (8)$$
The eigenvector matrix $V = [v_1, v_2, \ldots, v_d]$ can be expressed
compactly as

$$V = \Theta H \Lambda^{-1/2} \qquad (9)$$

where $H = [a_1, a_2, \ldots, a_d]$ and
$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)$.
Then, $V$ diagonalizes the covariance matrix $S^F$:

$$V^T S^F V = \mathrm{diag}\!\left(\frac{\lambda_1}{N}, \frac{\lambda_2}{N}, \ldots, \frac{\lambda_d}{N}\right) = \frac{1}{N}\Lambda \qquad (10)$$
Let

$$P = V\left(\frac{1}{N}\Lambda\right)^{-1/2} = \sqrt{N}\,\Theta H \Lambda^{-1} \qquad (11)$$

then

$$P^T S^F P = I \qquad (12)$$

Thus, we obtain the whitening matrix $P$, and the mapped data in the
feature space can be whitened by the transformation

$$z = P^T \Phi(x) \qquad (13)$$

In detail,

$$z = P^T \Phi(x) = \sqrt{N}\,\Lambda^{-1} H^T \Theta^T \Phi(x) = \sqrt{N}\,\Lambda^{-1} H^T \bigl[\bar{k}(x_1, x), \ldots, \bar{k}(x_N, x)\bigr]^T \qquad (14)$$

where $\bar{k}(x_j, x) = \langle \Phi(x_j), \Phi(x) \rangle$ denotes the
mean-centred and variance-scaled kernel value. In fact, $z$ is the same
as the whitened KPCA score vector, satisfying $E\{zz^T\} = I$.
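The whitening step, Equations (7)-(14), can be sketched as follows under the same assumptions (NumPy; 0.0001 is the eigenvalue cut-off used in this paper):

```python
import numpy as np

def kpca_whitening(K_scl, tol=1e-4):
    """Eigen-decompose K_scl, Equation (7), and keep the d eigenpairs with
    lambda_i / sum(lambda) > tol; returns (lam, H) with Lambda = diag(lam)."""
    lam, H = np.linalg.eigh(K_scl)        # symmetric eigendecomposition, ascending order
    lam, H = lam[::-1], H[:, ::-1]        # sort into descending order
    keep = lam / lam.sum() > tol
    return lam[keep], H[:, keep]

def whitened_scores(K_scl, lam, H):
    """Whitened score vectors z_k for all training samples, Equation (14):
    the k-th row is sqrt(N) * Lambda^{-1} H^T (k-th column of K_scl)."""
    N = K_scl.shape[0]
    return np.sqrt(N) * K_scl @ H @ np.diag(1.0 / lam)
```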
Further Processing Using the Modified ICA
The goal of this step is to extract independent components from the
KPCA-transformed space. In the central part of this step, we applied the
modified ICA of Lee et al. (2006), instead of the original FastICA
algorithm used in Yang et al. (2005). To be suitable for process
monitoring, Lee et al. (2006) proposed the modified ICA to extract some
dominant independent components from observed data. Compared to
conventional ICA (FastICA), the modified ICA algorithm can extract a few
dominant factors needed for process monitoring, attenuate high
computational load, consider the ordering of independent components, and
give a consistent solution. From $z \in \mathbb{R}^d$, the modified ICA
finds $p$ ($p \leq d$) dominant independent components $y$ satisfying
$E\{yy^T\} = D = \mathrm{diag}\{\lambda_1, \ldots, \lambda_p\}$ by
maximizing the non-Gaussianity of the elements of $y$, using

$$y = C^T z \qquad (15)$$

where $C \in \mathbb{R}^{d \times p}$ and $C^T C = D$. The requirement
$E\{yy^T\} = D$ reflects that the variance of each element of $y$ is the
same as that of the corresponding score in KPCA. Like PCA, the modified
ICA can therefore rank independent components according to their
variances. If we define the normalized independent components $y_n$ as

$$y_n = D^{-1/2} y = D^{-1/2} C^T z = C_n^T z \qquad (16)$$

it is clear that $C_n^T = D^{-1/2} C^T$, $C_n^T C_n = I$, and
$E\{y_n y_n^T\} = I$.
Although the elements of $z$ are not independent, $z$ is a good initial
value for $y_n$ since its statistical dependencies have already been
removed up to second order (mean and variance). Therefore, we can set the
initial matrix $C_n^T$ to

$$C_n^T = [\,I_p : 0\,] \qquad (17)$$

where $I_p$ is the $p$-dimensional identity matrix and $0$ is a
$p \times (d - p)$ zero matrix.
To calculate $C_n$, each column vector $c_{n,i}$ is initialized and then
updated so that the $i$-th independent component
$y_{n,i} = c_{n,i}^T z$ has maximum non-Gaussianity. The objective of
making $y_{n,i}$ for $i = 1, \ldots, p$ statistically independent is
equivalent to maximizing their non-Gaussianity (Hyvarinen and Oja, 2000).
Hyvarinen and Oja (2000) introduced a flexible and reliable approximation
of the negentropy as a measure of non-Gaussianity:

$$J(y) \approx \bigl[E\{G(y)\} - E\{G(v)\}\bigr]^2 \qquad (18)$$

where $y$ is assumed to have zero mean and unit variance, $v$ is a
Gaussian variable of zero mean and unit variance, and $G$ is any
non-quadratic function. The non-quadratic function $G$ is described in
detail in Hyvarinen (1999). The detailed algorithm is given below:
1. Choose $p$, the number of independent components to estimate. Set the
counter $i \leftarrow 1$.
2. Take the initial vector $c_{n,i}$ to be the $i$-th row of the matrix
in Equation (17).
3. Maximize the approximated negentropy: let
$c_{n,i} \leftarrow E\{z\,g(c_{n,i}^T z)\} - E\{g'(c_{n,i}^T z)\}\,c_{n,i}$,
where $g$ is the first derivative and $g'$ is the second derivative of $G$.
4. Orthogonalize in order to exclude the information contained in the
solutions already found:
$c_{n,i} \leftarrow c_{n,i} - \sum_{j=1}^{i-1} (c_{n,i}^T c_{n,j})\,c_{n,j}$
5. Normalize $c_{n,i} \leftarrow c_{n,i} / \|c_{n,i}\|$.
6. If $c_{n,i}$ has not converged, go back to Step 3.
7. If $c_{n,i}$ has converged, output the vector $c_{n,i}$. Then, if
$i < p$, set $i \leftarrow i + 1$ and go back to Step 2.
Once $C_n$ is found, the kernel independent components are obtained from

$$y = D^{1/2} C_n^T z = D^{1/2} C_n^T P^T \Phi(x) \qquad (19)$$

This equation is the final realization of Equation (2).
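The fixed-point procedure above can be sketched as follows. This paper does not specify the contrast function $G$; $g(u) = \tanh(u)$ is one common choice from Hyvarinen and Oja (2000) and is used here only for illustration:

```python
import numpy as np

def modified_ica(Z, p, max_iter=200, tol=1e-6):
    """Deflation-based fixed-point iteration on the whitened scores Z (N x d),
    returning C_n (d x p) with orthonormal columns, following Steps 1-7 above."""
    N, d = Z.shape
    g = np.tanh                                   # contrast derivative g (illustrative choice)
    dg = lambda u: 1.0 - np.tanh(u) ** 2          # g'
    Cn = np.zeros((d, p))
    for i in range(p):
        c = np.zeros(d)
        c[i] = 1.0                                # initialization, Equation (17)
        for _ in range(max_iter):
            u = Z @ c                             # c^T z for every sample
            c_new = (Z.T @ g(u)) / N - dg(u).mean() * c      # Step 3 fixed-point update
            c_new -= Cn[:, :i] @ (Cn[:, :i].T @ c_new)       # Step 4 deflation
            c_new /= np.linalg.norm(c_new)                   # Step 5 normalization
            converged = abs(abs(c_new @ c) - 1.0) < tol      # Step 6 convergence test
            c = c_new
            if converged:
                break
        Cn[:, i] = c
    return Cn
```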
ON-LINE MONITORING STRATEGY OF KICA
The monitoring strategy based on KICA is an extension of the
modified-ICA-based monitoring to the feature space. To detect changes of
the systematic part within the KICA model, Hotelling's $T^2$ statistic,
the sum of the normalized squared scores, is defined as

$$T^2 = y^T D^{-1} y \qquad (20)$$

The upper control limit for $T^2$ cannot be determined from the
F-distribution because $y$ does not follow a Gaussian distribution. In
this paper, kernel density estimation is used to define the control
limit (Martin and Morris, 1996; Lee et al., 2004a).
To detect changes of the non-systematic part in the residual of the
KICA model, the SPE statistic is defined as

$$\mathrm{SPE} = e^T e = (z - \hat{z})^T (z - \hat{z}) = z^T (I - C_n C_n^T)\, z \qquad (21)$$

where $e = z - \hat{z}$ and $\hat{z}$ is found from

$$\hat{z} = C_n D^{-1/2} y = C_n C_n^T z \qquad (22)$$

If the majority of the non-Gaussianity is captured by the extracted
independent components, the residual subspace contains mostly Gaussian
noise that can be treated as normally distributed. Assuming that the
prediction errors are normally distributed, the control limit for the
SPE is calculated from the following weighted $\chi^2$ distribution:

$$\mathrm{SPE} \sim \mu \chi^2_h, \quad \mu = \frac{b}{2a}, \quad h = \frac{2a^2}{b} \qquad (23)$$

where $a$ and $b$ are the estimated mean and variance of the SPE from
the normal operating data, respectively (Nomikos and MacGregor, 1995;
Tates et al., 1999).
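A sketch of Equations (20)-(23) follows (NumPy/SciPy assumed). Here D is taken as the vector of component variances, and only the weighted chi-square limit for the SPE is shown, since the $T^2$ limit comes from kernel density estimation:

```python
import numpy as np
from scipy import stats

def monitoring_statistics(Z, Cn, D):
    """T^2, Equation (20), and SPE, Equation (21), for whitened scores Z (N x d);
    D holds the p component variances."""
    Y = Z @ Cn * np.sqrt(D)            # independent components, Equation (19)
    T2 = np.sum(Y ** 2 / D, axis=1)    # y^T D^{-1} y
    E = Z - Z @ Cn @ Cn.T              # residual e = z - z_hat, Equation (22)
    SPE = np.sum(E ** 2, axis=1)
    return T2, SPE

def spe_control_limit(SPE_noc, alpha=0.99):
    """Weighted chi-square limit, Equation (23): SPE ~ mu * chi2_h with
    mu = b/(2a) and h = 2a^2/b, where a and b are the NOC mean and variance of SPE."""
    a, b = SPE_noc.mean(), SPE_noc.var()
    mu, h = b / (2.0 * a), 2.0 * a ** 2 / b
    return mu * stats.chi2.ppf(alpha, h)
```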
Based on the $T^2$ statistic and the SPE statistic, the proposed KICA
monitoring method can be summarized as follows. The KICA model is first
built using historical data collected during normal operation. Future
process behaviour is then compared against this "normal" or
"in-control" representation.
Developing the normal operating condition (NOC) KICA model
a) Acquire normal operating data and normalize the data using the mean
and standard deviation of each variable.
b) From the scaled normal operating data $x_k \in \mathbb{R}^m$,
$k = 1, \ldots, N$, compute the kernel matrix $K \in \mathbb{R}^{N \times N}$
using Equation (4).
c) Carry out mean centring and variance scaling in the feature space,
obtaining $K_{\mathrm{scl}}$ using Equations (5) and (6).
d) Solve the eigenvalue problem $\lambda a = K_{\mathrm{scl}} a$. Here,
we extract the $d$ eigenvectors whose corresponding eigenvalues
$\lambda_i$ satisfy $\lambda_i / \sum_j \lambda_j > 0.0001$.
e) For the normal operating data $x$, extract the whitened non-linear
component $z$ via Equation (14).
f) Extract independent components using the modified ICA from Equation (19).
g) Calculate $T^2$ and SPE using Equations (20) and (21), respectively.
h) Determine the control limits of the $T^2$ and SPE charts.
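Steps (a)-(h) can be tied together in the following sketch, reusing the helper functions sketched in the previous sections; the kernel-density-based $T^2$ limit is replaced by a simple empirical percentile here, which is an assumption made only for brevity:

```python
import numpy as np

def build_noc_model(X_noc, c, p, alpha=0.99):
    """Offline modelling steps (a)-(h); X_noc is the auto-scaled normal operating data."""
    K = rbf_kernel_matrix(X_noc, c)                      # step (b), Equation (4)
    K_bar, K_scl = center_and_scale_kernel(K)            # step (c), Equations (5)-(6)
    lam, H = kpca_whitening(K_scl)                       # step (d), Equation (7)
    Z = whitened_scores(K_scl, lam, H)                   # step (e), Equation (14)
    Cn = modified_ica(Z, p)                              # step (f)
    D = lam[:p]                                          # D = diag{lambda_1, ..., lambda_p}
    T2, SPE = monitoring_statistics(Z, Cn, D)            # step (g), Equations (20)-(21)
    T2_lim = np.quantile(T2, alpha)                      # step (h): empirical stand-in for the KDE limit
    SPE_lim = spe_control_limit(SPE, alpha)              # step (h), Equation (23)
    return dict(K_raw=K, tr_scale=np.trace(K_bar) / len(X_noc),
                lam=lam, H=H, Cn=Cn, D=D, T2_lim=T2_lim, SPE_lim=SPE_lim)
```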
On-line monitoring
a) Obtain new data for each sample and scale them with the mean and
standard deviation obtained in step (a) of the modelling procedure.
b) From the $m$-dimensional scaled test data $x_t \in \mathbb{R}^m$,
compute the kernel vector $k_t \in \mathbb{R}^{1 \times N}$ by
$[k_t]_j = k(x_t, x_j)$, where $x_j \in \mathbb{R}^m$, $j = 1, \ldots, N$,
are the normal operating data.
c) Mean centre the test kernel vector $k_t$ as follows:

$$\bar{k}_t = k_t - 1_t K - k_t 1_N + 1_t K 1_N \qquad (24)$$

where $K$ and $1_N$ are those from the modelling procedure and
$1_t = \frac{1}{N}[1, \ldots, 1] \in \mathbb{R}^{1 \times N}$.
d) Carry out variance scaling:

$$k_{t,\mathrm{scl}} = \frac{\bar{k}_t}{\mathrm{trace}(\bar{K})/N} \qquad (25)$$

e) For the test data $x_t$, extract the whitened non-linear component via

$$z_t = \sqrt{N}\,\Lambda^{-1} H^T k_{t,\mathrm{scl}}^T \qquad (26)$$

f) Carry out further processing using the modified ICA:

$$y = D^{1/2} C_n^T z_t \qquad (27)$$

g) Calculate the monitoring statistics ($T^2$ and SPE) of the test data.
h) Monitor whether $T^2$ or SPE exceeds its control limit calculated in
the modelling procedure.
CASE STUDIES
In this section, the performance of KICA is illustrated through a
simple example and a Tennessee Eastman process case study. When the
proposed KICA is applied to the observed data, the selection of the
proposed KICA is applied to the observed data, the selection of the
kernel function is important since the degree to which the non-linear
characteristic of a system is captured depends on this function;
however, the general question of how to select the ideal kernel for a
given monitoring process still remains an open problem (Mika et al.,
1999; Lee et al., 2004a). By testing the monitoring performance of a
range of kernel functions in various systems, we found that the radial
basis kernel is the most appropriate for our non-linear process
monitoring examples. The radial basis kernel function is given as
$k(x, y) = \exp(-\|x - y\|^2 / c)$ with $c = r m \sigma^2$, where $r$ is
a constant to be selected, $m$ is the dimension of the input space, and
$\sigma^2$ is the variance of the data (Mika et al., 1999). As $c$
increases, the kernel values approach 1, but this causes no problem in
the eigendecomposition of the kernel matrix because the variance scaling
in Equation (6) adjusts the kernel values. After testing the monitoring
performance for various values of $c$, we recommend setting $c$ so that
the approximated hyper-dimension is similar to the dimension of the
input space. Finding the optimal value of $c$ should be studied in
future work.
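This width heuristic can be sketched as follows, reusing the helper functions from the earlier sketches; interpreting $\sigma^2$ as the overall variance of the auto-scaled data is an assumption:

```python
import numpy as np

def kernel_width(X, r):
    """Heuristic c = r * m * sigma^2: m is the input dimension and sigma^2 the data variance."""
    return r * X.shape[1] * X.var()

def hyper_dimension(X, c, tol=1e-4):
    """Retained hyper-dimension for a given c; the text suggests choosing c (via r)
    so that this count is close to the input dimension m."""
    K = rbf_kernel_matrix(X, c)                 # helpers sketched in the KICA section
    _, K_scl = center_and_scale_kernel(K)
    lam = np.linalg.eigvalsh(K_scl)[::-1]
    return int(np.sum(lam / lam.sum() > tol))
```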
When designing the PCA, modified ICA, KPCA and KICA models, we must
determine the number of components. For linear PCA, a cross-validation
method is used (Wold, 1978). For KPCA, we employed the cut-off method
using the average eigenvalue to determine the number of principal
components due to its simplicity and robustness. The number of
independent components for the modified ICA and for KICA is set equal to
the number of principal components of PCA and of KPCA, respectively.
Simple Example
Let us consider the two source variables plotted in Figure 1a:

$$s_1(k) = 2\cos(0.1k) \qquad (28)$$

$$s_2(k) = 3\,\mathrm{sign}\bigl(\sin(0.03k) + 9\cos(0.01k)\bigr) \qquad (29)$$

where $k = 0, 1, \ldots, 499$. These sources $s = [s_1\ s_2]^T$ are
non-linearly mixed as follows:

$$x_1 = 5\exp\bigl[2(0.7 s_1 + 0.3 s_2)\bigr] + e_1 \qquad (30)$$

$$x_2 = (-0.5 s_1 + 0.8 s_2)^3 + e_2 \qquad (31)$$

where $e_1$ and $e_2$ are independent random noise following $N(0, 1)$.
The non-linearly mixed signals, $x = [x_1\ x_2]^T = f(s)$, are shown in
Figure 1b. Before applying PCA, modified ICA, KPCA, and KICA, $x$ is
mean centred and variance scaled. In addition, $c = 10m$ is used for
KPCA and KICA.
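A sketch of the data generation in Equations (28)-(31) is given below (NumPy; the random seed, and the reading of the parentheses in Equation (29) exactly as printed, are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)                    # seed chosen only for reproducibility
k = np.arange(500)
s1 = 2.0 * np.cos(0.1 * k)                                             # Equation (28)
s2 = 3.0 * np.sign(np.sin(0.03 * k) + 9.0 * np.cos(0.01 * k))          # Equation (29)
x1 = 5.0 * np.exp(2.0 * (0.7 * s1 + 0.3 * s2)) + rng.standard_normal(500)   # Equation (30)
x2 = (-0.5 * s1 + 0.8 * s2) ** 3 + rng.standard_normal(500)                 # Equation (31)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)          # mean centring and variance scaling
```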
[FIGURE 1 OMITTED]
As shown in Figure 1c, PCA cannot recover the original signals since it
is a linear method that provides only a decorrelated representation of
the data. The signals recovered by KPCA, shown in Figure 1d, have a
pattern similar to the PCA solutions. Therefore, KPCA still has a
limited ability to extract the underlying hidden source signals even
though it can, in theory, capture non-linearity. As illustrated in
Figure 1e, the modified ICA has only limited success in recovering the
original signals because of its linearity assumption, even though its
solutions are improved compared to the PCA and KPCA solutions. In
comparison to PCA, KPCA, and the modified ICA, KICA is able to recover
the original source signal patterns from the mixed signals, changing
only the sign (Figure 1f). This simple example clearly demonstrates that
the proposed method extracts a few dominant underlying factors much more
effectively than PCA, KPCA, and the modified ICA. In process monitoring,
the choice of multivariate data analysis method used to build the normal
operating model from historical data is important. If the underlying
hidden factors are extracted correctly, they not only fit the normal
operating model well but also directly reflect the process changes
caused by a process fault. Consequently, KICA is expected to give better
monitoring performance when applied to non-linear process monitoring.
Tennessee Eastman Process
Chemical process plants typically have non-linear characteristics,
causing the measured data or the underlying hidden factors to deviate
from a Gaussian distribution. The KICA monitoring approach proposed here
was tested for its ability to detect various faults in simulated data
obtained from a well-known benchmark process, the Tennessee Eastman
industrial process. This process consists of five major units: a
reactor, a condenser, a recycle compressor, a separator, and a stripper.
The four gaseous reactants A, C, D, and E and the inert B are fed to the
reactor, where the liquid products G and H are formed along with a
by-product F. All the reactions are irreversible and exothermic. The
details of the process description, including the control structure, are
well explained in Chiang et al. (2001). The process has 41 measured
variables and 12 manipulated variables. Among them, the 33 variables
listed in Table 1 are selected for monitoring in this study. The
sampling time was 3 min. Normal operating data consisting of 960 samples
were used for modelling PCA, KPCA, modified ICA, and KICA. To test and
compare the performance of each monitoring method, 20 faults are
considered. Table 2 describes the characteristics of all faults. Each
fault data set has 960 samples, and the fault starts at sample 160. The
data can be downloaded from http://brahms.scs.uiuc.edu (Chiang et al.,
2001).
All the data were auto-scaled prior to the application of PCA, ICA,
KPCA, and KICA. In this example, $c = 500m$ is used for KPCA and KICA.
Nine principal components are selected for PCA by cross-validation, and
the same number of independent components is selected for the modified
ICA. For KPCA and KICA, 30 whitened vectors are selected, for which the
corresponding relative eigenvalues $\lambda_i / \sum_j \lambda_j$ in the
feature space are larger than 0.0001. Among the 30 whitened vectors, 11
components are selected for both KPCA and KICA using the average
eigenvalue criterion. False alarm rates are calculated from another set
of normal operating data and tabulated in Table 3. For the 99% control
limit, the false alarm rates of each method are acceptable, with KPCA
having slightly greater values. For the data obtained after the fault
occurrence, the percentage of the samples outside the 99% control limits
was calculated in each simulation and termed the detection rate.
The fault detection rates of the four multivariate methods, PCA, ICA,
KPCA, and KICA, for all 20 faults were computed and are summarized in
Table 4. As shown in Table 4, the detection rates of the monitoring
methods are almost the same in the cases of Faults 1, 2, 4, 6, 7, 8, 12,
13, and 14, since the fault magnitudes are so large that they can be
detected nearly 100% of the time even by PCA. On the other hand, the
detection rates for Faults 3, 9, and 15 are not considerably higher than
1% for all methods because the fault magnitudes are too small. In Table
4, the maximum detection rates achieved for Faults 5, 10, 11, 16, 17,
18, 19, and 20 are marked with bold numbers. For most faults, the
detection rate of KPCA is higher than that of PCA since the former
considers the non-linearity of the data. Comparing the detection rates
of ICA and PCA, ICA can also detect small events that are difficult for
PCA to detect. Specifically, for Faults 10 and 16, the detection rate of
ICA is more than twice that of PCA. Comparing ICA with KPCA, one cannot
say which is better because their relative performance differs from
fault to fault. KPCA outperforms ICA in the cases of Faults 11 and 19,
whereas ICA gives higher detection rates for Faults 10, 16, and 20. All
things considered, the best monitoring performance is obtained with
KICA. In most cases, except Faults 11 and 20, the detection rate of KICA
is considerably higher than that of any other method considered in this
paper. In particular, the detection rate is markedly enhanced by KICA in
the case of Fault 19. Although KPCA and ICA show the highest detection
rates for Faults 11 and 20, respectively, the difference from the
detection rate of KICA is small (within 5%). To clearly illustrate the
superiority of KICA in process monitoring, the monitoring results for
Faults 10 and 19 are shown in Figures 2 and 3, respectively. The 99%
control limits are drawn as dotted lines in those figures. In Figure 2,
the SPE charts of PCA and KPCA show some false alarms around sample 130.
Although PCA and KPCA detect Fault 10 from about sample 200, many
samples remain below the control limit despite the presence of the
fault. In the case of Fault 10, ICA is able to detect the fault more
efficiently than PCA and KPCA, without false alarms. However, it is
evident that KICA gives the best monitoring performance, with no false
alarms and a higher detection rate. Figure 3 also demonstrates that the
KICA monitoring charts detect Fault 19 more efficiently and consistently
than any other method. The $T^2$ and SPE statistics of KICA monitoring
clearly exceed the 99% control limit after sample 160. One notable point
in Table 4 is the fault detection ability of the $T^2$ statistic of the
proposed method. For most cases, the detection rate of $T^2$ is
considerably improved by the proposed KICA method. This means that the
proposed method extracts the essential features of a process much more
efficiently than the other methods, since it can effectively capture
non-linear relationships in the process variables while extracting
independent components from the multivariate data. This result suggests
that the proposed method is also expected to outperform the other
methods in diagnosing fault patterns in the feature space.
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
CONCLUSIONS
This paper proposes a new approach to non-linear process monitoring
based on KICA. KICA can efficiently compute independent components in a
high-dimensional feature space by means of non-linear kernel functions.
KICA consists of two steps: KPCA (kernel centring, kernel whitening) and
an iterative procedure of the modified ICA. Therefore, KICA can be
thought of as performing ICA in the whitened kernel space generated by
KPCA. The main non-linearity of the process data is captured by KPCA,
and the additional ICA procedure extracts statistically independent
components from the observed data. The proposed method was applied to
fault detection in the Tennessee Eastman process and was compared with
PCA, the modified ICA, and
KPCA. This example demonstrated that the proposed approach can
effectively capture non-linear relationships in process variables and
showed the best performance when used for process monitoring.
The present work shows the superiority of KICA-based process
monitoring; however, KICA has some open issues that should be noted. One
is how to select an appropriate kernel for a given data set: the
selection of the kernel function is important to the proposed method
since the degree to which the non-linear characteristic of a system is
captured depends on this function. Furthermore, determining the optimal
number of independent components in the kernel space and identifying
which variable causes a process fault should be considered in future
work.
ACKNOWLEDGEMENTS
This work was supported by the Korea Research Foundation Grant funded
by the Korean Government (MOEHRD, Basic Research Promotion Fund)
(KRF-2005-214-D00027), the Texas-Wisconsin Modeling and Control
Consortium, the Chang Jiang Scholars Program of the Ministry of
Education of China, and the Li Ka Shing Foundation.
NOMENCLATURE
a                  estimated mean of the SPE
b                  estimated variance of the SPE
C                  demixing matrix in the whitened feature space
$C_n$              normalized demixing matrix in the whitened feature space
$c_{n,i}$          $i$-th column vector of $C_n$
$\|c_{n,i}\|$      $L_2$ norm of $c_{n,i}$
c                  specified constant value
D                  $\mathrm{diag}\{\lambda_1, \ldots, \lambda_p\}$
d                  number of selected non-zero eigenvalues in the feature space
diag               diagonal matrix
$E\{\cdot\}$       expected value of $\cdot$
e                  residual vector, $z - \hat{z}$
$e_1$, $e_2$       random noise following $N(0,1)$
F                  feature space
$f(s)$             non-linear function
G                  non-quadratic function
g                  first derivative of G
g'                 second derivative of G
H                  $[a_1, a_2, \ldots, a_d]$
h                  $2a^2/b$
I                  identity matrix
$I_p$              $p$-dimensional identity matrix
$J(y)$             negentropy of $y$
K                  Gram kernel matrix
$[K]_{ij}$         element in the $i$-th row and $j$-th column of $K$, $K_{ij} = k(x_i, x_j)$
$\bar{K}$          mean-centred kernel matrix
$K_{\mathrm{scl}}$ mean-centred and variance-scaled kernel matrix
$k_t$              kernel vector of the test data
$\bar{k}_t$        mean-centred kernel vector of the test data
$[k_t]_j$          $j$-th element of $k_t$
$k_{t,\mathrm{scl}}$ mean-centred and variance-scaled kernel vector of the test data
m                  dimension of the observed data
N                  number of observations
$N(0,1)$           Gaussian distribution with zero mean and unit variance
P                  whitening matrix in the feature space
p                  number of extracted independent components
$\mathbb{R}^m$     $m$-dimensional input space
r                  specified constant value
$S^F$              covariance matrix in the feature space
s                  source vector
$s_j$              $j$-th element of $s$
sum                summation
SPE                squared prediction error statistic
$T^2$              Hotelling's $T^2$ statistic (Mahalanobis distance)
trace($\cdot$)     sum of the diagonal elements of matrix $\cdot$
V                  $[v_1, v_2, \ldots, v_d]$
$v_j$              $j$-th eigenvector of $S^F$
v                  Gaussian variable of zero mean and unit variance
$W^F$              demixing matrix in the feature space
x                  observed data
$x_k$              observed data at sample $k$
$x_t$              test data vector
$x_j$              $j$-th element of $x$
y                  independent component vector in the feature space
$y_n$              normalized independent component vector in the feature space
$y_{n,i}$          $i$-th element of $y_n$
z                  whitened KPCA score vector
$\hat{z}$          estimated value of $z$
$z_t$              whitened KPCA score vector of the test data
0                  zero matrix
$\langle x_i, x_j \rangle$ inner product of $x_i$ and $x_j$
Greek Symbols
$\alpha$           eigenvector of $K_{\mathrm{scl}}$
$\alpha_j$         $j$-th eigenvector of $K_{\mathrm{scl}}$
$\beta_0$          specified constant value
$\beta_1$          specified constant value
$\chi^2$           chi-square distribution
$\Phi$             high-dimensional non-linear mapping function
$\Phi(x)$          non-linearly mapped data
$\Theta$           non-linearly mapped data matrix, $[\Phi(x_1), \ldots, \Phi(x_N)]$
$\Lambda$          $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)$
$\lambda_j$        $j$-th eigenvalue of $K_{\mathrm{scl}}$
$\mu$              $b/(2a)$
$\sigma^2$         variance of the data
Superscripts
T                  transpose
-1                 inverse matrix
Manuscript received February 22, 2006; revised manuscript received
July 14, 2006; accepted for publication April 20, 2007.
REFERENCES
Albazzaz, H. and X. Z. Wang, "Statistical Process Control
Charts for Batch Operations Based on Independent Component
Analysis," Ind. Eng. Chem. Res. 43(21), 6731-6741 (2004).
Bakshi, B. R., "Multiscale PCA with Application to
Multivariate Statistical Process Monitoring," AIChE J. 44(7),
1596-1610 (1998).
Bach, F. R. and M. I. Jordan, "Kernel Independent Component
Analysis," J. Machine Learning Res. 3, 1-48 (2002).
Chiang, L. H., E. L. Russell and R. D. Braatz, "Fault
Detection and Diagnosis in Industrial Systems," Springer, London
(2001).
Dong, D. and T. J. McAvoy, "Nonlinear Principal Component
Analysis Based on Principal Curves and Neural Networks," Comput.
Chem. Eng. 20(1), 65-78 (1996).
Dunia, R., S. J. Qin, T. F. Edgar and T. J. McAvoy,
"Identification of Faulty Sensors Using Principal Component
Analysis," AIChE J. 42(10), 2797-2812 (1996).
Haykin, S., "Neural Networks," Prentice Hall International, Inc., New Jersey (1999).
Hiden, H. G., M. J. Willis, M. T. Tham and G. A. Montague,
"Nonlinear Principal Components Analysis Using Genetic
Programming," Comput. Chem. Eng. 23, 413-425 (1999).
Hyvarinen, A., "Fast and Robust Fixed-Point Algorithms for
Independent Component Analysis," IEEE Trans. Neural Networks 10,
626 -634 (1999).
Hyvarinen, A. and E. Oja, "Independent Component Analysis:
Algorithms and Applications," Neural Networks 13(4-5), 411-430
(2000).
Hyvarinen, A., J. Karhunen and E. Oja, "Independent Component
Analysis," John Wiley & Sons, Inc., New York (2001).
Kano, M., S. Tanaka, S. Hasebe, I. Hashimoto and H. Ohno,
"Monitoring Independent Components for Fault Detection," AIChE
J. 49(4), 969-976 (2003).
Kano, M., S. Hasebe, I. Hashimoto and H. Ohno, "Evolution of
Multivariate Statistical Process Control: Independent Component Analysis
and External Analysis," Comput. Chem. Eng. 28(6-7), 1157-1166
(2004).
Kocsor, A. and J. Csirik, "Fast Independent Component Analysis
in Kernel Feature Spaces," Proc. SOFSEM 2001, L. Pacholski and P.
Ruzicka, Eds., Nov. 24-Dec. 1, Springer Verlag (2001), pp. 271-281.
Kocsor, A. and L. Toth, "Kernel-Based Feature Extraction with
a Speech Technology Application," IEEE Trans. Signal Process.
52(8), 2250-2263 (2004).
Kramer, M. A., "Nonlinear Principal Component Analysis Using
Autoassociative Neural Networks," AIChE J. 37(2), 233-243 (1991).
Kresta, J., J. F. MacGregor and T. E. Marlin, "Multivariate
Statistical Monitoring of Process Operating Performance," Can. J.
Chem. Eng. 69, 35-47 (1991).
Ku, W., R. H. Storer and C. Georgakis, "Disturbance Detection
and Isolation by Dynamic Principal Component Analysis," Chemometr.
Intell. Lab. 30, 179-196 (1995).
Lee, J.-M., C. K. Yoo and I.-B. Lee, "New Monitoring Technique
with ICA Algorithm in Wastewater Treatment Process," Water Sci.
Technol. 47(12), 49-56 (2003).
Lee, J.-M., C. K. Yoo, S. W. Choi, P. A. Vanrolleghem and I.-B.
Lee, "Nonlinear Process Monitoring Using Kernel Principal Component
Analysis," Chem. Eng. Sci. 59, 223-234 (2004a).
Lee, J.-M., C. K. Yoo and I.-B. Lee, "Statistical Process
Monitoring with Independent Component Analysis," J. Process Contr.
14, 467-485 (2004b).
Lee, J.-M., C. K. Yoo and I.-B. Lee, "Fault Detection of Batch
Processes Using Multiway Kernel Principal Component Analysis,"
Comput. Chem. Eng. 28, 1837-1847 (2004c).
Lee, J.-M., S. J. Qin and I.-B. Lee, "Fault Detection and
Diagnosis of Multivariate Process Based on Modified Independent
Component Analysis," AIChE J. 52(10), 3501-3514 (2006).
Li, W., H. H. Yue, S. V. Cervantes and S. J. Qin, "Recursive PCA for Adaptive Process Monitoring," J. Process Contr. 10, 471-486
(2000).
Martin, E. B. and A. J. Morris, "Non-Parametric Confidence
Bounds for Process Performance Monitoring Charts," J. Process
Contr. 6(6), 349-358 (1996).
Mika, S., B. Scholkopf, A. J. Smola, K.-R. Muller, M. Scholz and G.
Ratsch, "Kernel PCA and De-Noising in Feature Spaces," Proc.
Advances Neural Inform. Processing Syst. II., 536-542 (1999).
Nomikos, P. and J. F. MacGregor, "Multivariate SPC Charts for
Monitoring Batch Processes," Technometrics 37, 41-59 (1995).
Qin, S. J., "Statistical Process Monitoring: Basics and
Beyond," J. Chemomtr. 17, 480-502 (2003).
Romdhani, S., S. Gong and A. Psarrou, "A Multi-View Nonlinear
Active Shape Model Using Kernel PCA," Proc. British Machine Vision
Conf., Nottingham, UK, 483-492 (1999).
Scholkopf, B., A. J. Smola and K.-R. Muller, "Nonlinear Component
Analysis as a Kernel Eigenvalue Problem," Neural Computation
10(5), 1299-1319 (1998).
Scholkopf, B., S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Muller,
G. Ratsch and A. J. Smola, "Input Space Versus Feature Space in
Kernel-Based Methods," IEEE Trans. Neural Networks 10(5), 1000-1016
(1999).
Tan, S. and M. L. Mavrovouniotis, "Reducing Data
Dimensionality through Optimizing Neural-Network Inputs," AIChE J.
41(6), 1471-1480 (1995).
Tates, A. A., D. J. Louwerse, A. K. Smilde, G. L. M. Koot and H.
Berndt, "Monitoring a PVC Batch Process with Multivariate
Statistical Process Control Charts," Ind. Eng. Chem. Res. 38,
4769-4776 (1999).
Wise, B. M. and N. B. Gallagher, "The Process Chemometrics
Approach to Process Monitoring and Fault Detection," J. Process
Contr. 6(6), 329-348 (1996).
Wold, S., "Cross-Validatory Estimation of Components in Factor
and Principal Components Models," Technometrics 20, 397-405 (1978)
Yang, J., X. Gao, D. Zhang and J. Yang, "Kernel ICA: An
Alternative Formulation and Its Application to Face Recognition,"
Pattern Recognition 38(10), 1784-1787 (2005).
Jong-Min Lee (1), S. Joe Qin (1)* and In-Beum Lee (2)
(1.) Department of Chemical Engineering, The University of Texas at
Austin, Austin, TX, U.S.A. 78712
(2.) Department of Chemical Engineering, Pohang University of
Science and Technology, San 31 Hyoja-Dong, Pohang, 790-784, Korea
* Author to whom correspondence may be addressed. E-mail address:
qin@che.utexas.edu
Table 1. Monitored variables in the Tennessee Eastman process
No. Variables
1 A feed (stream 1)
2 D feed (stream 2)
3 E feed (stream 3)
4 Total feed (stream 4)
5 Recycle flow (stream 8)
6 Reactor feed rate (stream 6)
7 Reactor pressure
8 Reactor level
9 Reactor temperature
10 Purge rate (stream 9)
11 Product separator temperature
12 Product separator level
13 Product separator pressure
14 Product separator underflow (stream 10)
15 Stripper level
16 Stripper pressure
17 Stripper underflow (stream 11)
18 Stripper temperature
19 Stripper steam flow
20 Compressor work
21 Reactor cooling water outlet temperature
22 Separator cooling water outlet temperature
23 D feed flow valve (stream 2)
24 E feed flow valve (stream 3)
25 A feed flow valve (stream 1)
26 Total feed flow valve (stream 4)
27 Compressor recycle valve
28 Purge valve (stream 9)
29 Separator pot liquid flow valve (stream 10)
30 Stripper liquid product flow valve (stream 11)
31 Stripper steam valve
32 Reactor cooling water flow
33 Condenser cooling water flow
Table 2. Process fault descriptions for the Tennessee Eastman process
No.    Description                                                Type
1      A/C feed ratio, B composition constant (stream 4)          Step
2      B composition, A/C ratio constant (stream 4)               Step
3      D feed temperature (stream 2)                              Step
4      Reactor cooling water inlet temperature                    Step
5      Condenser cooling water inlet temperature                  Step
6      A feed loss (stream 1)                                     Step
7      C header pressure loss--reduced availability (stream 4)    Step
8      A, B, C feed composition (stream 4)                        Random variation
9      D feed temperature (stream 2)                              Random variation
10     C feed temperature (stream 4)                              Random variation
11     Reactor cooling water inlet temperature                    Random variation
12     Condenser cooling water inlet temperature                  Random variation
13     Reaction kinetics                                          Slow drift
14     Reactor cooling water valve                                Sticking
15     Condenser cooling water valve                              Sticking
16~20  Unknown
Table 3. False alarm rates (%) of each method in the Tennessee
Eastman process against 99% control limit
         PCA              Modified ICA       KPCA               KICA
         T^2     SPE      T^2      SPE       T^2      SPE       T^2      SPE
         0.5     0.8      0.2      0.8       1.78     3.1       0.33     1.37
Table 4. Fault detection rates of each method in the Tennessee
Eastman process
Fault    PCA              Modified ICA       KPCA               KICA
         T^2     SPE      T^2      SPE       T^2      SPE       T^2      SPE
1        99      100      100      100       100      100       100      100
2        98      96       98       98        98       98        98       98
3        2       1        1        1         4        5         1        3
4        6       100      65       96        9        100       81       100
5        24      18       24       24        27       25        25       28
6        99      100      100      100       99       100       100      100
7        42      100      100      100       100      100       100      100
8        97      89       97       98        97       96        97       98
9        1       1        1        2         4        4         1        3
10       31      17       70       64        43       51        81       78
11       21      72       43       66        24       81        58       77
12       97      90       98       97        98       97        99       99
13       93      95       95       94        94       95        95       95
14       81      100      100      100       79       100       100      100
15       1       2        1        2         8        6         3        5
16       14      16       76       73        30       52        77       87
17       74      93       87       94        74       95        91       97
18       89      90       90       90        90       90        89       91
19       0       29       25       29        3        49        70       85
20       32      45       70       66        41       52        50       65
20 41 52 50 65