Fault detection of non-linear processes using kernel independent component analysis
Lee, Jong-Min; Qin, S. Joe; Lee, In-Beum
INTRODUCTION
Multivariate statistical process monitoring approaches based on
principal component analysis (PCA) have been developed to extract useful
information from a large amount of chemical process data and to detect
and identify various faults in an abnormal operating situation (Kresta
et al., 1991; Nomikos and MacGregor, 1995; Ku et al., 1995; Wise and
Gallagher, 1996; Dong and McAvoy, 1996; Dunia et al., 1996; Bakshi,
1998; Li et al., 2000; Chiang et al., 2001; Qin, 2003). PCA is a
dimensionality reduction technique that handles high dimensional, noisy
and highly correlated data. It divides the data into a systematic part that contains most of
the variation and a noisy part that contains the least variance. Two statistics, represented
by the Mahalanobis and Euclidean distances, are used for process monitoring to detect changes
of the process variation in the model and residual subspaces, respectively. However, PCA
performs poorly when applied to industrial chemical process data with non-Gaussian and
non-linear characteristics because it relies on second-order statistics and a linearity
assumption.
To solve the problem posed by non-linear data, some nonlinear PCA
approaches have been developed. Kramer (1991) developed a non-linear PCA
based on auto-associative neural networks having five layers (input,
mapping, bottleneck, demapping and output layers). Dong and McAvoy
(1996) proposed a non-linear PCA based on principal curves and neural
networks and applied it to non-linear process monitoring. After
obtaining the associated scores and the corrected data using the
principal curve method, they used a neural network model to map the
original data into the corresponding scores and to map these scores into
the original variables. Alternative non-linear PCA methods based on
genetic programming (Hiden et al., 1999) and input-training neural
networks (Tan and Mavrovouniotis, 1995) have also been developed.
However, most existing non-linear PCA approaches are based on neural
networks; thus, a non-linear optimization problem must be solved to
compute the principal components, and the number of principal components
must be specified in advance before training the neural networks.
Recently, Lee et al. (2004a) proposed a new non-linear process
monitoring technique using kernel PCA (KPCA) to monitor continuous
processes and demonstrated its superiority to the PCA monitoring method.
KPCA first maps the input space into a feature space via a non-linear
map, which makes the data structure more linear, and then extracts the
principal components in that feature space. By introducing a kernel
function, one avoids the need to perform the non-linear mappings and to
compute inner products in the feature space explicitly. Compared to
other non-linear PCA methods, KPCA has the main advantages that no
non-linear optimization is involved, it essentially requires only linear
algebra, and the number of principal components need not be specified
prior to modelling.
Besides the linearity assumption, conventional PCA is limited in its
ability to extract useful information from observed data because it is a
second-order method, considering only the mean and variance of the data.
It extracts only uncorrelated components, not independent components,
and therefore gives a limited representation of non-Gaussian data, which
are typical of industrial data. Hence, a method is needed that can take into
account all higher-order statistics of observed data and make the latent
components independent. Recently, several monitoring methods based on
independent component analysis (ICA) have been proposed in order to
improve monitoring performance (Kano et al., 2003, 2004; Lee et al.,
2003, 2004b, 2006; Albazzaz and Wang, 2004). ICA decomposes observed
data into linear combinations of statistically independent components.
In comparison to PCA, ICA involves higher-order statistics; it not
only decorrelates the data but also reduces higher-order statistical
dependencies, making the projected data as independent as possible and
thus extracting more useful information from the observed data.
However, ICA-based linear projection is also inadequate
to represent the data with a non-linear structure (Yang et al., 2005).
In this paper, a new non-linear process monitoring technique based on
kernel independent component analysis (KICA) is proposed. KICA is an
emerging non-linear feature extraction technique that formulates
independent component analysis (ICA) in the kernel-induced feature space
(Kocsor and Csirik, 2001; Bach and Jordan, 2002; Kocsor and Toth, 2004;
Yang et al., 2005). The basic idea of KICA is to non-linearly map the
data into a feature space using KPCA and to extract useful information
to further perform ICA in the KPCA feature space. In this paper, the
KICA algorithm is based on the formalism presented in Yang et al.
(2005). However, the algorithm is modified to make it suitable for the
purpose of process monitoring. It is composed of two steps: KPCA (kernel
centring and kernel whitening) and an iterative procedure of the modified
ICA suggested by Lee et al. (2006). The paper is organized as follows. The
KICA algorithm is explained in the next section, followed by its
application to non-linear process monitoring. Then, the superiority of
KICA for process monitoring is illustrated through a simple multivariate
process and the Tennessee Eastman process. Finally, conclusions are given.
KICA
In this section, KICA is derived to extract statistically
independent components that also capture the non-linearity of the
observed data. The idea is to non-linearly map the data into a feature
space where the data have a more linear structure. Then we perform the
modified ICA in the feature space to make the extracted components as
independent as possible. As in the central algorithm of Yang et al.
(2005), we use the "kernel trick" to extract whitened principal
components in the high-dimensional feature space and ultimately convert
the problem of performing the modified ICA in the feature space into a
problem of implementing the modified ICA in the KPCA-transformed space.
Detailed algorithms for KPCA and ICA are not covered in this paper; the
reader is referred to Scholkopf et al. (1998, 1999), Hyvarinen and Oja
(2000), and Hyvarinen et al. (2001).
ICA Model in Feature Space
A non-linear data structure in the original space is more likely to be
linear after a high-dimensional non-linear mapping (Haykin, 1999). This
higher-dimensional space is referred to as the feature space, $F$.
Consider a non-linear mapping

$$\Phi : \mathbb{R}^m \rightarrow F \qquad (1)$$
We first map the input space into the feature space via the non-linear
mapping and whiten the mapped data so that their covariance is the
identity matrix, which makes the problem of ICA estimation simpler and
better conditioned (the detailed algorithm is explained later). Our
objective is then to find a linear operator $W^F$ in the feature space
$F$ that recovers the independent components from $\Phi(x)$ by the
linear transformation

$$y = W^F \Phi(x) \qquad (2)$$

where $E\{\Phi(x)\Phi(x)^T\} = I$.
Whitening of Data in Feature Space Using KPCA
The goal of this step is to map the input space into a feature
space via non-linear mapping and then to extract whitened principal
components in that feature space such that their covariance structure is
an identity matrix. Consider the observed data $x_k \in \mathbb{R}^m$,
$k = 1, \ldots, N$, where $N$ is the number of observations. Using the
non-linear mapping $\Phi : \mathbb{R}^m \rightarrow F$, the observed data
in the original space are extended into the high-dimensional feature
space, $\Phi(x_k) \in F$. Then, the covariance matrix in the feature
space is

$$S^F = \frac{1}{N} \sum_{k=1}^{N} \Phi(x_k)\Phi(x_k)^T \qquad (3)$$

where $\Phi(x_k)$ for $k = 1, \ldots, N$ is assumed to be zero-mean and
of unit variance, which will be explained later. Let
$\Theta = [\Phi(x_1), \ldots, \Phi(x_N)]$; then $S^F$ can be expressed as
$S^F = \frac{1}{N}\Theta\Theta^T$. In principle, principal components in
the feature space can be obtained by finding the eigenvectors of $S^F$.
Instead of eigen-decomposing $S^F$ directly, however, we can find the
principal components using the "kernel trick." Defining an $N \times N$
Gram kernel matrix $K$ by

$$[K]_{ij} = K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j) \qquad (4)$$

we have $K = \Theta^T \Theta$. The use of a kernel function $k(x_i, x_j)$
allows us to compute inner products in $F$ without performing the
non-linear mappings. That is, we can avoid performing the non-linear
mappings and computing inner products in the feature space explicitly by
introducing a kernel function of the form
$k(x, y) = \langle \Phi(x), \Phi(y) \rangle$ (Scholkopf et al., 1998;
Romdhani et al., 1999). Some of the most widely used kernel functions are
the radial basis kernel $k(x, y) = \exp(-\|x - y\|^2 / c)$, the
polynomial kernel $k(x, y) = \langle x, y \rangle^r$, and the sigmoid
kernel $k(x, y) = \tanh(\beta_0 \langle x, y \rangle + \beta_1)$, where
$c$, $r$, $\beta_0$, and $\beta_1$ have to be specified. A specific
choice of kernel function implicitly determines the mapping $\Phi$ and
the feature space $F$. From the kernel matrix $K$, the mean centring and
variance scaling of $\Phi(x_k)$ in the high-dimensional space can be
performed as follows. The mean-centred kernel matrix $\bar{K}$ is
obtained from

$$\bar{K} = K - 1_N K - K 1_N + 1_N K 1_N \qquad (5)$$

where $1_N$ is the $N \times N$ matrix whose every element is $1/N$
(Scholkopf et al., 1998). The variance scaling of the kernel matrix can
then be done by the following equation (Lee et al., 2004c):

$$K_{\mathrm{scl}} = \frac{\bar{K}}{\mathrm{trace}(\bar{K})/N} \qquad (6)$$
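As an illustration, Equations (4)-(6) can be computed as in the following minimal sketch (NumPy is assumed, the radial basis kernel is used, and the function names are illustrative rather than part of the original formulation):

```python
import numpy as np

def rbf_kernel_matrix(X, c):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / c) for the rows of X, Equation (4)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-dist2 / c)

def center_and_scale_kernel(K):
    """Mean centring, Equation (5), and variance scaling, Equation (6)."""
    N = K.shape[0]
    one_N = np.full((N, N), 1.0 / N)
    K_bar = K - one_N @ K - K @ one_N + one_N @ K @ one_N
    K_scl = K_bar / (np.trace(K_bar) / N)
    return K_bar, K_scl
```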
If we apply the eigenvalue decomposition to $K_{\mathrm{scl}}$,

$$\lambda a = K_{\mathrm{scl}} a \qquad (7)$$

we obtain the orthonormal eigenvectors $a_1, a_2, \ldots, a_d$ of
$K_{\mathrm{scl}}$ corresponding to the $d$ largest positive eigenvalues
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$. Theoretically, the
number of non-zero eigenvalues is equal to the hyper-dimension. In this
paper, we empirically determine the hyper-dimension as the number of
eigenvalues satisfying $\lambda_i / \sum_j \lambda_j > 0.0001$. Then, the
$d$ largest positive eigenvalues of $S^F$ are
$\lambda_1/N, \lambda_2/N, \ldots, \lambda_d/N$, and the associated
orthonormal eigenvectors $v_1, v_2, \ldots, v_d$ can be expressed as

$$v_j = \frac{1}{\sqrt{\lambda_j}}\,\Theta a_j, \quad j = 1, \ldots, d \qquad (8)$$
The eigenvector matrix $V = [v_1, v_2, \ldots, v_d]$ can be expressed
compactly as

$$V = \Theta H \Lambda^{-1/2} \qquad (9)$$

where $H = [a_1, a_2, \ldots, a_d]$ and
$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)$.
Then, $V$ diagonalizes the covariance matrix $S^F$:

$$V^T S^F V = \mathrm{diag}\!\left(\frac{\lambda_1}{N}, \frac{\lambda_2}{N}, \ldots, \frac{\lambda_d}{N}\right) = \frac{1}{N}\Lambda \qquad (10)$$
Let

$$P = V\left(\frac{1}{N}\Lambda\right)^{-1/2} = \sqrt{N}\,\Theta H \Lambda^{-1} \qquad (11)$$

then

$$P^T S^F P = I \qquad (12)$$

Thus, we obtain the whitening matrix $P$, and the mapped data in the
feature space can be whitened by the transformation

$$z = P^T \Phi(x) \qquad (13)$$

In detail,

$$z = P^T \Phi(x) = \sqrt{N}\,\Lambda^{-1} H^T \Theta^T \Phi(x) = \sqrt{N}\,\Lambda^{-1} H^T \bigl[\bar{k}(x_1, x), \ldots, \bar{k}(x_N, x)\bigr]^T \qquad (14)$$

where $\bar{k}(x_j, x) = \langle \Phi(x_j), \Phi(x) \rangle$ denotes the
mean-centred and variance-scaled kernel value. In fact, $z$ is the same
as the whitened KPCA score vector, satisfying $E\{zz^T\} = I$.
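The whitening step, Equations (7)-(14), can be sketched as follows under the same assumptions (NumPy; 0.0001 is the eigenvalue cut-off used in this paper):

```python
import numpy as np

def kpca_whitening(K_scl, tol=1e-4):
    """Eigen-decompose K_scl, Equation (7), and keep the d eigenpairs with
    lambda_i / sum(lambda) > tol; returns (lam, H) with Lambda = diag(lam)."""
    lam, H = np.linalg.eigh(K_scl)        # symmetric eigendecomposition, ascending order
    lam, H = lam[::-1], H[:, ::-1]        # sort into descending order
    keep = lam / lam.sum() > tol
    return lam[keep], H[:, keep]

def whitened_scores(K_scl, lam, H):
    """Whitened score vectors z_k for all training samples, Equation (14):
    the k-th row is sqrt(N) * Lambda^{-1} H^T (k-th column of K_scl)."""
    N = K_scl.shape[0]
    return np.sqrt(N) * K_scl @ H @ np.diag(1.0 / lam)
```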
Further Processing Using the Modified ICA
The goal of this step is to extract independent components from the
KPCA-transformed space. In the central part of this step, we applied the
modified ICA of Lee et al. (2006), instead of the original FastICA
algorithm used in Yang et al. (2005). To be suitable for process
monitoring, Lee et al. (2006) proposed the modified ICA to extract some
dominant independent components from observed data. Compared to
conventional ICA (FastICA), the modified ICA algorithm can extract a few
dominant factors needed for process monitoring, attenuate high
computational load, consider the ordering of independent components, and
give a consistent solution. From $z \in \mathbb{R}^d$, the modified ICA
finds $p$ ($p \leq d$) dominant independent components $y$ satisfying
$E\{yy^T\} = D = \mathrm{diag}\{\lambda_1, \ldots, \lambda_p\}$ by
maximizing the non-Gaussianity of the elements of $y$, using

$$y = C^T z \qquad (15)$$

where $C \in \mathbb{R}^{d \times p}$ and $C^T C = D$. The requirement
$E\{yy^T\} = D$ reflects that the variance of each element of $y$ is the
same as that of the corresponding score in KPCA. Like PCA, the modified
ICA can therefore rank independent components according to their
variances. If we define the normalized independent components $y_n$ as

$$y_n = D^{-1/2} y = D^{-1/2} C^T z = C_n^T z \qquad (16)$$

it is clear that $C_n^T = D^{-1/2} C^T$, $C_n^T C_n = I$, and
$E\{y_n y_n^T\} = I$.
Although the elements of $z$ are not independent, $z$ is a good initial
value for $y_n$ since its statistical dependencies have already been
removed up to second order (mean and variance). Therefore, we can set the
initial matrix $C_n^T$ to

$$C_n^T = [\,I_p : 0\,] \qquad (17)$$

where $I_p$ is the $p$-dimensional identity matrix and $0$ is a
$p \times (d - p)$ zero matrix.
To calculate $C_n$, each column vector $c_{n,i}$ is initialized and then
updated so that the $i$-th independent component
$y_{n,i} = c_{n,i}^T z$ has maximum non-Gaussianity. The objective of
making $y_{n,i}$ for $i = 1, \ldots, p$ statistically independent is
equivalent to maximizing their non-Gaussianity (Hyvarinen and Oja, 2000).
Hyvarinen and Oja (2000) introduced a flexible and reliable approximation
of the negentropy as a measure of non-Gaussianity:

$$J(y) \approx \bigl[E\{G(y)\} - E\{G(v)\}\bigr]^2 \qquad (18)$$

where $y$ is assumed to have zero mean and unit variance, $v$ is a
Gaussian variable of zero mean and unit variance, and $G$ is any
non-quadratic function. The non-quadratic function $G$ is described in
detail in Hyvarinen (1999). The detailed algorithm is given below:
1. Choose $p$, the number of independent components to estimate. Set the
counter $i \leftarrow 1$.
2. Take the initial vector $c_{n,i}$ to be the $i$-th row of the matrix
in Equation (17).
3. Maximize the approximated negentropy: let
$c_{n,i} \leftarrow E\{z\,g(c_{n,i}^T z)\} - E\{g'(c_{n,i}^T z)\}\,c_{n,i}$,
where $g$ is the first derivative and $g'$ is the second derivative of $G$.
4. Orthogonalize in order to exclude the information contained in the
solutions already found:
$c_{n,i} \leftarrow c_{n,i} - \sum_{j=1}^{i-1} (c_{n,i}^T c_{n,j})\,c_{n,j}$
5. Normalize $c_{n,i} \leftarrow c_{n,i} / \|c_{n,i}\|$.
6. If $c_{n,i}$ has not converged, go back to Step 3.
7. If $c_{n,i}$ has converged, output the vector $c_{n,i}$. Then, if
$i < p$, set $i \leftarrow i + 1$ and go back to Step 2.
Once $C_n$ is found, the kernel independent components are obtained from

$$y = D^{1/2} C_n^T z = D^{1/2} C_n^T P^T \Phi(x) \qquad (19)$$

This equation is the final realization of Equation (2).
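The fixed-point procedure above can be sketched as follows. This paper does not specify the contrast function $G$; $g(u) = \tanh(u)$ is one common choice from Hyvarinen and Oja (2000) and is used here only for illustration:

```python
import numpy as np

def modified_ica(Z, p, max_iter=200, tol=1e-6):
    """Deflation-based fixed-point iteration on the whitened scores Z (N x d),
    returning C_n (d x p) with orthonormal columns, following Steps 1-7 above."""
    N, d = Z.shape
    g = np.tanh                                   # contrast derivative g (illustrative choice)
    dg = lambda u: 1.0 - np.tanh(u) ** 2          # g'
    Cn = np.zeros((d, p))
    for i in range(p):
        c = np.zeros(d)
        c[i] = 1.0                                # initialization, Equation (17)
        for _ in range(max_iter):
            u = Z @ c                             # c^T z for every sample
            c_new = (Z.T @ g(u)) / N - dg(u).mean() * c      # Step 3 fixed-point update
            c_new -= Cn[:, :i] @ (Cn[:, :i].T @ c_new)       # Step 4 deflation
            c_new /= np.linalg.norm(c_new)                   # Step 5 normalization
            converged = abs(abs(c_new @ c) - 1.0) < tol      # Step 6 convergence test
            c = c_new
            if converged:
                break
        Cn[:, i] = c
    return Cn
```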
ON-LINE MONITORING STRATEGY OF KICA
The monitoring strategy based on KICA is an extension of the
modified-ICA-based monitoring to the feature space. To detect changes of
the systematic part within the KICA model, Hotelling's $T^2$ statistic,
the sum of the normalized squared scores, is defined as

$$T^2 = y^T D^{-1} y \qquad (20)$$

The upper control limit for $T^2$ cannot be determined from the
F-distribution because $y$ does not follow a Gaussian distribution. In
this paper, kernel density estimation is used to define the control
limit (Martin and Morris, 1996; Lee et al., 2004a).
To detect changes of the non-systematic part in the residual of the
KICA model, the SPE statistic is defined as

$$\mathrm{SPE} = e^T e = (z - \hat{z})^T (z - \hat{z}) = z^T (I - C_n C_n^T)\, z \qquad (21)$$

where $e = z - \hat{z}$ and $\hat{z}$ is found from

$$\hat{z} = C_n D^{-1/2} y = C_n C_n^T z \qquad (22)$$

If the majority of the non-Gaussianity is captured by the extracted
independent components, the residual subspace contains mostly Gaussian
noise that can be treated as normally distributed. Assuming that the
prediction errors are normally distributed, the control limit for the
SPE is calculated from the following weighted $\chi^2$ distribution:

$$\mathrm{SPE} \sim \mu \chi^2_h, \quad \mu = \frac{b}{2a}, \quad h = \frac{2a^2}{b} \qquad (23)$$

where $a$ and $b$ are the estimated mean and variance of the SPE from
the normal operating data, respectively (Nomikos and MacGregor, 1995;
Tates et al., 1999).
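A sketch of Equations (20)-(23) follows (NumPy/SciPy assumed). Here D is taken as the vector of component variances, and only the weighted chi-square limit for the SPE is shown, since the $T^2$ limit comes from kernel density estimation:

```python
import numpy as np
from scipy import stats

def monitoring_statistics(Z, Cn, D):
    """T^2, Equation (20), and SPE, Equation (21), for whitened scores Z (N x d);
    D holds the p component variances."""
    Y = Z @ Cn * np.sqrt(D)            # independent components, Equation (19)
    T2 = np.sum(Y ** 2 / D, axis=1)    # y^T D^{-1} y
    E = Z - Z @ Cn @ Cn.T              # residual e = z - z_hat, Equation (22)
    SPE = np.sum(E ** 2, axis=1)
    return T2, SPE

def spe_control_limit(SPE_noc, alpha=0.99):
    """Weighted chi-square limit, Equation (23): SPE ~ mu * chi2_h with
    mu = b/(2a) and h = 2a^2/b, where a and b are the NOC mean and variance of SPE."""
    a, b = SPE_noc.mean(), SPE_noc.var()
    mu, h = b / (2.0 * a), 2.0 * a ** 2 / b
    return mu * stats.chi2.ppf(alpha, h)
```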
Based on the $T^2$ statistic and the SPE statistic, the proposed KICA
monitoring method can be summarized as follows. The KICA model is first
built using historical data collected during normal operation. Future
process behaviour is then compared against this "normal" or
"in-control" representation.
Developing the normal operating condition (NOC) KICA model
a) Acquire normal operating data and normalize the data using the mean
and standard deviation of each variable.
b) From the scaled normal operating data $x_k \in \mathbb{R}^m$,
$k = 1, \ldots, N$, compute the kernel matrix $K \in \mathbb{R}^{N \times N}$
using Equation (4).
c) Carry out mean centring and variance scaling in the feature space,
obtaining $K_{\mathrm{scl}}$ using Equations (5) and (6).
d) Solve the eigenvalue problem $\lambda a = K_{\mathrm{scl}} a$. Here,
we extract the $d$ eigenvectors whose corresponding eigenvalues
$\lambda_i$ satisfy $\lambda_i / \sum_j \lambda_j > 0.0001$.
e) For the normal operating data $x$, extract the whitened non-linear
component $z$ via Equation (14).
f) Extract independent components using the modified ICA from Equation (19).
g) Calculate $T^2$ and SPE using Equations (20) and (21), respectively.
h) Determine the control limits of the $T^2$ and SPE charts.
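Steps (a)-(h) can be tied together in the following sketch, reusing the helper functions sketched in the previous sections; the kernel-density-based $T^2$ limit is replaced by a simple empirical percentile here, which is an assumption made only for brevity:

```python
import numpy as np

def build_noc_model(X_noc, c, p, alpha=0.99):
    """Offline modelling steps (a)-(h); X_noc is the auto-scaled normal operating data."""
    K = rbf_kernel_matrix(X_noc, c)                      # step (b), Equation (4)
    K_bar, K_scl = center_and_scale_kernel(K)            # step (c), Equations (5)-(6)
    lam, H = kpca_whitening(K_scl)                       # step (d), Equation (7)
    Z = whitened_scores(K_scl, lam, H)                   # step (e), Equation (14)
    Cn = modified_ica(Z, p)                              # step (f)
    D = lam[:p]                                          # D = diag{lambda_1, ..., lambda_p}
    T2, SPE = monitoring_statistics(Z, Cn, D)            # step (g), Equations (20)-(21)
    T2_lim = np.quantile(T2, alpha)                      # step (h): empirical stand-in for the KDE limit
    SPE_lim = spe_control_limit(SPE, alpha)              # step (h), Equation (23)
    return dict(K_raw=K, tr_scale=np.trace(K_bar) / len(X_noc),
                lam=lam, H=H, Cn=Cn, D=D, T2_lim=T2_lim, SPE_lim=SPE_lim)
```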
On-line monitoring
a) Obtain new data for each sample and scale them with the mean and
standard deviation obtained in step (a) of the modelling procedure.
b) From the $m$-dimensional scaled test data $x_t \in \mathbb{R}^m$,
compute the kernel vector $k_t \in \mathbb{R}^{1 \times N}$ by
$[k_t]_j = k(x_t, x_j)$, where $x_j \in \mathbb{R}^m$, $j = 1, \ldots, N$,
are the normal operating data.
c) Mean centre the test kernel vector $k_t$ as follows:

$$\bar{k}_t = k_t - 1_t K - k_t 1_N + 1_t K 1_N \qquad (24)$$

where $K$ and $1_N$ are those from the modelling procedure and
$1_t = \frac{1}{N}[1, \ldots, 1] \in \mathbb{R}^{1 \times N}$.
d) Carry out variance scaling:

$$k_{t,\mathrm{scl}} = \frac{\bar{k}_t}{\mathrm{trace}(\bar{K})/N} \qquad (25)$$

e) For the test data $x_t$, extract the whitened non-linear component via

$$z_t = \sqrt{N}\,\Lambda^{-1} H^T k_{t,\mathrm{scl}}^T \qquad (26)$$

f) Carry out further processing using the modified ICA:

$$y = D^{1/2} C_n^T z_t \qquad (27)$$

g) Calculate the monitoring statistics ($T^2$ and SPE) of the test data.
h) Monitor whether $T^2$ or SPE exceeds its control limit calculated in
the modelling procedure.
CASE STUDIES
In this section, the performance of KICA is illustrated through a
simple example and a Tennessee Eastman process case study. When the
proposed KICA is applied to the observed data, the selection of the
proposed KICA is applied to the observed data, the selection of the
kernel function is important since the degree to which the non-linear
characteristic of a system is captured depends on this function;
however, the general question of how to select the ideal kernel for a
given monitoring process still remains an open problem (Mika et al.,
1999; Lee et al., 2004a). By testing the monitoring performance of a
range of kernel functions in various systems, we found that the radial
basis kernel is the most appropriate for our non-linear process
monitoring examples. The radial basis kernel function is given as
$k(x, y) = \exp(-\|x - y\|^2 / c)$ with $c = r m \sigma^2$, where $r$ is
a constant to be selected, $m$ is the dimension of the input space, and
$\sigma^2$ is the variance of the data (Mika et al., 1999). As $c$
increases, the kernel values approach 1, but this causes no problem in
the eigendecomposition of the kernel matrix because the variance scaling
in Equation (6) adjusts the kernel values. After testing the monitoring
performance for various values of $c$, we recommend setting $c$ so that
the approximated hyper-dimension is similar to the dimension of the
input space. Finding the optimal value of $c$ should be studied in
future work.
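This width heuristic can be sketched as follows, reusing the helper functions from the earlier sketches; interpreting $\sigma^2$ as the overall variance of the auto-scaled data is an assumption:

```python
import numpy as np

def kernel_width(X, r):
    """Heuristic c = r * m * sigma^2: m is the input dimension and sigma^2 the data variance."""
    return r * X.shape[1] * X.var()

def hyper_dimension(X, c, tol=1e-4):
    """Retained hyper-dimension for a given c; the text suggests choosing c (via r)
    so that this count is close to the input dimension m."""
    K = rbf_kernel_matrix(X, c)                 # helpers sketched in the KICA section
    _, K_scl = center_and_scale_kernel(K)
    lam = np.linalg.eigvalsh(K_scl)[::-1]
    return int(np.sum(lam / lam.sum() > tol))
```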
When designing the PCA, modified ICA, KPCA and KICA models, we must
determine the number of components. For linear PCA, a cross-validation
method is used (Wold, 1978). For KPCA, we employed the cut-off method
using the average eigenvalue to determine the number of principal
components due to its simplicity and robustness. The number of
independent components for the modified ICA and for KICA is set equal to
the number of principal components of PCA and of KPCA, respectively.
Simple Example
Let us consider the two source variables plotted in Figure 1a:

$$s_1(k) = 2\cos(0.1k) \qquad (28)$$

$$s_2(k) = 3\,\mathrm{sign}\bigl(\sin(0.03k) + 9\cos(0.01k)\bigr) \qquad (29)$$

where $k = 0, 1, \ldots, 499$. These sources $s = [s_1\ s_2]^T$ are
non-linearly mixed as follows:

$$x_1 = 5\exp\bigl[2(0.7 s_1 + 0.3 s_2)\bigr] + e_1 \qquad (30)$$

$$x_2 = (-0.5 s_1 + 0.8 s_2)^3 + e_2 \qquad (31)$$

where $e_1$ and $e_2$ are independent random noise following $N(0, 1)$.
The non-linearly mixed signals, $x = [x_1\ x_2]^T = f(s)$, are shown in
Figure 1b. Before applying PCA, modified ICA, KPCA, and KICA, $x$ is
mean centred and variance scaled. In addition, $c = 10m$ is used for
KPCA and KICA.
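A sketch of the data generation in Equations (28)-(31) is given below (NumPy; the random seed, and the reading of the parentheses in Equation (29) exactly as printed, are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)                    # seed chosen only for reproducibility
k = np.arange(500)
s1 = 2.0 * np.cos(0.1 * k)                                             # Equation (28)
s2 = 3.0 * np.sign(np.sin(0.03 * k) + 9.0 * np.cos(0.01 * k))          # Equation (29)
x1 = 5.0 * np.exp(2.0 * (0.7 * s1 + 0.3 * s2)) + rng.standard_normal(500)   # Equation (30)
x2 = (-0.5 * s1 + 0.8 * s2) ** 3 + rng.standard_normal(500)                 # Equation (31)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)          # mean centring and variance scaling
```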
[FIGURE 1 OMITTED]
As shown in Figure 1c, PCA cannot recover the original signals since it
is a linear method that provides only a decorrelated representation of
the data. The signals recovered by KPCA, shown in Figure 1d, have a
pattern similar to the PCA solutions. Therefore, KPCA still has a
limited ability to extract the underlying hidden source signals even
though it can, in theory, capture non-linearity. As illustrated in
Figure 1e, the modified ICA has only limited success in recovering the
original signals because of its linearity assumption, even though its
solutions are improved compared to the PCA and KPCA solutions. In
comparison to PCA, KPCA, and the modified ICA, KICA is able to recover
the original source signal patterns from the mixed signals, changing
only the sign (Figure 1f). This simple example clearly demonstrates that
the proposed method extracts a few dominant underlying factors much more
effectively than PCA, KPCA, and the modified ICA. In process monitoring,
the choice of multivariate data analysis method used to build the normal
operating model from historical data is important. If the underlying
hidden factors are extracted correctly, they not only fit the normal
operating model well but also directly reflect the process changes
caused by a process fault. Consequently, KICA is expected to give better
monitoring performance when applied to non-linear process monitoring.
Tennessee Eastman Process
Chemical process plants typically have non-linear characteristics,
causing the measured data or the underlying hidden factors to deviate
from a Gaussian distribution. The KICA monitoring approach proposed here
was tested for its ability to detect various faults in simulated data
obtained from a well-known benchmark process, the Tennessee Eastman
industrial process. This process consists of five major units: a
reactor, a condenser, a recycle compressor, a separator, and a stripper.
The four gaseous reactants A, C, D, and E and the inert B are fed to the
reactor, where the liquid products G and H are formed along with a
by-product F. All the reactions are irreversible and exothermic. The
details of the process description, including the control structure, are
well explained in Chiang et al. (2001). The process has 41 measured
variables and 12 manipulated variables. Among them, the 33 variables
listed in Table 1 are selected for monitoring in this study. The
sampling time was 3 min. Normal operating data consisting of 960 samples
were used for modelling PCA, KPCA, modified ICA, and KICA. To test and
compare the performance of each monitoring method, 20 faults are
considered. Table 2 describes the characteristics of all faults. Each
fault data set has 960 samples, and the fault starts at sample 160. The
data can be downloaded from http://brahms.scs.uiuc.edu (Chiang et al.,
2001).
All the data were auto-scaled prior to the application of PCA, ICA,
KPCA, and KICA. In this example, $c = 500m$ is used for KPCA and KICA.
Nine principal components are selected for PCA by cross-validation, and
the same number of independent components is selected for the modified
ICA. For KPCA and KICA, 30 whitened vectors are selected, for which the
corresponding relative eigenvalues $\lambda_i / \sum_j \lambda_j$ in the
feature space are larger than 0.0001. Among the 30 whitened vectors, 11
components are selected for both KPCA and KICA using the average
eigenvalue criterion. False alarm rates are calculated from another set
of normal operating data and tabulated in Table 3. For the 99% control
limit, the false alarm rates of each method are acceptable, with KPCA
having slightly greater values. For the data obtained after the fault
occurrence, the percentage of the samples outside the 99% control limits
was calculated in each simulation and termed the detection rate.
The fault detection rates of the four multivariate methods, PCA, ICA,
KPCA, and KICA, for all 20 faults were computed and are summarized in
Table 4. As shown in Table 4, the detection rates of the monitoring
methods are almost the same in the cases of Faults 1, 2, 4, 6, 7, 8, 12,
13, and 14, since the fault magnitudes are so large that they can be
detected nearly 100% of the time even by PCA. On the other hand, the
detection rates for Faults 3, 9, and 15 are not considerably higher than
1% for all methods because the fault magnitudes are too small. In Table
4, the maximum detection rates achieved for Faults 5, 10, 11, 16, 17,
18, 19, and 20 are marked with bold numbers. For most faults, the
detection rate of KPCA is higher than that of PCA since the former
considers the non-linearity of the data. Comparing the detection rates
of ICA and PCA, ICA can also detect small events that are difficult for
PCA to detect. Specifically, for Faults 10 and 16, the detection rate of
ICA is more than twice that of PCA. Comparing ICA with KPCA, one cannot
say which is better because their relative performance differs from
fault to fault. KPCA outperforms ICA in the cases of Faults 11 and 19,
whereas ICA gives higher detection rates for Faults 10, 16, and 20. All
things considered, the best monitoring performance is obtained with
KICA. In most cases, except Faults 11 and 20, the detection rate of KICA
is considerably higher than that of any other method considered in this
paper. In particular, the detection rate is markedly enhanced by KICA in
the case of Fault 19. Although KPCA and ICA show the highest detection
rates for Faults 11 and 20, respectively, the difference from the
detection rate of KICA is small (within 5%). To clearly illustrate the
superiority of KICA in process monitoring, the monitoring results for
Faults 10 and 19 are shown in Figures 2 and 3, respectively. The 99%
control limits are drawn as dotted lines in those figures. In Figure 2,
the SPE charts of PCA and KPCA show some false alarms around sample 130.
Although PCA and KPCA detect Fault 10 from about sample 200, many
samples remain below the control limit despite the presence of the
fault. In the case of Fault 10, ICA is able to detect the fault more
efficiently than PCA and KPCA, without false alarms. However, it is
evident that KICA gives the best monitoring performance, with no false
alarms and a higher detection rate. Figure 3 also demonstrates that the
KICA monitoring charts detect Fault 19 more efficiently and consistently
than any other method. The $T^2$ and SPE statistics of KICA monitoring
clearly exceed the 99% control limit after sample 160. One notable point
in Table 4 is the fault detection ability of the $T^2$ statistic of the
proposed method. For most cases, the detection rate of $T^2$ is
considerably improved by the proposed KICA method. This means that the
proposed method extracts the essential features of a process much more
efficiently than the other methods, since it can effectively capture
non-linear relationships in the process variables while extracting
independent components from the multivariate data. This result suggests
that the proposed method is also expected to outperform the other
methods in diagnosing fault patterns in the feature space.
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
CONCLUSIONS
This paper proposes a new approach to non-linear process monitoring
based on KICA. KICA can efficiently compute independent components in a
high-dimensional feature space by means of non-linear kernel functions.
KICA consists of two steps: KPCA (kernel centring, kernel whitening) and
an iterative procedure of the modified ICA. Therefore, KICA can be
thought of as performing ICA in the whitened kernel space generated by
KPCA. The main non-linearity of the process data is captured by KPCA,
and the additional ICA procedure extracts statistically independent
components from the observed data. The proposed method was applied to
fault detection in the Tennessee Eastman process and was compared with
PCA, the modified ICA, and
KPCA. This example demonstrated that the proposed approach can
effectively capture non-linear relationships in process variables and
showed the best performance when used for process monitoring.
The present work shows the superiority of KICA-based process
monitoring; however, KICA has some open issues that should be noted. One
is how to select an appropriate kernel for a given data set: the
selection of the kernel function is important to the proposed method
since the degree to which the non-linear characteristic of a system is
captured depends on this function. Furthermore, determining the optimal
number of independent components in the kernel space and identifying
which variable causes a process fault should be considered in future
work.
ACKNOWLEDGEMENTS
This work was supported by the Korea Research Foundation Grant funded
by the Korean Government (MOEHRD, Basic Research Promotion Fund)
(KRF-2005-214-D00027), the Texas-Wisconsin Modeling and Control
Consortium, the Chang Jiang Scholars Program of the Ministry of
Education of China, and the Li Ka Shing Foundation.
NOMENCLATURE
a                  estimated mean of the SPE
b                  estimated variance of the SPE
C                  demixing matrix in the whitened feature space
$C_n$              normalized demixing matrix in the whitened feature space
$c_{n,i}$          $i$-th column vector of $C_n$
$\|c_{n,i}\|$      $L_2$ norm of $c_{n,i}$
c                  specified constant value
D                  $\mathrm{diag}\{\lambda_1, \ldots, \lambda_p\}$
d                  number of selected non-zero eigenvalues in the feature space
diag               diagonal matrix
$E\{\cdot\}$       expected value of $\cdot$
e                  residual vector, $z - \hat{z}$
$e_1$, $e_2$       random noise following $N(0,1)$
F                  feature space
$f(s)$             non-linear function
G                  non-quadratic function
g                  first derivative of G
g'                 second derivative of G
H                  $[a_1, a_2, \ldots, a_d]$
h                  $2a^2/b$
I                  identity matrix
$I_p$              $p$-dimensional identity matrix
$J(y)$             negentropy of $y$
K                  Gram kernel matrix
$[K]_{ij}$         element in the $i$-th row and $j$-th column of $K$, $K_{ij} = k(x_i, x_j)$
$\bar{K}$          mean-centred kernel matrix
$K_{\mathrm{scl}}$ mean-centred and variance-scaled kernel matrix
$k_t$              kernel vector of the test data
$\bar{k}_t$        mean-centred kernel vector of the test data
$[k_t]_j$          $j$-th element of $k_t$
$k_{t,\mathrm{scl}}$ mean-centred and variance-scaled kernel vector of the test data
m                  dimension of the observed data
N                  number of observations
$N(0,1)$           Gaussian distribution with zero mean and unit variance
P                  whitening matrix in the feature space
p                  number of extracted independent components
$\mathbb{R}^m$     $m$-dimensional input space
r                  specified constant value
$S^F$              covariance matrix in the feature space
s                  source vector
$s_j$              $j$-th element of $s$
sum                summation
SPE                squared prediction error statistic
$T^2$              Hotelling's $T^2$ statistic (Mahalanobis distance)
trace($\cdot$)     sum of the diagonal elements of matrix $\cdot$
V                  $[v_1, v_2, \ldots, v_d]$
$v_j$              $j$-th eigenvector of $S^F$
v                  Gaussian variable of zero mean and unit variance
$W^F$              demixing matrix in the feature space
x                  observed data
$x_k$              observed data at sample $k$
$x_t$              test data vector
$x_j$              $j$-th element of $x$
y                  independent component vector in the feature space
$y_n$              normalized independent component vector in the feature space
$y_{n,i}$          $i$-th element of $y_n$
z                  whitened KPCA score vector
$\hat{z}$          estimated value of $z$
$z_t$              whitened KPCA score vector of the test data
0                  zero matrix
$\langle x_i, x_j \rangle$ inner product of $x_i$ and $x_j$
Greek Symbols
$\alpha$           eigenvector of $K_{\mathrm{scl}}$
$\alpha_j$         $j$-th eigenvector of $K_{\mathrm{scl}}$
$\beta_0$          specified constant value
$\beta_1$          specified constant value
$\chi^2$           chi-square distribution
$\Phi$             high-dimensional non-linear mapping function
$\Phi(x)$          non-linearly mapped data
$\Theta$           non-linearly mapped data matrix, $[\Phi(x_1), \ldots, \Phi(x_N)]$
$\Lambda$          $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)$
$\lambda_j$        $j$-th eigenvalue of $K_{\mathrm{scl}}$
$\mu$              $b/(2a)$
$\sigma^2$         variance of the data
Superscripts
T                  transpose
-1                 inverse matrix
Manuscript received February 22, 2006; revised manuscript received
July 14, 2006; accepted for publication April 20, 2007.
REFERENCES
Albazzaz, H. and X. Z. Wang, "Statistical Process Control
Charts for Batch Operations Based on Independent Component
Analysis," Ind. Eng. Chem. Res. 43(21), 6731-6741 (2004).
Bakshi, B. R., "Multiscale PCA with Application to
Multivariate Statistical Process Monitoring," AIChE J. 44(7),
1596-1610 (1998).
Bach, F. R. and M. I. Jordan, "Kernel Independent Component
Analysis," J. Machine Learning Res. 3, 1-48 (2002).
Chiang, L. H., E. L. Russell and R. D. Braatz, "Fault
Detection and Diagnosis in Industrial Systems," Springer, London
(2001).
Dong, D. and T. J. McAvoy, "Nonlinear Principal Component
Analysis Based on Principal Curves and Neural Networks," Comput.
Chem. Eng. 20(1), 65-78 (1996).
Dunia, R., S. J. Qin, T. F. Edgar and T. J. McAvoy,
"Identification of Faulty Sensors Using Principal Component
Analysis," AIChE J. 42(10), 2797-2812 (1996).
Haykin, S., "Neural Networks," Prentice Hall International, Inc., New Jersey (1999).
Hiden, H. G., M. J. Willis, M. T. Tham and G. A. Montague,
"Nonlinear Principal Components Analysis Using Genetic
Programming," Comput. Chem. Eng. 23, 413-425 (1999).
Hyvarinen, A., "Fast and Robust Fixed-Point Algorithms for
Independent Component Analysis," IEEE Trans. Neural Networks 10,
626 -634 (1999).
Hyvarinen, A. and E. Oja, "Independent Component Analysis:
Algorithms and Applications," Neural Networks 13(4-5), 411-430
(2000).
Hyvarinen, A., J. Karhunen and E. Oja, "Independent Component
Analysis," John Wiley & Sons, Inc., New York (2001).
Kano, M., S. Tanaka, S. Hasebe, I. Hashimoto and H. Ohno,
"Monitoring Independent Components for Fault Detection," AIChE
J. 49(4), 969-976 (2003).
Kano, M., S. Hasebe, I. Hashimoto and H. Ohno, "Evolution of
Multivariate Statistical Process Control: Independent Component Analysis
and External Analysis," Comput. Chem. Eng. 28(6-7), 1157-1166
(2004).
Kocsor, A. and J. Csirik, "Fast Independent Component Analysis
in Kernel Feature Spaces," Proc. SOFSEM 2001, L. Pacholski and P.
Ruzicka, Eds., Nov. 24-Dec. 1, Springer Verlag (2001), pp. 271-281.
Kocsor, A. and L. Toth, "Kernel-Based Feature Extraction with
a Speech Technology Application," IEEE Trans. Signal Process.
52(8), 2250-2263 (2004).
Kramer, M. A., "Nonlinear Principal Component Analysis Using
Autoassociative Neural Networks," AIChE J. 37(2), 233-243 (1991).
Kresta, J., J. F. MacGregor and T. E. Marlin, "Multivariate
Statistical Monitoring of Process Operating Performance," Can. J.
Chem. Eng. 69, 35-47 (1991).
Ku, W., R. H. Storer and C. Georgakis, "Disturbance Detection
and Isolation by Dynamic Principal Component Analysis," Chemometr.
Intell. Lab. 30, 179-196 (1995).
Lee, J.-M., C. K. Yoo and I.-B. Lee, "New Monitoring Technique
with ICA Algorithm in Wastewater Treatment Process," Water Sci.
Technol. 47(12), 49-56 (2003).
Lee, J.-M., C. K. Yoo, S. W. Choi, P. A. Vanrolleghem and I.-B.
Lee, "Nonlinear Process Monitoring Using Kernel Principal Component
Analysis," Chem. Eng. Sci. 59, 223-234 (2004a).
Lee, J.-M., C. K. Yoo and I.-B. Lee, "Statistical Process
Monitoring with Independent Component Analysis," J. Process Contr.
14, 467-485 (2004b).
Lee, J.-M., C. K. Yoo and I.-B. Lee, "Fault Detection of Batch
Processes Using Multiway Kernel Principal Component Analysis,"
Comput. Chem. Eng. 28, 1837-1847 (2004c).
Lee, J.-M., S. J. Qin and I.-B. Lee, "Fault Detection and
Diagnosis of Multivariate Process Based on Modified Independent
Component Analysis," AIChE J. 52(10), 3501-3514 (2006).
Li, W., H. H. Yue, S. V. Cervantes and S. J. Qin, "Recursive PCA for Adaptive Process Monitoring," J. Process Contr. 10, 471-486
(2000).
Martin, E. B. and A. J. Morris, "Non-Parametric Confidence
Bounds for Process Performance Monitoring Charts," J. Process
Contr. 6(6), 349-358 (1996).
Mika, S., B. Scholkopf, A. J. Smola, K.-R. Muller, M. Scholz and G.
Ratsch, "Kernel PCA and De-Noising in Feature Spaces," Proc.
Advances Neural Inform. Processing Syst. II., 536-542 (1999).
Nomikos, P. and J. F. MacGregor, "Multivariate SPC Charts for
Monitoring Batch Processes," Technometrics 37, 41-59 (1995).
Qin, S. J., "Statistical Process Monitoring: Basics and
Beyond," J. Chemomtr. 17, 480-502 (2003).
Romdhani, S., S. Gong and A. Psarrou, "A Multi-View Nonlinear
Active Shape Model Using Kernel PCA," Proc. British Machine Vision
Conf., Nottingham, UK, 483-492 (1999).
Scholkopf, B., A. J. Smola and K.-R. Muller, "Nonlinear Component
Analysis as a Kernel Eigenvalue Problem," Neural Computation
10(5), 1299-1319 (1998).
Scholkopf, B., S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Muller,
G. Ratsch and A. J. Smola, "Input Space Versus Feature Space in
Kernel-Based Methods," IEEE Trans. Neural Networks 10(5), 1000-1016
(1999).
Tan, S. and M. L. Mavrovouniotis, "Reducing Data
Dimensionality through Optimizing Neural-Network Inputs," AIChE J.
41(6), 1471-1480 (1995).
Tates, A. A., D. J. Louwerse, A. K. Smilde, G. L. M. Koot and H.
Berndt, "Monitoring a PVC Batch Process with Multivariate
Statistical Process Control Charts," Ind. Eng. Chem. Res. 38,
4769-4776 (1999).
Wise, B. M. and N. B. Gallagher, "The Process Chemometrics
Approach to Process Monitoring and Fault Detection," J. Process
Contr. 6(6), 329-348 (1996).
Wold, S., "Cross-Validatory Estimation of Components in Factor
and Principal Components Models," Technometrics 20, 397-405 (1978)
Yang, J., X. Gao, D. Zhang and J. Yang, "Kernel ICA: An
Alternative Formulation and Its Application to Face Recognition,"
Pattern Recognition 38(10), 1784-1787 (2005).
Jong-Min Lee (1), S. Joe Qin (1)* and In-Beum Lee (2)
(1.) Department of Chemical Engineering, The University of Texas at
Austin, Austin, TX, U.S.A. 78712
(2.) Department of Chemical Engineering, Pohang University of
Science and Technology, San 31 Hyoja-Dong, Pohang, 790-784, Korea
* Author to whom correspondence may be addressed. E-mail address:
qin@che.utexas.edu
Table 1. Monitored variables in the Tennessee Eastman process
No. Variables
1 A feed (stream 1)
2 D feed (stream 2)
3 E feed (stream 3)
4 Total feed (stream 4)
5 Recycle flow (stream 8)
6 Reactor feed rate (stream 6)
7 Reactor pressure
8 Reactor level
9 Reactor temperature
10 Purge rate (stream 9)
11 Product separator temperature
12 Product separator level
13 Product separator pressure
14 Product separator underflow (stream 10)
15 Stripper level
16 Stripper pressure
17 Stripper underflow (stream 11)
18 Stripper temperature
19 Stripper steam flow
20 Compressor work
21 Reactor cooling water outlet temperature
22 Separator cooling water outlet temperature
23 D feed flow valve (stream 2)
24 E feed flow valve (stream 3)
25 A feed flow valve (stream 1)
26 Total feed flow valve (stream 4)
27 Compressor recycle valve
28 Purge valve (stream 9)
29 Separator pot liquid flow valve (stream 10)
30 Stripper liquid product flow valve (stream 11)
31 Stripper steam valve
32 Reactor cooling water flow
33 Condenser cooling water flow
Table 2. Process fault descriptions for the Tennessee Eastman process
No.    Description                                                Type
1      A/C feed ratio, B composition constant (stream 4)          Step
2      B composition, A/C ratio constant (stream 4)               Step
3      D feed temperature (stream 2)                              Step
4      Reactor cooling water inlet temperature                    Step
5      Condenser cooling water inlet temperature                  Step
6      A feed loss (stream 1)                                     Step
7      C header pressure loss--reduced availability (stream 4)    Step
8      A, B, C feed composition (stream 4)                        Random variation
9      D feed temperature (stream 2)                              Random variation
10     C feed temperature (stream 4)                              Random variation
11     Reactor cooling water inlet temperature                    Random variation
12     Condenser cooling water inlet temperature                  Random variation
13     Reaction kinetics                                          Slow drift
14     Reactor cooling water valve                                Sticking
15     Condenser cooling water valve                              Sticking
16~20  Unknown
Table 3. False alarm rates (%) of each method in the Tennessee
Eastman process against 99% control limit
         PCA              Modified ICA       KPCA               KICA
         T^2     SPE      T^2      SPE       T^2      SPE       T^2      SPE
         0.5     0.8      0.2      0.8       1.78     3.1       0.33     1.37
Table 4. Fault detection rates of each method in the Tennessee
Eastman process
Fault    PCA              Modified ICA       KPCA               KICA
         T^2     SPE      T^2      SPE       T^2      SPE       T^2      SPE
1        99      100      100      100       100      100       100      100
2        98      96       98       98        98       98        98       98
3        2       1        1        1         4        5         1        3
4        6       100      65       96        9        100       81       100
5        24      18       24       24        27       25        25       28
6        99      100      100      100       99       100       100      100
7        42      100      100      100       100      100       100      100
8        97      89       97       98        97       96        97       98
9        1       1        1        2         4        4         1        3
10       31      17       70       64        43       51        81       78
11       21      72       43       66        24       81        58       77
12       97      90       98       97        98       97        99       99
13       93      95       95       94        94       95        95       95
14       81      100      100      100       79       100       100      100
15       1       2        1        2         8        6         3        5
16       14      16       76       73        30       52        77       87
17       74      93       87       94        74       95        91       97
18       89      90       90       90        90       90        89       91
19       0       29       25       29        3        49        70       85
20       32      45       70       66        41       52        50       65
20 41 52 50 65