Image compression and feature extraction using Kohonen's self-organizing map neural network.
Sharma, Dinesh K.; Gaur, Loveleen; Okunbor, Daniel
ABSTRACT
Data transmission over the Internet is prevalent, and the development of efficient algorithms for compressing such data in order to reduce bandwidth has been an active research area. With increased demand for exchanging audio, video, and other image data over the Internet, research on data compression is more intense than ever before. One of the most popular approaches to image compression and feature extraction is Kohonen's Self-Organizing Map (SOM), a neural network scheme originally designed for pattern recognition. Over the past decade and a half, the SOM has revolutionized the field of data compression and feature extraction because of its ability to reduce higher-dimensional data to lower-dimensional data. In this paper, a global processing technique is proposed for training Kohonen's network; the new technique was tested on JPEG images, and the compressed images were considerably reduced in size.
Keywords: Kohonen's Self-Organizing Map, Feature extraction, Image compression, Global processing, Neural Network.
INTRODUCTION
The rapid development of information and communication technologies is enabling large amounts of information to be processed, stored, and transmitted over high-speed networks. The need for data compression and transmission is becoming an increasingly significant topic in all areas of computing and communications. Computing techniques that considerably reduce image size, so that images occupy less storage space and transmission bandwidth, form an active research area. Image compression deals with reducing the amount of data required to represent a digital image (Haykin, 2003).
Digital image processing serves two purposes: (1) it improves the information content of a picture for better understanding of what the picture contains, for example by increasing its contrast; and (2) it provides data compression for efficient storage and transmission. Although storage technology has improved significantly over the past decade, the same cannot be said for transmission capacity, and the need to transmit visual data, particularly over the Internet, has grown. The most commonly used image compression standard is that of the Joint Photographic Experts Group (JPEG); JPEG image files generally carry jpeg in their name extensions.
Several compression techniques have been developed, such as Differential Pulse Code Modulation, the Discrete Cosine Transform (Gersho and Gray, 1992), the Discrete Fourier Transform, and numerous Vector Quantization (VQ) methods (Amerijckx et al., 1998). The VQ technique has been used widely (Gray, 1984; Gersho and Gray, 1992) and performs fairly well in both compression ratio and extracted image quality. Its principle is simple. First, the image is split into square blocks of n×n pixels, for example 4×4, 6×6, or 8×8; each block is treated as a vector. After dividing the original image into blocks, the VQ encoder searches the codebook for the codeword most similar to each image block, and the index values of the codewords closest to the blocks are recorded as the compressed image. When decompressing the image, the VQ decoder uses these index values to recover the corresponding blocks and reconstruct the image. Although VQ is a powerful technique, it suffers from high computational complexity.
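To make the principle concrete, the following is a minimal MATLAB sketch of VQ encoding and decoding. It is an illustration only, not part of the method proposed in this paper: the image file name, block size, and codebook are hypothetical, and the random stand-in codebook would in practice be trained, for example with the Linde-Buzo-Gray algorithm.

% Vector quantization sketch: 4x4 blocks, exhaustive nearest-codeword search.
img = double(imread('image.jpg'));   % hypothetical grayscale image
img = img(1:100, 1:100);             % crop to a multiple of the block size
b = 4;                               % block size: each 4x4 block is a vector
K = 64;                              % codebook size (illustrative)
codebook = rand(K, b*b) * 255;       % stand-in for a trained codebook

% Encoder: record, for each block, the index of its nearest codeword.
[rows, cols] = size(img);
indices = zeros(rows/b, cols/b);
for i = 1:rows/b
    for j = 1:cols/b
        block = img((i-1)*b+1:i*b, (j-1)*b+1:j*b);
        v = block(:)';                                   % block as a row vector
        dist = sum((codebook - repmat(v, K, 1)).^2, 2);  % squared distances
        [minval, indices(i, j)] = min(dist);             % nearest codeword index
    end
end

% Decoder: rebuild the image from the stored indices alone.
recon = zeros(rows, cols);
for i = 1:rows/b
    for j = 1:cols/b
        recon((i-1)*b+1:i*b, (j-1)*b+1:j*b) = reshape(codebook(indices(i, j), :), b, b);
    end
end

Only the index matrix needs to be stored or transmitted, which is where the compression comes from; the exhaustive nearest-codeword search in the encoder is also where the high computational complexity arises.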
SOM is another image compression technique, and it achieves the same efficiency as the VQ scheme (Amerijckx et al., 1998). SOM has many applications, including but not limited to classification and exploration of collections of text documents, image compression, and pattern recognition (Chen et al., 1994; Pei and Lo, 1998). Kurnaz et al. (2001) presented an incremental SOM for the segmentation of ultrasound images, in which the elements of the feature vectors were formed by the Fast Fourier Transform of image intensities in square blocks. Zheng (1994) described the concept of groupings based on rules such as proximity and similarity for SOM-based image segmentation. Katoh and Fukui (1998) used SOM to recognize human emotions through facial expressions; by inputting various images of facial expressions into the SOM and changing the interconnection weights, they were able to classify image features.
In this paper, we present an image compression technique using SOM with global processing. The global processing technique processes every pixel of an image without using blocks, as is generally done in the conventional SOM. The proposed technique places emphasis on the fourth property of the feature map, namely feature extraction: data from an input space with a nonlinear distribution are processed by the SOM to select the set of best approximating features.
The remainder of the paper is organized as follows. The Background section presents a brief review of relevant literature; the Methodology section describes the approach used; the Results section presents an application of the methodology and its outcomes; and the final section offers conclusions.
BACKGROUND
Neural Networks
Neural networks have changed the way we solve "real-world" problems in science, engineering, and economics (Menon et al., 1996). They are an effective tool for pattern classification and clustering (Haykin, 2003). There are broadly two paradigms of neural learning algorithms: supervised and unsupervised. In the supervised paradigm, the networks are generally universal approximators of continuous/discontinuous functions, with prior knowledge of the input-output map. This known information is used to train the network, with forward propagation of activity and backward propagation of error, to obtain the optimal interconnection weight matrix; the trained network is then used for subsequent simulations. The quality of output from a supervised network is determined by its closeness to the desired output. The unsupervised paradigm is the opposite: the desired output is unknown. Unsupervised neural algorithms are best suited for clustering patterns on the basis of their inherent characteristics (Kohonen, 2001; Haykin, 2003). The major approaches to unsupervised learning are (a) competitive learning and (b) self-organizing feature maps.
Feature Extraction
Feature extraction is the process of mapping the original features (measurements) into fewer features that retain the main information about the structure of the data. Feature extraction methods can be grouped into four categories along two axes: by the use of a priori knowledge (supervised versus unsupervised) and by functional form (linear versus nonlinear). When the target class of the patterns is unknown, unsupervised methods are the only way to perform feature extraction; in other cases, supervised paradigms are preferable. Linear methods are simpler and often admit an analytical solution, but they are inferior to nonlinear methods when the classification task requires complex hypersurfaces. Widespread unsupervised methods for feature extraction are Principal Component Analysis (PCA), a linear mapping, and Sammon's nonlinear mapping: PCA attempts to preserve the variance of the projected data, whereas Sammon's mapping tries to preserve the interpattern distances.
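For reference, the PCA mapping mentioned above can be written in a few lines of MATLAB. This is a generic sketch, not part of the authors' method; the data matrix X and the target dimension k are hypothetical placeholders.

% PCA as linear, unsupervised feature extraction: project N d-dimensional
% samples onto the k directions of maximum variance.
X = rand(500, 10);                           % hypothetical data: 500 samples, 10 features
k = 2;                                       % target (reduced) dimension
Xc = X - repmat(mean(X, 1), size(X, 1), 1);  % center the data
C = cov(Xc);                                 % d x d covariance matrix
[V, D] = eig(C);                             % eigenvectors and eigenvalues
[srt, order] = sort(diag(D), 'descend');     % rank directions by variance
V = V(:, order(1:k));                        % keep the top-k principal directions
Y = Xc * V;                                  % N x k matrix of extracted features

The rows of Y are the reduced feature vectors; the retained directions are those along which the projected data keep the most variance, which is exactly the property PCA is said above to preserve.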
Feature extraction can be seen as a special kind of data reduction whose goal is to find a subset of informative variables based on image data. Since image data are by nature high dimensional, feature extraction is often a necessary step for segmentation or object recognition to succeed. It is the first and most important step in pattern classification and is ordinarily performed in an unsupervised manner. The objective of this step is to select small sets of features in which the essential information content of the input data is concentrated. The feature-selection property of SOM is well suited to this task, particularly when the input data are generated by a nonlinear process (Haykin, 2003).
Kohonen's Self-Organizing Maps
Kohonen invented the Self-Organizing Map (SOM) in the early 1980s. Kohonen's SOM is a widely used artificial neural network (ANN) model based on the idea of self-organized, or unsupervised, learning (Kohonen, 2001). The SOM network is a data visualization technique that reduces the dimensionality of data through a variant of neural computing. It is a nonparametric approach that makes no assumptions about the underlying population distribution and is independent of prior information (Kohonen, 2001). The problem that data visualization attempts to solve is that humans simply cannot visualize high-dimensional data, so techniques must be created to help us understand it.
The back-propagation algorithm, by comparison, requires training examples that consist of input-output pairs. The network architecture of the SOM consists of a set of laterally interacting adaptive processing elements (nodes), usually arranged in a two-dimensional grid called a map. All the map nodes are connected to a common set of inputs. Any activity pattern on the input gives rise to excitation of some local group of map nodes; after learning, the spatial positions of the excited groups specify a mapping of the input onto the map. The learning process is based on similarity comparisons in a continuous space, and the result is a system that maps similar inputs close to each other on the resulting map. The input may be highly complex multidimensional data, as in real-life speech recognition, image analysis, and process monitoring.
The SOM reduces dimensionality by producing maps of usually one or two dimensions that plot the similarities of the data, grouping similar data items together. SOMs thus accomplish two things: they reduce dimensionality and they display similarities. SOM is a type of unsupervised learning whose goal is to discover the underlying structure of the data. It is called a topology-preserving map because a topological structure is imposed on the nodes in the network; a topological map is simply a mapping that preserves neighborhood relations. The neighborhood can be rectangular or hexagonal (Figure 1).
[FIGURE 1 OMITTED]
In the SOM, the neurons are placed at the nodes of a lattice, usually one- or two-dimensional. The neurons become selectively tuned to various input patterns (stimuli), or classes of input patterns, in the course of a competitive learning process. The locations of the neurons so tuned (i.e., the winning neurons) become ordered with respect to each other in such a way that a meaningful coordinate system for different input features is created over the lattice. A SOM is therefore characterized by the formation of a topographic map of the input patterns, in which the spatial locations of the neurons in the lattice are indicative of intrinsic statistical features contained in the input patterns (Haykin, 2003; Zheng, 1994).
SOMs are based on competitive learning, in which the output neurons of the network compete among themselves to be activated or fired, with the result that only one output neuron, or one neuron per group, is on at any one time. An output neuron that wins the competition is called a winning neuron (Haykin, 2003). One way of inducing a winning neuron is to use lateral inhibitory connections (i.e., negative feedback paths) between the neurons. The SOM provides a way of representing multidimensional data in much lower-dimensional spaces, usually one or two dimensions. The brain is organized in such a way that different sensory inputs are represented by topologically ordered computational maps (arrays of neurons representing slightly differently tuned processors or filters). The neurons thus transform input signals into a place-coded probability distribution, representing the computed values of parameters by sites of maximum relative activity within the map (Knudsen et al., 1987).
[FIGURE 2 OMITTED]
As Figure 2 shows, the neurons can be placed in different topologies. Besides the topologies shown, there is also a random topology, in which the neurons are placed randomly.
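Assuming the MATLAB Neural Network Toolbox used later in this study, these layouts correspond to the topology functions gridtop, hextop, and randtop; a small sketch follows (the 5×5 map size is arbitrary):

% Neuron positions for a 5x5 map under the three topologies.
posGrid = gridtop(5, 5);   % rectangular grid layout
posHex  = hextop(5, 5);    % hexagonal layout
posRand = randtop(5, 5);   % random layout
plotsom(posHex);           % visualize the neuron positions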
[FIGURE 3 OMITTED]
Kohonen's SOM Algorithm
Unlike many other types of networks, a SOM does not need a target output to be specified. Where the node weights match the input vector, that area of the lattice is selectively optimized to more closely resemble the data for the class of the input vector. Kohonen's network follows a "winner takes all" policy: the network cluster unit whose weight vector matches the input pattern most closely is the winning neuron, and the weights of the winning unit and its neighboring units are updated (Figure 3). The winner is decided on the basis of the Euclidean distance D_j between the input vector and each unit's weight vector, calculated as:
D_j = √( Σ_i (x_i − w_ij)² )
During the training process, input data are fed to the network through the processing nodes in the input layer. Training occurs in several steps and over several iterations. The following procedure summarizes the Kohonen learning algorithm (Haykin, 2003; Kohonen, 2001).
Step 1. Initialization: Assign small random values to the initial weight vectors W_j(0) of the output neurons j, where j = 1, 2, ..., N.
Step 2. Sampling: Draw an input vector X at random from the input data and feed it into the network.
Step 3. Similarity Matching: Find the distance between the input vector X and each output neuron's weight vector W_j(n) at time n:
D_j = ||X − W_j(n)||, j = 1, 2, ..., N, where ||·|| is the Euclidean distance.
Step 4. Selecting: Select the winning neuron C that has the minimum D_j:
C = arg min_j D_j, j = 1, 2, ..., N.
Step 5. Updating: Adjust the weight vectors of all neurons through
W_j(n + 1) = W_j(n) + η(n)[X(n) − W_j(n)] if j ∈ A_C
W_j(n + 1) = W_j(n) otherwise,
where η(n) is the learning-rate parameter and A_C is the neighborhood function centered on the winning neuron C.
Step 6. Continuation: Repeat Steps 2-5 until no noticeable changes in the feature map are observed or until a specified number of epochs is reached (Haykin, 2003).
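The steps above translate directly into code. The following self-contained MATLAB sketch of the learning loop is illustrative only: the data, map size, learning-rate schedule, and neighborhood radius are assumptions made for the example and are not the settings used in the experiments reported below.

% Minimal SOM training loop following Steps 1-6 (illustrative parameters).
X = rand(1000, 3);                       % input data: 1000 samples in 3-D
gridSize = 10;                           % 10x10 output map
N = gridSize^2;                          % number of output neurons
d = size(X, 2);                          % input dimension

% Step 1: Initialization with small random weights.
W = rand(N, d) * 0.1;

% Lattice position (row, column) of each neuron.
[gr, gc] = meshgrid(1:gridSize, 1:gridSize);
pos = [gr(:), gc(:)];

epochs = 50;
for n = 1:epochs
    eta = 0.02 + 0.9 * (1 - n/epochs);              % decaying learning rate eta(n)
    radius = max(1, (gridSize/2) * (1 - n/epochs)); % shrinking neighborhood A_C
    for t = 1:size(X, 1)
        % Step 2: Sampling, draw an input vector at random.
        x = X(ceil(rand * size(X, 1)), :);
        % Step 3: Similarity matching, Euclidean distance to every weight vector.
        Dj = sqrt(sum((W - repmat(x, N, 1)).^2, 2));
        % Step 4: Selecting, the winning neuron C has the minimum distance.
        [minD, C] = min(Dj);
        % Step 5: Updating, move the winner and its lattice neighbors toward x.
        latticeDist = sqrt(sum((pos - repmat(pos(C, :), N, 1)).^2, 2));
        inAC = latticeDist <= radius;               % neurons inside A_C
        W(inAC, :) = W(inAC, :) + eta * (repmat(x, sum(inAC), 1) - W(inAC, :));
    end
end
% Step 6: Continuation, here a fixed number of epochs stands in for the
% stopping criterion of no noticeable change in the feature map.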
METHODOLOGY
As mentioned earlier, Kohonen's SOM is based on unsupervised learning that requires only the input data, with the main objective of reducing a high-dimensional input space to a lower-dimensional output. In addition to these important characteristics, the algorithm is simple and easy to understand. In this paper, we use the SOM for segmenting images. Segmentation-based image processing is treated as a classification problem, in which segmentation is achieved by pixel classification using SOM (Kong and Guan, 1994). Kong et al. (2002) performed image segmentation with SOM in two steps: coarse segmentation to obtain the global clustering information of the image, followed by a pixel-based classification scheme that uses local features to refine the segmentation. Iivarinen et al. (1996) used SOM to estimate the distribution of features extracted from fault-free samples. Visa (1992) implemented image segmentation based on SOM and texture measures. The proposed algorithm for image compression using SOM with global processing is divided into six steps, as depicted in Figure 4.
[FIGURE 4 OMITTED]
Networks were created to compress the image while retaining its key features. Experiments were carried out with the Network/Data Manager tool of the MATLAB software, and every network created was uniquely identified by its name. The networks were self-organizing maps of dimensions 15x15, 20x20, 25x25, 35x35, and 45x45, ranging from 225 to 2025 neurons. The topology adopted was a grid topology, and the distance function selected was 'LINKDIST'. The ordering-phase learning rate was fixed at 0.9 with 2000 steps for this phase; the tuning-phase learning rate was kept at 0.02, and the neighborhood distance was fixed at unity.
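The networks in this study were built interactively through the Network/Data Manager, so no creation script appears in the paper. As a hedged sketch, the same configuration could be reproduced with the toolbox's newsom function of that era, where c is the 3x10000 input matrix constructed by the input algorithm of Figure 5 below.

% Recreate one of the maps (here 25x25) with the settings described above:
% grid topology, 'linkdist' distance function, ordering-phase learning rate
% 0.9 over 2000 steps, tuning-phase learning rate 0.02, neighborhood distance 1.
net = newsom(minmax(c), [25 25], 'gridtop', 'linkdist', 0.9, 2000, 0.02, 1);
net = train(net, c);     % unsupervised training on the pixel matrix c
w = net.iw{1, 1};        % learned weight vectors, one row per neuron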
After the creation of the network, the three-row matrix constructed as in Figure 5 was fed into the network. Where the node weights matched the input vector, that area of the lattice was selectively optimized to more closely resemble the data for the class of the input vector. From an initial distribution of random weights, and over many iterations, the SOM eventually settles into a map of stable zones, each of which is effectively a feature classifier, so the graphical output can be thought of as a feature map of the input space. Self-Organizing Feature Maps (SOFM) learn to classify input vectors according to how they are grouped in the input space. They differ from competitive layers in that neighboring neurons in the self-organizing map learn to recognize neighboring sections of the input space; as a result, self-organizing maps learn both the distribution (as competitive layers do) and the topology of the input vectors they are trained on. After the completion of training, the weights were extracted from the resulting network object. The input and output algorithms, written in MATLAB, are given in Figures 5 and 6.
Figure 5: Input algorithm using MATLAB coding

% Step 1: Read the JPEG image and convert it to double precision
a1 = imread('image', 'jpeg');
a1 = double(a1);
% Step 2: Reduce the image to 100x100 pixels
a1 = a1(1:100, 1:100);
% Step 3: Build a matrix whose columns are [row; column; intensity]
l = 100; m = 100;
a = [0 0 0];
c = [0 0 0];
c = [c', a'];
for i = 1:l
    for j = 1:m
        a = [i, j, a1(i, j)];
        a = [c a'];
        c = a;
    end
end
c = c(:, 3:10002);   % drop the two initial zero columns
% Step 4: c is the output, a 3x10000 matrix that serves as input to the networks
Figure 6: Output algorithm using MATLAB coding

% Step 1: Load the trained network (of dimensions 15x15, 20x20, ..., and so on)
load filename;
% Step 2: Extract the learned weights and rebuild the image. Each weight
% vector holds [row, column, intensity]; the intensity is written back to
% the corresponding pixel position (1225 neurons corresponds to a 35x35 map).
% Where the node weights match the input vector, that area of the lattice has
% been selectively optimized to resemble the data for the class the input
% vector belongs to; from an initial distribution of random weights, and over
% many iterations, the SOM settles into a map of stable zones, each zone
% effectively a feature classifier, so the graphical output is a type of
% feature map of the input space.
w = network9.iw;     % network9 is the SOM object stored in the loaded file
ww = cell2mat(w);
a = uint8(0);
ww = uint8(ww);
for n = 1:1225
    i = ww(n, 1);
    j = ww(n, 2);
    k = ww(n, 3);
    a(i, j) = k;
end
% Step 3: Write the reconstructed output as a JPEG file
imwrite(a, 'test11', 'jpeg');
RESULTS
Based on the node weights taken from the input algorithm, reconstruction of the image was performed using the output algorithm, and the results were satisfactory. We noticed that as we increased the dimensions of the SOM, the reconstructed images resembled the original image more closely, and more feature extraction took place. The most faithful reconstruction, that of the 45x45 map, was reduced to 5409 bytes. Since the SOM is computationally intensive, a substantial amount of time is expended merely in training any SOM network.
CONCLUSION
In this study, we used SOM for segmenting images. The unsupervised learning paradigm implemented using SOM has been one of the major methods in image segmentation research. Image compression addresses the problem of reducing the amount of data required to represent a digital image. Digital image processing encompasses processes whose inputs and outputs are images, as well as processes that extract attributes from images, up to and including the recognition of individual objects. A global processing technique was used for compressing an image in JPEG format. As we increased the map dimensions, the reconstructed picture, though reduced to a small number of bytes, came to resemble the actual picture more closely through the feature extraction property of SOM, making the images very convenient for storage and transmission.
[FIGURE 7 OMITTED]
REFERENCES
Ahmed, N., T. Natarajan & K.R. Rao (1974). Discrete Cosine Transform. IEEE Trans. Com., C-23, 90-93.
Amerijckx, C., M. Verleysen, P. Thissen & J.D. Legat (1998). Image compression by self-organized Kohonen map. IEEE Transactions on Neural Networks, 9(3), 503-507.
Chen, O.T.-C., B.J. Sheu & W.-C. Fang (1994). Image compression using self organization networks. IEEE Trans. Circuits Syst. Video Tech., 4 (5), 480-489.
Delogne, P. & B. Macq (1991). Universal variable length coder for an integrated approach to image coding. Ann. Telecomm., 46(7-8), 452-459.
Gersho, A. & R.M. Gray (1992). Vector Quantization and Signal Compression. London: Kluwer Academic Publishers.
Gray, R.M. (1984). Vector quantization. IEEE ASSP Mag., 9-31.
Haykin, S. (2003). Neural Networks: A Comprehensive Foundation. New Delhi: Prentice Hall India.
Iivarinen, J., J. Rauhamaa & A. Visa (1996). Unsupervised segmentation of surface defects. Proceedings of ICPR, 356-360.
Katoh, A. & Y. Fukui (1998). Classification of facial expressions using self-organizing maps. Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 20, 986-989.
Kohonen, T. (2001). Self-Organizing Maps. Berlin: Springer-Verlag.
Kong, H. & L. Guan (1994). A self-organizing neural network for image segmentation. Proceedings of the 1994 Second Australian and New Zealand Conference, 27-31, 29 Nov.-2 Dec., 1994.
Kong, H. S., L. Guan & S. Y. Kung (2002). A self-organizing tree map approach for image segmentation. Proceedings of ICSP, 588-591, 2002.
Knudsen, E. I., S. du Lac & S. D. Esterly (1987). Computational maps in the brain. Annual Review of Neuroscience, 10, 41-65.
Kurnaz, M. N., Z. Dokur & T. Olmez (2001). Segmentation of ultrasound images by using an incremental self-organized map. Proceedings of the 23rd Annual EMBS International Conference, Istanbul, Turkey, 2638-2640, October 25-28.
Macq, B. (1990). A Universal entropy coder for transform on hybrid coding. Picture Coding Symposium, 12.1.1-12.12.2, Boston.
Menon, A., K. Mehrotra, C. K. Mohan & S. Ranka (1996). Characterization of a class of sigmoid functions with applications to neural networks. Neural Networks, 9, 819-835.
Pei, S.-C. & Y.-S. Lo (1998). Color image compression and limited display using self-organization Kohonen map. IEEE Trans. Circuits Syst. Video Tech., 8(2), 191-205.
Stallman, R. (2001). The GNU Project (www.stallman.org/rms-bw.jpeg).
Visa, A. (1992). Unsupervised image segmentation based on a self-organizing feature map and a texture measure. Proceedings of the 11th IAPR International Conference on Image, Speech and Signal Analysis, Vol. III, 101-104, 30 Aug.-3 Sept., 1992.
Zheng, Y. J. (1994). Self-organizing grouping for feature extraction and image segmentation. International Symposium on Speech, Image Processing and Neural Networks, 13-16 April, 1994.
Zirilli, J. (1997). Financial Prediction using Neural Networks. Boston: International Thompson Computer Press.
Dinesh K. Sharma, University of Maryland Eastern Shore
Loveleen Gaur, BLS Institute of Management
Daniel Okunbor, Fayetteville State University