Abstract: Extracting useful and discriminative information from fluorescence microscopy protein images is a challenging task in machine learning and pattern recognition. The Gray Level Co-occurrence Matrix (GLCM) was among the first methods developed for texture analysis; it captures the intensity distribution as well as the relative distance between intensity levels in the original image. In this paper, several GLCMs are constructed with different quantization levels for different values of the offset d. Haralick descriptors are extracted from each GLCM and used to train support vector machines. The final output is obtained through a majority voting scheme. Hybrid models combining different individual feature spaces have also been constructed. Additionally, Correlation-based Feature Selection (CFS) is performed to extract the most useful features from the hybrid models. The empirical analysis reveals that varying the value of the parameter d causes the GLCM to extract different information from a particular fluorescence microscopy image, hence producing diversified co-occurrence matrices for the same image. Similarly, using more quantization levels to construct a GLCM generates informative and discriminative features for the classification phase. Furthermore, CFS significantly reduces the feature space dimensionality while achieving almost the same accuracy as the full feature space. The performance of the proposed system is validated on three benchmark datasets: HeLa (99.6%), CHO (100%), and LOCATE Endogenous (100%). It is anticipated that GLCM is still an efficient technique for pattern analysis in bioinformatics and computational biology and might also be helpful in drug-discovery-related applications.
Keywords: Texture analysis; Protein subcellular localization; GLCM; Haralick textures; Support vector machine; Correlation-based feature selection
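To illustrate how the offset d and the number of quantization levels change the co-occurrence matrix, the following is a minimal sketch in plain Python. It is not the authors' implementation. The function names (`quantize`, `glcm`, `haralick_subset`, `majority_vote`), the horizontal-only offset, the symmetric normalization, and the choice of three descriptors are all assumptions made for illustration. Support vector machine training is omitted; `majority_vote` only shows how per-GLCM predictions could be combined.

```python
from collections import Counter


def quantize(image, levels, max_value=255):
    """Map integer intensities in [0, max_value] to levels 0..levels-1."""
    return [[min(pixel * levels // (max_value + 1), levels - 1) for pixel in row]
            for row in image]


def glcm(image, levels, d=1):
    """Symmetric, normalized GLCM for a horizontal offset of d pixels (assumed direction)."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for x in range(len(row) - d):
            i, j = row[x], row[x + d]
            counts[i][j] += 1
            counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[c / total for c in row] for row in counts]


def haralick_subset(p):
    """Three common Haralick-style descriptors; the full set is larger."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


def majority_vote(labels):
    """Return the most frequent label among the individual classifier outputs."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    raw = [[0, 0, 255, 255],
           [0, 0, 255, 255]]
    img = quantize(raw, levels=2)
    for d in (1, 2):
        # Different offsets give different matrices for the same image.
        print(d, haralick_subset(glcm(img, levels=2, d=d)))
```

In a full pipeline, one feature vector per (quantization level, d) pair would feed its own classifier, and the predicted class labels would be passed to `majority_vote`.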