Abstract: Large-scale image datasets have been shown to pose serious difficulties for content-based image retrieval (CBIR), as current CBIR strategies may struggle to process them. In addition, near-duplicate images consume storage space that could otherwise hold other, unique images. To address these problems, MapReduce has been used to speed up the filtering of near-duplicate images. However, the detection of near-duplicate images still lacks accuracy. This study therefore finds that extracting image features with Principal Component Analysis (PCA), which operates on the matrix representation of an image, can improve similarity detection. The PCA approach nevertheless needs to be enhanced, because it extracts features from Songket motif images inadequately. Therefore, this study proposes a new hybrid model that integrates PCA with MapReduce for image feature extraction and for clustering large-scale image datasets in a cloud environment. To this end, the study employs a qualitative experimental design model that iterates through three main phases: first, an analysis and design phase; second, a development phase; and third, a testing and evaluation phase. This study, however, focuses only on the analysis and design phase. The outcomes of the empirical phase inform the design of the algorithm and model, in line with the findings of the literature review. The expected results of the study are a proposed model, the extraction of the principal components of a large-scale image dataset using PCA, and a reduction in the time needed to filter images in a MapReduce environment.
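The general idea of combining PCA with a MapReduce-style filter can be illustrated with a minimal sketch. It is not the model proposed in this study, which is only designed here and not yet implemented. The sketch assumes that each image is already flattened into a short numeric vector, that a single principal component (approximated by power iteration) is enough to separate the images, and that images whose projections fall into the same bucket are near-duplicate candidates. All function names, the `bucket_width` parameter, and the sample data are hypothetical, and the map, shuffle, and reduce steps run in one process rather than on a cluster.

```python
from collections import defaultdict
from math import sqrt


def center(vectors):
    """Subtract the per-dimension mean so that PCA sees zero-mean data."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]


def first_principal_component(centered, iterations=50):
    """Approximate the top eigenvector of X^T X by power iteration."""
    dim = len(centered[0])
    start = [float(i + 1) for i in range(dim)]  # arbitrary non-uniform start
    start_norm = sqrt(sum(c * c for c in start))
    w = [c / start_norm for c in start]
    for _ in range(iterations):
        # scores = X w, then w_next = X^T scores
        scores = [sum(a * b for a, b in zip(x, w)) for x in centered]
        w_next = [sum(s * x[i] for s, x in zip(scores, centered))
                  for i in range(dim)]
        norm = sqrt(sum(c * c for c in w_next))
        if norm == 0.0:  # degenerate data: keep the current direction
            break
        w = [c / norm for c in w_next]
    return w


def map_phase(image_id, vector, component, bucket_width):
    """Map: emit (bucket, image_id), where the bucket quantizes the projection."""
    projection = sum(a * b for a, b in zip(vector, component))
    yield round(projection / bucket_width), image_id


def reduce_phase(bucket, image_ids):
    """Reduce: all images that share a bucket form one candidate group."""
    return sorted(image_ids)


def find_near_duplicate_groups(images, bucket_width):
    """Run map, shuffle, and reduce in-process; return groups of size > 1."""
    ids = sorted(images)
    centered = center([images[i] for i in ids])
    component = first_principal_component(centered)
    shuffled = defaultdict(list)  # stands in for the shuffle step
    for image_id, vector in zip(ids, centered):
        for bucket, value in map_phase(image_id, vector, component, bucket_width):
            shuffled[bucket].append(value)
    groups = [reduce_phase(b, members) for b, members in shuffled.items()]
    return sorted(g for g in groups if len(g) > 1)


if __name__ == "__main__":
    # Hypothetical 4-pixel "images": a2 and b2 are slight variations of a and b.
    sample = {
        "a": [10, 10, 0, 0],
        "a2": [10, 11, 0, 0],
        "b": [0, 0, 10, 10],
        "b2": [0, 0, 11, 10],
    }
    print(find_near_duplicate_groups(sample, bucket_width=5.0))
```

Bucketing on a single component is a deliberate simplification: two near-duplicates whose projections straddle a bucket boundary would be missed, so a practical system would likely keep several components and compare candidates within neighbouring buckets.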