Journal: International Journal of Data Mining & Knowledge Management Process
Print ISSN: 2231-007X
Online ISSN: 2230-9608
Year: 2017
Volume: 7
Issue: 4
Page: 33
DOI: 10.5121/ijdkp.2017.7403
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: The geometry of data, also known as its probability distribution, is an important consideration for the accurate computation of data mining tasks such as pre-processing, classification, and interpretation. Data geometry influences the outcome and accuracy of statistical analysis to a large extent. This paper focuses on understanding the influence of data geometry on the feature subset selection process using the random forest algorithm. In practice, data are usually assumed to follow a normal distribution, an assumption that often does not hold. The amount of dimensionality reduction achieved varies with the distribution of the data. A comparison is made using three standard distributions: triangular, uniform, and normal. The results are discussed in the paper.
Keywords: Data Geometry; Gaussian Distribution; Uniform Distribution; Triangular Distribution; Dimensionality Reduction; Random Forest; Random Subset Feature Selection.
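
The abstract compares three distributions (triangular, uniform, and normal) as alternative data geometries. The sketch below is only an illustration of how those three shapes differ; it does not reproduce the paper's random forest experiment. It assumes, purely for comparison, that each distribution is scaled to mean 0 and variance 1, and the helper names (draw_samples, excess_kurtosis, summarize) are hypothetical. With the first two moments matched, the remaining difference shows up in the tails, measured here by excess kurtosis.

```python
import math
import random
import statistics


def draw_samples(kind, n, rng):
    """Draw n values with mean 0 and variance 1 from the named distribution."""
    if kind == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if kind == "uniform":
        # Uniform on [-a, a] has variance a^2 / 3, so a = sqrt(3).
        half = math.sqrt(3.0)
        return [rng.uniform(-half, half) for _ in range(n)]
    if kind == "triangular":
        # Symmetric triangular on [-b, b] with mode 0 has variance b^2 / 6.
        half = math.sqrt(6.0)
        return [rng.triangular(-half, half, 0.0) for _ in range(n)]
    raise ValueError(f"unknown distribution: {kind}")


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (zero for a normal distribution)."""
    mean = statistics.fmean(xs)
    m2 = statistics.fmean((x - mean) ** 2 for x in xs)
    m4 = statistics.fmean((x - mean) ** 4 for x in xs)
    return m4 / (m2 ** 2) - 3.0


def summarize(n=20000, seed=0):
    """Return {kind: (mean, variance, excess kurtosis)} for the three shapes."""
    rng = random.Random(seed)
    result = {}
    for kind in ("normal", "uniform", "triangular"):
        xs = draw_samples(kind, n, rng)
        result[kind] = (
            statistics.fmean(xs),
            statistics.pvariance(xs),
            excess_kurtosis(xs),
        )
    return result


if __name__ == "__main__":
    for kind, (mean, var, kurt) in summarize().items():
        print(f"{kind:>10}: mean={mean:+.3f} var={var:.3f} excess_kurtosis={kurt:+.3f}")
```

In theory the excess kurtosis is 0 for the normal, -1.2 for the uniform, and -0.6 for the symmetric triangular distribution, so the sample estimates should land near those values. Features drawn this way could then be passed to a feature selection procedure to see whether the selected subset changes with the shape of the data.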