
Basic article information

  • Title: Research on social data by means of cluster analysis
  • Authors: Camila Maione ; Donald R. Nelson ; Rommel Melgaço Barbosa
  • Journal: Applied Computing and Informatics
  • Print ISSN: 2210-8327
  • Electronic ISSN: 2210-8327
  • Year of publication: 2019
  • Volume: 15
  • Issue: 2
  • Pages: 153-162
  • DOI: 10.1016/j.aci.2018.02.003
  • Publisher: Elsevier
  • Abstract: This paper presents a data mining study and cluster analysis of social data obtained on small producers and family farmers from six country cities in Ceará state, northeast Brazil. The analyzed data involve demographic, economic, agriculture and food insecurity information. The goal of the study is to establish profiles for the small producer families that reside in the region and to identify relevant features which differentiate these profiles. Moreover, we provide an efficient data mining methodology for analysis of social data sets which is capable of handling its natural challenges, such as mixed variables and abundance of null values. We use the Silhouette method for the estimation of the best number of natural groups within the data, along with the Partitioning Around Medoids clustering algorithm in order to compute the profiles. The Correlation-Based Feature Selection method is used to identify which social criteria are the most important to differentiate the families from each profile. Classification models based on support vector machines, multilayer perceptron and decision trees were developed aiming to predict in which of the identified clusters an arbitrary family would be best fit. We obtained a good separation of the families into two clusters, and a multilayer perceptron model with approximately 93.5% prediction accuracy.
  • Keywords: Clustering ; Social data ; Classification ; PAM ; Data mining
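
The clustering step described in the abstract (Silhouette to choose the number of groups, Partitioning Around Medoids to form them) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gower-style dissimilarity used here to cope with mixed variables and null values is an assumption, the swap search is a simplified PAM-style loop, and the toy records and all function names are hypothetical.

```python
import itertools

def gower(a, b, ranges):
    """Gower-style dissimilarity (assumed here); None marks a null value."""
    total, used = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue                      # ignore features missing in either record
        if r is None:                     # categorical: simple mismatch
            total += 0.0 if x == y else 1.0
        else:                             # numeric: range-normalised difference
            total += abs(x - y) / r if r > 0 else 0.0
        used += 1
    return total / used if used else 1.0

def distance_matrix(rows, numeric):
    ranges = []
    for j, is_num in enumerate(numeric):
        vals = [row[j] for row in rows if row[j] is not None]
        ranges.append(max(vals) - min(vals) if is_num and vals else (0.0 if is_num else None))
    n = len(rows)
    return [[gower(rows[i], rows[j], ranges) for j in range(n)] for i in range(n)]

def cost(dist, medoids):
    return sum(min(row[m] for m in medoids) for row in dist)

def pam(dist, k):
    """Simplified PAM-style swap search, started from the first k records."""
    n, medoids = len(dist), list(range(k))
    best, improved = cost(dist, medoids), True
    while improved:
        improved = False
        for pos, cand in itertools.product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            c = cost(dist, trial)
            if c < best - 1e-12:          # accept only strict improvements
                medoids, best, improved = trial, c, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

def silhouette(dist, labels):
    """Mean silhouette width over all records (singletons score 0)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for l2, other in clusters.items() if l2 != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy records: (household size, monthly income, irrigation access)
rows = [(3, 1200.0, "yes"), (4, 1100.0, "yes"), (5, None, "yes"), (4, 1300.0, "yes"),
        (8, 300.0, "no"), (9, 250.0, "no"), (8, 350.0, None), (9, 280.0, "no")]
numeric = [True, True, False]

dist = distance_matrix(rows, numeric)
results = {}
for k in range(2, 5):
    _, labels = pam(dist, k)
    results[k] = (silhouette(dist, labels), labels)
best_k = max(results, key=lambda k: results[k][0])
best_labels = results[best_k][1]
print(best_k, best_labels)
```

On this toy data the mean silhouette width is highest for two groups, mirroring the kind of two-cluster outcome the abstract reports; with real survey data the candidate range for k and the dissimilarity measure would need to be chosen to suit the variables at hand.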