Article information

  • Title: A data-driven framework for archiving and exploring social media data
  • Authors: Qunying Huang; Chen Xu
  • Journal: Annals of GIS
  • Print ISSN: 1947-5683
  • Publication year: 2014
  • Volume: 20
  • Issue: 4
  • Pages: 265-277
  • DOI: 10.1080/19475683.2014.942697
  • Language: English
  • Publisher: Taylor & Francis Ltd.
  • Abstract: Social media data are available and accumulated at the exabyte level every day. As social media applications are widely deployed on various platforms, from personal computers to mobile devices, they are becoming a natural extension of the human sensory system. The synthesis of social media with human intelligence has the potential to form an intelligent sensor network of unprecedented scale and capacity. However, it also poses several grand challenges for archiving and retrieving information from massive social media data. One of these challenges is how to archive, retrieve and mine such massive unstructured data sets efficiently to support real-time emergency response. To explore potential solutions, this paper utilizes parallel computing methods to harvest social media data sets, using Twitter as an example, and to store, index, query and analyse them. Within this framework, a Not Only SQL (NoSQL) database (DB), MongoDB, is used to store data as document entries rather than relational tables. To retrieve information from the massive data sets efficiently, several strategies are used: (1) data are archived in MongoDB across multiple collections, with each collection containing a subset of the accumulated data, (2) parallel computing is applied to query and process data from each collection, and (3) data are duplicated across multiple servers to support massive concurrent access to the data sets. This study has also tested the performance of spatiotemporal queries, concurrent user requests and sentiment analysis over multiple DB servers, and performance benchmark results showed that the proposed approach could provide a solution for processing massive social media data with more than 40% performance improvement. A proof-of-concept prototype implements the design to harvest, process and analyse tweets.
  • Keywords: cloud computing; big data; NoSQL; parallel computing
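
Strategies (1) and (2) in the abstract, splitting the archive across multiple collections and querying each collection in parallel, can be illustrated with a minimal Python sketch. This is not the authors' implementation: plain in-memory lists stand in for MongoDB collections so the example runs without a database server, the one-collection-per-day partitioning rule is an assumption (the abstract does not say how the subsets are chosen), and every function name, field name and sample tweet is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def collection_name(ts):
    """Hypothetical partitioning rule: one collection per calendar day."""
    return f"tweets_{ts:%Y%m%d}"


def archive(collections, tweet):
    """Store a tweet in the collection that matches its creation date."""
    collections.setdefault(collection_name(tweet["created_at"]), []).append(tweet)


def query_collection(docs, bbox, start, end):
    """Spatiotemporal filter over one collection (an in-memory stand-in)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        d for d in docs
        if start <= d["created_at"] < end
        and min_lon <= d["lon"] <= max_lon
        and min_lat <= d["lat"] <= max_lat
    ]


def parallel_query(collections, bbox, start, end, workers=4):
    """Query every collection concurrently, then merge the partial results."""
    names = sorted(collections)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda name: query_collection(collections[name], bbox, start, end),
            names,
        )
        # pool.map yields results in the order of `names`, so the merge is stable.
        return [doc for part in parts for doc in part]


# Made-up sample tweets, for illustration only.
sample_tweets = [
    {"id": 1, "created_at": datetime(2012, 10, 29, 10, 0), "lon": -74.0, "lat": 40.7},
    {"id": 2, "created_at": datetime(2012, 10, 29, 18, 30), "lon": -118.2, "lat": 34.1},
    {"id": 3, "created_at": datetime(2012, 10, 30, 9, 15), "lon": -73.9, "lat": 40.8},
    {"id": 4, "created_at": datetime(2012, 10, 31, 12, 0), "lon": -74.1, "lat": 40.6},
]

collections = {}
for tweet in sample_tweets:
    archive(collections, tweet)

# Bounding box given as (min_lon, min_lat, max_lon, max_lat); end time is exclusive.
hits = parallel_query(
    collections,
    bbox=(-75.0, 40.0, -73.0, 41.0),
    start=datetime(2012, 10, 29),
    end=datetime(2012, 10, 31),
)
print([d["id"] for d in hits])
```

Against a real deployment, each list would presumably be replaced by a database collection and the list comprehension by a server-side query, with strategy (3), replication across servers, handled by the database rather than by application code.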