Abstract: Social media data are generated and accumulated at the exabyte level every day. As social media applications are widely deployed on platforms ranging from personal computers to mobile devices, they are becoming a natural extension of the human sensory system. The synthesis of social media with human intelligence has the potential to become an intelligent sensor network of unprecedented scale and capacity. However, it also poses several grand challenges for archiving and retrieving information from massive social media data. One of these challenges is how to archive, retrieve and mine such massive unstructured data sets efficiently to support real-time emergency response. To explore potential solutions, this paper applies parallel computing methods to harvest social media data sets, using Twitter as an example, and to store, index, query and analyse them. Within this framework, a Not Only SQL (NoSQL) database (DB), MongoDB, is used to store data as document entries rather than relational tables. To retrieve information from the massive data sets efficiently, several strategies are used: (1) data are archived in MongoDB across multiple collections, with each collection containing a subset of the accumulated data; (2) parallel computing is applied to query and process data from each collection; and (3) data are replicated across multiple servers to support massive concurrent access to the data sets. This study also tested the performance of spatiotemporal queries, concurrent user requests and sentiment analysis over multiple DB servers, and benchmark results showed that the proposed approach can process massive social media data with a performance improvement of more than 40%. A proof-of-concept prototype implements the design to harvest, process and analyse tweets.
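As a minimal sketch of strategies (1) and (2), the snippet below routes tweets into per-day MongoDB collections and fans a spatiotemporal query out across those collections in parallel using pymongo. This is an illustrative reconstruction under stated assumptions, not the paper's actual implementation: the database name, the per-day collection naming scheme, and the `coordinates`/`created_at` field names are all hypothetical.

```python
# Illustrative sketch of the abstract's strategies (1) and (2); all names
# (database, collections, fields) are assumptions, not the paper's schema.
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

from pymongo import ASCENDING, GEO2D, MongoClient

# Strategy (3) would replicate data across servers; with a replica set one
# could connect with readPreference="secondaryPreferred" to spread
# concurrent reads over the replicas.
client = MongoClient("mongodb://localhost:27017")  # hypothetical server
db = client["twitter_archive"]                     # hypothetical DB name


def collection_for(day: datetime):
    """Route each day's tweets to its own collection (strategy 1)."""
    coll = db["tweets_" + day.strftime("%Y%m%d")]
    # Indexes supporting spatiotemporal queries; create_index is idempotent.
    coll.create_index([("coordinates", GEO2D)])
    coll.create_index([("created_at", ASCENDING)])
    return coll


def query_one(coll, bbox, start, end):
    """Spatiotemporal filter over a single per-day collection."""
    return list(coll.find({
        # bbox = [[min_lon, min_lat], [max_lon, max_lat]]
        "coordinates": {"$geoWithin": {"$box": bbox}},
        "created_at": {"$gte": start, "$lt": end},
    }))


def parallel_query(days, bbox, start, end):
    """Query all per-day collections concurrently (strategy 2)."""
    colls = [collection_for(d) for d in days]
    with ThreadPoolExecutor(max_workers=len(colls)) as pool:
        futures = [pool.submit(query_one, c, bbox, start, end) for c in colls]
        return [tweet for f in futures for tweet in f.result()]
```

Partitioning by day keeps each collection small enough to scan quickly, while the thread pool overlaps the per-collection queries so total latency approaches that of the slowest single collection rather than the sum of all of them.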