Title: Understanding the spatio-temporal characteristics of Twitter data with geotagged and non-geotagged content: two case studies with the topic of flu and Ted (movie)
Abstract: The dynamic characteristics of geotagged Twitter messages offer researchers great potential for analysing the spatial diffusion of events such as disease outbreaks, environmental changes and social movements. The percentage of geotagged tweets, however, is extremely small compared with that of non-geotagged tweets, and non-geotagged tweets often contain noise generated by automated robots, location spoofing and human error. Given these challenges, this study aims to understand how Twitter diffusion characteristics differ between geotagged and non-geotagged tweets. Tweets were collected using two keywords, ‘flu’ and the movie title ‘Ted’, from four targeted cities (San Diego, Los Angeles, Denver, New York) to represent different topics and geographical areas in the United States. This study presents methodological and analytical frameworks to filter out noise, analyse the internal structure of the diffusion process, and investigate the spatial distribution of geotagged tweets and their associations with land-use types. Results indicate that geotagged tweets contained less noise and correlated more strongly with events, that filtered non-geotagged tweets were effective in the trend and content analyses, and that the choice of topic was associated with the correlation between geotagged and non-geotagged tweets. Further, geotagged tweets on multiple topics showed significant spatial variation related to the land-use distribution and the city structure.
Keywords: Social media; GIS; topic selection; time series analysis; land-use