期刊名称:ISPRS International Journal of Geo-Information
电子版ISSN:2220-9964
出版年度:2021
卷号:10
期号:6
页码:373
DOI:10.3390/ijgi10060373
语种:English
出版社:MDPI AG
摘要:Twitter’s APIs are now the main data source for social media researchers. A large number of studies have utilized Twitter data for diverse research interests. Twitter users can share their precise real-time location, and Twitter APIs can provide this information as longitude and latitude. These geotagged Twitter data can help to study human activities and movements for different applications. Compared to the mostly small-scale data samples in different domains, such as social science, collecting geotagged data offers large samples. There is a fundamental question whether geotagged users can represent non-geotagged users. While some studies have investigated the question from different perspectives, they did not investigate profile information and the contents of tweets of geotagged and non-geotagged users. This empirical study addresses this limitation by applying text mining, statistical analysis, and machine learning techniques on Twitter data comprising more than 88,000 users and over 170 million tweets. Our findings show that there is a significant difference (p-value < 0.001) between geotagged and non-geotagged users based on 73% of the features obtained from the users’ profiles and tweets. The features can also help to distinguish between geotagged and non-geotagged users with around 80% accuracy. This research illustrates that geotagged users do not represent the Twitter population.