Abstract: Text often refers to places by name; in prior work, more than 20% of a sample of event-related tweets were found to include place names. Research has addressed the challenge of leveraging the geographic information in text, with well-developed methods for recognizing location mentions and related work on automated toponym resolution (deciding which place in the world a place name refers to). A core issue that remains is distinguishing text that merely mentions a place or places from text that is about a place or places. This paper presents the first step in research to address this challenge. The research reported here lays the conceptual and practical groundwork for subsequent supervised machine learning research; that research will use human-produced training data, in which each statement is judged to be about a place (or places) or not, to train computational methods to perform this classification on large volumes of text. The step presented here focuses on three questions: (1) what kinds of entities are typically conceptualized as places; (2) what features of a statement prompt a reader to judge it to be about a place (or not about a place); and (3) how judgments of whether a statement is about a place compare between a group of experts who have studied the concept of "place" from a geographic perspective and a cross-section of individuals recruited through a crowdsourcing platform.
Keywords: geographic information retrieval; spatial language; crowdsourcing