其他摘要:Social media is a rich source of information and opinion, with exponential data growth rate. However social media posts are difficult to analyze since they are brief, unstructured and noisy. Interestingly, many social media posts are about an entity or entities. Understanding which entity is central (Salient Entity) to a post, helps better analyze the post. In this paper we propose a model that aids in such analysis by identifying the Salient Entity in a social media post, tweets in particular. We present a supervised machine-learning model, to identify Salient Entity in a tweet and propose that the tweet is most likely about that particular entity. We have used the premise that, when an image accompanies a text, the text most likely is about the entity in that image, to build a dataset of tweets and salient entities. We trained our model using this dataset. Note that this does not restrict the applicability of our model in any way. We use tweets with images only to obtain objective ground truth data, while features for the model are derived from tweet text. Our experiments show that the model identifies Salient Named Entity with an F-measure of 0.63. We show the effectiveness of the proposed model for tweet-filtering and salience identification tasks. We have made the human annotated dataset and the source code of this model publicly available.
其他关键词:Entity salience; named entity recognition; semantic search; named entity extraction.