
Basic article information

  • Title: Approaches for the Clustering of Geographic Metadata and the Automatic Detection of Quasi-Spatial Dataset Series
  • Authors: Javier Lacasta; Francisco Javier Lopez-Pellicer; Javier Zarazaga-Soria
  • Journal: ISPRS International Journal of Geo-Information
  • Electronic ISSN: 2220-9964
  • Publication year: 2022
  • Volume: 11
  • Issue: 2
  • Pages: 87
  • DOI: 10.3390/ijgi11020087
  • Language: English
  • Publisher: MDPI AG
  • Abstract: The discrete representation of resources in geospatial catalogues affects their information retrieval performance. The performance could be improved by using automatically generated clusters of related resources, which we name quasi-spatial dataset series. This work evaluates whether a clustering process can create quasi-spatial dataset series using only textual information from metadata elements. We assess the combination of different kinds of text cleaning approaches, word and sentence-embeddings representations (Word2Vec, GloVe, FastText, ELMo, Sentence BERT, and Universal Sentence Encoder), and clustering techniques (K-Means, DBSCAN, OPTICS, and agglomerative clustering) for the task. The results demonstrate that combining word-embeddings representations with an agglomerative-based clustering creates better quasi-spatial dataset series than the other approaches. In addition, we have found that the ELMo representation with agglomerative clustering produces good results without any preprocessing step for text cleaning.
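
The pipeline the abstract describes (turn metadata text into vectors, then group the vectors with agglomerative clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for the embedding models named in the abstract, and the sample titles, the average-linkage choice, the cosine distance, and the `threshold` value are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(texts, threshold):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per text and repeatedly merges the closest
    pair, stopping once that pair is farther apart than `threshold`.
    Returns clusters as lists of indices into `texts`.
    """
    vectors = [embed(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (x, y), dist = min(
            (
                ((x, y), linkage(clusters[x], clusters[y]))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical metadata titles, invented for this example.
titles = [
    "Land cover map 2010 sheet 12",
    "Land cover map 2010 sheet 13",
    "Land cover map 2010 sheet 14",
    "Hydrographic network of the Ebro basin",
    "Hydrographic network of the Duero basin",
]

for cluster in agglomerative(titles, threshold=0.5):
    print([titles[i] for i in cluster])
```

With these sample titles, the three map sheets fall into one group and the two hydrographic records into another, which is the kind of grouping the abstract calls a quasi-spatial dataset series. Swapping `embed` for a real word- or sentence-embedding model would change the distances but not the clustering loop.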