文章基本信息

标题：An Agglomerative-adapted Partition Approach for Large-scale Graphs
本地全文：下载
作者：Chen Tao ; Rongrong Shan ; Hui Li 等
期刊名称：International Journal of Librarianship (IJoL)
电子版ISSN：2474-3542
出版年度：2019
卷号：4
期号：1
页码：3-18
DOI：10.23974/ijol.2019.vol4.1.106
出版社：Chinese American Librarians Association
摘要：In recent years, an increasing number of knowledge bases have been built using linked data, thus datasets have grown substantially. It is neither reasonable to store a large amount of triple data in a single graph, nor appropriate to store RDF in named graphs by class URIs, because many joins can cause performance problems between graphs. This paper presents an agglomerative-adapted approach for large-scale graphs, which is also a bottom-up merging process. The proposed algorithm can partition triples data in three levels: blank nodes, associated nodes, and inference nodes. Regarding blank nodes and classes/nodes involved in reasoning rules, it is better to store with an optimal neighbor node in the same partition instead of splitting into separate partitions. The process of merging associated nodes, needs to start with the node in the smallest cost and then repeat it until the final number of partitions is met. Finally, the feasibility and rationality of the merging algorithm are analyzed in detail through bibliographic cases. In summary, the partitioning methods proposed in this paper can be applied in distributed storage, data retrieval, data export, and semantic reasoning of large-scale triples graphs. In the future, we will research the automation setting of the number of partitions with machine learning algorithms.
其他摘要：In recent years, an increasing number of knowledge bases have been built using linked data, thus datasets have grown substantially. It is neither reasonable to store a large amount of triple data in a single graph, nor appropriate to store RDF in named graphs by class URIs, because many joins can cause performance problems between graphs. This paper presents an agglomerative-adapted approach for large-scale graphs, which is also a bottom-up merging process. The proposed algorithm can partition triples data in three levels: blank nodes, associated nodes, and inference nodes. Regarding blank nodes and classes/nodes involved in reasoning rules, it is better to store with an optimal neighbor node in the same partition instead of splitting into separate partitions. The process of merging associated nodes needs to start with the node in the smallest cost and then repeat it until the final number of partitions is met. Finally, the feasibility and rationality of the merging algorithm are analyzed in detail through bibliographic cases. In summary, the partitioning methods proposed in this paper can be applied in distributed storage, data retrieval, data export, and semantic reasoning of large-scale triples graphs. In the future, we will research the automation setting of the number of partitions with machine learning algorithms.
关键词：Linked Data; Agglomerative-Adapted Partition; Merging Algorithm; Large-Scale Graph; k-Graph