Journal: International Journal of Librarianship (IJoL)
Electronic ISSN: 2474-3542
Publication year: 2019
Volume: 4
Issue: 1
Pages: 3-18
DOI: 10.23974/ijol.2019.vol4.1.106
Publisher: Chinese American Librarians Association
Abstract: In recent years, an increasing number of knowledge bases have been built using linked data,
and datasets have grown substantially as a result. It is neither reasonable to store a large amount of
triple data in a single graph, nor appropriate to store RDF in named graphs by class URIs,
because the many joins between graphs can cause performance problems. This paper presents
an agglomerative-adapted approach for large-scale graphs, which is a bottom-up
merging process. The proposed algorithm partitions triple data at three levels: blank
nodes, associated nodes, and inference nodes. Blank nodes and classes/nodes
involved in reasoning rules are better stored with an optimal neighbor node in the same
partition than split into separate partitions. The process of merging associated
nodes starts with the node with the smallest cost and repeats until the target
number of partitions is reached. Finally, the feasibility and rationality of the merging algorithm
are analyzed in detail through bibliographic cases. In summary, the partitioning methods
proposed in this paper can be applied to distributed storage, data retrieval, data export, and
semantic reasoning over large-scale triple graphs. In future work, we will investigate
setting the number of partitions automatically with machine learning algorithms.
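The bottom-up merging step described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not define the cost, so the `cost` function below (combined partition size divided by one plus the number of linking triples) is an assumption made for illustration, as are the names `merge_partitions` and `cross_links`. The sketch also treats every node alike and does not model the separate handling of blank nodes or inference nodes.

```python
from itertools import combinations


def merge_partitions(triples, k):
    """Greedy bottom-up merge of nodes into k partitions (illustrative only).

    triples: list of (subject, predicate, object) tuples.
    k: target number of partitions, assumed to be supplied by the caller.
    """
    # Start with one partition per node that appears as a subject or object.
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    partitions = [frozenset([n]) for n in sorted(nodes)]

    def cross_links(a, b):
        # Count triples that connect the two partitions in either direction.
        return sum(1 for s, _, o in triples
                   if (s in a and o in b) or (s in b and o in a))

    def cost(a, b):
        # Hypothetical cost: small, tightly linked partitions are cheap to merge.
        return (len(a) + len(b)) / (1 + cross_links(a, b))

    # Repeatedly merge the cheapest pair until the target count is reached.
    while len(partitions) > max(k, 1):
        a, b = min(combinations(partitions, 2), key=lambda pair: cost(*pair))
        partitions = [p for p in partitions if p not in (a, b)] + [a | b]
    return partitions
```

Called with a handful of bibliographic triples and `k=2`, the function returns two `frozenset` partitions in which each book is grouped with the nodes it links to. Scanning every pair on each pass is quadratic and only suits small examples; a large-scale implementation would need an indexed or priority-queue-based search.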