Journal: International Journal of Grid and Distributed Computing
Print ISSN: 2005-4262
Publication year: 2016
Volume: 9
Issue: 2
Pages: 103-120
DOI: 10.14257/ijgdc.2016.9.2.10
Publisher: SERSC
Abstract: Recent years have witnessed an increasing interest in data stream processing for applications such as network monitoring, e-business, and advertising systems. Joins are used to explore correlations among tuples from multiple streams. In this paper, we present a general method named Distributed Streams Join (DSJ) to process multi-way windowed stream θ-joins on a shared-nothing cluster. DSJ comprises a distribution method named Time-Slice Distribution Method (TDM) and a join method named Transfer Join Method (TJM). Unlike previous work, DSJ can (1) process multi-way θ-joins under arbitrary predicates; (2) preserve the integrity of results and load balance while distributing tuples to different nodes for parallel joining; and (3) carry out the join operation in a locally optimal order according to histograms maintained in real time. We have built DSJ on our own stream processing cluster to handle multi-way stream joins, and the experiments demonstrate that DSJ not only guarantees load balance among all computing nodes but also effectively improves throughput.
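
For readers unfamiliar with the term, the sketch below illustrates what a windowed θ-join computes for two streams on a single node. It is not the paper's DSJ, TDM, or TJM: the function name `windowed_theta_join`, the stream labels "R" and "S", and the `(timestamp, stream, payload)` event layout are all assumptions made for this illustration, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event layout: (timestamp, stream name "R" or "S", payload).
Event = Tuple[float, str, Any]


def windowed_theta_join(
    events: Iterable[Event],
    window: float,
    theta: Callable[[Any, Any], bool],
) -> Iterator[Tuple[Any, Any]]:
    """Yield (r, s) pairs whose timestamps differ by at most `window`
    and for which the arbitrary predicate theta(r, s) holds.

    Assumes `events` is ordered by timestamp.
    """
    buffers: Dict[str, Deque[Tuple[float, Any]]] = {"R": deque(), "S": deque()}
    for ts, stream, payload in events:
        other = "S" if stream == "R" else "R"
        buf = buffers[other]
        # Evict tuples of the opposite stream that fell out of the window.
        while buf and ts - buf[0][0] > window:
            buf.popleft()
        # Probe the opposite window with the new tuple (nested-loop probe,
        # since an arbitrary predicate rules out hash lookups).
        for _, other_payload in buf:
            if stream == "R":
                r, s = payload, other_payload
            else:
                r, s = other_payload, payload
            if theta(r, s):
                yield (r, s)
        # Store the new tuple so later arrivals on the other stream can match it.
        buffers[stream].append((ts, payload))
```

Calling it with, for example, `theta=lambda r, s: r > s` joins each new tuple against the tuples of the other stream still inside the time window. A distributed scheme such as the one the abstract describes would additionally have to partition tuples across nodes and choose a join order, which this sketch does not attempt.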