Journal: International Journal of Applied Mathematics and Computer Science
Electronic ISSN: 2083-8492
Publication year: 2019
Volume: 29
Issue: 3
Pages: 1-10
DOI: 10.2478/amcs-2019-0034
Publisher: De Gruyter Open
Abstract: Extracting useful information from astronomical observations represents one of the most challenging tasks of data exploration.
This is largely due to the volume of the data acquired using advanced observational tools. While other challenges
typical of the class of big data problems (such as data variety) are also present, the size of the datasets represents the most significant
obstacle to visualization and subsequent analysis. This paper studies an efficient data condensation algorithm aimed
at providing a compact representation of the data. It is based on fast nearest neighbor calculation using tree structures and parallel
processing. In addition, the possibility of using approximate identification of neighbors to further improve the
algorithm's time performance is also evaluated. The properties of the proposed approach, in terms of both performance and
condensation quality, are experimentally assessed on astronomical datasets related to the GAIA mission. It is concluded
that the introduced technique might serve as a scalable method of alleviating the problem of dataset size.
Keywords: big data; astronomy; data reduction; nearest neighbor search; kd-trees