Abstract: As the volume of spatial data has increased rapidly over the last several decades, there is growing concern that missing and incomplete observations may lead to biased conclusions. Several recent studies have reported that machine learning techniques can address this limitation in emerging data sets more effectively than conventional interpolation approaches, such as inverse distance weighting and kriging. However, most existing studies focus on data from the environmental sciences, so further evaluation is required to assess the strengths and limitations of these techniques for socioeconomic data, such as house prices. In this study, we conducted a comparative analysis of four commonly used methods: neural networks, random forests, inverse distance weighting, and kriging. We applied these methods to real estate transaction data from Seoul, South Korea, and demonstrated how the values of houses with no recorded transactions can be predicted. Our empirical analysis suggests that neural networks and random forests provide more accurate estimates than their interpolation counterparts. Of the two machine learning techniques, the random forest model produced slightly better results than the neural network model. However, the neural network appeared to be more sensitive to the amount of training data, implying that it could outperform the other methods when sufficient training data are available.
Keywords: house prices; machine learning; spatial interpolation; neural networks; random forests; real estate transaction data