Abstract: In building Heating, Ventilation, and Air Conditioning (HVAC) control systems, the HVAC must be controlled in real time according to the building's actual cooling load demand while minimizing energy consumption. The building cooling load is affected by many factors, and historical data are sparse and contain outliers caused by sensor errors. As a result, existing prediction methods cannot accurately estimate the building's cooling load demand. This paper therefore first pre-processes the historical building cooling load data. It fills in missing values with the KNNImputer algorithm and then locates and replaces outliers with the 3-sigma method, which addresses the problem of poor data availability. Moreover, to address the limited variety of features in the historical data, the paper analyzes time features and climatic features separately. Specifically, the method splits the time features into several independent features that describe the impact of pedestrian flow on the building cooling load. It also expands the climatic features, which increases the variety of features available for training and improves the model's prediction accuracy. Finally, the LightGBM algorithm is trained on the processed data to build a building cooling load prediction model. The proposed data pre-processing method and model training approach are validated in a Python implementation, and the results show that the proposed method effectively improves the prediction accuracy of building cooling loads.
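The three pre-processing steps named in the abstract (KNN imputation, 3-sigma outlier replacement, and splitting timestamps into separate time features) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code. The function names are hypothetical. The imputation mimics the idea behind scikit-learn's `KNNImputer` with a NaN-aware Euclidean distance. Replacing outliers with the mean of the remaining points is an assumption, because the abstract does not say which replacement value is used. The particular time features (hour, weekday, month, weekend flag) are likewise assumed for illustration.

```python
from datetime import datetime
from math import isnan, sqrt
from statistics import mean, pstdev

NAN = float("nan")


def knn_impute(rows, k=5):
    """Fill NaN entries with the mean of that column over the k nearest rows.

    Distance is a NaN-aware Euclidean distance: only columns observed in both
    rows are compared, and the squared sum is rescaled by
    (total columns / shared columns). Donors must have the target column.
    """
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in (c for c, v in enumerate(row) if isnan(v)):
            candidates = []
            for m, other in enumerate(rows):
                if m == i or isnan(other[j]):
                    continue
                shared = [c for c in range(n_cols)
                          if not isnan(row[c]) and not isnan(other[c])]
                if not shared:
                    continue
                sq = sum((row[c] - other[c]) ** 2 for c in shared)
                candidates.append((sqrt(sq * n_cols / len(shared)), other[j]))
            candidates.sort(key=lambda t: t[0])
            donors = [v for _, v in candidates[:k]]
            if donors:  # leave NaN if no row can donate a value
                filled[i][j] = mean(donors)
    return filled


def replace_3sigma(values):
    """Replace points farther than 3 population std devs from the mean.

    Assumption: outliers are replaced by the mean of the inlier points.
    """
    mu, sigma = mean(values), pstdev(values)
    inliers = [v for v in values if abs(v - mu) <= 3 * sigma]
    fill = mean(inliers)
    return [v if abs(v - mu) <= 3 * sigma else fill for v in values]


def split_time_features(ts):
    """Split one timestamp into independent features (hypothetical feature set)."""
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday = 0, Sunday = 6
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),
    }


# Made-up rows of [outdoor temperature, cooling load]; one load is missing.
loads = [[30.0, 120.0], [31.0, 125.0], [29.5, NAN], [35.0, 160.0]]
print(knn_impute(loads, k=2))
# prints [[30.0, 120.0], [31.0, 125.0], [29.5, 122.5], [35.0, 160.0]]
print(split_time_features(datetime(2024, 7, 6, 14, 30)))
# prints {'hour': 14, 'weekday': 5, 'month': 7, 'is_weekend': 1}
```

With scikit-learn installed, `sklearn.impute.KNNImputer(n_neighbors=5).fit_transform(X)` performs the imputation step directly. After pre-processing, the feature matrix could then be passed to LightGBM's scikit-learn interface, for example `lightgbm.LGBMRegressor().fit(X, y)`. The hyperparameters the paper uses are not stated in the abstract.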