Abstract: This article proposes a methodology for discovering patterns in observed climatological data, particularly temperature and rainfall, across subnational political division units using an automatic classification algorithm (a decision tree produced by the C4.5 algorithm). The patterns are thus represented as classification trees, under two assumptions: (1) every political division unit contains at least one climatological station, and (2) the recording periods of the stations are relatively similar in duration and in their initial and ending years. A series of classification models is produced from different subsets of an experimental dataset. This dataset contains information from 3606 climatological stations in Mexico whose recording periods vary in duration and in initial and ending years. The target (dependent) variable in all of these models is the name of the political unit (i.e., the state). The predictors are 36 monthly features for each climatological station: 12 for minimum temperature, 12 for maximum temperature, and 12 for cumulative rainfall. Altitude is also used as a predictor alongside the other 36, but only to quantify its additional contribution to the modelling. The results show that classification trees are effective models for describing and representing non-trivial patterns that characterize the political division units by their monthly temperatures and rainfall. One remarkable finding is that the cumulative rainfall of May is the feature with the highest discriminative power for the characterization task, which is consistent with the theoretical background on Mexican climatology. In addition, classification trees offer a more expressive representation for non-experts in machine learning.
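To make the idea of a feature's "discriminative power" concrete, the following is a minimal sketch of gain-ratio ranking, the kind of split criterion that C4.5 builds on. It is not the authors' implementation and omits most of C4.5 (recursive tree growth, pruning, missing-value handling). The feature names `rain_may` and `tmin_jan`, the helper `best_gain_ratio`, and all station values are hypothetical and invented for illustration; they are not taken from the dataset described above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_gain_ratio(values, labels):
    """Best binary split 'value <= t' of one numeric feature, scored by gain ratio.

    Returns (gain_ratio, threshold), or (0.0, None) if no split is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (0.0, None)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between two equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Information gain: entropy before the split minus weighted entropy after.
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        # Split information: entropy of the partition sizes themselves.
        split_info = entropy(["L"] * i + ["R"] * (n - i))
        ratio = gain / split_info
        if ratio > best[0]:
            best = (ratio, (lo + hi) / 2)
    return best


# Hypothetical toy stations (invented values): May rainfall in mm,
# January minimum temperature in degrees Celsius, and the state label.
stations = [
    {"rain_may": 5.0, "tmin_jan": 4.0, "state": "Sonora"},
    {"rain_may": 8.0, "tmin_jan": 18.0, "state": "Sonora"},
    {"rain_may": 12.0, "tmin_jan": 6.0, "state": "Sonora"},
    {"rain_may": 95.0, "tmin_jan": 17.0, "state": "Chiapas"},
    {"rain_may": 120.0, "tmin_jan": 5.0, "state": "Chiapas"},
    {"rain_may": 140.0, "tmin_jan": 19.0, "state": "Chiapas"},
]

features = ["rain_may", "tmin_jan"]
labels = [s["state"] for s in stations]

# Rank features by the gain ratio of their best threshold, highest first.
ranking = sorted(
    ((name, *best_gain_ratio([s[name] for s in stations], labels)) for name in features),
    key=lambda row: row[1],
    reverse=True,
)

for name, ratio, threshold in ranking:
    print(f"{name}: gain ratio {ratio:.3f} at threshold {threshold}")
```

In a full tree learner the top-ranked feature would become the root test and the procedure would repeat on each resulting subset. With 36 or 37 predictors per station, the same ranking loop would simply run over a longer `features` list.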