Abstract: A geodemographic classification aims to describe the most salient characteristics of a
small-area zonal geography. However, such representations are influenced by the methodological
choices made during their construction. The choice and specification of input variables are
particularly debated: the objective is to identify inputs that add value while keeping the
model parsimonious. Within this context, our paper introduces a principal component analysis
(PCA)-based automated variable selection methodology that aims to identify candidate
inputs to a geodemographic classification from a larger collection of variables. The
proposed methodology is demonstrated using variables from the UK 2011 Census,
and its output is compared with the Office for National Statistics 2011 Output Area Classification
(2011 OAC). Implementing the proposed methodology improved the quality of cluster assignment
relative to 2011 OAC, as shown by a lower total within-cluster sum of squares. Across the UK,
the newly created classification (AVS-OAC) outperforms 2011 OAC in more than 70.2% of
Output Areas (OAs), with particularly strong performance in Scotland and Wales.
Keywords: Geodemographics; variable selection; UK census; spatial data mining; principal component analysis
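The abstract does not say how the principal components drive variable selection, nor exactly how the per-OA comparison is scored, so the following is only a minimal sketch under stated assumptions. It assumes one common loading-based heuristic (for each leading principal component, keep the variable with the largest absolute loading) as a stand-in for the PCA step, and it assumes both classifications are evaluated on the same data matrix `X` so that each area's squared distance to its assigned cluster centroid is comparable. The function names `pca_screen`, `per_area_sq_dist`, `total_wcss` and `share_better` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def pca_screen(X, n_components, per_component=1):
    """Hypothetical PCA-based screening (an assumed heuristic, not the paper's):
    for each of the first n_components principal axes, keep the not-yet-selected
    variable(s) with the largest absolute loading. X: (n_areas, n_vars)."""
    # Standardise each variable (assumes no constant columns)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing explained variance
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    keep = []
    for k in range(n_components):
        added = 0
        for j in np.argsort(-np.abs(Vt[k])):
            if int(j) not in keep:
                keep.append(int(j))
                added += 1
                if added == per_component:
                    break
    return sorted(keep)


def per_area_sq_dist(X, labels):
    """Squared Euclidean distance from each area to the centroid of its cluster."""
    d = np.empty(len(X))
    for c in np.unique(labels):
        mask = labels == c
        centroid = X[mask].mean(axis=0)
        d[mask] = ((X[mask] - centroid) ** 2).sum(axis=1)
    return d


def total_wcss(X, labels):
    """Total within-cluster sum of squares: the sum of the per-area distances."""
    return float(per_area_sq_dist(X, labels).sum())


def share_better(X, labels_a, labels_b):
    """Fraction of areas lying closer to their centroid under labelling A than B
    (assumes both labellings are scored on the same matrix X)."""
    return float(np.mean(per_area_sq_dist(X, labels_a) < per_area_sq_dist(X, labels_b)))


# Usage: two labellings of four areas, and a variable set with a redundant column
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
a = np.array([0, 0, 1, 1])
b = np.array([0, 1, 0, 1])
print(total_wcss(X, a), total_wcss(X, b), share_better(X, a, b))

V = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, -1.0], [3.0, 6.0, -1.0], [4.0, 8.0, 1.0]])
print(pca_screen(V, 2))  # column 1 is exactly 2 * column 0, so only one of them is kept
```

In the second example the first two columns are perfectly correlated, so they load together on the first component and only one of them survives screening, which is the kind of redundancy that the parsimony objective targets. `n_components` must not exceed the smaller dimension of `X`; a real application would also need a rule for choosing it, for example a cumulative explained-variance threshold.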