USING LAND-USE MODELLING TO STATISTICALLY DOWNSCALE POPULATION PROJECTIONS TO SMALL AREAS.
Cameron, Michael P. ; Cochrane, William
1. INTRODUCTION
Local government planners, property developers, real estate agents,
large businesses and other stakeholders need good projections or
forecasts of the future spatial distribution of population for planning
purposes (Foss, 2002). Indeed, Myers (2001, p.384) notes that
"planning analysts regard population statistics as integral to
virtually all aspects of planning". This includes planning for
future land developments, schools, hospitals, child care centres, care
services for elderly people, traffic flows, electorate boundaries, and
so on. In the case of local government, this need for good data is often
reinforced by legislation that increasingly calls for fiscal
sustainability. For instance, the recently amended Local Government Act
2002 in New Zealand requires territorial and local authorities to
engage in asset management planning with a fifty-year time horizon. In
order to develop detailed asset management plans, local governments
therefore need a good understanding of future population growth, not
only in total but for particular localities within their districts or
regions.
The risks to planners of planning on the basis of an inaccurate
forecast of the future population distribution can be large. An
overestimate of future population growth for a particular area will
induce over-investment in infrastructure (such as roads, water and other
utilities) with resultant costs for the local authority. It may be
possible to pass these costs on to private sector developers but, if
not, they
will be borne by local ratepayers. On the other hand, an underestimate
of future population growth will lead to infrastructure being
insufficient to meet the needs of the population, with costs (in terms
of congestion or shortages of services) borne by the local populations.
While accuracy is only one of several criteria on which small-area
population forecasts might be judged (Tayman and Swanson, 1996; Tayman,
2011), from the perspective of local authority planners the need for
accurate and timely population forecasts is clear.
A range of government and non-government organisations (typically
academics or consultants) produce population projections. The projection
assumptions that will lead to the most accurate forecast of future
population are unknown (and some might argue, unknowable). This means
that all forecasts of future population will be subject to error. The
magnitude of forecast error tends to be substantially larger for areas
with small populations, in comparison with more populous areas (Cameron
and Poot, 2011), which poses a particular problem for those interested
in the distribution of population over small areas. Moreover, the
methods for projecting population at small-area level are
under-developed relative to projection methods for larger areas,
although research in this area has increased recently (Chi, 2009; Chi et
al., 2011). In particular, it is unclear whether modellers should adopt
a top-down approach to population projections (projecting large areas
such as the national population first, followed by sequentially smaller
component areas) or a bottom-up approach (projecting small areas first;
then deriving projections for larger areas by summing the projections of
their component small areas).
In this paper, a novel approach is adopted that uses a top-down
cohort component population projection to define a district-level
population, and then allocates the population spatially to small areas
using statistical downscaling based on a model of land use. Statistical
downscaling refers to using statistical methods to interpolate
regional-scale variables to smaller geographical scales, and is widely
used in the climate change literature (e.g. see Kim et al. (1984) for an
early application). Four different model specifications based on land
use are developed, and then compared with two naive projections. The
remainder of the paper proceeds as follows. First, the range of
approaches that have been applied to the projection of population for
small areas are described, and the strengths and weaknesses of each
approach are outlined. Second, the study area and the projection model
are described, as well as the method for evaluating the in-sample and
out-of-sample forecasting performance of the model. Third, the results
of the evaluation are presented. Finally, the implications of the
results for the use of similar models in local authority planning are
discussed.
2. SMALL-AREA POPULATION PROJECTIONS
A variety of methods have been applied by demographers and
population modellers to develop small-area population projections to
satisfy planning needs. These methods can be generally categorised into
four types: (1) naive models, e.g. extrapolation, or growth share
models; (2) the 'traditional' demographic cohort component
model; (3) statistical methods, using data such as building consents;
and (4) urban growth modelling approaches. While there may be some
overlap between these methods, and it is possible (and sometimes
desirable) to combine approaches, the following paragraphs discuss the
relative strengths and weaknesses for local authority planning of each
of these methods separately.
Naive models are essentially simple extrapolations of past
populations. The simplest of these models involve an assumption of no
population change (i.e. constant population), linear population growth
based on past growth trends, and exponential growth (i.e. constant
population growth rates) based on past growth trends. Slightly more
sophisticated are growth share models, which are top-down models where
the population is initially projected at a higher geographical level,
and then that projected growth is shared between the different small
areas. For example, the total population for a state or county may be
projected first, and then the growth share model used to allocate the
state- or county-level population to the census tract level.
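The three simplest extrapolations described above can be sketched as follows. This is a minimal illustration only, assuming two past observations of a small area's population; the function names and the example figures are hypothetical and are not taken from any of the cited studies.

```python
def constant_projection(p_launch: float) -> float:
    """No-change model: the projected population equals the launch population."""
    return p_launch


def linear_projection(p_base: float, p_launch: float,
                      base_years: int, horizon_years: int) -> float:
    """Carry forward the average annual absolute change of the base period."""
    annual_change = (p_launch - p_base) / base_years
    return p_launch + annual_change * horizon_years


def exponential_projection(p_base: float, p_launch: float,
                           base_years: int, horizon_years: int) -> float:
    """Carry forward the average annual growth rate of the base period."""
    annual_ratio = (p_launch / p_base) ** (1 / base_years)
    return p_launch * annual_ratio ** horizon_years


# Hypothetical small area: 1 000 people in 2001 and 1 100 in 2006,
# projected ten years ahead to 2016.
print(constant_projection(1100))
print(linear_projection(1000, 1100, base_years=5, horizon_years=10))
print(exponential_projection(1000, 1100, base_years=5, horizon_years=10))
```

In this example the linear model adds 20 people per year, while the exponential model compounds the base-period growth rate, so the two diverge as the horizon lengthens.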
Naive models tend to perform reasonably well, in terms of forecast
accuracy, when compared with more sophisticated models (van der Gaag et
al., 2003). Wilson (2015a) tests a wide array of these naive models for
small areas in Australia, England/Wales, and New Zealand. He finds that,
in terms of individual models, a constant share of population (CSP)
model works best (smallest error) for England/Wales, and a variation on
a constant share of growth (CSG+) model works best for Australia and New
Zealand. Following White (1954), the CSG+ model assumes a constant share
of growth for those areas that experienced positive population growth in
the base period (i.e. just before the start of the projection), and no
growth for areas that declined in population in the base period. Within
the class of naive models, more sophisticated models are not necessarily
always better. For instance, Rayer and Smith (2010) have found linear
extrapolation to be more accurate than growth share models for
sub-county areas in Florida. However, naive models have practical
limitations. First and foremost, they lack a strong theoretical basis.
The purely mechanistic and deterministic application of past time trends
into the future may be appropriate for small areas that have stable and
predictable growth paths, but most small areas are subject to unexpected
changes in population. By ignoring the demographic or urban/land use
drivers of population change (that the more sophisticated models
outlined below feature), these models will fail to adequately account
for changes in these drivers. On a related note, because local
contextual factors are not incorporated into the model, naive models of
population change are difficult to justify to planners or, importantly,
to elected officials. This is because there are no mechanisms for local
policy to affect the future population distribution. This can lead to a
lack of 'buy-in' from important end-users of the projections.
The traditional workhorse of demographic projections is the cohort
component model (CCM). In the CCM the population is projected by first
projecting the three components of population change: (1) births,
typically projected by means of age-specific fertility rates applied to
women of childbearing ages; (2) deaths, typically projected by means of
age-sex-specific mortality (or its complement, survivorship) rates
applied to the population of each age and sex; and (3) migration, which
may be projected in a number of different ways (van der Gaag et al.,
2003).
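As a rough illustration of the CCM accounting, the sketch below advances a population by one time step. It is deliberately simplified and rests on assumptions that are not part of the description above: the sexes are collapsed into one, fertility rates are expressed per person rather than per woman, newborns are not exposed to mortality within the step, and migration is supplied as net numbers by age group. All names and figures are hypothetical.

```python
def project_one_step(pop, survival, fertility, net_migration):
    """Advance a population by one step equal to the width of an age group.

    pop[i]            people in age group i at the start of the step
    survival[i]       share of group i surviving into group i + 1
                      (the last entry is survival within the open-ended group)
    fertility[i]      births per person in group i over the step (assumed form)
    net_migration[i]  net migrants added to group i at the end of the step
    """
    births = sum(p * f for p, f in zip(pop, fertility))
    aged = [0.0] * len(pop)
    aged[0] = births                          # component 1: births
    for i in range(len(pop) - 1):
        aged[i + 1] += pop[i] * survival[i]   # component 2: deaths, via survivorship
    aged[-1] += pop[-1] * survival[-1]        # survivors of the open-ended group
    return [a + m for a, m in zip(aged, net_migration)]  # component 3: migration


# Hypothetical three-group population (young, working age, older).
start = [100.0, 100.0, 100.0]
projected = project_one_step(
    start,
    survival=[0.99, 0.98, 0.50],
    fertility=[0.0, 0.5, 0.0],
    net_migration=[5.0, 0.0, -5.0],
)
print(projected)
```

Because the end population is built entirely from births, survivors and net migrants, the calculation is an accounting identity: any error in the projection comes from the assumed rates, not from the formula.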
The advantage of the CCM is that all of the demographic drivers of
population change are explicitly included in the model, because the CCM
formula is an identity. Depending on the process used for modelling each
component, other known drivers of population change can be explicitly
included through their influences on fertility, mortality, and/or
migration. Because the drivers are included, the CCM is typically
intuitively understandable for local authority planners and elected
officials.
However, at small-area levels, the CCM faces significant
challenges. First, the data necessary for deriving assumptions about
future fertility, mortality and migration may not be available at small
geographical scales (Wilson, 2015b). Where these data are available,
data quality or precision may be so low that it may be difficult to
derive robust age-specific rates for each component (fertility,
mortality, inward and outward migration) at local levels (Wilson,
2015b; Tayman et al., 1998). For instance, age-sex-specific data are
typically required to estimate fertility and mortality rates. At the
small-area level, the counts of births and deaths that occur each year
(particularly the age-specific counts) are small and
highly variable, such that the estimation of fertility and mortality
rates becomes extremely challenging.
Second, despite the promise that incorporating drivers of each
component (fertility, mortality, migration) holds, most CCM models fail
to adequately take account of a myriad of socio-economic,
infrastructural, physical land use and other contextual factors that
exert substantial influence over the spatial allocation of population
and households at smaller geographical levels. Typically, these factors
are excluded due to data unavailability and the inability to reliably
forecast them. Contextual factors matter much more at the local level,
such as the availability of suitable land, services and amenities, and
the plans of public and private land developers (Murdock et al., 1991).
In particular, land use and availability constraints, planning
constraints, and the availability of infrastructure are all variables
that local authority planners would expect to impact on the future
population distribution at the small-area level.
Of course, the quantitative test of the CCM is whether it
outperforms other models in evaluations of small-area projections.
Unfortunately, past studies have shown that CCMs do not outperform
simpler methods in projecting small area populations (Smith, 1997; Smith
and Tayman, 2003).
Statistical models offer one way to include the
important contextual variables that are absent from the naive models,
and often missing from CCM models. For instance, regression models have
become increasingly common in small-area projections (Alho and Spencer,
2005), including more recently spatial regression models (Chi et al.,
2011). Spatial regression models may be preferred over aspatial models
because the effects of the characteristics and contexts of neighbouring
areas may also be important drivers of population change in each small
area (Chi et al., 2011), and traditional regression models are unable to
account for all of the spatial interactions (Lesage and Pace, 2009). For
example, Chi et al. (2011) used spatial lag models to derive population
projections for census tracts in Milwaukee, Wisconsin.
However, as with the limitation for including more detailed
population drivers within CCM models, data availability may be a serious
issue, and all data that is used within the statistical model must also
be projected. Statistical models also suffer from a range of
well-recognised issues, including temporal instability of coefficients
and over-fitting (Tayman and Schafer, 1985). Finally, like CCM models,
statistical models have not been demonstrated to outperform even simple
models of small-area populations in terms of forecast accuracy, even
when the statistical model includes spatial interactions. For instance,
Chi et al. (2011) found that their spatial lag model for Milwaukee did
not unambiguously outperform projections derived from simple
extrapolation methods.
The final category of small-area projection models is models based
on urban growth modelling approaches, including: (1) Cellular Automata
(CA) modelling; (2) Artificial neural networks; (3) Fractal modelling;
(4) Agent-based modelling; and (5) Decision-trees modelling. CA
modelling involves separating each area into a grid of cells, each of
which has a number of characteristics (which may include population
size). In each time step of the model, each cell may change its
characteristics in response to shifts in the characteristics of
neighbouring cells and changes in the nature of the system as a whole
(see also the description of the land use model in the following
section). These urban growth modelling methods are described and
reviewed in detail by Triantakonstantis and Mountrakis (2012). The
advantages of urban growth modelling approaches include a much stronger
theoretical base than statistical modelling, and that these models are
able to more explicitly account for the local socio-economic conditions
and physical and planning constraints at the small-area level. However,
the limitations of urban growth modelling approaches are similar to
those for statistical models, including high data requirements.
An alternative to applying one of the four approaches above is to
combine two or more approaches in order to leverage their particular
strengths, and attempt to address their limitations. One increasingly
common combined approach involves using demographic projections such as
CCM models to derive estimates of the future population at a relatively
broad geographical scale, then using one of the other approaches to
systematically downscale or apportion the population to the small-area
level. Combining two approaches can take account of both the underlying
demographic processes that drive population change, and the local-level
conditions that primarily determine the spatial allocation of households
and people (Wilson, 2015b). Moreover, by combining two methods the
demographic model is not overextended to a point where the data
necessary to derive population projection assumptions (fertility,
mortality, and migration) are not readily available.
In the combined approach the method of allocating population
between small areas becomes the most important determinant of forecast
accuracy at the small-area level. Land use based models have been used
to downscale or allocate population to small areas for at least the last
two decades. Tayman (1996) reports results of a forecast based on a
spatial interaction land use model for San Diego County. The land use
model uses place-of-work employment to allocate population, such that
the population tends to locate closer to their place-of-work, while
constraining population based on each zone's capacity to
accommodate additional residential development. Tayman and Swanson
(1996) used similar models for San Diego and Dallas-Fort Worth.
3. DATA AND METHODS
Data
The Waikato Region of New Zealand had a 2013 total population of
approximately 425 000 (about 10 per cent of the total New Zealand
population). It has a central main city (Hamilton City) with a 2013
population of approximately 150 000, two districts that are peri-urban
(Waikato District and Waipa District), and a number of other Territorial
Authority (TA) areas (the second tier of local government administration
in New Zealand) in whole or in part (refer to Table 1). The region is
not a simple aggregation of the TAs because the region is largely based
on a water catchment area, whereas the TA boundaries reflect
administrative divisions that are historical and somewhat arbitrary.
In this paper, projections are developed at the Area Unit (AU)
level. Area Units are the next smaller geographical area below TAs in
the geographical hierarchy used by Statistics New Zealand. They serve no
particular administrative purpose--however, each AU is a distinct
geographical entity, and in urban areas they generally coincide with
suburbs and have a population of 3 000-5 000. As shown in Table 1, in
the Waikato Region there are 197 non-marine non-island AUs, with a mean
population size in 2013 of 2 124 (median 1 840), and a range from a
minimum of zero to a maximum of 7 750.
Statistical Downscaling Method
In this paper, statistical downscaling was combined with
projections of future land use to allocate projected TA-level
populations to each AU. The three-step approach is similar to that
employed by Tayman and Swanson (1996) and Tayman et al. (1998), but uses
a combined statistical and urban growth modelling approach to allocate
population to the AUs.
First, the population was projected at the TA level for the region
(including for each part-TA) by the National Institute of Demographic
and Economic Analysis, using a cohort component model (Cameron and
Cochrane, 2014). The 'Baseline Medium' TA-level projected
populations were used as an input in the following stages, including a
backcast projection from 2013 (the base year of the TA-level
projections) to 2006. Second, land use was projected using the Waikato
Integrated Scenario Explorer (WISE) model. The WISE model is a
systems-based integrated model that incorporates economic, demographic,
and environmental components across the entire Waikato Region (Rutledge
et al., 2008; 2010). The WISE model begins with a base land use map in
2006, incorporating 24 different land uses, of which there are three
residential land use classes (medium-high density, low density, and
lifestyle blocks) (Rutledge et al., 2010). At each annual time step, the
economic and demographic submodels generate demands for economic and
residential land use, which are inputs into a dynamic, spatially
explicit land use change model (Huser et al., 2009). The demographic
inputs into the WISE model are the TA-level population projections for
the Waikato region developed in the first step.
The land use change model is a CA model specified at the level of
four-hectare grid cells (200m x 200m). The CA model apportions land to
different uses at each annual time step based on a combination of four
factors: (1) zoning (which constrains the land uses that are available
in each area); (2) suitability (the biophysical suitability of land for
different uses); (3) accessibility (assesses the attractiveness of a
location for different land uses based on proximity to desirable or
undesirable features); and (4) local influence (assesses the
attractiveness of a location for a land use based on the composition of
land use in the surrounding neighbourhood). The CA land use model
attempts to meet the external demands for land (from the economic and
demographic models) by assigning cells with the highest transition
potentials (determined by their zoning, suitability, accessibility and
local influence) to new land uses. Transitions are made at each annual
time step.
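The allocation logic of a CA land use model of this kind can be sketched as below. This is not the WISE implementation: the rule for combining the four factors is not stated above, so the sketch assumes a simple product in which zoning acts as a 0/1 gate, and all cell values and names are hypothetical.

```python
def transition_potential(cell):
    """Assumed combination rule: product of the four factors (zoning is 0 or 1)."""
    return (cell["zoning"] * cell["suitability"]
            * cell["accessibility"] * cell["local_influence"])


def allocate(cells, new_use, cells_demanded):
    """Convert the highest-potential eligible cells to `new_use` for one time step."""
    candidates = [c for c in cells
                  if c["use"] != new_use and transition_potential(c) > 0]
    candidates.sort(key=transition_potential, reverse=True)
    for cell in candidates[:cells_demanded]:
        cell["use"] = new_use
    return cells


# Hypothetical grid of four cells, all currently pasture.
grid = [
    {"id": 1, "use": "pasture", "zoning": 1, "suitability": 0.9,
     "accessibility": 0.8, "local_influence": 0.7},
    {"id": 2, "use": "pasture", "zoning": 0, "suitability": 1.0,
     "accessibility": 1.0, "local_influence": 1.0},   # zoned out of residential use
    {"id": 3, "use": "pasture", "zoning": 1, "suitability": 0.5,
     "accessibility": 0.9, "local_influence": 0.9},
    {"id": 4, "use": "pasture", "zoning": 1, "suitability": 0.4,
     "accessibility": 0.5, "local_influence": 0.2},
]

# External demand for two additional low-density residential cells.
allocate(grid, "low_density_residential", cells_demanded=2)
converted = [c["id"] for c in grid if c["use"] == "low_density_residential"]
print(converted)
```

Cell 2 is never converted despite its high scores, which illustrates how zoning constrains the outcome regardless of suitability or accessibility.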
The demand for residential land of each type is determined by first
assigning a given proportion of population in each territorial authority
to each residential land use type, and the residual proportion is spread
across all non-residential land uses. The proportions are generally
stable but vary over time for some TAs. Next, the number of residential
land use cells of each type required is determined by combining the
population in each residential land use calculated in the first step
with population density values for each residential land use type. These
population densities also vary over time, between pre-determined maximum
and minimum values. The area of each land use type (in hectares) and the
residential population densities (by residential land use type) were
exported from the WISE model for 2006 and 2013 for use in the next step.
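The two-part calculation of residential land demand described above can be illustrated as follows. The shares, densities and the rounding up to whole four-hectare cells are assumptions made for the example; they are not values from the WISE model.

```python
import math

CELL_HECTARES = 4  # each grid cell is 200 m x 200 m


def residential_cells_required(ta_population, share_by_type, density_by_type):
    """Number of cells of each residential land use needed in one TA.

    share_by_type    proportion of the TA population living in each land use
    density_by_type  people per hectare in each land use
    """
    cells = {}
    for land_use, share in share_by_type.items():
        people = ta_population * share                 # population in this land use
        hectares = people / density_by_type[land_use]  # land needed at that density
        cells[land_use] = math.ceil(hectares / CELL_HECTARES)  # assumed rounding
    return cells


# Hypothetical TA of 48 000 people; the remaining 12.5 per cent of the
# population is spread across non-residential land uses.
shares = {"medium_high": 0.25, "low": 0.5, "lifestyle": 0.125}
densities = {"medium_high": 40.0, "low": 20.0, "lifestyle": 2.0}
demand = residential_cells_required(48000, shares, densities)
print(demand)
```

Because densities differ so widely between land use types, a small share of population in lifestyle blocks can generate the largest demand for land.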
In the third step, land use was used to statistically downscale the
TA-level population projections to the AU level. This was achieved in
two stages, projecting: (1) the population located in residential land
uses; and (2) the population located in non-residential land uses. In
the first stage, the number of hectares of each residential land use
type in each AU and the residential population densities (both from the
WISE model) were used to calculate the residential population of each AU
(i.e. the population located in residential land uses) for each year
(2006 and 2013). The difference between the sum of the residential
populations across all AUs in each TA and the overall projected TA-level
population provides an estimate of the total non-residential population
in that TA (i.e. the population located in non-residential land uses).
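The first stage amounts to a density-weighted sum of residential land in each AU, with the TA-level non-residential population obtained as a residual. A minimal sketch, using hypothetical AU names, hectares and densities, is:

```python
def residential_population(hectares_by_type, density_by_type):
    """Residential population of one AU: hectares times people per hectare."""
    return sum(hectares * density_by_type[land_use]
               for land_use, hectares in hectares_by_type.items())


def ta_non_residential_population(ta_population, au_hectares, density_by_type):
    """TA total minus the summed residential populations of its AUs."""
    residential_total = sum(residential_population(h, density_by_type)
                            for h in au_hectares.values())
    return ta_population - residential_total


# Hypothetical TA with two AUs and a projected population of 3 000.
densities = {"medium_high": 40, "low": 20, "lifestyle": 2}
au_hectares = {
    "AU_A": {"medium_high": 10, "low": 50, "lifestyle": 100},
    "AU_B": {"medium_high": 0, "low": 20, "lifestyle": 300},
}
res_a = residential_population(au_hectares["AU_A"], densities)
res_b = residential_population(au_hectares["AU_B"], densities)
non_res = ta_non_residential_population(3000, au_hectares, densities)
print(res_a, res_b, non_res)
```

The residual is what the regression models in the second stage must distribute across AUs.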
To estimate the non-residential population in each AU, linear
regression models were used, with the 2006 TA-level non-residential
population as the dependent variable, and the 2006 baseline
non-residential land use (by type) as explanatory variables. That is, a
regression model of the following general form is estimated:
$$NRP_i = \alpha + N_{ki}\beta + \varepsilon_i \qquad (1)$$
where $NRP_i$ is the non-residential population of area unit $i$,
$N_{ki}$ is a vector of land uses $k$ in AU $i$, $\alpha$ and $\beta$
are parameters to be estimated, and $\varepsilon_i$ is
an idiosyncratic error term. Four alternative model and data
specifications were tested for this model: (I) standard ordinary least
squares (OLS) regression, based on absolute land use; (II) standard OLS,
based on principal components of land use; (III) a spatial Durbin model,
based on absolute land use; and (IV) a spatial Durbin model, based on
principal components of land use.
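For specification (I), Equation (1) is an ordinary least squares problem. The sketch below solves the normal equations directly in plain Python so that it has no dependencies; in practice a statistical package would be used. The data are hypothetical and constructed so that the true coefficients are known, and the function names are illustrative only.

```python
def fit_ols(X, y):
    """Return [alpha, beta_1, ..., beta_k] minimising the sum of squared errors."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend intercept column
    k = len(rows[0])
    # Augmented normal equations: (Z'Z | Z'y)
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefficients, land_use):
    """Fitted non-residential population for one AU."""
    alpha, betas = coefficients[0], coefficients[1:]
    return alpha + sum(b * x for b, x in zip(betas, land_use))


# Hypothetical AUs: hectares of two non-residential land uses per AU.
X = [[100, 500], [200, 300], [50, 800], [400, 100], [300, 600]]
# Non-residential population generated as 10 + 0.5 * use_1 + 0.1 * use_2.
y = [110, 140, 115, 220, 220]

coef = fit_ols(X, y)
print(coef)
print(predict(coef, [250, 400]))
```

With absolute land use in hectares, each slope coefficient reads directly as people per hectare of that land use, which is the interpretability argument made for this specification.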
The rationale for applying these four different specifications was
as follows. Absolute land use (in hectares) is the most basic land use
variable, and this data specification was included because the
coefficients for each land use type would be easy for planners to
interpret (as the number of people per hectare). In contrast, principal
components analysis takes the land use dataset and converts it into a
set of linearly uncorrelated components (Joliffe, 2010). This avoids any
problems of multicollinearity between the land use variables. The
principal component specifications also allow the model to account for
different types of internal land use structures that may be reflected in
different land-use-specific population densities at the AU level.
Spatial Durbin models account for neighbourhood effects (i.e. where the
non-residential population in AU i is affected by the size of the
non-residential population in surrounding AUs) and for lag effects
(where the non-residential population in AU i is affected not only by
the amount of each land use type in that area unit, but also the amount
of each land use type in surrounding AUs) (Lesage and Pace, 2009).
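The spatially lagged terms that distinguish a spatial Durbin model from OLS can be illustrated by constructing the lag of a single land use variable. The sketch assumes a row-standardised contiguity weights matrix, which is a common choice but is not specified above, and it does not estimate the model itself (that would normally be done by maximum likelihood in a spatial econometrics package). AU names and values are hypothetical.

```python
def spatial_lag(values, neighbours):
    """Mean of each area's neighbours' values (row-standardised weights)."""
    lagged = {}
    for area, nbrs in neighbours.items():
        if nbrs:
            lagged[area] = sum(values[n] for n in nbrs) / len(nbrs)
        else:
            lagged[area] = 0.0   # assumed treatment of areas with no neighbours
    return lagged


# Hypothetical row of three AUs: A borders B, and B borders C.
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
pasture_hectares = {"A": 100, "B": 300, "C": 800}

lag = spatial_lag(pasture_hectares, neighbours)
print(lag)
```

In a spatial Durbin specification, lagged columns of this kind enter the regression for every land use variable alongside the lag of the dependent variable.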
Eleven land uses were initially excluded from the models (bare
surfaces; indigenous vegetation; other exotic vegetation; wetlands;
fresh water; marine; aquaculture; utilities; mines and quarries; urban
parks; and airports), because they were unlikely to contain much of the
population. The three residential land uses were also excluded from the
models, as the population in those land uses was already accounted for.
That leaves ten land use variables in the model. Separate regression
models were fitted for Waikato District, Hamilton City, and Waipa
District, with a fourth regression model fitted for the remaining TAs
(due to small individual sample sizes). The fourth model initially
included TA-level fixed effects to account for unobserved differences in
population density profile between each TA. Each regression model was
reduced to a final preferred model by removing the least significant
variable in a backward stepwise fashion until the root mean squared
error (RMSE) was minimised. The resulting regression models are a
reasonably good fit for the data, with adjusted coefficients of
determination ($R^2$) between 0.17 and 0.80 (these results are
available on request from the authors).
The regression model coefficient estimates were then used, along
with projected land use from the WISE model for 2013, to provide a
projection of the non-residential population of each AU in 2013. When
added to the residential population from the first stage of step 3, the
sum provides an un-scaled population projection for each AU. However,
two issues arose with these un-scaled projections: (1) the projections
demonstrated significant discontinuity with the known population trend
between 2001 and 2006 for a number of AUs; and (2) a number of AUs were
projected to quickly fall to zero (or negative) population. To reduce
the impact of the discontinuities, the in-sample residual was calculated
for each AU in 2006 (being the difference between the actual 2006
population and the estimated 2006 population). This in-sample residual
for each AU was added to the projected AU population. This reflects the
fact that the residuals in the population projection model are likely to
be correlated over time. To reduce the impact of projected de-population
of (particularly rural) AUs, each un-scaled AU population projection was
constrained so that population would not fall by more than 25 per cent
over a ten-year period. This maximum constraint is similar to the
maximum long-run population decline observed in any AU over the period
1996-2006. Moreover, this adjustment is justifiable because the spatial
distribution of population is subject to a substantial degree of
inertia--once houses have been constructed in a given location, some
population is likely to remain in that location for a long time. That
is, population decline at small spatial scales is a relatively slow
process, unlike that projected in the initial unconstrained models.
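The two adjustments can be sketched as follows. How the ten-year limit was applied to horizons other than ten years is not stated above, so the sketch assumes the 25 per cent limit compounds pro rata with the length of the horizon; the function name and figures are hypothetical.

```python
def adjust_projection(projected, actual_base, fitted_base, years,
                      max_decline_per_decade=0.25):
    """Carry forward the base-year residual, then limit the rate of decline."""
    residual = actual_base - fitted_base   # in-sample residual for this AU
    adjusted = projected + residual
    # Assumed pro rata form of the constraint for horizons other than ten years
    floor = actual_base * (1 - max_decline_per_decade) ** (years / 10)
    return max(adjusted, floor)


# Hypothetical AU: actual base population 1 000, fitted base population 900.
# A raw projection of 600 becomes 700 after the residual is added, which is
# below the ten-year floor, so the floor applies.
constrained = adjust_projection(600, 1000, 900, years=10)
# A raw projection of 900 becomes 1 000, so the floor does not bind.
unconstrained = adjust_projection(900, 1000, 900, years=10)
print(constrained, unconstrained)
```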
Finally, the combined population of all AUs in each TA was
constrained to be consistent with the projected population of the TA
from the cohort component model. Discrepancies between the AU-based
population total and the TA-level projection were eliminated by applying
a common scaling factor to the AU populations for each TA, calculated as
the ratio of the projected TA-level population to the sum of the
unconstrained AU populations.
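The final scaling step is a simple proportional adjustment, sketched here with hypothetical AU populations:

```python
def scale_to_ta_total(au_projections, ta_projection):
    """Apply one common scaling factor so the AUs sum to the TA projection."""
    factor = ta_projection / sum(au_projections.values())
    return {au: pop * factor for au, pop in au_projections.items()}


# Hypothetical TA: the AU projections sum to 2 500, but the cohort
# component model projects 2 750 for the TA as a whole.
unscaled = {"AU_A": 1200.0, "AU_B": 800.0, "AU_C": 500.0}
scaled = scale_to_ta_total(unscaled, 2750.0)
print(scaled)
```

A common factor preserves each AU's share of the TA population, so the adjustment changes levels but not the relative spatial distribution.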
Evaluation Method
The performance of the approach was evaluated in two ways. First,
the in-sample performance of the model was examined. Specifically, the
2006 AU-level population estimates derived from each of the four
alternative regression models for non-residential population were
compared with the actual 2006 populations. Second,
the out-of-sample forecast accuracy was evaluated by doing a post-hoc
comparison of the small area forecasts with data from the 2013 estimated
usually resident populations (based on the 2013 Census). The forecast
accuracy of the four models was compared with that of two naive models:
(1) a linear extrapolation, that takes the population change from 2001
to 2006, and extrapolates this to 2013; and (2) a CSG+ (modified
constant share of growth) model, which assumes a constant share of TA
growth for each AU that experienced positive population growth between
2001 and 2006, and no growth for areas that declined in population
between 2001 and 2006 (White, 1954; Wilson, 2015a). Rather than using a
population projection model in the CSG+ model, the change between actual
2006 and 2013 estimated usually resident populations was used. This
provides an over-conservative estimate of the degree of error and bias
in the CSG+ model.
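A minimal sketch of the CSG+ allocation follows. It assumes that each growing AU's share is its base-period growth divided by the total growth of all growing AUs, so that the shares sum to one, and that TA-level growth over the projection horizon is positive; the function name and figures are hypothetical.

```python
def csg_plus(pop_start, pop_launch, ta_growth):
    """Modified constant share of growth.

    pop_start, pop_launch  AU populations at the start and end of the base period
    ta_growth              TA-level population change over the projection horizon
    """
    # Areas that declined in the base period receive a zero share.
    base_growth = {au: max(pop_launch[au] - pop_start[au], 0)
                   for au in pop_launch}
    total_positive = sum(base_growth.values())
    if total_positive == 0:
        return dict(pop_launch)   # no AU grew, so nothing can be shared out
    return {au: pop_launch[au] + ta_growth * base_growth[au] / total_positive
            for au in pop_launch}


# Hypothetical TA with three AUs; AU_C declined between 2001 and 2006.
pop_2001 = {"AU_A": 1000, "AU_B": 2000, "AU_C": 500}
pop_2006 = {"AU_A": 1100, "AU_B": 2300, "AU_C": 450}
pop_2013 = csg_plus(pop_2001, pop_2006, ta_growth=200)
print(pop_2013)
```

In this example AU_A and AU_B receive one quarter and three quarters of the TA growth respectively, while AU_C is held at its 2006 population.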
Multiple measures of forecast error and bias were estimated.
Following Wilson (2015b), the primary measure of forecast accuracy is
weighted mean absolute percentage error (WMAPE). This measure is a
weighted mean of the absolute percentage errors, with the weights being
the size of the actual populations in the year projected (Siegel, 2002).
WMAPE is preferable to other measures (such as Mean Absolute Percentage
Error) when there is a wide range of population sizes. The AU
populations in the study area range from zero to 7 750 in 2006, which
makes WMAPE the most suitable measure.
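The difference between WMAPE and MAPE can be seen in a small example. When the weights are the actual populations, the weighted mean of absolute percentage errors simplifies to the sum of absolute errors divided by the sum of actual populations, which is the form used in the sketch; the figures are hypothetical.

```python
def wmape(forecast, actual):
    """Weighted mean absolute percentage error, weighted by actual population."""
    total_abs_error = sum(abs(f - a) for f, a in zip(forecast, actual))
    return 100 * total_abs_error / sum(actual)


def mape(forecast, actual):
    """Unweighted mean absolute percentage error (undefined if any actual is 0)."""
    return sum(100 * abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)


# Hypothetical AUs: the smallest area has by far the largest percentage error.
actual = [1000, 2000, 100]
forecast = [1100, 1900, 150]
print(wmape(forecast, actual))
print(mape(forecast, actual))
```

Here the 50 per cent error in the area of 100 people dominates MAPE but has little effect on WMAPE, and the simplified form also remains defined when an AU has an actual population of zero.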
The median absolute percentage error (MedAPE), the median algebraic
percentage error (MedALPE), and the root mean square error (RMSE) are
also reported. MedAPE and RMSE both measure forecast precision because
the direction of the error does not affect these measures, while MedALPE
measures forecast bias. Although Mean Absolute Percentage Error (MAPE)
and Mean Algebraic Percentage Error (MALPE) are the most commonly used
measures of forecast accuracy and bias respectively (Tayman, 1996),
MedAPE and MedALPE are preferable to MAPE and MALPE. This is because
using the median error reduces the impacts of extreme outliers (i.e.
unusually large, or small, errors) and the skewed nature of the
distribution of error in small populations, on the overall measures of
error and bias (Tayman and Swanson, 1999). For instance, Tayman (1996)
shows that MAPE tends to overstate the error, and that the degree of
overstatement is largest for areas with the smallest population size. In
contrast to these other measures, RMSE penalises the forecaster for
forecasts that are further from the actual population (Stoto, 1983),
which may be helpful for risk averse planners adopting a minimax
approach, i.e. where forecasts will provide the planner with greater
utility if the largest errors are minimised.
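The three supplementary measures can be computed as in the sketch below. Percentage errors are taken as forecast minus actual, relative to actual, and AUs with an actual population of zero are skipped for the percentage-based measures; that exclusion is an assumption, as the treatment of zero-population AUs is not described above. The data are hypothetical.

```python
import math
import statistics


def percentage_errors(forecast, actual):
    """Algebraic percentage errors, skipping areas with zero actual population."""
    return [100 * (f - a) / a for f, a in zip(forecast, actual) if a != 0]


def medalpe(forecast, actual):
    """Median algebraic percentage error: a measure of bias (sign retained)."""
    return statistics.median(percentage_errors(forecast, actual))


def medape(forecast, actual):
    """Median absolute percentage error: a measure of precision."""
    return statistics.median(abs(e) for e in percentage_errors(forecast, actual))


def rmse(forecast, actual):
    """Root mean square error, in people; large misses are penalised most."""
    squared = [(f - a) ** 2 for f, a in zip(forecast, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical AUs.
actual = [1000, 2000, 100, 500]
forecast = [1100, 1900, 150, 480]
print(medalpe(forecast, actual))
print(medape(forecast, actual))
print(rmse(forecast, actual))
```

A negative MedALPE would indicate that at least half of the AUs were under-projected, whereas MedAPE and RMSE are unaffected by the direction of the errors.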
4. RESULTS
Table 2 shows the results of the evaluation of the in-sample (2006)
and out-of-sample (2013) performance of the four model and data
specifications (I-IV), using all four error measures (WMAPE, MedAPE,
MedALPE, and RMSE). Two out-of-sample comparisons are included: (1)
using the raw statistical model described in the previous section; and
(2) using the statistical model, but carrying forward the 2006 in-sample
residual and using it to modify the 2013 projection.
Overall, the models exhibit a moderate degree of accuracy, with
in-sample errors of between 14.3 and 19.0 per cent. There is an overall
downward bias in the models, as the MedALPE values are consistently
negative. Unlike Wilson and Rowe (2011), the estimates of WMAPE here are
larger than MedAPE, which probably reflects that absolute errors are
largest for the AUs with larger populations. In terms of in-sample
performance, Models I and II perform similarly, but are clearly
dominated by Models III and IV (the spatial regression models), which
exhibit smaller degrees of both error and bias. Comparing Models III and
IV, both are similar in terms of error, but Model IV exhibits a smaller
degree of bias. The median extent of bias in Model IV is 0.1 per cent
under-projection, compared with 0.7 per cent under-projection for Model
III. The comparison between the four models is similar in the two
out-of-sample comparisons.
Comparing the in-sample with the first out-of-sample results
demonstrates that the land-use-based projection model performs nearly as
well at seven years after baseline as it does in the baseline year.
There is little degradation of performance over time for any of the
models, with WMAPE increasing by between 1.5 and 2.8 percentage points
between the in-sample and out-of-sample measures.
In the modified out-of-sample measures, Model III is clearly worse
than other models, and Models II and IV perform best and nearly
identically. There is no evidence of forecast bias in either of these
models, and WMAPE is just 6.7 per cent. Comparing the out-of-sample and
modified out-of-sample results demonstrates the substantial performance
improvement that is obtained by carrying forward the in-sample residual.
Across all models this reduces the WMAPE by between one half and two
thirds.
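The residual carry-forward can be illustrated with a short sketch. It assumes an additive adjustment, in which each area's base-year residual (observed minus fitted population) is added to its raw projection; whether the adjustment is additive or proportional, and whether the results are subsequently rescaled to a higher-level total, are assumptions of this sketch. All names and values are hypothetical.

```python
import numpy as np

def carry_forward_residual(fitted_base, actual_base, raw_projection):
    """Add each area's base-year residual to its raw projection."""
    fitted_base = np.asarray(fitted_base, dtype=float)
    actual_base = np.asarray(actual_base, dtype=float)
    raw_projection = np.asarray(raw_projection, dtype=float)
    residual = actual_base - fitted_base  # in-sample residual per area
    return raw_projection + residual

# Hypothetical values for three small areas
fitted_2006 = [1800.0, 950.0, 3100.0]   # model fit in the base year
actual_2006 = [2000.0, 900.0, 3000.0]   # observed base-year population
raw_2013 = [1900.0, 1000.0, 3300.0]     # unadjusted model projection

modified_2013 = carry_forward_residual(fitted_2006, actual_2006, raw_2013)
print(modified_2013)  # [2100.  950. 3200.]
```

The adjustment treats the part of each area's population that the model cannot explain in the base year as persistent, which is why it helps most where the in-sample fit is poor.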
Table 3 shows the results of the out-of-sample comparison between
the four land-use-based models and the two naive models (linear
extrapolation, and CSG+). To ensure comparability, the models were used
to allocate the 2013 estimated usually resident population, rather than
the 2006-base projected TA-level populations for 2013. Thus the error
measures differ slightly from those reported in Table 2. As with the
last comparison in Table 2, Models II and IV perform the best of the
four land-use-based models. They also perform better than the naive
linear extrapolation on three out of the four error measures (i.e. all
except RMSE). However, the CSG+ model performs the best on all error
measures, with a WMAPE of 5.6 per cent, 1.1 percentage points better
than Models II and IV.
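For readers unfamiliar with the naive benchmarks, the sketch below shows linear extrapolation and a plain constant-share-of-growth allocation. It does not reproduce the modification that distinguishes CSG+ from the plain model, and the years, function names and populations are hypothetical.

```python
import numpy as np

def linear_extrapolation(pop_base, pop_launch, base_years, horizon_years):
    """Extend each area's average annual change over the base period."""
    pop_base = np.asarray(pop_base, dtype=float)
    pop_launch = np.asarray(pop_launch, dtype=float)
    annual_change = (pop_launch - pop_base) / base_years
    return pop_launch + annual_change * horizon_years

def constant_share_of_growth(pop_base, pop_launch, parent_target):
    """Allocate the parent area's projected growth using each small
    area's share of parent growth over the base period."""
    pop_base = np.asarray(pop_base, dtype=float)
    pop_launch = np.asarray(pop_launch, dtype=float)
    growth = pop_launch - pop_base
    shares = growth / growth.sum()  # assumes non-zero parent growth
    parent_growth = parent_target - pop_launch.sum()
    return pop_launch + shares * parent_growth

# Hypothetical small-area populations, five years apart
pop_earlier = [1000, 2000, 3000]
pop_launch = [1100, 2300, 3000]

linear = linear_extrapolation(pop_earlier, pop_launch, base_years=5, horizon_years=7)
csg = constant_share_of_growth(pop_earlier, pop_launch, parent_target=7000)
print(linear)  # [1240. 2720. 3000.]
print(csg)     # [1250. 2750. 3000.]
```

Unlike linear extrapolation, the share-of-growth allocation is constrained to sum to the parent total, which matches the comparison above in which all models allocate a given 2013 population.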
5. DISCUSSION AND CONCLUSION
This paper reported results of new land-use-based models for small
area population projections, and compared those models' forecast
accuracy with that of naive projections based on linear extrapolation
and a modified constant-shares-of-growth model. The land-use-based
approach can readily be applied to projections in other areas, but
necessarily requires a land use model. However, the land use model need
not be as detailed as that employed here.
To date, few studies have compared the forecast accuracy of
combined projection models with that of simpler models. The preferred
model (Model II) uses the WISE land use model to derive the residential
population of each AU, and a spatial Durbin model using absolute
nonresidential land use (in hectares) as explanatory variables to derive
the nonresidential population of each AU. Model II was preferred over
the similar Model IV because of the ease of interpretation of
coefficients for end-users. The preferred model has an out-of-sample
WMAPE of 6.7 per cent over a seven-year projection horizon.
The error in the preferred model compares favourably with previous
studies that use a variety of models (and measures of error). Wilson and
Rowe (2011) found WMAPEs after five years varied from 6.0-7.3 per cent
for areas with a population of 2 000-4 999, and 5.0-6.7 per cent for
populations of 5 000-14 999, for projections of the population of
Queensland, Australia. These WMAPEs increased to 8.2-11.3 per cent and
6.7-9.5 per cent respectively for a ten-year projection horizon. Tayman
et al. (1998) report that MAPE is a decreasing function of population
size, based on San Diego data and a projection model that uses land use
to allocate populations to small areas. They show MAPEs for a ten-year
projection that range from 72 per cent for populations of 500, to 39 per
cent for populations of 5 000, and to 10.5 per cent for populations of
50 000. Tayman and Swanson (1996), using a land-use-based model for
census tracts in Detroit, Dallas-Fort Worth, and San Diego, found MAPEs
of between 18.6 and 28.5 per cent for a 10-year forecast horizon. For
comparison, the unweighted out-of-sample MAPE for the preferred Model II
is 10.9 per cent for a seven-year projection, which is substantially
lower than those reported in these previous studies. Census tracts
typically have larger population sizes than the area units projected
here, which further demonstrates the efficacy of the approach.
The improved projection performance of the
land-use-based models, relative to previous projection models, comes
predominantly from the carrying-forward of in-sample residuals.
Without carrying forward the residuals, the out-of-sample performance of
the models looks much more similar to that of other models. This
procedure makes sense for statistical and urban growth models (but not
for extrapolation or CCM models). If other studies using statistical
models, such as Chi et al. (2011), carried forward residuals in their
forecasts, then their model performance might look much better.
Land-use-based population models that account for spatial
interdependence (Models II and IV) outperform models that ignore these
effects (Models I and III). The population density of a given small area
reflects a complex interplay of the land use of that particular area,
and the land uses of surrounding areas. For instance, urban land uses
will have quite different characteristics and population densities than
rural land uses, even within the same category of land use. Spatial
Durbin models allow us to capture the spatial dependence, and small area
population projection models should make more use of spatial models.
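The way a spatial Durbin specification lets neighbouring land uses enter an area's prediction can be sketched as follows. The sketch assumes the usual form y = rho*W*y + X*beta + W*X*theta + e with a row-standardised contiguity matrix W, and solves the reduced form for the expected value of y. The weights matrix, land-use values and coefficients are hypothetical and are not the estimates from this study.

```python
import numpy as np

def row_standardise(adjacency):
    """Scale each row of a binary contiguity matrix to sum to one."""
    adjacency = np.asarray(adjacency, dtype=float)
    row_sums = adjacency.sum(axis=1, keepdims=True)
    # Areas with no neighbours keep a row of zeros
    return np.divide(adjacency, row_sums,
                     out=np.zeros_like(adjacency), where=row_sums > 0)

def sdm_predict(W, X, rho, beta, theta):
    """Expected y from the reduced form (I - rho*W)^-1 (X*beta + W*X*theta)."""
    n = W.shape[0]
    signal = X @ beta + (W @ X) @ theta  # own and neighbours' land use
    return np.linalg.solve(np.eye(n) - rho * W, signal)

# Three hypothetical areas arranged in a line: 1 - 2 - 3
W = row_standardise([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]])
X = np.array([[10.0], [20.0], [30.0]])  # hectares of one land-use class
beta = np.array([5.0])                  # hypothetical own-area effect
theta = np.array([1.0])                 # hypothetical neighbour effect

y_no_lag = sdm_predict(W, X, rho=0.0, beta=beta, theta=theta)
y_lagged = sdm_predict(W, X, rho=0.2, beta=beta, theta=theta)
print(y_no_lag)  # [ 70. 120. 170.]
print(y_lagged)
```

With rho set to zero the prediction depends only on own and neighbouring land use; a positive rho additionally feeds neighbouring populations back into each area's prediction.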
However, despite their good performance in comparison with past
modelling efforts, the land-use-based forecasts do not outperform the
naive CSG+ model (White, 1954; Wilson, 2015a). The inability of complex
models to outperform simple models in projections of small area
population is a general finding in the literature on small area
population projections (van der Gaag et al., 2003). However, the WISE
land use model used here has recently undergone significant improvement,
with input from a wide group of local authority planners. This improved
model, which operates with a 2013-base land use map, demonstrates
substantially better in-sample performance, with a nearly 30 per cent
reduction in WMAPE (based on Model I) to 13.5 per cent. Unfortunately,
out-of-sample model testing based on this new land use model will not be
possible until after the 2018 Census, but these initial results are
extremely promising.
There are a number of limitations to the models presented here.
First, the projections do not include an explicit measure of
uncertainty. Instead, the AU populations were simply forecast as point
estimates. However, the measures of forecast error could be used to
estimate uncertainty (Tayman, 2011). It is worth noting that the
uncertainty in population projections at smaller geographic levels is
substantially larger for smaller populations (Cameron and Poot, 2011),
so a better understanding of the uncertainty in the estimates is
clearly important. Second, because the forecasts are based on a
statistical model at the small-area level, they potentially suffer from
the same limitations as statistical models outlined in the introduction.
However, because the statistical model is only used to project the
non-residential population, rather than the whole population, these
problems are somewhat mitigated. Third, the projections were evaluated
based on only a single period in time and a single region of New
Zealand. It may be that demographic trends fit the land-use-based models
particularly well (or not so well) by chance alone. The model will be
evaluated further in later periods, but should also be applied to other
regions and contexts.
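On the first limitation, one simple way in which past forecast errors could be turned into an uncertainty range is sketched below. It assumes that the ratio of observed to projected population in a previous evaluation is representative of future errors, and applies empirical quantiles of that ratio to a new point projection. This is an illustrative approach only, not a procedure taken from Tayman (2011), and all names and values are hypothetical.

```python
import numpy as np

def empirical_interval(past_actual, past_projected, new_projection, coverage=0.8):
    """Scale a point projection by quantiles of past actual/projected ratios."""
    ratios = (np.asarray(past_actual, dtype=float)
              / np.asarray(past_projected, dtype=float))
    tail = (1.0 - coverage) / 2.0
    lower, upper = np.quantile(ratios, [tail, 1.0 - tail])
    return new_projection * lower, new_projection * upper

# Hypothetical evaluation results for five small areas
past_actual = [950, 1000, 1050, 1100, 900]
past_projected = [1000, 1000, 1000, 1000, 1000]

lo, hi = empirical_interval(past_actual, past_projected, new_projection=2000)
print(lo, hi)
```

In practice the ratios would need to be grouped by population size, since errors are larger for smaller populations.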
Fourth, it is likely that small area population projections present
a problem of endogeneity. If projections are used in planning decisions
then they may become somewhat self-fulfilling prophecies. For instance,
if population is projected to increase in a given AU, then planners may
create infrastructure that supports the expected additional population,
leading to more development in that AU and consequently more population.
However, if population had been projected to increase elsewhere instead,
then infrastructure spending, development and population growth would be
directed towards that other area instead. Thus, small area population
projections should be used as one tool among many in the planning
process.
Finally, despite the forecast accuracy of the land-use-based models
being lower than that of the naive CSG+ model, the land-use-based models do serve an
important purpose. Too often, population projection models are seen by
local authority planners and elected officials as 'black
boxes' or academic curiosities that have little relevance to the
real world. As Rainford and Masser (1987) note, bridging the gap between
the technical aspects of forecasting and the needs of planners is both
important and difficult. Achieving 'buy-in' from planners and
elected officials is imperative in ensuring that population projections
are understood and used effectively to achieve improved planning
outcomes. One part of this is to ensure that planners can recognise that
planning and policy have demonstrable effects on the projected
populations at the small-area level. The land-use-based models have been
very successful in this, and are being used extensively in long term
planning processes at the local and regional level. Further enhancements
to the model, including the improved land use modelling described above,
will likely further increase planners' acceptance of integrated
modelling approaches.
ACKNOWLEDGEMENTS: This research was funded by Waikato Regional
Council, FutureProof, Latitude Planning and Waikato Shared Services.
Development of the WISE model was funded by the Foundation for Research,
Science and Technology. We are grateful to Jacques Poot and to
participants at the 8th International Conference on Population
Geographies, the 55th European Regional Science Association Congress,
and a seminar at Statistics New Zealand, for their helpful comments. We
also thank Beat Huser, Tony Fenton, and Hedwig van Delden for assistance
with the land use modelling, and Sialupapu Siameja for research
assistance.
REFERENCES
Alho, J. M. and Spencer, B. D. (2005). Statistical Demography and
Forecasting. Springer, New York.
Cameron, M. P. and Cochrane, W. (2014). Population, Family and
Household, and Labour Force Projections for the Waikato Region,
2013-2063. Research report commissioned by the Waikato Regional Council.
Hamilton, New Zealand: University of Waikato.
Cameron, M. P. and Poot, J. (2011). Lessons from Stochastic
Small-Area Population Projections: The Case of Waikato Subregions in New
Zealand. Journal of Population Research, 28, pp. 245-265.
Chi, G. (2009). Can Knowledge Improve Population Forecasts at
Subcounty Level? Demography, 46, pp. 405-427.
Chi, G., Zhou, X. and Voss, P. R. (2011). Small-Area Population
Forecasting in an Urban Setting: A Spatial Regression Approach. Journal
of Population Research, 28, pp. 185-201.
Foss, W. (2002). Small Area Population Forecasting. The Appraisal
Journal, 70, pp. 163-172.
Huser, B., Rutledge, D., van Delden, H., Wedderburn, L. M.,
Cameron, M., Elliot, S., Fenton, T., Hurkens, J., McBride, G., McDonald,
G., O'Connor, M., Phyn, D., Poot, J., Price, R., Small, B., Tait,
A., Vanhout, R. and Woods, R. A. (2009). Development of an Integrated
Spatial Decision Support System (ISDSS) for Local Government in New
Zealand. Proceedings of the 18th World IMACS / MODSIM Congress on
Modelling and Simulation, pp. 2370-2376, Lincoln University,
Christchurch.
Joliffe, I. T. (2010). Principal component analysis. 2nd edition,
Springer, New York.
Kim, J. W., Chang, J. T., Baker, N. L., Wilks, D. S. and Gates, W.
L. (1984). The Statistical Problem of Climate Inversion: Determination
of the Relationship Between Local and Large-Scale Climate. Monthly
Weather Review, 112, pp. 2069-2077.
Lesage, J. and Pace, R. K. (2009). Introduction to spatial
econometrics, CRC Press, Boca Raton, FL.
Local Government Act 2002 (NZ). Available at
http://www.legislation.govt.nz/act/public/2002/0084/latest/DLM3414327.html.
Murdock, S. H., Hamm, R. R., Voss, P. R., Fannin, D. and Pecotte,
B. (1991). Evaluating Small-Area Population Projections. Journal of the
American Planning Association, 57, pp. 432-443.
Myers, D. (2001). Demographic Futures as a Guide to Planning:
Example of Latinos and the Compact City. Journal of the American
Planning Association, 67, pp. 383-397.
Rainford, P. and Masser, I. (1987). Population Forecasting and
Urban Planning Practice. Environment and Planning A, 19, pp. 1463-1475.
Rayer, S. and Smith, S. K. (2010). Factors Affecting the Accuracy
of Subcounty Population Forecasts. Journal of Planning Education and
Research, 30, pp. 147-161.
Rutledge, D. T., Cameron, M., Elliott, S., Fenton, T., Huser, B.,
McBride, G., McDonald, G., O'Connor, M., Phyn, D., Poot, J., Price,
R., Scrimgeour, F., Small, B., Tait, A., van Delden, H., Wedderburn, L.
and Woods, R. A. (2008). Choosing Regional Futures: Challenges and
Choices in Building Integrated Models to Support Long-Term Regional
Planning in New Zealand. Regional Science Policy and Practice, 1, pp.
85-108.
Rutledge, D., Cameron, M., Elliott, S., Hurkens, J., McDonald, G.,
McBride, G., Phyn, D., Poot, J., Price, R., Schmidt, J., van Delden, H.,
Tait, A. and Woods, R. (2010). WISE--Waikato Integrated Scenario
Explorer Technical Specifications Version 1.1. Research report
commissioned by Environment Waikato, Hamilton: Landcare Research.
Siegel, J. S. (2002). Applied Demography: Applications to Business,
Government, Law and Public Policy, Academic Press, San Diego, CA.
Smith, S. K. (1997). Further Thoughts on Simplicity and Complexity
in Population Projection Models. Journal of Forecasting, 13, pp.
557-565.
Smith, S. K. and Tayman, J. (2003). An Evaluation of Population
Projections by Age. Demography, 40, pp. 741-757.
Stoto, M. A. (1983). The Accuracy of Population Projections. Journal
of the American Statistical Association, 78, pp. 13-20.
Tayman, J. (1996). The Accuracy of Small-Area Population Forecasts
Based on a Spatial Interaction Land-Use Modeling System. Journal of the
American Planning Association, 62, pp. 85-98.
Tayman, J. (2011). Assessing Uncertainty in Small Area Forecasts:
State of the Practice and Implementation Strategy. Population Research
and Policy Review, 30, pp. 781-800.
Tayman, J. and Schafer, E. (1985). The Impact of Coefficient Drift
and Measurement Error on the Accuracy of Ratio-Correlation Population
Estimates. Review of Regional Studies, 15, pp. 3-10.
Tayman, J., Schafer, E. and Carter, L. (1998). The Role of
Population Size in the Determination and Prediction of Population
Forecast Errors: An Evaluation Using Confidence Intervals for Subcounty
Areas. Population Research and Policy Review, 17, pp. 1-20.
Tayman, J. and Swanson, D. A. (1996). On the Utility of Population
Forecasts. Demography, 33, pp. 523-528.
Tayman, J. and Swanson, D. A. (1999). On the Validity of MAPE as a
Measure of Population Forecasting Accuracy. Population Research and
Policy Review, 18, pp. 299-322.
Triantakonstantis, D. and Mountrakis, G. (2012). Urban Growth
Prediction: A Review of Computational Models and Human Perceptions.
Journal of Geographic Information System, 4, pp. 555-587.
van der Gaag, N., van Wissen, L., Rees, P., Stillwell, J. and
Kupiszewski, M. (2003). Study of Past and Future Interregional Migration
Trends and Patterns within European Union Countries: In Search for a
Generally Applicable Explanatory Model. Netherlands Interdisciplinary
Demographic Institute, The Hague.
White, H. R. (1954). Empirical Study of the Accuracy of Selected
Methods of Projecting State Populations. Journal of the American
Statistical Association, 49, pp. 480-498.
Wilson, T. (2015a). New Evaluations of Simple Models for Small Area
Population Forecasts. Population Space and Place, 21, pp. 335-353.
Wilson, T. (2015b). Short-Term Forecast Error of Australian Local
Government Area Population Projections. Australasian Journal of Regional
Studies, 21, pp. 253-275.
Wilson, T. and Rowe, F. (2011). The Forecast Accuracy of Local
Government Area Population Projections: A Case Study of Queensland.
Australasian Journal of Regional Studies, 17, pp. 204-243.
Michael P. Cameron
Associate Professor, Department of Economics, Research Associate,
National Institute of Demographic and Economic
Analysis, University of Waikato, Hamilton, 3240, New Zealand.
Email: mcam@waikato.ac.nz.
William Cochrane
Senior Lecturer, Faculty of Arts and Social Sciences, Research
Associate, National Institute of Demographic and
Economic Analysis, University of Waikato, Hamilton, 3240, New
Zealand.
Email: billc@waikato.ac.nz.
Table 1. Territorial Authority Populations for the Waikato Region, 2013.

Territorial Authority         Population  Count of Area  Mean AU     Median AU   Minimum AU  Maximum AU
                                          Units (AUs)    Population  Population  Population  Population
Thames-Coromandel District    27 040      10             2 704       2 845       730         4 490
Hauraki District              18 740      8              2 343       1 945       500         4 790
Waikato District              64 890      31             2 093       1 860       0           5 550
Matamata-Piako District       32 200      13             2 477       2 510       300         4 520
Hamilton City                 150 250     46             3 266       3 305       160         7 750
Waipa District                46 380      29             1 599       1 300       200         3 770
Otorohanga District           9 330       5              1 866       1 750       350         4 180
South Waikato District        22 530      16             1 408       1 060       160         3 690
Waitomo District (part)       9 330       7              1 333       1 000       210         4 670
Taupo District (part)         34 120      28             1 219       625         10          4 410
Rotorua District (part)       3 640       4              910         870         160         1 740
Waikato Region (Total)        418 450     197            2 124       1 840       0           7 750
Source: Authors' calculations
Table 2. In-Sample and Out-of-Sample Model Performance.

Error Measure                Model I   Model II  Model III  Model IV
In-sample
  WMAPE (%)                  19.0      19.0      16.2       16.3
  MedAPE (%)                 17.5      17.6      14.8       14.3
  MedALPE (%)                -2.7      -2.9      -0.7       -0.1
  RMSE (%) *                 26.6      26.5      23.3       22.8
Out-of-sample
  WMAPE (%)                  20.8      20.5      19.0       18.2
  MedAPE (%)                 19.3      19.1      16.2       16.5
  MedALPE (%)                -0.6      -0.6      -1.9       2.1
  RMSE (%) *                 28.3      28.4      27.0       25.9
Modified out-of-sample **
  WMAPE (%)                  7.5       6.7       9.0        6.7
  MedAPE (%)                 5.7       4.8       6.7        4.8
  MedALPE (%)                -0.8      0.0       -1.4       0.0
  RMSE (%) *                 14.3      14.0      15.8       14.0

Note: * As a percentage of the mean AU population; ** Modified
out-of-sample measures include a correction, whereby the in-sample
residual is carried forward to form part of the forecast.
Source: Authors' calculations
Table 3. Comparative Out-Of-Sample Model Performance.

Error Measure    Model I  Model II  Model III  Model IV  Linear  CSG+
WMAPE (%)        7.3      6.7       8.7        6.7       7.6     5.6
MedAPE (%)       5.6      5.0       6.4        5.0       6.4     4.5
MedALPE (%)      -1.7     -1.7      -0.8       -1.7      -2.3    -0.3
RMSE (%) *       14.1     13.6      15.6       13.6      12.6    10.3

Source: the Authors
COPYRIGHT 2017 Regional Science Association, Australian and New Zealand Section