Predicting geographical human risk of West Nile Virus--Saskatchewan, 2003 and 2007.
Epp, Tasha Y. ; Waldner, Cheryl L. ; Berke, Olaf 等
The introduction of West Nile virus (WNV) into North America
sparked an interest in predicting where and when the virus would appear.
(1) Predictive risk mapping is a process by which components of the
disease cycle are used to create models and subsequent risk maps. (2,3)
The methods have become more practical for a broader range of diseases
and study locations because remote sensing can now provide environmental
information at required spatial and temporal resolution. (4,5)
Vector-borne diseases, such as WNV, are particularly amenable to
spatial and temporal analysis as they are highly influenced by regular,
seasonal climate, and environmental changes. (3,6-8) During the season,
mosquitoes become infected with the West Nile virus primarily through
bird-blood meals and then retransmit the virus to any one of multiple
bird species, a cycle which amplifies the virus. Governed by
environmental conditions and host behaviours, infected mosquitoes can
spread WNV to other incidental hosts, such as humans and horses.
Defining the risk of WNV infection is a key component to public
health intervention strategies. (9) Prioritization of vector-borne
disease programs in the overall public health budget is a juggling act,
affected by limited funding availability. In Saskatchewan, interventions
are prioritized largely based on environmental conditions conducive to
mosquito development and surveillance for clinical disease in humans.
Health officials could increase the cost effectiveness of control and
surveillance programs with a method of predicting differences in
regional risk of infection.
The primary objective of this study was to describe the application
of a previously established model to predict areas of low, medium, and
high risk of WNV in humans in both 2003 and 2007 in the province of
Saskatchewan. (10) The second objective was to use historical
surveillance data from 2003-2005 to make predictions of areas of risk of
WNV in humans in 2007.
MATERIALS AND METHODS
WNV infection risk--2003 and 2007
Human surveillance data were obtained from Saskatchewan Health as
the number of laboratory-confirmed WNV individuals (which included WN
fever, WN neurological syndrome and asymptomatic individuals,
http://www.health.gov.sk.ca/wnv-surveillance-resultsarchive (accessed
July 14, 2009)) per rural municipality (RM). In 2003, each RM with WNV
individuals (sampled RM) was classified by category of WNV infection
risk using the 25th and 75th percentiles: low-risk (0.0-0.09%),
medium-risk (>0.09%-0.41%), and high-risk (>0.41%). This
classification was repeated in 2007, with the following results:
low-risk (0-0.14%), medium-risk (>0.14%0.36%), and high-risk
(>0.36%). The population at risk was determined for 2003 and 2007
using Statistics Canada 2001 and 2006 census data by rural municipality,
respectively.
Environmental variables
Variables used in the analysis were the same as those identified in
a previous study regarding WNV infection in horses in 2003.10 Those
variables that had multiple statistically significant time periods were
condensed into one or two principal components with principal component
analysis before use in the final models (SPSS 14.0, SPSS Inc., Chicago,
IL, USA). (10,11)
Land Surface Temperature
Land Surface Temperature (LST) images (Moderate Resolution Imaging
Spectrometer satellite (MODIS); Earth Observing System Gateway, National
Aeronautics and Space Administration; http://lpdaac.usgs.gov) were
provided as 8-day composites (1 kilometre resolution) beginning May 1st
and ending September 13th for 2003 and 2007. The images were joined
together and clipped to show only the province of Saskatchewan (PCI
Geomatica 9, PCI Geomatics, Richmond, ON, Canada). The images included
daytime (maximum) and nighttime (minimum) temperatures and were
manipulated to give a mean LST. For each year, the mean LST averaged for
each RM was calculated for each 8-day composite.
Precipitation
Precipitation values (mm) were obtained for 2003 and 2007 on a
daily basis from Environment Canada. Eight-day composites (total
precipitation per time period) were created to match the remotely sensed
time periods. Interpolation among the 176 stations in the province was
accomplished using Inverse Distance Weighted (IDW) method (ArcGIS 9.2,
ESRI Inc., Redlands, CA, USA). (12) For each year, the averaged total
value for each time period by RM was calculated.
Vegetation
Normalized Difference Vegetation Index (NDVI) (MODIS satellite;
http://lpdaac.usgs.gov) is a simple index of vegetation cover which
allows monitoring of seasonal changes in vegetation growth. (13) Images
(500 metre resolution) were provided as 16-day composites starting April
23rd and ending September 13th for both 2003 and 2007. For each year,
the average value per RM was calculated.
Land Cover
North Digital and South Digital Land Cover dataset based on
satellite imagery from 2000 for the province of Saskatchewan was
obtained from Information Services Corporation of Saskatchewan.
Classifications were further aggregated to make a manageable number of
categories for analysis. Those categories selected for consideration in
the models included: water, wetland (which includes bog, marsh, fen,
etc.), and treed (which includes pine, spruce, hardwood, softwood,
etc.). The percentage of RM covered by each of the categories was
calculated.
Statistical analysis
Overview
Discriminant analysis (SPSS 14.0, SPSS Inc., Chicago, IL, USA) was
used to predict membership in the three mutually exclusive groups (low,
medium and high risk). (14) The yearly datasets were divided into a) a
training dataset (consisting of a random selection of RMs with data) and
b) a testing dataset (consisting of the remaining RMs with data and any
RMs without data) (Table 1). The training data were used to analyze the
known differences between RMs with data; subsequently, these differences
were then applied to the testing data to assess the accuracy of the
predictions of remaining RMs with and without data.
Multivariable model selection was partially determined through
overall classification or prediction accuracy percentage for both
training and testing datasets. This was defined as the proportion of RMs
with data correctly classified based on the observed risk category
(pre-model classification based on proportion data) compared to the
predicted risk category (post-model classification).14 In addition,
multivariable models were fit with an unequal weighting scheme to adjust
the posterior probabilities to account for prior knowledge of observed
group membership. (14) Separate matrices (to account for unequal group
covariance matrices) were used when Box's M test was significant
(p=0.05) and the prediction accuracy percentage changed substantially
from a model that used a common matrix for all groups.14 Ultimately, the
final model was the one that produced the best overall classification
with the least overlap of risk categorization between high- and low-risk
areas.
[FIGURE 1 OMITTED]
The results of the discriminant analysis were twofold: a) providing
functions or sets of variables by which the risk categories were
discriminated by, how well each of these functions discriminated and
which variables within the functions were most informative, and b)
providing a set of three probabilities predicting the likeliness of
membership in each of the three risk categories. (14) Individual RMs
were classified (based on the functions) into one of the risk categories
by predicting the group (low-, medium- or high-risk) to which the
individual RM most likely belonged. (14) The categorization rule is less
reliable for RMs with maximum probability of <75%. Therefore,
maximizing the overall probability of group membership for all RMs in
each group was used in final model selection. Chloropleth maps of the
predicted risk categories were generated using ArcGIS 9.2.
Yearly Models (2003, 2007)
Yearly models were based on WNV infection risk and environmental
variables by RM from within each year (2003 and 2007).
Comparison of the 2003 and 2007 yearly models was done with the
kappa statistic.
[FIGURE 2 OMITTED]
Historical Prediction Model
Information from modeling of horse and human surveillance data
conducted in Saskatchewan in 2003-2005 was used to create a historical
training dataset. (10) Selection of RMs (n=72) with suitable data was
based on consistent predictions from previously established models where
at least 2 of the predictions had probabilities of group membership of
75% or higher. The historical training dataset was used to train the
modeling of the 2007 human dataset. Comparison of the 2007 yearly and
historically trained models was done with the kappa statistic.
RESULTS
Predictive ability--2003 human dataset
The numbers of predicted RMs in the three risk categories were: 59
in the low-risk group, 203 in the medium-risk group, and 36 in the
high-risk group (Figure 1). The model used two functions to predict
RM category (Table 1).
Predictive ability--2007 human dataset
The numbers of predicted RMs in the three risk categories were: 59
in the low-risk group, 198 in the medium-risk group, and 41 in the
high-risk group (Figure 2). The model used one function to predict RM
category (Table 1). The second function was not statistically important
in the prediction of RMs but was retained to maximize model accuracy.
The agreement in the classification result for individual RMs when
compared between the 2007 and 2003 models was poor; kappa was 0.10 for
high-risk RMs and 0.62 for low-risk RMs.
[FIGURE 3 OMITTED]
Predictive ability--historical training data on 2007 human dataset
The number of predicted RMs in the three risk categories was: 65 in
the low-risk group, 136 in the medium-risk group, and 97 in the
high-risk group (Figure 3). The model used two functions to predict RM
category (Table 1). Both functions incorporated precipitation
information; function 1 used June values while function 2 used July
values.
Comparison of the historically trained 2007 model with the original
2007 model revealed that both models classified the following number of
RMs the same: 43 low-risk RMs, 118 medium-risk RMs and 33 high-risk RMs.
The agreement, as calculated using kappa statistic, was 0.40 for
high-risk RMs and 0.61 for low-risk RMs. The historically trained model
clearly demonstrated a southwest to northeast trend of decreasing risk,
which is not as clear from the other models.
DISCUSSION
The models in this study try to geographically predict which areas
are at risk of infection (high, medium or low risk) by defining a set of
criteria upon which to classify that risk. There appears to be a trend
of high-risk areas concentrating in the south-central and south-western
portions of the province. However, individual RMs did differ in their
category of risk depending on the model and year, a reflection of the
model's limitations. The models could have had error introduced due
to training dataset selection, inaccuracies in the RM classification
prior to entry into the model (i.e., location of human individual, not
the location of exposure), and reliance on summarized environmental
data. Predictions of individual RMs should be used with caution;
instead, by applying a smoothing technique, the maps would indicate
general but larger areas or trends of high risk of infection.
In the model trained by the historical dataset, model predictions
compared to the original risk categories based on passive surveillance
were only 45% accurate; however, the model did maintain high
probabilities that the predictions for each risk group were trustworthy.
The predictive map clearly demonstrated a gradient of risk decreasing
from south to north which mirrors what is found by mosquito trapping
programs. (8)
The environmental variables included in this analysis were based on
previous models built using horse surveillance data and included
precipitation, temperature, vegetation and land cover, specifically
wetlands, water and treed areas. (10) The present predictive models
provided only marginally accurate predictions of risk geographically.
Obviously, the complexity of the cycle is not completely explained by
these variables and their interactions alone. Factors such as
biodiversity, predators, parasites, food availability, human behaviour
and spatial resources will affect interactions between the vector and
hosts, while immune status of the hosts will become more important the
longer the virus remains endemic in an area. (9)
Variables contributing to the model functions were fairly
consistent between models. Precipitation and temperature were important
in the prediction of risk of WNV in humans, particularly in 2003;
decreasing rainfall into July and higher temperatures overall were
associated with high-risk areas. Culex tarsalis uses standing water with
increased organic content for oviposition, which would be washed away by
increased rainfall. (8,15) Habitat was also highly important in the
prediction of human WNV risk, particularly in 2007. In the present
study, vegetation index was slightly higher on average in low-risk areas
as was the percentage of RMs covered in trees, water and wetland. C.
tarsalis prefers shallow, often stagnant water of high organic content
with little tree cover surrounding the sites, such as water-filled hoof
prints near livestock watering sites. (8) In the 2007 model, the
percentage of water and wetland coverage was not statistically important
to the prediction process. This could be influenced by the fact that
actual wetland capacity was much higher in 2007 than 2003 owing to the
few wet years that occurred between them (personal communication, P.
Curry, Saskatchewan Health, December 2007).
By specifying what time periods should be incorporated into the
model-building process, it could be used as a method to make early
predictions or inform public health authorities about the development of
high-risk areas as the season progresses. The usefulness of the models
as a predictor of high-risk areas must be coupled with the knowledge of
vector abundance and host population dynamics. Historically, maps of
mosquito vectors consistently indicated high-risk areas in the
southeastern portion of the province. If mosquito trapping data were
available, these models could validate predictions made based on
mosquito information or signal areas where mosquito data were required.
With further research, greater accuracy in predicting WNV risk of
infection will occur.
REFERENCES
(1.) Rogers DJ, Myers MF, Tucker CJ, Smith PF, White DJ, Backenson
B, et al. Predicting the distribution of West Nile fever in North
America using satellite sensor data. PE&RS 2002;68:112-14.
(2.) Brooker S, Hay SI, Bundy DAP. Tools from ecology: Useful for
evaluating infection risk models? Trends in Parasitology 2002;18:70-74.
(3.) Kitron U. Risk maps: Transmission and burden of vector-borne
diseases. Parasitology Today 2000;16:324-25.
(4.) Beck LR, Lobitz BM, Wood BL. Remote sensing and human health:
New sensors and new opportunities. Emerg Infect Dis 2000;6:217-26.
(5.) Rogers DJ, Randolph SE, Snow RW, Hay SI. Satellite imagery in
the study and forecast of malaria. Nature 2002;415:710-18.
(6.) Hay SI, Tucker CJ, Rogers DJ, Packer MJ. Remotely sensed
surrogates of meteorological data for the study of the distribution and
abundance of arthropod vectors of disease. Ann Trop Med Hyg
1996;90:1-19.
(7.) Mellor PS, Leake CJ. Climatic and geographic influences on
arboviral infections and vectors. Rev Sci Tech Off Int Epiz
2000;19:41-54.
(8.) Curry P. Saskatchewan mosquitoes and West Nile virus. Blue Jay
2004;62:104-11.
(9.) Rainham DGC. Ecological complexity and West Nile virus:
Perspectives on improving public health response. Can J Public Health
2005;96:37-40.
(10.) Epp T. West Nile Virus: From Surveillance to Prediction Using
Saskatchewan Horses [thesis]. Saskatoon, SK: University of Saskatchewan,
2007. Available online at:
http://library2.usask.ca/theses/available/etd-08012007-160845/ (Accessed
April 1, 2009).
(11.) Dohoo I, Martin W, Stryhn H. Veterinary Epidemiologic
Research, 1st ed. Charlottetown, PE: AVC Inc., 2003;322.
(12.) Waller LA, Gotway CA. Applied Spatial Statistics for Public
Health Data, 1st ed. Hoboken, NJ: John Wiley & Sons, Inc., 2004;494.
(13.) Jensen JR. Introductory Digital Image Processing, 3rd ed.
Upper Saddle River, NJ: Pearson Prentice Hall, 2005;311-19.
(14.) Klecka WR. Discriminant Analysis. Sage University Papers,
Series on Quantitative Application in the Social Sciences, 07-019.
Beverly Hills, CA: Sage Publications, 1980;1-71.
(15.) Shaman J, Day JF. Achieving operational hydrologic monitoring
of mosquito-borne disease. Emerg Infect Dis 2005;11:1343-50.
Received: December 4, 2008
Accepted: April 24, 2009
Tasha Y. Epp, DVM, PhD, [1] Cheryl L. Waldner, DVM, PhD, [1] Olaf
Berke, PhD [2]
[1.] Large Animal Clinical Sciences, Western College of Veterinary
Medicine, University of Saskatchewan, Saskatoon, SK
[2.] Department of Population Health, Ontario Veterinary College,
University of Guelph, Guelph, ON
Correspondence and reprint requests: Tasha Epp, Large Animal
Clinical Sciences, WCVM, University of Saskatchewan, 52 Campus Drive,
Saskatoon, SK S7N 5B4, Tel: 306-966-6542, Fax: 306-966-7159, E-mail:
tasha.epp@usask.ca
Table 1. Results from Final 2003, 2007 and
Historically Trained 2007 Multivariable Models
2003
Model 148 Sampled RMs
RMs * sampled Training 118 sampled RMs
(observed
categorization: 28
low-risk, 62
medium-risk,
28 high-risk)
Testing 30 sampled RMs
(observed
categorization:
9 low-risk, 11
medium-risk, 10
high-risk)
Plus remaining 150
unsampled RMs
Model Accuracy Training 67%
Testing 60%
Significant Function 1 Mean LST ([double dagger])
Variables
([dagger])
NDVI ([double dagger])
Precipitation
Tree coverage
Function 2 Water coverage
Wetland coverage
Eigenvalues Function 1 0.544
Function 2 0.203
Group Low risk 85%
membership
probability
Medium risk 67%
High risk 74%
Historically Trained
Model 2007 2007
RMs * sampled 193 Sampled RMs
154 sampled RMs 72 RMs with historical
(observed data (observed
categorization: 35 categorization: 22
low-risk, 77 low-risk, 32
medium-risk, medium-risk,
42 high-risk) 18 high-risk)
39 sampled RMs 193 sampled RMs
(observed (observed
categorization: categorization:
12 low-risk, 15 47 low-risk,
medium-risk, 12 92 medium-risk,
high-risk) 54 high-risk)
Plus remaining 105 Plus remaining 105
Model Accuracy unsampled RMs unsampled RMs
61% 100%
Significant 44% 45%
Variables
([dagger]) NDVI Tree coverage
Mean LST NDVI
Precipitation Mean LST
Tree coverage Precipitation
Water coverage Water coverage
([section])
Wetland coverage Wetland coverage
([section])
Eigenvalues Precipitation
0.446 7.55
Group 0.020 2.495
membership
probability 76% 91%
57% 91%
63% 96%
* RMs = rural municipalities
([dagger]) Variables that were significant
in the model are recorded by function in
the order of importance for contributing
to the function.
([double dagger]) LST = land surface
temperature, NDVI = normalized
difference vegetation index
([section]) Function not statistically
significant but retained for increased
model accuracy