"Google Flu Trends" and emergency department triage data predicted the 2009 pandemic H1N1 waves in Manitoba.
Malik, Mohammad Tufail ; Gumel, Abba ; Thompson, Laura H. et al.
Traditional methods of public health surveillance based on clinical
or laboratory case reports are often expensive to implement and
maintain; not sensitive enough to detect the early stages of an
outbreak; and not suitable to detect outbreaks of novel pathogens. (1)
Syndromic surveillance is emerging as a practical alternative approach
to monitor influenza disease activity that does not rely on collecting
data on diagnosed cases. Syndromic surveillance involves the use of
"health-related data that precede diagnosis and signal sufficient
probability of a case or an outbreak to warrant further public health
response". (2) Clinical syndromes, such as influenza-like illness
(ILI), and other proxies, such as the number of emergency department
(ED) visits for ILIs and the volume of influenza-related Internet search
engine queries, (3-7) are used to monitor disease activity in order to
detect new outbreaks and predict the trajectory and impact of ongoing
outbreaks.
The 2009 H1N1 influenza pandemic provides a unique opportunity to
evaluate the utility of these indicators in predicting and monitoring
influenza outbreaks. We examined the performance of influenza syndromic
indicators, based on Google Flu Trends (GFT) and ED data, with respect
to the early detection and monitoring of the H1N1 pandemic waves in
Manitoba. Syndromic indicator data were compared to reference data
defined as the weekly count of laboratory-confirmed H1N1 cases in
Manitoba during the 2009 pandemic.
METHODS
Emergency department data
Information on ED visits to Winnipeg hospitals was obtained from
the database of the Emergency Department Information System (EDIS) for
the period from December 2008 to June 2010. EDIS is a real-time ED
monitoring system implemented across Winnipeg hospitals that captures
information on every ED visit, including patient demographics and
'chief complaints.' We obtained aggregated daily data on the
number of ED visits attributed to ILI and the total number of visits
(for any reason) to all EDs included in EDIS. A visit was attributed to
ILI if the patient's chief complaint was listed as weakness,
shortness of breath, cough, headache, fever, sore throat, upper
respiratory tract infection, or respiratory arrest. This definition
likely overestimates the actual number of ILI visits, as none of these
complaints are specific to the ILI syndrome. However, this definition
has been used consistently throughout the study period, so time trends
may still reflect changes over time in ED use due to ILI. Using ED data,
we defined two syndromic indicators: 1) weekly count of all ED visits
triaged as ILI (ED ILI volume), and 2) percentage of all ED visits that
were triaged as an ILI (ED ILI percent).
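The two ED indicators defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `weekly_ed_indicators`, the tuple layout of the daily rows, the use of ISO calendar weeks, and the example counts are all assumptions made for the sketch.

```python
from collections import defaultdict
from datetime import date


def weekly_ed_indicators(daily_rows):
    """Aggregate daily ED counts into weekly ILI volume and ILI percent.

    daily_rows: iterable of (day, ili_visits, total_visits) tuples.
    Weeks are ISO calendar weeks, which is an assumption of this sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # week key -> [ILI visits, all visits]
    for day, ili_visits, total_visits in daily_rows:
        iso = day.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        totals[key][0] += ili_visits
        totals[key][1] += total_visits

    result = {}
    for key in sorted(totals):
        ili, total = totals[key]
        # ED ILI percent: share of all ED visits that were triaged as ILI.
        percent = 100.0 * ili / total if total else float("nan")
        result[key] = {"ed_ili_volume": ili, "ed_ili_percent": percent}
    return result


# Made-up counts for illustration only (not study data).
example_rows = [
    (date(2009, 10, 5), 120, 800),    # Monday of ISO week 41
    (date(2009, 10, 6), 130, 850),    # same week
    (date(2009, 10, 12), 200, 1000),  # Monday of ISO week 42
]
weekly = weekly_ed_indicators(example_rows)
for week, values in weekly.items():
    print(week, values)
```

Keeping both outputs side by side shows why the two indicators can diverge: the volume rises with overall ED traffic, whereas the percent is normalized by the total number of visits in the same week.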
Google Flu Trends data
GFT data for Manitoba for the duration of the two waves of the H1N1
pandemic in Manitoba, between April 2009 and January 2010, were
downloaded from the GFT website. (8) GFT uses a previously validated
algorithm (9) and Google's aggregated search query data to provide
jurisdiction-specific estimates of influenza disease activity in near
real-time. (8) In Canada, these estimates were calibrated using publicly
available data on number of ILI cases per 100,000 physician visits as
provided by the FluWatch sentinel surveillance system, (10) which uses a
network of primary care practitioners across Canada to monitor physician
visits for ILI. Hence, GFT flu activity estimates are presented
as the number of ILI cases per 100,000 physician visits. (8) A team of
Google researchers recently reported that GFT data predicted peaks in
influenza activity in the United States sooner than traditional flu
surveillance systems. (9)
[FIGURE 1 OMITTED]
Virologic data
Weekly numbers of laboratory-confirmed H1N1 influenza cases
occurring in Manitoba during the period of January 2009 to January 2010
were obtained from the Flu Surveillance Website of Manitoba Health. (11)
In Manitoba, a laboratory-confirmed case of pandemic H1N1 influenza was
defined as an individual who tested positive for H1N1 influenza A virus
by real-time reverse-transcriptase PCR or viral culture. (11)
Statistical analysis
To assess the strength of the correlation between each of the three
indicators (predictor variables) and virologic data (response variable),
we fitted the following linear model separately for each indicator:
$$\gamma_t = \beta_0 + \beta_1 x_{t-\tau} \qquad \text{(Model 1)}$$
where $\gamma_t$ is the number of laboratory-confirmed H1N1
cases occurring during week $t$, and $x_{t-\tau}$ is the weekly value
for the predictor variable (GFT, ED ILI volume or ED ILI percent) in
week $t-\tau$. Correlations with weekly virologic data were calculated
for different lag periods ($\tau$ = 0, 1, 2, 3, 4 weeks, where $\tau$ = 0
indicates no lag). We used Matlab (12) to estimate the linear regression
coefficients corresponding to the least-squares solution of the system
of equations describing the model. The coefficient of determination,
$R^2$ ($0 \le R^2 \le 1$), was used as a measure of the goodness of fit
of our models to the observed data, (13) with a larger value of $R^2$
(closer to 1) reflecting a better linear model fit. Because there is only
a single explanatory variable in Model 1, $R^2$ is equivalent to the square
of the Pearson correlation coefficient measuring the strength of
association between the response and the explanatory variables.
[FIGURE 2 OMITTED]
In anticipation of differences in the patterns of ED visits and
Internet searches for health information between the two pandemic waves,
we fitted Model 1 separately to Wave 1 (April to October, 2009) and Wave
2 data (after October 2009 to January 2010). We also assessed whether
GFT data could predict the volume and proportion of ED ILI visits by
fitting Model 1 to the data with GFT data as the predictor variable and
either of ED ILI volume or ED ILI percent as the response variable.
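The lagged regression described in this section can be sketched as follows. The study itself used Matlab; this Python version is only an illustrative equivalent written for this text, and the function names (`fit_lagged_model`, `pearson_r`) and the synthetic series are assumptions. The sketch fits Model 1 by ordinary least squares for a given lag, reports the coefficient of determination, and checks numerically that it matches the squared Pearson correlation.

```python
from math import sqrt


def fit_lagged_model(x, y, lag):
    """Least-squares fit of y[t] = b0 + b1 * x[t - lag].

    x, y: equal-length lists of weekly values (list index = week).
    Returns (b0, b1, r_squared).
    """
    if lag < 0 or lag >= len(x) - 1:
        raise ValueError("lag must leave at least two paired weeks")
    # Pair the response in week t with the predictor in week t - lag.
    xs = x[:len(x) - lag]
    ys = y[lag:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys))
    return b0, b1, 1.0 - ss_res / syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic series (not study data): the response follows the predictor
# two weeks later, y[t] = 10 + 2 * x[t - 2]; the first two y values are filler.
x = [1, 3, 7, 12, 9, 5, 2, 1, 1, 1]
y = [12, 12, 12, 16, 24, 34, 28, 20, 14, 12]

fits = {lag: fit_lagged_model(x, y, lag) for lag in range(5)}
best_lag = max(fits, key=lambda lag: fits[lag][2])
print("best lag:", best_lag)

# With a single predictor, R^2 equals the squared Pearson correlation.
r = pearson_r(x[:len(x) - 1], y[1:])
print(abs(fits[1][2] - r ** 2) < 1e-9)
```

The wave-specific fits would then amount to calling the same function on the subset of weeks belonging to each wave, and the GFT-versus-ED analysis to passing an ED indicator as `y` instead of the virologic counts.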
RESULTS
Figure 1 shows the time series for the weekly counts of
laboratory-confirmed H1N1 cases (the epidemic curve) in Manitoba plotted
against the three syndromic indicators: GFT data, ED ILI volume, and ED
ILI percent. Like many jurisdictions in the northern hemisphere,
Manitoba experienced two pandemic waves in 2009. The first wave began in
early May 2009, and the second and much larger one in early October
2009. The presence of two waves is evident in the time-series curves for
the GFT and ED indicators (Figure 1). All three indicators peaked
earlier than the epidemic curve of laboratory-confirmed cases,
especially during the second wave where the peak of the epidemic curve
lagged behind the peak of the other curves by about 1-2 weeks.
These observations were confirmed by the results of the linear
regression analysis (Table 1) based on Model 1. For the GFT data (left
panel), $R^2$ (and therefore the correlation coefficient) was
highest when the GFT data were shifted ahead by two weeks
($R^2$ = 0.686), indicating that the best-fitting model is the one
with about a 2-week lag period. Similarly, the best-fitting models for
both ED indicators were observed for a time lag of 1-2 weeks (Table 1),
with the ED ILI volume indicator slightly outperforming the ED ILI
percent indicator.
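The lag selection described here can be reproduced directly from the "Both Waves" rows of Table 1. The short sketch below simply picks, for each indicator, the lag with the largest reported coefficient of determination; the dictionary layout and the helper name `best_lag` are choices made for this illustration.

```python
# Coefficients of determination from the "Both Waves" rows of Table 1,
# keyed by lag in weeks.
r_squared_by_lag = {
    "GFT": {0: 0.419, 1: 0.658, 2: 0.686, 3: 0.358, 4: 0.099},
    "ED ILI volume": {0: 0.311, 1: 0.547, 2: 0.563, 3: 0.411, 4: 0.078},
    "ED ILI percent": {0: 0.338, 1: 0.523, 2: 0.503, 3: 0.357, 4: 0.077},
}


def best_lag(r2_by_lag):
    """Return the lag (in weeks) with the highest coefficient of determination."""
    return max(r2_by_lag, key=r2_by_lag.get)


best = {name: best_lag(values) for name, values in r_squared_by_lag.items()}
print(best)  # {'GFT': 2, 'ED ILI volume': 2, 'ED ILI percent': 1}
```

The differences between the 1-week and 2-week fits are small for both ED indicators, which is why the text reports the best lag as a 1-2 week range rather than a single value.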
Table 1 also shows that all three indicators performed better as
predictors of the virologic time trends during the second wave than
during the first wave, although the strongest correlations were still
present in models with a 1- to 2-week lag. For example, the model based
on the GFT indicator with a 2-week time lag had an $R^2$ of 0.733
for Wave 2 data and an $R^2$ of 0.558 for Wave 1 data. For the model
based on the ED ILI percent data, the best-fitting model was with a
2-week lag in Wave 1 ($R^2$ = 0.469) and with a 1-week lag in Wave 2
($R^2$ = 0.605). The better linear fit of the GFT indicator is shown
in Figure 2.
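To make the fitted model concrete, the sketch below plugs a GFT reading into Model 1 using the coefficients reported in Table 1 for both waves combined at a 2-week lag (intercept -71.60, slope 0.025). The GFT reading of 8,000 is a made-up input chosen for illustration, and the function name `predict_cases` is hypothetical.

```python
def predict_cases(gft_value, beta0=-71.60, beta1=0.025):
    """Model 1 prediction of weekly laboratory-confirmed cases two weeks ahead.

    Defaults are the Table 1 coefficients for GFT, both waves, 2-week lag.
    """
    return beta0 + beta1 * gft_value


# Hypothetical GFT reading (ILI cases per 100,000 physician visits).
hypothetical_gft = 8000
predicted = predict_cases(hypothetical_gft)
print(round(predicted, 1))
```

For this input the arithmetic is -71.60 + 0.025 x 8,000 = 128.4 predicted cases two weeks later. Because the intercept is negative, the fitted line returns negative counts for GFT readings below roughly 2,864, so such a model is only meaningful within the range of values it was fitted on.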
Figure 1 also shows a strong congruence between the time series of
the GFT and both the ED ILI volume and the ED ILI percent indicators.
The results of corresponding linear regression analysis are shown in
Table 2. The best-fitting model was the one for GFT (predictor variable)
and ED ILI volume (dependent variable) with no time lag ($R^2$ = 0.86).
DISCUSSION
We found that syndromic indicators based on GFT and ED data were
strongly correlated with each other and with virologic data during both
waves of the 2009 H1N1 pandemic in Manitoba. The epidemic curve based on
laboratory-confirmed cases generally lagged behind the time series of
these syndromic indicators by 1-2 weeks.
These findings confirm previous reports demonstrating the utility
of ED data in the detection of influenza outbreaks in the general
population. (14-16) Our results are also consistent with the findings of
three recently published studies that evaluated the performance of GFT
data in predicting levels of influenza activity during the 2009
pandemic. (5,6,17) However, in all these studies, syndromic indicators
were validated against national sentinel ILI surveillance data rather
than actual counts of laboratory-confirmed cases.
Our findings are also consistent with studies performed during
pre-pandemic influenza seasons which showed that ILI-related Internet
search queries were strongly correlated with conventional influenza
surveillance indicators. For instance, one study found that Yahoo
ILI-related search queries were strongly correlated with the number of
culture-positive influenza cases and with mortality from pneumonia and
influenza during the 2004-08 flu seasons in the US. (18) Similar results
were reported for analyses based on Google search queries, (9) Google
Trends, (7) Twitter messages (19) and other social media Web sites. (20)
Compared with conventional methods of influenza surveillance, GFT
has several advantages. (21) First, GFT information is free, easily
accessible, and provided through an intuitive, simple-to-use interface.
Second, the information is updated daily, permitting near real-time
monitoring of influenza activity which could facilitate early detection
of community outbreaks. This is a significant advantage over
conventional influenza surveillance systems, where information
dissemination is hampered by unavoidable delays in the reporting and
collation of data. As virologic data tend to correlate with increased
utilization of health care resources (e.g., ED visits,
hospitalizations), GFT information might be a useful tool in predicting
and planning for increased demands for health care. Third, GFT does not
require voluntary reporting by laboratories or health care
professionals. GFT information is likely to remain available even in the
event of a severe pandemic that overwhelms health care resources. Last,
GFT information is currently available for more than 20 countries around
the world, permitting easy tracking and comparison of flu activity
worldwide.
As with other syndromic indicators, concerns have been raised about
the lack of specificity of GFT data; for example, influenza-related news
stories may result in spikes in Internet searches. (21) The resulting
false alarms could be avoided by simultaneously using multiple syndromic
indicators (e.g., ED data, calls to health lines) to assess levels of
influenza activity. On the other hand, GFT data may also be of low
sensitivity, especially in the detection of small localized outbreaks.
In addition, the sensitivity of GFT data may depend on local levels of
Internet utilization. For example, Valdivia et al. found weaker
correlations between GFT data and sentinel physician surveillance data
in countries with lesser reliance on the Internet as a source of health
information. (17)
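One way to picture the suggestion of using multiple indicators simultaneously is a simple agreement rule: raise an alert only when several indicators exceed their own thresholds in the same week. The sketch below is purely illustrative and is not a method evaluated in this study; the function name `combined_alert`, the threshold values, the "health line calls" indicator, and the readings are all hypothetical.

```python
def combined_alert(readings, thresholds, min_agreeing=2):
    """Flag a week only if enough indicators exceed their own thresholds.

    readings, thresholds: dicts keyed by indicator name.
    Returns (alert, names_of_exceeding_indicators).
    """
    exceeding = [
        name for name, value in readings.items() if value > thresholds[name]
    ]
    return len(exceeding) >= min_agreeing, exceeding


# Hypothetical thresholds and readings, for illustration only.
thresholds = {"GFT": 5000, "ED ILI percent": 18.0, "health line calls": 300}

# A news-driven search spike: only GFT is elevated, so no alert is raised.
news_spike = {"GFT": 9000, "ED ILI percent": 14.5, "health line calls": 120}
print(combined_alert(news_spike, thresholds))

# GFT and ED data rise together: the alert is raised.
true_rise = {"GFT": 9000, "ED ILI percent": 21.0, "health line calls": 250}
print(combined_alert(true_rise, thresholds))
```

Requiring agreement trades some sensitivity for specificity, so the choice of `min_agreeing` and of the thresholds would need to be calibrated against reference data in any real application.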
Our study had several limitations. As only Manitoba data were
included, findings may not be applicable to other provinces. The
reference standard (number of laboratory-confirmed cases) likely
underestimated the incidence of influenza in the population, because the
number of detected cases largely reflects the proportion of symptomatic
patients who were tested for the infection, and is influenced by
accessibility of medical care, physicians' practices, and
laboratory testing guidelines. (22) Midway through the second wave,
laboratory testing of mild ILI cases was suspended in Manitoba. A
significant drop in the number of laboratory-confirmed cases during the
latter half of the second wave is obvious in the epidemic curve, and may
have affected the strength of the measured association. The EDIS is not
available for regions outside Winnipeg and for some smaller hospitals in
Winnipeg, and this may have also weakened the strength of association
between EDIS indicators and virologic data.
In conclusion, during an influenza season characterized by high
levels of disease activity, GFT and ED indicators provided a good
indication of weekly counts of laboratory-confirmed influenza cases in
Manitoba 1-2 weeks in advance. Syndromic surveillance using GFT and ED
represents a timely and cost-effective addition to conventional
influenza surveillance, capable of predicting disease incidence and
related increases in health care utilization.
Financial support: This work was partially supported by CIHR
Pandemic Outbreak Team Leader Grant (PTL-97126).
Disclaimer: The interpretation and conclusions contained herein do
not necessarily represent those of the Winnipeg Regional Health
Authority.
Conflict of Interest: None to declare.
Received: November 14, 2010
Accepted: March 17, 2011
REFERENCES
(1.) Elliot A. Syndromic surveillance: The next phase of public
health monitoring during the H1N1 influenza pandemic. Euro Surveill
2009;14:44.
(2.) Centers for Disease Control and Prevention. Syndromic
Surveillance: An Applied Approach to Outbreak Detection. 2008. Available
at: http://www.cdc.gov/ncphi/disss/nndss/syndromic.htm (Accessed October
1, 2010).
(3.) Carneiro HA, Mylonakis E. Google Trends: A web-based
tool for real-time surveillance of disease outbreaks. Clin Infect Dis
2009;49(10):1557-64.
(4.) Seifter A, Schwarzwalder A, Geis K, Aucott J. The utility of
"Google Trends" for epidemiological research: Lyme disease as
an example. Geospatial Health 2010;4(2):135-37.
(5.) Wilson N, Mason K, Tobias M, Peacey M, Huang Q, Baker M.
Interpreting Google Flu Trends data for pandemic H1N1 influenza: The New
Zealand experience. Euro Surveill 2009;14(44).
(6.) Kelly H, Grant K. Interim analysis of pandemic influenza
(H1N1) 2009 in Australia: Surveillance trends, age of infection and
effectiveness of seasonal vaccination. Euro Surveill 2009;14(31):2.
(7.) Pelat C, Turbelin C, Bar-Hen A, Flahault A, Valleron A-J. More
diseases tracked by using Google trends. (Letter to the editor). Emerg
Infect Dis 2009;15(8):1327(2).
(8.) Google Flu Trends-Canada. Available at: www.google.org/
(Accessed May 9, 2010).
(9.) Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS,
Brilliant L. Detecting influenza epidemics using search engine query
data. Nature [10.1038/nature07634]. 2009;457(7232):1012-14.
(10.) Public Health Agency of Canada. Flu Watch for September 6,
2009 to September 12, 2009 (Week 36). 2009. Available at:
http://www.phac-aspc.gc.ca/fluwatch/09-10/w36_09/index-eng.php (Accessed
September 23, 2009).
(11.) Manitoba Health. General Information on Lab-Confirmed Cases
of Pandemic H1N1 Influenza. 2010. Available at:
http://www.gov.mb.ca/health/publichealth/sri/stats1.html (Accessed July
10, 2010).
(12.) MATLAB version 7.10.0 (R2010a). Natick, MA: The MathWorks
Inc., 2010.
(13.) Canavos G. Applied Probability and Statistical Methods. New
York, NY: Little, Brown and Company, 1984.
(14.) Shimoni Z, Niven M, Kama N, Dusseldorp N, Froom P. Increased
complaints of fever in the emergency room can identify influenza
epidemics. Eur J Intern Med 2008;19(7):494-98.
(15.) Irvin CB, Nouhan PP, Rice K. Syndromic analysis of
computerized emergency department patients' chief complaints: An
opportunity for bioterrorism and influenza surveillance. Ann Emerg Med
2003;41(4):447-52.
(16.) Heffernan R, Mostashari F, Das D, Karpati A, Kulldorff M,
Weiss D. Syndromic surveillance in public health practice, New York
City. Emerg Infect Dis 2004;10(5):858-64.
(17.) Valdivia A, Lopez-Alcalde J, Vicente M, Pichiule M, Ruiz M,
Ordobas M, et al. Monitoring influenza activity in Europe with Google
Flu Trends: Comparison with the findings of sentinel physician
networks-results for 2009-10. Euro Surveill 2010;15(29).
(18.) Polgreen PM, Chen Y, Pennock DM, Nelson FD. Using internet
searches for influenza surveillance. Clin Infect Dis
2008;47(11):1443-48.
(19.) Culotta A. Detecting influenza outbreaks by analyzing Twitter
messages. CoRR [serial on the Internet]. 2010. Available at:
http://arxiv.org/abs/1007.4748 (Accessed November 15, 2010).
(20.) Corley CD, Cook DJ, Mikler AR, Singh KP. Text and structural
data mining of influenza mentions in web and social media. Int J Environ
Res Public Health 2010;7(2):596.
(21.) Wilson K, Brownstein JS. Early detection of disease outbreaks
using the Internet. CMAJ 2009;180(8):829-31.
(22.) Mahmud SM, Becker M, Keynan Y, Elliott L, Thompson LH, Fowke
K, et al. Estimated cumulative incidence of pandemic (H1N1) influenza
among pregnant women during the first wave of the 2009 pandemic. CMAJ
2010;182(14):1522-24.
Correspondence: Dr. Salaheddin Mahmud, Department of Community
Health Sciences, University of Manitoba, S111-750 Bannatyne Avenue,
Winnipeg, MB R3E 0W3, E-mail: salah.mahmud@gmail.com
Mohammad Tufail Malik, PhD, (1) Abba Gumel, PhD, (1) Laura H.
Thompson, MSc, (2) Trevor Strome, MSc, (3) Salaheddin M. Mahmud, MD, PhD
(2,3)
Author Affiliations
[1.] Department of Mathematics, University of Manitoba, Winnipeg,
MB
[2.] Department of Community Health Sciences, University of
Manitoba, Winnipeg, MB
[3.] Winnipeg Regional Health Authority, Winnipeg, MB
Table 1. Results From Linear Regression Analysis, Based on Model 1,
for the Weekly Counts of Laboratory-confirmed H1N1 Cases
(Dependent Variable) in Manitoba With Each of Google Flu Trends Data,
ED ILI Volume and ED ILI Percent, by Wave

                GFT                           ED ILI Volume                 ED ILI Percent
Lag ($\tau$)   $\beta_0$ $\beta_1$   $R^2$   $\beta_0$ $\beta_1$   $R^2$   $\beta_0$ $\beta_1$   $R^2$
(weeks)
Both Waves
0                 -40.43     0.019   0.419     -210.99      0.27   0.311     -294.35     19.02   0.338
1                 -68.35     0.024   0.658     -304.62      0.36   0.547     -383.09     23.66   0.523
2                 -71.60     0.025   0.686     -313.98      0.37   0.563     -376.52     23.29   0.503
3                 -32.49     0.018   0.358     -257.40      0.31   0.411     -304.65     19.57   0.357
4                  15.28     0.009   0.099      -72.59      0.14   0.078     -104.50      9.12   0.077
Wave 1
0                 -18.08     0.014   0.212      -52.33     0.093   0.067      -95.33      7.49   0.099
1                 -47.74     0.022   0.522     -147.93     0.195   0.295     -209.75     14.10   0.354
2                 -50.23     0.023   0.558     -188.62     0.239   0.443     -246.84     16.27   0.469
3                 -32.38     0.019   0.358     -191.29     0.244   0.445     -231.77     15.48   0.416
4                 -16.57     0.014   0.204     -168.03     0.220   0.360     -202.47     13.81   0.330
Wave 2
0                 -67.35     0.022   0.394     -282.63     0.343   0.350     -467.77     27.15   0.389
1                -138.79     0.029   0.687     -446.97     0.477   0.627     -656.76     35.49   0.605
2                -155.42     0.031   0.733     -465.90     0.487   0.595     -635.57     34.31   0.521
3                 -45.57     0.019   0.265     -328.12     0.371   0.332     -421.33     24.64   0.267
4                 100.03     0.003   0.007      166.77     -0.03   0.003      203.26     -3.49   0.006
Table 2. Results From Regression Analysis for ED ILI Volume
and ED ILI Percent (Dependent Variables) and
Google Flu Trends Data (Predictor Variable), by
Wave

                ED ILI Volume                 ED ILI Percent
Lag ($\tau$)   $\beta_0$ $\beta_1$   $R^2$   $\beta_0$ $\beta_1$   $R^2$
(weeks)
Both Waves
0                 712.05     0.057    0.86       14.32    0.0008   0.852
1                 753.21     0.049   0.651       14.75    0.0008   0.704
2                 817.71     0.038   0.384       15.49    0.0006   0.479
3                 894.87     0.024   0.159        16.5    0.0005   0.249
Wave 1
0                 649.32     0.077   0.832        12.9    0.0012   0.872
1                 709.36     0.063   0.540       13.68     0.001   0.615
2                 790.64     0.041   0.233       14.81    0.0007   0.305
Wave 2
0                  693.5     0.055   0.876       15.45    0.0007   0.825
1                 761.33     0.047   0.609       16.04    0.0006   0.634
2                 897.82     0.032   0.268       17.48    0.0005   0.341