Background: Google Flu Trends was developed to estimate US influenza-like illness (ILI) rates from internet searches; however, ILI does not necessarily correlate with actual influenza virus infections.
Methods and findings: Influenza activity data from 2003-04 through 2007-08 were obtained from three US surveillance systems: Google Flu Trends, CDC Outpatient ILI Surveillance Network (CDC ILI Surveillance), and US Influenza Virologic Surveillance System (CDC Virus Surveillance). Pearson's correlation coefficients with 95% confidence intervals (95% CI) were calculated to compare surveillance data. An analysis was performed to investigate outlier observations and determine the extent to which they affected the correlations between surveillance data. Pearson's correlation coefficient describing Google Flu Trends and CDC Virus Surveillance over the study period was 0.72 (95% CI: 0.64, 0.79). The correlation between CDC ILI Surveillance and CDC Virus Surveillance over the same period was 0.85 (95% CI: 0.81, 0.89). Most of the outlier observations in both comparisons were from the 2003-04 influenza season. Exclusion of the outlier observations did not substantially improve the correlation between Google Flu Trends and CDC Virus Surveillance (0.82; 95% CI: 0.76, 0.87) or between CDC ILI Surveillance and CDC Virus Surveillance (0.86; 95% CI: 0.82, 0.90).
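As an illustration of the comparison described above, the following minimal sketch computes Pearson's correlation coefficient between two weekly series together with a 95% confidence interval. It assumes the interval is obtained with the Fisher z-transformation, a common choice that the abstract does not confirm; the function name `pearson_with_ci` and the sample values are hypothetical and are not the study data.

```python
import math
from statistics import NormalDist


def pearson_with_ci(x, y, confidence=0.95):
    """Return (r, lower, upper) for two equal-length numeric series.

    Assumption: the confidence interval uses the Fisher z-transformation,
    which may differ from the method used in the original analysis.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("series must have the same length")
    if n < 4:
        raise ValueError("at least 4 paired observations are required")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")

    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        # Perfect correlation: the Fisher transform is unbounded here.
        r = math.copysign(1.0, r)
        return r, r, r

    z = math.atanh(r)                      # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = math.tanh(z - z_crit * se)     # back-transform to the r scale
    upper = math.tanh(z + z_crit * se)
    return r, lower, upper


# Hypothetical weekly values, for illustration only (not study data).
search_based_estimate = [1.2, 1.5, 2.1, 3.4, 4.0, 3.1, 2.2, 1.6]
percent_positive_tests = [2.0, 3.5, 8.0, 15.0, 22.0, 18.0, 9.0, 4.0]

r, lower, upper = pearson_with_ci(search_based_estimate, percent_positive_tests)
print(f"r = {r:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

Under the same assumption, the outlier analysis could be approximated by calling the function a second time on the series with the flagged weeks removed and comparing the two coefficients and their intervals.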
Conclusions: This analysis demonstrates that while Google Flu Trends is highly correlated with rates of ILI, it has a lower correlation with surveillance for laboratory-confirmed influenza. Most of the outlier observations occurred during the 2003-04 influenza season, which was characterized by early and intense influenza activity that potentially altered health care seeking behavior, physician testing practices, and internet search behavior.