100 years of influenza
16 Jan 2018
Nina Cromeyer Dieke

In 1918, as the world sighed in relief from war, a deadly virus infected around 40% of the planet’s population. Spanish flu mortality estimates range from 20 to 50 million, more than the death toll of WW1 itself (17 million). The flu didn’t originate in Spain or have anything particularly Spanish about it. Wartime censorship prevented the press in warring countries from printing anything that undermined the war effort. However, in neutral Spain, the papers reported on the flu in May 1918, and coverage ramped up after Spanish King Alfonso XIII himself contracted the virus.

In the United States, the first reported case of Spanish flu was in Kansas in March 1918. To this day, the source of this deadly influenza pandemic, which swept the world as far as the Arctic and remote Pacific islands, is still unknown.

Fast forward 100 years, and another severe strain of flu is doing the rounds. It had killed 27 people in California alone as of 9th January, compared to an average of three or four at this time of year, and the national hospitalisation rate in the US has doubled in one week. Likewise, in the UK, GPs are reporting a 78% increase in cases in just two weeks, and Australia experienced its worst flu season in over a decade during its winter last year.

Influenza is a contagious respiratory illness caused by a few different flu viruses. The Spanish flu was an H1N1 strain. The strain currently doing the rounds is the deadlier H3N2.

Considering the impact and frequency of the flu, various parties have attempted to visualise it using big data, citizen science and digital technology. In this blog we present three of those attempts, outline their pros and cons, and show why even Google can get it terribly wrong sometimes.

CDC FluVaxView

The Centers for Disease Control (CDC) in the US produces an annual interactive visualiser of flu immunisations at national, regional and state levels. You can see data for the general population, healthcare workers, people in nursing homes and pregnant women. The data comes from six surveys, four of them run by the CDC. The resulting visualisation clearly shows what percentage of the people in a state had the flu jab each year.

Pros: national, state and regional levels; very easy to understand; historical; very reliable sources

Cons: USA only; not real-time data


Flusurvey

A partnership between Public Health England and LSHTM, Flusurvey is an ongoing citizen science project to help researchers monitor flu trends in the UK. It launched in 2009 during the swine flu pandemic, and over 7,500 people have taken part. The map is updated every three minutes, and sure enough, it’s currently very red.

Flusurvey for 12th January 2018

Flusurvey tracks influenza-like illness (ILI) as defined by the CDC. Only UK residents can participate, but there are partner projects in Belgium, Denmark, France, Ireland, Italy, the Netherlands, Portugal, Spain and Sweden.

Pros: real-time data; very easy to interpret

Cons: UK only; unreliable data (based on symptoms, not diagnosis)

Google Flu Tracker

Google launched its flu tracker, Google Flu Trends (GFT), in 2009, to much fanfare, as a tool for internet-based biosurveillance, or digital epidemiology. It worked by interpreting people’s flu-related Google searches as predictors of flu incidence.
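To make the idea of searches as predictors concrete, here is a minimal sketch in Python. It assumes a simple straight-line fit on the log-odds scale between the share of searches that are flu-related and the share of doctor visits for influenza-like illness. The weekly figures and the names fit_line and nowcast are invented for illustration; this is not Google's code or real surveillance data.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical weekly history (made-up numbers, for illustration only):
# share of all searches that were flu-related, and the share of doctor
# visits that were for influenza-like illness in the same weeks.
search_share = [0.002, 0.003, 0.005, 0.008, 0.012, 0.009]
ili_rate = [0.010, 0.014, 0.022, 0.033, 0.047, 0.037]

# Fit the relationship on past weeks where both numbers are known.
a, b = fit_line([logit(q) for q in search_share],
                [logit(p) for p in ili_rate])

def nowcast(current_search_share):
    """Estimate this week's ILI rate from this week's search share alone."""
    return inv_logit(a + b * logit(current_search_share))

estimate = nowcast(0.010)
print(f"Estimated ILI rate this week: {estimate:.3f}")
```

The appeal is that search data arrives immediately, while official figures lag. The weakness is that the fitted relationship only holds as long as people keep searching the way they did in the history used for the fit.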

During a talk at ASTMH 2017, Dr Alessandro Vespignani explained the digital data streams now available. Some come from active data collection tools, such as smartphone apps like Flu Near You, which crowdsource surveillance data. Others come from passive sources, our digital traces, including Wikipedia searches and GFT.

According to Dr Simon Pollett, also speaking at ASTMH, internet-based surveillance provides real-time data (“nowcasting”), is often free and geographically inclusive, and may capture illnesses that never receive medical attention (traditional surveillance data comes from healthcare points, after people have presented).

But for Google, “reality kicked in” and its predictions ultimately failed dramatically. It missed the peak of the 2013 flu season by 140%. According to a Wired study of the failure, Google’s model was vulnerable to including unrelated searches, and it also didn’t account for changes in search behaviour over time. The study found GFT performed well for a couple of years, but then required substantial revision.

Google’s archived tracker page just says: “It is still early days for nowcasting and similar tools for understanding the spread of diseases like flu and dengue – we're excited to see what comes next.”

“Big data cannot solve bias,” said Dr Pollett during his ASTMH talk. There will always be differences between populations who have internet access and those who don’t, as well as populations who create internet data about diseases and those who don’t. These biases must be considered when interpreting internet-based surveillance data.

Pros: international data at country level; real-time data

Cons: unreliable data (unrelated terms used); doesn’t exist anymore

With flu striking every year with varying degrees of severity, it’s clear data alone can’t change behaviour, let alone prevent the spread of infection, no matter how well it is presented. However, it can influence vaccine development, help states prepare for future seasons and strains, and help alert people to be extra cautious. (See: How to wash your hands)