Twitter hosts a seemingly endless stream of chit-chat, gossip, and poorly written sentences. While it can be hard to grasp the overall picture of what all the talk is about, access to all those tweets can be surprisingly handy if you have the computers and software to parse, process, and analyze the information.
A case in point is a project at Johns Hopkins University that tracks the spread of influenza through tweets. The team had previously tracked how influenza progressed at the national level, but there are large regional and local differences in how the disease spreads. This led them to focus on a single city, New York, to see whether local variations could also be detected. By querying billions of tweets and pulling out only those that originated in New York City and mentioned the writer’s own case of the flu, they showed that their estimates tracked closely with traditionally gathered data on the 2012-2013 flu season.
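To give a feel for the filtering step, here is a toy sketch in Python. It is not the team's method: a crude keyword rule stands in for whatever classifier they actually used, and the function name, the patterns, and the assumption that each tweet already carries a resolved city are all invented for illustration.

```python
import re

# Hypothetical first-person phrases suggesting the writer has the flu.
# A real system would need something far more robust than this.
INFECTION_PATTERN = re.compile(
    r"\b(i have|i've got|i got|i caught|home sick with)\s+(the\s+)?flu\b",
    re.IGNORECASE,
)

def is_local_infection_tweet(text, city, target_city="New York"):
    """Return True if the tweet is from target_city and reads like a
    first-person report of having the flu (by the toy rule above)."""
    return city == target_city and INFECTION_PATTERN.search(text) is not None

# Made-up examples: only the first is both local and a self-report.
samples = [
    ("Ugh, I have the flu and can't get out of bed", "New York"),
    ("Get your flu shot, people!", "New York"),
    ("I caught the flu last week", "Boston"),
]
flags = [is_local_infection_tweet(text, city) for text, city in samples]
```

The second sample shows why the distinction matters: it mentions the flu but says nothing about the writer being sick, so counting it would inflate the estimate.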
Basic results from the study abstract in PLoS ONE:
Our system’s influenza prevalence estimates were strongly correlated with surveillance data from the Centers for Disease Control and Prevention for the United States (r = 0.93, p < 0.001) as well as surveillance data from the Department of Health and Mental Hygiene of New York City (r = 0.88, p < 0.001). Our system detected the weekly change in direction (increasing or decreasing) of influenza prevalence with 85% accuracy, a nearly twofold increase over a simpler model, demonstrating the utility of explicitly distinguishing infection tweets from other chatter.
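For readers curious what those two measures mean in practice, the sketch below computes a Pearson correlation and a weekly direction-of-change accuracy in plain Python. The weekly numbers are invented for illustration and are not data from the study, and the function names are my own. The paper's exact evaluation procedure may differ.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def direction_accuracy(estimates, reference):
    """Fraction of week-to-week changes where both series move the same way.

    A week counts as 'increasing' only if the value strictly rises;
    flat weeks are lumped in with decreasing ones in this sketch.
    """
    def directions(series):
        return [later > earlier for earlier, later in zip(series, series[1:])]

    est_dirs = directions(estimates)
    ref_dirs = directions(reference)
    hits = sum(e == r for e, r in zip(est_dirs, ref_dirs))
    return hits / len(ref_dirs)

# Invented weekly prevalence figures, purely for illustration.
twitter_estimate = [1.0, 2.0, 4.0, 3.0, 2.0]
official_rate = [1.5, 2.5, 3.5, 4.0, 2.5]

r = pearson_r(twitter_estimate, official_rate)
accuracy = direction_accuracy(twitter_estimate, official_rate)
```

With these made-up series, the two agree on three of the four week-to-week changes, so `accuracy` comes out to 0.75. A high correlation and a high direction accuracy measure different things: the first says the overall curves have a similar shape, the second says the estimate gets each week's up-or-down call right.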