Friday, May 01, 2020

Predicting the Virus

As a complexity theorist I often find myself far more intrigued by what we cannot compute than by what we can.

In 2009 I posted about some predictions of the spread of the H1N1 virus that turned out to be off by two orders of magnitude. I wrote "I always worry that bad predictions from scientists make it harder to have the public trust us when we really need them to." Now we need them to.

We find ourselves bombarded with predictions from a variety of experts and an even larger variety of mathematicians, computer scientists, physicists, engineers, economists and others who try to make their own predictions with no prior experience in epidemiology. Many of these models give different predictions and even the best have proven significantly different from reality. We keep coming back to the George Box quote "All models are wrong, but some are useful."

So why do these models have so much trouble? The standard complaint of inaccurate and inconsistently collected data certainly holds. And if a prediction changes our behavior, we cannot fault the predictor for not continuing to be accurate.

There's another issue. You often hear of a single event having a dramatic effect in a region--a soccer game in Italy, a funeral in Georgia, a Bar Mitzvah in New York. These events ricocheted: people infected there attended other events that infected others. This becomes a complex process that simple network models can never get right. Plenty of soccer games, funerals and Bar Mitzvahs didn't spread the virus. If a region hasn't had a large number of cases and deaths, is it because they did the right thing or because they just got lucky? Probably something in between, but that makes it hard to generalize and learn from experience. We do know that fewer events mean less infection, but beyond that things are less clear.

As countries and states decide how to open up and universities decide how to handle the fall semester, we need to rely on some sort of predictive models and the public's trust in them to move forward. We can't count on the accuracy of any model but which models are useful? We don't have much time to figure it out.

1 comment:

  1. If you follow the different predictions, the most useful so far have been the original IHME graph-matching model, which actually got the White House to change their tune, and now Youyang Gu's model. It made huge sense when you couldn't trust anything about testing rates. (IHME have been working for weeks on trying to combine an epidemiological model with their graph matching, since the graph matching model cannot handle lifting of restrictions.)

    There is an interesting interview of Chris Murray of IHME on 538 about their model: https://fivethirtyeight.com/features/politics-podcast-how-one-modeler-is-trying-to-forecast-the-toll-of-covid-19/

    A key point that clearly influenced Murray was the original SARS outbreak, where every epidemiological model was showing a pandemic that never happened.

    The SEIRS models (linear transition systems with delays from Susceptible to Exposed to Infected to Recovered and back to Susceptible) were similarly way off early in the Covid-19 outbreak since the model parameters were largely guesses. Youyang Gu's model does incorporate SEIRS models now, which currently seems better than IHME's - though IHME are going to be changing theirs soon.
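
    To make the pipeline concrete, here is a minimal sketch of a compartmental SEIRS model as described above, integrated with simple Euler steps. All parameter values (contact rate, incubation and infectious periods, duration of immunity) are illustrative assumptions for the sketch, not fitted Covid-19 estimates.

```python
# Minimal SEIRS compartmental model, integrated with simple Euler steps.
# Every parameter value below is an illustrative assumption, not a fit.

def seirs_step(s, e, i, r, beta, sigma, gamma, omega, dt):
    """Advance the S, E, I, R population fractions by one Euler step."""
    new_exposed = beta * s * i      # S -> E: infection by contact
    new_infectious = sigma * e      # E -> I: end of incubation
    new_recovered = gamma * i       # I -> R: recovery
    waned = omega * r               # R -> S: waning immunity
    s += dt * (waned - new_exposed)
    e += dt * (new_exposed - new_infectious)
    i += dt * (new_infectious - new_recovered)
    r += dt * (new_recovered - waned)
    return s, e, i, r

def simulate(days=200, dt=0.1):
    # Assumed rates: beta = contact rate, 1/sigma = incubation period,
    # 1/gamma = infectious period, 1/omega = duration of immunity (days).
    beta, sigma, gamma, omega = 0.5, 1 / 5, 1 / 10, 1 / 180
    s, e, i, r = 0.999, 0.0, 0.001, 0.0   # start with 0.1% infectious
    peak = 0.0
    for _ in range(int(days / dt)):
        s, e, i, r = seirs_step(s, e, i, r, beta, sigma, gamma, omega, dt)
        peak = max(peak, i)
    return peak

print(f"peak infectious fraction: {simulate():.3f}")
```

    The whole point of the comment is that the output is only as good as those four guessed parameters.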

    To me, the big issue with the simple version of these SEIRS models is something that we as a field know well. They use single parameters for the rates and delays between stages of the SEIRS pipeline and hence reduce to simple branching processes. These expand quickly for the same reason that random graphs are expanders. Real graphs of interactions on a large scale are limited by geography and may be closer to the "small worlds" graphs that Jon Kleinberg described, which have polynomial rather than exponential growth. That feels much closer to the kind of seed events you mention. On each local scale, the SEIRS models seem to work much better, but the parameters vary by population density and mobility - each county (or even smaller unit) really deserves its own SEIRS parameters. The death models also depend a lot on how careful one can be among those who are particularly vulnerable - if you let it get into nursing homes or hospitals, you are going to get big skew.
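
    The growth-rate gap above can be sketched in a few lines: a branching process where each case seeds b new cases per generation reaches exponentially many people, while spread constrained to geographic neighbors (idealized here as the 2D lattice) reaches only polynomially many. The branching factor and the lattice neighborhood are illustrative assumptions, not epidemiological claims.

```python
# Exponential vs polynomial growth of the reachable population.
# Branching factor b and the 2D-lattice geometry are toy assumptions.

def branching_reached(b, generations):
    """Total cases after `generations` of a branching process with
    branching factor b: 1 + b + b^2 + ... + b^generations."""
    return sum(b ** g for g in range(generations + 1))

def grid_reached(generations):
    """Cells within graph distance `generations` of the origin on the
    infinite 2D lattice (4-neighbor adjacency): 2d^2 + 2d + 1."""
    d = generations
    return 2 * d * d + 2 * d + 1

for g in (5, 10, 20):
    print(g, branching_reached(2, g), grid_reached(g))
```

    With b = 2, twenty generations reach over two million people in the branching process but only 841 cells on the lattice - the gap the commenter is pointing at.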

    Right now, maybe Youyang Gu's model is the most useful. We are at a particularly tricky time because of the widely varying responses, both official and societal.
