Projection of Covid-19 Cases: Which Model to Use?

Researchers around the world have been flabbergasted by the extent and pace of the spread of Covid-19. Even the strong healthcare systems of the western world have proven not 'strong enough' to fight it, let alone the healthcare systems of developing countries like Bangladesh, whose system has been in complete disarray from the beginning. Scientists from all fields are trying to learn and to contribute new insights within their capacity, on a scale probably unseen in the recent past. While this combined effort is noteworthy, we can and need to do more.

Projection modelers are playing a pivotal role in helping policymakers by providing information on what to expect, so that the health system can be better prepared. A plethora of projection models is available, and they give widely varying projected numbers of cases and deaths. This variation is a core feature of modeling, and so it is not necessarily bad.

While I saw projections being made and used for the western world, unfortunately no such model was seen for Bangladesh until the end of April (though I have been learning and doing projections since the first week of April; granted, that went unnoticed). I thought it would be a good idea to use my existing econometrics knowledge to do some projection. But on jumping into this endeavor, I realized that I needed to learn more, so I delved in. After reading a number of articles, blogs, newspaper reports, and preprints, and watching videos, I figured out that projecting Covid-19 cases is not straightforward, and that it is not desirable to rely on one specific model to explain all countries. Every model has its strengths and weaknesses, so what is a good fit for one country can be of no use for another.

Even a ‘gold standard’ model developed by the University of Washington has flaws. For instance, it assumes a Gaussian distribution for daily cases and deaths, which means the epidemic curve has symmetric tails. This does not match the data for many countries: most of the countries that were successful (apparently, and at least as of today) had a long tail on the right-hand side of the distribution. That does not necessarily mean other models would do better, as a lot is yet to be known. Since uncertainties and puzzles are teaming up with Covid-19, it is important to bring all dimensions to the table so that policymakers can see the possibilities under various scenarios.

All the projection models I have gone through can be classified broadly into two types:

Theory-Based or Simulation-Based Model

This approach is incredibly useful when we don't have enough data. The most popular one is the classical epidemiological model: Susceptible, Infected, and Removed (SIR). More sophisticated versions are also available, such as SEIR (which adds an exposed compartment) and SEIQR (which adds a quarantined compartment), and so on. While these models are enormously useful, they tend to overestimate the actual cases. That is why some researchers at Imperial College faced a huge backlash when they did this type of modeling. We have to keep in mind that they had to project without data, and that their projection can be treated as the worst case that could have happened without any government response, so it is definitely useful. Of late, other researchers have used the same model with parameters estimated from the actual data. Therefore, we can still use this model as a benchmark. In addition, simulation can show what would happen if a lockdown is relaxed or lifted.

Matlab code and packages are available that can easily be adjusted to fit a country (or city). I know some R packages can also do this type of analysis easily. However, anyone who is good at programming and at writing their own code should be able to implement it in any software they like. I have also seen some Excel macros that implement the SIR model. Please make sure that you know what you are doing, since otherwise you might be running code while keeping yourself completely inside a black box.
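
To make the SIR idea concrete, here is a minimal Python sketch that uses only the standard library. It is an illustration under stated assumptions, not a calibrated model: the function name simulate_sir, the fixed-step Euler integration, and all parameter values (population, beta, gamma) are hypothetical choices made for the example and are not estimates for Bangladesh or any other country. A lower contact rate beta stands in for a stricter lockdown.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Integrate the classical SIR equations with a fixed-step Euler scheme.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns a list of (day, S, I, R) tuples, one entry per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((day, s, i, r))
    return history


# Illustrative (assumed) parameter values only.
baseline = simulate_sir(population=1_000_000, initial_infected=10,
                        beta=0.30, gamma=0.10, days=300)
distancing = simulate_sir(population=1_000_000, initial_infected=10,
                          beta=0.15, gamma=0.10, days=300)

# Peak number of simultaneously infected people under each scenario.
peak_baseline = max(row[2] for row in baseline)
peak_distancing = max(row[2] for row in distancing)
print(f"peak infected, baseline:   {peak_baseline:,.0f}")
print(f"peak infected, distancing: {peak_distancing:,.0f}")
```

Changing beta partway through the loop would be one simple way to mimic relaxing or lifting a lockdown. In real use, beta and gamma would have to be estimated from data rather than assumed.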

Data-Based Model

a) Time series model: Economists have been forecasting various economic phenomena with time series models for decades. Similar tools can also be very useful for projecting Covid-19 cases. Interestingly, the projection I made using time series models (on April 20) has turned out to be the most accurate for Bangladesh so far, so these models should not be underestimated. Several modeling approaches can be implemented: growth curves with an exponential, logistic, or Gompertz functional form, fitted by non-linear least squares or other curve-fitting tools; ARIMA; exponential smoothing; and, of course, standard regression (say quadratic or cubic) with trends. So far, the non-linear least-squares models, specifically the model with the Gompertz form, are performing excellently in forecasting cases for Bangladesh, whereas the ARIMA-type models and exponential smoothing are giving dismal projections (so I scrapped them, but give them a try if you are working on other countries). Most of these modeling frameworks can easily be implemented in Stata; a more powerful package or language, say Matlab, R, or Python, can certainly be useful too.
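
As a rough illustration of fitting a Gompertz curve to cumulative cases, the Python sketch below uses only the standard library. It is not the procedure behind the projections described above. Instead of a full non-linear least-squares routine, it uses a simplification: for each candidate ceiling K on a grid, the identity ln(ln(K / y)) = ln(b) - c*t turns the problem into a straight-line fit, and the ceiling with the smallest squared error on the original scale is kept. The function names, the grid, and the data (generated noise-free from a known curve) are all assumptions made for the example.

```python
import math


def gompertz(t, ceiling, b, c):
    """Cumulative cases at time t: ceiling * exp(-b * exp(-c * t))."""
    return ceiling * math.exp(-b * math.exp(-c * t))


def fit_gompertz(days, cumulative, ceiling_grid):
    """Grid search over the ceiling, with a linearized OLS fit for b and c."""
    n = len(days)
    mean_t = sum(days) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    best = None
    for ceiling in ceiling_grid:
        if ceiling <= max(cumulative):
            continue  # the transform needs the ceiling above every observation
        # Linearize: z = ln(ln(K / y)) = ln(b) - c * t
        z = [math.log(math.log(ceiling / y)) for y in cumulative]
        mean_z = sum(z) / n
        sxz = sum((t - mean_t) * (zi - mean_z) for t, zi in zip(days, z))
        slope = sxz / sxx
        intercept = mean_z - slope * mean_t
        b, c = math.exp(intercept), -slope
        # Judge each candidate by squared error on the original scale.
        sse = sum((gompertz(t, ceiling, b, c) - y) ** 2
                  for t, y in zip(days, cumulative))
        if best is None or sse < best[0]:
            best = (sse, ceiling, b, c)
    if best is None:
        raise ValueError("every candidate ceiling is below the observed maximum")
    return best[1], best[2], best[3]


# Synthetic, noise-free data from a known curve (not real case counts).
days = list(range(60))
observed = [gompertz(t, 50_000, 8.0, 0.05) for t in days]

ceiling_hat, b_hat, c_hat = fit_gompertz(days, observed, range(20_000, 100_001, 5_000))
forecast_day_90 = gompertz(90, ceiling_hat, b_hat, c_hat)
print(ceiling_hat, round(b_hat, 4), round(c_hat, 4), round(forecast_day_90))
```

With real, noisy data the ceiling is usually the hardest parameter to pin down early in an epidemic, so it is worth checking how much the forecast moves across neighboring grid values before trusting it.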


b) Machine learning/Predictive analytics tools: Facebook's Core Data Science team has developed a forecasting package (fbprophet) with which projections can be made easily. Both R and Python packages are available. As this package was written by a team at Facebook, it is expected to be pretty robust. However, so far it is slightly overestimating the actual cases, especially for Bangladesh, although with a relaxed lockdown this projection could turn out to be right. I see some ML practitioners are also writing code with various algorithms. The Particle Swarm Optimization (PSO) algorithm seems to be very popular in forecasting; a paper published in the Bulletin of the WHO used this algorithm. In addition, a function that can capture logistic-type patterns (say a Hill function) also turns out to be useful. To implement a predictive analytics tool, one needs to be comfortable with a high-level language (say R, Python, or Matlab); Stata, SPSS, EViews, or similar packages are not well suited to this kind of activity. Certainly, an expert user of any programming language can easily write code and run whichever model they like.
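
For readers unfamiliar with the Hill function mentioned above, here is a small Python sketch of one common form, C(t) = K * t^n / (a^n + t^n), where K is the ceiling, a is the time at which half of the ceiling is reached, and n controls the steepness. The parameter names and values are hypothetical, chosen only to show the shape of the curve; nothing here is fitted to data.

```python
def hill_curve(t, ceiling, half_time, steepness):
    """Cumulative cases at time t under a Hill-type curve (0 for t <= 0)."""
    if t <= 0:
        return 0.0
    return ceiling * t ** steepness / (half_time ** steepness + t ** steepness)


# Illustrative (assumed) parameter values.
ceiling, half_time, steepness = 50_000.0, 60.0, 4.0

cumulative = [hill_curve(day, ceiling, half_time, steepness) for day in range(181)]

# Daily new cases are the day-to-day differences of the cumulative curve.
daily_new = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
peak_day = daily_new.index(max(daily_new)) + 1

print(f"cumulative cases at half_time: {cumulative[int(half_time)]:,.0f}")
print(f"day with the most new cases:   {peak_day}")
```

With these values the daily peak falls before the day on which half of the ceiling is reached, so the decline after the peak is slower than the rise before it. That asymmetry is one reason such curves can track data with a long right-hand tail better than a symmetric curve.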


Due to the nature of Covid-19, finding a model for projecting cases and deaths is a challenge. Since scientists are still learning, it is a good idea to use several projection models and to offer perspectives from different modeling frameworks. We need a model that can capture as many dimensions as possible, ideally one that capitalizes on the strengths of both the theoretical and the empirical approaches. Clearly, finding such a model is difficult, if not impossible, and this is particularly true for Covid-19. Therefore, a combined effort that draws on the expertise of epidemiologists, economists, machine learning experts, mathematicians, engineers, GIS experts, and other relevant specialists can enrich the projections of Covid-19 cases and deaths, and thus help policymakers adopt appropriate policies.
All models are wrong; some are useful
(This is a condensed draft version. A detailed, final version is forthcoming, in which I will include code for various packages that can be applied directly in various contexts.)

Visit my website to see the projections for Covid-19 in Bangladesh made with some of the models I just mentioned:

https://sites.google.com/site/shafiunihe/recent-work-on-covid-19
