AI-Enhanced Predictive Models to Combat the Next COVID Wave
Identifying potential hot spots, reporting infection and death rates, and gauging the impact on demand for health-care services all provide critical information that the U.S. and other countries need to decide how to restart their economies.
Introduction
Predictions of the novel coronavirus's mode and ease of transmission, the extent of its pathogenesis, its impact on population health economics, its potential for mutation, likely hot spots, infection and death rates, and the resulting demand for health-care services are critical information that the U.S. and other countries need to decide how to restart their economies.
Predictive data science relies heavily on evidence-based facts, which for the current COVID-19 pandemic are constantly evolving and sometimes unreliable [1]. Most early statistical models were developed in a rush [2] and were flawed in some respects, primarily because of a shortage of observational data, which in turn stemmed from limited testing capacity and was compounded by tests that produce too many false readings [3][4].
This article analyzes the recent COVID-19 prediction issues and proposes an alternative approach to predict the COVID-19 disease dynamics more effectively. This will enable public-sector leaders to formulate better epidemic control procedures and an exit strategy for the current wave and beyond.
COVID-19 prediction challenges
An effective data model to track and respond to an epidemic requires accurate and statistically significant data on the virus's reproduction rate, transmissibility, fatality rates, herd immunity levels, re-infection rates, mutability, etc. Information across all these parameters has been impossible to get for COVID-19. Forecasting precision largely depends on the availability of reliable data, which is rare at the beginning of an outbreak caused by a new virus, making predictions uncertain [5].
Another challenge is the lack of good observational data, primarily because the U.S. and other countries have been unable to conduct mass testing. In addition, variation in immunodiagnostic test types and specimen collection techniques leads to differences in sensitivity and specificity, which makes it difficult to determine how many people are or were actually infected. Antibodies can also develop in response to other coronaviruses, such as those that cause the common cold, which further complicates the interpretation of antibody tests. Given these uncertainties, it is hard to determine who is immune and who is infected but asymptomatic or only mildly symptomatic. Others are symptomatic but have never been tested and are still potential carriers.
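The effect of imperfect sensitivity and specificity on infection counts can be made concrete with the Rogan-Gladen estimator, a standard correction that is not part of the original analysis. The Python sketch below is a minimal illustration only; the function name and all numbers are hypothetical.

```python
def adjusted_prevalence(apparent_prevalence, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from the observed positive rate."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("test must beat chance: sensitivity + specificity > 1")
    estimate = (apparent_prevalence + specificity - 1.0) / denominator
    # Sampling noise can push the raw estimate outside [0, 1], so clamp it.
    return min(1.0, max(0.0, estimate))


# Hypothetical scenario: 5% of samples test positive, under two assumed test profiles.
observed_positive_rate = 0.05
for sens, spec in [(0.90, 0.99), (0.80, 0.95)]:
    corrected = adjusted_prevalence(observed_positive_rate, sens, spec)
    print(f"sensitivity={sens}, specificity={spec}: corrected prevalence={corrected:.4f}")
```

With the second profile, false positives alone can account for the whole observed rate, which is why the same raw count can imply very different levels of actual infection.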
There is still no consensus on the defining characteristics of the highly susceptible group. As new data becomes available, conditions such as obesity, diabetes, cardiovascular disease and vitamin D deficiency, in addition to age, appear to increase the risk of serious complications and mortality from COVID-19 infection. Scientists don’t yet know whether the virus transmission chain has any strong seasonal influence or weather dependency. Any predictive model that works with so many unknown variables will be problematic in some respects.
The Institute for Health Metrics and Evaluation (IHME) model, which the White House relied on, didn’t follow the SEIR (Susceptible, Exposed, Infectious, Recovered) or agent-based approach. Instead, it used a computational method that assumed a disease progression rate based on data from other countries such as China, Spain and Italy to extrapolate COVID-19 mortality and hospitalization rates in the U.S. What this approach missed were differences in key regional parameters such as population characteristics, the availability of and variance in COVID-19 testing, access to critical care facilities, levels of quarantine measures and the date on which social distancing was implemented. All these differed significantly between the U.S. and countries like Italy and China. This is an example of a non-mechanistic model, and its projections suffered from the fallacy of Farr's Law [6], which holds that all epidemics tend to follow a roughly symmetrical pattern that can be shifted and scaled to fit any epidemic data. As a result, IHME’s projections were confusing for policymakers and the public seeking the right guidance [7].
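The symmetry assumption behind Farr's Law can be illustrated with a bell-shaped curve. The Python sketch below is not IHME's implementation; it assumes a simple Gaussian curve with hypothetical parameters, purely to show that such a curve forces the projected decline to mirror the rise.

```python
import math


def bell_curve(day, peak_day, peak_value, width):
    """Gaussian-shaped epidemic curve: projected daily deaths on a given day."""
    return peak_value * math.exp(-((day - peak_day) ** 2) / (2.0 * width ** 2))


# Hypothetical parameters, chosen only for illustration.
peak_day, peak_value, width = 30, 2000.0, 8.0
curve = [bell_curve(d, peak_day, peak_value, width) for d in range(61)]

# The curve is symmetric about its peak: the projected fall is a mirror image
# of the rise, no matter how behavior or policy changes after the peak.
for offset in (5, 10, 20):
    rise = curve[peak_day - offset]
    fall = curve[peak_day + offset]
    print(f"{offset} days before peak: {rise:.1f}; {offset} days after peak: {fall:.1f}")
```

A mechanistic model has no such built-in symmetry, so it can represent a slow decline after restrictions are relaxed.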
Some universities developed models based on traditional epidemiological theory. These early models incorporated estimates of virus contagiousness, the transmission process, the reproduction rate, co-morbidity factors that increase the risk of serious illness or death, and the timeframe from infection to clinical recovery. SEIR-based models like these, though more precise than IHME-like empirical models, were still challenged by evolving knowledge of COVID-19 virology, an insufficient understanding of population risk, and false negative test results that led to underestimates of disease spread. In addition, these models generally did not account for the influence of population behavior, local environmental factors and the impact of political decisions on epidemic control. Moreover, many of these models do not clearly state their key assumptions or how sensitive their results are to errors in those assumptions [8]. All of this has affected their predictions and projections.
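For readers unfamiliar with the compartmental approach, a basic SEIR model can be sketched in a few lines of Python. This is a minimal textbook version, not any university's actual model; the parameter values (transmission rate, incubation period, infectious period) are hypothetical and would have to be estimated from data.

```python
def simulate_seir(population, beta, sigma, gamma, initial_infected, days, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period. Returns one (S, E, I, R) tuple per day.
    """
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    steps_per_day = int(round(1.0 / dt))
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Hypothetical values: 5-day incubation, 7-day infectious period, beta = 0.5.
history = simulate_seir(1_000_000, beta=0.5, sigma=1 / 5, gamma=1 / 7,
                        initial_infected=10, days=120)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print(f"Infectious count peaks on day {peak_day}")
```

Every output of such a model depends on the assumed rates, which is why stating them and testing the sensitivity of the results to them matters.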
The Imperial College London model, which predicted high infection and fatality rates, failed to account for the changes in population behavior that occur even in the absence of government-mandated interventions, and for the resulting variation in the virus's reproduction number (R0).
The epidemiology of SARS-CoV-2, the virus that causes COVID-19, is different from that of two other virulent coronaviruses: SARS-CoV-1, which causes severe acute respiratory syndrome (SARS), and Middle East Respiratory Syndrome coronavirus (MERS-CoV). According to CIDRAP at the University of Minnesota, models that were initially based on these pathogens do not provide useful guidance in predicting what to expect with the current pandemic [9]. UMass Amherst’s more recent ensemble-type model (an approach commonly used in weather prediction) triangulates a combined prediction from multiple models. This approach may carry over the uncertainty and errors already built into the models it draws on.
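The ensemble idea can be sketched as follows. This Python example is not the UMass Amherst method; it assumes one simple combination rule (the day-by-day median) and uses hypothetical model names and forecast values.

```python
from statistics import median


def ensemble_forecast(model_forecasts):
    """Combine per-model daily forecasts using the median and report the spread."""
    horizons = {len(series) for series in model_forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all models must forecast the same number of days")
    combined = []
    for day in range(horizons.pop()):
        values = [series[day] for series in model_forecasts.values()]
        combined.append({"median": median(values), "low": min(values), "high": max(values)})
    return combined


# Hypothetical daily death forecasts from three hypothetical models.
forecasts = {
    "model_a": [120, 130, 150],
    "model_b": [100, 140, 200],
    "model_c": [110, 125, 160],
}
for day, summary in enumerate(ensemble_forecast(forecasts), start=1):
    print(day, summary)
```

The spread between the lowest and highest forecast widens with the horizon in this example, and any bias shared by all the input models passes straight through to the combined result.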
Insufficient information to determine the right exit strategy
The COVID-19 pandemic has compelled countries to apply wartime-style rules (lockdowns, shelter-in-place orders, curfews, closed borders and so on) to suppress the spread of the virus. All these measures have had devastating social and financial consequences: major fiscal deficits, skyrocketing unemployment, and negative growth rates.
The future course of the virus is a matter of great debate. The epidemic trend shows some positive signs that the COVID-19 pandemic is moderating; e.g., fewer new cases than predicted. Many leaders have started to think about how best to transition from the lockdown phase. Others have moved ahead without much preparation for the potential fallout. The focus has now switched to adopting an exit strategy that will guide the loosening of the government restrictions in place and lay out concrete steps toward the new normal.
Epidemic forecasting models are considered to be the guiding tools for decision-makers trying to contain an outbreak and are equally important in defining an exit strategy. The models should help government and task force leaders answer critical questions such as:
- When and how is it safe to reopen the economy?
- What’s the plan if, as many experts predict, a second wave of the virus happens in a few months?
- If that happens, what population factors will determine the probable location of the next epicenter?
- In a new epicenter, how will hospitalizations, ICU admissions and ventilators be planned for and managed?
- And finally, how do we get back to the “new normal” with less disruption to the economy and to an individual’s livelihood?
Enabling better predictions to navigate the next wave
While the epidemic appears to be moderating, the crisis is far from over. So what guidance do we need to begin to emerge from lockdown? The World Health Organization warns that abruptly ending a lockdown order could result in new outbreaks [10]. Many experts are concerned about one or more further COVID-19 waves in late fall or early winter of this year, conceivably sooner depending on how the reopening is handled. Public health decision-makers need better information and insights into early warning indicators to navigate the various post-lockdown scenarios.
All of the most promising new models that provide COVID-19 predictions are under considerable scrutiny within the scientific community. How can we do this better? Why not adopt a cutting-edge artificial intelligence (AI) model designed around an artificial neural network-based, dynamic epidemiological model that is adaptive, built to scale, automated, and semi-supervised or unsupervised in its learning? This would be one way to address some of the shortcomings of earlier models. Although universally accepted, large-scale COVID-19 testing data will not be immediately available to provide a reliable denominator [11], an advanced AI model could provide more comprehensive results than the current COVID-19 predictive models.
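Why the denominator matters can be shown with simple arithmetic. The Python sketch below uses hypothetical counts and an assumed detection fraction; none of these figures come from real data.

```python
def fatality_rate(deaths, denominator):
    """Deaths divided by the chosen denominator (confirmed cases or estimated infections)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return deaths / denominator


# Hypothetical figures for one region.
deaths = 500
confirmed_cases = 10_000
assumed_detection_fraction = 0.2  # assumption: only 1 in 5 infections is confirmed

estimated_infections = confirmed_cases / assumed_detection_fraction
case_fatality = fatality_rate(deaths, confirmed_cases)
infection_fatality = fatality_rate(deaths, estimated_infections)
print(f"Rate per confirmed case: {case_fatality:.1%}")
print(f"Rate per estimated infection: {infection_fatality:.1%}")
```

The same death count yields a rate five times lower once undetected infections are included, so any forecast built on confirmed cases alone inherits the testing gap.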
A model like this can be self-sustaining and will require a reduced number of data pipelines to learn and predict compared to current models that have long learning curves and depend on a large volume of correctly labeled training data. Getting this data is a challenge due to limited testing and under-reporting of mortality when deaths happen outside of the hospital. Where the current models tend to be backward-looking and there is an inevitable “adjustment delay,” the recommended AI model would be much closer to real time in prediction with continuous learning and adjustment to any new changes being made to the input parameters. An AI model would have the ability to discover complex patterns, auto-detect anomalies, self-learn and self-heal, and judge the accuracy of the variables (e.g., variance in test results) to produce reliable results.
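As a rough illustration of automatic anomaly detection on incoming data, the Python sketch below flags daily counts that deviate sharply from the preceding week. It is a plain statistical stand-in (a trailing-window z-score), not the neural network model proposed here, and the counts, window and threshold are hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(daily_counts, window=7, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    flagged = []
    for idx in range(window, len(daily_counts)):
        recent = daily_counts[idx - window:idx]
        mu, sd = mean(recent), stdev(recent)
        if sd > 0 and abs(daily_counts[idx] - mu) / sd > threshold:
            flagged.append(idx)
    return flagged


# Hypothetical reported daily cases with one reporting spike (e.g., a backlog dump).
reported_cases = [100, 104, 98, 101, 103, 99, 102, 100, 310, 105, 101]
print("Anomalous day indices:", flag_anomalies(reported_cases))
```

A production system would need to separate genuine surges from reporting artifacts, which a fixed threshold like this one cannot do on its own.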
COVID-19 does not follow an identical path across all regions of the world. The disease spreads unevenly within countries and regions, with varying prevalence and fatality rates. Key parameters in model development therefore include regional population characteristics, such as age distribution, socio-economic status, percentage of older adults with comorbidities, and risk factors (e.g., smoking, obesity, drug dependencies), as well as regional and environmental factors, such as population size, density, individual mobility, and social distancing effects.
The model should look beyond the assumption that every individual in a population subset has the same chance of catching the infection. Instead, the population should be subdivided into smaller groups by socio-economic status, demographics, education level, unhealthy habits and health status. The number of infected individuals who are quarantined and can no longer spread the infection should be incorporated into the model as well.
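Both ideas, subdividing the population and removing quarantined individuals from the transmission chain, can be sketched with a small compartmental model. The Python example below is a simplified illustration, not the proposed AI model; the two groups, the contact rates and every rate parameter are hypothetical.

```python
def simulate_stratified(groups, contact, beta, gamma, quarantine_rate, days, dt=0.1):
    """Multi-group S-I-Q-R model integrated with forward Euler steps.

    groups: {name: {"N": population, "I0": initially infectious}}
    contact[a][b]: relative contact rate of group a with group b.
    Quarantined people (Q) recover at rate gamma but infect no one.
    """
    names = list(groups)
    state = {g: {"S": float(groups[g]["N"] - groups[g]["I0"]),
                 "I": float(groups[g]["I0"]), "Q": 0.0, "R": 0.0} for g in names}
    for _ in range(int(round(days / dt))):
        # Force of infection per group, using only non-quarantined infectious people.
        force = {a: beta * sum(contact[a][b] * state[b]["I"] / groups[b]["N"]
                               for b in names) for a in names}
        for g in names:
            st = state[g]
            new_infections = force[g] * st["S"] * dt
            newly_quarantined = quarantine_rate * st["I"] * dt
            recovered_free = gamma * st["I"] * dt
            recovered_quarantined = gamma * st["Q"] * dt
            st["S"] -= new_infections
            st["I"] += new_infections - newly_quarantined - recovered_free
            st["Q"] += newly_quarantined - recovered_quarantined
            st["R"] += recovered_free + recovered_quarantined
    return state


def ever_infected(state, groups):
    """Total number of people who have left the susceptible compartment."""
    return sum(groups[g]["N"] - state[g]["S"] for g in groups)


# Hypothetical two-group population and contact pattern.
groups = {"younger": {"N": 80_000, "I0": 10}, "older": {"N": 20_000, "I0": 2}}
contact = {"younger": {"younger": 1.2, "older": 0.4},
           "older": {"younger": 0.4, "older": 0.6}}
no_quarantine = simulate_stratified(groups, contact, 0.4, 1 / 7, 0.0, 120)
with_quarantine = simulate_stratified(groups, contact, 0.4, 1 / 7, 0.3, 120)
print(round(ever_infected(no_quarantine, groups)),
      round(ever_infected(with_quarantine, groups)))
```

Comparing the two runs shows how much of the projected spread depends on the assumed quarantine rate, a quantity that a homogeneous model cannot express per group.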
The training data pipeline of the model should consist of large regional population data sets similar to what Medicare and Medicaid collect. These data sets would include each individual’s demographic details and clinical data such as valid COVID-19 test reports, treatment records, health risks and lab results. This should be augmented with social determinants of health: living conditions, education, transportation links, access to care, social contacts, mobility and so forth. All of this would be integrated with COVID-19-related local epidemiological factors such as the infection rate over time, contact rate, case fatality reports and personal preventive hygiene practices (hand washing, face mask adoption, social distancing), with environmental factors such as temperature, and with a meta-analysis of gold-standard COVID-19 research studies.
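One way to picture a single training example in such a pipeline is as a record that joins individual-level and region-level inputs. The Python sketch below is purely illustrative: every class and field name is hypothetical, and a real pipeline would carry far more variables and privacy safeguards.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonRecord:
    """Hypothetical individual-level inputs (demographic, clinical, social)."""
    person_id: str
    age: int
    region: str
    comorbidities: List[str] = field(default_factory=list)
    covid_test_positive: Optional[bool] = None  # None means never tested
    has_access_to_care: Optional[bool] = None


@dataclass
class RegionDay:
    """Hypothetical region-level epidemiological and environmental inputs."""
    region: str
    date: str  # ISO format, e.g. "2020-05-01"
    new_cases: int
    new_deaths: int
    mask_adoption_rate: Optional[float] = None
    mean_temperature_c: Optional[float] = None


def to_feature_row(person: PersonRecord, region_day: RegionDay) -> dict:
    """Join one person with the matching regional context into a flat row."""
    if person.region != region_day.region:
        raise ValueError("person and regional data refer to different regions")
    return {
        "age": person.age,
        "n_comorbidities": len(person.comorbidities),
        "tested": person.covid_test_positive is not None,
        "regional_new_cases": region_day.new_cases,
        "mask_adoption_rate": region_day.mask_adoption_rate,
    }


person = PersonRecord("p-001", 71, "region-x", comorbidities=["diabetes", "obesity"])
context = RegionDay("region-x", "2020-05-01", new_cases=240, new_deaths=6,
                    mask_adoption_rate=0.55)
print(to_feature_row(person, context))
```

Keeping "never tested" distinct from "tested negative" matters here, because limited testing is one of the main data gaps described above.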
The model developed with the above guiding principles can help provide realistic insights into the disease progression and more granular patient-centric behavior in response to the virus. It will guide task force leaders to proactively detect, track and quarantine COVID-19 vulnerable individuals to keep the infection transmission to a manageable level. Through early prediction of the virus epicenter and disease hot spot trends, the model will help in the optimal allocation of personal protective equipment and ensure region-specific readiness for health-care services demand.
Comprehensive AI-driven intelligence like this will eventually help the government implement a soft and directed (rather than harsh and wide) quarantine policy, e.g., social distancing at the level of the at-risk individual, household or zone, that will form the basis of a prudent exit strategy to reopen the economy and keep it open with confidence.
Conclusion
Our health system is currently overwhelmed by actual and projected COVID-19 hospitalization and case fatality rates. Many of the deaths reported in conjunction with coronavirus cases are due to underlying health conditions exacerbated by the infection. In any epidemic, modeling is very important to help public authorities stay on top of the situation and make data-based decisions. This is especially true for a virus like COVID-19 that is an unknown entity and likely more dangerous than any other virus since the Spanish flu [12]. Until we have a standard testing mechanism, better test coverage and a proven treatment regime, predictions need to be unbiased, consistent and realistic.
Unfortunately, most of the existing models remain asymmetric in their risk calculation and do not provide granular enough guidance about how to manage the pandemic going forward. Now is the time to adopt a new deep learning AI model built with an ideal blend of epidemiology, bio-informatics and health economics that will provide more statistically relevant forecasts on the actual nature and course of the virus. This will in turn become a tool for governments to make rational decisions that help contain the spread of the virus, mitigating its impact on population health and the economy by adopting a robust, well-informed exit strategy.
About the Authors
Eric Paternoster is Chief Executive Officer of Infosys Public Services, an Infosys subsidiary focused on public sector in US and Canada. In this role, he oversees company strategy and execution for profitable growth, and advises public sector organizations on strategy, technology and operations. He also serves on the Boards of Infosys Public Services and the McCamish subsidiary of Infosys BPM.
Eric has over 30 years of experience in public sector, healthcare, consulting and business technology with multiple firms. Prior to his current role, he was Senior Vice President and Head of Insurance, Healthcare and Life Sciences business unit, where he grew the business from $90 million to over $700 million with 60+ clients across Americas, Europe and Asia. Eric joined Infosys in 2002 as Head of Business Consulting for Eastern US and Canada.
Prior to joining Infosys, Eric was a partner with Ernst & Young, where he led financial services consulting in e-commerce, profitability improvement, activity-based costing, IT strategy, and web channel development. As a partner at Accenture’s (Andersen Consulting) Financial Services practice, Eric led the implementation of a country-wide mortgage processing platform for the largest building society in Ireland, and order-ship-bill system for Procter & Gamble’s North American business.
Eric has hands-on experience in healthcare. As Vice President (acting) of Anthem Blue Cross Blue Shield, he was responsible for overseeing application development. He also led the implementation of a new medical and lost-time claims and prescription processing system for the largest state-run workers’ compensation system in the US, and directed multiple M&A systems programs.
Eric is a frequent speaker at healthcare and public sector forums and a contributor to industry publications and analysts on strategy, industry trends, and organizational competitiveness. He has been quoted in publications such as Wall Street Journal, Forbes, Politico, Modern Healthcare, HealthLeaders, and Insurance Business Review.
Eric holds a Master of Business Administration with a concentration in finance from the University of Cincinnati, and a bachelor’s degree in engineering from the US Military Academy. He served in the US Army, leading infantry units in Korea and the US, and left as a Captain.
Principal Consultant and Head, Government Healthcare Analytics Solutions, Infosys Public Services; Member, Editorial Board of Telehealth and Medicine Today Peer Review Journal
Dr. De is head of government healthcare analytics for Infosys Public Services. He has extensive experience in the public healthcare sector and previously worked for the World Health Organization, UNICEF and the Indian Public Health Association.
At Infosys, Dr. De leads the area of advanced data science and artificial intelligence-enabled population health, social determinants of health analytics, opioid management, care management, and value-based care. He is a frequent public speaker at various healthcare conferences, forums and at major universities, including the Massachusetts Institute of Technology.
Dr. De is based in Hartford, Connecticut. He holds a medical degree from the University of Calcutta and master’s degree in healthcare administration from the Tata Institute of Social Sciences in Mumbai, India.
[1] Who Is John Ioannidis?; American Institute for Economic Research; April 19, 2020
[2] Special report: The simulations driving the world’s response to COVID-19; How epidemiologists rushed to model the coronavirus pandemic; Nature.com; April 2, 2020
[3] COVID-19 Testing: Challenges, Limitations and Suggestions for Improvement; Hu, Es; April 9, 2020
[4] COVID-19 testing: overcoming challenges in the next phase of the epidemic; STAT News; March 31, 2020
[5] Forecasting the novel coronavirus COVID-19; Fotios Petropoulos, Spyros Makridakis; March 31, 2020
[6] Caution Warranted: Using the Institute for Health Metrics and Evaluation Model for Predicting the Course of the COVID-19 Pandemic; Nicholas P. Jewell, PhD; Joseph A. Lewnard, PhD; Britta L. Jewell, PhD; Annals of Internal Medicine
[7] Influential Covid-19 model uses flawed methods and shouldn’t guide U.S. policies, critics say; STAT News; April 17, 2020
[8] Predictive Mathematical Models of the COVID-19 Pandemic, Underlying Principles and Value of Projections; Nicholas P. Jewell, PhD; Joseph A. Lewnard, PhD; Britta L. Jewell, PhD; JAMA; April 16, 2020
[9] COVID-19: The CIDRAP Viewpoint; April 30, 2020
[10] Coronavirus: 'Deadly resurgence' if curbs lifted too early, WHO warns; BBC News; April 10, 2020
[11] Denominator matters in estimating COVID-19 mortality rates; Bamba Gaye, Anouar Fanidi, Xavier Jouven; European Heart Journal; April 7, 2020
[12] Experts predict ‘significant COVID-19 activity’ in US for up to 2 years; NY Post; May 1, 2020