The dilemma of predictions based upon data from early in epidemics

When new diseases appear, the most serious cases and the majority of deaths tend to occur in hospital. Early data records the number of patients admitted, the treatments they are given, and how many recover or die. This gives a hospital case fatality rate (CFR). In a study published in The Lancet of the first cohort in Wuhan with the novel coronavirus disease, now known as COVID-19, the CFR was 11%¹.

There are two main reasons why this data can be unreliable when projected forwards to different populations.

1. The numerator and denominator issue

Mortality rate = number of deaths / number of cases of infection

Death is the numerator. Death is hard data. With established and known medical conditions, it is easy to measure the number of people that have died and relatively easy to know the number of people that have been diagnosed and admitted to hospital.

One dilemma with new infectious illnesses is the lack of available and reliable tests, which take time to develop. If we only make a diagnosis in cases with a positive test, we may miss some people who have died without having the diagnosis confirmed. Death is also lagging data, by 10-12 days in the China studies, so early mortality rates do not yet count patients who will die later. Both of these factors could make early mortality rates underestimate the real mortality of a newly emerging disease.
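The effect of lagging deaths can be illustrated with a rough Python sketch. It assumes that cumulative cases grow exponentially with a fixed doubling time, and that every death counted today comes from the cases that existed one lag period earlier. The 7-day doubling time and the 11-day lag (the midpoint of 10-12 days) are figures quoted in this article; the 2% underlying fatality rate, the starting count and all names are hypothetical.

```python
def cases_on_day(day: float, initial_cases: float, doubling_days: float) -> float:
    """Cumulative cases under simple exponential growth (an assumed model)."""
    return initial_cases * 2 ** (day / doubling_days)


# Hypothetical illustration, not real data.
true_rate = 0.02      # assumed underlying fatality rate among diagnosed cases
lag_days = 11         # midpoint of the 10-12 day lag between diagnosis and death
doubling_days = 7     # epidemic doubling time
today = 28            # days since the first recorded cases

cases_today = cases_on_day(today, 100, doubling_days)
cases_at_lag = cases_on_day(today - lag_days, 100, doubling_days)

# Deaths recorded today arise from the smaller group diagnosed lag_days ago.
deaths_today = true_rate * cases_at_lag

naive = deaths_today / cases_today      # deaths so far / cases so far
adjusted = deaths_today / cases_at_lag  # deaths so far / cases lag_days ago

print(f"Naive early rate:  {naive:.2%}")
print(f"Lag-adjusted rate: {adjusted:.2%}")
```

Under these assumptions the naive early rate is only about a third of the underlying rate, because the case count has kept doubling while the deaths were still to come.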

What about the total number of infections? This is the denominator, and it is soft data. Because it takes time to develop cheap and widely available tests, there is an inherent bias in who gets tested. People with less severe symptoms and those who do not go to hospital are less likely to be diagnosed. In this situation the hospital mortality will significantly overestimate the true severity of the disease.

Imagine a new disease in which 100 people are admitted to hospital and 10 of them die. This would give a hospital case fatality rate of 10%. But now imagine that we develop a test which shows that only one in every 100 people who developed the illness in the community became unwell enough to go to hospital. The other 99 go to bed at home, or some even keep working with a sore throat or a mild cough. In this situation, there are actually 10,000 people with the illness, 100 who go to hospital and 10 who die. This means that the true mortality rate of this disease is 0.1%, which is roughly the mortality rate usually quoted for influenza.
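The arithmetic in this example can be checked with a short Python sketch. The function and variable names are illustrative only, and the numbers are the hypothetical ones from the example, not real data.

```python
def fatality_rate(deaths: int, cases: int) -> float:
    """Deaths divided by cases, as a fraction (0.1 means 10%)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical figures from the example above.
hospital_admissions = 100
deaths = 10
hospitalised_fraction = 1 / 100  # assumed: 1 in 100 infections reaches hospital

# Scale admissions up to estimate everyone infected in the community.
total_infections = round(hospital_admissions / hospitalised_fraction)

hospital_cfr = fatality_rate(deaths, hospital_admissions)
true_rate = fatality_rate(deaths, total_infections)

print(f"Hospital case fatality rate: {hospital_cfr:.1%}")
print(f"Rate across all infections:  {true_rate:.1%}")
```

The deaths are the same in both calculations; only the denominator changes, which is why the choice of denominator matters so much.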

2. Viral Load

A second problem with projections based on early cohorts is the assumption that the disease will always behave in the same way in future clusters. One of the reasons that the AIDS specialist was so wrong in the 1980s related to the issue of viral load. We now know that during the AIDS epidemic, the first clusters were infected with a large amount of virus. Haemophiliacs were exposed to high levels of virus through blood transfusions, and the small cohort of the LGBT community initially affected had an increased risk of repeated exposure to infected individuals. The result was that people died very quickly, and this data was projected forwards.

It took time to appreciate that there were factors in the first population which were different to future clusters. The early clusters basically had more serious disease because they were infected with more virus. We know that during SARS viral load was associated with severe disease, and this is the case in many other infectious illnesses.

We have no evidence yet that viral load has had an impact on COVID-19 infection, although the severity of disease in health care workers early in the epidemic is highly suggestive. The principle is that we must always be cautious in projecting the behaviour of a disease in one population onto subsequent clusters. Measuring mortality early in epidemics is notoriously difficult. As a general rule, mortality rates early in epidemics tend to overestimate ultimate mortality, and it is common to see estimates of the severity of the illness revised downwards over time. Furthermore, whilst it is possible for viruses to mutate and become either more infectious or more severe, in general it is statistically more likely for coronavirus infections to become less severe over time.

As always, better data at least gives the option of making better decisions, and the nature of this epidemic (epidemic doubling time of approximately 7 days) means that the amount of data available to us will double every week.
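As a rough illustration of that doubling, the growth in recorded cases can be sketched in Python. This assumes growth stays exponential with a fixed 7-day doubling time, which no real epidemic sustains indefinitely. The starting count of 100 and the function name are hypothetical.

```python
def projected_count(initial_count: float, days: float, doubling_days: float = 7.0) -> float:
    """Count after `days` of exponential growth with a fixed doubling time."""
    if doubling_days <= 0:
        raise ValueError("doubling_days must be positive")
    return initial_count * 2 ** (days / doubling_days)


# Hypothetical starting point of 100 recorded cases.
for week in range(5):
    print(f"Week {week}: about {projected_count(100, 7 * week):.0f} cases")
```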


References

Chen, N., Zhou, M., Dong, X., Qu, J., Gong, F., Han, Y., Qiu, Y., Wang, J., Liu, Y., Wei, Y., Xia, J., Yu, T., Zhang, X. and Zhang, L. (2020). Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. The Lancet.
Wu, J.T., et al. (2020). Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet, 31 Jan. 2020. www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30260-9/fulltext