07 July 2020: 08.24

Sheila Bird, formerly Programme Leader, MRC Biostatistics Unit, Cambridge Institute of Public Health, CB2 0SR
Bent Nielsen, Department of Economics and Nuffield College, University of Oxford

Supported by the European Research Council (grant 694262, DisCont)

Now-casting for England: By applying statistical now-casting, we do not need to wait as long as eight days to get a decent estimate (with specified uncertainty) of hospitalized COVID-19 deaths in England. The plots below show now-casts of deaths for England. See Regions & Archive for regions and for older plots.

Weekly reporting: The number of cases has now fallen to a much lower level than at the peak of the epidemic. We will now move to weekly updating of the website, on Saturdays. (24 May 2020)

Interpretation of above figure: The above figure shows rolling now-casts of hospitalized deaths in England with 95% error bands. On the y-axis we indicate counts of deaths. On the x-axis we indicate the most recent dates-of-death for which we have data. These data were released at the date indicated in the legend box. The sums of announced deaths are indicated with black crosses.
The solid, first half of the curve goes through crosses. These are counts for older dates-of-death. They are still being updated: a few more cases, perhaps about 20-30, may be added over the next few weeks. We ignore this discrepancy and treat these data as final. Put another way, the method targets the total number of cases released after 10 days, and we expect the final numbers to be perhaps 20-30 higher. We will soon have enough data to analyse this. (22 Apr 2020)
The dashed, second half of the curve goes through plus-symbols. These are now-casts of the counts for more recent dates-of-death. Each of these now-casts comes with a vertical line indicating an approximate 95% error band. The error bands are larger for the more recent dates.
As the title indicates, the analysis is based on data from the 7 most recent reporting dates and the 10 most recent dates-of-death. The graph appears to be robust to varying the former number. The latter number is chosen because most cases are reported within 10 days.
We note a curious dip in the number of cases on 31 March. The local peak for all of England in the period 29 March - 22 April was on 8 April. (22 Apr 2020)
The number of cases appears to be gradually decreasing. This means that the daily reported numbers, which include delayed reports for earlier dates-of-death, tend to be above the now-casts for the most recent date-of-death. For instance, on 21 Apr 776 cases were reported. Of these, 654 related to the 10 most recent days, 11-20 Apr. The now-cast for 20 Apr, made on 22 Apr, is 579 (95% error band 479-680). (22 Apr 2020)
Jumps for the most recent date-of-death occur when the reported number, which reflects only the first day of reporting delay, is out of line with previous first-day reports. See recursive plots below.
The error bands tend to widen when the delay distribution has been less stable in recent days. See delay plots below.
The numbers of cases reported at weekends and published on Sundays and Mondays tend to be lower than for other days. This affects the now-casts, which tend to drop with the Sunday and Monday releases and then come up again over the following days. The weekend pattern has been a bit different for each of the past weekends, so we do not have a method to correct for this effect. In most instances the final numbers are within the approximate 95% error bands. Statistically, there should be a 5% chance that the final number does not fall within the error bands. (26 Apr 2020)
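The 21 April example above can be checked with a few lines of arithmetic. The sketch below is in Python purely for illustration (the analysis itself was done in R); the variable names are ours, and the only numbers used are those quoted in the text.

```python
# Figures quoted in the text for the NHS England release of 21 Apr 2020.
reported_total = 776             # all deaths announced on 21 Apr
reported_recent = 654            # of which dates-of-death fall in 11-20 Apr
nowcast_20apr = 579              # now-cast for date-of-death 20 Apr
band_low, band_high = 479, 680   # approximate 95% error band

# Deaths announced on 21 Apr that relate to dates-of-death before 11 Apr.
reported_older = reported_total - reported_recent

# Share of the announced total that relates to the 10 most recent days.
share_recent = reported_recent / reported_total

# The announced total mixes many dates-of-death, so it can exceed the
# error band for any single date-of-death.
headline_above_band = reported_total > band_high

print(f"older dates-of-death: {reported_older}")
print(f"share for 10 most recent days: {share_recent:.1%}")
print(f"announced total above the band for 20 Apr: {headline_above_band}")
```

The announced total counts deaths across many dates-of-death, whereas the now-cast targets a single date-of-death, which is why the two should not be compared directly.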

Now-casting for English regions: Plots similar to that above are also produced for the English regions. The interpretation is the same. As the numbers of cases are smaller, the statistical uncertainty is larger.
We are now beginning to see that the method for computing confidence bands breaks down as the numbers of cases become smaller. This is seen for the North East and Yorkshire and for the South West. This methodological issue is more pronounced for regions with shorter reporting delay, such as the North East and Yorkshire. (13 May 2020)


The reporting delay: Each day NHS England reports information on deaths of patients who have died in hospitals in England and had tested positive for COVID-19 at time of death. All deaths are recorded against the date of death rather than the date the deaths were announced. NHS points out that the totals reported on any day may not include all deaths that occurred on that day or on recent prior days.

Reporting delay is a well-known feature of death statistics (Bird, 2013). Interpretation of the NHS figures should take into account the fact that totals by date of death, particularly for most recent days, are likely to be updated in future releases. NHS England writes on the reporting delay in the data: "Interpretation of the figures should take into account the fact that totals by date of death are likely to be updated in future releases for more recent dates. For example, a positive result for COVID-19 may occur days after confirmation of death. Cases are only included in the data when the positive COVID-19 test result is received, or death certificate confirmed with COVID-19 mentioned. This results in a lag between a given date of death and exhaustive daily death figures for that day." (13 May 2020)

Method: The method adjusts for overall delay (across age-groups and regions) in the reporting-in of hospitalized COVID-19 deaths for England. The method self-adapts to temporal changes in the reporting-distribution and deliberately does not parameterize how we expect the trajectory to look a priori. Specifically, we chose an over-dispersed Poisson model with an age-cohort specification. This method corresponds to the chain-ladder method used in general insurance for estimating unknown liabilities (England, Verrall 2002). We apply a recent theory for uncertainty of estimates and now-casts in the presence of over-dispersion (Harnau, Nielsen 2018) extending (Martínez-Miranda, Nielsen, Nielsen, 2015, 2016). After some experimentation we settled on an approach that only exploits data from the 7 most recent reporting days. This is because the delay distribution varies over time. As a consequence, the now-casts may jump from one reporting day to the next when there are shifts in the data. An alternative would be a smoother approach that would appear more stable over time, but it would be less able to follow shifts in the data.
The number of cases has by now decreased considerably since the peak. At the same time the delay distribution has become tighter, although it still varies considerably throughout the week. The parameters of the method have been adjusted so as to use data from the 5 most recent reporting days and the 7 most recent dates-of-death. (24 May 2020)
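To illustrate the chain-ladder idea mentioned above, here is a minimal sketch in Python. It is not the authors' code (the analysis uses an adapted version of the R package apc) and it makes several simplifying assumptions: the data form a full reporting triangle of cumulative counts, rows are dates-of-death ordered from oldest to newest, columns are reporting delays in days, and only point now-casts are produced, with no error bands. The function name and the toy numbers are hypothetical.

```python
def chain_ladder_nowcast(triangle):
    """Point now-casts from a cumulative reporting triangle.

    triangle[i][j] is the cumulative count for date-of-death i reported
    with a delay of at most j days. Row i must hold n - i entries, so the
    oldest date (row 0) is fully reported and the newest has one entry.
    """
    n = len(triangle)
    for i, row in enumerate(triangle):
        if len(row) != n - i:
            raise ValueError(f"row {i} should have {n - i} entries")

    # Development factor for delay j -> j + 1: ratio of column sums over
    # the rows in which both delays have been observed.
    factors = []
    for j in range(n - 1):
        rows = triangle[: n - 1 - j]
        numerator = sum(row[j + 1] for row in rows)
        denominator = sum(row[j] for row in rows)
        if denominator == 0:
            raise ValueError(f"no counts observed at delay {j}")
        factors.append(numerator / denominator)

    # Project the latest observed cumulative count of each row forward
    # through the remaining development factors.
    nowcasts = []
    for i, row in enumerate(triangle):
        value = float(row[-1])
        for j in range(n - 1 - i, n - 1):
            value *= factors[j]
        nowcasts.append(value)
    return factors, nowcasts


# Toy triangle with made-up numbers (not NHS data).
toy_triangle = [
    [100, 150, 160],  # oldest date-of-death, delays 0, 1, 2
    [80, 120],        # delays 0, 1
    [60],             # most recent date-of-death, delay 0 only
]
toy_factors, toy_nowcasts = chain_ladder_nowcast(toy_triangle)
print(toy_factors)
print(toy_nowcasts)
```

Restricting the input to the most recent reporting days, as described above, would amount to using a trapezoid rather than a full triangle; that case is not handled in this sketch.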

Software: We used an adapted version of the R package apc (Nielsen, 2015). Download: apc from CRAN and further documentation and development version.
Additional code and data are needed. Download: code from [17 Apr 2020: 8.43] and (daily updated) data from CovidReportingNHS.xlsx. The download contains five R files and one data file in xlsx format. Update the parameters (drive, choice of region and choice of destination for plots) in CovidReporting_Main_16apr2020.r and run in R. Further instructions are on the Regions & Archive page.

Recursive nowcasts: These help in tracking the performance of the now-casts over time. In the figure below, the black crosses, plusses and lines are the same as in the graph above; that is, they use the most recently reported data. The red crosses, plusses and lines are drawn using the data reported one day earlier, and so on. Now consider, for instance, the date-of-death 5 days ago. The 5 crosses of different colours show how the information about the number of cases grows day by day. Higher up there are five error bands of matching colours. They tend to get narrower. Looking at an older date-of-death, we see that the final observation tends to be included in all error bands for that date.

Delay distributions: A difficulty for the analysis is that the delay distribution changes over time. This may depend on local administrative features at hospitals as well as effects from weekends and bank holidays.

For England, the delay distributions appear to be quite stable recently. A question remains, whether there are weekend effects in the data.
The numbers reported on Sundays and Mondays are often low, but not systematically so. Sometimes there appear to be weekend effects, sometimes not. With the data arriving on Tuesdays we tend to get more confidence in the data. (2 May 2020)
The delay distributions seem to have become less stable recently. Friday 15 May 2020 (date-of-death) seems a bit unusual, followed by a weekend effect. (20 May 2020)
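One simple way to compare delay distributions across dates-of-death is to convert the counts reported at each delay into shares and look at the largest gap between two dates. The Python sketch below illustrates this; it is our own illustration rather than the procedure behind the delay plots, and the function names and counts are hypothetical.

```python
def delay_shares(increments):
    """Turn counts reported at delay 0, 1, 2, ... into shares summing to one."""
    total = sum(increments)
    if total <= 0:
        raise ValueError("need at least one reported death")
    return [count / total for count in increments]


def max_share_gap(shares_a, shares_b):
    """Largest absolute difference between two delay distributions."""
    if len(shares_a) != len(shares_b):
        raise ValueError("distributions must cover the same delays")
    return max(abs(a - b) for a, b in zip(shares_a, shares_b))


# Made-up counts by delay (days 0-3) for two dates-of-death, not NHS data.
weekday_counts = [50, 30, 15, 5]
weekend_counts = [20, 50, 20, 10]

weekday_shares = delay_shares(weekday_counts)
weekend_shares = delay_shares(weekend_counts)
gap = max_share_gap(weekday_shares, weekend_shares)
print(weekday_shares, weekend_shares, gap)
```

In this toy example the weekend date has a smaller share reported at delay 0 and a larger share at delay 1, the kind of shift that a weekend effect would produce. Shares like these are only meaningful for dates-of-death old enough that reporting is essentially complete.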

Regions & Archive: See Regions & Archive for regions and for older plots.

Short term forecasting: Castle, Doornik & Hendry present short term forecasts for a variety of countries.

David Spiegelhalter on Twitter (16 Apr 2020) The Scientist (18 May 2020)

Bird SM. Editorial: Counting the dead properly and promptly. Journal of the Royal Statistical Society Series A 2013; 176: 815 - 817.
England PD, Verrall RJ. Stochastic claims reserving in general insurance. British Actuarial Journal 2002; 8: 443 - 518.
Harnau J, Nielsen B. Over-dispersed age-period-cohort models. Journal of the American Statistical Association 2018; 113: 1722 - 1732.
Martínez-Miranda MD, Nielsen B, Nielsen JP. Inference and forecasting in the age-period-cohort model with unknown exposure with an application to mesothelioma mortality. Journal of the Royal Statistical Society Series A 2015; 178: 29 - 55.
Martínez-Miranda MD, Nielsen B, Nielsen JP. Simple benchmark for mesothelioma projection for Great Britain. Occupational and Environmental Medicine 2016; 73: 561 - 563.
Nielsen B. apc: An R package for age-period-cohort analysis. The R Journal 2015; 7: 52 - 64.