Brief analysis of UK coronavirus DHSC daily death figures

Simple views of the data

The data are the daily deaths from Covid-19 in the UK, announced by DHSC each day, and available at https://coronavirus.data.gov.uk/archive. The data were revised on 29th April to include all deaths certified to be caused by the coronavirus, and not just those in hospital. There was a retrospective revision of all daily figures, and now for each “publication day” there is a whole series of death tallies for each past day. (I previously provided analyses of the pre-29th April data.)

An initial look at the whole archive leads to two conclusions.

First, the revisions are insubstantial, so it is reasonable to work with only the most recent dataset. Of course, this may change in future.

Second, there are strong day-of-the-week effects, but these changed between Fri 17th April and Sat 18th April, for a reason unknown to me. Furthermore, they changed suddenly, with no evidence of change within either half of the dataset. This may also change as new data arrive. The data will be presented here for all days but, for the moment, the statistical analysis will be applied only to Sat 18th April onwards.

This document was prepared using R-markdown and RStudio. To find all the R code used to generate the tables and figures, download the .Rmd file. The only thing you will need to change is the directory to which you have downloaded the data file, in .csv format, for the day to be analysed. The program expects that file to be alphabetically the last file in the directory, as it will be if the only files you keep there are the series of .csv files for death rates from 29th April onwards.
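As an illustration of the file-selection rule just described, here is a minimal sketch in Python (the original analysis used R, so this is not the author's code). The function name `latest_csv` and the example directory are hypothetical. It assumes, as the text does, that the directory holds only the daily .csv files and that their names sort alphabetically in date order.

```python
from pathlib import Path


def latest_csv(data_dir):
    """Return the alphabetically last .csv file in data_dir.

    Assumption: file names sort in date order, so the last name
    corresponds to the most recent day's data.
    """
    csv_files = sorted(Path(data_dir).glob("*.csv"), key=lambda p: p.name)
    if not csv_files:
        raise FileNotFoundError(f"no .csv files found in {data_dir}")
    return csv_files[-1]


# Hypothetical usage: point this at the directory holding the downloads.
# print(latest_csv("path/to/downloaded/data"))
```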

Turning to the data, Table 1 shows the death tallies arranged by day of the week, and Figure 1 plots them by day of the week. Each curve has a much simpler pattern than if we ignored day of the week. The weekly pattern is presumably due to working shifts and administrative arrangements for people involved in different stages of reporting a death.

##            6 Mar- 13 Mar- 20 Mar- 27 Mar- 3 Apr- 10 Apr- 17 Apr- 24 Apr- 1 May- 8 May- 15 May-
## 6Friday       1      1      36     284     714   1152     935    1005     739    626    384   
## 7Saturday     1     18      56     294     760    839    1115     843     621    345    468   
## 1Sunday       0     15      35     214     644    686     498     420     315    269    170   
## 2Monday       1     22      74     374     568    744     559     338     288    210    160   
## 3Tuesday      4     16     149     382    1038   1044    1172     909     693    627    545   
## 4Wednesday    0     34     186     670    1034    842     837     795     649    494          
## 5Thursday     2     43     183     652    1103   1029     727     674     539    428

Table 1. The numbers of deaths arranged by day of the week

Figure 1. The numbers of deaths plotted by day of the week

Next, in Table 2 and Figure 2, we express each figure as a ratio, by dividing it by the figure from exactly one week earlier. These ratios are not affected by day-of-the-week effects, and they also incorporate a whole week of change, so are likely to be more reliable indicators. A ratio is shown as Inf where the figure one week earlier was zero. The obvious pattern is that the ratios are high to begin with and gradually get lower. The grey line at a ratio of one separates ratios indicating that numbers are increasing from those that indicate a decrease.

##            13 Mar- 20 Mar- 27 Mar- 3 Apr- 10 Apr- 17 Apr- 24 Apr- 1 May- 8 May- 15 May-
## 6Friday     1.00   36.00    7.89    2.51   1.61    0.81    1.07    0.74   0.85   0.61  
## 7Saturday  18.00    3.11    5.25    2.59   1.10    1.33    0.76    0.74   0.56   1.36  
## 1Sunday      Inf    2.33    6.11    3.01   1.07    0.73    0.84    0.75   0.85   0.63  
## 2Monday    22.00    3.36    5.05    1.52   1.31    0.75    0.60    0.85   0.73   0.76  
## 3Tuesday    4.00    9.31    2.56    2.72   1.01    1.12    0.78    0.76   0.90   0.87  
## 4Wednesday   Inf    5.47    3.60    1.54   0.81    0.99    0.95    0.82   0.76         
## 5Thursday  21.50    4.26    3.56    1.69   0.93    0.71    0.93    0.80   0.79

Table 2. The ratio of number of deaths to the preceding same day of the week

Figure 2. The ratios plotted against time
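The week-on-week ratios in Table 2 can be reproduced from the tallies in Table 1. The following is a minimal Python sketch (the original analysis used R); the function name `weekly_ratios` is an illustrative choice, and the Friday series is copied from Table 1.

```python
import math

# Friday tallies from Table 1, for the weeks beginning 6 Mar to 15 May.
friday = [1, 1, 36, 284, 714, 1152, 935, 1005, 739, 626, 384]


def weekly_ratios(tallies):
    """Divide each tally by the tally for the same weekday one week earlier."""
    ratios = []
    for previous, current in zip(tallies, tallies[1:]):
        if previous == 0:
            # Matches the Inf entries in Table 2; 0/0 is left undefined.
            ratios.append(math.inf if current > 0 else math.nan)
        else:
            ratios.append(current / previous)
    return ratios


for r in weekly_ratios(friday):
    print(f"{r:.2f}")
```

Rounded to two decimal places, the Friday ratios should agree with the first row of Table 2.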

How quickly will the death rates fall?

The ratios currently seem to be between 0.7 and 0.9, so we can ask how quickly the numbers will decrease. Taking the base-2 logarithm of each ratio gives the number of doublings (if positive) or halvings (if negative) per week, shown in Table 3 and Figure 3. In the early days, there were large positive figures, indicating more than two doublings per week, so more than four times as many people were dying at the end of the week as at the beginning. This was a very steep ascent at the beginning of the epidemic.

Later, the number of doublings per week came down, and did so gradually, eventually becoming negative on 15th April (apart from occasional reversions). Once negative, we can take the signs off, and think of them as the number of halvings in a week. For the figures to drop as fast as they rose, we would need the number of halvings per week to rise to more than 2, to match the number of doublings per week of 2. However, the number of halvings hovers around 0.3 to 0.4, suggesting it would take roughly 2.5 to 3.3 weeks to halve the number of deaths. This clearly indicates a very slow tailing off, certainly compared to the very rapid initial increase.

The slowness of decline must be the result of ineffectiveness in our lockdown. Infectious people are still meeting susceptible people, and passing the infection on. These figures don’t tell us whether these are key workers who use public transport, health workers and hospital patients, care home workers and residents, or people not obeying the lockdown restrictions. Or, indeed, it is possible that non-key workers who are obeying the lockdown restrictions are still not sufficiently protected. I am sure someone connected to SAGE is studying these important questions, with more informative data.

These figures can also be used to note that a delay of one week in imposing the lockdown, at a time when there were two doublings per week, would result in an extension of over four weeks now, to get down to any given level. The sharp rise and slow fall makes those early decisions look very important.
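The arithmetic behind this claim can be sketched as follows, in Python rather than the R used for the original analysis. The function name is hypothetical, and the rates are the round figures quoted in the text (two doublings per week on the way up, 0.3 to 0.4 halvings per week on the way down), not fitted values.

```python
def extra_weeks_of_decline(delay_weeks, doublings_per_week, halvings_per_week):
    """Weeks of decline needed to undo the growth that occurs during a delay."""
    extra_doublings = delay_weeks * doublings_per_week
    # Each doubling gained during the delay must be cancelled by one halving.
    return extra_doublings / halvings_per_week


for halvings in (0.4, 0.3):
    weeks = extra_weeks_of_decline(1, 2, halvings)
    print(f"{halvings} halvings per week: about {weeks:.1f} extra weeks")
```

With these round figures, a one-week delay costs about five to seven extra weeks of decline, consistent with the "over four weeks" stated above.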

##            13 Mar- 20 Mar- 27 Mar- 3 Apr-  10 Apr- 17 Apr- 24 Apr- 1 May-  8 May-  15 May-
## 6Friday     0.0000  5.1699  2.9798  1.3300  0.6901 -0.3011  0.1042 -0.4435 -0.2394 -0.7051
## 7Saturday   4.1699  1.6374  2.3923  1.3702  0.1427  0.4103 -0.4034 -0.4409 -0.8480  0.4399
## 1Sunday        Inf  1.2224  2.6122  1.5894  0.0911 -0.4621 -0.2458 -0.4150 -0.2277 -0.6621
## 2Monday     4.4594  1.7500  2.3374  0.6029  0.3894 -0.4125 -0.7258 -0.2310 -0.4557 -0.3923
## 3Tuesday    2.0000  3.2192  1.3583  1.4422  0.0083  0.1669 -0.3666 -0.3914 -0.1444 -0.2022
## 4Wednesday     Inf  2.4517  1.8489  0.6260 -0.2963 -0.0086 -0.0743 -0.2927 -0.3937        
## 5Thursday   4.4263  2.0894  1.8330  0.7585 -0.1002 -0.5012 -0.1092 -0.3225 -0.3327

Table 3. The doublings/halvings per week, based on the ratios in Table 2. It is the number of doublings in a week if positive, and the number of halvings in a week if negative.

Figure 3. The doublings/halvings per week plotted against time. For the epidemic to reduce at the same rate it increased, the later figures would have to reach the same magnitude as the earlier figures, but be negative. Unfortunately, the early doublings per week are high, being greater than 2 for a number of weeks. The halvings per week reach only low numbers, and indicate a long “shoulder” to the decline in deaths.
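The entries in Table 3 are the base-2 logarithms of the ratios in Table 2, which can be checked against the tallies in Table 1. The sketch below is in Python (the original analysis used R), and the function names are illustrative.

```python
import math


def doublings_per_week(ratio):
    """Positive: doublings per week. Negative: halvings per week."""
    return math.log2(ratio)


def weeks_to_halve(halvings_per_week):
    """Time in weeks for the figures to halve at a steady rate of decline."""
    return 1.0 / halvings_per_week


# Friday 15 May against Friday 8 May, tallies taken from Table 1.
print(round(doublings_per_week(384 / 626), 4))

# The 0.3 to 0.4 halvings per week quoted in the text.
print(weeks_to_halve(0.4), weeks_to_halve(0.3))
```

Note that `math.log2` returns infinity for an infinite ratio, matching the Inf entries in Table 3, but raises an error for a ratio of zero, a case that does not arise in these tables.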

Very rough estimates of the number of remaining deaths

If we assume that the ratio of one week’s figure to the previous week’s is really fixed, and that the variations we see in the tables are the result of random fluctuations, then we can estimate how many deaths are still to come in this wave of the epidemic. The estimates also assume no change in the lockdown conditions, as any such change would almost certainly alter the ratio.

One method for estimation is to analyse a sequence of days’ data, allowing for day-of-the-week effects, and estimating the rate of decline in numbers. That rate of decline estimates the ratio \(r\) from one week to the next, and comes with a standard error to indicate the imprecision in the estimate. The remaining number of deaths can then be estimated from the number of deaths in the most recent week \(D\) (note that here I always use the most recent week up until the day of the analysis, even when the rate is estimated using data from longer ago). The number next week is expected to be \(r D\), then \(r^2 D\), \(r^3 D\), and so on. Summing this geometric progression, which converges only when \(r < 1\), gives the estimate of all future deaths as \[ \frac{r D}{1-r}\, . \] 95% confidence intervals on the rate of decline give 95% confidence intervals on the ratio, which can be used to provide a 95% prediction interval on the expected future number of deaths in the first wave.
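The calculation can be sketched in Python as follows (the original analysis used R, and this is not the author's code). Several things here are assumptions made for illustration: the function names, the use of a normal-approximation interval with multiplier 1.96 on the log of the weekly ratio, and the example values of the ratio and its standard error. The value of \(D\) is the sum of the seven most recent tallies in Table 1 (13th to 19th May).

```python
import math


def remaining_deaths(r, recent_week_deaths):
    """Sum of r*D + r^2*D + ... = r*D / (1 - r); unbounded when r >= 1."""
    if r >= 1:
        return math.inf
    return r * recent_week_deaths / (1 - r)


def remaining_interval(log_ratio, se, recent_week_deaths, z=1.96):
    """Map an interval for log(r) onto an interval for remaining deaths.

    Assumption: log_ratio and se are on the log scale of the weekly
    ratio (for a slope per day, multiply both by 7 first).
    """
    lo_r = math.exp(log_ratio - z * se)
    hi_r = math.exp(log_ratio + z * se)
    return (remaining_deaths(lo_r, recent_week_deaths),
            remaining_deaths(hi_r, recent_week_deaths))


D = 494 + 428 + 384 + 468 + 170 + 160 + 545  # 13th to 19th May, from Table 1

# Illustrative values only: r = 0.8 with a standard error of 0.05 on log(r).
print(remaining_deaths(0.8, D))
print(remaining_interval(math.log(0.8), 0.05, D))
```

Because the estimate grows without bound as \(r\) approaches 1, an upper confidence limit for \(r\) that is close to 1 produces a very large upper limit for the remaining deaths.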

Note that these prediction intervals do not take all the uncertainties into account. The most recent week’s deaths are also subject to error in a sense. Further, the same figures that are used to calculate the sum of the most recent week’s deaths are often also used in the estimation of the rate. I can see no simple way to take these additional factors into account, but they emphasise the need for caution in interpreting the prediction intervals. A second point along the same lines is that these are approximate prediction intervals, but even if they are exactly right, the actual number of deaths will have random fluctuations around those predictions.

This exercise has been carried out in Tables 4 and 5. It has already been stressed that the estimates depend on the ratio not changing. The prediction intervals are made wide by our uncertainty in estimating the decline. Tables 4 and 5 use different sequences of days, but all the sequences are fairly short, so the estimate of the rate of decline is rather fuzzy. It is not so long since the deaths started to decline, so it is not surprising that we can’t estimate the rate sharply. Tables 4 and 5 further restrict their use of data to the period after the sudden change in day-of-the-week effects between 17th and 18th April. Allowing parameters for that change would effectively reduce the number of datapoints, which could only be made up for by going more than one week further back, but by then the epidemic was only just turning round. There is a tension between including more data to get more accurate estimates, on the one hand, and using only recent data so that we are more sure the evidence is relevant to future rates, on the other.

##          API.lo    API.hi      sc.f.      t-Sq?
## 18-Apr 6844.454  19936.28  7.0150552 0.56099176
## 19-Apr 6731.801  20355.67  7.1633722 0.98257898
## 20-Apr 6490.740  18873.08  7.1149217 0.97558955
## 21-Apr 7825.168  20123.19  4.3869476 0.49964421
## 22-Apr 7717.605  20545.62  4.4686375 0.13739225
## 23-Apr 7514.097  14318.33  2.5231255 0.73535093
## 24-Apr 7624.732  10694.00  0.8339079 0.05209972
## 25-Apr 8313.584  12451.38  0.9500947 0.04131417
## 26-Apr 6046.650  14879.12  5.1249463 0.41444900
## 27-Apr 6193.441  16292.86  5.3420552 0.50861382
## 28-Apr 5975.969  14942.61  5.1551607 0.58390094
## 29-Apr 6343.222  20933.84  6.3437193 0.81556628
## 30-Apr 6035.233  19162.34  6.3122673 0.79054643
## 1-May  5961.022  19180.28  6.2951313 0.66683974
## 2-May  4831.718  14402.04  7.4656563 0.79656235
## 3-May  6095.380 637284.22 13.0033320 0.66042357
## 4-May  5498.142 173300.95 14.2251922 0.87418016
## 5-May  5573.556 276067.68 14.0334104 0.94406505
## 6-May  5399.205 129779.57 13.6409687 0.65372086

Table 4. Minimum approximate prediction intervals for expected remaining number of deaths based on estimate and SE of time-slope in a log-linear quasi-Poisson model. Each row is based on data from a fortnight beginning on the day specified on each row, ending with the most recent fortnight. The third column gives the scale factor in the quasi-Poisson model, and the fourth the p-value for adding a squared term in time.

##          API.lo    API.hi     sc.f.     t-Sq?
## 18-Apr 8642.212  11703.71  7.861726 0.3635163
## 19-Apr 8853.009  12322.73  7.697164 0.6897320
## 20-Apr 8766.064  12439.17  8.045540 0.6645400
## 21-Apr 9176.119  13039.01  7.129806 0.9260520
## 22-Apr 9016.989  13456.41  7.484140 0.8947869
## 23-Apr 8547.834  12664.91  6.943987 0.4957216
## 24-Apr 8133.291  12017.65  6.531146 0.1160096
## 25-Apr 8278.543  12837.71  6.614718 0.2032218
## 26-Apr 8552.476  14034.30  6.471509 0.4318464
## 27-Apr 8597.472  14764.95  6.754132 0.5489279
## 28-Apr 8473.870  15245.57  7.232826 0.5629331
## 29-Apr 7904.978  15688.54  7.705796 0.4356090
## 30-Apr 7651.899  16707.79  8.347606 0.3819585
## 1-May  7383.038  18089.90  9.105290 0.3337770
## 2-May  7262.094  21764.17  9.858639 0.3875308
## 3-May  7345.729  30944.40 10.368099 0.5959861
## 4-May  7151.550  42269.05 11.521930 0.6823199
## 5-May  7210.148  84309.11 12.449614 0.8403306
## 6-May  5399.205 129779.57 13.640969 0.6537209

Table 5. As Table 4, but the data used always ends with the most recent day, while the starting day gets one day later with each row. The final row uses exactly a fortnight’s data. This uses more data than Table 4, but it is not balanced over days of the week and, by virtue of being longer, the periods are more likely to have substantial mis-specification.

Table 4 uses exactly one fortnight’s data in each row, but looks at different fortnights back to 18th April. This exactly balances day of the week effects, but keeps the period short and so keeps the uncertainty high. Table 5 uses all the data from a starting point up to the most recent data, and varies that starting point from 18th April up until a fortnight ago. Table 5 thus uses more data, and the prediction intervals should be narrower, especially higher in the table. If the rate of decline is still changing over time, this strictly invalidates the models behind both tables, but Table 4 will be less affected than Table 5, because it discards older data.
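The two windowing schemes can be sketched as follows, in Python rather than the R used for the original analysis. The function names are hypothetical; the dates are those given in the text, with the most recent day taken as 19th May 2020, the date of the document.

```python
from datetime import date, timedelta


def fortnight_windows(first_start, last_day):
    """Table 4 style: every 14-day window, up to the one ending on last_day."""
    windows = []
    start = first_start
    while start + timedelta(days=13) <= last_day:
        windows.append((start, start + timedelta(days=13)))
        start += timedelta(days=1)
    return windows


def expanding_windows(first_start, last_day):
    """Table 5 style: same start days, but every window ends on last_day."""
    return [(start, last_day)
            for start, _ in fortnight_windows(first_start, last_day)]


first_start = date(2020, 4, 18)
last_day = date(2020, 5, 19)

table4 = fortnight_windows(first_start, last_day)
table5 = expanding_windows(first_start, last_day)
print(len(table4), table4[0], table4[-1])
print(len(table5), table5[0], table5[-1])
```

Both schemes give one window per starting day from 18th April to 6th May, and the final window is the same fortnight in each, which is why the last rows of Tables 4 and 5 agree.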

The slow rate of decline sadly predicts many more deaths. On 10th May, the prediction intervals ranged from about 7,000 to over 25,000. These predictions should go down over time, because the future becomes the past, and the remaining deaths should go down. On 17th May, recent poor figures extended the prediction interval to over 100,000. By 19th May, the fortnight analysis and the analysis of all days since the change in day-of-the-week pattern both show a 95% confidence interval with an upper limit of over 100,000. Two poor days (this term detracts from the personal tragedy of each death) have made the picture much worse.

The final conclusion is an obvious and very gloomy one, that the epidemic is declining very slowly from a considerable height. We would do well to impose a lockdown much sooner in future waves, and also to take steps if possible to make the lockdown more rigorous, so that the decline happens more quickly.

Some technical points

I end with some technical points about Tables 4 and 5. The scale factor measures the degree of over-dispersion. This is high in the higher parts of both tables, and the reason can be seen in the variability of the ratios in the final columns of Table 2. These ratios seem to settle down and become less variable. In the analyses beginning on 24th April, the scale factor suddenly plunges below 1, and stays there for two days before going right back up. There is clearly some lumpiness that is not determined by day of the week, and I wonder about the cause. It seems that when there are no lumps, or possibly when the lumps all occur on matching days of the week, the variability is close to Poisson, but just one lump, or one mis-matched lump, puts the scale factor up to 5.
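For readers unfamiliar with the scale factor, the sketch below shows one common way of estimating it in a quasi-Poisson model: the Pearson chi-squared statistic divided by the residual degrees of freedom. This is written in Python (the original analysis used R), and it is an assumption that the scale factors in Tables 4 and 5 were computed in this way. The function name and the example numbers are hypothetical and are not taken from the fitted models.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-squared divided by residual degrees of freedom.

    A value near 1 is consistent with Poisson variability; values well
    above 1 indicate over-dispersion.
    """
    residual_df = len(observed) - n_params
    if residual_df <= 0:
        raise ValueError("need more observations than parameters")
    chi_squared = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_squared / residual_df


# Hypothetical counts and fitted means, for illustration only.
observed = [500, 430, 390, 470, 170, 160, 540]
fitted = [480.0, 450.0, 400.0, 420.0, 190.0, 170.0, 520.0]
print(round(pearson_dispersion(observed, fitted, n_params=2), 2))
```

A single day that sits far from its fitted value contributes a large term to the sum, which is one way a lone "lump" could raise the scale factor sharply.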

The final column in Tables 4 and 5 looks at evidence for non-linearity (on the log scale) in the effect of time. The value is a p-value testing for a quadratic effect. With multiple p-values, it is important not to get too excited over occasional values less than 0.05. The two “significant” values on 24th and 25th April correspond to two disappointing ratios on 7th and 8th May, so it looks as though the possible quadratic term would be reducing the rate of decline. The general reason for providing this p-value is that it is one way of looking for mis-specification of the linear model from which the estimate and confidence interval are taken. At some speeds of change, the older Table 5 results would become unreliable and this could be indicated by the p-value for non-linearity.

Alan Grafen

2020-05-19
