Preliminary analyses on data from the DHSC coronavirus deaths archive

The Department of Health and Social Security (DHSC) changed its reporting methods for coronavirus deaths on 29th April 2020, including providing on each “publication day” figures for cumulative and new deaths on each previous day. In analyses of the pre-29th April data, I found there were strong day-of-the-week effects, and also that these changed on a particular day. The previous analyses were mainly interested in whether and when the epidemic would shrink rather than grow. Now, the emphasis is on how fast is the epidemic declining, and how many deaths are likely to occur by the end of this first wave.

Before carrying out the main analyses, I made various checks on the data, and these are presented here in this document. They are

Whether there are many revisions of data. It turns out not, and so it is reasonable to analyse just the most recent publication day’s account of the historical record.
Whether there are day-of-the-week effects – there are. Whether they change over time – they do. It turns out that they change suddenly between Friday 17th April and Saturday 18th April. The implication is that without making lots of assumptions, the only practical way to look at the future is to ignore data before 18th April.

Now you know the conclusions, you probably don’t want to know how I reached them. Turn back now!

Is the historical record being changed frequently?

First, all the data is read in. If repeating this exercise, you can use my function get_data_from_folder if you point it at a directory that contains only the .csv versions of the DHSC death data from 29th April to the present. The code reads it into to a single data-frame, adds ‘day’ for the day of the deaths reported, ‘pubday’ for the day the dataset was published, and then prints out all the cases where data has changed. There is by 10th May only one, which was probably a mistake in transmission of a number (confusing a 6 and a 9 and requiring an adjustment of 30).

get_data_from_folder<-function(dir,cols,text="Select one of the data files",addIDcol=FALSE,IDcolName="filename"){
    
ordir<-getwd()

if (missing(dir)) {
    message("before flush.console\n")
    cat("\n")}

# message("before flush.console")

# This crashed R    
# flush.console()

# message("after flush.console")

if (missing(dir)) {
    a<-file.choose()
    dir<-dirname(a)}

files<-list.files(dir,no..=TRUE)

setwd(dir)

# Second read each one in turn, and do the small tasks

for (i in files) {
    cat(i)
    resin<-read.csv(i)[,cols]
    if (addIDcol) {resin[,ncol(resin)+1]<-i; if (is.element(IDcolName,colnames(resin))) stop("IDcolName is already a column name in the datasets"); colnames(resin)[ncol(resin)]<-IDcolName}
    if (i!=files[1]) resdf<-rbind(resdf,resin) else resdf<-resin
}

# Third, remove the unneeded original copy of the last file and dataframe, and reset wd

remove(resin)
setwd(ordir)
return(resdf)}

## This line gets the data for me -- it is commented out so if necessary, it will prompt you for the relevant folder,
##   which should contain DHSC datafiles from from after the 29th April 2020
deathspost<-get_data_from_folder("~/OneDrive - Nexus365/Coronavirus stuff/Data/deaths/posteq29",addIDcol=TRUE)

## coronavirus-deaths_202004292132.csvcoronavirus-deaths_202004301702.csvcoronavirus-deaths_202005011647.csvcoronavirus-deaths_202005021621.csvcoronavirus-deaths_202005031513.csvcoronavirus-deaths_202005041501.csvcoronavirus-deaths_202005051554.csvcoronavirus-deaths_202005061549.csvcoronavirus-deaths_202005071534.csvcoronavirus-deaths_202005081443.csvcoronavirus-deaths_202005091431.csvcoronavirus-deaths_202005101500.csvcoronavirus-deaths_202005111457.csvcoronavirus-deaths_202005121454.csvcoronavirus-deaths_202005131522.csvcoronavirus-deaths_202005141344.csvcoronavirus-deaths_202005151534.csvcoronavirus-deaths_202005161352.csvcoronavirus-deaths_202005171653.csvcoronavirus-deaths_202005181505.csvcoronavirus-deaths_202005191711.csv

## Adding day and pubday - this converts the dates within the files to days counting from 6th March 2020, and converts
##  the filenames into pubdays i.e. publication days. Days are the day on which the deaths were reported, while pubday
##  is the the datafile was released. The point of the distinction is that sometimes a figure is revised after first
##  publication, and this is seen on later pubdays.
add.days<-function(daf){
    
repyear<-with(daf,as.numeric(substr(Reporting.date,1,4)))
repmonth<-with(daf,as.numeric(substr(Reporting.date,6,7)))
repday<-with(daf,as.numeric(substr(Reporting.date,9,10)))
day<-repday-5
day<-ifelse(repmonth>=4,day+31,day)
day<-ifelse(repmonth>=5,day+30,day)
day<-ifelse(repmonth>=6,day+31,day)
day<-ifelse(repmonth>=7,day+30,day)
day<-ifelse(repmonth>=8,day+31,day)
daf$day<-day

pyear<-with(daf,as.numeric(substr(filename,20,23)))
pmonth<-with(daf,as.numeric(substr(filename,24,25)))
pday<-with(daf,as.numeric(substr(filename,26,27)))
pubday<-pday-5
pubday<-ifelse(pmonth>=4,pubday+31,pubday)
pubday<-ifelse(pmonth>=5,pubday+30,pubday)
pubday<-ifelse(pmonth>=6,pubday+31,pubday)
pubday<-ifelse(pmonth>=7,pubday+30,pubday)
pubday<-ifelse(pmonth>=8,pubday+31,pubday)
daf$pubday<-pubday

return(daf)

}

deathspost<-add.days(deathspost)

myxt<-xtabs(~day+pubday+Area.name,data=deathspost)
max(myxt)

## [1] 1

min(myxt)

## [1] 0

sum(myxt)

## [1] 6111

dim(deathspost)

## [1] 6111    9

## This should be max=1, min=0.

## This looks for variation in the reports for one day, across different publication days. And then looks
##  for the datapoints that show the difference. At the time of writing, there is only one large revision
##   in all the datafiles, but of course it persists through all subsequent publication days. There are two
##   small changes in Northern Ireland, where it looks like one death was initially misallocated by one day.
myag<-aggregate(deathspost$Daily.change.in.deaths,by=with(deathspost,list(day=day,Area.name=Area.name)),FUN=sd)
myag0<-subset(myag,x!=0)
print(myag0)

##     day        Area.name         x
## 126  65 Northern Ireland 0.4045199
## 127  66 Northern Ireland 0.3162278
## 257  55   United Kingdom 6.5465367
## 267  65   United Kingdom 0.4045199

for (j in (1:nrow(myag0))) print(subset(deathspost,day==myag0$day[j] & Area.name==myag0$Area.name[j]))

##             Area.name Area.code Area.type Reporting.date Daily.change.in.deaths
## 2638 Northern Ireland N92000002    Nation     2020-05-09                      5
## 2934 Northern Ireland N92000002    Nation     2020-05-09                      5
## 3235 Northern Ireland N92000002    Nation     2020-05-09                      4
## 3541 Northern Ireland N92000002    Nation     2020-05-09                      4
## 3852 Northern Ireland N92000002    Nation     2020-05-09                      4
## 4168 Northern Ireland N92000002    Nation     2020-05-09                      4
## 4489 Northern Ireland N92000002    Nation     2020-05-09                      4
## 4815 Northern Ireland N92000002    Nation     2020-05-09                      4
## 5146 Northern Ireland N92000002    Nation     2020-05-09                      4
## 5482 Northern Ireland N92000002    Nation     2020-05-09                      4
## 5823 Northern Ireland N92000002    Nation     2020-05-09                      4
##      Cumulative.deaths                            filename day pubday
## 2638               427 coronavirus-deaths_202005091431.csv  65     65
## 2934               427 coronavirus-deaths_202005101500.csv  65     66
## 3235               426 coronavirus-deaths_202005111457.csv  65     67
## 3541               426 coronavirus-deaths_202005121454.csv  65     68
## 3852               426 coronavirus-deaths_202005131522.csv  65     69
## 4168               426 coronavirus-deaths_202005141344.csv  65     70
## 4489               426 coronavirus-deaths_202005151534.csv  65     71
## 4815               426 coronavirus-deaths_202005161352.csv  65     72
## 5146               426 coronavirus-deaths_202005171653.csv  65     73
## 5482               426 coronavirus-deaths_202005181505.csv  65     74
## 5823               426 coronavirus-deaths_202005191711.csv  65     75
##             Area.name Area.code Area.type Reporting.date Daily.change.in.deaths
## 2929 Northern Ireland N92000002    Nation     2020-05-10                      3
## 3230 Northern Ireland N92000002    Nation     2020-05-10                      4
## 3536 Northern Ireland N92000002    Nation     2020-05-10                      4
## 3847 Northern Ireland N92000002    Nation     2020-05-10                      4
## 4163 Northern Ireland N92000002    Nation     2020-05-10                      4
## 4484 Northern Ireland N92000002    Nation     2020-05-10                      4
## 4810 Northern Ireland N92000002    Nation     2020-05-10                      4
## 5141 Northern Ireland N92000002    Nation     2020-05-10                      4
## 5477 Northern Ireland N92000002    Nation     2020-05-10                      4
## 5818 Northern Ireland N92000002    Nation     2020-05-10                      4
##      Cumulative.deaths                            filename day pubday
## 2929               430 coronavirus-deaths_202005101500.csv  66     66
## 3230               430 coronavirus-deaths_202005111457.csv  66     67
## 3536               430 coronavirus-deaths_202005121454.csv  66     68
## 3847               430 coronavirus-deaths_202005131522.csv  66     69
## 4163               430 coronavirus-deaths_202005141344.csv  66     70
## 4484               430 coronavirus-deaths_202005151534.csv  66     71
## 4810               430 coronavirus-deaths_202005161352.csv  66     72
## 5141               430 coronavirus-deaths_202005171653.csv  66     73
## 5477               430 coronavirus-deaths_202005181505.csv  66     74
## 5818               430 coronavirus-deaths_202005191711.csv  66     75
##           Area.name Area.code Area.type Reporting.date Daily.change.in.deaths
## 4    United Kingdom K02000001        UK     2020-04-29                    765
## 250  United Kingdom K02000001        UK     2020-04-29                    795
## 501  United Kingdom K02000001        UK     2020-04-29                    795
## 757  United Kingdom K02000001        UK     2020-04-29                    795
## 1018 United Kingdom K02000001        UK     2020-04-29                    795
## 1284 United Kingdom K02000001        UK     2020-04-29                    795
## 1555 United Kingdom K02000001        UK     2020-04-29                    795
## 1831 United Kingdom K02000001        UK     2020-04-29                    795
## 2112 United Kingdom K02000001        UK     2020-04-29                    795
## 2398 United Kingdom K02000001        UK     2020-04-29                    795
## 2689 United Kingdom K02000001        UK     2020-04-29                    795
## 2985 United Kingdom K02000001        UK     2020-04-29                    795
## 3286 United Kingdom K02000001        UK     2020-04-29                    795
## 3592 United Kingdom K02000001        UK     2020-04-29                    795
## 3903 United Kingdom K02000001        UK     2020-04-29                    795
## 4219 United Kingdom K02000001        UK     2020-04-29                    795
## 4540 United Kingdom K02000001        UK     2020-04-29                    795
## 4866 United Kingdom K02000001        UK     2020-04-29                    795
## 5197 United Kingdom K02000001        UK     2020-04-29                    795
## 5533 United Kingdom K02000001        UK     2020-04-29                    795
## 5874 United Kingdom K02000001        UK     2020-04-29                    795
##      Cumulative.deaths                            filename day pubday
## 4                26097 coronavirus-deaths_202004292132.csv  55     55
## 250              26097 coronavirus-deaths_202004301702.csv  55     56
## 501              26097 coronavirus-deaths_202005011647.csv  55     57
## 757              26097 coronavirus-deaths_202005021621.csv  55     58
## 1018             26097 coronavirus-deaths_202005031513.csv  55     59
## 1284             26097 coronavirus-deaths_202005041501.csv  55     60
## 1555             26097 coronavirus-deaths_202005051554.csv  55     61
## 1831             26097 coronavirus-deaths_202005061549.csv  55     62
## 2112             26097 coronavirus-deaths_202005071534.csv  55     63
## 2398             26097 coronavirus-deaths_202005081443.csv  55     64
## 2689             26097 coronavirus-deaths_202005091431.csv  55     65
## 2985             26097 coronavirus-deaths_202005101500.csv  55     66
## 3286             26097 coronavirus-deaths_202005111457.csv  55     67
## 3592             26097 coronavirus-deaths_202005121454.csv  55     68
## 3903             26097 coronavirus-deaths_202005131522.csv  55     69
## 4219             26097 coronavirus-deaths_202005141344.csv  55     70
## 4540             26097 coronavirus-deaths_202005151534.csv  55     71
## 4866             26097 coronavirus-deaths_202005161352.csv  55     72
## 5197             26097 coronavirus-deaths_202005171653.csv  55     73
## 5533             26097 coronavirus-deaths_202005181505.csv  55     74
## 5874             26097 coronavirus-deaths_202005191711.csv  55     75
##           Area.name Area.code Area.type Reporting.date Daily.change.in.deaths
## 2639 United Kingdom K02000001        UK     2020-05-09                    346
## 2935 United Kingdom K02000001        UK     2020-05-09                    346
## 3236 United Kingdom K02000001        UK     2020-05-09                    345
## 3542 United Kingdom K02000001        UK     2020-05-09                    345
## 3853 United Kingdom K02000001        UK     2020-05-09                    345
## 4169 United Kingdom K02000001        UK     2020-05-09                    345
## 4490 United Kingdom K02000001        UK     2020-05-09                    345
## 4816 United Kingdom K02000001        UK     2020-05-09                    345
## 5147 United Kingdom K02000001        UK     2020-05-09                    345
## 5483 United Kingdom K02000001        UK     2020-05-09                    345
## 5824 United Kingdom K02000001        UK     2020-05-09                    345
##      Cumulative.deaths                            filename day pubday
## 2639             31587 coronavirus-deaths_202005091431.csv  65     65
## 2935             31587 coronavirus-deaths_202005101500.csv  65     66
## 3236             31586 coronavirus-deaths_202005111457.csv  65     67
## 3542             31586 coronavirus-deaths_202005121454.csv  65     68
## 3853             31586 coronavirus-deaths_202005131522.csv  65     69
## 4169             31586 coronavirus-deaths_202005141344.csv  65     70
## 4490             31586 coronavirus-deaths_202005151534.csv  65     71
## 4816             31586 coronavirus-deaths_202005161352.csv  65     72
## 5147             31586 coronavirus-deaths_202005171653.csv  65     73
## 5483             31586 coronavirus-deaths_202005181505.csv  65     74
## 5824             31586 coronavirus-deaths_202005191711.csv  65     75

#subset(deathspost,day==55 & Area.name=="United Kingdom")

This is a low enough rate of correction that it is fine to analyse just the most recent day’s data.

Day-of-the-week effects

In this section, I fit quasi-Poisson log-linear models to predict the number of deaths from

day (t), which runs from 1 on 6th March 2020 and increases each day. t is a treated as a continuous variable
day_of_the_week (dow), which is a factor with seven levels
variables (breakpoint1, breakpoint2, etc) that allow the straight line (on a log scale) against day to have kinks at a prespecified set of breakpoints. Thus, I am fitting linear splines against t. I try various intervals, but it turns out one kink a week is enough.

lastf<-deathspost$filename[nrow(deathspost)]
dft<-subset(deathspost,filename==lastf)
dft$filename<-NULL
dft<-subset(dft,Area.type=="UK")
Deaths<-dft$Daily.change.in.deaths[nrow(dft):1]
## We now have all the data we are carrying forward in Deaths

ndays<-100
downames<-c("1Sunday","2Monday","3Tuesday","4Wednesday","5Thursday","6Friday","7Saturday")[c(6:7,1:5)]
dow<-factor(rep(downames,length=ndays))
t<-c(1:ndays)
week<-rep(1:100,each=7,length=ndays)
weeknames<-c("6 Mar-","13 Mar-","20 Mar-","27 Mar-","3 Apr-","10 Apr-","17 Apr-","24 Apr-","1 May-","8 May-", "15 May-")

fulldf<-data.frame(t=t,dow=dow,week=week)
df<-fulldf[1:length(Deaths),]
df$Deaths<-Deaths
df$wts<-1  

rm(Deaths,t,dow,week)

## This fits a log-linear model, allowing splines, that is, the straight line for time is allowed to change 
##  direction at each of the "breaklocations". It also uses quasi-Poisson not Poisson, and allows 
##  some variables to be added *after* the spline variables. This allows us to test for a variable,
##  controlling for all the spline variables. The model terms to go in after the spline variables go in "model2"
fap4.pq<-function(model,model2,from=1,to=nrow(df),breaklocations,outp,control){
    dftemp<-df[c(from:to),]
    if (!missing(control)) mycontrol<-control else mycontrol<-glm.control()
    m<-glm(model,family=quasipoisson,weights=wts,data=dftemp, control=mycontrol) 
    if (missing(breaklocations)) stop("fap4.p requires breaklocations to be specified")
    modeldev<-"~."
    for (ix in 1:length(breaklocations)) {
    dftemp[,ncol(dftemp)+1]<-with(dftemp,ifelse(t>=breaklocations[ix],(t-breaklocations[ix]),0))
    varname<-paste0("breakvar",as.character(ix))
    colnames(dftemp)[ncol(dftemp)]<-varname
    modeldev<-paste0(modeldev,"+",varname)
    }   
    if (!missing(model2)) modeldev<-paste0(modeldev,"+",model2)
    m<-update(m,modeldev)
    m$st.res<-(m$y - m$fitted.values)/sqrt(m$fitted.values)
    if (!missing(outp)) if(outp) {
        print(anova(m,test="Chisq"))
        print(summary(m))
        plot(dftemp$t,m$st.res)
        }
    return(m)
}


## First, we ask how many breaks are need to adequately represent the changes in Death rate. Up until 14th May,
##  weekly splines were significant, but the p-value on that day became greater than 0.05. It's clear that
##  there is change over the period, as the epidemic has grown and shrunk. However, the non-significant later
##  kinks in the spline are outweighing the highly significant early kinks. After the kink-frequency ANOVA table, 
##  I'll show the significance of the individual kinks in the order fitted (with the ANOVA table) and then on 
##  a fully-adjusted basis (with the coefficients table in the summary).
##  
mf3.5<-fap4.pq(Deaths~dow+t,breaklocations=seq(3.5,nrow(df)-2,by=3.5))
mf7<-fap4.pq(Deaths~dow+t,breaklocations=seq(7,nrow(df)-2,by=7))
mf14<-fap4.pq(Deaths~dow+t,breaklocations=seq(14,nrow(df)-2,by=14))
anova(mf14,mf7,mf3.5,test="F")

## Analysis of Deviance Table
## 
## Model 1: Deaths ~ dow + t + breakvar1 + breakvar2 + breakvar3 + breakvar4 + 
##     breakvar5
## Model 2: Deaths ~ dow + t + breakvar1 + breakvar2 + breakvar3 + breakvar4 + 
##     breakvar5 + breakvar6 + breakvar7 + breakvar8 + breakvar9 + 
##     breakvar10
## Model 3: Deaths ~ dow + t + breakvar1 + breakvar2 + breakvar3 + breakvar4 + 
##     breakvar5 + breakvar6 + breakvar7 + breakvar8 + breakvar9 + 
##     breakvar10 + breakvar11 + breakvar12 + breakvar13 + breakvar14 + 
##     breakvar15 + breakvar16 + breakvar17 + breakvar18 + breakvar19 + 
##     breakvar20
##   Resid. Df Resid. Dev Df Deviance      F  Pr(>F)  
## 1        62     904.09                             
## 2        57     755.44  5   148.65 2.2523 0.06447 .
## 3        47     627.69 10   127.75 0.9678 0.48334  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(mf7,test="F")

## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##            Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                          74    24887.1                       
## dow         6   2239.7        68    22647.4  28.6334 1.629e-15 ***
## t           1   4620.1        67    18027.3 354.3983 < 2.2e-16 ***
## breakvar1   1   3088.2        66    14939.1 236.8872 < 2.2e-16 ***
## breakvar2   1   4552.0        65    10387.1 349.1762 < 2.2e-16 ***
## breakvar3   1   5158.7        64     5228.4 395.7141 < 2.2e-16 ***
## breakvar4   1   3488.1        63     1740.3 267.5651 < 2.2e-16 ***
## breakvar5   1    908.0        62      832.3  69.6477 1.839e-11 ***
## breakvar6   1     49.2        61      783.2   3.7713   0.05709 .  
## breakvar7   1     22.5        60      760.6   1.7266   0.19411    
## breakvar8   1      0.5        59      760.1   0.0393   0.84360    
## breakvar9   1      2.8        58      757.3   0.2138   0.64554    
## breakvar10  1      1.9        57      755.4   0.1465   0.70332    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(mf7)

## 
## Call:
## glm(formula = Deaths ~ dow + t + breakvar1 + breakvar2 + breakvar3 + 
##     breakvar4 + breakvar5 + breakvar6 + breakvar7 + breakvar8 + 
##     breakvar9 + breakvar10, family = quasipoisson, data = dftemp, 
##     weights = wts, control = mycontrol)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -8.5686  -2.1900   0.3001   1.8532   7.2637  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -1.722754   3.131334  -0.550  0.58436    
## dow2Monday     0.013376   0.088887   0.150  0.88092    
## dow3Tuesday    0.678948   0.077380   8.774 3.62e-12 ***
## dow4Wednesday  0.558755   0.079885   6.994 3.25e-09 ***
## dow5Thursday   0.502663   0.080354   6.256 5.47e-08 ***
## dow6Friday     0.592977   0.078916   7.514 4.42e-10 ***
## dow7Saturday   0.499876   0.080175   6.235 5.92e-08 ***
## t              0.427011   0.484999   0.880  0.38232    
## breakvar1     -0.167643   0.546768  -0.307  0.76026    
## breakvar2     -0.006654   0.147765  -0.045  0.96424    
## breakvar3     -0.087883   0.061984  -1.418  0.16168    
## breakvar4     -0.100101   0.032792  -3.053  0.00344 ** 
## breakvar5     -0.080800   0.024523  -3.295  0.00170 ** 
## breakvar6      0.001683   0.023803   0.071  0.94389    
## breakvar7     -0.018736   0.025263  -0.742  0.46135    
## breakvar8     -0.003831   0.028044  -0.137  0.89183    
## breakvar9      0.003205   0.032304   0.099  0.92131    
## breakvar10     0.019278   0.050343   0.383  0.70319    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 13.03647)
## 
##     Null deviance: 24887.07  on 74  degrees of freedom
## Residual deviance:   755.44  on 57  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

## This pattern is consistent with a big change in slope over weeks 4 and 5. Then, the sequential tests would
##  show significance for all the early kinks, because the early kink allows a steep slope before it, and
##  a lower slope afterwards, even if the pattern afterwards is up then down. Once we have the kinks where
##  there really are changes, none of others are called for.  It is worth them all in in for now, I
##  think. This is because we make sure that our inferences are robust against the possibility of changes in
##  the slope, even if they're non-significant. After a while, it may be sensible to mix the periods, and have
##  more frequent kinks in the period of rapid change, and less frequent kinks elsewhere. The problem with a 
##  rapid response is that it's unsatisfactory to make inevitably data-dependent choices of dropping variables, 
##  which will have unknown effects on the various tests we're interested in carrying out. So for now, I'll
##  keep all the weekly kinks, while I think about a good way to remove unwanted parameters. There is also the
##  problem of orthogonality of the slopes on time (t plus various breakpoint variables) with the parameters of 
##  dow, making it attractive to keep changes in slope to the same day of the week.

## Just to zoom in on where the changes happen, here are the days of the breakpoints
seq(7,nrow(df)-2,by=7)

##  [1]  7 14 21 28 35 42 49 56 63 70

## So, fap4.pq makes the change occur on the day of the breakpoint, and the kink has happened on the day after.
##  So changes that come into force on days 29 and 36, which are the 12th and 19th April. The UK lockdown
##  was imposed in a speech on 23rd March, and so in effect came into force on 24th March. That is 2 weeks and
##  5 days before we see the first effect, with the current set of breakpoints. That is about the likely time
##  for death rates to take to change in response to behaviour. We would need a slightly different set of 
##  analyses to ask whether there was any evidence of the previous "advice" having an effect, and to ask
##  how long the change in rate went on for, before the new rate stabilised. I may do that soon, but for now
##  return to the "preliminary" task of looking at the data from day 44 on.

## In the pre-29 April 2020 data, there was a clear change in the day-of-the-week effects, when the reporting 
##  scheme changed. Here, I check for whether these effects change in the post-29 April 2020 data. With no 
##  particular place to look, I look once a week.
##
## So, you're about to see an analysis for the day-of-the-week effects changing on each day in the sequence 7, 
##   14. 21... The analyses are performed, and the loop stores the day of the break (bp), the p-value for
##   the interaction stept(t,bp):dow, and the deviance drop associated with that term.
##
daya<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
tdowpa<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
tdowdda<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
keepday<-0
stept<-function(t,breakp){ ifelse(t<=breakp,0,1)}
for (bp in seq(7,nrow(df)-2,by=7)) {
  keepday<-keepday+1
  m1<-fap4.pq(Deaths~t+dow,model2="stept(t,bp)*dow",breaklocations=seq(7,nrow(df)-2,by=7))
  daya[keepday]<-bp; myan1<-anova(m1,test="F")
  tdowpa[keepday]<-myan1[["dow:stept(t, bp)","Pr(>F)"]]
  tdowdda[keepday]<-myan1[["dow:stept(t, bp)","Deviance"]]}

print(cbind(daya,tdowpa,tdowdda))

##       daya       tdowpa   tdowdda
##  [1,]    7 9.918231e-01   7.06580
##  [2,]   14 8.686297e-01  34.89396
##  [3,]   21 9.500604e-01  22.96617
##  [4,]   28 7.518096e-02 147.72760
##  [5,]   35 5.534445e-02 156.24339
##  [6,]   42 3.603667e-05 323.18326
##  [7,]   49 4.229035e-04 278.03592
##  [8,]   56 3.643366e-02 169.14059
##  [9,]   63 1.513385e-01 122.98038
## [10,]   70 8.296277e-02  86.19991

## There are places where a change looks likely, from the p-values. But we have looked at quite a few p-values.
##   What about just a general trend in the day-of-the-week effects?
m2<-fap4.pq(Deaths~t+dow,model2="t*dow",breaklocations=seq(7,nrow(df)-2,by=7))
anova(m2,test="F")

## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##            Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                          74    24887.1                       
## t           1   4543.4        73    20343.7 531.1761 < 2.2e-16 ***
## dow         6   2316.4        67    18027.3  45.1361 < 2.2e-16 ***
## breakvar1   1   3088.2        66    14939.1 361.0457 < 2.2e-16 ***
## breakvar2   1   4552.0        65    10387.1 532.1883 < 2.2e-16 ***
## breakvar3   1   5158.7        64     5228.4 603.1178 < 2.2e-16 ***
## breakvar4   1   3488.1        63     1740.3 407.8026 < 2.2e-16 ***
## breakvar5   1    908.0        62      832.3 106.1519 4.616e-14 ***
## breakvar6   1     49.2        61      783.2   5.7480   0.02021 *  
## breakvar7   1     22.5        60      760.6   2.6315   0.11093    
## breakvar8   1      0.5        59      760.1   0.0599   0.80768    
## breakvar9   1      2.8        58      757.3   0.3259   0.57059    
## breakvar10  1      1.9        57      755.4   0.2233   0.63855    
## t:dow       6    305.3        51      450.2   5.9485 9.165e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Yes!!  But, actually not as big a deviance drop as the sudden one.

## To find the "best" breakpoint by day, we do each day in the area where the weekly checks showed some interest
daya<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
tdowpa<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
tdowdda<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
keepday<-0
stept<-function(t,breakp){ ifelse(t<=breakp,0,1)}
for (bp in seq(36,48)) {
  keepday<-keepday+1
  m1<-fap4.pq(Deaths~t+dow,model2="stept(t,bp)*dow",breaklocations=seq(7,nrow(df)-2,by=7))
  daya[keepday]<-bp; myan1<-anova(m1,test="F")
  tdowpa[keepday]<-myan1[["dow:stept(t, bp)","Pr(>F)"]]
  tdowdda[keepday]<-myan1[["dow:stept(t, bp)","Deviance"]]}
print(cbind(daya,tdowpa,tdowdda))

##       daya       tdowpa  tdowdda
##  [1,]   36 6.088055e-02 152.2619
##  [2,]   37 1.971647e-02 179.6369
##  [3,]   38 8.043819e-03 209.7833
##  [4,]   39 8.201195e-04 251.2190
##  [5,]   40 6.965331e-05 314.7435
##  [6,]   41 3.069034e-05 314.1271
##  [7,]   42 3.603667e-05 323.1833
##  [8,]   43 4.842176e-06 339.6750
##  [9,]   44 8.973474e-05 308.3149
## [10,]   45 2.232488e-04 292.9550
## [11,]   46 6.151372e-05 318.0819
## [12,]   47 6.320745e-04 259.1012
## [13,]   48 3.814862e-04 280.0304

### So the biggest drops are on days 40, 41, 42, 43 (43 is top by a bit). They are in Easter week.
###
###

##  We now ask whether the best breakpoint for dow effects removes the t:dow interaction, in a still smaller 
##   range. We do this by seeing whether the time trend in day-of-the-week effects disappears when the sudden 
##   change is controlled for.
##
## Note that we are fitting splines which we hope take account of the slowly changing progress of the epidemic
##  well enough that we can test for day-of-the-week effects.
##
## The purpose of this series of analyses is that a single sudden change would show no general time trend if the
##   breakpoint tested was the actual day of the change. But if you test with a nearby breakpoint, then the time 
##  trend will appear, because the sudden change gives the appearance of a trend in the half of the data the real 
##  breakpoint appears in. Thus, finding no general time trend (t:dow) is what we'd expect at the true breakpoint, 
##  and we'd expect to see general time trends at the breakpoints either side. On 10th May, the t:dow term is
##  statistically significant, though only at 0.03 which, given the number of tests we are doing, is not very
##  significant. However, I need to look out for evidence that the day of the week effects are continuing
##  to change, so I will watch this space.
##

daya<-array(NA,dim=length(c(39:44)))
tdowpa3<-array(NA,dim=length(daya))
dowsteptpa3<-array(NA,dim=length(daya))
tdowpa3a<-array(NA,dim=length(daya))
dowsteptpa3a<-array(NA,dim=length(daya))
keepday<-0
stept<-function(t,breakp){ ifelse(t<=breakp,0,1)}
for (bp2 in 39:44) {
keepday<-keepday+1
#print(paste("bp2 breakpoint for the step change =",bp2))
m3<-fap4.pq(Deaths~t+dow,model2="t*dow+stept(t,bp2)*dow",breaklocations=seq(7,nrow(df)-2,by=7))
m3a<-fap4.pq(Deaths~t+dow,model2="stept(t,bp2)*dow+t*dow",breaklocations=seq(7,nrow(df)-2,by=7))
myan3<-anova(m3,test="F")
myan3a<-anova(m3a,test="F")
daya[keepday]<-bp2
tdowpa3[keepday]<-myan3["t:dow","Pr(>F)"]
dowsteptpa3[keepday]<-myan3["dow:stept(t, bp2)","Pr(>F)"]
tdowpa3a[keepday]<-myan3a["t:dow","Pr(>F)"]
dowsteptpa3a[keepday]<-myan3a["dow:stept(t, bp2)","Pr(>F)"]
}   

print(cbind(daya,tdowpa3,dowsteptpa3,tdowpa3a,dowsteptpa3a))

##      daya      tdowpa3 dowsteptpa3   tdowpa3a dowsteptpa3a
## [1,]   39 3.036495e-05  0.03183323 0.01286501 1.824297e-04
## [2,]   40 2.526584e-05  0.01750210 0.02565121 1.863205e-05
## [3,]   41 3.386411e-05  0.11919666 0.16577104 2.559897e-05
## [4,]   42 2.290734e-05  0.02112048 0.04369465 1.287349e-05
## [5,]   43 1.009671e-05  0.02529283 0.12103626 3.209837e-06
## [6,]   44 7.718275e-05  0.18901913 0.20970406 7.044746e-05

## These show that when the time trend is earlier in the model (columns 2 and 3), the step change is highly
##  significant on day 43, but the time trend itself remains highly significant. Whereas, when the step change
##  is earlier (columns 4 and 5), it is most significant on day 43, and on that day the time trend is barely 
##  significant. Thus, the most economical conclusion is that there is a step change on day 43.

## The conclusion on day-of-the-week effects is that on day 43, there was a change to the way the weeks worked, and
##  it involves a step change in number of deaths as well as change in dow effects. For marginality reasons, the 
##  main effect change would be not unexpected.

## The p-value of the time-trend interaction with dow in column 4 was down to 0.01 on 12th May, and so I've
##   decided to do some further checking. This t:dow term actually looks for where the difference between
##   pre-day 44 plus day 44 up to the break is different from after the break. I've been dropping the pre-
##   day 44 data, and so the inclusion of that isn't really relevant to the main analyses. Thus, although
##   this may need revisiting, I'm accepting the step change at day 43, and presuming that the new
##   significance is down to changes after that day.
##
## 
##
##  Thus, I've added the following code, to look for a second change in day of the week effects, just in the
##    data after day 44, so the break at day 43 is already done subsetting the data, and need not be included
##    in the models. The significance of 0.01 for t:dow in the model we've just seen may well be reduced in just
##    the post-day-44 data, as the "weight" of the changes in days 1 to 43 is no longer present. These next
##    analyses look for such significance, and also check whether data at the beginning (first loop) or end 
##    (second loop) of the period are creating the call for the t:dow interaction (if any).
##    

for (begday in c(44:48)) {
  print(begday)
  print(anova(
   fap4.pq(Deaths~t+dow,model2="t*dow",from=begday,to=nrow(df),breaklocations=seq(49,nrow(df)-2,by=7)),test="F"))
}

## [1] 44
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         31     3801.1                       
## t          1  1558.46        30     2242.7 211.8428 7.577e-10 ***
## dow        6  2054.00        24      188.7  46.5339 1.871e-08 ***
## breakvar1  1    12.39        23      176.3   1.6847    0.2153    
## breakvar2  1     1.60        22      174.7   0.2175    0.6482    
## breakvar3  1     4.86        21      169.8   0.6606    0.4300    
## breakvar4  1     2.30        20      167.5   0.3133    0.5845    
## t:dow      6    62.57        14      105.0   1.4175    0.2756    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 45
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         30     3391.6                       
## t          1  1202.41        29     2189.2 152.2316 1.496e-08 ***
## dow        6  2012.17        23      177.0  42.4587 8.443e-08 ***
## breakvar1  1     2.12        22      174.9   0.2685    0.6131    
## breakvar2  1     1.02        21      173.9   0.1293    0.7250    
## breakvar3  1     4.57        20      169.3   0.5781    0.4606    
## breakvar4  1     2.19        19      167.1   0.2773    0.6074    
## t:dow      6    62.37        13      104.8   1.3161    0.3172    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 46
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         29     3384.4                       
## t          1  1386.49        28     1997.9 175.5188 1.594e-08 ***
## dow        6  1820.86        22      177.0  38.4179 3.769e-07 ***
## breakvar1  1     4.36        21      172.6   0.5525    0.4716    
## breakvar2  1     1.91        20      170.7   0.2420    0.6316    
## breakvar3  1     5.17        19      165.6   0.6548    0.4342    
## breakvar4  1     2.45        18      163.1   0.3103    0.5877    
## t:dow      6    66.72        12       96.4   1.4077    0.2886    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 47
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         28     3384.3                       
## t          1  1539.61        27     1844.7 180.8760 3.575e-08 ***
## dow        6  1695.00        21      149.7  33.1886 2.003e-06 ***
## breakvar1  1     2.70        20      147.0   0.3169    0.5848    
## breakvar2  1     0.14        19      146.9   0.0162    0.9009    
## breakvar3  1     4.00        18      142.9   0.4695    0.5074    
## breakvar4  1     2.16        17      140.7   0.2537    0.6244    
## t:dow      6    45.50        11       95.2   0.8909    0.5334    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 48
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         27    2856.99                       
## t          1  1102.14        26    1754.85 127.8772 5.096e-07 ***
## dow        6  1605.17        20     149.68  31.0403 6.616e-06 ***
## breakvar1  1    17.75        19     131.94   2.0591    0.1818    
## breakvar2  1     0.52        18     131.42   0.0603    0.8109    
## breakvar3  1     3.79        17     127.62   0.4402    0.5220    
## breakvar4  1     3.38        16     124.25   0.3917    0.5454    
## t:dow      6    36.18        10      88.06   0.6997    0.6568    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

for (endday in c(65:nrow(df))) {
  print(endday)
  print(anova(
   fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=endday,breaklocations=seq(49,nrow(df)-2,by=7)),test="F"))
}

## [1] 65
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         21    2004.42                       
## t          1   486.93        20    1517.49 114.9031 0.0001224 ***
## dow        6  1402.17        14     115.32  55.1456 0.0002084 ***
## breakvar1  1     8.25        13     107.07   1.9458 0.2218429    
## breakvar2  1    13.31        12      93.77   3.1396 0.1366119    
## breakvar3  1    31.30        11      62.47   7.3856 0.0418926 *  
## breakvar4  0     0.00        11      62.47                       
## t:dow      6    41.05         5      21.42   1.6143 0.3078742    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 66
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         22    2303.20                       
## t          1   694.59        21    1608.60 163.9420 1.394e-05 ***
## dow        6  1490.49        15     118.11  58.6325 4.598e-05 ***
## breakvar1  1     8.90        14     109.21   2.1009    0.1974    
## breakvar2  1     9.56        13      99.64   2.2571    0.1837    
## breakvar3  1     6.92        12      92.73   1.6328    0.2485    
## breakvar4  0     0.00        12      92.73                       
## t:dow      6    67.06         6      25.66   2.6382    0.1315    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 67
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         23    2697.61                       
## t          1   965.20        22    1732.41 249.6025 9.863e-07 ***
## dow        6  1612.64        16     119.77  69.5052 6.971e-06 ***
## breakvar1  1     7.98        15     111.79   2.0648   0.19388    
## breakvar2  1    11.60        14     100.19   3.0002   0.12686    
## breakvar3  1     6.32        13      93.87   1.6346   0.24182    
## breakvar4  0     0.00        13      93.87                       
## t:dow      6    66.44         7      27.42   2.8637   0.09738 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 68
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         24    2697.66                       
## t          1   857.80        23    1839.86 244.6436 2.785e-07 ***
## dow        6  1710.48        17     129.39  81.3042 9.905e-07 ***
## breakvar1  1    11.84        16     117.55   3.3763   0.10344    
## breakvar2  1     5.03        15     112.52   1.4352   0.26521    
## breakvar3  1     0.76        14     111.76   0.2161   0.65438    
## breakvar4  0     0.00        14     111.76                       
## t:dow      6    83.42         8      28.34   3.9654   0.03846 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 69
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         25    2729.32                       
## t          1   872.73        24    1856.60 195.7758 2.061e-07 ***
## dow        6  1724.61        18     131.99  64.4791 6.935e-07 ***
## breakvar1  1    13.83        17     118.15   3.1034   0.11197    
## breakvar2  1     3.63        16     114.53   0.8136   0.39056    
## breakvar3  1     2.27        15     112.25   0.5096   0.49340    
## breakvar4  0     0.00        15     112.25                       
## t:dow      6    71.76         9      40.49   2.6830   0.08889 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 70
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         26    2798.25                       
## t          1   940.11        25    1858.15 224.9353 3.501e-08 ***
## dow        6  1723.33        19     134.81  68.7224 1.502e-07 ***
## breakvar1  1    16.12        18     118.69   3.8579   0.07790 .  
## breakvar2  1     2.73        17     115.96   0.6532   0.43779    
## breakvar3  1     3.58        16     112.38   0.8561   0.37663    
## breakvar4  0     0.00        16     112.38                       
## t:dow      6    70.22        10      42.16   2.8002   0.07253 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 71
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         27    2898.96                       
## t          1  1040.67        26    1858.29 248.9962 2.147e-08 ***
## dow        6  1707.75        20     150.55  68.1008 1.569e-07 ***
## breakvar1  1     9.12        19     141.42   2.1829   0.17034    
## breakvar2  1     8.12        18     133.31   1.9421   0.19364    
## breakvar3  1     0.01        17     133.30   0.0014   0.97125    
## breakvar4  1    20.92        16     112.38   5.0055   0.04923 *  
## t:dow      6    70.22        10      42.16   2.8002   0.07253 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 72
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         28    2934.42                       
## t          1  1059.00        27    1875.41 167.1846 5.382e-08 ***
## dow        6  1709.54        21     165.88  44.9807 4.154e-07 ***
## breakvar1  1     9.80        20     156.08   1.5465    0.2395    
## breakvar2  1     2.58        19     153.50   0.4070    0.5365    
## breakvar3  1     2.60        18     150.90   0.4100    0.5351    
## breakvar4  1     1.52        17     149.39   0.2397    0.6341    
## t:dow      6    79.44        11      69.94   2.0902    0.1371    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 73
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         29     3364.3                       
## t          1  1357.12        28     2007.2 160.2551 2.659e-08 ***
## dow        6  1837.49        22      169.7  36.1632 5.291e-07 ***
## breakvar1  1     9.16        21      160.6   1.0813    0.3189    
## breakvar2  1     3.87        20      156.7   0.4572    0.5117    
## breakvar3  1     1.21        19      155.5   0.1428    0.7122    
## breakvar4  1     0.15        18      155.3   0.0175    0.8970    
## t:dow      6    51.88        12      103.5   1.0209    0.4569    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 74
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         30     3799.3                       
## t          1  1677.53        29     2121.7 212.6019 1.957e-09 ***
## dow        6  1949.30        23      172.4  41.1741 1.018e-07 ***
## breakvar1  1     8.34        22      164.1   1.0575    0.3225    
## breakvar2  1     5.12        21      159.0   0.6488    0.4350    
## breakvar3  1     0.44        20      158.5   0.0564    0.8160    
## breakvar4  1     1.88        19      156.7   0.2383    0.6336    
## t:dow      6    52.17        13      104.5   1.1020    0.4117    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 75
## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         31     3801.1                       
## t          1  1558.46        30     2242.7 211.8428 7.577e-10 ***
## dow        6  2054.00        24      188.7  46.5339 1.871e-08 ***
## breakvar1  1    12.39        23      176.3   1.6847    0.2153    
## breakvar2  1     1.60        22      174.7   0.2175    0.6482    
## breakvar3  1     4.86        21      169.8   0.6606    0.4300    
## breakvar4  1     2.30        20      167.5   0.3133    0.5845    
## t:dow      6    62.57        14      105.0   1.4175    0.2756    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=45,to=68,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=46,to=68,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=47,to=68,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=68,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=67,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=66,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=65,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")

## The conclusion is that the there is one day on which the time-trend interaction is significant, but
##   with a p-value of only 0.03 (on 13th May), and the residual DF are small anyway. So, it seems
##   reasonable to ignore that time-trend interaction for the moment.
##

The results show we should assume that the day-of-the-week effects change between 17th and 18th April. We now go on to look at a few useful figures from the analysis of post-day-44 data. It appears that the day-of-the-week effects are stronger in the second period. The splines started off showing a significant change in slope, but don’t so in the post-day-44 period. So, there’s no evidence from the splines that the rate of change of deaths with time is anything but linear (on the log scale). That would mean a geometric decrease in deaths with the ratio being exp(7*(the slope of t)) for week-to-week comparisons.

## Thus, we are safe to use the data from day 44 onwards, and ignore changes in day-of-the-week effects. 
##   This allows us to use just a fortnight's data at a time.

## So the best model, keeping the step change but not the smooth change is
m4<-fap4.pq(Deaths~t+dow,model2="stept(t,43)*dow",breaklocations=seq(7,nrow(df)-2,by=7))
anova(m4,test="F")

## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##                  Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                                74    24887.1                       
## t                 1   4543.4        73    20343.7 637.9307 < 2.2e-16 ***
## dow               6   2316.4        67    18027.3  54.2075 < 2.2e-16 ***
## breakvar1         1   3088.2        66    14939.1 433.6079 < 2.2e-16 ***
## breakvar2         1   4552.0        65    10387.1 639.1463 < 2.2e-16 ***
## breakvar3         1   5158.7        64     5228.4 724.3312 < 2.2e-16 ***
## breakvar4         1   3488.1        63     1740.3 489.7619 < 2.2e-16 ***
## breakvar5         1    908.0        62      832.3 127.4860 2.312e-15 ***
## breakvar6         1     49.2        61      783.2   6.9032   0.01139 *  
## breakvar7         1     22.5        60      760.6   3.1604   0.08153 .  
## breakvar8         1      0.5        59      760.1   0.0719   0.78969    
## breakvar9         1      2.8        58      757.3   0.3914   0.53441    
## breakvar10        1      1.9        57      755.4   0.2682   0.60684    
## stept(t, 43)      1     45.5        56      709.9   6.3901   0.01468 *  
## dow:stept(t, 43)  6    339.7        50      370.2   7.9489 4.842e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(m4)

## 
## Call:
## glm(formula = Deaths ~ t + dow + breakvar1 + breakvar2 + breakvar3 + 
##     breakvar4 + breakvar5 + breakvar6 + breakvar7 + breakvar8 + 
##     breakvar9 + breakvar10 + stept(t, 43) + dow:stept(t, 43), 
##     family = quasipoisson, data = dftemp, weights = wts, control = mycontrol)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.7561  -1.5971   0.3722   1.5103   3.7042  
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -1.555675   2.299561  -0.677  0.50183    
## t                           0.425619   0.355944   1.196  0.23744    
## dow2Monday                  0.058743   0.092029   0.638  0.52618    
## dow3Tuesday                 0.388743   0.084820   4.583 3.08e-05 ***
## dow4Wednesday               0.371248   0.084148   4.412 5.46e-05 ***
## dow5Thursday                0.382310   0.082982   4.607 2.84e-05 ***
## dow6Friday                  0.412347   0.082787   4.981 7.94e-06 ***
## dow7Saturday                0.257993   0.089986   2.867  0.00605 ** 
## breakvar1                  -0.166151   0.401269  -0.414  0.68060    
## breakvar2                  -0.006643   0.108915  -0.061  0.95161    
## breakvar3                  -0.088337   0.045777  -1.930  0.05933 .  
## breakvar4                  -0.098385   0.024310  -4.047  0.00018 ***
## breakvar5                  -0.087372   0.018671  -4.679 2.23e-05 ***
## breakvar6                  -0.041645   0.023235  -1.792  0.07913 .  
## breakvar7                   0.039829   0.028127   1.416  0.16297    
## breakvar8                  -0.016361   0.021065  -0.777  0.44099    
## breakvar9                   0.006118   0.023683   0.258  0.79721    
## breakvar10                  0.021100   0.037155   0.568  0.57265    
## stept(t, 43)               -0.011993   0.138691  -0.086  0.93143    
## dow2Monday:stept(t, 43)    -0.092518   0.131711  -0.702  0.48567    
## dow3Tuesday:stept(t, 43)    0.547147   0.115791   4.725 1.91e-05 ***
## dow4Wednesday:stept(t, 43)  0.391763   0.119450   3.280  0.00190 ** 
## dow5Thursday:stept(t, 43)   0.263807   0.121419   2.173  0.03457 *  
## dow6Friday:stept(t, 43)     0.412798   0.118176   3.493  0.00101 ** 
## dow7Saturday:stept(t, 43)   0.410305   0.120413   3.407  0.00130 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 7.122038)
## 
##     Null deviance: 24887.07  on 74  degrees of freedom
## Residual deviance:   370.25  on 50  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

## NB The p-values for stept and stept*dow are a bit bigger than in the previous series, when we now omit 
##     the t:dow term, as its own p-value was 0.12.
# This shows the before and after values. 
coef(m4)[3:8]

##    dow2Monday   dow3Tuesday dow4Wednesday  dow5Thursday    dow6Friday 
##    0.05874294    0.38874350    0.37124824    0.38230991    0.41234704 
##  dow7Saturday 
##    0.25799283

coef(m4)[(length(coef(m4))-5):length(coef(m4))]

##    dow2Monday:stept(t, 43)   dow3Tuesday:stept(t, 43) 
##                -0.09251845                 0.54714682 
## dow4Wednesday:stept(t, 43)  dow5Thursday:stept(t, 43) 
##                 0.39176297                 0.26380748 
##    dow6Friday:stept(t, 43)  dow7Saturday:stept(t, 43) 
##                 0.41279843                 0.41030524

coef(m4)[3:8]+coef(m4)[(length(coef(m4))-5):length(coef(m4))]

##    dow2Monday   dow3Tuesday dow4Wednesday  dow5Thursday    dow6Friday 
##   -0.03377551    0.93589032    0.76301121    0.64611739    0.82514547 
##  dow7Saturday 
##    0.66829807

#  The dow effects get stronger, not weaker. Sunday's effect is zero by construction. The stept value 
coef(m4)[length(coef(m4))-6]

## stept(t, 43) 
##  -0.01199331

#  is present every day, so 
c(7*coef(m4)[length(coef(m4))-6],sum(coef(m4)[(length(coef(m4))-5):length(coef(m4))]))

## stept(t, 43)              
##  -0.08395318   1.93330249

# are the summed weekly effects of (i) stept (ii) dow-changes


## This new model m5 is from=44, and shows that dow is still important after-eq day 44
m5<-fap4.pq(Deaths~t+dow,from=44,breaklocations=seq(49,nrow(df)-2,by=7))
m5a<-fap4.pq(Deaths~t,model2="dow",from=44,breaklocations=seq(49,nrow(df)-2,by=7))
anova(m5,test="F")

## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         31     3801.1                       
## t          1  1558.46        30     2242.7 191.7934 1.039e-11 ***
## dow        6  2054.00        24      188.7  42.1298 2.581e-10 ***
## breakvar1  1    12.39        23      176.3   1.5253    0.2311    
## breakvar2  1     1.60        22      174.7   0.1969    0.6620    
## breakvar3  1     4.86        21      169.8   0.5981    0.4484    
## breakvar4  1     2.30        20      167.5   0.2837    0.6002    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m5a,test="F")

## Analysis of Deviance Table
## 
## Model: quasipoisson, link: log
## 
## Response: Deaths
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                         31     3801.1                       
## t          1  1558.46        30     2242.7 191.7934 1.039e-11 ***
## breakvar1  1    15.45        29     2227.2   1.9013    0.1832    
## breakvar2  1     0.21        28     2227.0   0.0258    0.8740    
## breakvar3  1     0.54        27     2226.5   0.0659    0.8000    
## breakvar4  1     0.14        26     2226.4   0.0168    0.8982    
## dow        6  2058.83        20      167.5  42.2288 2.527e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## And now to get the slopes against day, in the different zones of the splines.
nbps<-length(seq(49,nrow(df)-2,by=7))
myc<-coef(m5)
mycs<-myc[c(2,9:(9+nbps-1))]
cumsum(mycs)

##           t   breakvar1   breakvar2   breakvar3   breakvar4 
## -0.06377520 -0.02280185 -0.03951459 -0.03331947 -0.01217119

## The slope looks to be small and negative, and the spline variables are not statistically significant
##   in model m5a, which controls for day-of-the-week _before_ looking for the spline changes, and so 
##   the changes in slope could be zero.

Preliminary analyses on data from the DHSC coronavirus deaths archive

Alan Grafen

2020-05-19

Is the historical record being changed frequently?

Day-of-the-week effects