The Department of Health and Social Security (DHSC) changed its reporting methods for coronavirus deaths on 29th April 2020, including providing on each “publication day” figures for cumulative and new deaths on each previous day. In analyses of the pre-29th April data, I found there were strong day-of-the-week effects, and also that these changed on a particular day. The previous analyses were mainly interested in whether and when the epidemic would shrink rather than grow. Now, the emphasis is on how fast is the epidemic declining, and how many deaths are likely to occur by the end of this first wave.
Before carrying out the main analyses, I made various checks on the data, and these are presented here in this document. They are
Now you know the conclusions, you probably don’t want to know how I reached them. Turn back now!
First, all the data is read in. If repeating this exercise, you can use my function get_data_from_folder if you point it at a directory that contains only the .csv versions of the DHSC death data from 29th April to the present. The code reads it into to a single data-frame, adds ‘day’ for the day of the deaths reported, ‘pubday’ for the day the dataset was published, and then prints out all the cases where data has changed. There is by 10th May only one, which was probably a mistake in transmission of a number (confusing a 6 and a 9 and requiring an adjustment of 30).
get_data_from_folder<-function(dir,cols,text="Select one of the data files",addIDcol=FALSE,IDcolName="filename"){
ordir<-getwd()
if (missing(dir)) {
message("before flush.console\n")
cat("\n")}
# message("before flush.console")
# This crashed R
# flush.console()
# message("after flush.console")
if (missing(dir)) {
a<-file.choose()
dir<-dirname(a)}
files<-list.files(dir,no..=TRUE)
setwd(dir)
# Second read each one in turn, and do the small tasks
for (i in files) {
cat(i)
resin<-read.csv(i)[,cols]
if (addIDcol) {resin[,ncol(resin)+1]<-i; if (is.element(IDcolName,colnames(resin))) stop("IDcolName is already a column name in the datasets"); colnames(resin)[ncol(resin)]<-IDcolName}
if (i!=files[1]) resdf<-rbind(resdf,resin) else resdf<-resin
}
# Third, remove the unneeded original copy of the last file and dataframe, and reset wd
remove(resin)
setwd(ordir)
return(resdf)}
## This line gets the data for me -- it is commented out so if necessary, it will prompt you for the relevant folder,
## which should contain DHSC datafiles from from after the 29th April 2020
deathspost<-get_data_from_folder("~/OneDrive - Nexus365/Coronavirus stuff/Data/deaths/posteq29",addIDcol=TRUE)
## coronavirus-deaths_202004292132.csvcoronavirus-deaths_202004301702.csvcoronavirus-deaths_202005011647.csvcoronavirus-deaths_202005021621.csvcoronavirus-deaths_202005031513.csvcoronavirus-deaths_202005041501.csvcoronavirus-deaths_202005051554.csvcoronavirus-deaths_202005061549.csvcoronavirus-deaths_202005071534.csvcoronavirus-deaths_202005081443.csvcoronavirus-deaths_202005091431.csvcoronavirus-deaths_202005101500.csvcoronavirus-deaths_202005111457.csvcoronavirus-deaths_202005121454.csvcoronavirus-deaths_202005131522.csvcoronavirus-deaths_202005141344.csvcoronavirus-deaths_202005151534.csvcoronavirus-deaths_202005161352.csvcoronavirus-deaths_202005171653.csvcoronavirus-deaths_202005181505.csvcoronavirus-deaths_202005191711.csv
## Adding day and pubday - this converts the dates within the files to days counting from 6th March 2020, and converts
## the filenames into pubdays i.e. publication days. Days are the day on which the deaths were reported, while pubday
## is the the datafile was released. The point of the distinction is that sometimes a figure is revised after first
## publication, and this is seen on later pubdays.
add.days<-function(daf){
repyear<-with(daf,as.numeric(substr(Reporting.date,1,4)))
repmonth<-with(daf,as.numeric(substr(Reporting.date,6,7)))
repday<-with(daf,as.numeric(substr(Reporting.date,9,10)))
day<-repday-5
day<-ifelse(repmonth>=4,day+31,day)
day<-ifelse(repmonth>=5,day+30,day)
day<-ifelse(repmonth>=6,day+31,day)
day<-ifelse(repmonth>=7,day+30,day)
day<-ifelse(repmonth>=8,day+31,day)
daf$day<-day
pyear<-with(daf,as.numeric(substr(filename,20,23)))
pmonth<-with(daf,as.numeric(substr(filename,24,25)))
pday<-with(daf,as.numeric(substr(filename,26,27)))
pubday<-pday-5
pubday<-ifelse(pmonth>=4,pubday+31,pubday)
pubday<-ifelse(pmonth>=5,pubday+30,pubday)
pubday<-ifelse(pmonth>=6,pubday+31,pubday)
pubday<-ifelse(pmonth>=7,pubday+30,pubday)
pubday<-ifelse(pmonth>=8,pubday+31,pubday)
daf$pubday<-pubday
return(daf)
}
deathspost<-add.days(deathspost)
myxt<-xtabs(~day+pubday+Area.name,data=deathspost)
max(myxt)
## [1] 1
min(myxt)
## [1] 0
sum(myxt)
## [1] 6111
dim(deathspost)
## [1] 6111 9
## This should be max=1, min=0.
## This looks for variation in the reports for one day, across different publication days. And then looks
## for the datapoints that show the difference. At the time of writing, there is only one large revision
## in all the datafiles, but of course it persists through all subsequent publication days. There are two
## small changes in Northern Ireland, where it looks like one death was initially misallocated by one day.
myag<-aggregate(deathspost$Daily.change.in.deaths,by=with(deathspost,list(day=day,Area.name=Area.name)),FUN=sd)
myag0<-subset(myag,x!=0)
print(myag0)
## day Area.name x
## 126 65 Northern Ireland 0.4045199
## 127 66 Northern Ireland 0.3162278
## 257 55 United Kingdom 6.5465367
## 267 65 United Kingdom 0.4045199
for (j in (1:nrow(myag0))) print(subset(deathspost,day==myag0$day[j] & Area.name==myag0$Area.name[j]))
## Area.name Area.code Area.type Reporting.date Daily.change.in.deaths
## 2638 Northern Ireland N92000002 Nation 2020-05-09 5
## 2934 Northern Ireland N92000002 Nation 2020-05-09 5
## 3235 Northern Ireland N92000002 Nation 2020-05-09 4
## 3541 Northern Ireland N92000002 Nation 2020-05-09 4
## 3852 Northern Ireland N92000002 Nation 2020-05-09 4
## 4168 Northern Ireland N92000002 Nation 2020-05-09 4
## 4489 Northern Ireland N92000002 Nation 2020-05-09 4
## 4815 Northern Ireland N92000002 Nation 2020-05-09 4
## 5146 Northern Ireland N92000002 Nation 2020-05-09 4
## 5482 Northern Ireland N92000002 Nation 2020-05-09 4
## 5823 Northern Ireland N92000002 Nation 2020-05-09 4
## Cumulative.deaths filename day pubday
## 2638 427 coronavirus-deaths_202005091431.csv 65 65
## 2934 427 coronavirus-deaths_202005101500.csv 65 66
## 3235 426 coronavirus-deaths_202005111457.csv 65 67
## 3541 426 coronavirus-deaths_202005121454.csv 65 68
## 3852 426 coronavirus-deaths_202005131522.csv 65 69
## 4168 426 coronavirus-deaths_202005141344.csv 65 70
## 4489 426 coronavirus-deaths_202005151534.csv 65 71
## 4815 426 coronavirus-deaths_202005161352.csv 65 72
## 5146 426 coronavirus-deaths_202005171653.csv 65 73
## 5482 426 coronavirus-deaths_202005181505.csv 65 74
## 5823 426 coronavirus-deaths_202005191711.csv 65 75
## Area.name Area.code Area.type Reporting.date Daily.change.in.deaths
## 2929 Northern Ireland N92000002 Nation 2020-05-10 3
## 3230 Northern Ireland N92000002 Nation 2020-05-10 4
## 3536 Northern Ireland N92000002 Nation 2020-05-10 4
## 3847 Northern Ireland N92000002 Nation 2020-05-10 4
## 4163 Northern Ireland N92000002 Nation 2020-05-10 4
## 4484 Northern Ireland N92000002 Nation 2020-05-10 4
## 4810 Northern Ireland N92000002 Nation 2020-05-10 4
## 5141 Northern Ireland N92000002 Nation 2020-05-10 4
## 5477 Northern Ireland N92000002 Nation 2020-05-10 4
## 5818 Northern Ireland N92000002 Nation 2020-05-10 4
## Cumulative.deaths filename day pubday
## 2929 430 coronavirus-deaths_202005101500.csv 66 66
## 3230 430 coronavirus-deaths_202005111457.csv 66 67
## 3536 430 coronavirus-deaths_202005121454.csv 66 68
## 3847 430 coronavirus-deaths_202005131522.csv 66 69
## 4163 430 coronavirus-deaths_202005141344.csv 66 70
## 4484 430 coronavirus-deaths_202005151534.csv 66 71
## 4810 430 coronavirus-deaths_202005161352.csv 66 72
## 5141 430 coronavirus-deaths_202005171653.csv 66 73
## 5477 430 coronavirus-deaths_202005181505.csv 66 74
## 5818 430 coronavirus-deaths_202005191711.csv 66 75
## Area.name Area.code Area.type Reporting.date Daily.change.in.deaths
## 4 United Kingdom K02000001 UK 2020-04-29 765
## 250 United Kingdom K02000001 UK 2020-04-29 795
## 501 United Kingdom K02000001 UK 2020-04-29 795
## 757 United Kingdom K02000001 UK 2020-04-29 795
## 1018 United Kingdom K02000001 UK 2020-04-29 795
## 1284 United Kingdom K02000001 UK 2020-04-29 795
## 1555 United Kingdom K02000001 UK 2020-04-29 795
## 1831 United Kingdom K02000001 UK 2020-04-29 795
## 2112 United Kingdom K02000001 UK 2020-04-29 795
## 2398 United Kingdom K02000001 UK 2020-04-29 795
## 2689 United Kingdom K02000001 UK 2020-04-29 795
## 2985 United Kingdom K02000001 UK 2020-04-29 795
## 3286 United Kingdom K02000001 UK 2020-04-29 795
## 3592 United Kingdom K02000001 UK 2020-04-29 795
## 3903 United Kingdom K02000001 UK 2020-04-29 795
## 4219 United Kingdom K02000001 UK 2020-04-29 795
## 4540 United Kingdom K02000001 UK 2020-04-29 795
## 4866 United Kingdom K02000001 UK 2020-04-29 795
## 5197 United Kingdom K02000001 UK 2020-04-29 795
## 5533 United Kingdom K02000001 UK 2020-04-29 795
## 5874 United Kingdom K02000001 UK 2020-04-29 795
## Cumulative.deaths filename day pubday
## 4 26097 coronavirus-deaths_202004292132.csv 55 55
## 250 26097 coronavirus-deaths_202004301702.csv 55 56
## 501 26097 coronavirus-deaths_202005011647.csv 55 57
## 757 26097 coronavirus-deaths_202005021621.csv 55 58
## 1018 26097 coronavirus-deaths_202005031513.csv 55 59
## 1284 26097 coronavirus-deaths_202005041501.csv 55 60
## 1555 26097 coronavirus-deaths_202005051554.csv 55 61
## 1831 26097 coronavirus-deaths_202005061549.csv 55 62
## 2112 26097 coronavirus-deaths_202005071534.csv 55 63
## 2398 26097 coronavirus-deaths_202005081443.csv 55 64
## 2689 26097 coronavirus-deaths_202005091431.csv 55 65
## 2985 26097 coronavirus-deaths_202005101500.csv 55 66
## 3286 26097 coronavirus-deaths_202005111457.csv 55 67
## 3592 26097 coronavirus-deaths_202005121454.csv 55 68
## 3903 26097 coronavirus-deaths_202005131522.csv 55 69
## 4219 26097 coronavirus-deaths_202005141344.csv 55 70
## 4540 26097 coronavirus-deaths_202005151534.csv 55 71
## 4866 26097 coronavirus-deaths_202005161352.csv 55 72
## 5197 26097 coronavirus-deaths_202005171653.csv 55 73
## 5533 26097 coronavirus-deaths_202005181505.csv 55 74
## 5874 26097 coronavirus-deaths_202005191711.csv 55 75
## Area.name Area.code Area.type Reporting.date Daily.change.in.deaths
## 2639 United Kingdom K02000001 UK 2020-05-09 346
## 2935 United Kingdom K02000001 UK 2020-05-09 346
## 3236 United Kingdom K02000001 UK 2020-05-09 345
## 3542 United Kingdom K02000001 UK 2020-05-09 345
## 3853 United Kingdom K02000001 UK 2020-05-09 345
## 4169 United Kingdom K02000001 UK 2020-05-09 345
## 4490 United Kingdom K02000001 UK 2020-05-09 345
## 4816 United Kingdom K02000001 UK 2020-05-09 345
## 5147 United Kingdom K02000001 UK 2020-05-09 345
## 5483 United Kingdom K02000001 UK 2020-05-09 345
## 5824 United Kingdom K02000001 UK 2020-05-09 345
## Cumulative.deaths filename day pubday
## 2639 31587 coronavirus-deaths_202005091431.csv 65 65
## 2935 31587 coronavirus-deaths_202005101500.csv 65 66
## 3236 31586 coronavirus-deaths_202005111457.csv 65 67
## 3542 31586 coronavirus-deaths_202005121454.csv 65 68
## 3853 31586 coronavirus-deaths_202005131522.csv 65 69
## 4169 31586 coronavirus-deaths_202005141344.csv 65 70
## 4490 31586 coronavirus-deaths_202005151534.csv 65 71
## 4816 31586 coronavirus-deaths_202005161352.csv 65 72
## 5147 31586 coronavirus-deaths_202005171653.csv 65 73
## 5483 31586 coronavirus-deaths_202005181505.csv 65 74
## 5824 31586 coronavirus-deaths_202005191711.csv 65 75
#subset(deathspost,day==55 & Area.name=="United Kingdom")
This is a low enough rate of correction that it is fine to analyse just the most recent day’s data.
In this section, I fit quasi-Poisson log-linear models to predict the number of deaths from
lastf<-deathspost$filename[nrow(deathspost)]
dft<-subset(deathspost,filename==lastf)
dft$filename<-NULL
dft<-subset(dft,Area.type=="UK")
Deaths<-dft$Daily.change.in.deaths[nrow(dft):1]
## We now have all the data we are carrying forward in Deaths
ndays<-100
downames<-c("1Sunday","2Monday","3Tuesday","4Wednesday","5Thursday","6Friday","7Saturday")[c(6:7,1:5)]
dow<-factor(rep(downames,length=ndays))
t<-c(1:ndays)
week<-rep(1:100,each=7,length=ndays)
weeknames<-c("6 Mar-","13 Mar-","20 Mar-","27 Mar-","3 Apr-","10 Apr-","17 Apr-","24 Apr-","1 May-","8 May-", "15 May-")
fulldf<-data.frame(t=t,dow=dow,week=week)
df<-fulldf[1:length(Deaths),]
df$Deaths<-Deaths
df$wts<-1
rm(Deaths,t,dow,week)
## This fits a log-linear model, allowing splines, that is, the straight line for time is allowed to change
## direction at each of the "breaklocations". It also uses quasi-Poisson not Poisson, and allows
## some variables to be added *after* the spline variables. This allows us to test for a variable,
## controlling for all the spline variables. The model terms to go in after the spline variables go in "model2"
fap4.pq<-function(model,model2,from=1,to=nrow(df),breaklocations,outp,control){
dftemp<-df[c(from:to),]
if (!missing(control)) mycontrol<-control else mycontrol<-glm.control()
m<-glm(model,family=quasipoisson,weights=wts,data=dftemp, control=mycontrol)
if (missing(breaklocations)) stop("fap4.p requires breaklocations to be specified")
modeldev<-"~."
for (ix in 1:length(breaklocations)) {
dftemp[,ncol(dftemp)+1]<-with(dftemp,ifelse(t>=breaklocations[ix],(t-breaklocations[ix]),0))
varname<-paste0("breakvar",as.character(ix))
colnames(dftemp)[ncol(dftemp)]<-varname
modeldev<-paste0(modeldev,"+",varname)
}
if (!missing(model2)) modeldev<-paste0(modeldev,"+",model2)
m<-update(m,modeldev)
m$st.res<-(m$y - m$fitted.values)/sqrt(m$fitted.values)
if (!missing(outp)) if(outp) {
print(anova(m,test="Chisq"))
print(summary(m))
plot(dftemp$t,m$st.res)
}
return(m)
}
## First, we ask how many breaks are need to adequately represent the changes in Death rate. Up until 14th May,
## weekly splines were significant, but the p-value on that day became greater than 0.05. It's clear that
## there is change over the period, as the epidemic has grown and shrunk. However, the non-significant later
## kinks in the spline are outweighing the highly significant early kinks. After the kink-frequency ANOVA table,
## I'll show the significance of the individual kinks in the order fitted (with the ANOVA table) and then on
## a fully-adjusted basis (with the coefficients table in the summary).
##
mf3.5<-fap4.pq(Deaths~dow+t,breaklocations=seq(3.5,nrow(df)-2,by=3.5))
mf7<-fap4.pq(Deaths~dow+t,breaklocations=seq(7,nrow(df)-2,by=7))
mf14<-fap4.pq(Deaths~dow+t,breaklocations=seq(14,nrow(df)-2,by=14))
anova(mf14,mf7,mf3.5,test="F")
## Analysis of Deviance Table
##
## Model 1: Deaths ~ dow + t + breakvar1 + breakvar2 + breakvar3 + breakvar4 +
## breakvar5
## Model 2: Deaths ~ dow + t + breakvar1 + breakvar2 + breakvar3 + breakvar4 +
## breakvar5 + breakvar6 + breakvar7 + breakvar8 + breakvar9 +
## breakvar10
## Model 3: Deaths ~ dow + t + breakvar1 + breakvar2 + breakvar3 + breakvar4 +
## breakvar5 + breakvar6 + breakvar7 + breakvar8 + breakvar9 +
## breakvar10 + breakvar11 + breakvar12 + breakvar13 + breakvar14 +
## breakvar15 + breakvar16 + breakvar17 + breakvar18 + breakvar19 +
## breakvar20
## Resid. Df Resid. Dev Df Deviance F Pr(>F)
## 1 62 904.09
## 2 57 755.44 5 148.65 2.2523 0.06447 .
## 3 47 627.69 10 127.75 0.9678 0.48334
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(mf7,test="F")
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 74 24887.1
## dow 6 2239.7 68 22647.4 28.6334 1.629e-15 ***
## t 1 4620.1 67 18027.3 354.3983 < 2.2e-16 ***
## breakvar1 1 3088.2 66 14939.1 236.8872 < 2.2e-16 ***
## breakvar2 1 4552.0 65 10387.1 349.1762 < 2.2e-16 ***
## breakvar3 1 5158.7 64 5228.4 395.7141 < 2.2e-16 ***
## breakvar4 1 3488.1 63 1740.3 267.5651 < 2.2e-16 ***
## breakvar5 1 908.0 62 832.3 69.6477 1.839e-11 ***
## breakvar6 1 49.2 61 783.2 3.7713 0.05709 .
## breakvar7 1 22.5 60 760.6 1.7266 0.19411
## breakvar8 1 0.5 59 760.1 0.0393 0.84360
## breakvar9 1 2.8 58 757.3 0.2138 0.64554
## breakvar10 1 1.9 57 755.4 0.1465 0.70332
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mf7)
##
## Call:
## glm(formula = Deaths ~ dow + t + breakvar1 + breakvar2 + breakvar3 +
## breakvar4 + breakvar5 + breakvar6 + breakvar7 + breakvar8 +
## breakvar9 + breakvar10, family = quasipoisson, data = dftemp,
## weights = wts, control = mycontrol)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.5686 -2.1900 0.3001 1.8532 7.2637
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.722754 3.131334 -0.550 0.58436
## dow2Monday 0.013376 0.088887 0.150 0.88092
## dow3Tuesday 0.678948 0.077380 8.774 3.62e-12 ***
## dow4Wednesday 0.558755 0.079885 6.994 3.25e-09 ***
## dow5Thursday 0.502663 0.080354 6.256 5.47e-08 ***
## dow6Friday 0.592977 0.078916 7.514 4.42e-10 ***
## dow7Saturday 0.499876 0.080175 6.235 5.92e-08 ***
## t 0.427011 0.484999 0.880 0.38232
## breakvar1 -0.167643 0.546768 -0.307 0.76026
## breakvar2 -0.006654 0.147765 -0.045 0.96424
## breakvar3 -0.087883 0.061984 -1.418 0.16168
## breakvar4 -0.100101 0.032792 -3.053 0.00344 **
## breakvar5 -0.080800 0.024523 -3.295 0.00170 **
## breakvar6 0.001683 0.023803 0.071 0.94389
## breakvar7 -0.018736 0.025263 -0.742 0.46135
## breakvar8 -0.003831 0.028044 -0.137 0.89183
## breakvar9 0.003205 0.032304 0.099 0.92131
## breakvar10 0.019278 0.050343 0.383 0.70319
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 13.03647)
##
## Null deviance: 24887.07 on 74 degrees of freedom
## Residual deviance: 755.44 on 57 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
## This pattern is consistent with a big change in slope over weeks 4 and 5. Then, the sequential tests would
## show significance for all the early kinks, because the early kink allows a steep slope before it, and
## a lower slope afterwards, even if the pattern afterwards is up then down. Once we have the kinks where
## there really are changes, none of others are called for. It is worth them all in in for now, I
## think. This is because we make sure that our inferences are robust against the possibility of changes in
## the slope, even if they're non-significant. After a while, it may be sensible to mix the periods, and have
## more frequent kinks in the period of rapid change, and less frequent kinks elsewhere. The problem with a
## rapid response is that it's unsatisfactory to make inevitably data-dependent choices of dropping variables,
## which will have unknown effects on the various tests we're interested in carrying out. So for now, I'll
## keep all the weekly kinks, while I think about a good way to remove unwanted parameters. There is also the
## problem of orthogonality of the slopes on time (t plus various breakpoint variables) with the parameters of
## dow, making it attractive to keep changes in slope to the same day of the week.
## Just to zoom in on where the changes happen, here are the days of the breakpoints
seq(7,nrow(df)-2,by=7)
## [1] 7 14 21 28 35 42 49 56 63 70
## So, fap4.pq makes the change occur on the day of the breakpoint, and the kink has happened on the day after.
## So changes that come into force on days 29 and 36, which are the 12th and 19th April. The UK lockdown
## was imposed in a speech on 23rd March, and so in effect came into force on 24th March. That is 2 weeks and
## 5 days before we see the first effect, with the current set of breakpoints. That is about the likely time
## for death rates to take to change in response to behaviour. We would need a slightly different set of
## analyses to ask whether there was any evidence of the previous "advice" having an effect, and to ask
## how long the change in rate went on for, before the new rate stabilised. I may do that soon, but for now
## return to the "preliminary" task of looking at the data from day 44 on.
## In the pre-29 April 2020 data, there was a clear change in the day-of-the-week effects, when the reporting
## scheme changed. Here, I check for whether these effects change in the post-29 April 2020 data. With no
## particular place to look, I look once a week.
##
## So, you're about to see an analysis for the day-of-the-week effects changing on each day in the sequence 7,
## 14. 21... The analyses are performed, and the loop stores the day of the break (bp), the p-value for
## the interaction stept(t,bp):dow, and the deviance drop associated with that term.
##
daya<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
tdowpa<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
tdowdda<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
keepday<-0
stept<-function(t,breakp){ ifelse(t<=breakp,0,1)}
for (bp in seq(7,nrow(df)-2,by=7)) {
keepday<-keepday+1
m1<-fap4.pq(Deaths~t+dow,model2="stept(t,bp)*dow",breaklocations=seq(7,nrow(df)-2,by=7))
daya[keepday]<-bp; myan1<-anova(m1,test="F")
tdowpa[keepday]<-myan1[["dow:stept(t, bp)","Pr(>F)"]]
tdowdda[keepday]<-myan1[["dow:stept(t, bp)","Deviance"]]}
print(cbind(daya,tdowpa,tdowdda))
## daya tdowpa tdowdda
## [1,] 7 9.918231e-01 7.06580
## [2,] 14 8.686297e-01 34.89396
## [3,] 21 9.500604e-01 22.96617
## [4,] 28 7.518096e-02 147.72760
## [5,] 35 5.534445e-02 156.24339
## [6,] 42 3.603667e-05 323.18326
## [7,] 49 4.229035e-04 278.03592
## [8,] 56 3.643366e-02 169.14059
## [9,] 63 1.513385e-01 122.98038
## [10,] 70 8.296277e-02 86.19991
## There are places where a change looks likely, from the p-values. But we have looked at quite a few p-values.
## What about just a general trend in the day-of-the-week effects?
m2<-fap4.pq(Deaths~t+dow,model2="t*dow",breaklocations=seq(7,nrow(df)-2,by=7))
anova(m2,test="F")
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 74 24887.1
## t 1 4543.4 73 20343.7 531.1761 < 2.2e-16 ***
## dow 6 2316.4 67 18027.3 45.1361 < 2.2e-16 ***
## breakvar1 1 3088.2 66 14939.1 361.0457 < 2.2e-16 ***
## breakvar2 1 4552.0 65 10387.1 532.1883 < 2.2e-16 ***
## breakvar3 1 5158.7 64 5228.4 603.1178 < 2.2e-16 ***
## breakvar4 1 3488.1 63 1740.3 407.8026 < 2.2e-16 ***
## breakvar5 1 908.0 62 832.3 106.1519 4.616e-14 ***
## breakvar6 1 49.2 61 783.2 5.7480 0.02021 *
## breakvar7 1 22.5 60 760.6 2.6315 0.11093
## breakvar8 1 0.5 59 760.1 0.0599 0.80768
## breakvar9 1 2.8 58 757.3 0.3259 0.57059
## breakvar10 1 1.9 57 755.4 0.2233 0.63855
## t:dow 6 305.3 51 450.2 5.9485 9.165e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Yes!! But, actually not as big a deviance drop as the sudden one.
## To find the "best" breakpoint by day, we do each day in the area where the weekly checks showed some interest
daya<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
tdowpa<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
tdowdda<-array(NA,dim=length(seq(7,nrow(df)-2,by=7)))
keepday<-0
stept<-function(t,breakp){ ifelse(t<=breakp,0,1)}
for (bp in seq(36,48)) {
keepday<-keepday+1
m1<-fap4.pq(Deaths~t+dow,model2="stept(t,bp)*dow",breaklocations=seq(7,nrow(df)-2,by=7))
daya[keepday]<-bp; myan1<-anova(m1,test="F")
tdowpa[keepday]<-myan1[["dow:stept(t, bp)","Pr(>F)"]]
tdowdda[keepday]<-myan1[["dow:stept(t, bp)","Deviance"]]}
print(cbind(daya,tdowpa,tdowdda))
## daya tdowpa tdowdda
## [1,] 36 6.088055e-02 152.2619
## [2,] 37 1.971647e-02 179.6369
## [3,] 38 8.043819e-03 209.7833
## [4,] 39 8.201195e-04 251.2190
## [5,] 40 6.965331e-05 314.7435
## [6,] 41 3.069034e-05 314.1271
## [7,] 42 3.603667e-05 323.1833
## [8,] 43 4.842176e-06 339.6750
## [9,] 44 8.973474e-05 308.3149
## [10,] 45 2.232488e-04 292.9550
## [11,] 46 6.151372e-05 318.0819
## [12,] 47 6.320745e-04 259.1012
## [13,] 48 3.814862e-04 280.0304
### So the biggest drops are on days 40, 41, 42, 43 (43 is top by a bit). They are in Easter week.
###
###
## We now ask whether the best breakpoint for dow effects removes the t:dow interaction, in a still smaller
## range. We do this by seeing whether the time trend in day-of-the-week effects disappears when the sudden
## change is controlled for.
##
## Note that we are fitting splines which we hope take account of the slowly changing progress of the epidemic
## well enough that we can test for day-of-the-week effects.
##
## The purpose of this series of analyses is that a single sudden change would show no general time trend if the
## breakpoint tested was the actual day of the change. But if you test with a nearby breakpoint, then the time
## trend will appear, because the sudden change gives the appearance of a trend in the half of the data the real
## breakpoint appears in. Thus, finding no general time trend (t:dow) is what we'd expect at the true breakpoint,
## and we'd expect to see general time trends at the breakpoints either side. On 10th May, the t:dow term is
## statistically significant, though only at 0.03 which, given the number of tests we are doing, is not very
## significant. However, I need to look out for evidence that the day of the week effects are continuing
## to change, so I will watch this space.
##
daya<-array(NA,dim=length(c(39:44)))
tdowpa3<-array(NA,dim=length(daya))
dowsteptpa3<-array(NA,dim=length(daya))
tdowpa3a<-array(NA,dim=length(daya))
dowsteptpa3a<-array(NA,dim=length(daya))
keepday<-0
stept<-function(t,breakp){ ifelse(t<=breakp,0,1)}
for (bp2 in 39:44) {
keepday<-keepday+1
#print(paste("bp2 breakpoint for the step change =",bp2))
m3<-fap4.pq(Deaths~t+dow,model2="t*dow+stept(t,bp2)*dow",breaklocations=seq(7,nrow(df)-2,by=7))
m3a<-fap4.pq(Deaths~t+dow,model2="stept(t,bp2)*dow+t*dow",breaklocations=seq(7,nrow(df)-2,by=7))
myan3<-anova(m3,test="F")
myan3a<-anova(m3a,test="F")
daya[keepday]<-bp2
tdowpa3[keepday]<-myan3["t:dow","Pr(>F)"]
dowsteptpa3[keepday]<-myan3["dow:stept(t, bp2)","Pr(>F)"]
tdowpa3a[keepday]<-myan3a["t:dow","Pr(>F)"]
dowsteptpa3a[keepday]<-myan3a["dow:stept(t, bp2)","Pr(>F)"]
}
print(cbind(daya,tdowpa3,dowsteptpa3,tdowpa3a,dowsteptpa3a))
## daya tdowpa3 dowsteptpa3 tdowpa3a dowsteptpa3a
## [1,] 39 3.036495e-05 0.03183323 0.01286501 1.824297e-04
## [2,] 40 2.526584e-05 0.01750210 0.02565121 1.863205e-05
## [3,] 41 3.386411e-05 0.11919666 0.16577104 2.559897e-05
## [4,] 42 2.290734e-05 0.02112048 0.04369465 1.287349e-05
## [5,] 43 1.009671e-05 0.02529283 0.12103626 3.209837e-06
## [6,] 44 7.718275e-05 0.18901913 0.20970406 7.044746e-05
## These show that when the time trend is earlier in the model (columns 2 and 3), the step change is highly
## significant on day 43, but the time trend itself remains highly significant. Whereas, when the step change
## is earlier (columns 4 and 5), it is most significant on day 43, and on that day the time trend is barely
## significant. Thus, the most economical conclusion is that there is a step change on day 43.
## The conclusion on day-of-the-week effects is that on day 43, there was a change to the way the weeks worked, and
## it involves a step change in number of deaths as well as change in dow effects. For marginality reasons, the
## main effect change would be not unexpected.
## The p-value of the time-trend interaction with dow in column 4 was down to 0.01 on 12th May, and so I've
## decided to do some further checking. This t:dow term actually looks for where the difference between
## pre-day 44 plus day 44 up to the break is different from after the break. I've been dropping the pre-
## day 44 data, and so the inclusion of that isn't really relevant to the main analyses. Thus, although
## this may need revisiting, I'm accepting the step change at day 43, and presuming that the new
## significance is down to changes after that day.
##
##
##
## Thus, I've added the following code, to look for a second change in day of the week effects, just in the
## data after day 44, so the break at day 43 is already done subsetting the data, and need not be included
## in the models. The significance of 0.01 for t:dow in the model we've just seen may well be reduced in just
## the post-day-44 data, as the "weight" of the changes in days 1 to 43 is no longer present. These next
## analyses look for such significance, and also check whether data at the beginning (first loop) or end
## (second loop) of the period are creating the call for the t:dow interaction (if any).
##
for (begday in c(44:48)) {
print(begday)
print(anova(
fap4.pq(Deaths~t+dow,model2="t*dow",from=begday,to=nrow(df),breaklocations=seq(49,nrow(df)-2,by=7)),test="F"))
}
## [1] 44
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 31 3801.1
## t 1 1558.46 30 2242.7 211.8428 7.577e-10 ***
## dow 6 2054.00 24 188.7 46.5339 1.871e-08 ***
## breakvar1 1 12.39 23 176.3 1.6847 0.2153
## breakvar2 1 1.60 22 174.7 0.2175 0.6482
## breakvar3 1 4.86 21 169.8 0.6606 0.4300
## breakvar4 1 2.30 20 167.5 0.3133 0.5845
## t:dow 6 62.57 14 105.0 1.4175 0.2756
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 45
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 30 3391.6
## t 1 1202.41 29 2189.2 152.2316 1.496e-08 ***
## dow 6 2012.17 23 177.0 42.4587 8.443e-08 ***
## breakvar1 1 2.12 22 174.9 0.2685 0.6131
## breakvar2 1 1.02 21 173.9 0.1293 0.7250
## breakvar3 1 4.57 20 169.3 0.5781 0.4606
## breakvar4 1 2.19 19 167.1 0.2773 0.6074
## t:dow 6 62.37 13 104.8 1.3161 0.3172
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 46
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 29 3384.4
## t 1 1386.49 28 1997.9 175.5188 1.594e-08 ***
## dow 6 1820.86 22 177.0 38.4179 3.769e-07 ***
## breakvar1 1 4.36 21 172.6 0.5525 0.4716
## breakvar2 1 1.91 20 170.7 0.2420 0.6316
## breakvar3 1 5.17 19 165.6 0.6548 0.4342
## breakvar4 1 2.45 18 163.1 0.3103 0.5877
## t:dow 6 66.72 12 96.4 1.4077 0.2886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 47
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 28 3384.3
## t 1 1539.61 27 1844.7 180.8760 3.575e-08 ***
## dow 6 1695.00 21 149.7 33.1886 2.003e-06 ***
## breakvar1 1 2.70 20 147.0 0.3169 0.5848
## breakvar2 1 0.14 19 146.9 0.0162 0.9009
## breakvar3 1 4.00 18 142.9 0.4695 0.5074
## breakvar4 1 2.16 17 140.7 0.2537 0.6244
## t:dow 6 45.50 11 95.2 0.8909 0.5334
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 48
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 27 2856.99
## t 1 1102.14 26 1754.85 127.8772 5.096e-07 ***
## dow 6 1605.17 20 149.68 31.0403 6.616e-06 ***
## breakvar1 1 17.75 19 131.94 2.0591 0.1818
## breakvar2 1 0.52 18 131.42 0.0603 0.8109
## breakvar3 1 3.79 17 127.62 0.4402 0.5220
## breakvar4 1 3.38 16 124.25 0.3917 0.5454
## t:dow 6 36.18 10 88.06 0.6997 0.6568
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
for (endday in c(65:nrow(df))) {
print(endday)
print(anova(
fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=endday,breaklocations=seq(49,nrow(df)-2,by=7)),test="F"))
}
## [1] 65
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 21 2004.42
## t 1 486.93 20 1517.49 114.9031 0.0001224 ***
## dow 6 1402.17 14 115.32 55.1456 0.0002084 ***
## breakvar1 1 8.25 13 107.07 1.9458 0.2218429
## breakvar2 1 13.31 12 93.77 3.1396 0.1366119
## breakvar3 1 31.30 11 62.47 7.3856 0.0418926 *
## breakvar4 0 0.00 11 62.47
## t:dow 6 41.05 5 21.42 1.6143 0.3078742
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 66
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 22 2303.20
## t 1 694.59 21 1608.60 163.9420 1.394e-05 ***
## dow 6 1490.49 15 118.11 58.6325 4.598e-05 ***
## breakvar1 1 8.90 14 109.21 2.1009 0.1974
## breakvar2 1 9.56 13 99.64 2.2571 0.1837
## breakvar3 1 6.92 12 92.73 1.6328 0.2485
## breakvar4 0 0.00 12 92.73
## t:dow 6 67.06 6 25.66 2.6382 0.1315
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 67
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 23 2697.61
## t 1 965.20 22 1732.41 249.6025 9.863e-07 ***
## dow 6 1612.64 16 119.77 69.5052 6.971e-06 ***
## breakvar1 1 7.98 15 111.79 2.0648 0.19388
## breakvar2 1 11.60 14 100.19 3.0002 0.12686
## breakvar3 1 6.32 13 93.87 1.6346 0.24182
## breakvar4 0 0.00 13 93.87
## t:dow 6 66.44 7 27.42 2.8637 0.09738 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 68
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 24 2697.66
## t 1 857.80 23 1839.86 244.6436 2.785e-07 ***
## dow 6 1710.48 17 129.39 81.3042 9.905e-07 ***
## breakvar1 1 11.84 16 117.55 3.3763 0.10344
## breakvar2 1 5.03 15 112.52 1.4352 0.26521
## breakvar3 1 0.76 14 111.76 0.2161 0.65438
## breakvar4 0 0.00 14 111.76
## t:dow 6 83.42 8 28.34 3.9654 0.03846 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 69
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 25 2729.32
## t 1 872.73 24 1856.60 195.7758 2.061e-07 ***
## dow 6 1724.61 18 131.99 64.4791 6.935e-07 ***
## breakvar1 1 13.83 17 118.15 3.1034 0.11197
## breakvar2 1 3.63 16 114.53 0.8136 0.39056
## breakvar3 1 2.27 15 112.25 0.5096 0.49340
## breakvar4 0 0.00 15 112.25
## t:dow 6 71.76 9 40.49 2.6830 0.08889 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 70
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 26 2798.25
## t 1 940.11 25 1858.15 224.9353 3.501e-08 ***
## dow 6 1723.33 19 134.81 68.7224 1.502e-07 ***
## breakvar1 1 16.12 18 118.69 3.8579 0.07790 .
## breakvar2 1 2.73 17 115.96 0.6532 0.43779
## breakvar3 1 3.58 16 112.38 0.8561 0.37663
## breakvar4 0 0.00 16 112.38
## t:dow 6 70.22 10 42.16 2.8002 0.07253 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 71
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 27 2898.96
## t 1 1040.67 26 1858.29 248.9962 2.147e-08 ***
## dow 6 1707.75 20 150.55 68.1008 1.569e-07 ***
## breakvar1 1 9.12 19 141.42 2.1829 0.17034
## breakvar2 1 8.12 18 133.31 1.9421 0.19364
## breakvar3 1 0.01 17 133.30 0.0014 0.97125
## breakvar4 1 20.92 16 112.38 5.0055 0.04923 *
## t:dow 6 70.22 10 42.16 2.8002 0.07253 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 72
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 28 2934.42
## t 1 1059.00 27 1875.41 167.1846 5.382e-08 ***
## dow 6 1709.54 21 165.88 44.9807 4.154e-07 ***
## breakvar1 1 9.80 20 156.08 1.5465 0.2395
## breakvar2 1 2.58 19 153.50 0.4070 0.5365
## breakvar3 1 2.60 18 150.90 0.4100 0.5351
## breakvar4 1 1.52 17 149.39 0.2397 0.6341
## t:dow 6 79.44 11 69.94 2.0902 0.1371
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 73
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 29 3364.3
## t 1 1357.12 28 2007.2 160.2551 2.659e-08 ***
## dow 6 1837.49 22 169.7 36.1632 5.291e-07 ***
## breakvar1 1 9.16 21 160.6 1.0813 0.3189
## breakvar2 1 3.87 20 156.7 0.4572 0.5117
## breakvar3 1 1.21 19 155.5 0.1428 0.7122
## breakvar4 1 0.15 18 155.3 0.0175 0.8970
## t:dow 6 51.88 12 103.5 1.0209 0.4569
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 74
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 30 3799.3
## t 1 1677.53 29 2121.7 212.6019 1.957e-09 ***
## dow 6 1949.30 23 172.4 41.1741 1.018e-07 ***
## breakvar1 1 8.34 22 164.1 1.0575 0.3225
## breakvar2 1 5.12 21 159.0 0.6488 0.4350
## breakvar3 1 0.44 20 158.5 0.0564 0.8160
## breakvar4 1 1.88 19 156.7 0.2383 0.6336
## t:dow 6 52.17 13 104.5 1.1020 0.4117
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 75
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 31 3801.1
## t 1 1558.46 30 2242.7 211.8428 7.577e-10 ***
## dow 6 2054.00 24 188.7 46.5339 1.871e-08 ***
## breakvar1 1 12.39 23 176.3 1.6847 0.2153
## breakvar2 1 1.60 22 174.7 0.2175 0.6482
## breakvar3 1 4.86 21 169.8 0.6606 0.4300
## breakvar4 1 2.30 20 167.5 0.3133 0.5845
## t:dow 6 62.57 14 105.0 1.4175 0.2756
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=45,to=68,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=46,to=68,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=47,to=68,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=68,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=67,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=66,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
#anova(fap4.pq(Deaths~t+dow,model2="t*dow",from=44,to=65,breaklocations=seq(49,nrow(df)-2,by=7)),test="F")
## The conclusion is that the there is one day on which the time-trend interaction is significant, but
## with a p-value of only 0.03 (on 13th May), and the residual DF are small anyway. So, it seems
## reasonable to ignore that time-trend interaction for the moment.
##
The results show we should assume that the day-of-the-week effects change between 17th and 18th April. We now go on to look at a few useful figures from the analysis of post-day-44 data. It appears that the day-of-the-week effects are stronger in the second period. The splines started off showing a significant change in slope, but don’t so in the post-day-44 period. So, there’s no evidence from the splines that the rate of change of deaths with time is anything but linear (on the log scale). That would mean a geometric decrease in deaths with the ratio being exp(7*(the slope of t)) for week-to-week comparisons.
## Thus, we are safe to use the data from day 44 onwards, and ignore changes in day-of-the-week effects.
## This allows us to use just a fortnight's data at a time.
## So the best model, keeping the step change but not the smooth change is
m4<-fap4.pq(Deaths~t+dow,model2="stept(t,43)*dow",breaklocations=seq(7,nrow(df)-2,by=7))
anova(m4,test="F")
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 74 24887.1
## t 1 4543.4 73 20343.7 637.9307 < 2.2e-16 ***
## dow 6 2316.4 67 18027.3 54.2075 < 2.2e-16 ***
## breakvar1 1 3088.2 66 14939.1 433.6079 < 2.2e-16 ***
## breakvar2 1 4552.0 65 10387.1 639.1463 < 2.2e-16 ***
## breakvar3 1 5158.7 64 5228.4 724.3312 < 2.2e-16 ***
## breakvar4 1 3488.1 63 1740.3 489.7619 < 2.2e-16 ***
## breakvar5 1 908.0 62 832.3 127.4860 2.312e-15 ***
## breakvar6 1 49.2 61 783.2 6.9032 0.01139 *
## breakvar7 1 22.5 60 760.6 3.1604 0.08153 .
## breakvar8 1 0.5 59 760.1 0.0719 0.78969
## breakvar9 1 2.8 58 757.3 0.3914 0.53441
## breakvar10 1 1.9 57 755.4 0.2682 0.60684
## stept(t, 43) 1 45.5 56 709.9 6.3901 0.01468 *
## dow:stept(t, 43) 6 339.7 50 370.2 7.9489 4.842e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m4)
##
## Call:
## glm(formula = Deaths ~ t + dow + breakvar1 + breakvar2 + breakvar3 +
## breakvar4 + breakvar5 + breakvar6 + breakvar7 + breakvar8 +
## breakvar9 + breakvar10 + stept(t, 43) + dow:stept(t, 43),
## family = quasipoisson, data = dftemp, weights = wts, control = mycontrol)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.7561 -1.5971 0.3722 1.5103 3.7042
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.555675 2.299561 -0.677 0.50183
## t 0.425619 0.355944 1.196 0.23744
## dow2Monday 0.058743 0.092029 0.638 0.52618
## dow3Tuesday 0.388743 0.084820 4.583 3.08e-05 ***
## dow4Wednesday 0.371248 0.084148 4.412 5.46e-05 ***
## dow5Thursday 0.382310 0.082982 4.607 2.84e-05 ***
## dow6Friday 0.412347 0.082787 4.981 7.94e-06 ***
## dow7Saturday 0.257993 0.089986 2.867 0.00605 **
## breakvar1 -0.166151 0.401269 -0.414 0.68060
## breakvar2 -0.006643 0.108915 -0.061 0.95161
## breakvar3 -0.088337 0.045777 -1.930 0.05933 .
## breakvar4 -0.098385 0.024310 -4.047 0.00018 ***
## breakvar5 -0.087372 0.018671 -4.679 2.23e-05 ***
## breakvar6 -0.041645 0.023235 -1.792 0.07913 .
## breakvar7 0.039829 0.028127 1.416 0.16297
## breakvar8 -0.016361 0.021065 -0.777 0.44099
## breakvar9 0.006118 0.023683 0.258 0.79721
## breakvar10 0.021100 0.037155 0.568 0.57265
## stept(t, 43) -0.011993 0.138691 -0.086 0.93143
## dow2Monday:stept(t, 43) -0.092518 0.131711 -0.702 0.48567
## dow3Tuesday:stept(t, 43) 0.547147 0.115791 4.725 1.91e-05 ***
## dow4Wednesday:stept(t, 43) 0.391763 0.119450 3.280 0.00190 **
## dow5Thursday:stept(t, 43) 0.263807 0.121419 2.173 0.03457 *
## dow6Friday:stept(t, 43) 0.412798 0.118176 3.493 0.00101 **
## dow7Saturday:stept(t, 43) 0.410305 0.120413 3.407 0.00130 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 7.122038)
##
## Null deviance: 24887.07 on 74 degrees of freedom
## Residual deviance: 370.25 on 50 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
## NB The p-values for stept and stept*dow are a bit bigger than in the previous series, when we now omit
## the t:dow term, as its own p-value was 0.12.
# This shows the before and after values.
coef(m4)[3:8]
## dow2Monday dow3Tuesday dow4Wednesday dow5Thursday dow6Friday
## 0.05874294 0.38874350 0.37124824 0.38230991 0.41234704
## dow7Saturday
## 0.25799283
coef(m4)[(length(coef(m4))-5):length(coef(m4))]
## dow2Monday:stept(t, 43) dow3Tuesday:stept(t, 43)
## -0.09251845 0.54714682
## dow4Wednesday:stept(t, 43) dow5Thursday:stept(t, 43)
## 0.39176297 0.26380748
## dow6Friday:stept(t, 43) dow7Saturday:stept(t, 43)
## 0.41279843 0.41030524
coef(m4)[3:8]+coef(m4)[(length(coef(m4))-5):length(coef(m4))]
## dow2Monday dow3Tuesday dow4Wednesday dow5Thursday dow6Friday
## -0.03377551 0.93589032 0.76301121 0.64611739 0.82514547
## dow7Saturday
## 0.66829807
# The dow effects get stronger, not weaker. Sunday's effect is zero by construction. The stept value
coef(m4)[length(coef(m4))-6]
## stept(t, 43)
## -0.01199331
# is present every day, so
c(7*coef(m4)[length(coef(m4))-6],sum(coef(m4)[(length(coef(m4))-5):length(coef(m4))]))
## stept(t, 43)
## -0.08395318 1.93330249
# are the summed weekly effects of (i) stept (ii) dow-changes
## This new model m5 is from=44, and shows that dow is still important after-eq day 44
m5<-fap4.pq(Deaths~t+dow,from=44,breaklocations=seq(49,nrow(df)-2,by=7))
m5a<-fap4.pq(Deaths~t,model2="dow",from=44,breaklocations=seq(49,nrow(df)-2,by=7))
anova(m5,test="F")
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 31 3801.1
## t 1 1558.46 30 2242.7 191.7934 1.039e-11 ***
## dow 6 2054.00 24 188.7 42.1298 2.581e-10 ***
## breakvar1 1 12.39 23 176.3 1.5253 0.2311
## breakvar2 1 1.60 22 174.7 0.1969 0.6620
## breakvar3 1 4.86 21 169.8 0.5981 0.4484
## breakvar4 1 2.30 20 167.5 0.2837 0.6002
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m5a,test="F")
## Analysis of Deviance Table
##
## Model: quasipoisson, link: log
##
## Response: Deaths
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev F Pr(>F)
## NULL 31 3801.1
## t 1 1558.46 30 2242.7 191.7934 1.039e-11 ***
## breakvar1 1 15.45 29 2227.2 1.9013 0.1832
## breakvar2 1 0.21 28 2227.0 0.0258 0.8740
## breakvar3 1 0.54 27 2226.5 0.0659 0.8000
## breakvar4 1 0.14 26 2226.4 0.0168 0.8982
## dow 6 2058.83 20 167.5 42.2288 2.527e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## And now to get the slopes against day, in the different zones of the splines.
nbps<-length(seq(49,nrow(df)-2,by=7))
myc<-coef(m5)
mycs<-myc[c(2,9:(9+nbps-1))]
cumsum(mycs)
## t breakvar1 breakvar2 breakvar3 breakvar4
## -0.06377520 -0.02280185 -0.03951459 -0.03331947 -0.01217119
## The slope looks to be small and negative, and the spline variables are not statistically significant
## in model m5a, which controls for day-of-the-week _before_ looking for the spline changes, and so
## the changes in slope could be zero.