Fitting the time course of the deaths in the epidemic

Here I describe the statistical modelling behind the predictions about the remaining deaths in the first wave of the UK coronavirus epidemic. The data are available at https://coronavirus.data.gov.uk/archive. The general linear model approach is taken, to fit a log-linear model to the daily number of deaths. The predictors will be time (a continuous variable t, in days, starting from 1), and day-of-the-week (a categorical variable dow, starting from 5Thursday on day 1, which is 6th March). The functions make it easy to find one’s way through fitting the model t+dow, and then adding terms of the following kinds

It is natural to represent the time pattern as splines, and so expect a number of time kinks that match empirically the arbitrary curve of the course of the epidemic. However, if a time step seems powerfully suggested by the data on statistical grounds, then it would be wrong to omit it. The day of the week effects will change when the reporting methods change, which they have done at least twice. This is because the delays in reporting depend on shift patterns and the bureaucratic effects of weekends. The initial model has no “dow kink”, so the initial day of the week effects are assumed to be constant over time. But if the reporting lags drift over time, there could be a kink. And once the figures include all deaths, not just hospital deaths, it may be that the different sources of information have different lags, and the changing balance of sources could result in a time trend in the day of the week effects.

The functions use a “pattern”, which is a specification of four sets of days, the days of time steps, the days of time kinks, the days of dow steps and the days of dow kinks. add.terms takes a pattern, and fits all possible further steps and kinks, one at a time, and shows the “best” in each category. Relevant information is provided for each of the four “best” changes. One can then be chosen, and added to the pattern, and add.terms can be run again. At each stage, and certainly at the end, it is useful to run lsearch, which takes a pattern, and checks all combinations of days that are found by allowing each separate day to remain the same, or to move up or down by one day. lsearch reports which the best model is from all of those possibilities. If the best outcome matches it, then the input pattern is a locally best model by varying the days.

Here, first, is the code defining the functions. It also works out details of changes in the archive record, but the printing has been suppressed. So far, these haven’t suggested doing other than analysing the most recent data. But it would revise our understanding of previous outcomes if the data was revised a lot. To use this code yourself, you will need to get the data in somehow. The simplest is to have all the deaths data from the https://coronavirus.data.gov.uk/archive in directory, and replace my directory as as argument to get_data_from_folder with your own directory. It takes the most recent file in that directory and uses that data for all days. Everything should work after that.

## ============== Here below is the edited version from 2020_06_02 ==========

get_data_from_folder<-function(dir,cols,text="Select one of the data files",addIDcol=FALSE,IDcolName="filename"){
    
ordir<-getwd()

if (missing(dir)) {
    message("before flush.console\n")
    cat("\n")}

# message("before flush.console")

# This crashed R    
# flush.console()

# message("after flush.console")

if (missing(dir)) {
    a<-file.choose()
    dir<-dirname(a)}

files<-list.files(dir,no..=TRUE)

setwd(dir)

# Second read each one in turn, and do the small tasks

for (i in files) {
    cat(i)
    resin<-read.csv(i)[,cols]
    if (addIDcol) {resin[,ncol(resin)+1]<-i; if (is.element(IDcolName,colnames(resin))) stop("IDcolName is already a column name in the datasets"); colnames(resin)[ncol(resin)]<-IDcolName}
    if (i!=files[1]) resdf<-rbind(resdf,resin) else resdf<-resin
}

# Third, remove the unneeded original copy of the last file and dataframe, and reset wd

remove(resin)
setwd(ordir)
return(resdf)}

## This line gets the data for me -- it is commented out so if necessary, it will prompt you for the relevant folder,
##   which should contain DHSC datafiles from from after the 29th April 2020
deathspost<-get_data_from_folder("~/OneDrive - Nexus365/Coronavirus stuff/Data/deaths/posteq29",addIDcol=TRUE)
## coronavirus-deaths_202004292132.csvcoronavirus-deaths_202004301702.csvcoronavirus-deaths_202005011647.csvcoronavirus-deaths_202005021621.csvcoronavirus-deaths_202005031513.csvcoronavirus-deaths_202005041501.csvcoronavirus-deaths_202005051554.csvcoronavirus-deaths_202005061549.csvcoronavirus-deaths_202005071534.csvcoronavirus-deaths_202005081443.csvcoronavirus-deaths_202005091431.csvcoronavirus-deaths_202005101500.csvcoronavirus-deaths_202005111457.csvcoronavirus-deaths_202005121454.csvcoronavirus-deaths_202005131522.csvcoronavirus-deaths_202005141344.csvcoronavirus-deaths_202005151534.csvcoronavirus-deaths_202005161352.csvcoronavirus-deaths_202005171653.csvcoronavirus-deaths_202005181505.csvcoronavirus-deaths_202005191711.csvcoronavirus-deaths_202005201630.csvcoronavirus-deaths_202005211346.csvcoronavirus-deaths_202005221347.csvcoronavirus-deaths_202005231511.csvcoronavirus-deaths_202005241458.csvcoronavirus-deaths_202005251534.csvcoronavirus-deaths_202005261537.csvcoronavirus-deaths_202005271508.csvcoronavirus-deaths_202005281507.csvcoronavirus-deaths_202005291612.csvcoronavirus-deaths_202005301512.csvcoronavirus-deaths_202005311434.csvcoronavirus-deaths_202006011604.csvcoronavirus-deaths_202006021540.csvcoronavirus-deaths_202006031412.csvcoronavirus-deaths_202006041356.csvcoronavirus-deaths_202006051441.csvcoronavirus-deaths_202006061346.csvcoronavirus-deaths_202006071502.csvcoronavirus-deaths_202006081341.csvcoronavirus-deaths_202006091440.csvcoronavirus-deaths_202006101522.csvcoronavirus-deaths_202006111329.csvcoronavirus-deaths_202006121339.csvcoronavirus-deaths_202006131430.csv
## Adding day and pubday - this converts the dates within the files to days counting from 6th March 2020, and converts
##  the filenames into pubdays i.e. publication days. Days are the day on which the deaths were reported, while pubday
##  is the the datafile was released. The point of the distinction is that sometimes a figure is revised after first
##  publication, and this is seen on later pubdays.
add.days<-function(daf){
    
repyear<-with(daf,as.numeric(substr(Reporting.date,1,4)))
repmonth<-with(daf,as.numeric(substr(Reporting.date,6,7)))
repday<-with(daf,as.numeric(substr(Reporting.date,9,10)))
day<-repday-5
day<-ifelse(repmonth>=4,day+31,day)
day<-ifelse(repmonth>=5,day+30,day)
day<-ifelse(repmonth>=6,day+31,day)
day<-ifelse(repmonth>=7,day+30,day)
day<-ifelse(repmonth>=8,day+31,day)
daf$day<-day

pyear<-with(daf,as.numeric(substr(filename,20,23)))
pmonth<-with(daf,as.numeric(substr(filename,24,25)))
pday<-with(daf,as.numeric(substr(filename,26,27)))
pubday<-pday-5
pubday<-ifelse(pmonth>=4,pubday+31,pubday)
pubday<-ifelse(pmonth>=5,pubday+30,pubday)
pubday<-ifelse(pmonth>=6,pubday+31,pubday)
pubday<-ifelse(pmonth>=7,pubday+30,pubday)
pubday<-ifelse(pmonth>=8,pubday+31,pubday)
daf$pubday<-pubday

return(daf)

}

deathspost<-add.days(deathspost)

myxt<-xtabs(~day+pubday+Area.name,data=deathspost)
#max(myxt)
#min(myxt)
#sum(myxt)
#dim(deathspost)
## This should be max=1, min=0.

## This looks for variation in the reports for one day, across different publication days. And then looks
##  for the datapoints that show the difference. At the time of writing, there is only one large revision
##   in all the datafiles, but of course it persists through all subsequent publication days. There are two
##   small changes in Northern Ireland, where it looks like one death was initially misallocated by one day.
myag<-aggregate(deathspost$Daily.change.in.deaths,by=with(deathspost,list(day=day,Area.name=Area.name)),FUN=sd)
myag0<-subset(myag,x!=0)
#print(myag0)

# for (j in (1:nrow(myag0))) print(subset(deathspost,day==myag0$day[j] & Area.name==myag0$Area.name[j]))
#subset(deathspost,day==55 & Area.name=="United Kingdom")

lastf<-deathspost$filename[nrow(deathspost)]
dft<-subset(deathspost,filename==lastf)
dft$filename<-NULL
dft<-subset(dft,Area.type=="UK")
Deaths<-dft$Daily.change.in.deaths[nrow(dft):1]
#######   Deaths<-c(Deaths,412,377)   ## a cheat for when the archive was unavailable
## We now have all the data we are carrying forward in Deaths
##  but we also indicate first and last dates of the data
datafirstdate<-"2020-03-06"
datalastdate<-paste0(substr(lastf,20,23),"-",substr(lastf,24,25),"-",substr(lastf,26,27))
#######   datalastdate<-"2020-05-27"   ## a cheat for when the archive was unavailable

ndays<-length(Deaths)+100
downames<-c("1Sunday","2Monday","3Tuesday","4Wednesday","5Thursday","6Friday","7Saturday")
dow<-factor(rep(downames[c(6:7,1:5)],length=ndays))
dowSun<-ifelse(dow==downames[1],1,ifelse(dow==downames[7],-1,0))
dowMon<-ifelse(dow==downames[2],1,ifelse(dow==downames[7],-1,0))
dowTue<-ifelse(dow==downames[3],1,ifelse(dow==downames[7],-1,0))
dowWed<-ifelse(dow==downames[4],1,ifelse(dow==downames[7],-1,0))
dowThu<-ifelse(dow==downames[5],1,ifelse(dow==downames[7],-1,0))
dowFri<-ifelse(dow==downames[6],1,ifelse(dow==downames[7],-1,0))
dowdummies<-"(dowSun+dowMon+dowTue+dowWed+dowThu+dowFri)"

downames<-c("1Sunday","2Monday","3Tuesday","4Wednesday","5Thursday","6Friday","7Saturday")
dowdummies<-"(dowSun+dowMon+dowTue+dowWed+dowThu+dowFri)"
weeknames<-c("6 Mar-","13 Mar-","20 Mar-","27 Mar-","3 Apr-","10 Apr-","17 Apr-","24 Apr-","1 May-","8 May-", "15 May-","22-May","29-May")


makedf<-function(ndays){

    dow<-factor(rep(downames[c(6:7,1:5)],length=ndays))
    dowSun<-ifelse(dow==downames[1],1,ifelse(dow==downames[7],-1,0))
    dowMon<-ifelse(dow==downames[2],1,ifelse(dow==downames[7],-1,0))
    dowTue<-ifelse(dow==downames[3],1,ifelse(dow==downames[7],-1,0))
    dowWed<-ifelse(dow==downames[4],1,ifelse(dow==downames[7],-1,0))
    dowThu<-ifelse(dow==downames[5],1,ifelse(dow==downames[7],-1,0))
    dowFri<-ifelse(dow==downames[6],1,ifelse(dow==downames[7],-1,0))

    contrasts(dow)<-contr.sum(7)
    t<-c(1:ndays)
    week<-rep(1:100,each=7,length=ndays)

    fulldf<-data.frame(t=t,dow=dow,week=week,wts=1,dowSun=dowSun,dowMon=dowMon,
                        dowTue=dowTue,dowWed=dowWed,dowThu=dowThu,dowFri=dowFri)
    
    return(fulldf)
}

df<-makedf(length(Deaths))
df$Deaths<-Deaths

rm(Deaths,dow,dowSun,dowMon,dowTue,dowWed,dowThu,dowFri)

## Now prepare for fitpat
nulpat<-list(t.steps=c(),t.kinks=c(),dow.steps=c(),dow.kinks=c())

mytstep<-function(t,where,dir) if (dir>=0) ifelse(t-where>=0,1,0) else ifelse(t-where<0,1,0) 
mytkink<-function(t,where,dir){ ifelse(dir*(t-where)>=0,(t-where),0) }

masch<-function(i){if (i>=0) return(as.character(i)) else return(paste0("m",as.character(-i)))}

## This used to be in fitpat, but I need to run it for predictions with indf=fulldf
##  I am suspending the backwards calculation for dow.kinks, in light of marginality and
##  the base model not having dow:t in it. 2020_06_10
expatdf<-function(indf,pat,breakdir=1){
    modeldev<-"~."
# Now t.steps
    for (ix in seq_along(pat$t.steps)) {
        indf[,ncol(indf)+1]<-mytstep(indf$t,pat$t.steps[ix],breakdir)
        backpart<-if (breakdir>0) "" else "back."
        varname<-paste0(backpart,"tstep",masch(pat$t.steps[ix]))
        colnames(indf)[ncol(indf)]<-varname
    modeldev<-paste0(modeldev,"+",varname)
    }
# Now t.kinks
    for (ix in seq_along(pat$t.kinks)) {
        indf[,ncol(indf)+1]<-mytkink(indf$t,pat$t.kinks[ix],breakdir) 
        backpart<-if (breakdir>0) "" else "back."
        varname<-paste0(backpart,"tkink",masch(pat$t.kinks[ix]))
        colnames(indf)[ncol(indf)]<-varname
    modeldev<-paste0(modeldev,"+",varname)
    }
# Now dow.steps
    for (ix in seq_along(pat$dow.steps)) {
    indf[,ncol(indf)+1]<-mytstep(indf$t,pat$dow.steps[ix],breakdir)
    backpart<-if (breakdir>0) "" else "back."
    varname<-paste0(backpart,"dowstep",masch(pat$dow.steps[ix]))
    colnames(indf)[ncol(indf)]<-varname
    modeldev<-paste0(modeldev,"+",varname,":",dowdummies)
    }
# Now dow.kinks
    for (ix in seq_along(pat$dow.kinks)) {
    indf[,ncol(indf)+1]<-mytkink(indf$t,pat$dow.kinks[ix],1)  ## The last "1" was breakdir until 2020_06_10
    backpart<-if (breakdir>0) "" else "back."
    varname<-paste0(backpart,"dowkink",masch(pat$dow.kinks[ix]))
    colnames(indf)[ncol(indf)]<-varname
    modeldev<-paste0(modeldev,"+t:",varname,":",dowdummies)
    }
    return(list(df=indf,modeldev=modeldev))
}


## Fits a pattern, and does some other things if required...
fitpat<-function(model,model2,from=1,to=nrow(df),pat=nulpat,breakdir=1,outp){
#   print(breakdir)
    dftemp<-df[c(from:to),]
#   if (!missing(control)) mycontrol<-control else mycontrol<-glm.control()
    m<-glm(model,family=quasipoisson,weights=wts,data=dftemp) 
    initrd<-m$deviance
    initrdf<-m$df.residual
##  if (missing(breaklocations)) stop("fap4.p requires breaklocations to be specified") -- Delete soon!

    epd<-expatdf(dftemp,pat,breakdir)
    
    dftemp<-epd$df
    modeldev<-epd$modeldev
        
    if (!missing(model2)) modeldev<-paste0(modeldev,"+",model2)
    if (length(unlist(pat))>0) m<-update(m,modeldev)
    m$st.res<-(m$y - m$fitted.values)/sqrt(m$fitted.values)
    if (!missing(outp)) if(outp) {
        print(anova(m,test="F"))
        print(summary(m))
        plot(dftemp$t,m$st.res)
        }
    return(list(m=m,dftemp=dftemp))  
}


###  add.term takes a pattern, and adds each possible new time step, time kink, dow step and dow kink, one at a time,
###   and presents the best of each. "Each possible" is provided as a parameter patpos, and excludes values too close
###   to existing values (in patav). breakdir determines whether the time breaks are fitted forwards, so the coefficient
###   of t is the initial slope of the epidemic (breakdir>0), or backwards, when that coefficient is the final slope 
###   (breakdir<0). The output $inpat gives the pattern that was provided as a parameter, which is handy for checking
###   with lsearch, or for storing, when the decision is not to add any more days before checking.
add.term<-function(pat,patpos=patpos0,patav=patav0,breakdir=1) {
smear<-function(i,j) {c((i-j):(i+j))}


for (ip in ipvec) {
    for (it in seq_along(pat[[ip]])) {
        givenplus<-unique(as.vector(vapply(pat[[ip]],FUN=function(x) smear(x,patav[[ip]]),FUN.VALUE=rep(3,2*patav[[ip]]+1))))
        patpos[[ip]]<-setdiff(patpos[[ip]],givenplus)
    }
}

patout<-nulpat
for (ip in ipvec) {
patout[[ip]]$day<-array(NA,dim=length(patpos[[ip]]))
patout[[ip]]$pvalue<-array(NA,dim=length(patpos[[ip]]))
patout[[ip]]$dev.drop<-array(NA,dim=length(patpos[[ip]]))
patout[[ip]]$res.dev<-array(NA,dim=length(patpos[[ip]]))
patout[[ip]]$df.drop<-array(NA,dim=length(patpos[[ip]]))
patout[[ip]]$res.df<-array(NA,dim=length(patpos[[ip]]))
patout[[ip]]$mdd<-array(NA,dim=length(patpos[[ip]]))
}


#print("first glm about to happen")
fpbase<-fitpat(Deaths~t+dow, pat=pat, breakdir=breakdir)
#print("first glm just gone")
dftemp<-fpbase$dftemp


for (ip in ipvec) {
keepday<-0
for (bp in patpos[[ip]]) {
#print(paste(ip,bp,keepday))
  keepday<-keepday+1
  if (ip=="t.steps") {newvar<-mytstep(dftemp$t,bp,breakdir); newmod<-"~.+newvar"}
  if (ip=="t.kinks") {newvar<-mytkink(dftemp$t,bp,breakdir);newmod<-"~.+newvar"}
  if (ip=="dow.steps") {newvar<-mytstep(dftemp$t,bp,breakdir);newmod<-paste0("~.+newvar:",dowdummies)}
  if (ip=="dow.kinks") {newvar<-mytkink(dftemp$t,bp,1);newmod<-paste0("~.+newvar:t:",dowdummies)}
## ## The last "1" was breakdir until 2020_06_10 -- because of marginality, dow.kinks is different
  newm<-update(fpbase$m,newmod)
    patout[[ip]]$day[keepday]<-bp
    patout[[ip]]$res.dev[keepday]<-newm$deviance
    patout[[ip]]$res.df[keepday]<-newm$df.residual
    patout[[ip]]$dev.drop[keepday]<-fpbase$m$deviance - newm$deviance
    patout[[ip]]$df.drop[keepday]<-fpbase$m$df.residual-newm$df.residual
    num<-patout[[ip]]$dev.drop[keepday]/patout[[ip]]$df.drop[keepday]
    den<-patout[[ip]]$res.dev[keepday]/patout[[ip]]$res.df[keepday]
    patout[[ip]]$pvalue[keepday]<-1-pf(num/den,patout[[ip]]$df.drop[keepday],patout[[ip]]$res.df[keepday])
    patout[[ip]]$mdd[keepday]<-patout[[ip]]$dev.drop[keepday]/patout[[ip]]$df.drop[keepday] 
  }
for (j in seq_along(patout[[ip]])) { names(patout[[ip]][[j]])<-patout[[ip]]$day }
}


best<-nulpat
for (ip in ipvec){  ## seq_along(patout) changed to ipvec as part of 2020_06_01 error fixing, but for clarity not substance
bpix<-which(patout[[ip]]$mdd==max(patout[[ip]]$mdd,na.rm=TRUE))
best[[ip]]<-list(day=patout[[ip]]$day[bpix],mdd=patout[[ip]]$mdd[bpix], pvalue=patout[[ip]]$pvalue[bpix], dev.drop=patout[[ip]]$dev.drop[bpix], df.drop=patout[[ip]]$df.drop[bpix], res.dev=patout[[ip]]$res.dev[bpix], res.df=patout[[ip]]$res.df[bpix])
if (min(patout[[ip]]$res.df)!=max(patout[[ip]]$res.df)) print(paste0("Beware -- residual df vary for ",names(patout)[[ip]]))
#print(unlist(best[ip]))
}

besta<-array(NA,dim=c(4,7))
rownames(besta)<-ipvec
colnames(besta)<-c("day","mdd","pvalue","dev.drop","df.drop","res.dev","res.df")
for (ip in ipvec) for (j in 1:7) besta[ip,j]<-best[[ip]][[j]]  ## i changed to ip over ipvec for clarity not substance over 2020_06_01 errors


print(unlist(pat))  ## Note from 2020_06_01 error fixing: this will be in the original (and possibly irregular) order
print(besta)  ## This will be in the regular ipvec order, but any change in order will not reflect an error in calculations
return(list(inpat=pat,best=besta,patout=patout))
}

putintopat<-function(v,pat){
    pat1<-list()
    i<-0
    for (ip in ipvec) for (j in seq_along(pat[[ip]])) pat1[[ip]][[j]]<-v[i<-i+1]
    return(pat1)
    }

## Key vector that orders pat effectively inside functions, and provides a framework within which missing types
##  can be handled.
ipvec<-c("t.steps","t.kinks","dow.steps","dow.kinks")

# lcheck varies each day in the pattern independently with -1,+0,+1, so there are 3^(number of break days)
#  models to fit. It's return includes the pattern for the best of these models (as $bestone).
lcheck<-function(pat,patwid=patwid0,spa){        
    list.of.lists<-function(len,width){
        if (len==0) return ("empty")        
        if (len==1) return (as.list(-width:width))
        outlist<-lapply(list.of.lists(len-1,width),function(x) append(x,-width,0))
        for (i in (-width+1):width) outlist<-append(outlist,lapply(list.of.lists(len-1,width),function(x) append(x,i,0)))
        return(outlist)
    }
    
    empty<-function(x) identical(x,"empty")

    mll<-nulpat
    for (ip in ipvec) mll[[ip]]<-list.of.lists(length(pat[[ip]]),patwid[[ip]])
    
    devdims<-c()
    for (ip in ipvec) devdims<-c(devdims,c(rep(2*patwid[[ip]]+1,length(pat[[ip]]))))
## The ipvec in the preceding line fixes the error I found on 2020_06_01. And it is right because
##   the code for devind below uses the ipvec order and not the seq_along(pat) order

    if (missing(spa)) devaa<-array(NA,dim=devdims) else devaa<-spa

    devdf<-devaa
        
    trypat<-nulpat
    for (j1 in seq_along(mll[["t.steps"]])) {
        if (empty(mll[["t.steps"]])) j1<-NULL else {
        trypat[["t.steps"]]<- if (is.null(pat[["t.steps"]])) nulpat[["t.steps"]] else pat[["t.steps"]]+mll[["t.steps"]][[j1]]}
        for (j2 in seq_along(mll[["t.kinks"]])) {
            if (empty(mll[["t.kinks"]])) j2<-NULL else {
            trypat[["t.kinks"]]<- if (is.null(pat[["t.kinks"]])) nulpat[["t.kinks"]] else pat[["t.kinks"]]+mll[["t.kinks"]][[j2]]}
            for (j3 in seq_along(mll[["dow.steps"]])) {
                if (empty(mll[["dow.steps"]])) j3<-NULL else {
                trypat[["dow.steps"]]<- if (is.null(pat[["dow.steps"]])) nulpat[["dow.steps"]] else pat[["dow.steps"]]+mll[["dow.steps"]][[j3]]}
                 for (j4 in seq_along(mll[["dow.kinks"]])) {
                    if (empty(mll[["dow.kinks"]])) j4<-NULL else {
                    trypat[["dow.kinks"]]<- if (is.null(pat[["dow.kinks"]])) nulpat[["dow.kinks"]] else pat[["dow.kinks"]]+mll[["dow.kinks"]][[j4]]}
#                   print(unlist(trypat))
                    fp<-fitpat(Deaths~t+dow,pat=trypat)
                    inds<-c("t.steps"=j1,"t.kinks"=j2,"dow.steps"=j3,"dow.kinks"=j4)
                    devind<-c()
                    for (ip in ipvec) devind<-c(devind,if (is.null(pat[[ip]])) NULL else patwid[[ip]]+1+mll[[ip]][[inds[ip]]])
#                   print(devind)
#                   print(inds)
                    devaa[matrix(devind,nrow=1)]<-fp$m$deviance
                    devdf[matrix(devind,nrow=1)]<-fp$m$df.residual
#                   print(sum(is.na(devaa)))
    }}}}
    

    index<-0
    for (ip in ipvec) {
        for (i in seq_along(pat[[ip]])) {
                    index<-index+1
                    dimnames(devaa)[[index]]<-dimnames(devdf)[[index]]<-paste0(ip,".",pat[[ip]][[i]]+c(-patwid[[ip]]:patwid[[ip]]))
        }
    }

    bestpoints<-which(devaa==min(devaa),arr.ind=TRUE)
#   print(bestpoints)
    patwidadj<-patwid
    for (ip in ipvec) patwidadj[[ip]]<-if (is.null(pat[[ip]])) NULL else rep(patwid[[ip]],length(pat[[ip]]))
    bestpoints<-bestpoints+array(rep(unlist(pat)-unlist(patwidadj)-1,each=nrow(bestpoints)),dim=dim(bestpoints))

#   print(bestpoints)
    if (min(devdf)!=max(devdf)) print("Beware -- residual df vary")
    if (nrow(bestpoints)==1) {
        print("one best")
        best<-as.vector(bestpoints)
        best<-putintopat(best,pat)  ## relies on me having ordered the space by the sequence of ipvec.
#       print(best)
        print(unlist(pat))  ## An irregularly ordered pat will show up here irregularly, but it will be fixed
        print(unlist(best))  ## in order in best -- but the sums will all have been right. See 2020_06_01 error.
        if (all(unlist(best)==unlist(pat))) print("locally best")
#       print(best)
        return(list(bestone=best,arr=devaa,arr.df=devdf))
        } else {
        print("multiple bests"); return(list(bests=bestpoints,arr=devaa,arr.df=devdf))
    }
    }


make.pred<-function(pat,ndaysextra=7*26) {
    sumbyweek<-function(x) {
    outvec<-array(NA,dim=c(length(x)%/%7))
    for (i in 1:(length(x)%/%7)) {
        outvec[i]<-sum(x[(1+7*(i-1)):(7*i)])
#       print(outvec[i])
    }
    return(outvec)}

    m.1a<-fitpat(Deaths~t+dow,pat=pat,breakdir=1)
    m.1b<-fitpat(Deaths~t+dow,pat=pat,breakdir=-1)
    cat(paste0("\n","The model fits ",nrow(df)," days of data from ",datafirstdate," to ",datalastdate," and has the following pattern:\n"))
    print(unlist(pat))
    cat(paste0("and leaves residual deviance ",
                format(m.1a$m$deviance,nsmall=1,digits=1),
                " on ",m.1a$m$df.residual," degrees of freedom.\n"))
    weekly.ratio<-exp(7*m.1b$m$coef["t"])
    slope.figs<-c(m.1a$m$coef["t"],m.1b$m$coef["t"],-m.1a$m$coef["t"]/m.1b$m$coef["t"])
    names(slope.figs)<-c("initial slope","final slope","ratio")

    dwsf<-sum(df$Deaths[length(df$Deaths)+(-6:0)])
    dwsfpred<-sum(m.1b$m$fitted.values[length(m.1b$m$fitted.values)+(-6:0)])
    
    cat(paste0("\n","Total number in last week = ",dwsf,", and fitted number = ",format(dwsfpred,nsmall=1,digits=1),"\n\n"))
    if (weekly.ratio<1) {
        
        ## weekly ratio less than one
        
    
    weekly.predictions<-c(weekly.ratio,weekly.ratio/(1-weekly.ratio),dwsf*weekly.ratio/(1-weekly.ratio),dwsfpred*weekly.ratio/(1-weekly.ratio))
    names(weekly.predictions)<-c("weekly ratio","rem. mult.","n pred","fit pred")
    print(noquote(format(slope.figs,digits=2,nsmall=3)))
    cat("\n")
#   cat(paste0("\n"))
#   print(weekly.predictions)
#   cat("\n")

    
    cit95<-suppressMessages(confint(m.1b$m,"t"))
    wr.lo<-exp(7*cit95[1])
    wr.hi<-exp(7*cit95[2])

    weekly.pred.pi.lo<-weekly.pred.pi.hi<-weekly.predictions
    weekly.pred.pi.lo<-c(wr.lo,wr.lo/(1-wr.lo),dwsf*wr.lo/(1-wr.lo),dwsfpred*wr.lo/(1-wr.lo))
    weekly.pred.pi.hi<-c(wr.hi,wr.hi/(1-wr.hi),dwsf*wr.hi/(1-wr.hi),dwsfpred*wr.hi/(1-wr.hi))
    wpia<-rbind(lo.95=weekly.pred.pi.lo,pred=weekly.predictions,hi.95=weekly.pred.pi.hi)
    colnames(wpia)<-names(weekly.predictions)
    print(noquote(format(wpia,digits=4,nsmall=1)))
    cat("\n")
    
    fulldf<-makedf(nrow(df)+ndaysextra)
    epd<-expatdf(fulldf,pat,breakdir=1)
    mp<-suppressWarnings(predict(m.1a$m,epd$df[c((nrow(df)+1):nrow(epd$df)),],se.fit=TRUE,type="response"))

    preds.week<-sumbyweek(mp$fit)
    cat(paste0("The predicted future 26 weekly totals are:\n"))
    cat(format(preds.week,nsmall=1,digits=1))
    cat(",\n")
    cat(paste0("which sum to ", format(sum(preds.week),nsmall=1,digits=1),"."))
    return(invisible(list(wpia=wpia,daily=mp,weekly=preds.week))) } else {
        
    ## weekly ratio greater than or equal to one
        
        
        cat(paste0("The weekly ratio on the basis of the final time trend is ",format(weekly.ratio,digits=2,nsmall=3),"\n\n"))
    cit95<-suppressMessages(confint(m.1b$m,"t"))
    wr.lo<-exp(7*cit95[1])
    wr.hi<-exp(7*cit95[2])
                cat(paste0("with a 95% confidence interval of (",format(wr.lo,digits=2,nsmall=3),", ",format(wr.hi,digits=2,nsmall=3),"). \n\n"))

        
cat(paste0("The weekly ratio estimate is greater than or equal to one, and therefore we cannot sensibly ",
"estimate the future number of deaths in this first wave. If the weekly ratio is genuinely above one, ",
"then on the current simple model, the number would be infinite. More realistically, the numbers would ",
"grow until a lockdown occurs or the number of people who have had the disease, assuming they are and ",
"remain immune, will dampen the rate of spread below 1 again. It is, of course, possible that the current ",
"estimate is wrong and that the true rate is less than one. We can hope."),fill=TRUE)
    }
    }

## This function (i) puts a single pattern into "regular order", namely t.steps, t.kinks, dow.steps, dow.kinks, and eliminates any
##  duplication within each category. And it merges an arbitrary number of separate patterns into a single pattern. It is useful
##  for finding a pattern to fit the data, building on the existing output and needing to include only the extra items in each
##  command line. In principle, it would help with automating the search.
onepat<-function(pat1,pat2,...){
    outpat<-list()
    if (missing(pat2)) {
        for (ip in ipvec) outpat[[ip]]<-sort(unique(c(pat1[[ip]])))
        return(outpat) } else { 
        for (ip in ipvec) outpat[[ip]]<-sort(unique(c(pat1[[ip]],pat2[[ip]])))
        return(onepat(outpat,...))
        }
 }

pat0<-list(t.steps=c(4),t.kinks=c(10),dow.steps=c(20),dow.kinks=c(30))
patpos0<-list(t.steps=c(3:(nrow(df)-2)),t.kinks=c(3:(nrow(df)-3)),dow.steps=c(15:(nrow(df)-14)),dow.kinks=c(0:(nrow(df)-14)))
patav0<-list(t.steps=1,t.kinks=1,dow.steps=6,dow.kinks=6)
patwid0<-list(t.steps=1,t.kinks=1,dow.steps=1,dow.kinks=1)

Here is an example of the use of the functions, to find a reasonable fit to the time course. It may or not be the use that led to today’s best time-pattern.

at1<-add.term(nulpat)#
## NULL
##           day        mdd    pvalue   dev.drop df.drop   res.dev res.df
## t.steps    24 19437.1207 0.0000000 19437.1207       1  6754.565     91
## t.kinks    32 24314.8860 0.0000000 24314.8860       1  1876.799     91
## dow.steps  33   176.7888 0.7256402  1060.7330       6 25130.952     86
## dow.kinks  86    55.2902 0.9805846   331.7412       6 25859.944     86
at1<-add.term(list(t.kinks=c(32)))#
## t.kinks 
##      32 
##           day       mdd      pvalue dev.drop df.drop  res.dev res.df
## t.steps    44 143.33381 0.007662420 143.3338       1 1733.466     90
## t.kinks    22 147.85721 0.006726465 147.8572       1 1728.942     90
## dow.steps  47  63.81988 0.002956384 382.9193       6 1493.880     85
## dow.kinks   0  39.18583 0.070511980 235.1150       6 1641.684     85
at1<-add.term(onepat(list(t.kinks=c(22,32))))#
## t.kinks1 t.kinks2 
##       22       32 
##           day       mdd       pvalue dev.drop df.drop  res.dev res.df
## t.steps    38 203.65707 0.0008668241 203.6571       1 1525.285     89
## t.kinks    44 212.65409 0.0006538554 212.6541       1 1516.288     89
## dow.steps  47  65.84026 0.0010712115 395.0416       6 1333.901     84
## dow.kinks   0  39.03679 0.0514512147 234.2208       6 1494.721     84
at1<-add.term(onepat(list(t.kinks=c(22,32,44))))#
## t.kinks1 t.kinks2 t.kinks3 
##       22       32       44 
##           day      mdd      pvalue  dev.drop df.drop  res.dev res.df
## t.steps    40 58.03186 0.064617611  58.03186       1 1458.256     88
## t.kinks    25 10.00682 0.446553061  10.00682       1 1506.281     88
## dow.steps  47 54.31850 0.002203684 325.91102       6 1190.377     83
## dow.kinks   0 46.00342 0.009029873 276.02054       6 1240.268     83
at1<-add.term(onepat(at1$inpat,list(dow.steps=c(47))))#
##  t.kinks1  t.kinks2  t.kinks3 dow.steps 
##        22        32        44        47 
##           day       mdd     pvalue   dev.drop df.drop   res.dev res.df
## t.steps    83 41.623060 0.08853151  41.623060       1 1148.7540     82
## t.kinks    46  9.044604 0.43044451   9.044604       1 1181.3325     82
## dow.steps  79 35.224625 0.01715029 211.347751       6  979.0293     77
## dow.kinks  80 13.391700 0.47916418  80.350202       6 1110.0269     77
nrow(df)
## [1] 100
at1<-add.term(onepat(at1$inpat,list(dow.steps=c(79))))#
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         22         32         44         47         79 
##           day       mdd       pvalue   dev.drop df.drop  res.dev res.df
## t.steps    80 25.089595 1.614990e-01  25.089595       1 953.9397     76
## t.kinks    46  8.676953 4.123054e-01   8.676953       1 970.3524     76
## dow.steps  86 77.833958 1.678829e-08 467.003746       6 512.0256     71
## dow.kinks  72 61.480302 5.450476e-06 368.881812       6 610.1475     71
lc1<-lcheck(at1$inpat)#
## [1] "one best"
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         22         32         44         47         79 
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         23         33         44         47         79
lc1<-lcheck(lc1$bestone)#
## [1] "one best"
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         23         33         44         47         79 
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         24         33         44         47         79
lc1<-lcheck(lc1$bestone)#
## [1] "one best"
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         24         33         44         47         79 
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         25         33         44         47         79
lc1<-lcheck(lc1$bestone)#
## [1] "one best"
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         25         33         44         47         79 
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         25         33         44         47         79 
## [1] "locally best"
pat.1<-lc1$bestone
at1<-add.term(lc1$bestone)
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         25         33         44         47         79 
##           day      mdd       pvalue  dev.drop df.drop  res.dev res.df
## t.steps     9 32.84100 1.059515e-01  32.84100       1 932.4060     76
## t.kinks    10 19.40193 2.156483e-01  19.40193       1 945.8451     76
## dow.steps  86 77.70824 1.138986e-08 466.24943       6 498.9976     71
## dow.kinks  72 61.25975 4.420484e-06 367.55852       6 597.6885     71
at1<-add.term(onepat(at1$inpat,list(dow.steps=c(86))))
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 
##         25         33         44         47         79         86 
##           day      mdd      pvalue  dev.drop df.drop  res.dev res.df
## t.steps     9 32.75598 0.029828023  32.75598       1 466.2416     70
## t.kinks    10 19.32677 0.097530465  19.32677       1 479.6708     70
## dow.steps  40 19.29622 0.007110432 115.77734       6 383.2202     65
## dow.kinks  65 10.17328 0.188940777  61.03970       6 437.9579     65
at1<-add.term(onepat(at1$inpat,list(dow.steps=c(40))))
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         25         33         44         40         47         79         86 
##           day      mdd     pvalue dev.drop df.drop  res.dev res.df
## t.steps     9 33.65418 0.01569005 33.65418       1 349.5661     64
## t.kinks    10 19.35107 0.06968195 19.35107       1 363.8692     64
## dow.steps  31 10.97796 0.07424781 65.86775       6 317.3525     59
## dow.kinks  65 10.05066 0.10746646 60.30396       6 322.9163     59
lc1<-lcheck(at1$inpat)
## [1] "Beware -- residual df vary"
## [1] "one best"
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         25         33         44         40         47         79         86 
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         25         33         45         40         47         79         85
lc1<-lcheck(lc1$bestone)
## [1] "Beware -- residual df vary"
## [1] "one best"
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         25         33         45         40         47         79         85 
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         24         33         46         40         47         79         85
lc1<-lcheck(lc1$bestone)
## [1] "Beware -- residual df vary"
## [1] "one best"
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         24         33         46         40         47         79         85 
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         23         33         46         40         47         79         85
lc1<-lcheck(lc1$bestone)
## [1] "Beware -- residual df vary"
## [1] "one best"
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         23         33         46         40         47         79         85 
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         23         33         46         40         47         79         85 
## [1] "locally best"
at1<-add.term(onepat(lc1$bestone))
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         23         33         46         40         47         79         85 
##           day       mdd     pvalue  dev.drop df.drop  res.dev res.df
## t.steps    32 23.335711 0.04171336 23.335711       1 345.8258     64
## t.kinks     9  9.780095 0.19162798  9.780095       1 359.3814     64
## dow.steps  31 12.313913 0.03432043 73.883481       6 295.2780     59
## dow.kinks  65  9.175103 0.13142552 55.050621       6 314.1109     59
pat.2<-at1$inpat

make.pred(pat.1)
## 
## The model fits 100 days of data from 2020-03-06 to 2020-06-13 and has the following pattern:
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 
##         25         33         44         47         79 
## and leaves residual deviance 965.2 on 77 degrees of freedom.
## 
## Total number in last week = 1197, and fitted number = 1279.2
## 
## initial slope   final slope         ratio 
##         0.232        -0.030         7.710 
## 
##       weekly ratio rem. mult. n pred    fit pred 
## lo.95    0.7944       3.8627  4623.6648 4941.0271
## pred     0.8102       4.2675  5108.2418 5458.8648
## hi.95    0.8261       4.7489  5684.4556 6074.6293
## 
## The predicted future 26 weekly totals are:
## 1036.3  839.6  680.2  551.1  446.5  361.7  293.0  237.4  192.3  155.8  126.2  102.3   82.9   67.1   54.4   44.1   35.7   28.9   23.4   19.0   15.4   12.5   10.1    8.2    6.6    5.4,
## which sum to 5436.0.

These are the predictions with three t.kinks, which make the epidemic increase turn down and then flatten off a bit, with two changes of day-of-the-week patterns.

make.pred(pat.2)
## 
## The model fits 100 days of data from 2020-03-06 to 2020-06-13 and has the following pattern:
##   t.kinks1   t.kinks2   t.kinks3 dow.steps1 dow.steps2 dow.steps3 dow.steps4 
##         23         33         46         40         47         79         85 
## and leaves residual deviance 369.2 on 65 degrees of freedom.
## 
## Total number in last week = 1197, and fitted number = 1244.8
## 
## initial slope   final slope         ratio 
##         0.254        -0.032         8.046 
## 
##       weekly ratio rem. mult. n pred    fit pred 
## lo.95    0.7895       3.7496  4488.3270 4667.7270
## pred     0.8016       4.0395  4835.3075 5028.5763
## hi.95    0.8137       4.3674  5227.8285 5436.7865
## 
## The predicted future 26 weekly totals are:
## 997.8 799.8 641.1 513.9 411.9 330.2 264.7 212.1 170.1 136.3 109.3  87.6  70.2  56.3  45.1  36.2  29.0  23.2  18.6  14.9  12.0   9.6   7.7   6.2   4.9   4.0,
## which sum to 5012.6.

The second set allows two further changes in day-of-the-week pattern.