My Projects
Things I’m interested in and actively working on:
- My main DPhil research focuses on generative modelling of protein folding trajectories, a natural next step after AlphaFold2’s progress on predicting structure from sequence. Given a 3D structure, can we train models using paradigms such as diffusion and Flow Matching to generate plausible folding pathways, as trajectories, that agree with empirical data such as psi-value analysis and MD simulations of protein folding? A particularly interesting question is whether any sufficiently well-performing model learns implicit physics (or can have such physics enforced through the training regime) that captures natural folding energy landscapes and can be exploited for perturbation analysis and drug target identification.
- Inspired by a recent paper [AbMelt], I’ve spent some months putting together a Python package for simulating antibody Fv regions with GROMACS at high temperature (350 K) for ML prediction of their thermostability (Tm) from MD features. Initial results from this work [GitHub] suggest that the choice of training antibodies is likely a major determinant of how well the learned models predict, as my models performed significantly worse on the test set than AbMelt’s. I reached this conclusion because my training set was taken from Jain et al. and consists of a very diverse set of IgGs, whereas I expect that AbMelt was trained on a more closely related set of mAbs, potentially from a clinical pipeline. Their training data is unavailable, so I cannot confirm this.
 
- Differential Scanning Simulation (DSS) for protein thermostability prediction. Inspired by Differential Scanning Fluorimetry (DSF), an experimental technique used to estimate protein thermostability (Tm) values, this is a GROMACS MD protocol I’m developing in which the temperature is gradually increased over the course of a simulation until the protein reaches a 50% unfolded state. This is challenging because in typical DSF the Tm is defined as the temperature at which half the population is unfolded, whereas in MD simulations we typically work with a single protein, so it is undetermined whether a single protein being 50% unfolded is synonymous with the canonical definition of Tm (or whether a new definition of Tm is needed). Further, without fluorophores we have no quantitative measure of how unfolded a simulated protein is, so another challenge is deriving an “unfoldedness” metric from empirical measures such as radius of gyration, weighted secondary structure content scoring and the fraction of native internal contacts.
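To make the “unfoldedness” idea in the DSS project concrete, here is a rough sketch of how radius of gyration, retained secondary structure and the fraction of native contacts could be combined into a single score, and how the temperature at which that score crosses 50% could be read off a heating ramp. Everything in it is an illustrative assumption rather than the actual protocol: the function names, the 8 Å contact cutoff, the sequence separation, the weights and the linear interpolation are all placeholders. Whether the crossing temperature corresponds to an experimental Tm is exactly the open question described above.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of (x, y, z) points (equal masses assumed)."""
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    sq = sum(sum((c[k] - centre[k]) ** 2 for k in range(3)) for c in coords)
    return math.sqrt(sq / n)


def native_contacts(native, cutoff=8.0, min_sep=3):
    """Residue pairs closer than `cutoff` in the native structure.

    The cutoff and the minimum sequence separation are assumed values.
    """
    return [
        (i, j)
        for i in range(len(native))
        for j in range(i + min_sep, len(native))
        if math.dist(native[i], native[j]) < cutoff
    ]


def fraction_native_contacts(coords, contacts, cutoff=8.0):
    """Fraction of the native contact pairs still formed in `coords`."""
    if not contacts:
        return 0.0
    kept = sum(1 for i, j in contacts if math.dist(coords[i], coords[j]) < cutoff)
    return kept / len(contacts)


def unfoldedness(q, rg, rg_native, rg_max, ss_retained, weights=(0.5, 0.25, 0.25)):
    """Hypothetical composite score: 0 is native-like, 1 is fully unfolded.

    q           fraction of native contacts retained (0..1)
    rg          radius of gyration of the current frame
    rg_native   radius of gyration of the native structure
    rg_max      assumed radius of gyration of a fully unfolded chain
    ss_retained fraction of native secondary structure retained (0..1)
    weights     arbitrary placeholder weights that sum to 1
    """
    expansion = min(max((rg - rg_native) / (rg_max - rg_native), 0.0), 1.0)
    w_q, w_rg, w_ss = weights
    return w_q * (1.0 - q) + w_rg * expansion + w_ss * (1.0 - ss_retained)


def crossing_temperature(temps, scores, threshold=0.5):
    """Temperature at which the score first rises through `threshold`.

    Linearly interpolates between neighbouring frames; returns None if the
    score never crosses the threshold.
    """
    for t0, s0, t1, s1 in zip(temps, scores, temps[1:], scores[1:]):
        if s0 < threshold <= s1:
            return t0 + (threshold - s0) * (t1 - t0) / (s1 - s0)
    return None
```

In practice the per-frame coordinates and secondary structure fractions would come from the trajectory analysis, and the scores would probably need smoothing over a window of frames before looking for a crossing, since a single heated protein fluctuates a lot.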
Fun side projects I try to fit into my spare time include home server labbing (DIY micro-HPC building) with Raspberry Pis and old computers I’ve gathered over the years, and modding vintage games such as Mount & Blade: Warband. Alongside this website, of course!