Research Computing Use Cases

1) Chris Gilligan, Plant Sciences

The group is carrying out stochastic simulation and parameter estimation using Bayesian methods for epidemic models, and is also running Lagrangian simulations with heavy demands on meteorological data. They have been using about 80TB of Research Data Store for data processing and 50TB of Research Cold Store for longer-term storage. In about nine months they have used over 600,000 hours of CPU time, about half paid-for and half free (SL2 and SL3). To estimate the budget for a grant they look at the amount of resources required to do similar previous work.
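
A minimal sketch of this style of estimate, assuming (purely hypothetically) that a proposed project resembles the nine months of work above and that its datasets are about 50% larger; the project length and scale factor are illustrative assumptions, not the group's figures:

    # Hypothetical scaling of observed usage to a grant estimate.
    # Only the 600,000 CPU-hours over ~9 months comes from the case study above.

    past_cpu_hours = 600_000      # observed over roughly nine months
    past_months = 9

    project_months = 36           # assumed length of the proposed grant
    scale_factor = 1.5            # assumed: new datasets ~50% larger than before

    estimated_cpu_hours = past_cpu_hours / past_months * project_months * scale_factor
    print(f"CPU time to budget: {estimated_cpu_hours:,.0f} hours")
    # -> CPU time to budget: 3,600,000 hours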

2) Chris Illingworth, Genetics

" My work regularly involves statistical analyses of genomic data which can be computationally intensive to perform; as such I make regular use of the HPC. After changes made in the last couple of years the HPC is easy to use and well-managed. I have had a fantastic response from the team who run it whenever I have encountered any problems or had any queries about the system. When applying for grants it can be tough up-front to estimate how much time and storage to apply for, but experience has been useful in working out approximately how much is needed for the daily requirements of my lab; a mixture of central storage, and locally provided storage in the department, has been ideal for my needs."

3) Ben Luisi, Biochemistry

" My colleagues and I have been using HPC extensively to process data from cryo electron microscopy and generate structures of macromolecules with biomedical relevance. We have been using both GPU and CPU modes for extensive calculations and 100TB RDS storage space for handling our large datasets and the intermediate files generated from data processing and analysis. The resources have been invaluable for our work, which has allowed us to elucidate the high resolution structures of molecular machines that transport molecules across biological membranes and enzyme assemblies that use RNA to help regulate the control of gene expression."

4) Russell Hamilton, Centre for Trophoblast Research

"The Centre for Trophoblast Research Bioinformatics Facility operates under a cost recovery model. When determining the bioinformatics costs for a grant proposals we use a standard hourly rate,  set to include compute, storage (short and long term) as well as the analysis time. The number of hours specified in the costing is calculated based on the type (e.g. RNA-Seq) and size of the experiment (number of samples, and groups to be compared)."

5) Aylwyn Scally, Genetics

"In general my process for estimating compute usage and storage is as follows. I first identify previous projects that are most similar in terms of computation needs and usage type. To do this I match projects based on a breakdown into the following categories: pipeline development, algorithm development, data processing, simulation/sampling (including Monte Carlo methods), and large memory tasks. (Most of my jobs are trivially parallel at large scale, but some things like assembly and/or certain simulations need to hold tens of gigabytes in memory.) Then I scale by the amount of data involved or other relevant factor. I rarely make any allowance for increases in CPU/IO efficiency compared to a few years ago, because in my experience the application complexity usually scales as well. Estimating development time is tricky, particularly as a lot of it is done outside the HPC, but one has to budget for some contingency because not all development can be done in this way, particularly for issues specific to the HPC architecture. Also, even after much development, there will always be many 'production' runs at scale which have to be repeated because a bug/error emerges at a late stage, or data issues become clear during postprocessing and analysis. Storage costs are estimated similarly; I count on keeping most intermediate files around during the life of the project (compressed where possible), and then reducing to a set of key outputs afterwards. I always aim to write pipelines which clean up redundant files as they go."