UQRUG 42
Questions: parallelisation, HPC, memory utilisation, detectCores, for loops
2023-09-27: UQRUG 42
R Overview of the Month
This month at UQRUG, Raúl will be providing an overview of parallelisation in R: how to use it and what its advantages are.
Find more details here: https://uqrug.netlify.app/posts/2023-09-27-september-parallelisation/
Attendees
Add your name, where you’re from, and why you’re here:
Name | Where are you from? | What brings you here? |
---|---|---|
Nicholas Wiggins | Library | Here to help! |
Raul Riesco | ACE | Here to learn and help :) |
David Green | UQ Research Computing Centre | R on HPC help |
Raimundo Sanchez | Research Fellow SHRS | R |
Giulio Centorame | IMB | Parallel computation hurts my brain |
Jessica Hintzsche | QAAFI | Here to figure out how to make my code more efficient |
Felicity Charles | UQ Gatton | Here to learn |
Grant Taylor | UQ St Lucia | Learn more about R |
Ryan | SOE | Learn R |
Jocelyn | UQCCR | Still learning |
Valentina Urrutia Guada | Library | Here to learn & help |
Luke Gaiter | Library | Learning |
Questions
Q1 - How do you avoid running out of memory with parallel jobs? - Giulio Centorame
I constantly have issues with running out of memory during parallel operations. For example, using the package furrr:
# Setup multisession
library(furrr)
library(magrittr)
plan(multisession, workers = 2)
# Vector of paths with huge tables I want to import
<- c("big_table1.tsv", "big_table2.tsv", "big_table3.tsv")
paths
# Load everything
# If the tables are too big, R runs out of memory
paths %>%
  future_map(read.table)
This causes issues especially on HPCs, since jobs will often fail if a process tries to allocate more memory than requested. How do I limit the amount of memory that each child process, or all of the processes together, can use?
Answers
- David: The UQ HPCs can allocate up to 2TB of RAM, so if you need more RAM, request it. If you are still running out of RAM, it might be worth splitting the work into multiple separate jobs, e.g. if you're trying to process the three big tables, run each big table in a separate job (see the sketch after this list).
- Ryan: I don't think you have control over the memory of each child process. You might want to use data.table, covered in previous presentations here. It uses OpenMP for parallelisation and could be more efficient. I also believe the HPC has OpenMP all set up, so in R it should be plug and play already.
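To illustrate David's suggestion, here is a minimal sketch of the one-table-per-job approach. The file names come from the question above; the SLURM array setup and the output file names are assumptions:

# Submitted as a SLURM array job, e.g. with #SBATCH --array=1-3
paths <- c("big_table1.tsv", "big_table2.tsv", "big_table3.tsv")

# Each array task reads exactly one table, so a single job never
# holds more than one table in memory
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))
big_table <- read.table(paths[task_id])

# ... process big_table, then save only this task's result
saveRDS(big_table, sprintf("result_%d.rds", task_id))

On Ryan's point, data.table::fread() is generally faster and lighter on memory than read.table(), and it reads with OpenMP threads; the thread count can be set with data.table::setDTthreads().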
Q2 - Using detectCores() will detect all cores on an HPC. Is there a workaround? - David
Answers
- Ryan: Maybe…
cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK"))
cl <- parallel::makeCluster(cores)
The next option is a more refined version of the above, but it requires an extra package, so it is not as convenient for HPC use:
cores <- length(parallelly::availableWorkers())
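As a fuller sketch of the parallelly approach (the cluster usage here is illustrative): parallelly::availableCores() honours scheduler limits such as SLURM_CPUS_PER_TASK, whereas parallel::detectCores() reports every core on the node:

library(parallelly)

# Respects SLURM_CPUS_PER_TASK, cgroup limits, etc., and falls back
# to the machine's core count on an unrestricted laptop or desktop
cores <- availableCores()

cl <- parallel::makeCluster(cores)
parallel::parSapply(cl, 1:4, function(x) x^2)
parallel::stopCluster(cl)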
Q3 - for loops - Grant Taylor
I have the same code for six participants. Made six lots of code, but of course this is very inefficient. Here is a small part…
####################### SUB 004 ############################################
#library(tidyverse)
load("0_data/parsedData_cogPsyRep_sub_004.RData") # contains a list called "data" which is converted into a dataframe for the Sub (see next line)
df4 = data.frame(data)

nTrialsAll = length(df4$Cond1)
nTrialsAll

MRT = median(df4$Time1)
MRT

probCorr = mean(df4$Correct==TRUE)
probCorr

numDR4 = sum(df4$DoubleResp==TRUE)
numDR4

probDR4 = sum(df4$DoubleResp==TRUE)/nrow(df4)
probDR4

ACCdf4 = tapply(df4$Correct==TRUE, list(df4$Cond1, df4$Cond2), mean)
ACCdf4

RTdf4 = tapply(df4$Time1, list(df4$Cond1, df4$Cond2), median)
RTdf4
Sounds good. Anything that is more efficient.
Yes, this example is for sub 4, but I have S = c(1,2,3,4,6,9) (the Sub numbers). I tried something like this…
for (s in S) {
  load(paste("0_data/parsedData_cogPsyRep_sub_00", s, ".RData", sep = "")) ## loads the data for a specific Sub
}
This particular one probably doesn't need it (parallelisation). Thanks! I'll give that a go :)
Answers
- Raul:
list_files <- list.files(path = "./0_data", pattern = "parsedData_cogPsyRep_sub", full.names = TRUE)
for (file in list_files){
  load(file)            # loads the list called "data"
  a <- data.frame(data)
  print(length(a$Cond1))
  print(median(a$Time1))
}
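Building on Raul's loop, here is a minimal sketch (assuming each .RData file contains the list "data", as in Grant's example) that computes the per-subject summaries in one pass and collects them into a single data frame:

S <- c(1, 2, 3, 4, 6, 9)  # the Sub numbers

summaries <- lapply(S, function(s) {
  load(sprintf("0_data/parsedData_cogPsyRep_sub_%03d.RData", s))  # loads "data"
  df <- data.frame(data)
  data.frame(
    sub      = s,
    nTrials  = length(df$Cond1),
    MRT      = median(df$Time1),
    probCorr = mean(df$Correct == TRUE),
    numDR    = sum(df$DoubleResp == TRUE),
    probDR   = sum(df$DoubleResp == TRUE) / nrow(df)
  )
})
summaries <- do.call(rbind, summaries)
summaries

The tapply() accuracy and RT tables could be collected the same way by returning them in a named list instead of a one-row data frame.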
- Nick: If you have any further issues or questions, don’t hesitate to reach out to training@library.uq.edu.au