Australian Centre for Ecogenomics // University of Salamanca
September 27, 2023
Have you ever noticed that RStudio never reaches 100% CPU usage even when running a very demanding task?
By default, R runs on a single CPU thread.
Is it the most efficient way to run functions?
It is possible to parallelize processes in R using specialized packages.
parallel
Basic concepts
There are two main ways to parallelize code: via sockets or via forking.
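As a minimal sketch of the two approaches, the snippet below runs the same toy function once on a socket cluster and once with forking. The function `square` and the worker count of 2 are illustrative assumptions. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Hypothetical toy function used only for illustration
square <- function(x) x^2

# Socket approach: start separate R worker processes and send them the work
cl <- makeCluster(2, type = "PSOCK")
res_socket <- parLapply(cl, 1:4, square)
stopCluster(cl)

# Forking approach: copy the current R process (Unix-alikes only)
n_fork <- if (.Platform$OS.type == "windows") 1L else 2L
res_fork <- mclapply(1:4, square, mc.cores = n_fork)
```

Both calls return a list with one element per input, so the results can be compared directly.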
parallel is designed to work with functions; its interface is analogous to apply and its derivatives lapply and sapply.
apply family | parallel equivalent | Input | Output |
---|---|---|---|
apply | parApply (parRapply, parCapply) | data.frame, matrix | vector, list, array |
sapply | parSapply | list, vector, data.frame | vector, matrix |
lapply | parLapply | list, vector, data.frame | list |
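The correspondence in the table can be sketched as follows, with each serial call placed next to its parallel counterpart. The matrix `m`, the input vectors and the two-worker cluster are assumptions chosen for illustration.

```r
library(parallel)

m <- matrix(1:6, nrow = 2)   # hypothetical 2 x 3 input matrix
cl <- makeCluster(2)         # hypothetical two-worker cluster

# apply -> parApply (row-wise sums)
rows_serial <- apply(m, 1, sum)
rows_par    <- parApply(cl, m, 1, sum)

# sapply -> parSapply (result simplified to a vector)
sq_serial <- sapply(1:5, function(x) x^2)
sq_par    <- parSapply(cl, 1:5, function(x) x^2)

# lapply -> parLapply (result kept as a list)
rt_serial <- lapply(1:3, sqrt)
rt_par    <- parLapply(cl, 1:3, sqrt)

stopCluster(cl)
```

The only change in each call is the extra cluster argument, which comes first.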
foreach is a package designed for looping. It also lets you combine the results in different formats.
By itself, foreach does not parallelize, but it can be combined with parallel and doParallel to enable parallelization.
library(foreach)
library(parallel)
library(doParallel)
clust <- makeCluster(2)
registerDoParallel(clust)
foreach(i=1:2, .combine='c') %dopar% exp(i)
stopCluster(cl = clust)
[1] 2.718282 7.389056
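The `.combine` argument used above controls the format in which foreach returns its results. The following sketch runs sequentially with `%do%` and compares three options; the loop body and the variable names are illustrative assumptions.

```r
library(foreach)

# .combine = "c": results concatenated into a vector
as_vector <- foreach(i = 1:3, .combine = "c") %do% {
  i^2
}

# .combine = "rbind": each result becomes one row of a matrix
as_matrix <- foreach(i = 1:3, .combine = "rbind") %do% {
  c(i, i^2)
}

# No .combine: results are returned as a list
as_list <- foreach(i = 1:3) %do% {
  i^2
}
```

The same `.combine` values work with `%dopar%` once a parallel backend has been registered.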
Determine which numbers in a sample are prime
Function:
isprime <- function(num) {
  # Guard: numbers below 2 are not prime (without it the loop never runs and TRUE is returned)
  if (num < 2) return(FALSE)
  prime <- TRUE
  i <- 2  # Start from 2, as prime numbers can only be divided by 1 and themselves
  while (i < num) {  # Keep going while the candidate divisor 'i' is smaller than the number
    if ((num %% i) == 0) {  # '%%' gives the remainder of num / i; a remainder of 0 means 'i' divides num
      prime <- FALSE
      break  # A divisor was found, so stop searching
    }
    i <- i + 1
  }
  return(prime)
}
data (10,000 numbers):
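The code that creates `listnumbers` is not shown here. One way such a sample could be generated is sketched below; the seed and the range of 1 to 100,000 are assumptions for illustration, not the values used for the timings reported in this section.

```r
# Hypothetical data: 10,000 integers drawn without replacement
set.seed(42)                              # assumed seed, for reproducibility only
listnumbers <- sample(1:100000, 10000)    # assumed range
```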
primes <- logical(length(listnumbers))  # preallocate the result vector
for (i in seq_along(listnumbers)) {
  primes[i] <- isprime(listnumbers[i])
}
result <- data.frame(number = listnumbers, is_prime = primes)
[1] "Time difference of 5.308078 secs"
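The measurement code behind these timings is not shown. A common pattern, assumed here, is to record `Sys.time()` before and after the block of interest and print the difference. The workload below is a hypothetical stand-in for the prime-checking loop.

```r
start_time <- Sys.time()

# Hypothetical workload standing in for the code being timed
total <- 0
for (i in 1:100000) {
  total <- total + sqrt(i)
}

elapsed <- Sys.time() - start_time   # a 'difftime' object
print(paste("Time difference of", format(elapsed)))
```

Wall-clock timings like this vary between runs and machines, so they are best read as rough comparisons.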
library(foreach)
primes_fe <- foreach(i = seq_along(listnumbers), .combine = "c") %do% {
  isprime(listnumbers[i])
}
result_fe <- data.frame(number = listnumbers, is_prime = primes_fe)
[1] "Time difference of 6.629166 secs"
library(parallel)
library(foreach)
library(doParallel)
cores <- detectCores()
clust <- parallel::makeCluster(cores)
registerDoParallel(clust)
primes_par_fe <- foreach(i = seq_along(listnumbers), .combine = "c") %dopar% {
  isprime(listnumbers[i])
}
result_par_fe <- data.frame(number = listnumbers, is_prime = primes_par_fe)
parallel::stopCluster(cl = clust)
[1] "Time difference of 2.964536 secs"
primes_sa <- sapply(listnumbers, isprime)
result_sa<-data.frame(number=listnumbers, is_prime=primes_sa)
[1] "Time difference of 5.478974 secs"
library(parallel)
cores <- detectCores()
clust <- parallel::makeCluster(cores)
prime_par_sa <- parSapply(clust, listnumbers, isprime)
result_par_sa<-data.frame(number=listnumbers, is_prime=prime_par_sa)
parallel::stopCluster(cl = clust)
[1] "Time difference of 1.725008 secs"
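Whichever approach is fastest, the parallel version should return the same answer as the serial one. The sketch below checks this on a small input; the vector `nums`, the two-worker cluster and the compact rewrite of `isprime` are assumptions made so the example is self-contained.

```r
library(parallel)

# Compact trial-division check, equivalent in spirit to isprime above
isprime <- function(num) {
  if (num < 2) return(FALSE)
  i <- 2
  while (i < num) {
    if (num %% i == 0) return(FALSE)
    i <- i + 1
  }
  TRUE
}

nums <- 1:50                      # hypothetical small input
serial_res <- sapply(nums, isprime)

cl <- makeCluster(2)              # hypothetical two-worker cluster
parallel_res <- parSapply(cl, nums, isprime)
stopCluster(cl)

same_answer <- identical(serial_res, parallel_res)
```

Because `isprime` uses only base R and no external variables, it can be sent to the workers without exporting anything else to the cluster.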