library(profvis)
profvis({
  diamonds |>
    # filter out cheaper diamonds
    filter(price > 500) |>
    # sort by carat
    arrange(carat) |>
    # how big is each carat group?
    group_by(carat) |>
    mutate(carat_group_size = n()) |>
    # convert price to AUD
    mutate(price = usd_to_aud(price))
})
UQRUG 62
Questions: tidyr, dplyr, tmap.mapgl, maplibre, stata, spss
Topic: code profiling with profvis
- Profvis website: https://profvis.r-lib.org/
- How to use it in RStudio: https://support.posit.co/hc/en-us/articles/218221837-Profiling-R-code-with-the-RStudio-IDE#using-the-flame-graph
Sample code to test:
# example dataset from ggplot2
library(ggplot2)
diamonds
# function to convert diamond price from USD to AUD
usd_to_aud <- function(x) {
  Sys.sleep(0.02) # wait a little, so our problem is more obvious later on
  x * 1.55
}
# process our data
library(dplyr)
diamonds |>
  # filter out cheaper diamonds
  filter(price > 500) |>
  # sort by carat
  arrange(carat) |>
  # how big is each carat group?
  group_by(carat) |>
  mutate(carat_group_size = n()) |>
  # convert price to AUD
  mutate(price = usd_to_aud(price))
The last block of code is very slow! Why is that?
Let’s use profiling to find out! In RStudio:
- Use the menu “Profile - Start profiling”
- Run the problematic block of code
- Click “Stop profiling” above the console
A new “Profile 1” tab should open with two types of visualisation of your code’s performance:
- A Flame Graph tab
- A Data tab
Use them to spot the processes that use most of the time.
You can also use the profvis package directly by wrapping the code in a call to profvis(), as in the profvis({ ... }) block at the top of these notes.
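If you want to keep or share a profile rather than just view it, the following is a minimal sketch of one way to do it. It makes some assumptions that are not part of the session code: the helper name usd_to_aud_pause() is hypothetical, and it swaps Sys.sleep() for profvis::pause(), which the profvis documentation describes as a pause whose time shows up in profiler data. The output file name is arbitrary.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)
library(profvis)

# Hypothetical variant of usd_to_aud(): pause() stands in for Sys.sleep()
# (an assumption of this sketch, not the function used in the session)
usd_to_aud_pause <- function(x) {
  pause(0.02)
  x * 1.55
}

# Keep the profile as an object instead of displaying it straight away
p <- profvis({
  diamonds |>
    filter(price > 500) |>
    group_by(carat) |>
    mutate(carat_group_size = n()) |>
    mutate(price = usd_to_aud_pause(price))
}, interval = 0.01)  # sampling interval in seconds

# Write the interactive report to an HTML file in the temporary directory
out_file <- file.path(tempdir(), "profile.html")
htmlwidgets::saveWidget(p, out_file, selfcontained = FALSE)
```

Printing p in an interactive session should open the same flame graph and data views as the RStudio menu route.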
Looks like our custom function takes most of the time, and is executed many times! The issue is that we didn’t ungroup after the first mutate(), so the second mutate() runs usd_to_aud() once per carat group (waiting 0.02 seconds each time) instead of once on the whole column.
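To check the "once per group" explanation without reading a flame graph, you can count the calls yourself. This is only a sketch: counting_usd_to_aud() and the calls counter are hypothetical names introduced here, and the sleep is left out so the check runs quickly.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

calls <- 0

# Hypothetical instrumented converter: no sleep, it just counts how often it runs
counting_usd_to_aud <- function(x) {
  calls <<- calls + 1
  x * 1.55
}

grouped <- diamonds |>
  filter(price > 500) |>
  group_by(carat)

# Grouped data: mutate() evaluates the expression separately for each group
calls <- 0
invisible(mutate(grouped, price = counting_usd_to_aud(price)))
calls_grouped <- calls

# Ungrouped data: the expression is evaluated on the whole column
calls <- 0
invisible(mutate(ungroup(grouped), price = counting_usd_to_aud(price)))
calls_ungrouped <- calls

# Compare the number of groups with the number of calls in each case
c(groups = n_groups(grouped), grouped = calls_grouped, ungrouped = calls_ungrouped)
```

With the real usd_to_aud(), every one of those grouped calls also waits 0.02 seconds, which is where the time goes.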
Fix it by adding an ungroup() step, then compare the time it takes to execute:
profvis({
  diamonds |>
    # filter out cheaper diamonds
    filter(price > 500) |>
    # sort by carat
    arrange(carat) |>
    # how big is each carat group?
    group_by(carat) |>
    mutate(carat_group_size = n()) |>
    # remove grouping now that we don't need it!
    ungroup() |>
    # convert price to AUD
    mutate(price = usd_to_aud(price))
})
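If you only want the before/after numbers rather than a full profile, base R's system.time() is enough. The sketch below is one possible way to do the comparison, not the method used in the session; the object names prepared, time_grouped and time_ungrouped are made up for illustration.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

# Same converter as in the notes: waits 0.02 seconds on every call
usd_to_aud <- function(x) {
  Sys.sleep(0.02)
  x * 1.55
}

# Shared first part of the pipeline; the result is still grouped by carat
prepared <- diamonds |>
  filter(price > 500) |>
  arrange(carat) |>
  group_by(carat) |>
  mutate(carat_group_size = n())

# Convert the price while the data is still grouped
time_grouped <- system.time(
  mutate(prepared, price = usd_to_aud(price))
)

# Convert the price after removing the grouping
time_ungrouped <- system.time(
  prepared |> ungroup() |> mutate(price = usd_to_aud(price))
)

# Elapsed (wall-clock) seconds for each version
c(grouped = time_grouped[["elapsed"]], ungrouped = time_ungrouped[["elapsed"]])
```

Elapsed time is the figure to look at here, because the time spent in Sys.sleep() is waiting rather than computing.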
Attendees
Name | Where are you from? | What brings you here? |
---|---|---|
Thi Quynh Chang | student | learn R tips |
Catherine | UQ/QUT | profvis |
Haileyesus | HDR student at Gatton | |
Tianti | HDR student at Gatton | |
Aprezo | | |
Rodrigo | HDR at SAFS | |
Lamees | | |
Stéphane | Library | Here to help and present |