UQRUG 39

meeting

Overview: Using data.table for fast data manipulation.
Questions: data.table and for loops, csv vs xlsx, LMER, keeping code tidy and readable, standard R operators, using .RData files, storing descriptive stats in a matrix, mutating all cells in dataframe, HPC help, RMarkdown errors

Published

June 28, 2023

2023-06-28: UQRUG 39

R Overview of the Month

Nick Garner will be providing an overview of the basics of data manipulation with data.table and its advantages compared to using the tidyverse

Attendees

Add your name, where you’re from, and why you’re here:

Name	Where are you from?	What brings you here?
Tasneem	SVS	Learning R
Nicholas Wiggins	Library	Here to help!
Maddison Brown	Biol Postgrad	Keeping myself accountable to learn R!
Raul Riesco	ACE	Learning/helping
Jay	Computer Science	Learning
Ryan	SOE	Learning
Semira Hailu	UQCCR	learning R
Nick Garner	SBMS	data.table!
Rene Erhardt	Pharmacy	LMER
Kar Ng	Data Analyst	UQ Student Affairs
Jinnat Ferdous	SVS	Have a question and also for learning
Brett McKinnon	IMB	Learning
Hanh Dao	CHSR	introduction to R
henry marshall	tpch clinician	learning - many thanks!
Ismail Garba	AGFS	R on Bunya HPC
David Green	UQ RCC	Here to learn and help
Mikesh Patel	QCMHR	Learning

Questions

Q1 - What is the difference between data.table and tibble() from tidyverse? AND Can data.table be used for for-loops and creating functions? - Kar

Answers

dataframe vs tibble vs datatable - A dataframe is like a simple table that has rows and columns, a tibble() can handle more metadata than a basic dataframe, a data.table is structured to handle the data in a faster way
Yes, and the processing speed is faster if data.table is used.

Q3 - I’ve got a couple of general questions about LMER - Rene

How to report results? I’ve got sequencing results and ran LMERs to evaluate if a variable can explain the outcome. The results differ greatly. I interpret it that the LMER don’t explain the results.

Answers

No one was able to help with this one - will follow up with training@library.uq.edu.au

Q4 - How do you keep your code tidy and easy to understand? I sometimes go back to old code and it takes a long time to figure out what I’ve done. Also how do you keep up your skills? - Jocelyn

Answers

Raul: I always take a little time adding comments within my code using # symbol (it does not run with the code). It actually helps a lot when you return to your code later on. If I want to publish the code I erase this comments or I organize them in sections. About how I improve… I think the only way is try to do new things in R, you have a lots of tools to help you, and you improve with every new thing. Also courses from time to time, to refresh things…
Kar: There are best practieses - many people and I suggest to use tidyverse style, having spaces between equals, comma,s brackets, and annotate all your codes using hash and add comments. In all my projects, I always have comments in my codes.
Nick: In addition to using comments in R using # you can use Github (or keeping it offline) where you can commit changes often with comments on what changes each time you edit the code. That can help you understand what you’ve done over time and backup your code in case you delete and save over old code. https://happygitwithr.com/index.html
rmarkdown is quite good too
DavidG: we finish the Intro to R workshops with this general advice https://swcarpentry.github.io/r-novice-gapminder/16-wrap-up.html

Q5 - How do you know where to put %>% or == or () or {} or []? – Jinnat

Answers

%>% OR |> - pipes chain functions together. (Kar: Sometime I call it “then”)
== - bool / boolean operator checks to see if one value is equal to the other
!= : Is “Not Equal”
() - generally refers to a function
{} - acts as a container for a block of code to be run together
[] - is for indexing a variable (Kar: mytable[row, column]), also for data.table functions

Q6 - How do we save the workspace with a data.table in it? .RData I think is single threaded read when we come back to load it again – Ryan

Answers

As a general rule I don’t generally save my .RData files, I just re-run the code to bring things back into the workspace the next time round. This can make loading R faster, .RData files can balloon out and be quite large, and avoid complications.
Is there an advantage to saving things as an .RData file?
If you do export a data.table to a .RData file, it appears that it doesn’t retain the data.table format
https://stackoverflow.com/questions/31250999/r-readrds-load-fail-to-give-identical-data-tables-as-the-original

Q7 - I am trying to produce a matrix and then will store the descriptive statistics of a dataset in the matrix. But somehow the code is not working. But I can’t figure out what’s wrong with the code also. – Jinnat

Answers

Compare two datasets using setdiff() could be a good way to compare the differences
Get it working with an example dataset, and see what the main differences are there
It may be the case that there is a column that is in the wrong format, or loaded in wrong

Q8 - I have a data frame of values that I would like to turn into an equivalent data frame (same rows and same columns) with the individual cells representing the percentage of values in the column. Not sure how to do this - Brett

Answers

This can be done using a mutate() and across()

# Calculate the percentages
percentages <- proportions_all %>%
  mutate(across(-Var1, ~ . / sum(.)) * 100)

Q9 - I have bunch of R scripts (Multicriteria optimization using genetic algorithms) that requires a lot of computing time. Tried using the Bunya but struggling to get the singularity container working. Need help getting this working. - Ismail

Answers

Does it need a container?
DG: You could possibly build personal R packages for the version of R available on Bunya.
DG: Software containers can be used if the container has exactly what you need within the container.
DG: Please catch us at https://rcc.uq.edu.au/meetups (HackyHour and/or Virtual HPC sessions)

Q10 - When I knit a RMarkdown script, and if it fails half way through because of a bit of bugs, is there a way that I can continue to journey of the Markdown knitting after fixing the bug, instead of re-running the entire Rmarkdown script again? To save some times! - Kar

Answers

Separate your computation from your visualisation from your reporting
Run your computation in an external script, then run your visualisation externally, then use RMarkdown to bring it all together
Run the ML scripts external to the RMarkdown document, and then pull the results in to the RMarkdown document
trycatch() - can pick up on an error, spit out an error message, and the continue running your code without it hanging and stopping https://www.r-bloggers.com/2020/10/basic-error-handing-in-r-with-trycatch/
Maybe a fancy version of RMarkdown (quarto) could have something that helps? https://quarto.org/docs/computations/r.html

2023-06-28: UQRUG 39

R Overview of the Month

Attendees

Questions

Q1 - What is the difference between data.table and tibble() from tidyverse? AND Can data.table be used for for-loops and creating functions? - Kar

Q2 - Maddi: Just a general question…I am conducting a spatial analysis. Do you recommend any particular way to read tables, e.g. read_table, from_csv, read_excel? I can export data sets as excel or csv. Just want to chose what is easiest. Thanks!

Q3 - I’ve got a couple of general questions about LMER - Rene

Q4 - How do you keep your code tidy and easy to understand? I sometimes go back to old code and it takes a long time to figure out what I’ve done. Also how do you keep up your skills? - Jocelyn

Q5 - How do you know where to put %>% or == or () or {} or []? – Jinnat

Q6 - How do we save the workspace with a data.table in it? .RData I think is single threaded read when we come back to load it again – Ryan

Q7 - I am trying to produce a matrix and then will store the descriptive statistics of a dataset in the matrix. But somehow the code is not working. But I can’t figure out what’s wrong with the code also. – Jinnat

Q8 - I have a data frame of values that I would like to turn into an equivalent data frame (same rows and same columns) with the individual cells representing the percentage of values in the column. Not sure how to do this - Brett

Q9 - I have bunch of R scripts (Multicriteria optimization using genetic algorithms) that requires a lot of computing time. Tried using the Bunya but struggling to get the singularity container working. Need help getting this working. - Ismail

Q10 - When I knit a RMarkdown script, and if it fails half way through because of a bit of bugs, is there a way that I can continue to journey of the Markdown knitting after fixing the bug, instead of re-running the entire Rmarkdown script again? To save some times! - Kar