UQRUG 37

meeting
Overview: AI assisted coding.
Questions: Regular Expressions, data.table complete() equivalent, filtering outliers, boxplots, RStudio new line bug
Published

April 26, 2023

2023-04-26: UQRUG 37

R Overview of the Month

Luke will be going over his exploration of using generative AI for coding.

The main takeaway is that ChatGPT is a tool to assist you, it won’t do all the work for you.

Attendees

Name Where are you from? What brings you here?
Nicholas Wiggins Library Here to Help!
Luke Gaiter Library Here to help
Valentina Urrutia Guada Library Here to help
Raul Riesco Australian Centre for Ecogenomics Here to help too!
Xueyan Ma Graduate Student Learning
Bel QAEHS Just for general ideas
Willy Just Learning SAFS PhD Student
Jay grad student just learning
Lauren Fuge :) undergrad student just learning!
Lizzie ITaLI just leanring R and how it can be applied in my analytical work
Debbie postdoc to learn from Luke
Victor S SBMS
Ben Fitzpatrick UQ Civil Engineering (PostDoc) To Learn & Help (if possible)
Nicholas Garner SBMS Here to learn
Henry TPCH
Semira Hailu PhD Student New to R
Jocelyn UQCCR still learning
KAR NG Data Analyst @ Student Affair Learning
Pierre Bodroux P&F Learning
Renna PhD student Learning (new to R)
Anthony Bloomer Graduate student Here to learn more
Marie Thomas ACE Just getting some ideas
Isaac Kyei Barffour PhD student New To R

Questions

Code not automatically going to the next line and so Data in R doesn’t appear to load properly - Isaac

I have a perculiar problem. When I load data into R, I do not get a vissible notification in R that the data has been loaded. However, any analysis I promt R to perform on the data gets executed correctly.

Answers

  • We tried searching in the global and project options.

  • Stack overflow, and asking Bing AI and ChatGPT.

  • We reinstalled R and RStudio.

https://support.posit.co/hc/en-us/articles/200713843#stopping-on-a-line


What are some good tips for learning Regular Expressions/Regex?

Answers

A good R focused approach to learning Regex could be to learn to use the stringr package, which uses Regex in R. The Get Started page on the Stringr website is useful, as are the cheatsheets!

Another super useful resource is regex101, which allows you to test your regex on a string before you run it on all of your data. https://regex101.com/


Is there a data.table equivalent for the complete() function? - Nicholas

I’ve been analysing large datasets which are much easier to manipulate using data.table compared to functions in tidyverse, but I can’t find a data.table equivalent (or large data equivalent) for complete().

An example of using complete to fill in missing Hours in the “ExampleColumn” : complete(Hour = 0:23, nesting(Groups), fill = list(ExampleColumn = 0))

Answers

  • tidytable may be of use: https://markfairbanks.github.io/tidytable/

  • As may dtplyr https://dtplyr.tidyverse.org/

  • You may simply need to write your own specific data.table code to run it

  • OR bring this up as an issue on the data.table GitHub

Good resource for learning data.table https://atrebas.github.io/post/2019-03-03-datatable-dplyr/#introduction


Data pre-processing in RStudio - Jay

So, let’s say, im working on time-series data (14 days worth of data points @ 1 minute interval). The data is in xlcsv file. 1. How do I screen off the ir-regular data points using R? Ir-regular data points mean any data points that are less than the min. or greater than the max. of fivenum. I don’t want to delete irregular data points, but need to write it another column in csv. 2. after screening out the bad data points, how do i boxplot the good data points at each time step, and have them in a 24-hour span.

Answers

  • My go-to for “filtering” out data without removing it out of the table in R is to using case_when() in a mutate function. For example mutate(NewColumn = case_when(OldColumn >= 5 & OldColumn <= 100 ~ Good, T~ Bad))
mutate(IQR15 = IQR(OldColumn)*1.5, 
    IQRTop = quantile(OldColumn,  probs = 0.75)[[1]] + IQR15,
    IQRBottom = quantile(OldColumn,  probs = 0.25)[[1]] - IQR15, 
    NewColumn = case_when(OldColumn >= IQRBottom & OldColumn <= IQRTop ~ Good, T~ Bad)) 
  • You could then plot in ggplot2 without filtering the data (if you really don’t want to) by using ggplot(data = df[which(df$NewColumn == “Good”),])
  • Then plot that using ggplot2 and geom_boxplot: https://ggplot2.tidyverse.org/reference/geom_boxplot.html