mutate(IQR15 = IQR(OldColumn)*1.5,
IQRTop = quantile(OldColumn, probs = 0.75)[[1]] + IQR15,
IQRBottom = quantile(OldColumn, probs = 0.25)[[1]] - IQR15,
NewColumn = case_when(OldColumn >= IQRBottom & OldColumn <= IQRTop ~ Good, T~ Bad))
UQRUG 37
Questions: Regular Expressions, data.table complete() equivalent, filtering outliers, boxplots, RStudio new line bug
2023-04-26: UQRUG 37
R Overview of the Month
Luke will be going over his exploration of using generative AI for coding.
The main takeaway is that ChatGPT is a tool to assist you, it won’t do all the work for you.
Attendees
Name | Where are you from? | What brings you here? |
---|---|---|
Nicholas Wiggins | Library | Here to Help! |
Luke Gaiter | Library | Here to help |
Valentina Urrutia Guada | Library | Here to help |
Raul Riesco | Australian Centre for Ecogenomics | Here to help too! |
Xueyan Ma | Graduate Student | Learning |
Bel | QAEHS | Just for general ideas |
Willy | Just Learning | SAFS PhD Student |
Jay | grad student | just learning |
Lauren Fuge :) | undergrad student | just learning! |
Lizzie | ITaLI | just leanring R and how it can be applied in my analytical work |
Debbie | postdoc | to learn from Luke |
Victor S | SBMS | |
Ben Fitzpatrick | UQ Civil Engineering (PostDoc) | To Learn & Help (if possible) |
Nicholas Garner | SBMS | Here to learn |
Henry | TPCH | |
Semira Hailu | PhD Student | New to R |
Jocelyn | UQCCR | still learning |
KAR NG | Data Analyst @ Student Affair | Learning |
Pierre Bodroux | P&F | Learning |
Renna | PhD student | Learning (new to R) |
Anthony Bloomer | Graduate student | Here to learn more |
Marie Thomas | ACE | Just getting some ideas |
Isaac Kyei Barffour | PhD student | New To R |
Questions
Code not automatically going to the next line and so Data in R doesn’t appear to load properly - Isaac
I have a perculiar problem. When I load data into R, I do not get a vissible notification in R that the data has been loaded. However, any analysis I promt R to perform on the data gets executed correctly.
Answers
We tried searching in the global and project options.
Stack overflow, and asking Bing AI and ChatGPT.
We reinstalled R and RStudio.
https://support.posit.co/hc/en-us/articles/200713843#stopping-on-a-line
What are some good tips for learning Regular Expressions/Regex?
Answers
A good R focused approach to learning Regex could be to learn to use the stringr package, which uses Regex in R. The Get Started page on the Stringr website is useful, as are the cheatsheets!
Another super useful resource is regex101, which allows you to test your regex on a string before you run it on all of your data. https://regex101.com/
Is there a data.table equivalent for the complete() function? - Nicholas
I’ve been analysing large datasets which are much easier to manipulate using data.table compared to functions in tidyverse, but I can’t find a data.table equivalent (or large data equivalent) for complete().
An example of using complete to fill in missing Hours in the “ExampleColumn” : complete(Hour = 0:23, nesting(Groups), fill = list(ExampleColumn = 0))
Answers
tidytable may be of use: https://markfairbanks.github.io/tidytable/
As may dtplyr https://dtplyr.tidyverse.org/
You may simply need to write your own specific data.table code to run it
OR bring this up as an issue on the data.table GitHub
Good resource for learning data.table https://atrebas.github.io/post/2019-03-03-datatable-dplyr/#introduction
Data pre-processing in RStudio - Jay
So, let’s say, im working on time-series data (14 days worth of data points @ 1 minute interval). The data is in xlcsv file. 1. How do I screen off the ir-regular data points using R? Ir-regular data points mean any data points that are less than the min. or greater than the max. of fivenum. I don’t want to delete irregular data points, but need to write it another column in csv. 2. after screening out the bad data points, how do i boxplot the good data points at each time step, and have them in a 24-hour span.
Answers
- My go-to for “filtering” out data without removing it out of the table in R is to using case_when() in a mutate function. For example mutate(NewColumn = case_when(OldColumn >= 5 & OldColumn <= 100 ~ Good, T~ Bad))
- You could then plot in ggplot2 without filtering the data (if you really don’t want to) by using ggplot(data = df[which(df$NewColumn == “Good”),])
- Then plot that using ggplot2 and geom_boxplot: https://ggplot2.tidyverse.org/reference/geom_boxplot.html