UQRUG 41

meeting
Overview: Basic Statistics in R.
Questions: stats, network analysis, NetCDF, spatial, normality of residuals, intra class correlations, plotting multiple linear regression, reverse geocoding, LD Analysis with genepop
Published: August 30, 2023

2023-08-30: UQRUG 41

R Overview of the Month

This month at UQRUG, Valentina will be going over running some basic statistical analyses using R.

Find more details, and the code here: https://uqrug.netlify.app/posts/2023-08-30-august-stats/

Attendees

Add your name, where you’re from, and why you’re here:

Name | Where are you from? | What brings you here?
Nicholas Wiggins | Library | Here to help!
Valentina Urrutia Guada | Library | Here to help :)
Raul Riesco | ACE | Learning/helping
Kar Ng | UQ | paste0(rep("learning", 3), "!")
Cameron West | Library | Here to learn!
Kim Henville | EAIT | General lurk
Marguerite King | SHRS PhD student | Here to learn!
Yufan Wang | HABS PhD student | Here to learn how to use R for Bayesian analysis
Abigail O’Hara | HASS PhD student | General knowledge
Ee Liz Puah | HABS | Learning
Maca | ISSR | Learn
Bernadette | Library | AHHH
Lucas | Science fac | Learn and solve a question
Jay | Computer science | Using RStudio for statistical analysis
Jocelyn | UQCCR | Loving the support from this group!
Ryan | SOE | Learning
Chris Mancini | EAIT Civil Transport Engineering | Also love the support from this group :-)
Luke Gaiter | Library | Trying to fix the in-room computer & learning

Questions

Q1 - What are some good network analysis packages? - Pradeepa/Nick

We’re after some good network analysis packages in R to create a dynamic network of the Australian food system. We’re already aware of bootnet, igraph, visNetwork and threejs, but haven’t explored them much yet. Looking for more options and opinions on packages.

Answers

  • Raul: I think the most widely used is igraph. I also experimented with networkD3, which gives smoother output but has fewer options. All of them are quite difficult to use; for me it took a lot of trial and error to get even the simplest network >_<. In general I’d recommend igraph, as it covers most basic needs.
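
As a starting point for the igraph route Raul mentions, here is a minimal sketch that builds a small directed network from an edge list. The node names and links are made up purely for illustration and are not real food system data.

```r
library(igraph)

# Hypothetical edge list: each row is a flow between two parts of a food system
edges <- data.frame(
  from = c("Farm", "Farm", "Processor", "Wholesaler", "Wholesaler"),
  to   = c("Processor", "Wholesaler", "Wholesaler", "Supermarket", "Restaurant")
)

# Build a directed graph from the edge list
g <- graph_from_data_frame(edges, directed = TRUE)

# Basic summaries: number of nodes, number of edges, connections per node
n_nodes <- vcount(g)
n_edges <- ecount(g)
node_degree <- degree(g, mode = "all")
node_degree

# Static plot of the network
plot(g, vertex.size = 30, edge.arrow.size = 0.5)
```

Keeping the data as a plain edge list means the same table can be reused if you later try one of the other packages.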

Q2 - What are some good R packages for Bayesian analysis? (Prior distributions) - Yufan

I would like to know if there are any online resources that teach you how to formulate prior distributions using R.

Answers


Q3 - How to transform a .csv file into an .nc file - Lucas

I have a .csv file which contains temperature values per month, along with lon and lat values. I need those values in QGIS, but I need them separated by month in a raster file.

Answers

  • Nick - I’ll follow up with you on this later. We may be able to explore using the stars R package to achieve this.
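
While that follow-up is pending, here is one possible sketch. Note that it uses the terra package rather than stars, and it assumes a long-format table with columns lon, lat, month and temp on a regular grid; the real .csv may be laid out differently. The data below are simulated.

```r
library(terra)

# Simulated stand-in for the .csv: one row per grid cell per month
# (column names lon, lat, month, temp are assumptions)
grid <- expand.grid(
  lon   = seq(150, 152, by = 0.5),
  lat   = seq(-28, -26, by = 0.5),
  month = 1:3
)
set.seed(1)
grid$temp <- round(20 + 5 * sin(grid$month) + rnorm(nrow(grid)), 1)

# Reshape to wide format: one temperature column per month
wide <- reshape(grid, idvar = c("lon", "lat"), timevar = "month",
                direction = "wide")

# Build a multi-layer raster: first two columns are x and y,
# each remaining column becomes one layer
r <- rast(wide, type = "xyz", crs = "EPSG:4326")
names(r) <- paste0("month_", 1:3)

# Write a multi-band GeoTIFF (one band per month), which QGIS can open
out_file <- file.path(tempdir(), "temperature_by_month.tif")
writeRaster(r, out_file, overwrite = TRUE)
```

If a NetCDF file is specifically required, terra also provides writeCDF(), which depends on the ncdf4 package being installed.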

Q4 - Normality of Residuals or Data points? - Kar

When checking the normality assumption before choosing a suitable omnibus and post-hoc test, which “normality” are we looking at: normality of the residuals or normality of the data values?

Answers

  • Raul: In general, residual normality is used to check whether your data fit a linear model. When a test (like ANOVA) has a normality assumption, it usually refers to normality of the data (values). But it will depend on the test.
  • Kar: Thank you very much Raul.
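
To make the two options concrete, here is a small base R sketch using the built-in PlantGrowth data (chosen only for illustration). It checks the residuals of a one-way model and, separately, the values within each group. For a one-way design the residuals are simply the values centred on their group mean, so the two views are closely related.

```r
# Built-in data: plant weights under three treatments
data(PlantGrowth)

# One-way model: weight explained by treatment group
fit <- lm(weight ~ group, data = PlantGrowth)
res <- residuals(fit)

# Option 1: normality of the residuals (all groups pooled)
sw_resid <- shapiro.test(res)
sw_resid
qqnorm(res)
qqline(res)

# Option 2: normality of the data values within each group
p_by_group <- tapply(PlantGrowth$weight, PlantGrowth$group,
                     function(x) shapiro.test(x)$p.value)
p_by_group

# The residuals equal the values minus their group mean
centred <- PlantGrowth$weight - ave(PlantGrowth$weight, PlantGrowth$group)
same <- isTRUE(all.equal(unname(res), centred))
```

Testing the raw values with all groups pooled together is a different check again, because differences between group means can make the pooled values look non-normal.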

Q5 - How do you calculate intraclass correlations? - Bernadette

Answers

  • Use the irr package, which has an icc function. You feed it the data frame and the parameters/model you’re using to calculate the correlations.
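
As a rough illustration, the sketch below computes a one-way, single-rater ICC by hand from the ANOVA mean squares, then calls irr::icc on the same data if the package is installed. The ratings matrix is a small illustrative example, not data from the meeting, and the model choice (one-way, single) is an assumption; your design may call for a two-way model.

```r
# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns)
ratings <- matrix(
  c( 9, 2, 5, 8,
     6, 1, 3, 2,
     8, 4, 6, 8,
     7, 1, 2, 6,
    10, 5, 6, 9,
     6, 2, 4, 7),
  ncol = 4, byrow = TRUE
)

n <- nrow(ratings)  # number of subjects
k <- ncol(ratings)  # number of raters

# Between-subject and within-subject mean squares
subject_means <- rowMeans(ratings)
ms_between <- k * sum((subject_means - mean(ratings))^2) / (n - 1)
ms_within  <- sum((ratings - subject_means)^2) / (n * (k - 1))

# One-way, single-rater ICC
icc_oneway <- (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
icc_oneway

# Same model via the irr package, if available
if (requireNamespace("irr", quietly = TRUE)) {
  print(irr::icc(ratings, model = "oneway", unit = "single"))
}
```

The irr output also includes a confidence interval and an F test, which the hand calculation leaves out.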

Q6 - For multiple linear regression, how can we plot it and what does the graph look like? - Jay

Answers

  • Raul: Maybe this will help? https://rpubs.com/bensonsyd/385183 “however, in many multiple linear regression situations, the variables we are using cannot be simultaneously represented two-dimensionally so exploring the data visually is far more difficult”.
  • All good. It looks like we can’t really graph multiple linear regression easily. Thanks a lot.

Maybe explore these visualisations? https://statsandr.com/blog/multiple-linear-regression-made-simple/#visualizations-1

  • If there are interactions between variables, you can’t visualise the model easily.
  • You would have to plot the relationship at different levels of the interacting variable, as the slope and correlation would be different at every level of the interaction.
  • If you have 2 independent variables, you need three dimensions (the two predictors plus the response) to show each variable and their interaction, OR a 2D plot at each level of the interaction, like slices through the 3D surface.
  • The only time a single 2D plot is enough is when there is no interaction between the variables, so you can collapse that extra dimension.
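
The “slices” idea can be sketched in base R as follows. This uses the built-in mtcars data purely as an example: mpg is modelled on weight, horsepower and their interaction, and the fitted line for weight is drawn at three horsepower levels. The choice of the 10th, 50th and 90th percentiles as slice levels is arbitrary.

```r
data(mtcars)

# Two predictors plus their interaction
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Levels of hp at which to take 2D slices
hp_levels <- unname(quantile(mtcars$hp, probs = c(0.1, 0.5, 0.9)))
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)

# Predicted mpg over the weight range at each hp level
newdata <- expand.grid(wt = wt_grid, hp = hp_levels)
newdata$mpg_hat <- predict(fit, newdata = newdata)

# Observed data, with one fitted line per slice
plot(mpg ~ wt, data = mtcars, pch = 16, col = "grey40",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
cols <- c("steelblue", "darkorange", "firebrick")
for (i in seq_along(hp_levels)) {
  slice <- newdata[newdata$hp == hp_levels[i], ]
  lines(slice$wt, slice$mpg_hat, col = cols[i], lwd = 2)
}
legend("topright", legend = paste("hp =", round(hp_levels)),
       col = cols, lwd = 2, title = "Slice")
```

Because the model includes an interaction, the lines are not parallel: the slope for weight changes with horsepower, which is exactly why a single 2D line cannot summarise the model.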

Q7 - Does anyone know of good R packages for reverse geocoding? - Chris

For example nearest street address to a GPS coordinate

Answers


Q8 - Does anyone know how to conduct LD analysis using genepop in R? - Ola

Answers

  • Raul: I’m not familiar with the genepop package, but according to the manual it has an LD (linkage disequilibrium) test. The function is:
    test_LD(
      inputFile,
      outputFile = "",
      settingsFile = "",
      dememorization = 10000,
      batches = 100,
      iterations = 5000,
      verbose = interactive()
    )