This is simply a presentational form of the code shown during the August UQRUG.
Load the libraries and data
# Packages and libraries
library(tidyverse)
library(lattice)
# Loading data
data1 <- mtcars
data2 <- read.csv("datasets/ttest.csv")
data3 <- read.csv("datasets/ANOVA.csv")
data4 <- read.csv("datasets/correl.csv")
# Creating objects
Y <- mtcars$mpg
X <- mtcars$wt
The summary() function provides quick and easy descriptive statistics, and is a useful initial step:
summary(data1)
      mpg             cyl             disp             hp
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
 Median :19.20   Median :6.000   Median :196.3   Median :123.0
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
      drat             wt             qsec             vs
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
       am              gear            carb
 Min.   :0.0000   Min.   :3.000   Min.   :1.000
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
 Median :0.0000   Median :4.000   Median :2.000
 Mean   :0.4062   Mean   :3.688   Mean   :2.812
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
 Max.   :1.0000   Max.   :5.000   Max.   :8.000
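summary() also works on a single column, and its result can be indexed by name. The sketch below uses only the built-in mtcars data; the object names mpg_summary, mpg_median and col_sd are made up for illustration. Note that summary() does not report the standard deviation, so it is computed separately here.

```r
# Summarise a single numeric column of the built-in mtcars data
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Individual statistics can be pulled out by name
mpg_median <- mpg_summary[["Median"]]

# summary() reports nothing about spread beyond the quartiles,
# so compute the standard deviation of every column separately
col_sd <- sapply(mtcars, sd)
print(round(col_sd, 2))
```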
Visualise the data
With lattice's histogram() we can quickly plot the distribution of a variable:
histogram(Y) # the same as histogram(data1$mpg) or histogram(~mpg, data1)
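The formula form of histogram() makes it easy to adjust the plot. This is a minimal sketch on the built-in mtcars data; the bin count of 10 and the axis label are arbitrary choices, not values from the session.

```r
library(lattice)

# Formula interface: nothing on the left of ~, the variable on the right
p_default <- histogram(~ mpg, data = mtcars)

# nint sets the number of bins; type = "count" shows counts
# on the y axis instead of the default percentages
p_counts <- histogram(~ mpg, data = mtcars, nint = 10, type = "count",
                      xlab = "Miles per gallon")

# Lattice plots are objects; printing one draws it
print(p_counts)
```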
Box and whisker plots, drawn with bwplot(), are another quick and easy way to visualise the spread of our data:
bwplot(Y)
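bwplot() becomes more informative when the variable is split by group. The following sketch, which assumes we want to compare fuel economy across cylinder counts in the built-in mtcars data, draws one box per group; the axis labels are illustrative.

```r
library(lattice)

# One box per number of cylinders; factor() makes cyl categorical,
# so the boxes are drawn horizontally with mpg on the x axis
p_box <- bwplot(factor(cyl) ~ mpg, data = mtcars,
                xlab = "Miles per gallon", ylab = "Cylinders")
print(p_box)
```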
Inferential statistics
The classic two-sample t-test can easily be run with the t.test() function. By default it runs Welch's version, which does not assume equal variances in the two groups:
# two sample t-test
t.test(time ~ daytime, data2)
Welch Two Sample t-test
data: time by daytime
t = -6.8311, df = 77.776, p-value = 1.667e-09
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-48.78467 -26.76533
sample estimates:
mean in group 1 mean in group 2
        967.900        1005.675
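Since ttest.csv is not bundled with R, here is a comparable sketch on the built-in mtcars data, comparing mpg between the two transmission types (am). It also shows how to request the equal-variance Student's t-test and how to pull values out of the result; the object names welch and student are illustrative.

```r
# Welch two-sample t-test (the default): mpg by transmission type
welch <- t.test(mpg ~ am, data = mtcars)

# Classic Student's t-test, which assumes equal variances in both groups
student <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# The result is a list, so individual values can be extracted
print(welch$p.value)
print(welch$conf.int)
print(student$parameter)  # degrees of freedom
```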
ANOVA visualisation
Before we run the ANOVA test, it is a good idea to visualise our data with a boxplot, this time using ggplot2's geom_boxplot():
# ANOVA: aov(Y ~ X, data)
ggplot(data3) + aes(mode, students, group = mode) + geom_boxplot()
ANOVA
An ANOVA tests whether the mean of a dependent variable is equal across the groups defined by an independent variable:
summary(aov(students~mode,data=data3))
            Df Sum Sq  Mean Sq F value Pr(>F)
mode         1 0.0191 0.019093   2.953 0.0889 .
Residuals   97 0.6272 0.006466
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
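The single degree of freedom for mode in the table above means that aov() either compared two groups or treated mode as a numeric variable. When a grouping variable is stored as a number, wrapping it in factor() makes aov() treat it as categories. A minimal sketch on the built-in mtcars data (not the ANOVA.csv data), with a Tukey HSD follow-up as one possible post-hoc comparison:

```r
# cyl is stored as a number; factor() tells aov() to treat it as three groups
fit <- aov(mpg ~ factor(cyl), data = mtcars)
anova_table <- summary(fit)
print(anova_table)

# If the F test suggests a difference, Tukey's HSD compares each pair of groups
tukey <- TukeyHSD(fit)
print(tukey)
```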
Correlation
Before running a correlation test, it is good to visualise the relationship between the variables with a simple scatterplot, such as lattice's xyplot().