Hands-on Exercise 4(1)

Author

farrahmf

Visual Statistical Analysis

Loading packages and importing data

pacman::p_load(ggstatsplot, rstantools, PMCMRplus, tidyverse)

exam_data <- read_csv("data/Exam_data.csv")
Rows: 322 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ID, CLASS, GENDER, RACE
dbl (3): ENGLISH, MATHS, SCIENCE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

One-sample test

Using gghistostats().

set.seed(1234)

gghistostats(
  data = exam_data,
  x = ENGLISH,
  type = "bayes",
  test.value = 60,
  xlab = "English scores"
)

Two-sample mean test

Using ggbetweenstats().

ggbetweenstats(
  data = exam_data,
  x = GENDER,
  y = MATHS,
  type = "np",
  messages = FALSE
)

Oneway ANOVA test

Using ggbetweenstats().

ggbetweenstats(
  data = exam_data,
  x = RACE,
  y = ENGLISH,
  type = "p",
  mean.ci = TRUE,
  pairwise.comparisons = TRUE,
  pairwise.display = "s",
  p.adjust.method = "fdr",
  messages = FALSE
)

Significant Test of Correlation

Using ggscatterscats().

ggscatterstats(
  data = exam_data,
  x = MATHS,
  y = ENGLISH,
  marginal = FALSE
)

Significant Test of Association (Dependence)

Using ggbarstats().

exam1 <- exam_data %>%
  mutate(MATHS_bins =
           cut(MATHS,
               breaks = c(0, 60, 75, 85, 100)))

ggbarstats(exam1,
           x = MATHS_bins,
           y = GENDER)

Visualising Models

Loading packages and installing data

pacman::p_load(readxl, performance, parameters, see)

car_resale <- read_xls("data/ToyotaCorolla.xls", "data")

car_resale
# A tibble: 1,436 × 38
      Id Model       Price Age_0…¹ Mfg_M…² Mfg_Y…³     KM Quart…⁴ Weight Guara…⁵
   <dbl> <chr>       <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
 1    81 TOYOTA Cor… 18950      25       8    2002  20019     100   1180       3
 2     1 TOYOTA Cor… 13500      23      10    2002  46986     210   1165       3
 3     2 TOYOTA Cor… 13750      23      10    2002  72937     210   1165       3
 4     3  TOYOTA Co… 13950      24       9    2002  41711     210   1165       3
 5     4 TOYOTA Cor… 14950      26       7    2002  48000     210   1165       3
 6     5 TOYOTA Cor… 13750      30       3    2002  38500     210   1170       3
 7     6 TOYOTA Cor… 12950      32       1    2002  61000     210   1170       3
 8     7  TOYOTA Co… 16900      27       6    2002  94612     210   1245       3
 9     8 TOYOTA Cor… 18600      30       3    2002  75889     210   1245       3
10    44 TOYOTA Cor… 16950      27       6    2002 110404     234   1255       3
# … with 1,426 more rows, 28 more variables: HP_Bin <chr>, CC_bin <chr>,
#   Doors <dbl>, Gears <dbl>, Cylinders <dbl>, Fuel_Type <chr>, Color <chr>,
#   Met_Color <dbl>, Automatic <dbl>, Mfr_Guarantee <dbl>,
#   BOVAG_Guarantee <dbl>, ABS <dbl>, Airbag_1 <dbl>, Airbag_2 <dbl>,
#   Airco <dbl>, Automatic_airco <dbl>, Boardcomputer <dbl>, CD_Player <dbl>,
#   Central_Lock <dbl>, Powered_Windows <dbl>, Power_Steering <dbl>,
#   Radio <dbl>, Mistlamps <dbl>, Sport_Model <dbl>, Backseat_Divider <dbl>, …

Multiple Regression Model

Using lm() from Base Stats of R.

model <- lm(Price ~ Age_08_04 + Mfg_Year + KM + Weight + Guarantee_Period,
            data = car_resale)

model

Call:
lm(formula = Price ~ Age_08_04 + Mfg_Year + KM + Weight + Guarantee_Period, 
    data = car_resale)

Coefficients:
     (Intercept)         Age_08_04          Mfg_Year                KM  
      -2.637e+06        -1.409e+01         1.315e+03        -2.323e-02  
          Weight  Guarantee_Period  
       1.903e+01         2.770e+01  

Model Diagnostic: Checking for multicollinearity

Using check_collinearity() from the performance package.

check_collinearity(model)
# Check for Multicollinearity

Low Correlation

             Term   VIF     VIF 95% CI Increased SE Tolerance Tolerance 95% CI
 Guarantee_Period  1.04   [1.01, 1.17]         1.02      0.97     [0.86, 0.99]
        Age_08_04 31.07 [28.08, 34.38]         5.57      0.03     [0.03, 0.04]
         Mfg_Year 31.16 [28.16, 34.48]         5.58      0.03     [0.03, 0.04]

High Correlation

   Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
     KM 1.46 [1.37, 1.57]         1.21      0.68     [0.64, 0.73]
 Weight 1.41 [1.32, 1.51]         1.19      0.71     [0.66, 0.76]
check_c <- check_collinearity(model)
plot(check_c)
Variable `Component` is not in your data frame :/

Model Diagnostic: Checking normality assumption

Using check_normality() from the performance package.

model1 <- lm(Price ~ Age_08_04 + KM + Weight + Guarantee_Period,
             data = car_resale)

check_n <- check_normality(model1)

plot(check_n)

Model Diagnostic: Checking for heteroscedasticity

Using check_heteroscedasticity() from the performance package.

check_h <- check_heteroscedasticity(model1)

plot(check_h)

Model Diagnostic: Complete check

Using check_model().

check_model(model1)
Variable `Component` is not in your data frame :/

Visualising Regression Parameters

Using plot() of see package and parameters() of parameters package to visualise the parameters of a regression model.

plot(parameters(model1))

Using ggcoefstats() of ggstatsplot package to visualise the parameters of a regression model.

ggcoefstats(model1, output = "plot")