Read SPSS data into R

Haven enables R to read and write various data formats used by other statistical packages by wrapping the fantastic ReadStat C library written by Evan Miller. Haven is part of the tidyverse.

#read spss files .sav
#install.packages("haven", lib="~/lib/r-cran")
#install.packages("haven", lib="C:/Program Files/R/lib")
library(haven)

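#project string: path to the data file without its extension,
#relative to the working directory (use the one that matches your setup)
#pstr = "csv/dataset-iss-2016-subset1"
pstr = "../../csv/dataset-iss-2016-subset1"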

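#read sav
dfs = read_sav(paste0(pstr, ".sav"))
#get variable labels (NA for variables that have no label)
dfl = data.frame(label=sapply(dfs, function(x) { l = attr(x, "label", exact=TRUE); if (is.null(l)) NA_character_ else l }))
dfl$vars = rownames(dfl)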

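#export as tab-separated text (the file keeps the .csv extension)
write.table(dfs, paste0(pstr, ".csv"), sep="\t", row.names=FALSE)
#read it back as a plain data frame (SPSS labels are not stored in the text file)
dfs = read.table(paste0(pstr, ".csv"), sep="\t", header=TRUE, strip.white=TRUE, stringsAsFactors=FALSE)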
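
The label lookup can be tried without the .sav file. The following is a minimal sketch in base R, assuming that variable labels are stored as a "label" attribute on each column (which is what the code above reads). The toy data frame, its labels, and the helper get_label are made up for illustration.

```r
# Toy data frame standing in for the result of read_sav();
# the values and labels are invented for this example.
toy <- data.frame(ANI1 = c(1, 5, 3), ANI2 = c(2, 4, 4), id = 1:3)
attr(toy$ANI1, "label") <- "Animosity item 1"
attr(toy$ANI2, "label") <- "Animosity item 2"
# 'id' deliberately has no label attribute

# Hypothetical helper: return the label, or NA when a column has none,
# so that every column yields exactly one character value.
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else as.character(lab)
}

# One row per variable: name plus label
labels <- data.frame(
  vars  = names(toy),
  label = vapply(toy, get_label, character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(labels)
```

Using vapply with character(1) makes the result a character vector even when some columns are unlabelled, so the lookup table always has one row per variable.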

CFA using R package “lavaan”

The lavaan (latent variable analysis) package was developed to give useRs, researchers and teachers a free, open-source, yet commercial-quality package for latent variable modeling. It can estimate a wide range of multivariate statistical models, including path analysis, confirmatory factor analysis (CFA), structural equation modeling and growth curve models.

#install.packages("lavaan", dependencies=TRUE, lib="~/lib/r-cran")
#install.packages("lavaan", dependencies=TRUE, lib="C:/Program Files/R/lib")
library(lavaan)
## This is lavaan 0.6-8
## lavaan is FREE software! Please report any bugs.
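#lavaan model syntax for the latent variables: "=~" reads "is measured by"
#(not valid R on its own; it must be passed to lavaan as a quoted string, see below)
#animosity       =~ ANI1 + ANI2 + ANI3 + ANI4
#ethnocentrism   =~ ETHNO1 + ETHNO2 + ETHNO3

#specify measurement model (cfa() adds the covariance between the latent variables by default)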
iss.cfa.model <- '
#animosity       =~ ANI1 + ANI2 + ANI3 + ANI4
animosity       =~ ANI1 + ANI2 + ANI3 + ANI4 + ANI5
ethnocentrism   =~ ETHNO1 + ETHNO2 + ETHNO3
'

#fit model
fit <- cfa(iss.cfa.model, data=dfs)

#check standardized factor loadings; loadings should be high (i.e., >.70)
#here ANI1-ANI4 and ETHNO1-ETHNO3 meet this, but ANI5 does not (-0.095)
#(p-values for the loadings, which should be <0.05, are shown by summary() below)
inspect(fit, what="std")
## $lambda
##        anmsty ethncn
## ANI1    0.847  0.000
## ANI2    0.937  0.000
## ANI3    0.893  0.000
## ANI4    0.721  0.000
## ANI5   -0.095  0.000
## ETHNO1  0.000  0.869
## ETHNO2  0.000  0.839
## ETHNO3  0.000  0.940
## 
## $theta
##        ANI1  ANI2  ANI3  ANI4  ANI5  ETHNO1 ETHNO2 ETHNO3
## ANI1   0.283                                             
## ANI2   0.000 0.123                                       
## ANI3   0.000 0.000 0.203                                 
## ANI4   0.000 0.000 0.000 0.480                           
## ANI5   0.000 0.000 0.000 0.000 0.991                     
## ETHNO1 0.000 0.000 0.000 0.000 0.000 0.245               
## ETHNO2 0.000 0.000 0.000 0.000 0.000 0.000  0.297        
## ETHNO3 0.000 0.000 0.000 0.000 0.000 0.000  0.000  0.117 
## 
## $psi
##               anmsty ethncn
## animosity     1.000        
## ethnocentrism 0.307  1.000
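
The two matrices above are linked. In a standardized solution where each indicator loads on a single factor, the residual variance of an indicator is 1 minus its squared loading. The sketch below checks this with the numbers copied from the output above; the variable names and the 0.002 tolerance (for rounding in the printed values) are choices made for this example.

```r
# Standardized loadings copied from the $lambda output above
lambda <- c(ANI1 = 0.847, ANI2 = 0.937, ANI3 = 0.893, ANI4 = 0.721,
            ANI5 = -0.095, ETHNO1 = 0.869, ETHNO2 = 0.839, ETHNO3 = 0.940)

# Diagonal of the $theta output above (standardized residual variances)
theta <- c(ANI1 = 0.283, ANI2 = 0.123, ANI3 = 0.203, ANI4 = 0.480,
           ANI5 = 0.991, ETHNO1 = 0.245, ETHNO2 = 0.297, ETHNO3 = 0.117)

r2    <- lambda^2   # share of item variance explained by the factor
resid <- 1 - r2     # implied residual variance

# Side-by-side comparison; differences are due to rounding only
print(round(cbind(loading = lambda, r2 = r2, implied = resid, reported = theta), 3))
print(all(abs(resid - theta) < 0.002))

# Items whose absolute loading falls below the .70 rule of thumb
weak <- names(lambda)[abs(lambda) < 0.70]
print(weak)
```

With these numbers only ANI5 is flagged: the factor explains less than 1% of its variance.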
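#check if model fits data, commonly cited rules of thumb
#Chi-square: p > 0.05
#CFI: > 0.90 (stricter: > 0.95)
#TLI: > 0.95 (lenient: > 0.90)
#RMSEA: < 0.10 (stricter: < 0.08)
#here: CFI (0.968) and TLI (0.953) pass, RMSEA (0.094) is borderline, chi-square (p = 0.004) does not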
summary(fit, fit.measures=TRUE, standardized=TRUE)
## lavaan 0.6-8 ended normally after 30 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        17
##                                                       
##   Number of observations                           123
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                39.642
##   Degrees of freedom                                19
##   P-value (Chi-square)                           0.004
## 
## Model Test Baseline Model:
## 
##   Test statistic                               673.798
##   Degrees of freedom                                28
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.968
##   Tucker-Lewis Index (TLI)                       0.953
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1399.814
##   Loglikelihood unrestricted model (H1)      -1379.993
##                                                       
##   Akaike (AIC)                                2833.628
##   Bayesian (BIC)                              2881.435
##   Sample-size adjusted Bayesian (BIC)         2827.682
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.094
##   90 Percent confidence interval - lower         0.052
##   90 Percent confidence interval - upper         0.135
##   P-value RMSEA <= 0.05                          0.043
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.056
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   animosity =~                                                          
##     ANI1              1.000                               1.183    0.847
##     ANI2              1.037    0.074   13.937    0.000    1.227    0.937
##     ANI3              1.014    0.078   13.010    0.000    1.199    0.893
##     ANI4              0.831    0.090    9.267    0.000    0.983    0.721
##     ANI5             -0.112    0.110   -1.021    0.307   -0.132   -0.095
##   ethnocentrism =~                                                      
##     ETHNO1            1.000                               1.145    0.869
##     ETHNO2            1.174    0.097   12.068    0.000    1.344    0.839
##     ETHNO3            1.125    0.081   13.845    0.000    1.289    0.940
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   animosity ~~                                                          
##     ethnocentrism     0.416    0.140    2.972    0.003    0.307    0.307
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .ANI1              0.552    0.087    6.374    0.000    0.552    0.283
##    .ANI2              0.211    0.056    3.750    0.000    0.211    0.123
##    .ANI3              0.366    0.068    5.423    0.000    0.366    0.203
##    .ANI4              0.893    0.123    7.248    0.000    0.893    0.480
##    .ANI5              1.931    0.246    7.837    0.000    1.931    0.991
##    .ETHNO1            0.427    0.078    5.441    0.000    0.427    0.245
##    .ETHNO2            0.762    0.125    6.111    0.000    0.762    0.297
##    .ETHNO3            0.219    0.076    2.870    0.004    0.219    0.117
##     animosity         1.399    0.244    5.736    0.000    1.000    1.000
##     ethnocentrism     1.312    0.222    5.903    0.000    1.000    1.000
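
To see where the fit measures in this summary come from, they can be recomputed by hand from the test statistics and log-likelihoods printed above. This is a sketch using the standard textbook formulas, not lavaan's internal code. The RMSEA line divides by N; some texts use N - 1 instead, and both round to 0.094 here. All variable names are chosen for this example.

```r
# Numbers copied from the summary() output above
chisq_m <- 39.642;  df_m <- 19     # user model
chisq_b <- 673.798; df_b <- 28     # baseline model
n <- 123                           # observations
k <- 17                            # free parameters
p <- 8                             # observed variables
ll0 <- -1399.814                   # loglikelihood user model (H0)
ll1 <- -1379.993                   # loglikelihood unrestricted model (H1)

# Degrees of freedom: unique variances/covariances minus free parameters
df_check <- p * (p + 1) / 2 - k

# Chi-square as a likelihood-ratio statistic
chisq_check <- 2 * (ll1 - ll0)

# Incremental fit indices (user model versus baseline model)
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

# RMSEA (dividing by N; see the note above)
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * n))

# Information criteria
aic   <- -2 * ll0 + 2 * k
bic   <- -2 * ll0 + k * log(n)
sabic <- -2 * ll0 + k * log((n + 2) / 24)   # sample-size adjusted BIC

print(round(c(df = df_check, chisq = chisq_check, cfi = cfi, tli = tli,
              rmsea = rmsea, aic = aic, bic = bic, sabic = sabic), 3))
```

The recomputed values agree with the summary to three decimals, which is a quick way to confirm how each index depends on the chi-square, the degrees of freedom and the sample size.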

RStudio hands-on

Let’s continue by downloading the current GitHub repo and importing it into the R environment.
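
One possible way to do this from within R is sketched below. The repository name and branch are not given here, so "REPO-NAME" and "main" are placeholders that need to be replaced. The URL follows GitHub's usual pattern for branch archives, and the download only runs when run_download is set to TRUE, since it needs network access.

```r
# Placeholders: the repository and branch are assumptions, not taken from the text
gh_user   <- "nils-holmberg"
gh_repo   <- "REPO-NAME"   # hypothetical, replace with the actual repository name
gh_branch <- "main"        # assumption; some repositories use "master"

# GitHub serves a zip archive of a branch at this kind of URL
zip_url  <- sprintf("https://github.com/%s/%s/archive/refs/heads/%s.zip",
                    gh_user, gh_repo, gh_branch)
zip_file <- file.path(tempdir(), paste0(gh_repo, ".zip"))

# Switch to TRUE to actually download and unpack the archive
run_download <- FALSE
if (run_download) {
  download.file(zip_url, destfile = zip_file, mode = "wb")
  unzip(zip_file, exdir = tempdir())
  # The archive unpacks into a folder named "<repo>-<branch>"
  setwd(file.path(tempdir(), paste0(gh_repo, "-", gh_branch)))
}
print(zip_url)
```

In RStudio the same result can be reached by downloading the zip file in a browser and opening the unpacked folder as a project.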

nils-holmberg.github.io/

github.com/nils-holmberg