This site is meant to illustrate how common analyses can be conducted in a variety of statistical software packages. It was founded mainly to facilitate using SPSS and R in parallel or to facilitate switching from SPSS to R. It is geared towards application of statistics to psychological science.
Important information to read first
Some commands have to be executed before analyses can be conducted. Rather than repeating those commands in every example, this section lists them once.
SPSS
Before running commands in SPSS, two things are required. First, the data have to be loaded (see the dedicated section below). Second, the dataset must be activated. In this example, we will assume the dataset is called dat:
DATASET ACTIVATE dat.
R
Most examples here use an R package called userfriendlyscience because it contains a large number of functions designed to behave similarly to their SPSS counterparts. This package therefore first has to be installed:
install.packages('userfriendlyscience');
This only has to happen once: after it has been installed, it will remain available. However, it will still have to be loaded in every R session using:
library('userfriendlyscience');
In addition, the data have to be loaded (see the dedicated section below).
Loading data and managing variables
Data preprocessing
Before analyses can be conducted, it is sometimes necessary to perform some preprocessing:
- Transforming variables (e.g. computing means and sums)
- Recoding a variable
- Standardizing a variable
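As a rough illustration of these three steps, the following is a minimal sketch in base R. The data frame and all variable names (item1, group, scaleMean and so on) are hypothetical and only serve the example; the dedicated pages may use other functions.

```r
# Hypothetical example data: three questionnaire items and a numeric group code
dat <- data.frame(
  item1 = c(1, 4, 3, 5, 2),
  item2 = c(2, 5, 3, 4, 1),
  item3 = c(1, 5, 2, 5, 2),
  group = c(1, 2, 1, 2, 2)
)
items <- c("item1", "item2", "item3")

# Transformation: aggregate the items into a mean score and a sum score
dat$scaleMean <- rowMeans(dat[, items])
dat$scaleSum  <- rowSums(dat[, items])

# Recoding: turn the numeric group code into a labelled factor
dat$groupLabel <- factor(dat$group,
                         levels = c(1, 2),
                         labels = c("control", "experimental"))

# Standardizing: convert the mean score to z-scores (mean 0, SD 1)
dat$scaleMean_z <- as.vector(scale(dat$scaleMean))
```

The as.vector() call is only there because scale() returns a one-column matrix, which is awkward to store in a data frame.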
Validity and reliability
When an operationalisation (e.g. a questionnaire) is applied, its performance can be described in terms of validity (the degree to which the operationalisation measures or manipulates the construct it was designed to measure or manipulate) and reliability (the susceptibility of the measurement's accuracy or the manipulation's effect to random extraneous influences). A number of analyses exist to explore an operationalisation's validity and reliability.
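To make the idea of reliability more concrete, here is a minimal sketch that computes one common reliability coefficient, Cronbach's alpha, in base R. The function name cronbachAlpha and the example items are hypothetical; this covers only one aspect of reliability and says nothing about validity.

```r
# Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of sum score)
# 'items' is assumed to be a data frame or matrix with one column per item
cronbachAlpha <- function(items) {
  items <- na.omit(items)                    # listwise deletion of incomplete cases
  k <- ncol(items)                           # number of items
  itemVariances <- apply(items, 2, var)      # variance of each item
  totalVariance <- var(rowSums(items))       # variance of the sum score
  (k / (k - 1)) * (1 - sum(itemVariances) / totalVariance)
}

# Hypothetical responses of six participants to three items
responses <- data.frame(
  item1 = c(1, 2, 3, 4, 5, 3),
  item2 = c(2, 2, 3, 5, 5, 3),
  item3 = c(1, 3, 3, 4, 4, 2)
)
alpha <- cronbachAlpha(responses)
```

Items that are perfectly redundant yield an alpha of exactly 1; the more the items vary independently of one another, the lower the coefficient.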
Univariate visualisations (data screening)
Univariate analyses (data screening)
Univariate analyses are analyses of only one variable, for example to inspect distributions or frequencies.
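As a minimal sketch of such an inspection in base R, the example below produces frequencies for a categorical variable and descriptives for a continuous one. The data are simulated and the variable names (sex, age) are hypothetical.

```r
# Simulated example data; names and values are purely illustrative
set.seed(1)
dat <- data.frame(
  sex = factor(sample(c("female", "male"), size = 50, replace = TRUE)),
  age = round(rnorm(50, mean = 30, sd = 8))
)

# Categorical variable: frequencies and proportions
freq <- table(dat$sex)
prop <- prop.table(freq)

# Continuous variable: five-number summary plus mean, and standard deviation
ageSummary <- summary(dat$age)
ageSD      <- sd(dat$age)
```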
Bivariate visualisations
Bivariate analyses
Bivariate analyses are analyses of associations between two variables.
- Two continuous variables: correlation and regression
- One dichotomous variable, one continuous variable: independent-samples t-test
- A continuous variable measured twice: dependent-samples t-test
- One categorical variable, one continuous variable: one-way analysis of variance
- Two categorical variables: crosstable
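The analyses in this list can be sketched with base R functions as follows. The data are simulated and all variable names (condition, education, pre, post) are hypothetical; the dedicated pages may use other functions. Note that t.test() applies the Welch correction for unequal variances by default.

```r
# Simulated example data; names and values are purely illustrative
set.seed(42)
n <- 60
dat <- data.frame(
  condition = factor(rep(c("control", "treatment"), each = n / 2)),
  education = factor(rep(c("low", "middle", "high"), times = n / 3)),
  pre       = rnorm(n, mean = 50, sd = 10)
)
dat$post <- dat$pre + rnorm(n, mean = 2, sd = 5)

# Two continuous variables: Pearson correlation and simple regression
corTest <- cor.test(dat$pre, dat$post)
regFit  <- lm(post ~ pre, data = dat)

# One dichotomous and one continuous variable: independent-samples t-test
indepT <- t.test(post ~ condition, data = dat)

# A continuous variable measured twice: dependent-samples t-test
pairedT <- t.test(dat$post, dat$pre, paired = TRUE)

# One categorical and one continuous variable: one-way analysis of variance
anovaFit <- aov(post ~ education, data = dat)

# Two categorical variables: cross table with a chi-square test
crossTab <- table(dat$condition, dat$education)
chiSq    <- chisq.test(crossTab)
```

Printing any of these objects, or calling summary() on regFit and anovaFit, shows the test statistics and p-values.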
Multivariate analyses
Multivariate analyses involve more than two variables.
- Linear regression analysis
- Logistic regression analysis
- Factorial analysis of variance
- Repeated measures analysis of variance (includes split-plot or mixed analysis of variance)
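The first three analyses in this list can be sketched in base R as shown below; repeated measures analysis of variance requires data in long format and is left out here. The data are simulated and all variable names (attitude, norm, intention, behavior, sex, group) are hypothetical.

```r
# Simulated example data; names and effect sizes are purely illustrative
set.seed(123)
n <- 200
dat <- data.frame(
  attitude = rnorm(n),
  norm     = rnorm(n),
  sex      = factor(rep(c("female", "male"), times = n / 2)),
  group    = factor(rep(c("a", "b"), each = n / 2))
)
dat$intention <- 0.5 * dat$attitude + 0.3 * dat$norm + rnorm(n)
dat$behavior  <- rbinom(n, size = 1, prob = plogis(dat$intention))

# Linear regression: continuous outcome, several predictors
linFit <- lm(intention ~ attitude + norm, data = dat)

# Logistic regression: dichotomous outcome, several predictors
logFit <- glm(behavior ~ attitude + norm, family = binomial, data = dat)

# Factorial analysis of variance: two factors and their interaction
facFit <- aov(intention ~ sex * group, data = dat)
```

Calling summary() on each fitted object gives the coefficients or the analysis of variance table.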
Moderation
Moderation and mediation are techniques involving more complicated relationships. Moderation, which is tested using an interaction term, means that the causal association between two variables is itself influenced by a third variable.
- Moderation analysis with a continuous predictor and a dichotomous moderator using regression analysis
- Moderation analysis with a continuous predictor and a continuous moderator using regression analysis
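A minimal sketch of testing moderation with an interaction term in a regression, using base R, could look like this. The data are simulated and the variable names (stress, support, complaints) are hypothetical. Mean-centering the predictors before forming the product is a common choice that makes the lower-order coefficients easier to interpret; it is not required to test the interaction.

```r
# Simulated example data; names and effect sizes are purely illustrative
set.seed(7)
n <- 300
dat <- data.frame(stress = rnorm(n), support = rnorm(n))
dat$complaints <- 0.5 * dat$stress - 0.2 * dat$support -
  0.3 * dat$stress * dat$support + rnorm(n)

# Mean-center the predictor and the moderator
dat$stress_c  <- dat$stress  - mean(dat$stress)
dat$support_c <- dat$support - mean(dat$support)

# 'stress_c * support_c' expands to both main effects plus their interaction
modFit <- lm(complaints ~ stress_c * support_c, data = dat)

# The interaction coefficient estimates how much the effect of stress
# changes for each unit increase in support
interaction <- coef(modFit)["stress_c:support_c"]
```

For a dichotomous moderator the same formula applies, with the moderator entered as a factor or a 0/1 variable.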
Mediation
See Moderated mediation.
Moderated mediation
Moderation and mediation are techniques involving complicated relationships. Mediation refers to the situation where the causal association between two variables runs through a mediator: the antecedent (predictor) affects the mediator, and the mediator in turn affects the consequent (dependent variable). Moderation, which is tested using an interaction term, means that the causal association between two variables is itself influenced by a third variable. Each causal path can be moderated by a continuous or dichotomous variable.
- Moderated mediation analysis with a continuous or dichotomous predictor and one or two continuous or dichotomous moderators (one for each causal path) and m mediators using moderated mediation analysis
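The two causal paths of a mediation model can be illustrated with a minimal sketch in base R. This covers only simple mediation with one mediator and no moderators, and it omits the bootstrapped confidence intervals normally reported for the indirect effect. The data are simulated and the variable names x, m and y are hypothetical.

```r
# Simulated example data; names and effect sizes are purely illustrative
set.seed(11)
n <- 300
x <- rnorm(n)                        # antecedent (predictor)
m <- 0.5 * x + rnorm(n)              # mediator
y <- 0.4 * m + 0.1 * x + rnorm(n)    # consequent (dependent variable)
dat <- data.frame(x, m, y)

# Path a: effect of the predictor on the mediator
aFit <- lm(m ~ x, data = dat)
# Path b and the direct effect: mediator and predictor together predict the outcome
bFit <- lm(y ~ m + x, data = dat)
# Total effect: predictor alone predicts the outcome
totalFit <- lm(y ~ x, data = dat)

a        <- coef(aFit)["x"]
b        <- coef(bFit)["m"]
indirect <- a * b                    # effect of x on y through m
direct   <- coef(bFit)["x"]          # effect of x on y not running through m
total    <- coef(totalFit)["x"]
```

With ordinary least squares and the same cases in every model, the total effect equals the sum of the direct and the indirect effect.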
Multilevel analysis
Multilevel analysis (MLA) is used for the analysis of hierarchical data, in which data at the lowest level are clustered within higher levels. An important application of MLA is the analysis of data collected with the experience sampling method (ESM) or ecological momentary assessment (EMA). Such data are called intensive longitudinal data: subjects are measured on multiple occasions, so the occasions are clustered within the subjects.
- Multilevel analysis using multilevel analysis
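As a minimal sketch of occasions clustered within subjects, the example below fits a random-intercept model with the nlme package, which is one of the recommended packages normally shipped with R. The data are simulated and the variable names (subject, occasion, mood) are hypothetical; the dedicated page may use a different package.

```r
library(nlme)  # assumed to be available; it is shipped with standard R installations

# Simulated intensive longitudinal data: 30 subjects, 10 occasions each
set.seed(2024)
nSubjects  <- 30
nOccasions <- 10
subjectIntercept <- rnorm(nSubjects, mean = 5, sd = 1)  # each subject's own baseline

dat <- data.frame(
  subject  = factor(rep(seq_len(nSubjects), each = nOccasions)),
  occasion = rep(seq_len(nOccasions), times = nSubjects)
)
dat$mood <- rep(subjectIntercept, each = nOccasions) +
  0.1 * dat$occasion + rnorm(nrow(dat), sd = 0.5)

# Fixed effect of occasion, random intercept for each subject
mlFit <- lme(fixed = mood ~ occasion, random = ~ 1 | subject, data = dat)

fixedEffects <- fixef(mlFit)    # intercept and slope for occasion
varComponents <- VarCorr(mlFit) # between-subject and residual variance
```

The random intercept accounts for the fact that measurements from the same subject resemble each other more than measurements from different subjects.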