rosettastats

This site is meant to illustrate how common analyses can be conducted in a variety of statistical software packages. It was founded mainly to make it easier to use SPSS and R in parallel, or to switch from SPSS to R. It is geared towards applying statistics in psychological science.

Important information to read first

It can be necessary to execute one or more commands before analyses can be conducted. Instead of repeating those commands in every example, they are provided here.

SPSS

Before running commands in SPSS, two steps are required. First, the data have to be loaded (see the dedicated section below). Second, that dataset must be activated. In this example, we assume the dataset is called dat:

DATASET ACTIVATE dat.

R

Most examples here use an R package called userfriendlyscience, because it contains many functions designed to behave similarly to their SPSS counterparts. This package therefore has to be installed first:

install.packages('userfriendlyscience');

This only has to happen once: after it has been installed, it will remain available. However, it will still have to be loaded in every R session using:

require('userfriendlyscience');

In addition, the data have to be loaded (see the dedicated section below).

Loading data and managing variables

Data preprocessing

Before analyses can be conducted, it is sometimes necessary to perform some preprocessing.
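A common preprocessing step for questionnaire data is reverse-coding negatively worded items and then computing a scale score. The sketch below shows this in base R; the data frame, item names and the 1 to 7 response scale are hypothetical and only serve as an illustration, and the site's own examples may use other functions.

```r
# Hypothetical responses of four participants to three items on a 1-7 scale;
# the values are made up purely for illustration.
dat <- data.frame(
  item1 = c(6, 5, 7, 2),
  item2 = c(2, 3, 1, 6),  # negatively worded item, needs reverse coding
  item3 = c(5, 5, 6, 3)
)

# Reverse-code item2: on a scale from min to max, the reversed score is (min + max) - x
scale_min <- 1
scale_max <- 7
dat$item2_r <- (scale_min + scale_max) - dat$item2

# Compute each participant's scale score as the mean of the (recoded) items,
# ignoring missing values
dat$scale_mean <- rowMeans(dat[, c("item1", "item2_r", "item3")], na.rm = TRUE)

print(dat)
```

Keeping the recoded item in a new column (item2_r) rather than overwriting item2 makes it easy to check the recoding afterwards.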

Validity and reliability

When an operationalisation (e.g. a questionnaire) is applied, its performance can be described in terms of validity (the degree to which the operationalisation measures or manipulates the construct it was designed to measure or manipulate) and reliability (the degree to which the measurements’ accuracy or the manipulation’s effect is affected by random extraneous influences). Several analyses exist to explore an operationalisation’s validity and reliability.
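One widely used reliability estimate for a multi-item scale is Cronbach's alpha, α = k / (k − 1) × (1 − Σ item variances / variance of the sum score), where k is the number of items. The base R sketch below computes it directly from that formula; the helper name cronbach_alpha and the simulated items are assumptions for illustration, not the functions used elsewhere on this site.

```r
# Cronbach's alpha for a set of items (columns = items, rows = participants):
# alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_variances <- apply(items, 2, var)
  total_variance <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
}

# Simulated data: four items that each reflect the same latent variable plus noise
set.seed(42)
n <- 200
latent <- rnorm(n)
items <- sapply(1:4, function(i) latent + rnorm(n))
colnames(items) <- paste0("item", 1:4)

alpha_est <- cronbach_alpha(items)
print(round(alpha_est, 2))
```

With these simulation settings each item has variance 2 and each pair of items has covariance 1, so the population value of alpha is 4/3 × (1 − 8/20) = 0.8; the sample estimate will be close to that.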

Univariate visualisations (data screening)

Univariate analyses (data screening)

Univariate analyses are analyses of only one variable, for example to inspect distributions or frequencies.
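As a minimal base R sketch of univariate screening, the example below inspects a continuous and a categorical variable and plots a distribution. It assumes R's built-in mtcars dataset as stand-in data; with your own data you would replace mtcars by dat.

```r
# Built-in example data: mtcars (fuel consumption and design of 32 cars)
data(mtcars)

# Continuous variable: descriptive statistics
mpg_summary <- summary(mtcars$mpg)  # min, quartiles, median, mean, max
print(mpg_summary)
print(sd(mtcars$mpg))               # standard deviation

# Categorical variable: absolute and relative frequencies
cyl_freq <- table(mtcars$cyl)
print(cyl_freq)
print(prop.table(cyl_freq))

# Univariate visualisations: histogram and boxplot of the distribution
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
boxplot(mtcars$mpg, ylab = "Miles per gallon")
```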

Bivariate visualisations

Scatterplots

Bivariate analyses

Bivariate analyses are analyses of associations between two variables.

Two continuous variables: correlation and regression
One dichotomous variable, one continuous variable: independent-samples t-test
A continuous variable measured twice: dependent-samples t-test
One categorical variable, one continuous variable: one-way analysis of variance
Two categorical variables: crosstable
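For orientation, the five bivariate analyses listed above map onto standard functions from R's stats package roughly as sketched below. The built-in datasets (mtcars, sleep, PlantGrowth) and variable choices are assumptions for illustration; the site's own examples may use userfriendlyscience functions instead.

```r
# Built-in example datasets shipped with R
data(mtcars); data(sleep); data(PlantGrowth)

# Two continuous variables: scatterplot, Pearson correlation, simple regression
plot(mpg ~ wt, data = mtcars)
cor_res <- cor.test(mtcars$wt, mtcars$mpg)
reg_res <- lm(mpg ~ wt, data = mtcars)

# Dichotomous (am: automatic/manual) and continuous (mpg): independent-samples
# t-test; R uses Welch's test by default, var.equal = TRUE gives Student's test
t_ind <- t.test(mpg ~ am, data = mtcars)

# Continuous variable measured twice (extra sleep under two drugs for the same
# 10 subjects, rows in the same ID order in both groups): dependent-samples t-test
t_dep <- t.test(sleep$extra[sleep$group == "1"],
                sleep$extra[sleep$group == "2"], paired = TRUE)

# Categorical (three groups) and continuous variable: one-way ANOVA
aov_res <- aov(weight ~ group, data = PlantGrowth)

# Two categorical variables: crosstable with chi-square test
xtab <- table(transmission = mtcars$am, engine = mtcars$vs)
chi_res <- chisq.test(xtab)

print(cor_res); print(summary(reg_res)); print(t_ind); print(t_dep)
print(summary(aov_res)); print(xtab); print(chi_res)
```

SPSS reports both the "equal variances assumed" and "not assumed" rows for the independent-samples t-test; in R you choose one via var.equal. For 2 × 2 tables, chisq.test applies Yates' continuity correction by default (correct = FALSE switches it off).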

Multivariate analyses

Multivariate analyses involve more than two variables.

Linear regression analysis
Logistic regression analysis
Factorial analysis of variance
Repeated measures analysis of variance (includes split-plot or mixed analysis of variance)
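The multivariate analyses above can be sketched with base R model functions as follows. This is a minimal illustration under assumed data (the built-in mtcars, warpbreaks and sleep datasets), not the site's own implementation.

```r
data(mtcars); data(warpbreaks); data(sleep)

# Multiple linear regression: mpg predicted by weight and horsepower
lin_mod <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(lin_mod))

# Logistic regression: dichotomous outcome am (0 = automatic, 1 = manual)
log_mod <- glm(am ~ wt, data = mtcars, family = binomial)
print(summary(log_mod))
print(exp(coef(log_mod)))  # coefficients expressed as odds ratios

# Factorial ANOVA: two factors and their interaction (wool * tension)
fact_mod <- aov(breaks ~ wool * tension, data = warpbreaks)
print(summary(fact_mod))

# Repeated measures ANOVA: within-subject factor 'group', subjects identified by 'ID'
rm_mod <- aov(extra ~ group + Error(ID / group), data = sleep)
print(summary(rm_mod))
```

Note that aov uses sequential (Type I) sums of squares. For the balanced warpbreaks design this does not matter, but for unbalanced designs the results can differ from SPSS's default Type III sums of squares.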

Moderation

Moderation and mediation are techniques for analysing more complex relationships. Moderation, tested using an interaction term, means that the causal association between two variables is itself influenced by a third variable (the moderator).

Moderation analysis with a continuous predictor and a dichotomous moderator using regression analysis
Moderation analysis with a continuous predictor and a continuous moderator using regression analysis
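In regression terms, moderation means adding the product of predictor and moderator to the model; the slope of the predictor then depends on the value of the moderator. The base R sketch below shows both variants listed above, assuming the built-in mtcars data (car weight wt as predictor, transmission am or horsepower hp as moderator) purely for illustration.

```r
data(mtcars)

# Continuous predictor (wt, car weight) and dichotomous moderator (am: 0 = automatic,
# 1 = manual); 'wt * am' expands to wt + am + wt:am
mod_dich <- lm(mpg ~ wt * am, data = mtcars)
print(summary(mod_dich))

# Slope of wt within each moderator group
slope_automatic <- coef(mod_dich)[["wt"]]
slope_manual    <- coef(mod_dich)[["wt"]] + coef(mod_dich)[["wt:am"]]

# Continuous predictor (wt) and continuous moderator (hp, horsepower), both
# mean-centred so the lower-order coefficients refer to the average car
cars_c <- transform(mtcars, wt_c = wt - mean(wt), hp_c = hp - mean(hp))
mod_cont <- lm(mpg ~ wt_c * hp_c, data = cars_c)
print(summary(mod_cont))

# Simple slopes of wt at one standard deviation below and above the mean of hp
b <- coef(mod_cont)
hp_sd <- sd(cars_c$hp_c)
slope_low  <- b[["wt_c"]] + b[["wt_c:hp_c"]] * (-hp_sd)
slope_high <- b[["wt_c"]] + b[["wt_c:hp_c"]] * hp_sd
print(c(low_hp = slope_low, high_hp = slope_high))
```

Probing the interaction at the mean ± 1 SD of the moderator is one common convention; other moderator values can be plugged into the same expression.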

Mediation

See Moderated mediation.

Moderated mediation

Moderation and mediation are techniques for analysing complex relationships. Mediation refers to the situation in which the causal association between two variables runs through a third variable, the mediator: the antecedent (predictor) causally affects the mediator, which in turn causally affects the consequent (dependent variable). Moderation, which is tested using an interaction, means that the causal association between two variables is itself influenced by a third variable. Each causal path can be moderated by a continuous or dichotomous variable.

Moderated mediation analysis with a continuous or dichotomous predictor, one or two continuous or dichotomous moderators (one for each causal path), and m mediators using moderated mediation analysis
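The core of a mediation analysis with one mediator is the indirect effect a × b, where a is the effect of the antecedent on the mediator and b the effect of the mediator on the consequent (controlling for the antecedent). The base R sketch below estimates it from two regressions and adds a percentile bootstrap confidence interval. The simulated data and the helper name indirect_effect are assumptions for illustration; this shows simple (unmoderated) mediation only and is not the dedicated function the site uses.

```r
# Simulated data (assumption, for illustration only): x affects m, and m affects y
set.seed(1)
n <- 300
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)            # a path: antecedent -> mediator
y <- 0.4 * m + 0.2 * x + rnorm(n)  # b path (mediator -> consequent) plus direct effect of x
med_dat <- data.frame(x, m, y)

# Indirect effect = a * b, estimated from two regressions
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]
  b <- coef(lm(y ~ m + x, data = d))[["m"]]
  a * b
}
ab <- indirect_effect(med_dat)

# Percentile bootstrap confidence interval for the indirect effect
n_boot <- 1000
boot_ab <- replicate(n_boot, {
  rows <- sample(nrow(med_dat), replace = TRUE)
  indirect_effect(med_dat[rows, ])
})
ci <- quantile(boot_ab, c(0.025, 0.975))
print(round(c(indirect = ab, ci), 3))
```

In this simulation the true indirect effect is 0.5 × 0.4 = 0.2. If the first path were moderated by a variable w, the mediator model would become lm(m ~ x * w), the indirect effect at a given w would be (a1 + a3 × w) × b, and the product a3 × b is often reported as the index of moderated mediation.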

Multilevel analysis

Multilevel analysis (MLA) is used to analyse hierarchical data, in which data at the lowest level are clustered within higher levels. An important application of MLA is the analysis of ESM/EMA (experience sampling method / ecological momentary assessment) data. This type of data is called intensive longitudinal data: each subject is measured on multiple occasions, so the occasions are clustered within subjects.

Multilevel analysis using multilevel analysis
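A minimal sketch of a two-level model with measurement occasions nested within subjects is shown below. It assumes the nlme package (a recommended package distributed with standard R installations) and its Orthodont dataset, in which each child is measured at ages 8, 10, 12 and 14; lme4's lmer is a common alternative. This is an illustration, not the site's own example.

```r
library(nlme)  # provides lme(), fixef(), VarCorr() and the Orthodont data

data(Orthodont)
# Level 1: measurement occasions (age); level 2: children (Subject)
ml_mod <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
print(summary(ml_mod))
print(fixef(ml_mod))  # fixed effects: intercept and average change per year

# Intraclass correlation from an intercept-only model: the share of the
# total variance that lies between subjects
null_mod <- lme(distance ~ 1, random = ~ 1 | Subject, data = Orthodont)
vc <- as.numeric(VarCorr(null_mod)[, "Variance"])  # between-subject, residual
icc <- vc[1] / sum(vc)
print(round(icc, 2))
```

The random intercept lets each subject have their own baseline; adding age to the random part (random = ~ age | Subject) would also let the slopes differ between subjects.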

Rosetta Stats

Rosetta Stats is a statistics chrestomathy, conceptually based on Rosetta Code.