Transformation

Transforming one or more variables means applying mathematical operators to change their values. Transformation can be used to change existing variables or to create new variables out of (combinations of) other variables, for example when averaging several variables. In the examples below, a dataset/dataframe called dat contains a categorical variable groupingVariable, a dichotomous variable dichotomousVariable, and continuous variables dependentVariable, independentVariable, and secondIndependentVariable.

The common mathematical operators are + for addition, - for subtraction, * for multiplication, / for division, and ^ to raise a value to the power of another value (SPSS uses ** for exponentiation instead). In more complex expressions, these operators are combined with parentheses to explicitly indicate precedence. In addition, a number of functions are available, but the names of those functions will generally differ between statistical packages.
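
As a small illustration of how parentheses change the outcome of an expression, consider the following R sketch. The variable names are made up for this example.

```r
# Multiplication is evaluated before addition
withoutParentheses <- 2 + 3 * 4

# Parentheses force the addition to happen first
withParentheses <- (2 + 3) * 4

# Exponentiation is evaluated before the unary minus
negatedSquare <- -2^2

# Parentheses make the negative number itself the base
squaredNegative <- (-2)^2
```

Here withoutParentheses is 14 and withParentheses is 20, while negatedSquare is -4 and squaredNegative is 4.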

SPSS

The command used for applying mathematical operators to one or several variables in SPSS is called COMPUTE. It can be used in combination with mathematical operators:

COMPUTE newVariable = 2 * dependentVariable.

The mean of two variables can therefore be computed like this:

COMPUTE meanDependentVariable =
  (independentVariable + secondIndependentVariable) / 2.

But there is also a dedicated function available for this:

COMPUTE meanDependentVariable =
  MEAN(independentVariable, secondIndependentVariable).

The MEAN function has a number of variants that deal efficiently with missing values: appending a period followed by a number to the function name indicates how many values have to be valid to produce a nonmissing result. For example:

COMPUTE meanDependentVariable =
  MEAN.2(independentVariable, secondIndependentVariable).

In this example, meanDependentVariable will be missing for all cases where independentVariable, secondIndependentVariable, or both are missing.

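Similarly, adding variables together can be done either using the + operator or using the SUM function, optionally appending a period followed by a number to indicate how many variables must have valid values in order to produce a nonmissing result. The following two commands yield identical results for cases where both variables have valid values. They differ when values are missing: the + operator produces a missing result if either variable is missing, whereas SUM adds up the valid values.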

COMPUTE sumDependentVariable =
  independentVariable + secondIndependentVariable.
  
COMPUTE sumDependentVariable =
  SUM(independentVariable, secondIndependentVariable).

R

In R, mathematical operators can be used directly:

dat$newVariable <- 2 * dat$dependentVariable;
dat$meanDependentVariable <- 
  (dat$independentVariable + dat$secondIndependentVariable) / 2;
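
When values are missing, these operators return NA for the affected rows. The following sketch contrasts this with base R's rowMeans function, which can ignore missing values. The example data and the names meanWithOperators and meanOfValidValues are invented for illustration.

```r
# Hypothetical example data; the values are invented for illustration
dat <- data.frame(
  independentVariable       = c(1, 2, NA, NA),
  secondIndependentVariable = c(3, NA, 5, NA)
)

# Arithmetic operators propagate missing values:
# the result is NA whenever either value is NA
dat$meanWithOperators <-
  (dat$independentVariable + dat$secondIndependentVariable) / 2

# rowMeans with na.rm = TRUE averages whatever valid values are present
dat$meanOfValidValues <-
  rowMeans(dat[, c("independentVariable", "secondIndependentVariable")],
           na.rm = TRUE)
```

In this sketch, meanWithOperators is only available for the first row, whereas meanOfValidValues is available for the first three rows. For the last row, where both values are missing, rowMeans returns NaN.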

Functions for computing means and sums also exist, for example validMeans and validSums. These are not part of base R, so the package that provides them has to be loaded first:

dat$meanDependentVariable <- 
  validMeans(dat$independentVariable, dat$secondIndependentVariable);
dat$sumDependentVariable <- 
  validSums(dat$independentVariable, dat$secondIndependentVariable);

In these functions, the argument requiredValidValues can be used to specify which proportion (if requiredValidValues has a value lower than 1) or number (if requiredValidValues has a value of 1 or higher) of variables must have nonmissing values to compute the result:

dat$meanDependentVariable <- 
  validMeans(dat$independentVariable, dat$secondIndependentVariable,
             requiredValidValues = 2);
dat$sumDependentVariable <- 
  validSums(dat$independentVariable, dat$secondIndependentVariable,
            requiredValidValues = 2);
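
To make the role of such a threshold concrete, here is a rough base R sketch of the same idea. It is not the implementation of validMeans or validSums. The example data and the names items, requiredValid, and validCount are assumptions made for illustration, and only the "number of valid values" case is covered.

```r
# Hypothetical example data; the values are invented for illustration
dat <- data.frame(
  independentVariable       = c(1, 2, NA),
  secondIndependentVariable = c(3, NA, NA)
)
items <- dat[, c("independentVariable", "secondIndependentVariable")]

# Assumed threshold for this sketch: both values must be valid
requiredValid <- 2

# Count the nonmissing values in each row
validCount <- rowSums(!is.na(items))

# Average the valid values, then set rows below the threshold to NA
dat$meanDependentVariable <- ifelse(validCount >= requiredValid,
                                    rowMeans(items, na.rm = TRUE),
                                    NA_real_)

# The same approach for sums
dat$sumDependentVariable <- ifelse(validCount >= requiredValid,
                                   rowSums(items, na.rm = TRUE),
                                   NA_real_)
```

With a threshold of 2, only the first row has enough valid values, so only that row receives a mean (2) and a sum (4). Lowering requiredValid to 1 would also give the second row a result based on its single valid value.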