Variable transformations

These transformers manipulate the content of TSV files once they have been loaded as a structure with bids.util.tsvread.

They are mostly meant to be used to implement the transformations described in BIDS stats models but can also be used to manipulate TSV files in batches.

More information on how they function can be found in the variable-transform repository.

Their behavior and their “call” in JSON should (hopefully) be fairly close to those of the pybids-transformers.

Applying transformations

An “array” of transformations can be applied one after the other using bids.transformers().

bids.transformers(varargin)

Apply transformers to a structure.

USAGE:

new_content = transformers(trans, data)

Parameters:

transformers (structure)
data (structure)

Returns:

new_content (structure)
json (structure) – JSON equivalent of the transformers

Example

data = bids.util.tsvread(path_to_tsv);

% load transformation instruction from a model file
bm = bids.Model('file', model_file);
transformers = bm.get_transformations('Level', 'Run');

% apply transformers
new_content = bids.transformers(transformers.Instructions, data);

% if all fields in the structure have the same number of rows,
% one can create a new tsv file
bids.util.tsvwrite(path_to_new_tsv, new_content)
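
As a sketch of what an “array” of transformations could look like in JSON, the two instructions below would be applied one after the other: the first subtracts 3 from a column, the second thresholds the result. The column names (onset, onset_shifted, onset_thresholded) are hypothetical, and where this array sits inside a model file is not shown here.

```json
[
  {
    "Name": "Subtract",
    "Input": ["onset"],
    "Value": 3,
    "Output": ["onset_shifted"]
  },
  {
    "Name": "Threshold",
    "Input": ["onset_shifted"],
    "Threshold": 0,
    "Output": ["onset_thresholded"]
  }
]
```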

Basic operations

Add
Subtract
Multiply
Divide
Power

bids.transformers_list.Basic(transformer, data)

Performs a basic operation with a Value on the Input

Each of these transformations takes one or more columns, and performs a mathematical operation on the input column and a provided operand. The operations are performed on each column independently.

Arguments:

Parameters:

Name – mandatory. Any of Add, Subtract, Multiply, Divide, Power.
Input (char or array) – mandatory. An array of columns to perform the operation on.
Value (float) – mandatory. The value to perform the operation with (i.e. operand).
Query (char) – Optional. Logical expression used to select the rows to act on.
Output (char or array) – Optional. List of column names to write out to.

By default, computation is done in-place on the input (meaning that input columns are overwritten). If Output is provided, the number of values must exactly match the number of input values, and the order will be mapped 1-to-1.
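
A minimal sketch of a basic operation in JSON, using only the parameters listed above. The column names (onset, onset_minus_3) are hypothetical. Because Output is given, the input column would be left untouched and the result written to the new column.

```json
{
  "Name": "Subtract",
  "Input": ["onset"],
  "Value": 3,
  "Output": ["onset_minus_3"]
}
```

Dropping the Output field from this call would overwrite onset in place.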

Logical operations

And
Or
Not

bids.transformers_list.Logical(transformer, data)

Each of these transformations:

takes 2 or more columns as input,
performs the corresponding logical operation (And: conjunction, Or: inclusive or, Not: logical negation),
returns a single column as output.

If non-logical inputs are passed, it is expected that:

all zero or nan (for numeric data types),
“NaN” or empty (for char) values

will evaluate to false and all other values will evaluate to true.

Arguments:

Parameters:

Name – mandatory. Any of And, Or, Not.
Input (array) – mandatory. An array of columns to perform the operation on. Only 1 column for Not.
Output (char or array) – Optional. The name of the output column.
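
As an illustration, the call below would combine two columns with a conjunction and write the result to a single new column. The column names (sex_m, age_gt_twenty, men_gt_twenty) are hypothetical.

```json
{
  "Name": "And",
  "Input": ["sex_m", "age_gt_twenty"],
  "Output": "men_gt_twenty"
}
```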

Munge operations

Transformations that primarily involve manipulating/munging variables into other formats or shapes.

Assign

bids.transformers_list.Assign(transformer, data)

The Assign transformation assigns one or more variables or columns (specified as the input) to one or more other columns (specified by target and/or output as described below).

Arguments:

Parameters:

Input (char or array) – mandatory. The name(s) of the columns from which attribute values are to be drawn (for assignment to the attributes of other columns). Must exactly match the length of the target argument.
Target (char or array) – mandatory. The name(s) of the columns to which the attribute values taken from the input are to be assigned. Must exactly match the length of the input argument. Names are mapped 1-to-1 from input to target.

Note

If no output argument is specified, the columns named in target are modified in-place.

Parameters:

Output (char or array) – Optional. Names of the columns to output the result of the assignment to. Must exactly match the length of the input and target arguments.

If no output array is provided, columns named in target are modified in-place.

If an output array is provided:

each column in the target array is first cloned,

then the reassignment from the input to the target is applied;

finally, the new (cloned and modified) column is written out to the column named in output.

Parameters:

InputAttr (char or array) – Optional. Specifies which attribute of the input column to assign. Defaults to value. If an array is passed, its length must exactly match that of the input and target arrays.
TargetAttr (char or array) – Optional. Specifies which attribute of the output column to assign to. Defaults to value. If an array is passed, its length must exactly match that of the input and target arrays.

InputAttr and TargetAttr must be one of:

value,
onset,
or duration.

Note

This transformation is non-destructive with respect to the input column(s). In cases where in-place assignment is desired (essentially, renaming a column), either use the rename transformation, or set output to the same value as the input.

For example, one can reassign the value property of a variable named response_time to the duration property of a face variable (as one might do in order to, e.g., model trial-by-trial reaction time differences for a given condition using a varying-epoch approach), and write the result out as a new face_modulated_by_RT column.
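
A JSON call for this reassignment could look like the sketch below. It is built only from the parameters listed above; InputAttr is omitted because it defaults to value.

```json
{
  "Name": "Assign",
  "Input": ["response_time"],
  "Target": ["face"],
  "TargetAttr": "duration",
  "Output": ["face_modulated_by_RT"]
}
```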

Concatenate

bids.transformers_list.Concatenate(transformer, data)

Concatenate columns together.

Arguments:

Parameters:

Input (array) – mandatory. Column(s) to concatenate. Must all be of the same length.
Output (char) – Optional. Name of the output column.
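
A minimal sketch of a call, with hypothetical column names (face_type, repetition_type, trial_type). How the values are joined in the output column is not specified here.

```json
{
  "Name": "Concatenate",
  "Input": ["face_type", "repetition_type"],
  "Output": "trial_type"
}
```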

Copy

bids.transformers_list.Copy(transformer, data)

Clones/copies each of the input columns to a new column with identical values and a different name. Useful as a basis for subsequent transformations that need to modify their input in-place.

Arguments:

Parameters:

Input (char or array) – mandatory. Column names to copy.
Output (char or array) – Optional. Names to copy the input columns to. Must be same length as input, and columns are mapped one-to-one from the input array to the output array.
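
For instance, the call below would clone two columns under new names, mapped one-to-one in order. The column names are hypothetical.

```json
{
  "Name": "Copy",
  "Input": ["sex", "age"],
  "Output": ["sex_copy", "age_copy"]
}
```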

Delete

bids.transformers_list.Delete(transformer, data)

Deletes column(s) from further analysis.

Arguments:

Parameters:

Input (char or array) – mandatory. The name(s) of the column(s) to delete.

Note

The Select transformation provides the inverse function (selection of columns to keep for subsequent analysis).

DropNA

bids.transformers_list.Drop_na(transformer, data)

Drops all rows with “n/a”.

Arguments:

Parameters:

Input (char or array) – mandatory. The name of the variable to operate on.
Output (char or array) – Optional. The column names to write out to. By default, computation is done in-place (meaning that the input columns are overwritten).

Factor

bids.transformers_list.Factor(transformer, data)

Converts a nominal/categorical variable with N unique levels to N indicators (i.e., dummy-coding).

Arguments:

Parameters:

Input (char or array) – mandatory. The name(s) of the variable(s) to dummy-code.

By default, the reference level is the first factor level when sorting in alphabetical order (e.g., if a condition has levels ‘dog’, ‘apple’, and ‘helsinki’, the default reference level will be ‘apple’).

The names of the output columns for 2 input columns gender and age with 2 levels each, (M, F) and (20, 30) respectively, will be of the shape:

gender_F_age_20
gender_F_age_30
gender_M_age_20
gender_M_age_30
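
A call for the two-column case described above could be sketched as follows, using only the Input parameter documented for this transformer.

```json
{
  "Name": "Factor",
  "Input": ["gender", "age"]
}
```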

Filter

bids.transformers_list.Filter(transformer, data)

Subsets rows using a logical expression.

Arguments:

Parameters:

Input (char or array) – mandatory. The name(s) of the variable(s) to operate on.
Query (char) – mandatory. Logical expression used to filter.

Supports:

>, <, >=, <=, ==, ~= for numeric values

==, ~= for char operations (case sensitive). Regular expressions are supported.

Parameters:

Output (char or array) – Optional. The column names to write out to.

By default, computation is done in-place (i.e., input columns are overwritten). If Output is provided, the number of values must exactly match the number of input values, and the order will be mapped 1-to-1.
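
A sketch of a numeric filter, assuming the query is written as a column name followed by one of the supported operators and a value. The column names (sex, age, sex_gt_twenty) are hypothetical.

```json
{
  "Name": "Filter",
  "Input": "sex",
  "Query": "age > 20",
  "Output": "sex_gt_twenty"
}
```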

Label identical rows

bids.transformers_list.Label_identical_rows(transformer, data)

Creates an extra column to index consecutive identical rows in a column. The index restarts at 1 with every change of row content. This can for example be used to label consecutive events of the same trial_type in a block.

Arguments:

Parameters:

Input (char or array) – mandatory. The name(s) of the variable(s) to operate on.
Cumulative (logical) – Optional. Defaults to false. If true, the labels are not reset when encountering new row content.

Note

By default, the labels are put in a column called Input(i)_label.

Merge identical rows

bids.transformers_list.Merge_identical_rows(transformer, data)

Merge consecutive identical rows.

Arguments:

Parameters:

Input (char or array) – mandatory. The name(s) of the variable(s) to operate on.

Note

Only works on data coming from events.tsv files.
Content is sorted by onset time before merging.
If multiple variables are specified, they are merged in the order they are specified.
If a variable is not found, it is ignored.
If a variable is found, but is empty, it is ignored.
The content of the other columns corresponds to the last row being merged: this means that, for all columns other than the ones specified in Input, the content of the merged rows is deleted except for that of the last one.

Replace

bids.transformers_list.Replace(transformer, data)

Replaces values in one or more input columns.

Arguments:

Parameters:

Input (char or array) – mandatory. Name(s) of column(s) to search and replace within.
Replace (array of objects) – mandatory. The mapping of old values ("key") to new values ("value"). key can be a regular expression.
Attribute (array) – Optional. The column attribute to apply the replace to.

Valid values include:

"value" (the default),
"duration",
"onset",
and "all".

In the last case, all three attributes ("value", "duration", and "onset") will be scanned.

Parameters:

Output (char or array) – Optional. Names of columns to output. Must match length of input column(s) if provided, and columns will be mapped 1-to-1 in order. If no output values are provided, the replacement transformation is applied in-place to all the inputs.
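
As a sketch, the call below would replace two values in a single column, with each object in Replace giving one "key" to "value" mapping. The column name (response) and the values being swapped are hypothetical; with no Output given, the replacement would be applied in place.

```json
{
  "Name": "Replace",
  "Input": ["response"],
  "Replace": [
    {"key": "left", "value": "L"},
    {"key": "right", "value": "R"}
  ],
  "Attribute": "value"
}
```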

Select

bids.transformers_list.Select(transformer, data)

The select transformation specifies which columns to retain for subsequent analysis.

Any columns that are not specified here will be dropped.

The only exception is when dealing with data with onset and duration columns (from *_events.tsv files): in this case the onset and duration columns are also automatically selected.

Arguments:

Parameters:

Input (char or array) – mandatory. The names of all columns to keep. Any columns not in this array will be deleted and will not be available to any subsequent transformations or downstream analyses.

Note

One can think of Select as the inverse of the Delete transformation, which removes all named columns from further analysis.
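
For instance, the call below would keep only the two named columns (hypothetical names). For data coming from an *_events.tsv file, onset and duration would be retained as well, as described above.

```json
{
  "Name": "Select",
  "Input": ["trial_type", "response_time"]
}
```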

Split

bids.transformers_list.Split(transformer, data)

Split a variable into N variables as defined by the levels of one or more other variables.

Arguments:

Parameters:

Input (array) – mandatory. The name of the variable(s) to operate on.
By (array) – Optional. Name(s) for variable(s) to split on.

For example, given a variable Condition that we wish to split on two categorical columns A and B, where a given row has values A=a and B=1, the generated name will be Condition_BY_A_a_BY_B_1.
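
The split described in this example could be requested with a call along these lines, using the same Condition, A and B names:

```json
{
  "Name": "Split",
  "Input": ["Condition"],
  "By": ["A", "B"]
}
```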

Compute operations

Transformations that primarily involve numerical computation on variables.

Constant

bids.transformers_list.Constant(transformer, data)

Adds a new column with a constant value (numeric or char).

Arguments:

Parameters:

Output (char or array) – mandatory. Name of the newly generated column.
Value (float or char) – Optional. The value of the constant, defaults to 1.
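
A minimal sketch that would add a column of ones; the output column name (intercept) is hypothetical. Value is spelled out here although 1 is the default.

```json
{
  "Name": "Constant",
  "Value": 1,
  "Output": "intercept"
}
```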

Mean

bids.transformers_list.Mean(transformer, data)

Compute mean of a column.

JSON EXAMPLE

{
  "Name":  "Mean",
  "Input": "reaction_time",
  "OmitNan": false,
  "Output": "mean_RT"
}

Arguments:

Parameters:

Input (char or array) – mandatory. The name of the variable to operate on.
OmitNan (logical) – Optional. If false any column with nan values will return a nan value. If true nan values are skipped. Defaults to false.
Output (char or array) – Optional. The column names to write out to. By default, computation is done in-place (i.e., input columns are overwritten).

CODE EXAMPLEtransformer = struct('Name', 'Mean', ...
                      'Input', 'reaction_time', ...
                      'OmitNan', false, ...
                      'Ouput', 'mean_RT');

data.reaction_time = TODO

data = bids.transformers(transformer, data);

data.mean_RT = TODO

ans = TODO

Product

bids.transformers_list.Product(transformer, data)

Computes the row-wise product of two or more columns.

Arguments:

Parameters:

Input (array) – mandatory. Names of two or more columns to compute the product of.
Output (string or array) – mandatory. Name of the newly generated column.
OmitNan (logical) – Optional. If false any column with nan values will return a nan value. If true nan values are skipped. Defaults to false.

Scale

bids.transformers_list.Scale(transformer, data)

Scales the values of one or more columns.

Semantics mimic scikit-learn, such that demeaning and rescaling are treated as independent arguments, with the default being to apply both (i.e., standardizing each value so that it has zero mean and unit SD).

Arguments:

Parameters:

Input (char or array) – mandatory. Names of columns to standardize.
Demean (logical) – Optional. If true, subtracts the mean from each input column (i.e., applies mean-centering).
Rescale (logical) – Optional. If true, divides each column by its standard deviation.
ReplaceNa (char) – Optional. Whether/when to replace missing values with 0. If "off", no replacement is performed. If "before", missing values are replaced with 0 before scaling. If "after", missing values are replaced with 0 after scaling. Defaults to "off".
Output (char or array) – Optional. Names of columns to output. Must match length of input column if provided, and columns will be mapped 1-to-1 in order. If no output values are provided, the scaling transformation is applied in-place to all the inputs.
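
As a sketch, the call below would standardize one column (mean-centering, then division by the standard deviation) and write the result to a new column. The column names (reaction_time, reaction_time_z) are hypothetical, and Demean and Rescale are spelled out although applying both is described as the default.

```json
{
  "Name": "Scale",
  "Input": ["reaction_time"],
  "Demean": true,
  "Rescale": true,
  "ReplaceNa": "off",
  "Output": ["reaction_time_z"]
}
```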

Std

bids.transformers_list.Std(transformer, data)

Compute the sample standard deviation.

Arguments:

Parameters:

Input (char or array) – mandatory. The name of the variable to operate on.
OmitNan (logical) – Optional. If false any column with nan values will return a nan value. If true nan values are skipped. Defaults to false.
Output (char or array) – Optional. The column names to write out to. By default, computation is done in-place (i.e., input columns are overwritten).

Sum

bids.transformers_list.Sum(transformer, data)

Computes the (optionally weighted) row-wise sums of two or more columns.

Arguments:

Parameters:

Input (array) – mandatory. Names of two or more columns to sum.
Output (char or array) – mandatory. Name of the newly generated column.
OmitNan (logical) – Optional. If false any column with nan values will return a nan value. If true nan values are skipped. Defaults to false.
Weights (array) – Optional. Array of floats giving the weights of the columns. If provided, the length of weights must be equal to the number of values in input, and weights will be mapped 1-to-1 onto named columns. If no weights are provided, defaults to unit weights (i.e., simple sum).
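
A sketch of a weighted sum with hypothetical column names (score_a, score_b, weighted_score). With these weights, each output row would be 2 * score_a + 1 * score_b for that row.

```json
{
  "Name": "Sum",
  "Input": ["score_a", "score_b"],
  "Weights": [2, 1],
  "Output": "weighted_score"
}
```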

Threshold

bids.transformers_list.Threshold(transformer, data)

Thresholds input values at a specified cut-off and optionally binarizes the result.

Arguments:

Parameters:

Input (char or array) – mandatory. The name(s) of the column(s) to threshold/binarize.
Threshold (float) – Optional. The cut-off to use for thresholding. Defaults to 0.
Binarize (logical) – Optional. If true, thresholded values will be binarized (i.e., all non-zero values will be set to 1). Defaults to false.
Above (logical) – Optional. Specifies which values to retain with respect to the cut-off. If true, all values above the threshold will be kept; if false, all values below the threshold will be kept. Defaults to true.
Signed (logical) – Optional. Specifies whether to treat the threshold as signed (default) or unsigned.

For example, when passing Above=true and Threshold=3, if Signed=true, all and only values above +3 would be retained. If Signed=false, all values with an absolute value > 3 would be retained (i.e., values in the range -3 < X < 3 would be set to 0).

Parameters:

Output (char or array) – Optional. Names of columns to output. Must match length of input column if provided, and columns will be mapped 1-to-1 in order. If no output values are provided, the threshold transformation is applied in-place to all the inputs.
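
The unsigned case from the example above could be written as the following sketch, with hypothetical column names (contrast, contrast_thresholded). Values whose absolute value exceeds 3 would be retained and the others set to 0.

```json
{
  "Name": "Threshold",
  "Input": ["contrast"],
  "Threshold": 3,
  "Binarize": false,
  "Above": true,
  "Signed": false,
  "Output": ["contrast_thresholded"]
}
```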