Reshape Grouped Data to Wide

Initial Situation and Goal

Reshape grouped data to wide format works on tables in a long format with at least one predictor column, one response column, and one grouping column. An example is the 'Indometh' dataset with its first six rows shown in the following table.

Table: Rows 1-3 and 34-36 of Indometh Dataset

Subject time conc
1 1 0.25 1.50
2 1 0.50 0.94
3 1 0.75 0.78
34 4 0.25 1.85
35 4 0.50 1.39
36 4 0.75 1.02

Our target is to reshape this table into a dataset with a row for each 'Subject' and columns for every 'time' entity. The data from the column 'conc' should be available as data. Hence, the result should look like the following table (first 5 columns).

Table: Indometh Dataset in Wide Format

time conc_first_1 conc_first_4 conc_first_2 conc_first_5
0.25 1.50 1.85 2.03 2.05
0.50 0.94 1.39 1.63 1.04
0.75 0.78 1.02 0.71 0.81
1.00 0.48 0.89 0.70 0.39
1.25 0.37 0.59 0.64 0.30
2.00 0.19 0.40 0.36 0.23
3.00 0.12 0.16 0.32 0.13
4.00 0.11 0.11 0.20 0.11
5.00 0.08 0.10 0.25 0.08
6.00 0.07 0.07 0.12 0.10
8.00 0.05 0.07 0.08 0.06

Cornerstone Workflow

To achieve this result in 'Cornerstone' open a dataset and choose menu 'Analysis' -> 'CornerstoneR' -> 'Reshape to Wide' as shown in the following screenshot.

Reshape to Wide: Menu

In the appearing dialog select variable 'time' to predictors. The unique values are shown in a new column. The response variable is 'conc'. We want the data allocated in multiple columns with respect to the grouping variable 'Subject' because the response is available for every group.

Reshape to Wide: Variable Selection

'OK' confirms your selection and the following window appears.

Reshape to Wide: R Script

Now, click the execute button (green arrow) or choose the menu 'R Script' -> 'Execute' and all calculations are done via 'R'. Calculations are done if the text at the lower left status bar contains 'Last execute error state: OK'. Our result is available via the menu 'Summaries' -> 'Wide Data' as shown in the following screenshot.

Reshape to Wide: Result Menu

After clicking this menu a 'Cornerstone' dataset with the reshaped data opens.

Reshape to Wide: Result Dataset

Parametrization in Script Variables

Reshape to Wide can be parametrized by the 'Script Variables' dialog via the menu 'R Script' -> 'Script variables'. The following dialog appears.

Reshape to Wide: Script Variables

'Drop missing combinations'

This checkbox allows the algorithm to drops rows with solely missing values. The default is checked.

'Aggregation'

The text box can hold a comma separated list of aggregation function. The default is empty, which is equivalent to 'first'. If you select variables as 'Auxiliaries' in the variable selection menu they are shown in the drop-down box right from the '<<' button.

Let's take a look at the following dataset example:

Table: Example Dataset with Repetitions

Subject time conc repet
1 0.25 1.50 1
1 0.25 1.67 2
1 0.25 1.58 3
1 0.50 0.94 1
1 0.50 0.89 2
1 0.50 0.72 3

The aggregation function is used if one response data is not identifiable by the combination of predictors and groups. In case Subject 1 for time 0.25 was measured three times. We obtain the following result using the aggregation function 'first', 'mean', and 'sd'

Table: Example Dataset in Wide Format

time conc_first_1 conc_mean_1 conc_sd_1
0.25 1.50 1.583333 0.0850490
0.50 0.94 0.850000 0.1153256

As you can see new columns are added for each additional aggregation function, in this example 'first, mean, sd'. Function 'first' returns the first value of the three 'conc' measurements 1.50, 1.67, and 1.58. Followed by its mean value 'mean' and standard deviation 'sd'.

How is an auxiliary variable used in this case? In case of repeated measurements it can bee necessary to select the last or first repetition.

From the example above we would like to get the data for the highest repetition value and add 'maxby(repet)' to the aggregation function. As opposed to this 'minby(repet)' returns the 'conc' value for the lowest repetition value. The following table shows the corresponding results applying both functions.

Table: Example Dataset in Wide Format

time conc_maxby_1 conc_minby_1
0.25 1.58 1.50
0.50 0.72 0.94

In general it is possible to use all available 'R' functions. Each function should return one value like the 'mean' or 'sd' function. Function which return more than one value, e.g. 'head', can work but it is not defined which value is shown in Cornerstone.

Reshape Grouped Data to Long

Initial Situation and Goal

Reshape grouped data to long format works on tables in wide format with at least one predictor and one response column. An example is shown in the following table.

Place Temp.Jan Temp.July
Europe 0 20
USA -10 15
Asia 20 50

Our target is to reshape this dataset into a dataset with the temperatures 'Temp.Jan' and 'Temp.July' as responses in one column and an additional column to identify the month. All predictors records should be multiplied like shown in the following table to identify each data correctly.

Place variable value
Europe Jan 0
USA Jan -10
Asia Jan 20
Europe July 20
USA July 15
Asia July 50

The predictor 'time' is available in one column because it belongs to all responses. Selected responses are split to multiple columns depending on the value in grouping column.

Cornerstone Workflow

To achieve this result in 'Cornerstone' open a corresponding dataset, e.g. sample dataset 'cities', and choose menu 'Analysis' -> 'CornerstoneR' -> 'Reshape to Long' as shown in the following screenshot.

Reshape to Long: Menu

In the appearing dialog select 'city' as a predictor and both temperatures 'JanTempF' and 'JulyTempF' as responses. We want to stack all temperatures with respect to each city.

Reshape to Long: Variable Selection

'OK' confirms your selection and the following window appears.

Reshape to Long: R Script

Before we start to reshape our data, it is necessary to check the string by which each response variable name is split. Choose menu 'Options' -> 'Script Variables' as shown in the following screenshot.

Reshape to Long: R Script Variables Menu

The appearing dialog shows by default an underscore ('_'). We want to split the variable name at the word 'Temp' because it is redundant and change the text box to 'Temp' as shown in the next screenshot.

Reshape to Long: R Script Variables

If you don't want to split the variable name at all remove all characters from the split text box. 'OK' confirms settings to the script variables.

Now, click the execute button (green arrow) or choose the corresponding menu 'R Script' -> 'Execute' and all calculations are done via 'R'. The text in the lower left status bar turns to 'Last execute error state: OK' to identify a successful termination. Our result is available via menu 'Summaries' -> 'Long Data' as shown in the following screenshot.

Reshape to Long: Result Menu

After clicking a 'Cornerstone' dataset with the reshaped data opens.

Reshape to Long: Result Dataset

Selected predictors are multiplied in their column. All response names are split by 'Temp' and available in two new columns 'variable1' and 'variable2'. The corresponding value is stored in column 'value'.

Transpose Data

Initial Situation and Goal

If data are available in rows, it is necessary to transpose the data set before further evaluation. An example is 'Summary Statistic' table from Cornerstone without groups as shown in the following table.

Statistic MPG Displacement Horsepower Weight Acceleration
Count 398.00 406.00 400.00 406.00 406.00
Mean 23.52 194.78 105.08 2979.41 15.52
St. Dev. 7.82 104.92 38.77 847.00 2.80

To use these data in further analysis or graphs, it is necessary to transpose the data so that they are available as shown in the following table.

colnames Count Mean StDev
MPG 398 23.52 7.82
Displacement 406 194.78 104.92
Horsepower 400 105.08 38.77
Weight 406 2979.41 847.00
Acceleration 406 15.52 2.80

Cornerstone Workflow

To achieve this result in 'Cornerstone', select the 'Analysis' -> 'CornerstoneR' -> 'Transpose Data' menu from the corresponding record, as shown in the following screenshot.

Transpose Data: Menu

In the appearing dialog select the variables 'MPG' to 'Acceleration' as predictors or responses. For this function the division is irrelevant, because all selected variables are transposed. The variable 'Statistics' contains names for the future columns and is optionally selected as 'Group by'.

Transpose Data: Variable Selection

'OK' confirms your selection and the following window appears.

Transpose Data: R Script

Now click the green button (green arrow) or select the menu 'R Script' -> 'Execute' to have all calculations performed by 'R'. The execution was successful, if the status bar in the lower left corner shows the text 'Last execute error state: OK'. Our result is available via the menu 'Summaries' -> 'Transposed Data' as shown in the following screenshot.

Transpose Data: Result Menu

After clicking this menu, the dataset with the transposed data opens.

Transpose Data: Result Dataset

Parametrization in Script Variables

The 'Transpose Data' function converts all data into numeric values. If the conversion is not successful, missing values are inserted. This conversion can be deactivated with the 'Script Variables' dialog via the 'R Script' -> 'Script Variables' menu. The following dialog appears.

Transpose Data: Script Variables

If you uncheck 'Convert data to numeric', for example, text is retained and is not replaced by a missing value.