Data Format

Quentin Grimonprez

2022-11-11

Since Rankcluster 0.92, ranks have to be given to the functions in the ranking notation.

Notations

Ranking and ordering notation

The ranking representation \(r=(r_1,...,r_m)\) contains the ranks assigned to the objects, and means that the \(i\)th object is in \(r_i\)th position.

The ordering representation \(o=(o_1,...,o_m)\) means that object \(o_i\) is in the \(i\)th position.

Example

Let us consider the following example to illustrate both notations: a judge, which has to rank three holidays destinations according to its preferences, O1 = Countryside, O2 = Mountain and O3 = Sea, ranks first Sea (O3), second Countryside (O1), and last Mountain (O2).

The ordering result of the judge is o = (3, 1, 2) (the first object is O3, then O1 and O2) whereas the ranking result is r = (2, 3, 1) (O1 is in second position, O2 in third position and O3 in first position).

Rankcluster format

Since Rankcluster 0.92, ranks have to be given to the functions in the ranking notation.

The parameter must be a matrix with every row corresponding to a rank.

data(words)
head(words$data)
#>      A B C D E
#> [1,] 1 3 4 5 2
#> [2,] 1 4 2 3 5
#> [3,] 3 2 5 4 1
#> [4,] 3 2 5 4 1
#> [5,] 4 1 2 5 3
#> [6,] 4 1 5 3 2

One row corresponds to one rank. The first column corresponds to the position on the object A, the second to the position of object B and so on.

For converting your data from ordering notation to ranking, you can use the convertRank function. This function works only for univariate and non-partially missing ranks.

Multivariate Ranks

For multivariate ranks, the differents variable are combined by column and an extra parameter (m) indicates the size of each dimension.

data(big4)
head(big4$data)
#>           A.uefa B.uefa C.uefa D.uefa A.pl B.pl C.pl D.pl
#> 1992-1993      1      2      3      4    1    2    3    4
#> 1993-1994      1      3      2      4    1    3    2    4
#> 1994-1995      1      3      2      4    1    2    4    3
#> 1995-1996      1      3      2      4    1    2    3    4
#> 1996-1997      1      2      3      4    1    3    2    4
#> 1997-1998      1      3      2      4    2    3    1    4
big4$m
#> [1] 4 4

The big4 dataset is composed of the rankings (in ranking notation) of the “Big Four” English football teams (A: Manchester, B: Liverpool, C: Arsenal, D: Chelsea) to the English Championship (Premier League) and according to the UEFA coefficients (statistics used in Europe for ranking and seeding teams in international competitions), from 1993 to 2013.

Each variable corresponds to the ranking of four elements, so m = c(4, 4). In the data matrix, the first four columns correspond to the rankings in Premier League and the four next to the ranking accoding to the uefa coefficient.

Partial Missing Ranks

Rankcluster manages partial missing ranks. Missing positions are denoted by 0.

For example 5 0 1 2 0 indicates that the position of the second and fifth objects are unknown.

Ranks with tied positions

Rankcluster manages tied positions in ranks. Tied position are replaced by the lowest position they share.

For example, assume there are five objects to rank. If the output rank in ranking notation is 4 3 4 1 1, the 1 for both the objects number 4 and 5 indicates that either object 4 is in first position and object 5 in second or object 5 in second position and object 4 in first. Then the object number 2 is in third position, then objects 1 and 3 are in fourth and fifth or fifth and fourth.