rPref allows the efficient computation of Pareto frontiers (also known
as *Skylines* in the context of databases) and slight
generalizations of them (*database preferences*). This vignette
explains how to compose Skyline queries and preferences, which are
finally evaluated on a data set, i.e., the optimal objects from the data
set are selected.

A classical Skyline query optimizes two dimensions of a data set
simultaneously. Usually these dimensions tend to anticorrelate. Consider
the `mtcars` data set and the dimensions `mpg`
(miles per gallon, i.e., inverse fuel consumption) and `hp`
(horsepower). To get those cars with low fuel consumption (i.e., a high
`mpg` value) and high power, we create the preference and
evaluate it on `mtcars`. Using `select` from the
dplyr package we restrict our attention to the relevant columns:

```
p <- high(mpg) * high(hp)
res <- psel(mtcars, p)
knitr::kable(select(res, mpg, hp))
```

| | mpg | hp |
|---|---|---|
| Merc 450SL | 17.3 | 180 |
| Fiat 128 | 32.4 | 66 |
| Toyota Corolla | 33.9 | 65 |
| Lotus Europa | 30.4 | 113 |
| Ford Pantera L | 15.8 | 264 |
| Ferrari Dino | 19.7 | 175 |
| Maserati Bora | 15.0 | 335 |

The `*` operator is the Pareto composition. The result
contains all cars from `mtcars` which are not
Pareto-dominated according to this preference. This means we are not
interested in those cars which are strictly worse in at least one
dimension and worse or equal in the other dimension (i.e., they are
*dominated*).
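To make the dominance test concrete, here is a minimal base R sketch (independent of rPref; `pareto_front` is a hypothetical helper name) that computes the same Skyline by hand, assuming both goals are to be maximized:

```r
# Keep a row if no other row is >= in both dimensions and > in at least one.
pareto_front <- function(df, cols) {
  m <- as.matrix(df[, cols])
  n <- nrow(m)
  keep <- vapply(seq_len(n), function(i) {
    !any(vapply(seq_len(n), function(j) {
      all(m[j, ] >= m[i, ]) && any(m[j, ] > m[i, ])
    }, logical(1)))
  }, logical(1))
  df[keep, ]
}

res <- pareto_front(mtcars, c("mpg", "hp"))
nrow(res)  # the same 7 cars as psel(mtcars, high(mpg) * high(hp))
```

The quadratic double loop above is only meant to spell out the definition of dominance; rPref computes the maxima with more efficient algorithms internally.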

We can add a third optimization goal like minimizing the 1/4 mile
time of a car. In addition to the preference selection via
`psel`, preference objects can be associated with data sets
and then processed via `peval` (preference evaluation). For
example,

```
p <- high(mpg, df = mtcars) * high(hp) * low(qsec)
p
## [Preference] high(mpg) * high(hp) * low(qsec)
## * associated data source: data.frame "mtcars" [32 x 11]
```

creates a 3-dimensional Pareto preference which is associated with
`mtcars`. We can evaluate this preference using
`peval(p)`, which returns the Pareto optima:

```
res <- peval(p)
knitr::kable(select(res, mpg, hp, qsec))
```

| | mpg | hp | qsec |
|---|---|---|---|
| Mazda RX4 | 21.0 | 110 | 16.46 |
| Merc 450SE | 16.4 | 180 | 17.40 |
| Merc 450SL | 17.3 | 180 | 17.60 |
| Fiat 128 | 32.4 | 66 | 19.47 |
| Toyota Corolla | 33.9 | 65 | 19.90 |
| Porsche 914-2 | 26.0 | 91 | 16.70 |
| Lotus Europa | 30.4 | 113 | 16.90 |
| Ford Pantera L | 15.8 | 264 | 14.50 |
| Ferrari Dino | 19.7 | 175 | 15.50 |
| Maserati Bora | 15.0 | 335 | 14.60 |

Using `psel` instead of `peval`, we can evaluate
the preference on another data set (which does not change the
association of `p`). Using the `filter` function
from dplyr we can first pick all cars with automatic transmission
(`am == 0`) and then get the Pareto optima:

```
res <- mtcars %>% filter(am == 0) %>% psel(p)
knitr::kable(select(res, am, mpg, hp, qsec))
```

| | am | mpg | hp | qsec |
|---|---|---|---|---|
| Hornet 4 Drive | 0 | 21.4 | 110 | 19.44 |
| Hornet Sportabout | 0 | 18.7 | 175 | 17.02 |
| Duster 360 | 0 | 14.3 | 245 | 15.84 |
| Merc 240D | 0 | 24.4 | 62 | 20.00 |
| Merc 230 | 0 | 22.8 | 95 | 22.90 |
| Merc 450SE | 0 | 16.4 | 180 | 17.40 |
| Merc 450SL | 0 | 17.3 | 180 | 17.60 |
| Chrysler Imperial | 0 | 14.7 | 230 | 17.42 |
| Toyota Corona | 0 | 21.5 | 97 | 20.01 |
| Dodge Challenger | 0 | 15.5 | 150 | 16.87 |
| Camaro Z28 | 0 | 13.3 | 245 | 15.41 |
| Pontiac Firebird | 0 | 19.2 | 175 | 17.05 |

Database preferences allow some generalizations of Skyline queries
like combining the Pareto order with the lexicographical order. Assume
we prefer cars with manual transmission (`am == 1`). If two
cars are equivalent according to this criterion, then the higher number
of gears should be the decisive criterion. This is known as the
lexicographical order and can be realized with

```
p <- true(am == 1) & high(gear)
```

where `true` is a Boolean preference, under which those tuples
fulfilling the logical condition are preferred. The `&`
is a non-commutative operator creating a lexicographical order, also
called *Prioritization* in the context of database
preferences.
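Since `true(am == 1) & high(gear)` imposes a lexicographical (total) preorder, its maxima can also be obtained by two successive filters in base R. This is a sketch of the semantics, not of rPref's implementation:

```r
# First maximize the Boolean condition (manual transmission preferred), ...
best_am <- mtcars[mtcars$am == max(mtcars$am), ]
# ... then, among those, maximize the number of gears.
res <- best_am[best_am$gear == max(best_am$gear), ]
nrow(res)  # 5 manual cars with 5 gears
```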

We evaluate this preference on the `mtcars` data set:

```
res <- psel(mtcars, p)
knitr::kable(select(res, am, gear, hp, cyl))
```

| | am | gear | hp | cyl |
|---|---|---|---|---|
| Porsche 914-2 | 1 | 5 | 91 | 4 |
| Lotus Europa | 1 | 5 | 113 | 4 |
| Ford Pantera L | 1 | 5 | 264 | 8 |
| Ferrari Dino | 1 | 5 | 175 | 6 |
| Maserati Bora | 1 | 5 | 335 | 8 |

The constructs `high`, `low` and
`true` are the three base preferences. They also accept
arbitrary arithmetic expressions (and, accordingly, logical expressions for
`true`). For example, we can Pareto-combine the
lexicographical order from above with the wish for a high power per
cylinder ratio:

```
p <- (true(am == 1) & high(gear)) * high(hp/cyl)
res <- psel(mtcars, p)
knitr::kable(select(res, am, gear, hp, cyl))
```

| | am | gear | hp | cyl |
|---|---|---|---|---|
| Maserati Bora | 1 | 5 | 335 | 8 |

According to this preference there is only one Pareto-optimal car.

In the above preference selection we got just one Pareto-optimal
tuple for the data set `mtcars`. Probably we are also
interested in the tuples slightly worse than the optimum. rPref offers a
top-k preference selection, iterating the preference selection on the
remainder of the data set until k tuples are returned. To get the 3 best
tuples we use:

```
res <- psel(mtcars, p, top = 3)
knitr::kable(select(res, am, gear, hp, cyl, .level))
```

| | am | gear | hp | cyl | .level |
|---|---|---|---|---|---|
| Maserati Bora | 1 | 5 | 335 | 8 | 1 |
| Ford Pantera L | 1 | 5 | 264 | 8 | 2 |
| Duster 360 | 0 | 3 | 245 | 8 | 3 |

The column `.level` is additionally added to the result
when `psel` is called with the `top` parameter. It
counts the number of iterations needed to reach this tuple. The k-th level
of a Skyline is also called *the k-th stratum*. We see that the
first three tuples have levels {1, 2, 3}. The `top` parameter produces a
nondeterministic cut, i.e., there could be more tuples in the third
level. To avoid the cut, we use the `at_least` parameter,
which returns all tuples from the last level:

```
res <- psel(mtcars, p, at_least = 3)
knitr::kable(select(res, am, gear, hp, cyl, .level))
```

| | am | gear | hp | cyl | .level |
|---|---|---|---|---|---|
| Maserati Bora | 1 | 5 | 335 | 8 | 1 |
| Ford Pantera L | 1 | 5 | 264 | 8 | 2 |
| Duster 360 | 0 | 3 | 245 | 8 | 3 |
| Camaro Z28 | 0 | 3 | 245 | 8 | 3 |
| Ferrari Dino | 1 | 5 | 175 | 6 | 3 |

Additionally, there is a `top_level` parameter which allows
us to explicitly state the number of iterations. In this case, the preference
selection with `top_level = 3` is identical to the statement above,
because just one tuple resides in each of the levels 1 and 2.
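The level values themselves can be reproduced with a small base R loop. This is a sketch with a hand-written dominance test, shown for the simpler two-dimensional preference `high(mpg) * high(hp)` rather than the preference `p` from above:

```r
dominates <- function(a, b) all(a >= b) && any(a > b)  # both goals maximized

# Assign level k to the maxima among the tuples not yet in levels 1..k-1.
skyline_levels <- function(df, cols, k_max) {
  m <- as.matrix(df[, cols])
  level <- rep(NA_integer_, nrow(m))
  for (lev in seq_len(k_max)) {
    open <- which(is.na(level))
    if (length(open) == 0) break
    is_max <- vapply(open, function(i)
      !any(vapply(open, function(j) dominates(m[j, ], m[i, ]), logical(1))),
      logical(1))
    level[open[is_max]] <- lev
  }
  level
}

lev <- skyline_levels(mtcars, c("mpg", "hp"), 2)
table(lev)  # sizes of the first and second stratum
```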

Using the grouping functionality from the dplyr package, we can
perform a preference selection on each group separately. For example, we
search for the cars maximizing `mpg` and `hp` in
each group of cars with the same number of cylinders. This is done
by:

```
grouped_df <- group_by(mtcars, cyl)
res <- psel(grouped_df, high(hp) * high(mpg))
knitr::kable(select(res, cyl, hp, mpg))
```

cyl | hp | mpg
---|---|---
4 | 66 | 32.4
4 | 65 | 33.9
4 | 113 | 30.4
6 | 110 | 21.4
6 | 175 | 19.7
8 | 180 | 17.3
8 | 175 | 19.2
8 | 264 | 15.8
8 | 335 | 15.0

The first line is the grouping operation from dplyr and the second line is the preference selection from rPref, which respects the grouping. The result is again a grouped data frame, containing the Pareto optima for each group of cylinders.
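Conceptually, the grouped selection is equivalent to splitting the data frame by `cyl`, computing the Skyline within each part, and recombining the parts. A base R sketch of this equivalence, again with a hand-written `pareto_front` helper (a hypothetical name) and both goals maximized:

```r
# Row-wise Pareto maxima: keep rows not dominated by any other row.
pareto_front <- function(df, cols) {
  m <- as.matrix(df[, cols])
  keep <- vapply(seq_len(nrow(m)), function(i)
    !any(vapply(seq_len(nrow(m)), function(j)
      all(m[j, ] >= m[i, ]) && any(m[j, ] > m[i, ]), logical(1))),
    logical(1))
  df[keep, ]
}

# Skyline per cylinder group, then recombine.
res <- do.call(rbind, lapply(split(mtcars, mtcars$cyl),
                             function(g) pareto_front(g, c("hp", "mpg"))))
nrow(res)  # 9 group-wise optima, matching the table above
```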