Advanced use

library(joyn)
#> 
#> Attaching package: 'joyn'
#> The following object is masked from 'package:base':
#> 
#>     merge
library(data.table)

x <- data.table(
  id      = c(1, 4, 2, 3, NA),
  t       = c(1L, 2L, 1L, 2L, NA),
  country = c(16, 12, 3, NA, 15)
)
  
y <- data.table(
  id      = c(1, 2, 5, 6, 3),
  gdp     = c(11L, 15L, 20L, 13L, 10L),
  country = 16:20
)

Updating variables in master (i.e., left) table

Imagine that variable country is available in the two tables that you’re going to join. You could do one of five things:

1. Use variable country as one of the key variables

This is the default if you don’t use the argument by

# The variables with the same name, `id` and `x`, are used as key
# variables.
merge(x, y)[]
#> > removing key variables `id` and `country` from yvars
#> 
#> -- JOYn Report --
#> 
#>    report n percent
#> 1:      x 4   44.4%
#> 2:  x & y 1   11.1%
#> 3:      y 4   44.4%
#> 4:  total 9    100%
#> ---------------------------------------------------------- End of JOYn report --
#>    id country  t gdp report
#> 1:  1      16  1  11  x & y
#> 2:  2       3  1  NA      x
#> 3:  2      17 NA  15      y
#> 4:  3      20 NA  10      y
#> 5:  3      NA  2  NA      x
#> 6:  4      12  2  NA      x
#> 7:  5      18 NA  20      y
#> 8:  6      19 NA  13      y
#> 9: NA      15 NA  NA      x

2. Ignore variable country and don’t bring it into the resulting table

This the default if you did not include country as part of the key variables in argument by.

merge(x, y, by = "id")[]
#> > removing key variables `id` from yvars
#> i variable `country` in table y is ignored because arguments `update_NAs` and
#> `update_values` are FALSE.
#> 
#> -- JOYn Report --
#> 
#>    report n percent
#> 1:      x 2   28.6%
#> 2:  x & y 3   42.9%
#> 3:      y 2   28.6%
#> 4:  total 7    100%
#> ---------------------------------------------------------- End of JOYn report --
#>    id  t country gdp report
#> 1:  1  1      16  11  x & y
#> 2:  2  1       3  15  x & y
#> 3:  3  2      NA  10  x & y
#> 4:  4  2      12  NA      x
#> 5:  5 NA      NA  20      y
#> 6:  6 NA      NA  13      y
#> 7: NA NA      15  NA      x

3. Update only NAs in table x

Update only NAs values in variable country in table x (i.e., left or master) with the actual values of the matching observations in country from table y (i.e, right or using). In this case, actual values in country from table x will remain unchanged.

merge(x, y, by = "id", update_NAs = TRUE)[]
#> > removing key variables `id` from yvars
#> 
#> -- JOYn Report --
#> 
#>        report n percent
#> 1: NA updated 1   14.3%
#> 2:          x 2   28.6%
#> 3:      x & y 2   28.6%
#> 4:          y 2   28.6%
#> 5:      total 7    100%
#> ---------------------------------------------------------- End of JOYn report --
#>    id  t country gdp     report
#> 1:  1  1      16  11      x & y
#> 2:  2  1       3  15      x & y
#> 3:  3  2      20  10 NA updated
#> 4:  4  2      12  NA          x
#> 5:  5 NA      18  20          y
#> 6:  6 NA      19  13          y
#> 7: NA NA      15  NA          x

4. Update actual values in table x

Update all the values–NAs and actual–in variable country of table x (i.e., left or master) with the actual values of the matching observations in country from table y (i.e, right or using). Also, notice the output of the table below: only the values that are different between country from x and country from y are updated. If, for the same matching observations, the values between the two country variables are the same, the reporting variable says “x & y” rather than “value updated”, so you know that there is no update to make. Finally, If there are NAs in variable country from table y, the actual values in x will be unchanged, and you get a “not updated” status in the reporting variable. Yet, there is a way to solve this, which explained below using argument keep_y_in_x.

# notice that only the value that are 
merge(x, y, by = "id", update_values = TRUE)[]
#> > removing key variables `id` from yvars
#> 
#> -- JOYn Report --
#> 
#>           report n percent
#> 1:    NA updated 1   14.3%
#> 2:   not updated 2   28.6%
#> 3: value updated 1   14.3%
#> 4:         x & y 1   14.3%
#> 5:             y 2   28.6%
#> 6:         total 7    100%
#> ---------------------------------------------------------- End of JOYn report --
#>    id  t country gdp        report
#> 1:  1  1      16  11         x & y
#> 2:  2  1      17  15 value updated
#> 3:  3  2      20  10    NA updated
#> 4:  4  2      12  NA   not updated
#> 5:  5 NA      18  20             y
#> 6:  6 NA      19  13             y
#> 7: NA NA      15  NA   not updated

5. Keep matching-names variables from y into x (No update)

Bring variable country from y into the resulting table with a different name (country.y).

merge(x, y, by = "id", keep_y_in_x = TRUE)[]
#> > removing key variables `id` from yvars
#> i variable `country` in table y is ignored because arguments `update_NAs` and
#> `update_values` are FALSE.
#> 
#> -- JOYn Report --
#> 
#>    report n percent
#> 1:      x 2   28.6%
#> 2:  x & y 3   42.9%
#> 3:      y 2   28.6%
#> 4:  total 7    100%
#> ---------------------------------------------------------- End of JOYn report --
#>    id  t country gdp country.y report
#> 1:  1  1      16  11        16  x & y
#> 2:  2  1       3  15        17  x & y
#> 3:  3  2      NA  10        20  x & y
#> 4:  4  2      12  NA        NA      x
#> 5:  5 NA      NA  20        18      y
#> 6:  6 NA      NA  13        19      y
#> 7: NA NA      15  NA        NA      x

If you need to update country in x, including the actual values with NAs from country in y, you could do the following, keeping in mind that the reporting variable is not that useful anymore.

dd <- merge(x, y, by = "id", keep_y_in_x = TRUE, verbose = FALSE)
dd[, country := country.y
   ][, country.y := NULL]

dd[]
#>    id  t country gdp report
#> 1:  1  1      16  11  x & y
#> 2:  2  1      17  15  x & y
#> 3:  3  2      20  10  x & y
#> 4:  4  2      NA  NA      x
#> 5:  5 NA      18  20      y
#> 6:  6 NA      19  13      y
#> 7: NA NA      NA  NA      x

Bring specific variables from y into x

Most of the times, you don’t need to join all the variable from table y into table x, but just a few of them. By default, joyn brings all the variables in y into x because the argument yvars is set to TRUE. Yet, if you set it to FALSE or NULL it will bring NO variable from y into x, only only the reporting variable, which many times is very nice trick. Finally, if you set the yvars argument to the specific name of variable in y, only that variable will be carried over in the join.

# Bring all variables updating values
merge(x, y, by = "id", update_values = TRUE)[]
#> > removing key variables `id` from yvars
#> 
#> -- JOYn Report --
#> 
#>           report n percent
#> 1:    NA updated 1   14.3%
#> 2:   not updated 2   28.6%
#> 3: value updated 1   14.3%
#> 4:         x & y 1   14.3%
#> 5:             y 2   28.6%
#> 6:         total 7    100%
#> ---------------------------------------------------------- End of JOYn report --
#>    id  t country gdp        report
#> 1:  1  1      16  11         x & y
#> 2:  2  1      17  15 value updated
#> 3:  3  2      20  10    NA updated
#> 4:  4  2      12  NA   not updated
#> 5:  5 NA      18  20             y
#> 6:  6 NA      19  13             y
#> 7: NA NA      15  NA   not updated

# Bring no variables from y
merge(x, y, by = "id", yvars = FALSE)[]
#> 
#> -- JOYn Report --
#> 
#>    report n percent
#> 1:      x 2   28.6%
#> 2:  x & y 3   42.9%
#> 3:      y 2   28.6%
#> 4:  total 7    100%
#> ---------------------------------------------------------- End of JOYn report --
#>    id  t country report
#> 1:  1  1      16  x & y
#> 2:  2  1       3  x & y
#> 3:  3  2      NA  x & y
#> 4:  4  2      12      x
#> 5:  5 NA      NA      y
#> 6:  6 NA      NA      y
#> 7: NA NA      15      x

# Bring only variable gdp from table y. So variable country won't be updated either because table y is filtered before the coparison of variables is done. 
merge(x, y, by = "id", yvars = "gdp")[]
#> 
#> -- JOYn Report --
#> 
#>    report n percent
#> 1:      x 2   28.6%
#> 2:  x & y 3   42.9%
#> 3:      y 2   28.6%
#> 4:  total 7    100%
#> ---------------------------------------------------------- End of JOYn report --
#>    id  t country gdp report
#> 1:  1  1      16  11  x & y
#> 2:  2  1       3  15  x & y
#> 3:  3  2      NA  10  x & y
#> 4:  4  2      12  NA      x
#> 5:  5 NA      NA  20      y
#> 6:  6 NA      NA  13      y
#> 7: NA NA      15  NA      x

Rolling joins

Coming soon

Misc.

Coming soon