TensorFlow Probability is a library for statistical computation and probabilistic modeling built on top of TensorFlow.

Its building blocks include a vast range of distributions and
invertible transformations (*bijectors*), probabilistic layers
that may be used in `keras` models, and tools for
probabilistic reasoning including variational inference and Markov Chain
Monte Carlo.

Install the released version of `tfprobability` from CRAN:

`install.packages("tfprobability")`

To install `tfprobability` from GitHub, run

`devtools::install_github("rstudio/tfprobability")`

Then, use the `install_tfprobability()` function to install the
TensorFlow and TensorFlow Probability Python modules.

```
library(tfprobability)
install_tfprobability()
```

This automatically installs the current stable version of TensorFlow Probability together with TensorFlow. If you need nightly builds instead, running

`install_tfprobability(version = "nightly")`

installs the nightly builds of both TensorFlow and TensorFlow Probability.

High-level applications of `tfprobability` to tasks like

- probabilistic (multi-level) modeling with MCMC and/or variational inference,
- uncertainty estimation for neural networks,
- time series modeling with state space models, or
- density estimation with autoregressive flows

are described in the vignettes/articles and/or featured on the TensorFlow for R blog.

This introductory text illustrates the lower-level building blocks:
distributions, bijectors, and probabilistic `keras` layers.

```
library(tfprobability)
library(tensorflow)
```

Distributions are objects with methods to compute summary statistics, (log) probability, and (optionally) quantities like entropy and KL divergence.

```
# create a binomial distribution with n = 7 and p = 0.3
d <- tfd_binomial(total_count = 7, probs = 0.3)

# compute mean
d %>% tfd_mean()
#> tf.Tensor(2.1000001, shape=(), dtype=float32)

# compute variance
d %>% tfd_variance()
#> tf.Tensor(1.47, shape=(), dtype=float32)

# compute probability
d %>% tfd_prob(2.3)
#> tf.Tensor(0.303791, shape=(), dtype=float32)
```
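As a sanity check that needs no TensorFlow, the same binomial summaries can be computed in base R from the closed-form formulas mean = n p and variance = n p (1 - p). This is a small sketch using only base R; note that `dbinom()` is defined for integer counts, so it is evaluated at 2 here rather than at 2.3.

```r
# Binomial parameters used in the tfprobability example above
n <- 7
p <- 0.3

# Closed-form summary statistics of a binomial distribution
binom_mean <- n * p            # 2.1
binom_var  <- n * p * (1 - p)  # 1.47

# Probability mass at an integer count, using base R's dbinom()
pmf_2 <- dbinom(2, size = n, prob = p)
pmf_2
#> [1] 0.3176523

# The moments implied by the full pmf agree with the closed-form formulas
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)
sum(k * pmf)                    # mean
sum((k - binom_mean)^2 * pmf)   # variance
```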

```
# Represent a cold day with 0 and a hot day with 1.
# Suppose the first day of a sequence has a 0.8 chance of being cold.
# We can model this using the categorical distribution:
initial_distribution <- tfd_categorical(probs = c(0.8, 0.2))

# Suppose a cold day has a 30% chance of being followed by a hot day
# and a hot day has a 20% chance of being followed by a cold day.
# We can model this as:
transition_distribution <- tfd_categorical(
  probs = matrix(c(0.7, 0.3, 0.2, 0.8), nrow = 2, byrow = TRUE) %>%
    tf$cast(tf$float32)
)

# Suppose additionally that on each day the temperature is
# normally distributed with mean and standard deviation 0 and 5 on
# a cold day and mean and standard deviation 15 and 10 on a hot day.
# We can model this with:
observation_distribution <- tfd_normal(loc = c(0, 15), scale = c(5, 10))

# We can combine these distributions into a single week-long
# hidden Markov model with:
d <- tfd_hidden_markov_model(
  initial_distribution = initial_distribution,
  transition_distribution = transition_distribution,
  observation_distribution = observation_distribution,
  num_steps = 7
)

# The expected temperatures for each day are given by:
d %>% tfd_mean() # shape [7], elements approach 9.0
#> tf.Tensor([3. 6. 7.4999995 8.249999 8.625001 8.812501 8.90625 ], shape=(7,), dtype=float32)

# The log pdf of a week of temperature 0 is:
d %>% tfd_log_prob(rep(0, 7))
#> tf.Tensor(-19.855635, shape=(), dtype=float32)
```
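To see where these numbers come from, the two quantities printed above can be reproduced with plain base R. This sketch assumes nothing beyond base R: the daily expected temperature is the state distribution propagated through the transition matrix and weighted by each state's mean, and the log-likelihood uses the standard scaled forward algorithm for hidden Markov models.

```r
# Same hidden Markov model as above; state 1 = cold, state 2 = hot
init      <- c(0.8, 0.2)
trans     <- matrix(c(0.7, 0.3, 0.2, 0.8), nrow = 2, byrow = TRUE)
obs_mean  <- c(0, 15)
obs_sd    <- c(5, 10)
num_steps <- 7

# Marginal state probabilities per day: p_t = p_{t-1} %*% trans
state_probs <- matrix(NA_real_, nrow = num_steps, ncol = 2)
state_probs[1, ] <- init
for (t in 2:num_steps) {
  state_probs[t, ] <- state_probs[t - 1, ] %*% trans
}

# Expected temperature per day: each state's mean weighted by its probability
expected_temp <- as.vector(state_probs %*% obs_mean)
expected_temp
#> [1] 3.00000 6.00000 7.50000 8.25000 8.62500 8.81250 8.90625

# Log-likelihood of a week of 0-degree days via the scaled forward algorithm
x <- rep(0, num_steps)
alpha <- init * dnorm(x[1], obs_mean, obs_sd)
log_lik <- log(sum(alpha))
alpha <- alpha / sum(alpha)
for (t in 2:num_steps) {
  # predict the next state, then weight by the observation density
  alpha <- as.vector(alpha %*% trans) * dnorm(x[t], obs_mean, obs_sd)
  log_lik <- log_lik + log(sum(alpha))
  alpha <- alpha / sum(alpha)
}
log_lik  # approximately -19.8556
```

The stationary probability of a hot day solves p = 0.3 (1 - p) + 0.8 p, giving p = 0.6, which is why the expected temperatures approach 0.6 * 15 = 9.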

Bijectors are invertible transformations that make it possible to derive the likelihood of data under the transformed distribution from its likelihood under the base distribution. For an in-depth explanation, see *Getting into the flow: Bijectors in TensorFlow Probability* on the TensorFlow for R blog.

```
# create an affine transformation that scales by 0.5 and then shifts by 3.33
b <- tfb_shift(3.33)(tfb_scale(0.5))

# apply the transformation
x <- c(100, 1000, 10000)
b %>% tfb_forward(x)
#> tf.Tensor([ 53.33 503.33 5003.33], shape=(3,), dtype=float32)
```
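The likelihood bookkeeping that bijectors automate is the change-of-variables formula: if Y = g(X) with g invertible, then log p_Y(y) = log p_X(g^-1(y)) + log |d g^-1(y) / dy|. For the affine map above, g(x) = 0.5 x + 3.33, so the inverse log-determinant term is -log(0.5). The base R sketch below assumes a standard normal base distribution (an illustrative choice, not part of the example above) and checks the formula against `dnorm()` with the implied mean and standard deviation.

```r
# Affine map from the example: scale by 0.5, then shift by 3.33
shift_by <- 3.33
scale_by <- 0.5
forward <- function(x) scale_by * x + shift_by
inverse <- function(y) (y - shift_by) / scale_by

# Inverse log-det-Jacobian of an affine map: log |d inverse(y) / dy| = -log |scale|
inverse_log_det_jacobian <- function(y) rep(-log(abs(scale_by)), length(y))

# Assumed base distribution for this sketch: X ~ Normal(0, 1)
base_log_prob <- function(x) dnorm(x, mean = 0, sd = 1, log = TRUE)

# Change of variables: log p_Y(y) = log p_X(inverse(y)) + inverse log-det-Jacobian
transformed_log_prob <- function(y) base_log_prob(inverse(y)) + inverse_log_det_jacobian(y)

forward(c(100, 1000, 10000))
#> [1]   53.33  503.33 5003.33

# Y = 0.5 X + 3.33 is Normal(3.33, 0.5), so these two lines agree
y <- c(2.5, 3.33, 4.1)
transformed_log_prob(y)
dnorm(y, mean = shift_by, sd = scale_by, log = TRUE)
```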

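```
# create a bijector that performs the discrete cosine transform (DCT)
b <- tfb_discrete_cosine_transform()

# run on sample data
# note: the DCT acts along the last axis, so each row of this 3 x 1 matrix
# is transformed as a length-1 signal (which the orthonormal DCT leaves
# unchanged); use matrix(runif(3), nrow = 1) to transform all three values together
x <- matrix(runif(3))
b %>% tfb_forward(x)
#> tf.Tensor(
#> [[0.5221709 ]
#>  [0.5336635 ]
#>  [0.06735111]], shape=(3, 1), dtype=float32)
```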
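To see what the bijector computes, here is a base R sketch of the orthonormal type-II DCT written as a matrix. It assumes that `tfb_discrete_cosine_transform()` by default applies the type-II DCT with orthonormal scaling along the last axis; check the function's documentation to confirm. Because the matrix is orthogonal, its inverse is its transpose and the log-determinant of its Jacobian is zero, which is what makes the DCT convenient as a bijector.

```r
# Orthonormal DCT-II matrix of size n x n:
# C[k, j] = s_k * cos(pi * (2 * j + 1) * k / (2 * n)), for k, j = 0, ..., n - 1,
# with s_0 = sqrt(1 / n) and s_k = sqrt(2 / n) for k > 0
dct2_matrix <- function(n) {
  k <- 0:(n - 1)
  j <- 0:(n - 1)
  C <- outer(k, j, function(k, j) cos(pi * (2 * j + 1) * k / (2 * n)))
  C[1, ] <- C[1, ] * sqrt(1 / n)
  if (n > 1) C[-1, ] <- C[-1, ] * sqrt(2 / n)
  C
}

# Forward transform of a single signal (a numeric vector)
dct2 <- function(x) as.vector(dct2_matrix(length(x)) %*% x)
# Inverse transform: the matrix is orthogonal, so its transpose undoes it
idct2 <- function(y) as.vector(t(dct2_matrix(length(y))) %*% y)

set.seed(1)
x <- runif(3)
y <- dct2(x)

y[1] - sum(x) / sqrt(3)  # first coefficient is the scaled sum: numerically zero
sum(y^2) - sum(x^2)      # energy is preserved: numerically zero
idct2(y) - x             # the inverse recovers the input: numerically zero

# A length-1 signal is left unchanged
dct2(0.42)
#> [1] 0.42
```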

`tfprobability` wraps distributions in Keras layers so we
can use them seamlessly in a neural network and work with tensors as
targets as usual. For example, we can use
`layer_kl_divergence_add_loss` to have the network take care
of the KL loss automatically, and train a variational autoencoder with
only the negative log likelihood as the loss, like this:

```
library(keras)

encoded_size <- 2
input_shape <- c(2L, 2L, 1L)
train_size <- 100
x_train <- array(runif(train_size * Reduce(`*`, input_shape)), dim = c(train_size, input_shape))

# encoder is a keras sequential model
encoder_model <- keras_model_sequential() %>%
  layer_flatten(input_shape = input_shape) %>%
  layer_dense(units = 10, activation = "relu") %>%
  layer_dense(units = params_size_multivariate_normal_tri_l(encoded_size)) %>%
  layer_multivariate_normal_tri_l(event_size = encoded_size) %>%
  # last layer adds KL divergence loss
  layer_kl_divergence_add_loss(
    distribution = tfd_independent(
      tfd_normal(loc = c(0, 0), scale = 1),
      reinterpreted_batch_ndims = 1
    ),
    weight = train_size
  )

# decoder is a keras sequential model
decoder_model <- keras_model_sequential() %>%
  layer_dense(units = 10,
              activation = 'relu',
              input_shape = encoded_size) %>%
  layer_dense(params_size_independent_bernoulli(input_shape)) %>%
  layer_independent_bernoulli(event_shape = input_shape,
                              convert_to_tensor_fn = tfp$distributions$Bernoulli$logits)

# keras functional model uniting them both
vae_model <- keras_model(inputs = encoder_model$inputs,
                         outputs = decoder_model(encoder_model$outputs[1]))

# VAE loss now is just the negative log probability of the data
vae_loss <- function (x, rv_x)
  - (rv_x %>% tfd_log_prob(x))

vae_model %>% compile(
  optimizer = "adam",
  loss = vae_loss
)

vae_model
#> Model: "model"
#> ________________________________________________________________________________
#> Layer (type)                        Output Shape                    Param #
#> ================================================================================
#> flatten_input (InputLayer)          [(None, 2, 2, 1)]               0
#> flatten (Flatten)                   (None, 4)                       0
#> dense_1 (Dense)                     (None, 10)                      50
#> dense (Dense)                       (None, 5)                       55
#> multivariate_normal_tri_l (Multiva  ((None, 2),                     0
#> riateNormalTriL)                     (None, 2))
#> kl_divergence_add_loss (KLDivergen  (None, 2)                       0
#> ceAddLoss)
#> sequential_1 (Sequential)           (None, 2, 2, 1)                 74
#> ================================================================================
#> Total params: 179
#> Trainable params: 179
#> Non-trainable params: 0
#> ________________________________________________________________________________

vae_model %>% fit(x_train, x_train, batch_size = 25, epochs = 1)
```
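For intuition about the KL term in this model: the encoder outputs a multivariate normal with mean vector mu and lower-triangular scale matrix L, and the prior is a standard normal. For this pair the divergence has the closed form KL = 0.5 (tr(Sigma) + mu'mu - k - log det Sigma), with Sigma = L L'. The base R sketch below computes it and compares it with a Monte Carlo estimate; the values of `mu` and `L` are made up for illustration. According to the TensorFlow Probability documentation, the layer's default (`use_exact_kl = FALSE`) uses a sample-based estimate rather than the exact formula, so treat this sketch as an illustration of the quantity being estimated, not of the layer's internals.

```r
set.seed(42)
k  <- 2                                   # latent dimension (encoded_size above)
mu <- c(0.5, -1)                          # hypothetical posterior mean
L  <- matrix(c(0.8, 0,
               0.3, 0.6), nrow = 2, byrow = TRUE)  # hypothetical lower-triangular scale
Sigma <- L %*% t(L)

# Exact KL( N(mu, Sigma) || N(0, I) ); log det Sigma = 2 * sum(log(diag(L)))
kl_exact <- 0.5 * (sum(diag(Sigma)) + sum(mu^2) - k - 2 * sum(log(diag(L))))

# Monte Carlo estimate: mean of log q(z) - log p(z) over draws z ~ q
n_draws <- 200000
eps <- matrix(rnorm(n_draws * k), ncol = k)
z   <- sweep(eps %*% t(L), 2, mu, "+")   # z = mu + L eps, one draw per row
log_q <- -0.5 * k * log(2 * pi) - sum(log(diag(L))) - 0.5 * rowSums(eps^2)
log_p <- -0.5 * k * log(2 * pi) - 0.5 * rowSums(z^2)
kl_mc <- mean(log_q - log_p)

c(exact = kl_exact, monte_carlo = kl_mc)
```

The Monte Carlo estimate is unbiased but noisy; with a single sample per example, as in a training step, the noise is much larger than with the many draws used here.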