Visualization tools

Elio Campitelli

2019-11-12

One of the “conceptual branches” of metR is the visualization tools. These are a set of functions that interface with ggplot2 for easier and better plotting of meteorological (an other) fields.

Scales

Many meteorological fields are defined in a longitude×latitude×level grid, so metR includes scales for each dimension. These are glorified wrappers around scale_*_continuous() with sensible defaults and, in the case of scale_*_level(), the implementation of reverselog_trans().

There are also scale_color_divergent() and scale_fill_divergent() which are wrapers around scale_*_gradient2() but with sane default colors for positive and negative values –particularly useful for plotting anomaly values.

To see how this scales work, let’s visualize the vertical distribution of temperature anomalies from the zonal mean.

library(metR)
library(ggplot2)
library(data.table)
temperature <- copy(temperature)
temperature[, air.z := Anomaly(air), by = .(lat, lev)]

# Plot made with base ggplot
(g <- ggplot(temperature[lon %~% 180], aes(lat, lev, z = air.z)) +
        geom_contour(aes(color = ..level..)))
#> Warning: Computation failed in `stat_contour()`:
#> Number of y coordinates must match number of rows in density matrix.

While this is fine, since pressure levels are roughly proportional to \(\mathrm{e}^{-z}\) in meteorology we usually plot the vertical coordinate as \(-\log(p)\). However, while ggplot2 has scale_y_log10() and scale_y_reverse(), they don’t work together. metR defines the new transformation reverselog_trans() that can be used with any scale but that is the default for scale_*_level().

On the other hand, scale_*_latitude() (and scale_*_longitude()) not only defaults expand to c(0, 0), but also has a ticks argument that specifies the spacing of breaks between -90 and 90 in the case of scale_*_latitude(), and between 0 and 360 in the case of scale_*_longitude().

These scales default to printing no label, since usually the dimensions are understood by the shape of the plot.

g + 
    scale_y_level() +
    scale_x_latitude(ticks = 15, limits = c(-90, 90)) +
    scale_color_divergent()
#> Warning: Computation failed in `stat_contour()`:
#> Number of y coordinates must match number of rows in density matrix.

Note: scale_*_longitude() (currently) assumes the data goes from 0° to 360° but puts labels between -180° and 180°. This very idiosyncratic choice stems from the fact that model output is usually in the [0; 360) range but it’s easier to read maps in the (-180; 180] range. This may change in the future.

Geoms and stats

geom_contour_fill()

In ggplot2, the ‘canonical’ way to get filled contours is by using stat_contour() with a polygon geom and mapping fill to level (see this issue), but this has three important limitations.

geom_contour_fill() makes some adjustments to the data and computes an additional variable int.level (which is the default mapping for the fill aesthetic) that solve these problems.

In principle, geom_contour_fill() needs a full regular grid with no missing values and by default with fail if otherwise. However it can automatically impute missing values. With the na.fill parameter you can have some control over the imputation of NAs. Setting it to TRUE will impute missing values with spline bivariate interpolation. Other options are passing a constant numeric value or a function that takes the vector of values and return one value (like mean).

It’s probably good practice to mask missing values with geom_raster(). See stat_subset() for a quick way of selecting missing values.

We can apply this to a more realistic dataset

As an important note, this stat currently only works with rectangular grids.

geom_text_contour and geom_label_contour

Labeling contours is also a problematic aspect of ggplot2. geom_text_contour() and geom_label_contour() can be use to automatically add text or labels to the flattest part of a contour.

By default it labels every 2nd contour (this can be changed by the skip parameter) and it rotates to follow the angle of the contour (this is not available on geom_label_contour()). Since in some datasets there can be very small contours that should not be labeled for clarity, the min.size argument specifies the minimum points a contour has to have in order to be labeled.

Notice how labels are drawn on top of contours? The problem is that geom_contour() doesn’t know it’s being labeled; geom_text_contour() addresses this issue by allowing you to draw a stroke around the text.

geom_contour_tanaka

Illuminated contours (aka Tanaka contours) use varying brightness and width to create an illusion of relief. This can help distinguishing between concave and convex areas (local minimums and maximums), specially in black and white plots. It also allows for photocopy safe plots with divergent colour palettes, and it just looks cool.

stat_subset

metR also has stat_subset() which makes a subset of the data according to the subset aesthetic. This makes it possible to show only part of the data in one geom without needing to specify a data argument (specially useful if the data being plotted is the result of a long pipe and not actually assigned to a data.frame). It has a somewhat limited use in that it cannot perform further statistical transformations of the data.

For example, it can be used if you have a correlation field and want to mark only the points with significant correlations:

Another possible use it to quickly mask missing values.

geom_vector and geom_arrow

Plotting arrows can be a pain. Again, the ‘canonical’ way of plotting vectors is to use geom_segment() and specify x, y, xend and yend aesthetics which can be a lot of typing when one has the data on location and displacement (or velocity). Instead, metR‘s geom_vector() and geom_arrow() draw vectors defined by their lateral displacements (dx, dy) or their magnitude and angle. It also has some useful parameters like min.mag, which controls the minimum magnitude for an arrow to be drawn (useful for highlighting only areas of strong ’flow’) and skip, which draws only the nth arrow in the x and y directions.

Both geoms are essentially the same, except that geom_arrow() defaults to preserving direction in coordinate transformations and regardless of the plot aspect ratio. This is recommended if the arrows are meant to convey the direction of the flow in each point instead of being a description of the shape of the flow related to the plot coordinates.

So, as an example, we can plot the temperature gradient like this:

In the above example, t.dx and t.dy represent displacements in the same units as the x and y dimension, so using geom_vector() each arrow points perpendicular to the contours even if we change the coordinate system.

But if we want to plot how the (horizontal) direction changes with height it makes more sense to use geom_arrow() (or geom_vector(preserve.dir = TRUE))

In this case, an arrow at a 45° angle represents temperature gradient from the southwest regardless of the x and y scales.

The start and direction arguments adjust the behaviour of the arrows. This is useful for working with winds in the meteorological standard (in which 0° means wind from the North and 90° means wind from the East).

geom_streamline

Streamlines are paths tangential to a vector field and provide an intuitive way of visualizing vector fields. geom_streamline() computes streamlines via Euler integration.