Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. The idea is: we sacrifice some of the flexibility and composability inherent to ggplot2 in R for a menu of prescribed presentation solutions.
For example, the plot below, which shows both an observed discrete empirical distribution (as stems) and a matching theoretical distribution (as bars), is a built-in “one liner.”
set.seed(52523)
d <- data.frame(
  wt = 100*rnorm(100),
  stringsAsFactors = FALSE)
WVPlots::PlotDistCountNormal(d,'wt','example')
The graph above is actually the product of a number of presentation decisions:
All of these decisions are triggered by choosing which plot to use
from the WVPlots library. In this case we chose
WVPlots::PlotDistCountNormal. For an audience of analysts
we might choose an area/density based representation (by instead
specifying WVPlots::PlotDistDensityNormal), which is shown below:
WVPlots::PlotDistDensityNormal(d,'wt','example')
Switching the chosen plot simultaneously changes many of the details of the presentation. WVPlots is designed to make this change simple by insisting on a very simple unified calling convention. The plot calls all insist on roughly the following arguments:
This intentionally rigid calling interface is easy to remember and
makes switching between plot types very easy. We have also made
title a required argument, as we feel all plots should be labeled.
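Because the calls shown in this article all take the data frame, the column name, and the title in the same order, the choice of plot can itself be treated as a parameter. The following is a minimal sketch of that idea, assuming WVPlots is installed and that both functions return printable plot objects (an assumption, not something stated above). The names plot_names and make_plot are hypothetical and introduced only for illustration.

```r
# Hypothetical lookup table: short label -> exported WVPlots function name.
plot_names <- c(count = "PlotDistCountNormal",
                density = "PlotDistDensityNormal")

# Hypothetical helper: choose the presentation by name, keep the arguments fixed.
make_plot <- function(kind, frame, xvar, title) {
  stopifnot(kind %in% names(plot_names),
            is.data.frame(frame),
            xvar %in% colnames(frame),
            is.character(title), length(title) == 1)
  # Look the function up in the WVPlots namespace at call time.
  f <- getExportedValue("WVPlots", plot_names[[kind]])
  f(frame, xvar, title)
}

set.seed(52523)
d <- data.frame(wt = 100 * rnorm(100), stringsAsFactors = FALSE)

# Only draw if the package is available on this machine.
if (requireNamespace("WVPlots", quietly = TRUE)) {
  print(make_plot("count", d, "wt", "example"))
  print(make_plot("density", d, "wt", "example"))
}
```

Swapping "count" for "density" is the only change needed to move between the two presentations, which is the point of the rigid interface.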
What we are trying to do is separate the specification of exactly what plot we want from the details of how to produce it. We find this separation of concerns and encapsulation of implementation allows us to routinely use rich annotated graphics. Below are a few more examples:
set.seed(34903490)
x = rnorm(50)
y = 0.5*x^2 + 2*x + rnorm(length(x))
frm = data.frame(
  x=x,
  y=y,
  yC=y>=as.numeric(quantile(y,probs=0.8)),
  stringsAsFactors = FALSE)
frm$absY <- abs(frm$y)
frm$posY = frm$y > 0
WVPlots::ScatterHist(frm, "x", "y", smoothmethod="lm",
                     title="Example Linear Fit")
## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## ℹ The deprecated feature was likely used in the WVPlots package.
## Please report the issue at <https://github.com/WinVector/WVPlots/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
set.seed(34903490)
y = abs(rnorm(20)) + 0.1
x = abs(y + 0.5*rnorm(20))

frm = data.frame(
  model=x,
  value=y,
  stringsAsFactors = FALSE)

frm$costs=1
frm$costs[1]=5
frm$rate = with(frm, value/costs)

frm$isValuable = (frm$value >= as.numeric(quantile(frm$value, probs=0.8)))
gainx = 0.10  # get the top 10% most valuable points as sorted by the model

# make a function to calculate the label for the annotated point
labelfun = function(gx, gy) {
  pctx = gx*100
  pcty = gy*100

  paste("The top ", pctx, "% most valuable points by the model\n",
        "are ", pcty, "% of total actual value", sep='')
}

WVPlots::GainCurvePlotWithNotation(frm, "model", "value",
                                   title="Example Gain Curve with annotation",
                                   gainx=gainx,labelfun=labelfun)
## Warning: The `guide` argument in `scale_*()` cannot be `FALSE`. This was deprecated in
## ggplot2 3.3.4.
## ℹ Please use "none" instead.
## ℹ The deprecated feature was likely used in the WVPlots package.
## Please report the issue at <https://github.com/WinVector/WVPlots/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
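For readers who want to see the numbers behind a gain curve, here is a base-R sketch of the usual construction: sort the rows by model score, highest first, and accumulate the share of total value. This is the textbook calculation and not necessarily how WVPlots computes or annotates its plot. The helper name gain_curve and the column names frac_items and frac_value are hypothetical.

```r
# Hypothetical helper: cumulative share of value captured as we move
# down the rows in order of decreasing model score.
gain_curve <- function(frame, modelvar, valuevar) {
  ord <- order(frame[[modelvar]], decreasing = TRUE)
  value <- frame[[valuevar]][ord]
  data.frame(
    frac_items = seq_along(value) / length(value),
    frac_value = cumsum(value) / sum(value))
}

# Same synthetic data as the gain curve example.
set.seed(34903490)
y <- abs(rnorm(20)) + 0.1
x <- abs(y + 0.5 * rnorm(20))
frm <- data.frame(model = x, value = y, stringsAsFactors = FALSE)

curve_pts <- gain_curve(frm, "model", "value")

# Share of total value held by the top 10% of rows as ranked by the model.
curve_pts$frac_value[curve_pts$frac_items >= 0.10][1]
```

A model that ranks rows well pushes this share far above 10%; a model no better than random ordering leaves it near 10% on average.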
set.seed(52523)
d = data.frame(
  meas=rnorm(100),
  stringsAsFactors = FALSE)
threshold = 1.5
WVPlots::ShadedDensity(d, "meas", threshold, tail="right",
                       title="Example shaded density plot, right tail")
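The shaded region in a plot like this corresponds to the mass of the distribution to the right of the threshold. A quick base-R check of that quantity follows, comparing the empirical fraction in the sample with the tail of the standard normal distribution the data were drawn from. This is only a sketch; nothing here assumes how ShadedDensity itself computes or reports the shaded area.

```r
set.seed(52523)
d <- data.frame(meas = rnorm(100), stringsAsFactors = FALSE)
threshold <- 1.5

# Fraction of observations at or beyond the threshold.
empirical_tail <- mean(d$meas >= threshold)

# Right-tail mass of the standard normal the sample was drawn from.
theoretical_tail <- pnorm(threshold, lower.tail = FALSE)

c(empirical = empirical_tail, theoretical = theoretical_tail)
```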
set.seed(34903490)
frm = data.frame(
  x=rnorm(50),
  y=rnorm(50),
  stringsAsFactors = FALSE)
frm$z <- frm$x+frm$y
WVPlots::ScatterHistN(frm, "x", "y", "z", title="Example Joint Distribution")
set.seed(34903490)
x = rnorm(50)
y = 0.5*x^2 + 2*x + rnorm(length(x))
frm = data.frame(
  x = x,
  yC = y>=as.numeric(quantile(y,probs=0.8)),
  stringsAsFactors = FALSE)
WVPlots::ROCPlot(frm, "x", "yC", TRUE, title="Example ROC plot")
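As with the gain curve, the quantities behind an ROC plot are easy to compute by hand. The base-R sketch below builds the curve by sweeping a threshold down through the sorted scores and takes the area under it with the trapezoid rule. It assumes there are no tied scores, and it is a generic construction rather than a description of WVPlots internals. The names roc_points and auc are hypothetical.

```r
# Hypothetical helper: ROC curve points, assuming no tied scores.
roc_points <- function(score, truth) {
  ord <- order(score, decreasing = TRUE)
  truth <- truth[ord]
  data.frame(
    fpr = c(0, cumsum(!truth) / sum(!truth)),  # false positive rate
    tpr = c(0, cumsum(truth) / sum(truth)))    # true positive rate
}

# Hypothetical helper: trapezoid-rule area under the ROC curve.
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Same synthetic data as the ROC example.
set.seed(34903490)
x <- rnorm(50)
y <- 0.5 * x^2 + 2 * x + rnorm(length(x))
frm <- data.frame(
  x = x,
  yC = y >= as.numeric(quantile(y, probs = 0.8)),
  stringsAsFactors = FALSE)

roc <- roc_points(frm$x, frm$yC)
auc(roc)
```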
We know this collection doesn’t rise to the standard of a complete
“grammar of graphics” (as in Leland Wilkinson’s ideas). But it can
become (through accumulation) a re-usable repository of a number of
specific graphing tasks done well. It is also a chance to eventually
document presentation design decisions (though we haven’t gotten far on
that yet). The complete set of graphs is shown in the WVPlots_example
vignette.