# "runexp" R package
Authors: Annie Sauer (anniees@vt.edu) and Sierra Merkes (smerkes@vt.edu)
Implements two methods of estimating runs scored in a softball
scenario: (1) theoretical expectation using discrete Markov chains and (2) empirical
distribution using multinomial random simulation. Scores are based on player-specific input
probabilities (out, single, double, triple, walk, and homerun). Optional inputs include probability
of attempting a steal, probability of succeeding in an attempted steal, and an indicator of whether
a player is "fast" (e.g. the player could stretch home). These probabilities may be
calculated from common player statistics that are publicly available on team's webpages.
Scores are evaluated based on a nine-player lineup and may be used to compare lineups,
evaluate base scenarios, and compare the offensive potential of individual players. Manuscript
forthcoming. See Bukiet & Harold (1997) for implementation of discrete Markov chains.
Functions:
* `chain`: calculates run expectancy using discrete Markov chains
* `sim`: estimates run expectancy using multinomial simulation
* `plot.chain`: S3 method for plotting `chain` output objects
* `prob_calc`: calculates player probabilities from commonly available stats
* `scrape`: scrapes player statistics from a given URL
Data Files:
* `wku_stats`: player statistics for the 2013 Western Kentucky University softball team
* `wku_probs`: calculated player probabilities for the 2013 Western Kentucky University softball team
Examples:
```
# scrape ----------------------------------------------------------------------
url <-"https://wmubroncos.com/sports/softball/stats/2019"
test <- scrape(url)
test_probs <- prob_calc(test)
# prob_calc -------------------------------------------------------------------
probs <- prob_calc(wku_stats) # probs corresponds to wku_probs
# chain -----------------------------------------------------------------------
# Expected score for single batter (termed "offensive potential")
chain1 <- chain("B", wku_probs)
plot(chain1)
# Expected score without cycling
lineup <- wku_probs$name[1:9]
chain2 <- chain(lineup, wku_probs)
plot(chain2)
# Expected score with cycling
chain3 <- chain(lineup, wku_probs, cycle = TRUE)
plot(chain3, type = 1:3)
# sim -------------------------------------------------------------------------
# Short simulation (designed to run in less than 5 seconds)
sim1 <- sim("B", wku_probs, inn = 1, reps = 100, cores = 1)
# Simulation with interactive graphic
lineup <- wku_probs$name[1:9]
sim2 <- sim(lineup, wku_probs, inn = 7, reps = 1, graphic = TRUE)
# Simulation for entire game (recommended to increase cores)
sim3 <- sim(lineup, wku_probs, cores = 1)
boxplot(sim3$score)
points(1, sim3$score_avg_game)
# game situation comparison of chain and sim ----------------------------------
# Select lineup made up of the nine "starters"
lineup <- sample(wku_probs$name[1:9], 9)
# Average chain across lead-off batters
chain_avg <- mean(chain(lineup, wku_probs, cycle = TRUE)$score)
# Simulate full 7 inning game (recommended to increase cores)
sim_score <- sim(lineup, wku_probs, inn = 7, reps = 50000, cores = 1)
# Split into bins in order to plot averages
sim_grouped <- split(sim_score$score, rep(1:100, times = 50000 / 100))
boxplot(sapply(sim_grouped, mean), ylab = 'Expected Score for Game')
points(1, sim_score$score_avg_game, pch = 16, cex = 2, col = 2)
points(1, chain_avg * 7, pch = 18, cex = 2, col = 3)
```