Biostatistics vignette

Rob Knell

January 2021

The Biostatistics Package

This package consists of a series of learnr tutorials for use in teaching statistics to biologists. They were written for use in undergraduate and postgraduate teaching in the UK but they could also be used for individual, self-directed learning. The subjects covered range from basic data visualistion and description through to reasonably advanced linear modelling. There are obviously many subjects which are not currently covered such as generalised linear models, mixed effects models and multivariate statistics and it is hoped that these will be incorporated in the future.

There is a strong emphasis throughout the tutorial on analysing real data sets. This is much better for learning statistics than using synthetic example data because with real data comes all of the issues and uncertainty associated with real science. The data used here have mostly been made publicly available by the authors of papers published in the biological literature, mostly via the Dryad data repository, and I would like to thank all of them for this.

The tutorials are written for the learnr package which uses an rmarkdown framework to render tutorials into shiny webapps. The rmarkdown files for all of the tutorials are available on the author’s github page.

Running tutorials

There are two ways of running these tutorials. The easy way assumes you are using a recent version of RStudio. If this is the case then once you have installed the package the tutorials will show up in the ‘Tutorial’ tab in the RStudio pane that also includes the Environment and History tabs. Click the “Start Tutorial” button and the tutorial will render, which can take a few seconds, and then appear in the Tutorial tab. You’ll probably want to maximise the pane within your RStudio window. If you want to finish the tutorial click on the ‘Stop’ sign button at the top left of the tab.

If you would rather run your tutorial in a separate browser window then you can use the run_tutorial() function from the learnr package. You need to specify the name of the tutorial and the package, so

learnr::run_tutorial("02_Descriptive_statistics", package = "Biostatistics")

will run the Descriptive Statistics tutorial and

learnr::run_tutorial("17_Multiple_Regression", package = "Biostatistics")

will run the Multiple Regression tutorial. In my experience the first method, with the Tutorial pane, seems more stable and sometimes tutorials won’t render using run_tutorial() for reasons that are not clear.

List of tutorials

The tutorials currently in the package are:

00_Introduction
01_Frequency_histograms
02_Descriptive_statistics
03_Boxplots
04_Scatterplots
05_Sampling_distributions
06_Standard_errors
07_Confidence_intervals
08_CIs_comparing_two_means 09_Paired_sample_t_tests
10_Two_sample_t_tests
11_Chi_square_tests
12_Correlation
13_Single_factor_ANOVA
14_Linear_Regression
15_Model_assumptions
16_Multi_factor_ANOVA
17_Multiple_regression
18_Factors_and_continuous_variables
19_Model_selection