Introduction to MullerPlot package

F. Farahpour

2022-04-27

Muller plots are plots which show the succession of different OTUs and their relative abundance/population dynamics over time. They are powerful and fascinating tools to visualize diversity and its dynamics, i.e. how diversity emerges and how changes over time. They are called Muller plots in honor of Hermann Joseph Muller which used them to explain his idea of Muller’s ratchet (Muller 1932).

A big difference between Muller plots and normal box plots of abundances is that a Muller plot depicts not only the relative abundances but also succession of OTUs based on their genealogy/parental relation.

To see examples of such plots look at (Herron and Doebeli 2013; Kvitek and Sherlock 2013; Maddamsetti, Lenski, and Barrick 2015).

For generating a Muller plot one needs at least the abundance/population/frequency of OTUs over time/generations and their parental/genealogy/phylogeny relation.

How to use MullerPlot package

This package contains one function called “Muller.plot” and 3 data files in rda format. These data files provide an example of information about an artificial system with dynamics of 8 OTUs over 101 time steps. This artificial data is inspired by figure 2a of (Herron and Doebeli 2013).

To load the package run

library(MullerPlot)

Inputs (arguments) of Muller.plot function

The main inputs of this function are:

  1. attributes: Attributes of OTUS which must contain names and parents of OTUs and can contain also optional color for each OTU.

As an example you can load Attributes from our artificial example:

data("Attributes")
Attributes
##      names          parents        colors   
## [1,] "RBL636"       NA             "#A6CEE3"
## [2,] "rkd D5891 bc" "RBL636"       "#1F78B4"
## [3,] "ylnC-48"      "RBL636"       "#B2DF8A"
## [4,] "iglK f12"     "ylnC-48"      "#33A02C"
## [5,] "BAeR G11"     "rkd D5891 bc" "#FDBF6F"
## [6,] "nuhj-25"      "iglK f12"     "#FB9A99"
## [7,] "HwrK-41"      "rkd D5891 bc" "#FF7F00"
## [8,] "QecF*22"      "nuhj-25"      "#E31A1C"

This could be a matrix or data frame and must have 2 or three columns.

The first column contains OTU names. This could be any numeric or character vector. There is no rule for the order of OTU names but as a facility user can control the order of OTUs in the final plot by the order of their names in this column. Order of appearance of OTUs in Muller plot is determined by their Genealogy relation but sister OTUs (OTUs with the same parent) has no preference in the order of appearance. So One can control their appearance by the order of their appearance in this column.

The second column determines the parent of each OTU with NA instead of the parent of those OTU which their parents are not known or not present in the dynamics. This column must contain at least one NA. Each OTU in this column must appear also in the first column.

The third column (optional) determines the color that we want to be attributed to our OTUs (first column). If this column is empty then a random vector of colors will be attributed to OTUs. This vector with the corresponding name of OTUs will be returned by Muller.plot function.

  1. population.data: This input must provide information about population/abundance over time/generation.

It could be a matrix/data frame with 3 columns which contains OTU names (numeric or character), time/generation (numeric) and abundance/population (numeric, greater than or equal to 0), respectively. To use this type of population.data you should set data.method=“list.” Names provided in the first column must be a member of the first column of attributes. There is no rule for the order of the rows. As an example of this type of data you can load PopulationDataList. Here you can see the first 10 rows of this data frame.

data("PopulationDataList")
PopulationDataList[1:10,]
##      names times abundances
## 1   RBL636     1      29.00
## 2   RBL636     2      29.00
## 3   RBL636     3      29.00
## 4   RBL636     4      29.00
## 5   RBL636     5      29.00
## 6   RBL636     6      29.00
## 7   RBL636     7      29.00
## 8   RBL636     8      29.00
## 9   RBL636     9      29.00
## 10 ylnC-48     9       0.25

population.data could also be an OTU table. Then it must be a matrix or data frame with N rows and T columns. N is the number of OTUs and T is the number of time steps. rownames of this matrix must be OTU names (the same as names in attributes, numeric or character, order is not important) and colnames must be time steps or generation (numeric). To use this type of population.data you should set data.method=“table.” There is no rule for the order of rows and columns as long as the rownames and colnames correspond the correct rows and columns, respectively. As an example of this type of data you can load PopulationDataTable. Here you can see the first 14 columns of this data frame.

data("PopulationDataTable")
PopulationDataTable[,1:14]
##               1  2  3  4  5  6  7  8     9    10   11    12    13    14
## RBL636       29 29 29 29 29 29 29 29 29.00 29.00 29.0 29.00 28.50 28.15
## rkd D5891 bc  0  0  0  0  0  0  0  0  0.00  0.00  0.0  0.00  0.25  0.50
## ylnC-48       0  0  0  0  0  0  0  0  0.25  0.35  0.5  0.75  0.85  1.00
## iglK f12      0  0  0  0  0  0  0  0  0.00  0.00  0.0  0.00  0.00  0.00
## BAeR G11      0  0  0  0  0  0  0  0  0.00  0.00  0.0  0.00  0.00  0.00
## nuhj-25       0  0  0  0  0  0  0  0  0.00  0.00  0.0  0.00  0.00  0.00
## HwrK-41       0  0  0  0  0  0  0  0  0.00  0.00  0.0  0.00  0.00  0.00
## QecF*22       0  0  0  0  0  0  0  0  0.00  0.00  0.0  0.00  0.00  0.00
  1. data.method: Type of data provided as population.data. It must be one of “list” or “table.” For more information see population.data above.

  2. time.interval.method: This must be one of “linear” or “equal.” By this input you determine how should be the distance between two successive time steps or generations. If “equal” is determined then the distances would be equal regardless of the value of time steps or generations otherwise the distance is determined by the real difference of two successive time steps or generations.

How to use Muller.plot function

Muller.plot function is easily used by passing its arguments. You can provide any graphical parameter. The function by default do not provide any legend for the plot but you can use output of the function to add legend to your plots.

Here, we use data.method=“list”:

data(PopulationDataList)
data(Attributes)
layout(matrix(c(1,2), nrow = 1), widths = c(0.8, 0.2))
par(mar=c(3, 3, 1, 0))
m.plot <- Muller.plot(attributes = Attributes, population.data = PopulationDataList,
                      data.method = "list", time.interval.method = "equal",mgp=c(2,0.5,0))
## Processing time steps:
## 1  |2  |3  |4  |5  |6  |7  |8  |9  |10  |11  |12  |13  |14  |15  |16  |17  |18  |19  |20  |21  |22  |23  |24  |25  |26  |27  |28  |29  |30  |31  |32  |33  |34  |35  |36  |37  |38  |39  |40  |41  |42  |43  |44  |45  |46  |47  |48  |49  |50  |51  |52  |53  |54  |55  |56  |57  |58  |59  |60  |61  |62  |63  |64  |65  |66  |67  |68  |69  |70  |71  |72  |73  |74  |75  |76  |77  |78  |79  |80  |81  |82  |83  |84  |85  |86  |87  |88  |89  |90  |91  |92  |93  |94  |95  |96  |97  |98  |99  |100  |101  |
plot(0,0,axes = F)
par(mar=c(0, 0, 0, 0))
legend("right", legend = m.plot$name,col = as.character(m.plot$color),pch = 19)

If for example, you prefer to have the OTU “rkd D5891 bc” above its sister “ylnC-48” and “BAeR G11” above its sister “HwrK-41” just change their appearance order in the attribute.

Attributes <- Attributes[c(1,3,2,4,7,5,6,8),]
layout(matrix(c(1,2), nrow = 1), widths = c(0.8, 0.2))
par(mar=c(3, 3, 1, 0))
m.plot <- Muller.plot(attributes = Attributes, population.data = PopulationDataList,
                      data.method = "list", time.interval.method = "equal",mgp=c(2,0.5,0))
## Processing time steps:
## 1  |2  |3  |4  |5  |6  |7  |8  |9  |10  |11  |12  |13  |14  |15  |16  |17  |18  |19  |20  |21  |22  |23  |24  |25  |26  |27  |28  |29  |30  |31  |32  |33  |34  |35  |36  |37  |38  |39  |40  |41  |42  |43  |44  |45  |46  |47  |48  |49  |50  |51  |52  |53  |54  |55  |56  |57  |58  |59  |60  |61  |62  |63  |64  |65  |66  |67  |68  |69  |70  |71  |72  |73  |74  |75  |76  |77  |78  |79  |80  |81  |82  |83  |84  |85  |86  |87  |88  |89  |90  |91  |92  |93  |94  |95  |96  |97  |98  |99  |100  |101  |
plot(0,0,axes = F)
par(mar=c(0, 0, 0, 0))
legend("right", legend = m.plot$name,col = as.character(m.plot$color),pch = 19)

In the next plot we use data.method=“table.” So we should load “PopulationDataTable.”

data(PopulationDataTable)
par(mar=c(3, 3, 1, 1))
Attributes[,3] <- rainbow(8)
m.plot <- Muller.plot(attributes = Attributes, population.data = PopulationDataTable,
                      data.method = "table", time.interval.method = "equal",mgp=c(2,0.5,0))
## Processing time steps:
## 1  |2  |3  |4  |5  |6  |7  |8  |9  |10  |11  |12  |13  |14  |15  |16  |17  |18  |19  |20  |21  |22  |23  |24  |25  |26  |27  |28  |29  |30  |31  |32  |33  |34  |35  |36  |37  |38  |39  |40  |41  |42  |43  |44  |45  |46  |47  |48  |49  |50  |51  |52  |53  |54  |55  |56  |57  |58  |59  |60  |61  |62  |63  |64  |65  |66  |67  |68  |69  |70  |71  |72  |73  |74  |75  |76  |77  |78  |79  |80  |81  |82  |83  |84  |85  |86  |87  |88  |89  |90  |91  |92  |93  |94  |95  |96  |97  |98  |99  |100  |101  |

If you have samples from time/generations with different time intervals you can use time.interval.method=“linear” to show the real interval between samples:

colnames(PopulationDataTable) <- c(seq(0,1000,50),seq(1050,3000,100),seq(3500,33000,500))
par(mar=c(3, 3, 1, 1))
Attributes[,3] <- rainbow(8)
m.plot <- Muller.plot(attributes = Attributes, population.data = PopulationDataTable,
                      data.method = "table", time.interval.method = "linear",mgp=c(2,0.5,0))
## Processing time steps:
## 0  |50  |100  |150  |200  |250  |300  |350  |400  |450  |500  |550  |600  |650  |700  |750  |800  |850  |900  |950  |1000  |1050  |1150  |1250  |1350  |1450  |1550  |1650  |1750  |1850  |1950  |2050  |2150  |2250  |2350  |2450  |2550  |2650  |2750  |2850  |2950  |3500  |4000  |4500  |5000  |5500  |6000  |6500  |7000  |7500  |8000  |8500  |9000  |9500  |10000  |10500  |11000  |11500  |12000  |12500  |13000  |13500  |14000  |14500  |15000  |15500  |16000  |16500  |17000  |17500  |18000  |18500  |19000  |19500  |20000  |20500  |21000  |21500  |22000  |22500  |23000  |23500  |24000  |24500  |25000  |25500  |26000  |26500  |27000  |27500  |28000  |28500  |29000  |29500  |30000  |30500  |31000  |31500  |32000  |32500  |33000  |

How to cite

citation(package = "MullerPlot")
## 
## To cite package 'MullerPlot' in publications use:
## 
##   Farnoush Farahpour, Mohammadkarim Saeedghalati and Daniel Hoffmann
##   (2022). MullerPlot: Generates Muller Plot from
##   Population/Abundance/Frequency Dynamics Data. R package version
##   0.1.3.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {MullerPlot: Generates Muller Plot from Population/Abundance/Frequency Dynamics Data},
##     author = {Farnoush Farahpour and Mohammadkarim Saeedghalati and Daniel Hoffmann},
##     year = {2022},
##     note = {R package version 0.1.3},
##   }

References

Herron, Matthew D., and Michael Doebeli. 2013. Parallel Evolutionary Dynamics of Adaptive Diversification in Escherichia coli.” PLoS Biol. 11 (2): 1–11. https://doi.org/10.1371/journal.pbio.1001490.
Kvitek, Daniel J., and Gavin Sherlock. 2013. Whole Genome, Whole Population Sequencing Reveals That Loss of Signaling Networks Is the Major Adaptive Strategy in a Constant Environment.” PLoS Genet. 9 (11). https://doi.org/10.1371/journal.pgen.1003972.
Maddamsetti, Rohan, Richard E Lenski, and Jeffrey E Barrick. 2015. Adaptation, Clonal Interference, and Frequency Dependent Interactions in a Long Term Evolution Experiment with Escherichia coli.” Genetics 200 (2): 619–31. https://doi.org/10.1534/genetics.115.176677.
Muller, Hermann Joseph. 1932. “Some Genetic Aspects of Sex.” The American Naturalist 66 (703): 118–38.