`TreeSearch`

is an R package that allows, among other things, parsimony search on morphological datasets that contain inapplicable data, following the algorithm proposed by Brazeau, Guillerme and Smith (2019).

In brief, this algorithm modifies the Fitch algorithm to count the total number of homoplasious events on a tree.

A companion vignette gives details on installing the package and getting up and running.

Once installed, load the inapplicable package into R using

Here’s an example of using the package to conduct tree search. You can load your own dataset, but for now, we’ll use the Vinther *et al.* (2008) dataset that comes bundled with the package.

This dataset is small enough that it runs reasonably quickly, but its phylogenetic signal is obscure enough that it can require Ratchet searches to escape from local optima.

```
my.data <- TreeSearch::inapplicable.datasets[['Vinther2008']]
my.phyDat <- phangorn::phyDat(my.data, type='USER', levels=c(0:9, '-'))
```

```
# Set a random seed so that random functions in this document are reproducible
RNGversion("3.5.0") # Until we can require R3.6.0
set.seed(0)
```

We can generate a random tree and calculate its parsimony score thus:

```
random.tree <- RandomTree(my.phyDat)
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(random.tree)
```

`## [1] 189`

We might save some wasted time in tree search if we can start with a tree that’s a little closer to optimal: perhaps a neighbour-joining tree.

```
nj.tree <- NJTree(my.phyDat)
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(nj.tree)
```

`## [1] 85`

With the Vinther *et al.* (2008) dataset, nemerteans and brachiopods form an natural outgroup to the other taxa. If we wish, we can constrain this outgroup and avoid making tree rearrangements that would mix ingroup and outgroup taxa. This will accelerate tree search, but it’s worth thinking carefully whether you can be perfectly confident that the ingroup and outgroup are mutually monophyletic.

To constrain an outgroup, we first need to make the outgroup and ingroup monophyletic on our tree:

```
outgroup <- c('Nemertean', 'Lingula', 'Terebratulina')
rooted.tree <- EnforceOutgroup(nj.tree, outgroup)
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(rooted.tree)
```

Now let’s see whether a few nearest-neighbour interchanges can find us a better tree score. This tends to be the quickest search to run, if not the most exhaustive. Using `RootedNNISwap`

instead of `NNISwap`

stops taxa from moving from one side of the root to another, which we need to do if we’re going to enforce our outgroup constraint. Even if we are using a single-taxon outgroup, using `RootedNNISwap`

retains the position of the root, which makes tree comparison and interpretation easier later on.

```
better.tree <- TreeSearch(tree=rooted.tree, dataset=my.phyDat,
EdgeSwapper=RootedNNISwap, verbosity=3)
```

```
##
## - Performing tree search. Initial score: 85
## - Iteration 6 - Best score 85 hit 1 times.
## - Iteration 7 - Best score 85 hit 2 times.
## - Iteration 8 - Best score 85 hit 3 times.
## - Iteration 10 - Best score 85 hit 4 times.
## - Iteration 11 - Best score 85 hit 5 times.
## - Iteration 12 - Best score 85 hit 6 times.
## - Iteration 16 - Best score 85 hit 7 times.
## - Iteration 17 - Best score 85 hit 8 times.
## - Iteration 19 - Best score 85 hit 9 times.
## - Iteration 24 - Best score 85 hit 10 times.
## - Iteration 29 - Best score 85 hit 11 times.
## - Iteration 36 - Best score 85 hit 12 times.
## * Iteration 37 - New best score 84 found on 1 trees.
## - Iteration 39 - Best score 84 hit 2 times.
## * Iteration 41 - New best score 81 found on 1 trees.
## - Iteration 42 - Best score 81 hit 2 times.
## - Iteration 43 - Best score 81 hit 3 times.
## - Iteration 49 - Best score 81 hit 4 times.
## - Iteration 50 - Best score 81 hit 5 times.
## - Iteration 54 - Best score 81 hit 6 times.
## - Iteration 55 - Best score 81 hit 7 times.
## - Iteration 57 - Best score 81 hit 8 times.
## - Iteration 62 - Best score 81 hit 9 times.
## - Iteration 63 - Best score 81 hit 10 times.
## - Iteration 69 - Best score 81 hit 11 times.
## - Iteration 81 - Best score 81 hit 12 times.
## - Iteration 82 - Best score 81 hit 13 times.
## * Iteration 83 - New best score 80 found on 1 trees.
## - Iteration 84 - Best score 80 hit 2 times.
## - Iteration 85 - Best score 80 hit 3 times.
## - Iteration 89 - Best score 80 hit 4 times.
## - Final score 80 found 4 times after 100 rearrangements
```

This score is better than the 85 of the original neighbour-joining tree, but is this the best we can do?

Using NNI helps to explore the region of treespace close to the local optimum, but SPR and TBR rearrangements are better at escaping local optima, and find better trees further away in tree space. Using more hits (`maxHits`

) and more iterations (`maxIter`

) also means we’ll move closer to an optimal tree.

```
better.tree <- TreeSearch(better.tree, my.phyDat, maxHits=8, maxIter=10000,
EdgeSwapper=RootedSPRSwap, verbosity=2)
```

```
##
## - Performing tree search. Initial score: 80
## - Terminating search; hit best score 8 times.
## - Final score 80 found 8 times after 440 rearrangements
```

```
better.tree <- TreeSearch(better.tree, my.phyDat, maxHits=20, maxIter=40000,
EdgeSwapper=RootedTBRSwap, verbosity=2)
```

```
##
## - Performing tree search. Initial score: 80
## * Iteration 848 - New best score 79 found on 1 trees.
## - Terminating search; hit best score 20 times.
## - Final score 79 found 20 times after 2046 rearrangements
```

That score’s looking better, but these are only quick searches, and we could be caught in a local optimum. A more comprehensive search of tree space can be accomplished using the parsimony ratchet (Nixon, 1999). It might take a couple of minutes to run.

```
best.tree <- Ratchet(better.tree, my.phyDat, verbosity=0, ratchIter=5,
swappers=list(RootedTBRSwap, RootedSPRSwap, RootedNNISwap))
attr(best.tree, 'score') # Each tree is labelled with its score during tree search
```

`## [1] 79`

The best tree for this dataset has a score of 79. This relatively quick Ratchet search always seems to find a tree with this score.

Let’s take a look at the one it’s found on this occasion:

This is, of course, just one most parsimonious tree; with this dataset, there are many.

One way to make a strict consensus of multiple optimal trees is to collect a number of trees from independent Ratchet iterations.

For each Ratchet iteration, we’ll conduct a TBR search to scan tree space, then an NNI search to hone in on the local optimum.

Note that it’s especially important to retain the position of the root here: a single topology, if rooted in different places, can lead to an unresolved consensus.

```
some.optimal.trees <- MultiRatchet(best.tree, nSearch=20, my.phyDat, swappers=
list(RootedTBRSwap, RootedNNISwap))
```

`## Found 15 unique trees from 20 searches.`

If ten independent runs ended up at a number of different trees, there may be many more optimal trees out there to be found; perhaps we could repeat `MultiRatchet`

with `nSearch = 250`

for a more exhaustive sampling of tree space. That would take a while though; for now, let’s check out our consensus tree:

```
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
strict.consensus <- ape::consensus(some.optimal.trees)
plot(strict.consensus)
```

We might prefer a majority-rule consensus tree:

```
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
majority.consensus <- ape::consensus(some.optimal.trees, p=0.5)
plot(majority.consensus)
```

Clade support can be generated using Jackknife resampling:

```
jack.trees <- Jackknife(best.tree, my.phyDat, EdgeSwapper=RootedTBRSwap,
jackIter=25, verbosity=0)
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(strict.consensus)
jack.support <- SplitFrequency(strict.consensus, jack.trees) / length(jack.trees)
ape::nodelabels(paste('\n\n', signif(jack.support, 2)),
col=SupportColour(jack.support),
adj=0, pos=2, frame='none')
```

Equal weights produces trees that are less accurate and less precise than implied weights (Smith, 2019); equally weighted analysis should never be conducted without also considering the results of implied weights (Goloboff, 1997), ideally under a range of concavity constants (cf. Smith & Ortega-Hernández, 2014).

The simplest way to conduct an implied weights search is to use the functions

which operate in the same fashion as their equally-weighted counterparts, with the option of specifying a concavity constant (*k*) using the parameter `concavity=k`

(default = 4).

Brazeau, M. D., Guillerme, T., & Smith, M. R. (2019). An algorithm for morphological phylogenetic analysis with inapplicable data. *Systematic Biology*, in press. doi:10.1093/sysbio/syy083

Goloboff, P. A. (1997). Self-weighted optimization: tree searches and character state reconstructions under implied transformation costs. *Cladistics*, *13*(3), 225–245. Retrieved from http://dx.doi.org/10.1111/j.1096-0031.1997.tb00317.x

Nixon, K. C. (1999). The Parsimony Ratchet, a new method for rapid parsimony analysis. *Cladistics*, *15*(4), 407–414. doi:10.1111/j.1096-0031.1999.tb00277.x

Smith, M. R. (2019). Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets. *Biology Letters*, *15*(2), 20180632. doi:10.1098/rsbl.2018.0632

Smith, M. R., & Ortega-Hernández, J. (2014). *Hallucigenia*’s onychophoran-like claws and the case for Tactopoda. *Nature*, *514*(7522), 363–366. doi:10.1038/nature13576

Vinther, J., Van Roy, P., & Briggs, D. E. G. (2008). Machaeridians are Palaeozoic armoured annelids. *Nature*, *451*(7175), 185–188. doi:10.1038/nature06474