It is usually a good idea to control for species' evolutionary history if we want robust results. Species are not independent of each other because they share ancestry, so treating them as independent data points violates the independence assumption of most statistical models. Fortunately, with the growing availability of genetic data and software, building phylogenies is getting easier and easier.
Phylomatic is an easy way to fetch phylogenies for species, especially plants, online. Thanks to packages developed by rOpenSci, we can now use Phylomatic within R. One big advantage of this is reproducibility: we can regenerate the phylogeny whenever we want without clicking buttons on the website. In addition, because most ecologists use R for downstream analyses, fetching phylogenies within R makes the workflow much more natural and easier to follow.
The basic procedure for fetching phylogenies with Phylomatic using R will be:
- Compile the species names we want to include in the phylogeny; and clean if necessary (
- Clean and prepare species names in the format to be used with Phylomatic (
- Query Phylomatic and return the phylogeny (brranching::phylomatic(); if you have hundreds of species, it is better to use Phylomatic locally with
It is possible to merge steps 2 and 3, but I prefer to keep them separate.
I assume that you already have a list of species, named as sp_list. Then we can use the phylomatic() function from the brranching package. If you do not have it installed, install it first with
sp_list = c() # put your species names inside c(), one string per species
tree = brranching::phylomatic(sp_list)
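To make the query above concrete, here is a minimal sketch with a filled-in species list. The three species are only an illustration, and fetch_tree() is a hypothetical wrapper (not part of brranching) that returns NULL with a message instead of stopping when the package is missing or the Phylomatic web service cannot be reached.

```r
# Illustrative species list; replace with your own binomials.
sp_list <- c("Quercus rubra", "Acer saccharum", "Pinus strobus")

# Hypothetical wrapper around brranching::phylomatic().
# Returns the tree, or NULL (with a message) if the package is not
# installed or the query raises an error (e.g. no network access).
fetch_tree <- function(taxa, ...) {
  if (!requireNamespace("brranching", quietly = TRUE)) {
    message("brranching is not installed; skipping the query.")
    return(NULL)
  }
  tryCatch(
    brranching::phylomatic(taxa, ...),
    error = function(e) {
      message("Phylomatic query failed: ", conditionMessage(e))
      NULL
    }
  )
}
```

Calling fetch_tree(sp_list) should then behave like brranching::phylomatic(sp_list), except that a failed query gives NULL, which is easier to handle inside a script.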
If you have only a few species, this will likely return a phylogeny containing all of them. In practice, however, you will quite possibly get a warning like this:
NOTE: 3 taxa not matched: NA/genus/species, ...
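When some taxa are not matched, it helps to know exactly which of the input names are absent from the returned tree. The sketch below compares the input names with the tip labels. It assumes the tip labels are in lower-case genus_species form (which is what I have seen from Phylomatic, but check your own tree); both function names are made up for this example.

```r
# Convert "Genus species" to the "genus_species" form assumed for tip labels.
to_tip_label <- function(x) gsub(" +", "_", tolower(trimws(x)))

# Return the input names that have no corresponding tip label.
missing_from_tree <- function(sp_list, tip_labels) {
  sp_list[!to_tip_label(sp_list) %in% tolower(tip_labels)]
}

# Offline illustration with hand-written tip labels
tips <- c("quercus_rubra", "acer_saccharum")
missing_from_tree(c("Quercus rubra", "Acer saccharum", "Pinus strobus"), tips)
# returns "Pinus strobus"
```

With a real tree, the call would be missing_from_tree(sp_list, tree$tip.label).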
In this case, we can try preparing the species names first with brranching::phylomatic_names(). Its default database is ncbi, but if you have hundreds of species, this can be slow. Instead, I would suggest using apg first because it is much faster (this is the default within brranching::phylomatic()). Then pick out the species that still have NA as family and try itis for them (these are the three databases supported). Sometimes your species names are not clean, e.g. they include synonyms; in that case the R package taxize is really handy. In addition, I find rotl::tnrs_match_names() good for checking and resolving names. This function compares species names against the Open Tree of Life.
sp_list_phylocom = brranching::phylomatic_names(sp_list, format = "isubmit", db = "ncbi")
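The staged lookup described above (fast database first, slower ones only for the leftovers) can be sketched as follows. This is not a brranching function: resolve_staged() and family_missing() are hypothetical helpers. The sketch assumes that the accepted db values are "apg", "ncbi" and "itis", that the lookup returns one string per input name in the same order, and that unresolved names come back with NA in the family slot, as in the NOTE shown earlier.

```r
# TRUE where no family has been found yet, i.e. the entry is NA or
# starts with "NA/" (assumed form of an unresolved family/genus/species).
family_missing <- function(x) is.na(x) | grepl("^NA/", x)

# Try each database in turn, but only for names still lacking a family.
# `lookup` defaults to brranching::phylomatic_names; it is an argument so
# that the logic can be tried offline with a stand-in function.
resolve_staged <- function(taxa, dbs = c("apg", "ncbi", "itis"),
                           lookup = brranching::phylomatic_names) {
  out <- rep(NA_character_, length(taxa))
  for (db in dbs) {
    todo <- family_missing(out)
    if (!any(todo)) break
    out[todo] <- as.character(lookup(taxa[todo], format = "isubmit", db = db))
  }
  out
}
```

Usage would be sp_list_phylocom = resolve_staged(sp_list). Any names that still fail family_missing() after all three databases are good candidates for manual checking with taxize or rotl::tnrs_match_names().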
Now, let’s try to fetch the phylogeny again, with the updated species list.
tree = brranching::phylomatic(sp_list_phylocom)
As mentioned earlier, it is possible to merge these two steps into one with
tree = brranching::phylomatic(sp_list_phylocom, db = "ncbi")
but I prefer to resolve species names first.
The default backbone phylogeny is the APG III tree R20120829. We can also use the Zanne et al. 2014 phylogeny instead.
tree = brranching::phylomatic(sp_list_phylocom, storedtree = "zanne2014")
plot(tree)
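Since the tree comes from a web service, it can be worth saving it to disk once it has been fetched, so that downstream analyses do not depend on the service being available. Below is a small sketch using ape::read.tree() and ape::write.tree(); cached_tree() is a hypothetical helper, and the toy Newick string simply stands in for a real query.

```r
# Hypothetical helper: read the tree from `path` if the file exists;
# otherwise call `fetch()` (any function returning an ape "phylo" object),
# save the result in Newick format, and return it.
cached_tree <- function(path, fetch) {
  if (file.exists(path)) {
    return(ape::read.tree(path))
  }
  tree <- fetch()
  ape::write.tree(tree, file = path)
  tree
}

# Offline illustration with a toy Newick string instead of a real query
toy_path <- tempfile(fileext = ".tre")
toy <- cached_tree(toy_path, function() ape::read.tree(text = "((a,b),c);"))
# The second call reads the saved file, so the fetch function is never run.
again <- cached_tree(toy_path, function() stop("should not be called"))
```

For the real tree, the fetch function would wrap the query, e.g. function() brranching::phylomatic(sp_list_phylocom, storedtree = "zanne2014"), with a file name of your choice as path.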
Finally, I have one reproducible example on GitHub that shows how to use the brranching package to get a phylogeny for plants. Feel free to check it out (and the associated paper, if you are interested)!