From a species list to cleaning names to a map of their occurrences


rOpenSci package: taxize



Load libraries

library("taxize")

Most of us will start out with a species list, something like the one below. Note that each of the names is spelled incorrectly.

splist <- c("Helanthus annuus", "Pinos contorta", "Collomia grandiflorra", "Abies magnificaa",
    "Rosa california", "Datura wrighti", "Mimulus bicolour", "Nicotiana glauca",
    "Maddia sativa", "Bartlettia scapposa")

There are many ways to resolve taxonomic names in taxize. Of course, the ideal name resolver will do the work behind the scenes for you so that you don’t have to do things like fuzzy matching. There are a few services in taxize like this we can choose from: the Global Names Resolver service from EOL (see function gnr_resolve) and the Taxonomic Name Resolution Service from iPlant (see function tnrs). In this case let’s use the function tnrs.

# The tnrs function accepts a vector of 1 or more
splist_tnrs <- tnrs(query = splist, getpost = "POST", source_ = "iPlant_TNRS")

## Calling http://taxosaurus.org/retrieve/bf0b4ae7f3c3c9a0d7f50854f6c0997f

# Remove some fields
(splist_tnrs <- splist_tnrs[, !names(splist_tnrs) %in% c("matchedName", "annotations",
    "uri")])

##            submittedName         acceptedName    sourceId score
## 3       Helanthus annuus    Helianthus annuus iPlant_TNRS  0.98
## 1         Pinos contorta       Pinus contorta iPlant_TNRS  0.96
## 4  Collomia grandiflorra Collomia grandiflora iPlant_TNRS  0.99
## 5       Abies magnificaa      Abies magnifica iPlant_TNRS  0.98
## 10       Rosa california     Rosa californica iPlant_TNRS  0.99
## 9         Datura wrighti      Datura wrightii iPlant_TNRS  0.98
## 7       Mimulus bicolour      Mimulus bicolor iPlant_TNRS  0.98
## 8       Nicotiana glauca     Nicotiana glauca iPlant_TNRS     1
## 6          Maddia sativa         Madia sativa iPlant_TNRS  0.97
## 2    Bartlettia scapposa   Bartlettia scaposa iPlant_TNRS  0.98

# Note the scores. They suggest that there were no perfect matches, but
# they were all very close, ranging from 0.77 to 0.99 (1 is the highest).
# Let's assume the names in the 'acceptedName' column are correct (and
# they should be).

# So here's our updated species list
(splist <- as.character(splist_tnrs$acceptedName))

##  [1] "Helianthus annuus"    "Pinus contorta"       "Collomia grandiflora" "Abies magnifica"
##  [5] "Rosa californica"     "Datura wrightii"      "Mimulus bicolor"      "Nicotiana glauca"
##  [9] "Madia sativa"         "Bartlettia scaposa"

Another thing we may want to do is collect common names for our taxa.

tsns <- get_tsn(searchterm = splist, searchtype = "sciname", verbose = FALSE)
comnames <- lapply(tsns, getcommonnamesfromtsn)

## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=36616
## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=183327
## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=31037
## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=181834
## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=24818
## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=30521
## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=33245
## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=30574
## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=38040
## http://www.itis.gov/ITISWebService/services/ITISService/getCommonNamesFromTSN?tsn=36822

# Unfortunately, common names are not standardized like species names, so there are multiple common
# names for each taxon
sapply(comnames, length)

##  [1] 3 3 3 3 3 3 3 3 3 3

# So let's just take the first common name for each species
comnames_vec <- do.call(c, lapply(comnames, function(x) as.character(x[1, "comname"])))

# And we can make a data.frame of our scientific and common names
(allnames <- data.frame(spname = splist, comname = comnames_vec))

##                  spname                       comname
## 1     Helianthus annuus              common sunflower
## 2        Pinus contorta                lodgepole pine
## 3  Collomia grandiflora        largeflowered collomia
## 4       Abies magnifica                    golden fir
## 5      Rosa californica           California wildrose
## 6       Datura wrightii            sacred thorn-apple
## 7       Mimulus bicolor yellow and white monkeyflower
## 8      Nicotiana glauca                  tree tobacco
## 9          Madia sativa                 coast tarweed
## 10   Bartlettia scaposa                Bartlett daisy

Another common task is getting the taxonomic tree upstream from your study taxa. We often know what family or order our taxa are in, but it we often don’t know the tribes, subclasses, and superfamilies. taxize provides many avenues to getting classifications. Two of them are accessible via a single function (classification): the Integrated Taxonomic Information System (ITIS) and National Center for Biotechnology Information (NCBI); and via the Catalogue of Life (see function col_classification):

# Let's get classifications from ITIS using Taxonomic Serial Numbers. Note that we could use uBio
# instead.
class_list <- classification(tsns)

## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=36616
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=183327
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=31037
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=181834

## Error: Empty reply from server

# And we can attach these names to our allnames data.frame
library("plyr")
gethiernames <- function(x) {
    temp <- x[, c("rankName", "taxonName")]
    values <- data.frame(t(temp[, 2]))
    names(values) <- temp[, 1]
    return(values)
}
class_df <- ldply(class_list, gethiernames)

## Error: object 'class_list' not found

allnames_df <- merge(allnames, class_df, by.x = "spname", by.y = "Species")

## Error: error in evaluating the argument 'y' in selecting a method for function 'merge': Error: object
## 'class_df' not found

# Now that we have allnames_df, we can start to see some relationships among species simply by their
# shared taxonomic names
allnames_df[1:2, ]

## Error: object 'allnames_df' not found

Using the species list, with the corrected names, we can now search for occurrence data. The Global Biodiversity Information Facility (GBIF) has the largest collection of records data, and has a API that we can interact with programmatically from R.

library("rgbif")
library("ggplot2")

Get occurences

occurr_list <- occurrencelist_many(as.character(allnames$spname), coordinatestatus = TRUE, maxresults = 100,
    fixnames = "change")

Make a map

gbifmap_list(occurr_list) +
	guides(col = guide_legend(title = ", nrow = 3, byrow = TRUE)) +
	theme(legend.position = "bottom", legend.key = element_blank()) +
	coord_equal()

map