taxize v0.3.0 update - a new data source, taxonomy in writing, and uBio examples

  Scott Chamberlain


May 20, 2014

We just released v0.3 of taxize. For details on the update, see the release notes.

Some new features

  • New function iplant_resolve() to do name resolution using the iPlant name resolution service. Note, this is different from http://taxosaurus.org/ that is wrapped in the tnrs() function.
  • New function ipni_search() to search for names in the International Plant Names Index (IPNI). See below for more.
  • New function resolve() that unifies name resolution services from iPlant’s name resolution service (via iplant_resolve()), Taxosaurus’ TNRS (via tnrs()), and GNR’s name resolution service (via gnr_resolve()).
  • All get_ functions now returning a new uri attribute that is a link to the taxon on the web. If NA is given back (e.g. nothing found), the uri attribute is blank. You can go directly to the uri in your default browser by doing, for example: browseURL(attr(result, "uri")).

Updating to v0.3

Since taxize is not updated to v0.3 on CRAN yet at the time of writing this, install taxize from GitHub:

devtools::install_github("ropensci/taxize")

Then load taxize

library("taxize")

International Plant Names Index (IPNI)

We added the IPNI as a new data source in taxize in v0.3. Currently, there is only one function to interact with IPNI: ipni_search(). What follows are a few examples of how you can use ipni_search().

Search for the genus Brintonia

ipni_search(genus='Brintonia')[,c(1:3)]
##         id version     family
## 1   7996-1     1.3 Asteraceae
## 2 296073-2     1.3 Asteraceae
## 3  36551-2     1.3 Asteraceae
## 4 186337-1     1.3 Asteraceae

Search for the species Pinus contorta

head(ipni_search(genus='Pinus', species='contorta')[,c(1:3)])
##           id     version   family
## 1   262873-1 1.1.2.1.1.2 Pinaceae
## 2   262872-1 1.2.2.1.1.1 Pinaceae
## 3 30000492-2     1.1.2.1 Pinaceae
## 4   196950-2         1.4 Pinaceae
## 5   921291-1         1.4 Pinaceae
## 6   196949-2         1.5 Pinaceae

Different output formats (the default is minimal)

head(ipni_search(genus='Ceanothus')[,c(1:3)])
##           id     version     family
## 1    55268-3         1.1 Rhamnaceae
## 2 30006383-2 1.1.2.1.1.3 Rhamnaceae
## 3    55269-3         1.1 Rhamnaceae
## 4    33421-1         1.5 Rhamnaceae
## 5 60461578-2         1.1 Rhamnaceae
## 6   331948-2         1.4 Rhamnaceae
head(ipni_search(genus='Ceanothus', output='extended'))[,c(1:3)]
##           id     version     family
## 1    55269-3         1.1 Rhamnaceae
## 2    33421-1         1.5 Rhamnaceae
## 3    55268-3         1.1 Rhamnaceae
## 4 30006383-2 1.1.2.1.1.3 Rhamnaceae
## 5 60461578-2         1.1 Rhamnaceae
## 6   331948-2         1.4 Rhamnaceae

If you do something wrong, you get a message, and the actual output is NA

ipni_search(genus='Brintoniaasasf')
## Warning: No data found
## [1] NA

uBio examples

Until now, we have had functions to interact with uBio’s API, but it probably hasn’t been too clear how to use them, and they were a little buggy for sure. We have squashed many bugs in ubio functions. Here is an example workflow of how to use ubio functions.

Search uBio by taxonomic name. This is sort of the entry point for uBio where you can search by taxonomic name, from which you can get namebankID’s that can be passed to the ubio_classification_search and ubio_namebankID functions

lapply(ubio_search(searchName = 'elephant'), head)
## $scientific
##   namebankid       namestring   fullnamestring packageid packagename
## 1    6938660 Cerylon elephant Cerylon elephant        80 Cerylonidae
##   basionymunit rankid rankname
## 1      6938660     24  species
##
## $vernacular
##   namebankid    namestring languagecode    languagename packageid
## 1    8118711 Elephant fish          115 Creole, English         3
## 2    8118714 Elephant fish          115 Creole, English         3
## 3    8118726 Elephant fish          115 Creole, English         3
## 4    8115700 Elephant fish          115 Creole, English         3
## 5    8115663 Elephant fish          115 Creole, English         3
## 6    8114377 Elephant fish          115 Creole, English      2463
##   packagename namebankidlink   namestringlink
## 1      Pisces         132263 Mormyrus tapirus
## 2      Pisces         132258 Mormyrus tapirus
## 3      Pisces         181174 Mormyrus tapirus
## 4      Pisces         128971 Mormyrus tapirus
## 5      Pisces         128972 Mormyrus tapirus
## 6  Mormyridae        2299821 Mormyrus tapirus
##                  fullnamestringlink
## 1 Mormyrus tapirus Pappenheim, 1905
## 2 Mormyrus tapirus Pappenheim, 1905
## 3 Mormyrus tapirus Pappenheim, 1905
## 4 Mormyrus tapirus Pappenheim, 1905
## 5 Mormyrus tapirus Pappenheim, 1905
## 6 Mormyrus tapirus Pappenheim, 1905
id <- ubio_search(searchName = 'elephant')$scientific$namebankid[1]
## Error: CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

ubio_id

Get data on a specific uBio namebankID. Use the id from the previous code block

ubio_id(namebankID = id)
## $data
##   namebankid       namestring   fullnamestring packageid packagename
## 1    6938660 Cerylon elephant Cerylon elephant        80 Cerylonidae
##   basionymunit rankid rankname
## 1      6938660     24  Species
##
## $synonyms
## NULL
##
## $vernaculars
## NULL
##
## $cites
## NULL
##
## $mappings
## NULL

Return hierarchiesID that refer to the given namebankID

ubio_classification_search(namebankID = 3070378)
##   hierarchiesid classificationtitleid              classificationtitle
## 1       2477072                    84                    NCBI Taxonomy
## 2      11166818                   100                    NCBI Taxonomy
## 3      17950600                   104 uBiota 2008-03-20T10:36:50-04:00

ubio_classification

Return all ClassificationBank data pertaining to a particular hierarchiesID

ubio_classification(hierarchiesID = 2483153)
## Error: XML content does not seem to be XML: ''

ubio_synonyms

Search for taxonomic synonyms by hierarchiesID

ubio_synonyms(hierarchiesID = 4091702)
## Error: invalid subscript type 'list'

Examples of using taxize in writing

Let’s say one is writing a paragraph in which you are using taxonomic or common names, and perhaps you want to have the number of taxa in a particular group. You can write a paragaph like:

I studied the common weed species _Tragopogon dubius_ (`r sci2comm('Tragopogon dubius', db='itis')[[1]][1]`; `r tax_name(query = "Tragopogon dubius", get = "family", db = "ncbi")[[1]]`) and _Cirsium arvense_ (`r sci2comm('Cirsium arvense', db='itis')[[1]][1]`; `r tax_name(query = "Cirsium arvense", get = "family", db = "ncbi")[[1]]`).

Which renders to:

I studied the common weed species Tragopogon dubius (yellow salsify; Asteraceae) and Cirsium arvense (Canada thistle; Asteraceae).

Notice how inside backticks you can execute code by starting with an r, then doing something like searching for common names for a taxon.

Another example:

We found that `r sci2comm('Tragopogon dubius', db='itis')[[1]][1]` was very invasive.

Renders to:

We found that yellow salsify was very invasive.

Another example:

There are `r nrow(downstream('Tragopogon', db = "col", downto = "Species")$Tragopogon)` species (source: Catalogue of Life) in the _Tragopogon_ genus, meaning there is much more to study :)

Renders to:

There are 142 species (source: Catalogue of Life) in the Tragopogon genus, meaning there is much more to study :)

el fin

Please do update to v0.3, try it out, report bugs, and get back to us with any questions!