Retrieve taxonomic information with taxize

taxize is a powerful R package that lets the user search many taxonomic data sources for species names (scientific and common) and download upstream and downstream taxonomic hierarchies, among other things.

Here is an example of retrieving taxonomic information from ITIS:

 

library(taxize)
library(plyr)

# download an example dataset
macro <- read.csv(url("http://www.biomonitor.it/wp-content/uploads/2017/10/macro_example.csv"))

# extract the Taxa column
taxa <-  macro$Taxa

# use the classification() function from taxize
# to retrieve taxonomic information;
# you will be asked to choose the right
# taxon name from a list of possible candidates.
# classification() returns a list of data.frames;
# data for some taxa may be missing

taxa.cla <- classification(taxa, db = "itis")

# transpose the elements of the list in order to have
# a wide rather than a long dataframe format

taxa.cla.t <- lapply(taxa.cla, function(y) data.frame(t(y[1]), stringsAsFactors = FALSE))

# rename the columns of each data.frame after the taxonomic ranks

for (i in seq_along(taxa.cla.t)) {
  names(taxa.cla.t[[i]]) <- t(taxa.cla[[i]][2])
  if (is.na(names(taxa.cla.t[[i]])[1]) && length(names(taxa.cla.t[[i]])) == 1) {
    # if taxonomic information is missing for
    # a taxon, replace it with a one-column data.frame
    # called kingdom filled with Animalia
    taxa.cla.t[[i]] <- data.frame(kingdom = "Animalia", stringsAsFactors = FALSE)
  }
}

# use rbind.fill from plyr package to
# reduce the list to a data.frame

taxa.df <- rbind.fill(taxa.cla.t)
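
The reshaping steps above can be tried without a network connection or interactive prompts. The sketch below is only an illustration: `mock.cla` is a hand-made stand-in for the output of `classification()` (the id values are placeholders, not real ITIS identifiers), `to_wide` is a hypothetical helper, and the final step uses base R in place of `rbind.fill` so that it runs without plyr.

```r
# Hypothetical stand-in for the list returned by classification():
# one data.frame per taxon (columns name, rank, id) and NA for a
# taxon that was not found. The ids are placeholders.
mock.cla <- list(
  Baetis = data.frame(
    name = c("Animalia", "Arthropoda", "Insecta",
             "Ephemeroptera", "Baetidae", "Baetis"),
    rank = c("kingdom", "phylum", "class", "order", "family", "genus"),
    id = c("1", "2", "3", "4", "5", "6"),
    stringsAsFactors = FALSE
  ),
  Unknown = NA
)

# Hypothetical helper: turn one long classification into a one-row
# wide data.frame whose columns are named after the ranks.
to_wide <- function(x) {
  if (!is.data.frame(x)) {
    # same fallback as in the loop above: assume kingdom Animalia
    return(data.frame(kingdom = "Animalia", stringsAsFactors = FALSE))
  }
  wide <- as.data.frame(t(x$name), stringsAsFactors = FALSE)
  names(wide) <- x$rank
  wide
}

mock.wide <- lapply(mock.cla, to_wide)

# Base R substitute for plyr::rbind.fill: pad every data.frame with
# the ranks it lacks, then stack the rows.
all.ranks <- unique(unlist(lapply(mock.wide, names)))
mock.df <- do.call(rbind, lapply(mock.wide, function(d) {
  for (r in setdiff(all.ranks, names(d))) d[[r]] <- NA_character_
  d[all.ranks]
}))

print(mock.df)
```

Checking `is.data.frame()` before transposing keeps the missing-taxon case explicit instead of relying on an NA column name, at the cost of departing slightly from the loop shown earlier.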

 

With a little effort, the data frame can be adapted for use as a custom database in the asBiomonitor function.
