taxonomy {CHNOSZ} | R Documentation |
Read data from NCBI taxonomy files, traverse taxonomic ranks, get scientific names of taxonomic nodes.
getnodes(taxdir)
getrank(id, taxdir, nodes=NULL)
parent(id, taxdir, rank=NULL, nodes=NULL)
allparents(id, taxdir, nodes=NULL)
getnames(taxdir)
sciname(id, taxdir, names=NULL)
taxdir |
character, directory where the taxonomy files are kept. |
id |
numeric, taxonomic ID(s) of the nodes of interest. |
nodes |
data frame, output from getnodes. |
rank |
character, name of the taxonomic rank of interest. |
names |
data frame, output from getnames. |
These functions provide a convenient way to read data from NCBI taxonomy files (i.e., the contents of taxdump.tar.gz, which is available from https://ftp.ncbi.nih.gov/pub/taxonomy/).
The taxdir argument specifies the directory where the nodes.dmp and names.dmp files are located. getnodes and getnames read these files into data frames. getrank returns the rank (species, genus, etc.) of the node with the given taxonomic id. parent returns the taxonomic ID of the parent node, that is, the next node toward the root from the one specified by id, unless rank is supplied, in which case the function moves toward the root until a node with that rank is found. allparents returns the taxonomic IDs of all nodes between that specified by id and the root of the tree, inclusive. sciname returns the scientific name of the node with the given id.
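The file layout read by getnodes and getnames can be illustrated with a small base R sketch. It assumes the usual NCBI dump layout, in which fields are separated by a tab, a vertical bar and a tab, and each line ends with a tab and a vertical bar. The read_dmp helper, the two toy lines (which are truncated to three fields and are not real records) and the column names id, parent and rank are assumptions made for illustration; this is not the code used by CHNOSZ.

```r
# Two made-up lines in the nodes.dmp layout, truncated to three fields:
# fields are separated by "\t|\t" and each line ends with "\t|"
dmp_lines <- c(
  "9605\t|\t9604\t|\tgenus\t|",
  "9606\t|\t9605\t|\tspecies\t|"
)
dmp_file <- tempfile(fileext = ".dmp")
writeLines(dmp_lines, dmp_file)

# Hypothetical reader (not part of CHNOSZ): split each line into fields
# and keep as many leading fields as there are column names
read_dmp <- function(file, col_names) {
  lines <- readLines(file)
  lines <- sub("\t\\|$", "", lines)                  # drop the line terminator
  fields <- strsplit(lines, "\t|\t", fixed = TRUE)   # split on the field separator
  kept <- lapply(fields, `[`, seq_along(col_names))
  out <- as.data.frame(do.call(rbind, kept), stringsAsFactors = FALSE)
  names(out) <- col_names
  out
}

# Column names are assumed here; ids are converted from character to numeric
toy_nodes <- read_dmp(dmp_file, c("id", "parent", "rank"))
toy_nodes$id <- as.numeric(toy_nodes$id)
toy_nodes$parent <- as.numeric(toy_nodes$parent)
```

Reading the whole file once and keeping the resulting data frame is what makes repeated lookups cheap, since later queries then work on the in-memory table.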
The id argument can be of length greater than 1 except for allparents. If getrank, parent, allparents or sciname needs to be called repeatedly, the operation can be sped up by supplying the output of getnodes in the nodes argument and/or the output of getnames in the names argument.
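The traversal described for parent and allparents can be sketched on a toy table in base R. The sketch assumes a node table with columns id, parent and rank and a root node that is its own parent. The tree is deliberately simplified (intermediate ranks are omitted), and walk_to_root and first_with_rank are hypothetical helpers written for illustration, not the CHNOSZ implementation.

```r
# Toy node table; the column names (id, parent, rank) are assumed,
# and intermediate ranks are left out to keep the tree small
nodes <- data.frame(
  id     = c(1, 7711, 9605, 9606),
  parent = c(1, 1, 7711, 9605),
  rank   = c("no rank", "phylum", "genus", "species"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect every node from id up to the root, inclusive
walk_to_root <- function(id, nodes) {
  path <- id
  repeat {
    up <- nodes$parent[match(id, nodes$id)]
    if (is.na(up) || up == id) break   # stop at the root (its own parent)
    path <- c(path, up)
    id <- up
  }
  path
}

# Hypothetical helper: first node on the path to the root with the given rank
first_with_rank <- function(id, rank, nodes) {
  path <- walk_to_root(id, nodes)
  ranks <- nodes$rank[match(path, nodes$id)]
  path[match(rank, ranks)]
}

lineage <- walk_to_root(9606, nodes)
phylum_id <- first_with_rank(9606, "phylum", nodes)
```

Both helpers take a single id, which mirrors the restriction noted above for allparents; looking up several ids would need a loop or sapply around them.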
## Get information about Homo sapiens from the
## packaged taxonomy files
taxdir <- system.file("extdata/taxonomy", package = "CHNOSZ")
# H. sapiens' taxonomic id
id1 <- 9606
# That is a species
getrank(id1, taxdir)
# The next step up the taxonomy
id2 <- parent(id1, taxdir)
print(id2)
# That is a genus
getrank(id2, taxdir)
# That genus is "Homo"
sciname(id2, taxdir)
# We can ask what phylum it is part of
id3 <- parent(id1, taxdir, "phylum")
# Answer: "Chordata"
sciname(id3, taxdir)
# H. sapiens' complete taxonomy
id4 <- allparents(id1, taxdir)
sciname(id4, taxdir)
## The names of the organisms in the supplied taxonomy files
taxdir <- system.file("extdata/taxonomy", package = "CHNOSZ")
id5 <- c(83333, 4932, 9606, 186497, 243232)
sciname(id5, taxdir)
# These are not all species, though
# (those with "no rank" are something like strains,
# e.g. Escherichia coli K-12)
getrank(id5, taxdir)
# Find the species for each of these
id6 <- sapply(id5, function(x) parent(x, taxdir = taxdir, rank = "species"))
unique(getrank(id6, taxdir)) # "species"
# Note that the K-12 is dropped
sciname(id6, taxdir)
## The complete nodes.dmp and names.dmp files are quite large,
## so it helps to store them in memory when performing multiple queries
## (this gives no noticeable speed-up for the small files in this example)
taxdir <- system.file("extdata/taxonomy", package = "CHNOSZ")
nodes <- getnodes(taxdir = taxdir)
# All of the node ids in this file
id7 <- nodes$id
# All of the non-leaf nodes
id8 <- unique(parent(id7, nodes = nodes))
names <- getnames(taxdir = taxdir)
sciname(id8, names = names)