get_metrics {chem16S} | R Documentation |
Calculate chemical metrics of community reference proteomes
Description
Combines taxonomic classifications with reference proteomes for taxa to get amino acid compositions of community reference proteomes. Amino acid compositions are used to calculate chemical metrics.
Usage
get_metrics(RDP, map, refdb = "GTDB_220", taxon_AA = NULL, groups = NULL,
return_AA = FALSE, zero_AA = NULL, metrics = c("Zc", "nO2", "nH2O"))
Arguments
RDP |
data frame, taxonomic abundances produced by |
map |
data frame, taxonomic mapping produced by |
refdb |
character, name of reference database (‘GTDB_220’ or ‘RefSeq_206’) |
taxon_AA |
data frame, amino acid compositions of taxa, used to bypass |
groups |
list of indexing vectors, samples to be aggregated into groups |
return_AA |
logical, return the amino acid composition for each sample instead of the chemical metrics? |
zero_AA |
character, three-letter abbreviation(s) of amino acid(s) to assign zero counts for calculating chemical metrics |
metrics |
character, the chemical metrics to calculate |
Details
get_metrics
calculates selected chemical metrics for the community reference proteome in each sample.
The community reference proteome is computed from the amino acid compositions of reference proteomes for taxa (obtained from the reference database in refdb
), multiplied by taxonomic abundances given in RDP
.
RDP
may include results from the RDP Classifier (read using read_RDP
) or derived from the OTU table of a phyloseq-class
object (see ps_taxacounts
).
map
defines the taxonomic mapping between RDP
and refdb
.
Then, chemical metrics are calculated from the amino acid composition of the community reference proteome.
The default chemical metrics are carbon oxidation state (ZC), stoichiometric oxidation state (nO2), and stoichiometric hydration state (nH2O).
See calc_metrics
for other available metrics.
groups
, if given, is a list of one or more indexing vectors (with logical or numeric values) corresponding to samples whose taxonomic classifications are aggregated into groups before calculating amino acid compositions and chemical metrics.
Value
A data frame with one row for each sample, corresponding to columns 5 and above of RDP
.
The sample names are in the first column, which is named Run
by default, or group
if the groups
argument is provided.
The remaining columns have numeric values and are named for each of the calculated metrics
.
See Also
Examples
## First two examples are for RDP Classifier with default training set
## and mapping to NCBI taxonomy with RefSeq reference proteomes
# Get chemical metrics for all samples in a dataset
RDPfile <- system.file("extdata/RDP/BGPF13.tab.xz", package = "chem16S")
RDP <- read_RDP(RDPfile)
map <- map_taxa(RDP, refdb = "RefSeq_206")
# This is a data frame with 14 rows and Run, Zc, nO2, and nH2O columns
(metrics <- get_metrics(RDP, map, refdb = "RefSeq_206"))
# Read the metadata file
mdatfile <- system.file("extdata/metadata/BGPF13.csv", package = "chem16S")
# Create list with metadata and metrics in same sample order
mdat <- get_metadata(mdatfile, metrics)
# Calculate metrics for aggregated samples of Archaea and Bacteria
groups <- list(A = mdat$metadata$domain == "Archaea",
B = mdat$metadata$domain == "Bacteria")
# This is a data frame with 2 rows and group, Zc, nO2, and nH2O columns
get_metrics(RDP, map, refdb = "RefSeq_206", groups = groups)
# Classifications were made using the RDP Classifer retrained with GTDB r220
RDPfile.GTDB <- system.file("extdata/RDP-GTDB_220/BGPF13.tab.xz", package = "chem16S")
RDP.GTDB <- read_RDP(RDPfile.GTDB)
# These use the default option of refdb = "GTDB_220"
map.GTDB <- map_taxa(RDP.GTDB)
metrics.GTDB <- get_metrics(RDP.GTDB, map.GTDB)
# Plot Zc from GTDB vs RefSeq
xylim <- range(metrics$Zc, metrics.GTDB$Zc)
plot(metrics$Zc, metrics.GTDB$Zc, xlim = xylim, ylim = xylim, type = "n")
lines(xylim, xylim, lty = 2, col = 8)
points(metrics$Zc, metrics.GTDB$Zc, pch = mdat$metadata$pch, bg = mdat$metadata$col)
md.leg <- mdat$metadata[1:2, ]
legend("bottomright", md.leg$domain, pch = md.leg$pch, pt.bg = md.leg$col)
title(quote(italic(Z)[C]~"from GTDB vs RefSeq"))
# To exclude tryptophan, tyrosine, and phenylalanine
# from the calculation of chemical metrics
get_metrics(RDP.GTDB, map.GTDB, zero_AA = c("Trp", "Tyr", "Phe"))