orp16S {JMDplots} | R Documentation |
Plots from the paper by Dick and Meng (2023).
orp16S_1(pdf = FALSE)
orp16S_2(pdf = FALSE)
orp16S_3(pdf = FALSE)
orp16S_4(pdf = FALSE)
orp16S_5(pdf = FALSE)
orp16S_6(pdf = FALSE, EMP_primers = FALSE)
orp16S_S1(pdf = FALSE)
orp16S_S2(pdf = FALSE)
orp16S_info(study)
orp16S_T1(samesign = FALSE)
orp16S_D3(mincount = 100)
getmdat_orp16S(study, metrics = NULL, dropNA = TRUE, size = NULL, quiet = TRUE)
getmetrics_orp16S(study, mincount = 100, quiet = TRUE, ...)
pdf |
logical, make a PDF file? |
EMP_primers |
logical, include only datasets using Earth Microbiome Project primers (515F/806R)? |
study |
character, study name |
samesign |
logical, only count slopes with same sign of minimum and maximum values in the 95% CI? |
mincount |
integer, samples with fewer than this number of RDP classifications are excluded |
metrics |
data frame, output of getmetrics_orp16S |
dropNA |
logical, exclude samples with NA name in metadata? |
size |
numeric, number of samples to randomly select |
quiet |
logical, change to FALSE to print details about data processing |
... |
additional arguments passed to read_RDP |
This table gives a brief description of each plotting function.
orp16S_1 | Thermodynamic model for the relationship between carbon oxidation state of reference proteomes and redox potential |
orp16S_2 | ZC of reference proteomes compared with oxygen tolerance and with metaproteomes |
orp16S_3 | Methods overview and chemical depth profiles in Winogradsky columns |
orp16S_4 | Sample locations on world map and Eh-pH diagram (outline is from Baas Becking et al., 1960) |
orp16S_5 | Associations between Eh7 and ZC at local scales |
orp16S_6 | Associations between Eh7 and ZC at a global scale. Set EMP_primers to TRUE to make Figure S3. |
orp16S_S1 | ZC-Eh scatterplots for each dataset |
orp16S_S2 | Comparison of Eh7, Eh, and O2 as predictors of carbon oxidation state |
orp16S_S1 creates the files ‘EZdat.csv’ and ‘EZlm.csv’, which are used by the other plotting functions and correspond to Datasets S1 and S2 in the paper.
‘EZdat.csv’ has sample data and metadata: study key, environment type, lineage (Bacteria or Archaea), sample name, SRA or MG-RAST accession number, grouping description, sample group, values of T (°C), pH, O2 concentration (μmol/L), Eh (mV), Eh7 (mV), and ZC.
‘EZlm.csv’ has regression results for each dataset and domain: study key, environment type, lineage, number of samples, Eh7 range, slope (V⁻¹) and intercept (ZC) of the linear regression, and Pearson correlation coefficient.
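The kind of per-dataset regression summary described for ‘EZlm.csv’ can be sketched in base R as follows. This is only an illustration with synthetic data: the column names (study, nsamp, Eh7min, Eh7max, slope, intercept, pearson.r) are hypothetical and are not taken from the actual file, and the conversion of Eh7 from mV to V (so that the slope has units of V⁻¹) is an assumption based on the units listed above.

```r
# Synthetic data for illustration only; these are not values from the paper
set.seed(1)
dat <- data.frame(
  study = rep(c("A", "B"), each = 20),
  Eh7 = runif(40, -300, 600)  # mV
)
dat$Zc <- -0.17 + 0.02 * dat$Eh7 / 1000 + rnorm(40, sd = 0.002)

# Fit Zc against Eh7 for one dataset
# Eh7 is converted to volts (assumption) so the slope has units of V^-1
fit_one <- function(d) {
  fit <- lm(Zc ~ I(Eh7 / 1000), data = d)
  data.frame(
    study = d$study[1],
    nsamp = nrow(d),
    Eh7min = min(d$Eh7),
    Eh7max = max(d$Eh7),
    slope = unname(coef(fit)[2]),
    intercept = unname(coef(fit)[1]),
    pearson.r = cor(d$Eh7, d$Zc)
  )
}

# One row of regression results per dataset (hypothetical layout)
EZlm_sketch <- do.call(rbind, lapply(split(dat, dat$study), fit_one))
print(EZlm_sketch)
```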
orp16S_T1 returns a table summarizing regression slopes for all datasets (based on ‘EZlm.csv’), corresponding to Table 1a in the paper.
orp16S_T1(samesign = TRUE) generates the values listed in Table 1b in the paper.
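The samesign criterion (lower and upper bounds of the 95% CI of the slope having the same sign) can be illustrated with lm and confint from base R. This is a minimal sketch on synthetic data; the helper ci_same_sign is hypothetical and is not part of JMDplots, and the package may compute the interval differently.

```r
# Synthetic data for illustration only
set.seed(42)
Eh7 <- runif(30, -0.3, 0.6)  # V
Zc_strong <- -0.17 + 0.03 * Eh7 + rnorm(30, sd = 0.002)  # clear positive trend
Zc_noise <- -0.17 + rnorm(30, sd = 0.002)                # no underlying trend

# Hypothetical helper: TRUE if both bounds of the CI for the slope
# are positive or both are negative
ci_same_sign <- function(x, y, level = 0.95) {
  fit <- lm(y ~ x)
  ci <- confint(fit, "x", level = level)
  all(ci > 0) || all(ci < 0)
}

ci_same_sign(Eh7, Zc_strong)
ci_same_sign(Eh7, Zc_noise)
```

Slopes whose interval spans zero would not be counted under this criterion.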
orp16S_D3 creates a file with the percent of RDP classifications to genus level mapped to the NCBI taxonomy, and genus names and ZC for the groups of samples in the first and fourth quartiles of Eh7 values (i.e., reducing and oxidizing conditions). This file corresponds to Dataset S3 in the paper.
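Selecting the samples in the first and fourth quartiles of Eh7 can be sketched with quantile from base R. The data are synthetic, and the choice of inclusive comparisons and of the default quantile type are assumptions, not details confirmed by the package source.

```r
# Synthetic Eh7 values (mV) for illustration only
set.seed(7)
Eh7 <- round(runif(40, -300, 600))

# Boundaries of the first and fourth quartiles
q <- quantile(Eh7, c(0.25, 0.75))

# Label the lowest quartile as reducing and the highest as oxidizing;
# samples in the middle two quartiles are left as NA
group <- ifelse(Eh7 <= q[1], "reducing",
                ifelse(Eh7 >= q[2], "oxidizing", NA))
table(group, useNA = "ifany")
```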
getmdat_orp16S gets metadata for the indicated study, calculates Eh7 from Eh and pH and T (if available), classifies samples into 3 clusters according to their Eh values, and adds columns for plot parameters (‘pch’, ‘col’).
The function also adds a column named ‘O2_umol_L’ with O2 concentration in μmol/L, which is converted from the source units of O2 concentration as needed.
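The two conversions mentioned above might look roughly like the following. This is a sketch under stated assumptions, not the package's code: it assumes that Eh is corrected to pH 7 with the temperature-dependent Nernst slope (about 59.16 mV per pH unit at 25 °C), that 25 °C is used when T is unavailable, and that one source unit for O2 is mg/L. The function names Eh7_sketch and O2_mgL_to_umolL are hypothetical.

```r
# Hypothetical helper: correct Eh (mV) to pH 7 using the Nernst slope
# ASSUMPTION: this form of the correction and the 25 degC default are
# illustrative and are not taken from the package source
Eh7_sketch <- function(Eh, pH, T_C = 25) {
  gas_const <- 8.314462618  # J mol^-1 K^-1
  faraday <- 96485.33212    # C mol^-1
  # Nernst slope in mV per pH unit at the given temperature
  slope <- 1000 * log(10) * gas_const * (T_C + 273.15) / faraday
  Eh + slope * (pH - 7)
}

# Hypothetical helper: convert O2 from mg/L to umol/L
# (molar mass of O2 is 31.998 g/mol)
O2_mgL_to_umolL <- function(O2_mgL) O2_mgL / 31.998 * 1000

Eh7_sketch(Eh = 100, pH = 5)            # acidic sample: Eh7 is lower than Eh
Eh7_sketch(Eh = 100, pH = 9, T_C = 10)  # alkaline sample: Eh7 is higher than Eh
O2_mgL_to_umolL(8)
```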
The default for dropNA (TRUE) excludes samples with an NA name in the metadata file.
If metrics is supplied, metadata are kept only for samples with available metrics, the sample metrics are placed in the same order as the remaining metadata, and the function returns a list with both ‘metadata’ and the sorted ‘metrics’.
size can be used to specify a number of samples (from among those with both metadata and metrics, if given) to be randomly selected.
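The matching of metadata to metrics and the random selection controlled by size can be sketched with match and sample from base R. The data frames and the sample identifier column (Run) are hypothetical stand-ins for the package's actual metadata and metrics tables.

```r
# Hypothetical metadata and metrics; column names are for illustration only
metadata <- data.frame(Run = c("S1", "S2", "S3", "S4", "S5"),
                       pH = c(6.5, 7.1, NA, 8.0, 5.9))
metrics <- data.frame(Run = c("S4", "S2", "S9", "S1"),
                      Zc = c(-0.15, -0.16, -0.14, -0.17))

# Keep metadata only for samples with available metrics
metadata <- metadata[metadata$Run %in% metrics$Run, ]
# Put metrics in the same order as the remaining metadata
metrics <- metrics[match(metadata$Run, metrics$Run), ]

# Randomly select 'size' samples from those with both metadata and metrics
set.seed(123)
size <- 2
keep <- sample(nrow(metadata), size)
out <- list(metadata = metadata[keep, ], metrics = metrics[keep, ])
str(out)
```

Using the same row index for both data frames keeps them aligned after subsampling.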
getmetrics_orp16S calculates chemical metrics (ZC and nH2O) for the indicated study.
... is used to supply arguments to read_RDP; note that the default of mincount = 100 was used for data processing in the paper (see plotEZ).
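The effect of mincount can be illustrated with a small table of classification counts. This is a sketch only: the layout (samples in rows, genera in columns) and the genus names are assumptions for illustration, and the tables actually handled by read_RDP may be organized differently.

```r
# Hypothetical classification counts: samples in rows, genera in columns
counts <- matrix(
  c(150,  30,  20,
     40,  10,   5,
    300, 200, 100),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), c("GenusA", "GenusB", "GenusC"))
)

mincount <- 100
# Total number of classifications in each sample
total <- rowSums(counts)
# Exclude samples with fewer than mincount classifications
kept <- counts[total >= mincount, , drop = FALSE]
rownames(kept)
```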
orp16S_info prints general information about a study (more specifically, a dataset): name, number of samples with Bacteria and Archaea (with mincount = 100), bibliographic key, range of T, pH, Eh, and Eh7, and whether the slopes of the ZC-Eh7 correlations for Bacteria and Archaea (if available) are negative or positive.
The function also returns information in a data frame, which is used to fill in Table S1 of the paper.
Pipeline for sequence data processing (uses external programs fastq-dump, vsearch, seqtk, RDP Classifier, and GNU Parallel (Tange, 2023)).
Sample metadata for each study.
RDP Classifier results combined into a single CSV file for each study, created with the classify and mkRDP functions in ‘pipeline.R’.
Shapefiles for the North American Great Lakes, downloaded from USGS (2010).
Data for all samples, created by orp16S_S1.
Linear fits between Eh7 and ZC for each dataset, created by orp16S_S1.
Outline of Eh-pH range of natural environments, digitized from Fig. 32 of Baas Becking et al. (1960).
List of strictly anaerobic and aerotolerant genera from Table S1 of Million and Raoult (2018).
File created by orp16S_D3.
Amino acid composition of proteins in metaproteomic experiments.
Scripts to process metaproteomic data for amino acid compositions.
doc/orp16S.bib
Reference keys for all datasets (all of them are listed here; only some of them are cited in the vignette orp16S.Rmd).
Baas Becking LGM, Kaplan IR and Moore D (1960) Limits of the natural environment in terms of pH and oxidation-reduction potentials. Journal of Geology 68, 243–284. https://www.jstor.org/stable/30059218
Dick JM and Meng D (2023) Community- and genome-based evidence for a shaping influence of redox potential on bacterial protein evolution. mSystems 8, e00014-23. doi:10.1128/msystems.00014-23
Million M and Raoult D (2018) Linking gut redox to human microbiome. Human Microbiome Journal 10, 27–32. doi:10.1016/j.humic.2018.07.002
Tange O (2023) GNU Parallel 20230222 ('Gaziantep'). doi:10.5281/zenodo.7668338
USGS (2010) Great Lakes and Watersheds Shapefiles. ScienceBase Catalog, U.S. Geological Survey. https://www.sciencebase.gov/catalog/item/530f8a0ee4b0e7e46bd300dd
plotEZ is used to make the ZC-Eh7 scatterplots and linear fits in orp16S_3 and orp16S_S1.
# Methods outline and Winogradsky columns
orp16S_3()
# Sample locations and Eh-pH diagram
orp16S_4()
# Summarize results for individual datasets
orp16S_T1()
# "Raw" metrics with lots of messages printed
# NOTE: the messages include values for the last two columns of Table S2 of the paper
# (% Classification to genus level and % Mapping to NCBI taxonomy)
getmetrics_orp16S("MLL+19", quiet = FALSE)
# Metrics with associated metadata
metrics <- getmetrics_orp16S("SPA+21")
mdat <- getmdat_orp16S("SPA+21", metrics = metrics)
# This has 'metadata' and 'metrics' data frames with same number of rows
str(mdat)
stopifnot(length(unique(sapply(mdat, nrow))) == 1)
# Printing info about one dataset
orp16S_info("SPA+21")
# Some keys have suffixes to retrieve subsets of data (see code of getmdat_orp16S)
orp16S_info("PCL+18_Acidic")
orp16S_info("PCL+18_Alkaline")
# There's no plotmet_orp16S, but this is what it would look like in one line
plot_metrics(getmdat_orp16S("MLL+19", getmetrics_orp16S("MLL+19")))
## Not run:
# Get unexported object with datasets in each environment type
envirotype <- JMDplots:::envirotype
# Function to get info for all datasets in one environment type
envinfo <- function(ienv) do.call(rbind, lapply(envirotype[[ienv]], orp16S_info))
# Get info for all datasets in all environment types
allenv <- lapply(1:7, envinfo)
# Put in names of environments
cbindenv <- function(a, b) cbind(envirotype = a, b)
allenv <- mapply(cbindenv, names(envirotype)[1:7], allenv, SIMPLIFY = FALSE)
allenv <- do.call(rbind, c(allenv, make.row.names = FALSE))
# Now write to CSV file for copying to Table S1
write.csv(allenv, "Table_S1_values.csv", row.names = FALSE)
## End(Not run)
# Check that references for all studies are available
mdatdir <- system.file("extdata/orp16S/metadata", package = "JMDplots")
studies <- sapply(strsplit(dir(mdatdir), ".", fixed = TRUE), "[", 1)
bibfile <- system.file("doc/orp16S.bib", package = "JMDplots")
bibentry <- bibtex::read.bib(bibfile)
stopifnot(all(studies %in% names(bibentry)))