orp16S {JMDplots}    R Documentation

Influence of redox potential on bacterial protein evolution

Description

Plots from the paper by Dick and Meng (2023).

Usage

  orp16S_1(pdf = FALSE)
  orp16S_2(pdf = FALSE)
  orp16S_3(pdf = FALSE)
  orp16S_4(pdf = FALSE)
  orp16S_5(pdf = FALSE)
  orp16S_6(pdf = FALSE, EMP_primers = FALSE)

  orp16S_S1(pdf = FALSE)
  orp16S_S2(pdf = FALSE)

  orp16S_info(study)
  orp16S_T1(samesign = FALSE)
  orp16S_D3(mincount = 100)

  getmdat_orp16S(study, metrics = NULL, dropNA = TRUE, size = NULL, quiet = TRUE)
  getmetrics_orp16S(study, mincount = 100, quiet = TRUE, ...)

Arguments

pdf

logical, make a PDF file?

EMP_primers

logical, include only datasets using Earth Microbiome Project primers (515F/806R)?

study

character, study name

samesign

logical, only count slopes with same sign of minimum and maximum values in the 95% CI?

mincount

integer, samples with fewer than this number of RDP classifications are excluded

metrics

data frame, output of get_metrics

dropNA

logical, exclude samples with NA name in metadata?

size

numeric, number of samples to randomly select

quiet

logical, change to FALSE to print details about data processing

...

additional arguments passed to read_RDP

Details

This table gives a brief description of each plotting function.

orp16S_1 Thermodynamic model for the relationship between carbon oxidation state of reference proteomes and redox potential
orp16S_2 ZC of reference proteomes compared with oxygen tolerance and with metaproteomes
orp16S_3 Methods overview and chemical depth profiles in Winogradsky columns
orp16S_4 Sample locations on world map and Eh-pH diagram (outline is from Baas Becking et al., 1960)
orp16S_5 Associations between Eh7 and ZC at local scales
orp16S_6 Associations between Eh7 and ZC at a global scale. Set EMP_primers to TRUE to make Figure S3.
orp16S_S1 ZC-Eh scatterplots for each dataset
orp16S_S2 Comparison of Eh7, Eh, and O2 as predictors of carbon oxidation state

orp16S_S1 creates the files ‘EZdat.csv’ and ‘EZlm.csv’, which are used by the other plotting functions and correspond to Datasets S1 and S2 in the paper. ‘EZdat.csv’ has sample data and metadata: study key, environment type, lineage (Bacteria or Archaea), sample name, SRA or MG-RAST accession number, grouping description, sample group, values of T (°C), pH, O2 concentration (μmol/L), Eh (mV), Eh7 (mV), and ZC. ‘EZlm.csv’ has regression results for each dataset and domain: study key, environment type, lineage, number of samples, Eh7 range, slope (V⁻¹) and intercept (ZC) of the linear regression, and Pearson correlation coefficient.
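The kind of per-dataset regression summarized in ‘EZlm.csv’ can be sketched with base R. This is only an illustration under stated assumptions: the data are synthetic (not from the paper), the column names in the output row are hypothetical, and the package's own fitting code (see plotEZ) may differ in detail.

```r
# Synthetic example data (NOT from the paper): Eh7 in mV and ZC for six samples
Eh7_mV <- c(-200, -100, 0, 100, 200, 300)
noise <- c(0.001, -0.001, 0.001, -0.001, 0.001, -0.001)
ZC <- -0.16 + 0.02 * (Eh7_mV / 1000) + noise

# Regress ZC on Eh7 expressed in volts, so that the slope has units of V^-1
Eh7_V <- Eh7_mV / 1000
fit <- lm(ZC ~ Eh7_V)
slope <- unname(coef(fit)["Eh7_V"])
intercept <- unname(coef(fit)["(Intercept)"])
pearson_r <- cor(Eh7_V, ZC)

# One summary row; these column names are hypothetical, chosen for illustration
EZlm_row <- data.frame(
  nsamp = length(ZC),
  Eh7min = min(Eh7_mV),
  Eh7max = max(Eh7_mV),
  slope = slope,
  intercept = intercept,
  pearson.r = pearson_r
)
print(EZlm_row)
```

Converting Eh7 from mV to V before fitting is what gives the slope its V⁻¹ units; fitting against mV instead would scale the slope by a factor of 1/1000.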

orp16S_T1 returns a table summarizing regression slopes for all datasets (based on ‘EZlm.csv’), corresponding to Table 1a in the paper. orp16S_T1(samesign = TRUE) generates the values listed in Table 1b in the paper.

orp16S_D3 creates a file with the percentage of RDP classifications to genus level that are mapped to the NCBI taxonomy, together with genus names and ZC for the groups of samples in the first and fourth quartiles of Eh7 values (i.e., reducing and oxidizing conditions). This file corresponds to Dataset S3 in the paper.
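The samesign criterion (counting a slope only when the lower and upper limits of its 95% confidence interval have the same sign) can be sketched as follows. The two datasets are synthetic and the helper slope_ci is hypothetical; this is not the code used by orp16S_T1, only one way to express the same test with lm and confint.

```r
# Two synthetic datasets (NOT from the paper): Eh7 in V and ZC
x <- c(-0.2, -0.1, 0, 0.1, 0.2, 0.3)
datasets <- list(
  # Clear positive trend with small scatter
  A = data.frame(Eh7 = x,
    ZC = -0.16 + 0.05 * x + c(0.001, -0.001, 0.001, -0.001, 0.001, -0.001)),
  # Nearly flat trend with larger scatter
  B = data.frame(Eh7 = x,
    ZC = -0.16 + 0.002 * x + c(0.004, -0.003, 0.002, -0.004, 0.003, -0.002))
)

# Hypothetical helper: slope and its 95% confidence limits for one dataset
slope_ci <- function(dat) {
  fit <- lm(ZC ~ Eh7, data = dat)
  ci <- confint(fit, "Eh7", level = 0.95)
  c(slope = unname(coef(fit)["Eh7"]), lower = ci[1, 1], upper = ci[1, 2])
}

res <- as.data.frame(t(sapply(datasets, slope_ci)))
# TRUE when the whole 95% CI lies on one side of zero
res$samesign <- sign(res$lower) == sign(res$upper)
print(res)

# Count positive and negative slopes, restricted to those meeting the criterion
table(sign(res$slope[res$samesign]))
```

In this sketch dataset A passes the criterion and dataset B does not, because the confidence interval for B spans zero.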

getmdat_orp16S gets metadata for the indicated study, calculates Eh7 from Eh, pH, and T (if available), classifies samples into three clusters according to their Eh values, and adds columns for plot parameters (‘pch’, ‘col’). The function also adds a column named ‘O2_umol_L’ with O2 concentration in μmol/L, which is converted from the source units of O2 concentration as needed. The default for dropNA excludes samples with an NA name in the metadata file. If metrics is supplied, metadata are kept only for samples with available metrics, the sample metrics are placed in the same order as the remaining metadata, and the function returns a list with both ‘metadata’ and the sorted ‘metrics’. size can be used to specify a number of samples (from among those with both metadata and metrics, if given) to be randomly selected.
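The conversion of Eh to Eh7 can be illustrated with the standard Nernst correction to pH 7. This is a sketch under assumptions: the function name Eh7_from_Eh is hypothetical, the use of 25 °C when T is unavailable is an assumption made here, and the exact form used inside getmdat_orp16S should be checked against its source code.

```r
# Hypothetical helper: correct Eh (mV) to pH 7 using the Nernst slope.
# Assumes a default of 25 degrees C when temperature is not available.
Eh7_from_Eh <- function(Eh_mV, pH, T_C = 25) {
  R_gas <- 8.314462618     # gas constant, J K^-1 mol^-1
  Faraday <- 96485.33212   # Faraday constant, C mol^-1
  T_K <- T_C + 273.15
  # Nernst slope in mV per pH unit (about 59.16 mV at 25 degrees C)
  nernst_mV <- log(10) * R_gas * T_K / Faraday * 1000
  Eh_mV + nernst_mV * (pH - 7)
}

# At pH 7 the value is unchanged
Eh7_from_Eh(100, pH = 7)
# At pH 8 and 25 degrees C, Eh7 is about 59 mV higher than Eh
Eh7_from_Eh(0, pH = 8)
# The slope increases with temperature
Eh7_from_Eh(0, pH = 8, T_C = 60)
```

The sign convention follows from Eh decreasing with increasing pH for reactions involving equal numbers of protons and electrons, so samples measured above pH 7 are shifted upward.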

getmetrics_orp16S calculates chemical metrics (ZC and nH2O) for the indicated study. ... is used to supply arguments to read_RDP; note that the default of mincount = 100 was used for data processing in the paper (see plotEZ).
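The effect of mincount can be sketched with a small count matrix. The counts and sample names below are made up, and this is not the implementation in read_RDP; it only illustrates dropping samples whose total number of classifications falls below the threshold.

```r
# Hypothetical classification counts: rows are genera, columns are samples
counts <- cbind(
  S1 = c(60, 50, 10),
  S2 = c(20, 30, 5),
  S3 = c(500, 0, 250)
)
rownames(counts) <- c("Bacillus", "Pseudomonas", "Clostridium")

mincount <- 100
# Total number of classifications in each sample
total <- colSums(counts)
# Keep samples with at least mincount classifications
keep <- total >= mincount
filtered <- counts[, keep, drop = FALSE]

print(total)
print(colnames(filtered))
```

Here sample S2 has 55 classifications in total and is excluded, while S1 (120) and S3 (750) are retained.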

orp16S_info prints general information about a study (more specifically, a dataset): name, number of samples with Bacteria and Archaea (with mincount = 100), bibliographic key, range of T, pH, Eh, and Eh7, and whether the slopes of the ZC-Eh7 correlations for Bacteria and Archaea (if available) are negative or positive. The function also returns this information in a data frame, which is used to fill in Table S1 of the paper.

Files in extdata/orp16S

pipeline.R

Pipeline for sequence data processing (uses external programs fastq-dump, vsearch, seqtk, RDP Classifier, and GNU Parallel (Tange, 2023)).

metadata/*.csv

Sample metadata for each study.

RDP/*.csv.xz

RDP Classifier results combined into a single CSV file for each study, created with the classify and mkRDP functions in ‘pipeline.R’.

hydro_p

Shapefiles for the North American Great Lakes, downloaded from USGS (2010).

EZdat.csv

Data for all samples, created by orp16S_S1.

EZlm.csv

Linear fits between Eh7 and ZC for each dataset, created by orp16S_S1.

BKM60.csv

Outline of Eh-pH range of natural environments, digitized from Fig. 32 of Baas Becking et al. (1960).

MR18_Table_S1.csv

List of strictly anaerobic and aerotolerant genera from Table S1 of Million and Raoult (2018).

Dataset_S3.csv

File created by orp16S_D3.

metaproteome/*/*_aa.csv

Amino acid composition of proteins in metaproteomic experiments.

metaproteome/*/mkaa.R

Scripts to process metaproteomic data for amino acid compositions.

Other files

doc/orp16S.bib

Reference keys for all datasets (all of them are listed here; only some of them are cited in the vignette orp16S.Rmd).

References

Baas Becking LGM, Kaplan IR and Moore D (1960) Limits of the natural environment in terms of pH and oxidation-reduction potentials. Journal of Geology 68, 243–284. https://www.jstor.org/stable/30059218

Dick JM and Meng D (2023) Community- and genome-based evidence for a shaping influence of redox potential on bacterial protein evolution. mSystems 8, e00014-23. doi:10.1128/msystems.00014-23

Million M and Raoult D (2018) Linking gut redox to human microbiome. Human Microbiome Journal 10, 27–32. doi:10.1016/j.humic.2018.07.002

Tange O (2023) GNU Parallel 20230222 ('Gaziantep'). doi:10.5281/zenodo.7668338

USGS (2010) Great Lakes and Watersheds Shapefiles. ScienceBase Catalog, U.S. Geological Survey. https://www.sciencebase.gov/catalog/item/530f8a0ee4b0e7e46bd300dd

See Also

plotEZ is used to make the ZC-Eh7 scatterplots and linear fits in orp16S_3 and orp16S_S1.

Examples

# Methods outline and Winogradsky columns
orp16S_3()

# Sample locations and Eh-pH diagram
orp16S_4()

# Summarize results for individual datasets
orp16S_T1()

# "Raw" metrics with lots of messages printed
# NOTE: the messages include values for the last two columns of Table S2 of the paper
# (% Classification to genus level and % Mapping to NCBI taxonomy)
getmetrics_orp16S("MLL+19", quiet = FALSE)
# Metrics with associated metadata
metrics <- getmetrics_orp16S("SPA+21")
mdat <- getmdat_orp16S("SPA+21", metrics = metrics)
# This has 'metadata' and 'metrics' data frames with same number of rows
str(mdat)
stopifnot(length(unique(sapply(mdat, nrow))) == 1)

# Printing info about one dataset
orp16S_info("SPA+21")
# Some keys have suffixes to retrieve subsets of data (see code of getmdat_orp16S)
orp16S_info("PCL+18_Acidic")
orp16S_info("PCL+18_Alkaline")

# There's no plotmet_orp16S, but this is what it would look like in one line
plot_metrics(getmdat_orp16S("MLL+19", getmetrics_orp16S("MLL+19")))

## Not run: 
# Get unexported object with datasets in each environment type
envirotype <- JMDplots:::envirotype
# Function to get info for all datasets in one environment type
envinfo <- function(ienv) do.call(rbind, lapply(envirotype[[ienv]], orp16S_info))
# Get info for all datasets in all environment types
allenv <- lapply(1:7, envinfo)
# Put in names of environments
cbindenv <- function(a, b) cbind(envirotype = a, b)
allenv <- mapply(cbindenv, names(envirotype)[1:7], allenv, SIMPLIFY = FALSE)
allenv <- do.call(rbind, c(allenv, make.row.names = FALSE))
# Now write to CSV file for copying to Table S1
write.csv(allenv, "Table_S1_values.csv", row.names = FALSE)

## End(Not run)

# Check that references for all studies are available
mdatdir <- system.file("extdata/orp16S/metadata", package = "JMDplots")
studies <- sapply(strsplit(dir(mdatdir), ".", fixed = TRUE), "[", 1)
bibfile <- system.file("doc/orp16S.bib", package = "JMDplots")
bibentry <- bibtex::read.bib(bibfile)
stopifnot(all(studies %in% names(bibentry)))

[Package JMDplots version 1.2.19-14 Index]