R: Influence of redox potential on bacterial protein evolution

orp16S {JMDplots}

R Documentation

Influence of redox potential on bacterial protein evolution

Description

Plots from the paper by Dick and Meng (2023).

Usage

  orp16S_1(pdf = FALSE)
  orp16S_2(pdf = FALSE)
  orp16S_3(pdf = FALSE)
  orp16S_4(pdf = FALSE)
  orp16S_5(pdf = FALSE)
  orp16S_6(pdf = FALSE, EMP_primers = FALSE)

  orp16S_S1(pdf = FALSE)
  orp16S_S2(pdf = FALSE)

  orp16S_info(study)
  orp16S_T1(samesign = FALSE)
  orp16S_D3(mincount = 100)

  getmdat_orp16S(study, metrics = NULL, dropNA = TRUE, size = NULL, quiet = TRUE)
  getmetrics_orp16S(study, mincount = 100, quiet = TRUE, ...)

Arguments

`pdf`	logical, make a PDF file?
`EMP_primers`	logical, include only datasets using Earth Microbiome Project primers (515F/806R)?
`study`	character, study name
`samesign`	logical, only count slopes with same sign of minimum and maximum values in the 95% CI?
`mincount`	integer, samples with fewer than this number of RDP classifications are excluded
`metrics`	data frame, output of `get_metrics`
`dropNA`	logical, exclude samples with NA name in metadata?
`size`	numeric, number of samples to randomly select
`quiet`	logical, change to FALSE to print details about data processing
`...`	additional arguments passed to `read_RDP`

Details

This table gives a brief description of each plotting function.

`orp16S_1`	Thermodynamic model for the relationship between carbon oxidation state of reference proteomes and redox potential
`orp16S_2`	Z_C of reference proteomes compared with oxygen tolerance and with metaproteomes
`orp16S_3`	Methods overview and chemical depth profiles in Winogradsky columns
`orp16S_4`	Sample locations on world map and Eh-pH diagram (outline is from Baas Becking et al., 1960)
`orp16S_5`	Associations between Eh7 and Z_C at local scales
`orp16S_6`	Associations between Eh7 and Z_C at a global scale. Set `EMP_primers` to TRUE to make Figure S3.
`orp16S_S1`	Z_C-Eh scatterplots for each dataset
`orp16S_S2`	Comparison of Eh7, Eh, and O₂ as predictors of carbon oxidation state

orp16S_S1 creates the files ‘EZdat.csv’ and ‘EZlm.csv’ that are used by the other plotting functions and correspond to Datasets S1 and S2 in the paper. ‘EZdat.csv’ has sample data and metadata: study key, environment type, lineage (Bacteria or Archaea), sample name, SRA or MG-RAST accession number, grouping description, sample group, and values of T (°C), pH, O₂ concentration (μmol/L), Eh (mV), Eh7 (mV), and Z_C. ‘EZlm.csv’ has regression results for each dataset and domain: study key, environment type, lineage, number of samples, Eh7 range, slope (V^-1) and intercept (Z_C) of the linear regression, and Pearson correlation coefficient.

orp16S_T1 returns a table summarizing regression slopes for all datasets (based on ‘EZlm.csv’), corresponding to Table 1a in the paper. orp16S_T1(samesign = TRUE) generates the values listed in Table 1b in the paper. orp16S_D3 creates the file for Dataset S3 in the paper, which lists the percentage of RDP classifications to genus level that are mapped to the NCBI taxonomy, together with genus names and Z_C for the groups of samples in the first and fourth quartiles of Eh7 values (i.e., reducing and oxidizing conditions).
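The samesign criterion can be illustrated with a minimal sketch in base R. The data frame toy and its values are made up for illustration and are not taken from any dataset in the paper; the code is an assumption about how such a check could be written, not the code used by orp16S_T1.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  Eh7 = c(-200, -100, 0, 100, 200, 300),                # mV
  Zc  = c(-0.180, -0.176, -0.171, -0.169, -0.163, -0.160)
)

# Linear regression of Z_C on Eh7
fit <- lm(Zc ~ Eh7, data = toy)

# 95% confidence interval of the slope (lower and upper bounds)
ci <- confint(fit, "Eh7", level = 0.95)

# The slope is counted only if both bounds have the same sign
samesign <- prod(sign(ci)) > 0

# Slope converted from mV^-1 to V^-1, and Pearson correlation coefficient
slope_per_V <- coef(fit)[["Eh7"]] * 1000
pearson_r <- cor(toy$Eh7, toy$Zc)
```

A slope whose confidence interval spans zero would give samesign equal to FALSE and would be left out of a same-sign count.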

getmdat_orp16S gets metadata for the indicated study, calculates Eh7 from Eh, pH, and T (if available), classifies samples into 3 clusters according to their Eh values, and adds columns for plot parameters (‘pch’, ‘col’). The function also adds a column named ‘O2_umol_L’ with O₂ concentration in μmol/L, converted from the source units as needed. The default for dropNA excludes samples with NA name in the metadata file. If metrics is supplied, metadata are kept only for samples with available metrics, the sample metrics are placed in the same order as the remaining metadata, and the function returns a list with both ‘metadata’ and the sorted ‘metrics’. size can be used to specify a number of samples (from among those with both metadata and metrics, if given) to be randomly selected.
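As a rough sketch of the pH correction, Eh can be adjusted to pH 7 by assuming a Nernstian slope of 2.303RT/F per pH unit (about 59.2 mV at 25 °C). The helper calc_Eh7 below is hypothetical and is not part of JMDplots; the exact calculation in getmdat_orp16S may differ.

```r
# Hypothetical helper (not a JMDplots function): correct Eh to pH 7
# Assumes Eh changes by -2.303*R*T/F per pH unit (Nernstian slope)
calc_Eh7 <- function(Eh_mV, pH, T_C = 25) {
  R_gas <- 8.314462618      # gas constant, J / (mol K)
  Faraday <- 96485.33212    # Faraday constant, C / mol
  T_K <- T_C + 273.15       # temperature in kelvin
  # Nernst slope in mV per pH unit (about 59.16 at 25 degrees C)
  slope_mV <- 1000 * log(10) * R_gas * T_K / Faraday
  Eh_mV + slope_mV * (pH - 7)
}

# An acidic sample: the corrected value is lower than the measured Eh
Eh7_acidic <- calc_Eh7(Eh_mV = 400, pH = 3)
# A sample already at pH 7 is unchanged
Eh7_neutral <- calc_Eh7(Eh_mV = 200, pH = 7)
```

The default of 25 °C in this sketch stands in for samples without a reported temperature.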

getmetrics_orp16S calculates chemical metrics (Z_C and n_H₂O) for the indicated study. ... is used to supply arguments to read_RDP; note that the default of mincount = 100 was used for data processing in the paper (see plotEZ).
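For orientation, the carbon oxidation state of a single chemical formula with C, H, N, O, and S atoms and net charge Z can be computed as Z_C = (Z - H + 3N + 2O + 2S) / C. The sketch below applies this to two amino acids; calc_ZC is a hypothetical helper, not a JMDplots function, and it does not reproduce the community-level calculation from reference proteomes that getmetrics_orp16S performs.

```r
# Hypothetical helper (not a JMDplots function): carbon oxidation state
# from the elemental composition and charge of one chemical formula
calc_ZC <- function(C, H, N = 0, O = 0, S = 0, Z = 0) {
  (Z - H + 3 * N + 2 * O + 2 * S) / C
}

# Glycine, C2H5NO2
ZC_glycine <- calc_ZC(C = 2, H = 5, N = 1, O = 2)
# Alanine, C3H7NO2
ZC_alanine <- calc_ZC(C = 3, H = 7, N = 1, O = 2)
```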

orp16S_info prints general information about a study (more specifically, a dataset): name, number of samples with Bacteria and Archaea (with mincount = 100), bibliographic key, range of T, pH, Eh, and Eh7, and whether the slopes of the Z_C-Eh7 correlations for Bacteria and Archaea (if available) are negative or positive. The function also returns this information in a data frame, which is used to fill in Table S1 of the paper.

Files in extdata/orp16S

‘pipeline.R’: Pipeline for sequence data processing (uses external programs fastq-dump, vsearch, seqtk, RDP Classifier, and GNU Parallel (Tange, 2023)).
‘metadata/*.csv’: Sample metadata for each study.
‘RDP/*.csv.xz’: RDP Classifier results combined into a single CSV file for each study, created with the classify and mkRDP functions in ‘pipeline.R’.
‘hydro_p’: Shapefiles for the North American Great Lakes, downloaded from USGS (2010).
‘EZdat.csv’: Data for all samples, created by orp16S_S1.
‘EZlm.csv’: Linear fits between Eh7 and Z_C for each dataset, created by orp16S_S1.
‘BKM60.csv’: Outline of Eh-pH range of natural environments, digitized from Fig. 32 of Baas Becking et al. (1960).
‘MR18_Table_S1.csv’: List of strictly anaerobic and aerotolerant genera from Table S1 of Million and Raoult (2018).
‘Dataset_S3.csv’: File created by orp16S_D3.
‘metaproteome/*/*_aa.csv’: Amino acid composition of proteins in metaproteomic experiments.
‘metaproteome/*/mkaa.R’: Scripts to process metaproteomic data for amino acid compositions.

Other files

doc/orp16S.bib: Reference keys for all datasets (all of them are listed here; only some of them are cited in the vignette orp16S.Rmd).

References

Baas Becking LGM, Kaplan IR and Moore D (1960) Limits of the natural environment in terms of pH and oxidation-reduction potentials. Journal of Geology 68, 243–284. https://www.jstor.org/stable/30059218

Dick JM and Meng D (2023) Community- and genome-based evidence for a shaping influence of redox potential on bacterial protein evolution. mSystems 8, e00014-23. doi:10.1128/msystems.00014-23

Million M and Raoult D (2018) Linking gut redox to human microbiome. Human Microbiome Journal 10, 27–32. doi:10.1016/j.humic.2018.07.002

Tange O (2023) GNU Parallel 20230222 ('Gaziantep'). doi:10.5281/zenodo.7668338

USGS (2010) Great Lakes and Watersheds Shapefiles. ScienceBase Catalog, U.S. Geological Survey. https://www.sciencebase.gov/catalog/item/530f8a0ee4b0e7e46bd300dd

Examples

# Methods outline and Winogradsky columns
orp16S_3()

# Sample locations and Eh-pH diagram
orp16S_4()

# Summarize results for individual datasets
orp16S_T1()

# "Raw" metrics with lots of messages printed
# NOTE: the messages include values for the last two columns of Table S2 of the paper
# (% Classification to genus level and % Mapping to NCBI taxonomy)
getmetrics_orp16S("MLL+19", quiet = FALSE)
# Metrics with associated metadata
metrics <- getmetrics_orp16S("SPA+21")
mdat <- getmdat_orp16S("SPA+21", metrics = metrics)
# This has 'metadata' and 'metrics' data frames with same number of rows
str(mdat)
stopifnot(length(unique(sapply(mdat, nrow))) == 1)

# Printing info about one dataset
orp16S_info("SPA+21")
# Some keys have suffixes to retrieve subsets of data (see code of getmdat_orp16S)
orp16S_info("PCL+18_Acidic")
orp16S_info("PCL+18_Alkaline")

# There's no plotmet_orp16S, but this is what it would look like in one line
plot_metrics(getmdat_orp16S("MLL+19", getmetrics_orp16S("MLL+19")))

## Not run: 
# Get unexported object with datasets in each environment type
envirotype <- JMDplots:::envirotype
# Function to get info for all datasets in one environment type
envinfo <- function(ienv) do.call(rbind, lapply(envirotype[[ienv]], orp16S_info))
# Get info for all datasets in all environment types
allenv <- lapply(1:7, envinfo)
# Put in names of environments
cbindenv <- function(a, b) cbind(envirotype = a, b)
allenv <- mapply(cbindenv, names(envirotype)[1:7], allenv, SIMPLIFY = FALSE)
allenv <- do.call(rbind, c(allenv, make.row.names = FALSE))
# Now write to CSV file for copying to Table S1
write.csv(allenv, "Table_S1_values.csv", row.names = FALSE)

## End(Not run)

# Check that references for all studies are available
mdatdir <- system.file("extdata/orp16S/metadata", package = "JMDplots")
studies <- sapply(strsplit(dir(mdatdir), ".", fixed = TRUE), "[", 1)
bibfile <- system.file("doc/orp16S.bib", package = "JMDplots")
bibentry <- bibtex::read.bib(bibfile)
stopifnot(all(studies %in% names(bibentry)))

[Package JMDplots version 1.2.19-14 Index]