JMDplots vignettes

Water activity and redox potential in evolution and development (2022)

This vignette runs the code to make the plots from the following paper first published by Springer Nature:

Dick JM. 2022. A thermodynamic model for water activity and redox potential in evolution and development. Journal of Molecular Evolution 90(2): 182–199. doi: 10.1007/s00239-022-10051-7

Use this link for full-text access to a view-only version of the paper: https://rdcu.be/cITho. A preprint of the paper is available on bioRxiv at doi: 10.1101/2021.01.29.428804.

On 2023-12-18, Figure 3a was modified from the original publication to use chemical metrics computed from the sum of amino acid compositions of proteins in each gene age category. The original publication used mean values of pre-computed chemical metrics for all proteins in each gene age category. The tables of chemical metrics for all proteins were removed to save space in the current version of the package; they remain available in the Zenodo archive up to JMDplots version 1.2.18 (https://doi.org/10.5281/zenodo.8207128). Compared to the original publication, the summation of amino acid compositions gives greater weight to longer proteins. The lines shift somewhat because of this revision, but the overall trends are unchanged.
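The next sketch shows why summation gives greater weight to longer proteins. It uses base R and two hypothetical proteins with made-up elemental formulas, not data from the package. ZC is a ratio of sums over atoms, so the ZC of the summed composition equals the carbon-weighted mean of the per-protein ZC values. The original figure instead used an unweighted mean. The helper `ZC_formula()` is hypothetical. Removing water in peptide bonds affects neither the carbon count nor the numerator of ZC, so summing elemental formulas gives the same result as summing amino acid compositions.

```r
# Hypothetical helper: carbon oxidation state from counts of C, H, N, O, S
# (net charge Z = 0), assuming oxidation states H = +1, N = -3, O = -2, S = -2
ZC_formula <- function(f) {
  unname((-f["H"] + 3 * f["N"] + 2 * f["O"] + 2 * f["S"]) / f["C"])
}

# Two hypothetical proteins (toy formulas, for illustration only)
prot <- rbind(
  short = c(C = 50,  H = 80,  N = 14,  O = 16,  S = 0),
  long  = c(C = 500, H = 780, N = 140, O = 150, S = 5)
)

# Original approach: unweighted mean of per-protein values
zc_each <- apply(prot, 1, ZC_formula)  # short -0.12, long -0.10
zc_mean <- mean(zc_each)               # -0.11

# Revised approach: ZC of the summed composition
zc_sum <- ZC_formula(colSums(prot))    # -56/550, about -0.1018

# Same as zc_sum: a carbon-weighted mean, pulled toward the longer protein
weighted.mean(zc_each, prot[, "C"])
```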

This vignette was compiled on 2024-03-11 with JMDplots 1.2.19-11, CHNOSZ 2.1.0-5, and canprot 1.1.2-39.

To reduce running time, the plots in this vignette are made with 99 bootstrap replicates. To reproduce the plots in the paper, change boot.R to 999 in the function calls below.
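The sketch below shows what the number of bootstrap replicates controls. It uses a generic percentile bootstrap of a mean on simulated data, in base R. It is not the procedure implemented inside evdevH2O2() or the other plotting functions, and the helper boot_ci() is hypothetical. With more replicates, the confidence limits vary less from run to run, at the cost of a longer running time.

```r
# Hypothetical helper: percentile bootstrap confidence interval for a mean
boot_ci <- function(x, R, level = 0.95) {
  # Resample x with replacement R times and record each mean
  means <- replicate(R, mean(sample(x, replace = TRUE)))
  alpha <- (1 - level) / 2
  quantile(means, c(alpha, 1 - alpha))
}

set.seed(1)
x <- rnorm(100, mean = -0.15, sd = 0.05)  # simulated values, not real data

boot_ci(x, R = 99)   # quicker, as in this vignette
boot_ci(x, R = 999)  # slower, as in the paper
```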

library(JMDplots)

Comparison of different sets of basis species (Figure 1)

evdevH2O1()

Data source: UniProt reference proteomes (https://uniprot.org).

Protein length and chemical metrics for phylostratigraphic age groups (Figure 2)

evdevH2O2(boot.R = 99)

Data sources: Phylostrata are from Trigos et al. (2017). Consensus gene ages are from Liebeskind et al. (2016).

Evolution of protein ZC in eukaryotic lineages (Figure 3)

evdevH2O3()

Data sources: (a) Consensus gene ages are from Liebeskind et al. (2016). Divergence times of the human lineage are from Kumar et al. (2017). (b) Amino acid compositions of homology groups for Pfam domains are from James et al. (2020) and James et al. (2021).

MaximAct: Thermodynamic analysis of optimal logaH2O and logfO2 for target proteins (Figure 4)

evdevH2O4()

Optimal logaH2O and logfO2 and virtual Eh for target proteins (Figure 5)

evdevH2O5()

Data sources: Blood plasma and subcellular redox potentials (EGSH) are from Jones and Sies (2015) and Schwarzländer et al. (2016).

Chemical metrics for different background proteomes and thermodynamic parameters computed with them (Figure 6)

evdevH2O6()

Chemical and thermodynamic analysis of B. subtilis biofilm transcriptome and proteome (Figure 7)

evdevH2O7(boot.R = 99)

Data source: Transcriptomic and proteomic data are from Futo et al. (2021).

Organismal water content, proteomic nH2O, and optimal logaH2O for fruit fly development (Figure 8)

evdevH2O8(boot.R = 99)

Data sources: (a) Whole-organism water content is from Church and Robertson (1966). (b) The stoichiometric hydration state of proteins is calculated in this study using proteomic data from Casas-Vila et al. (2017). (d) ZC and nH2O of differentially expressed proteins in embryos and adult flies are calculated in this study using proteomic data from Fabre et al. (2019).

Specific values mentioned in the text

Total and unmapped numbers of genes in Trigos et al. (2017) phylostrata dataset

## Total number of genes: 17318
## Total gene count in each phylostratum:
## 
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 1761 4675  283 1986 1191 1116  319 2773  554  526  591  764  126  241  377   35
## Genes not mapped to UniProt: 169
## Unmapped genes in each phylostratum:
## 
##  1  2  3  4  5  6  7  8  9 10 12 13 14 15 16 
##  7 10  1  2  2  3  2  6  2  5  3  3 21 92 10
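The next sketch is a quick arithmetic check on the output above. The counts are copied by hand from the printed tables. The code confirms the totals and computes the percentage of unmapped genes in each phylostratum. Phylostratum 11 is absent from the second table, so it is assigned zero unmapped genes here. The variable names are illustrative.

```r
# Gene counts per phylostratum (copied from the output above)
total <- c(1761, 4675, 283, 1986, 1191, 1116, 319, 2773, 554, 526,
           591, 764, 126, 241, 377, 35)
names(total) <- 1:16

# Unmapped genes; phylostratum 11 is absent from the table, i.e. zero
unmapped <- c(7, 10, 1, 2, 2, 3, 2, 6, 2, 5, 0, 3, 3, 21, 92, 10)
names(unmapped) <- 1:16

sum(total)     # 17318
sum(unmapped)  # 169

# Percentage of genes not mapped to UniProt in each phylostratum
round(100 * unmapped / total, 1)
```

Phylostrata 14 to 16 have the largest unmapped fractions, at about 9%, 24%, and 29%.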

Calculating ZC of chicken egg-white lysozyme from the chemical formula

(pf <- protein.formula("LYSC_CHICK"))
##              C   H   N   O  S
## LYSC_CHICK 613 959 193 185 10
CHNOSZ::ZC(pf)
## [1] 0.01631321
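The value from CHNOSZ::ZC() can be checked by hand. Assign oxidation states of +1 to H, -3 to N, -2 to O, and -2 to S (as in thiols), and require the total to equal the net charge Z (zero here). This gives ZC = (Z - nH + 3nN + 2nO + 2nS) / nC. The base-R sketch below applies this to the lysozyme formula printed above. The function name ZC_hand() is hypothetical.

```r
# Hypothetical helper: ZC from elemental counts, assuming oxidation states
# H = +1, N = -3, O = -2, S = -2 and a net charge Z (default 0)
ZC_hand <- function(C, H, N, O, S = 0, Z = 0) {
  (Z - H + 3 * N + 2 * O + 2 * S) / C
}

# Formula of LYSC_CHICK from protein.formula() above
zc <- ZC_hand(C = 613, H = 959, N = 193, O = 185, S = 10)
zc  # 0.01631321 (= 10/613)
```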

Calculating ZC of lysozyme from the amino acid composition

(aa <- pinfo(pinfo("LYSC_CHICK")))
##   protein organism     ref  abbrv chains Ala Cys Asp Glu Phe Gly His Ile Lys
## 6    LYSC    CHICK UniProt P00698      1  12   8   7   2   3  12   1   6   6
##   Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr
## 6   8   2  14   2   3  11  10   7   6   6   3
canprot::Zc(aa)
##          6 
## 0.01631321
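The amino acid composition gives the same value for a simple reason. The protein formula is the sum of the free amino acid formulas minus one H2O for each of the n - 1 peptide bonds. Removing H2O changes neither nC nor the numerator of ZC. The sketch below rebuilds C613H959N193O185S10 from the counts shown above, using amino acid formulas typed in by hand in base R. It illustrates the bookkeeping and is not necessarily how canprot::Zc() computes ZC internally.

```r
# Elemental formulas (C, H, N, O, S) of the 20 free amino acids
aa_formula <- rbind(
  Ala = c(3, 7, 1, 2, 0),   Cys = c(3, 7, 1, 2, 1),  Asp = c(4, 7, 1, 4, 0),
  Glu = c(5, 9, 1, 4, 0),   Phe = c(9, 11, 1, 2, 0), Gly = c(2, 5, 1, 2, 0),
  His = c(6, 9, 3, 2, 0),   Ile = c(6, 13, 1, 2, 0), Lys = c(6, 14, 2, 2, 0),
  Leu = c(6, 13, 1, 2, 0),  Met = c(5, 11, 1, 2, 1), Asn = c(4, 8, 2, 3, 0),
  Pro = c(5, 9, 1, 2, 0),   Gln = c(5, 10, 2, 3, 0), Arg = c(6, 14, 4, 2, 0),
  Ser = c(3, 7, 1, 3, 0),   Thr = c(4, 9, 1, 3, 0),  Val = c(5, 11, 1, 2, 0),
  Trp = c(11, 12, 2, 2, 0), Tyr = c(9, 11, 1, 3, 0)
)
colnames(aa_formula) <- c("C", "H", "N", "O", "S")

# Amino acid composition of LYSC_CHICK (from the pinfo() output above)
aa_count <- c(Ala = 12, Cys = 8, Asp = 7, Glu = 2, Phe = 3, Gly = 12,
              His = 1, Ile = 6, Lys = 6, Leu = 8, Met = 2, Asn = 14,
              Pro = 2, Gln = 3, Arg = 11, Ser = 10, Thr = 7, Val = 6,
              Trp = 6, Tyr = 3)

# Sum the amino acid formulas (each row scaled by its count),
# then remove one H2O per peptide bond
n_res <- sum(aa_count)
lysc <- colSums(aa_formula[names(aa_count), ] * aa_count)
lysc <- lysc - (n_res - 1) * c(C = 0, H = 2, N = 0, O = 1, S = 0)
lysc  # C 613, H 959, N 193, O 185, S 10

# Carbon oxidation state (H +1, N -3, O -2, S -2; Z = 0)
with(as.list(lysc), (-H + 3 * N + 2 * O + 2 * S) / C)  # 0.01631321
```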

Per-residue chemical formula, formation reaction, equilibrium constant, and activity product

LYSC_example()
## Per-residue chemical formula of LYSC_CHICK
## [1] "C4.752H7.434N1.496O1.434S0.078"
## Stoichiometry of formation reaction
##        coeff          name                        formula
## 3508  1.0000  LYSC_residue C4.752H7.434N1.496O1.434S0.078
## 1723 -0.5144     glutamine                      C5H10N2O3
## 1724 -0.3892 glutamic acid                        C5H9NO4
## 1721 -0.0780      cysteine                       C3H7NO2S
## 1     0.8794         water                            H2O
## 2679  0.4713        oxygen                             O2
## logK of formation reaction (25 °C, 1 bar)
## [1] -39.84
## logQ of formation reaction (logfO2 = -70, logaH2O = 0)
## [1] -29.33
## log(K/Q)
## [1] -10.52
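Several numbers in this output can be reproduced with base R. The sketch below assumes that the per-residue formula is the protein formula divided by the 129 residues of lysozyme, which agrees with the printed values.

- **Reaction coefficients.** Solving a linear system for the five basis species (glutamine, glutamic acid, cysteine, H2O, and O2) gives the coefficients of the formation reaction. Using the rounded per-residue formula as printed reproduces the printed coefficients to four decimal places.
- **Sign convention.** The solution is negated so that reactants are negative and products positive, as in the table.
- **log(K/Q).** Because log(K/Q) = logK - logQ, the rounded values give -10.51. The last-digit difference from the printed -10.52 comes from rounding logK and logQ.

```r
# Protein formula of LYSC_CHICK and its length (from the output above)
protein <- c(C = 613, H = 959, N = 193, O = 185, S = 10)
n_res <- 129

# Per-residue formula: divide by the number of residues
round(protein / n_res, 3)  # C 4.752, H 7.434, N 1.496, O 1.434, S 0.078

# Elemental composition (rows) of the basis species (columns)
basis <- cbind(
  glutamine       = c(5, 10, 2, 3, 0),
  `glutamic acid` = c(5, 9, 1, 4, 0),
  cysteine        = c(3, 7, 1, 2, 1),
  water           = c(0, 2, 0, 1, 0),
  oxygen          = c(0, 0, 0, 2, 0)
)
rownames(basis) <- c("C", "H", "N", "O", "S")

# Rounded per-residue formula as printed by LYSC_example()
residue <- c(C = 4.752, H = 7.434, N = 1.496, O = 1.434, S = 0.078)

# Amounts of basis species that add up to one residue, negated so that
# reactants are negative and products positive
coeff <- -solve(basis, residue)
round(coeff, 4)
# glutamine -0.5144, glutamic acid -0.3892, cysteine -0.078,
# water 0.8794, oxygen 0.4713

# log(K/Q) from the rounded logK and logQ
-39.84 - (-29.33)  # -10.51
```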

Number of background proteins from the human proteome

# Phylostrata (TPPG17) and consensus gene ages (LMM16) bundled with JMDplots
TPPG17_file <- "extdata/evdevH2O/phylostrata/TPPG17.csv.xz"
LMM16_file <- "extdata/evdevH2O/phylostrata/LMM16.csv.xz"
TPPG17 <- read.csv(system.file(TPPG17_file, package = "JMDplots"), as.is = TRUE)
LMM16 <- read.csv(system.file(LMM16_file, package = "JMDplots"), as.is = TRUE)
# UniProt IDs present in both datasets, excluding missing IDs
UniProt_IDs <- na.omit(intersect(TPPG17$Entry, LMM16$UniProt))
length(UniProt_IDs)
## [1] 16723

Data sources: Trigos et al. (2017) (TPPG17) and Liebeskind et al. (2016) (LMM16).
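The na.omit() call matters when both ID columns contain missing values. In that case intersect() keeps NA as a common element, which would inflate the count by one. A toy example with hypothetical IDs:

```r
# Hypothetical UniProt IDs with missing entries in both vectors
ids_a <- c("P00001", "P00002", NA, "P00003")
ids_b <- c("P00002", NA, "P00003", "P00004")

common <- intersect(ids_a, ids_b)
common                   # "P00002" NA "P00003"
length(common)           # 3, counting the NA
length(na.omit(common))  # 2
```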

References

Casas-Vila N, Bluhm A, Sayols S, Dinges N, Dejung M, Altenhein T, Kappei D, Altenhein B, Roignant J-Y, Butter F. 2017. The developmental proteome of Drosophila melanogaster. Genome Research 27(7): 1273–1285. doi: 10.1101/gr.213694.116

Church RB, Robertson FW. 1966. A biochemical study of the growth of Drosophila melanogaster. Journal of Experimental Zoology 162(3): 337–351. doi: 10.1002/jez.1401620309

Fabre B, Korona D, Lees JG, Lazar I, Livneh I, Brunet M, Orengo CA, Russell S, Lilley KS. 2019. Comparison of Drosophila melanogaster embryo and adult proteome by SWATH-MS reveals differential regulation of protein synthesis, degradation machinery, and metabolism modules. Journal of Proteome Research 18(6): 2525–2534. doi: 10.1021/acs.jproteome.9b00076

Futo M, Opašić L, Koska S, Čorak N, Široki T, Ravikumar V, Thorsell A, Lenuzzi M, Kifer D, Domazet-Lošo M, et al. 2021. Embryo-like features in developing Bacillus subtilis biofilms. Molecular Biology and Evolution 38(1): 31–47. doi: 10.1093/molbev/msaa217

James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J. 2021. Universal and taxon-specific trends in protein sequences as a function of age. eLife 10: e57347. doi: 10.7554/eLife.57347

James J, Willis S, Nelson P, Weibel C, Kosinski L, Masel J. 2020. Data from: Universal and taxon-specific trends in protein sequences as a function of age. Figshare. doi: 10.6084/m9.figshare.12037281.v1

Jones DP, Sies H. 2015. The redox code. Antioxidants & Redox Signaling 23(9): 734–746. doi: 10.1089/ars.2015.6247

Kumar S, Stecher G, Suleski M, Hedges SB. 2017. TimeTree: A resource for timelines, timetrees, and divergence times. Molecular Biology and Evolution 34(7): 1812–1819. doi: 10.1093/molbev/msx116

Liebeskind BJ, McWhite CD, Marcotte EM. 2016. Towards consensus gene ages. Genome Biology and Evolution 8(6): 1812–1823. doi: 10.1093/gbe/evw113

Schwarzländer M, Dick TP, Meyer AJ, Morgan B. 2016. Dissecting redox biology using fluorescent protein sensors. Antioxidants & Redox Signaling 24(13): 680–712. doi: 10.1089/ars.2015.6266

Trigos AS, Pearson RB, Papenfuss AT, Goode DL. 2017. Altered interactions between unicellular and multicellular genes drive hallmarks of transformation in a diverse range of solid tumors. Proceedings of the National Academy of Sciences 114(24): 6406–6411. doi: 10.1073/pnas.1617743114