evdevH2O {JMDplots} | R Documentation |
Plots from the paper by Dick (2022).
evdevH2O1(pdf = FALSE)
evdevH2O2(pdf = FALSE, boot.R = 99)
evdevH2O3(pdf = FALSE, H2O = FALSE)
evdevH2O4(pdf = FALSE)
evdevH2O5(pdf = FALSE)
evdevH2O6(pdf = FALSE)
evdevH2O7(pdf = FALSE, boot.R = 99)
evdevH2O8(pdf = FALSE, boot.R = 99)
runMaximAct(dataset = "TPPG17", seed = 1:100, nbackground = 2000,
  res = 256, bg_organism = "Hsa", writeFiles = TRUE)
LYSC_example()
pdf | logical, make a PDF file?
boot.R | integer, number of bootstrap replicates
H2O | logical, make plots for nH2O instead of ZC?
dataset | character, dataset to use for constructing target proteins
seed | numeric, seeds to use for subsampling background proteins from human proteome
nbackground | numeric, number of background proteins to sample
res | numeric, grid resolution for logfO2 and logaH2O
writeFiles | logical, create CSV and PNG output files?
bg_organism | character, use background proteins from this organism
This table gives a brief description of each plotting function.
evdevH2O1 | Comparison of different sets of basis species |
evdevH2O2 | Protein length and chemical metrics for phylostratigraphic age groups |
evdevH2O3 | Evolution of protein ZC in eukaryotic lineages |
evdevH2O4 | Thermodynamic analysis of optimal logaH2O and logfO2 for target proteins |
evdevH2O5 | Optimal logaH2O and logfO2 and virtual Eh for target proteins |
evdevH2O6 | Chemical metrics and thermodynamic parameters for different background proteomes
evdevH2O7 | Chemical and thermodynamic analysis of B. subtilis biofilm transcriptome and proteome |
evdevH2O8 | Organismal water content, proteomic nH2O, and optimal logaH2O for fruit fly development |
runMaximAct sets up calculations with MaximAct to compute the optimal logaH2O and logfO2 that maximize the equilibrium activity of target proteins against a proteomic background.
The available datasets are described below.
Phylostrata from Trigos et al., 2017; target proteins are mean amino acid compositions for each phylostratum.
Gene ages from Liebeskind et al., 2016; target proteins are mean amino acid compositions for each gene age.
Biofilm transcriptome and proteome from Futo et al., 2021; target proteins are mean amino acid compositions for each stage of biofilm development, weighted by gene or protein expression values.
Differentially expressed proteins from Fabre et al., 2019; target proteins are all proteins (i.e. not mean compositions) with higher expression in embryo or adult flies.
Fly developmental proteome from Casas-Vila et al., 2017; target proteins are mean amino acid composition for each time point.
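The phrase "mean amino acid composition" for a group of proteins, with or without expression weighting, can be illustrated with a small base R sketch. The data frame, group labels, and expression values below are invented for illustration and are not taken from the package; the actual datasets are assembled by the package's own code.

```r
# Toy amino acid counts for four hypothetical proteins (only three residues shown)
aa <- data.frame(
  group = c("PS1", "PS1", "PS2", "PS2"),  # hypothetical age-group labels
  Ala = c(10, 20, 5, 15),
  Gly = c(8, 12, 6, 10),
  Cys = c(2, 4, 1, 3)
)

# Unweighted mean composition for each group
mean_aa <- aggregate(cbind(Ala, Gly, Cys) ~ group, data = aa, FUN = mean)

# Expression-weighted mean composition (hypothetical expression values)
expr <- c(1, 3, 2, 2)
counts <- as.matrix(aa[, c("Ala", "Gly", "Cys")])
# Sum of weighted counts per group divided by the sum of weights per group
weighted_aa <- rowsum(counts * expr, aa$group) / as.vector(rowsum(expr, aa$group))
```

In this sketch each row of `mean_aa` or `weighted_aa` plays the role of one target protein.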
bg_organism can be one of ‘Hsa’ (Homo sapiens), ‘Sce’ (Saccharomyces cerevisiae), ‘Eco’ (Escherichia coli), ‘Mja’ (Methanocaldococcus jannaschii), ‘Dme’ (Drosophila melanogaster), or ‘Bsu’ (Bacillus subtilis).
These are used to retrieve the background proteins used for the AA_background argument in MaximAct.
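The role of the seed and nbackground arguments can be sketched as seeded subsampling of the background proteome. This is a minimal illustration that assumes one independent random subsample per seed; the proteome size and variable names are hypothetical, and the package may implement the sampling differently.

```r
# Hypothetical background proteome: 5000 proteins identified only by row index
n_proteins <- 5000
nbackground <- 2000
seeds <- 1:3

# One reproducible subsample of row indices for each seed
samples <- lapply(seeds, function(seed) {
  set.seed(seed)
  sample(n_proteins, nbackground)
})
```

Because each subsample is tied to its seed, repeating a run with the same seed selects the same background proteins.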
If writeFiles is TRUE, a PNG plot file is created and optimal logfO2 or logaH2O values for each seed and protein are saved in CSV files named by pasting together the dataset, bg_organism, and ‘O2.csv’ or ‘H2O.csv’, separated by ‘_’.
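The naming rule described here can be written out with paste(). This is only a sketch of the rule as stated in the text, using the default argument values from the usage; the variable names H2O_file and O2_file are hypothetical.

```r
dataset <- "TPPG17"
bg_organism <- "Hsa"

# Join the pieces with "_" as the separator
H2O_file <- paste(dataset, bg_organism, "H2O.csv", sep = "_")
O2_file <- paste(dataset, bg_organism, "O2.csv", sep = "_")
```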
LYSC_example prints the chemical formula, formation reaction, equilibrium constant, and activity product for the example protein (LYSC_CHICK) used in the text.
runall.sh
Script to run runMaximAct for different datasets and background proteomes.
[dataset]_H2O.csv
Output file with optimal values of logaH2O for each random seed.
[dataset]_O2.csv
Output file with optimal values of logfO2 for each random seed.
The output files are used in evdevH2O5 (thermodynamic analysis of phylostrata and gene ages) and evdevH2O7 (thermodynamic analysis of biofilm developmental stages).
To reduce package size, the PNG files output by runMaximAct are not included here.
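Output files of this kind could be summarized by averaging over seeds. The sketch below writes and reads a toy file; it assumes a layout with one row per seed, a leading seed column, and one column per target protein, which is a guess and not confirmed by this page. The column names and values are hypothetical.

```r
# Hypothetical layout: one row per seed, one column per target protein
toy <- data.frame(seed = 1:3, A = c(-1.0, -2.0, -3.0), B = c(0.5, 0.0, -0.5))
csv_file <- tempfile(fileext = ".csv")
write.csv(toy, csv_file, row.names = FALSE)

# Read the file back and average over seeds, dropping the seed column
H2O <- read.csv(csv_file)
mean_H2O <- colMeans(H2O[, -1])
```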
mkaa.R
Script to sum amino acid composition of all proteins in each gene age category for each organism in the consensus tables of Liebeskind et al. (2016). This script requires files available from UniProt and Zenodo (doi:10.5281/zenodo.51708); see comments in the script for details.
reference_proteomes.csv
IDs of UniProt reference proteomes for 31 organisms in Liebeskind et al. (2016) (input file for mkaa.R).
modeAges_names.csv
Names of gene age categories for each organism (output by mkaa.R).
modeAges_aa.csv
Summed amino acid composition for proteins in each gene age category (output by mkaa.R).
mkAA.R
Script to assemble amino acid compositions from data files of James et al. (2021). This script requires files available from Figshare (doi:10.6084/m9.figshare.12037281.v1); see comments for details.
pfam_animal_nontrans_AA.csv.xz, pfam_animal_trans_AA.csv.xz, pfam_plant_nontrans_AA.csv.xz, pfam_plant_trans_AA.csv.xz
Output files from the script.
Casas-Vila N et al. (2017) The developmental proteome of Drosophila melanogaster. Genome Res. 27, 1273–1285. doi:10.1101/gr.213694.116
Dick JM (2022) A thermodynamic model for water activity and redox potential in evolution and development. J. Mol. Evol. 90, 182–199. doi:10.1007/s00239-022-10051-7
Fabre B, Korona D, Lees JG, Lazar I, Livneh I, Brunet M, Orengo CA, Russell S and Lilley KS (2019) Comparison of Drosophila melanogaster embryo and adult proteome by SWATH-MS reveals differential regulation of protein synthesis, degradation machinery, and metabolism modules. J. Proteome Res. 18, 2525–2534. doi:10.1021/acs.jproteome.9b00076
Futo M et al. (2021) Embryo-like features in developing Bacillus subtilis biofilms. Mol. Biol. Evol. 38, 31–47. doi:10.1093/molbev/msaa217
James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ and Masel J (2021) Universal and taxon-specific trends in protein sequences as a function of age. eLife 10, e57347. doi:10.7554/eLife.57347
Liebeskind BJ, McWhite CD and Marcotte EM (2016) Towards consensus gene ages. Genome Biol. Evol. 8, 1812–1823. doi:10.1093/gbe/evw113
Trigos AS, Pearson RB, Papenfuss AT and Goode DL (2017) Altered interactions between unicellular and multicellular genes drive hallmarks of transformation in a diverse range of solid tumors. Proc. Natl. Acad. Sci. 114, 6406–6411. doi:10.1073/pnas.1617743114
devodata for data sources for developmental proteomes.
PS for functions used to retrieve phylostrata from UniProt IDs.
evdevH2O2()
# Start with a fresh plot device to run the next example
# (Something from above messes up split.screen())
dev.off()
MA <- runMaximAct("transcriptome", seed = 1:3, nbackground = 200, res = 15, writeFiles = FALSE)
# The logaH2O decreases at the end of the time course
H2O <- colMeans(MA$H2O)
stopifnot(H2O["14D"] > H2O["1M"])
stopifnot(H2O["1M"] > H2O["2M"])
# The final value of logaH2O is close to 0
stopifnot(round(H2O["2M"]) == 0)