This vignette introduces CHNOSZ, an R package for thermodynamic calculations relevant to geochemistry and geobiochemistry. CHNOSZ provides functions and a thermodynamic database for calculating properties of reactions involving minerals, aqueous species, and gases across a range of temperatures and pressures.
This vignette was compiled on 2025-05-09 with CHNOSZ 2.1.0-64.
After installing CHNOSZ from CRAN, load the package:
library(CHNOSZ)
This makes the thermodynamic database and functions available in your R session. To restore default settings at any point, use reset()
.
CHNOSZ offers several primary functions for thermodynamic analysis:
info()
: Search the thermodynamic databasesubcrt()
: Calculate thermodynamic properties of species and reactionsaffinity()
: Calculate affinities of formation reactionsequilibrate()
: Calculate equilibrium chemical activitiesdiagram()
: Plot the resultsThe info()
function provides access to the OBIGT thermodynamic database.
# Get database index for aqueous methane
info("CH4")
## info.character: found CH4(aq); also available in gas, liq
## [1] 925
When searching by formula, aqueous species are returned if they are available.
Use a species name or add the state to get a particular physical state - aq
, cr
, gas
, or liq
:
# Two ways to lookup methane gas
info("methane")
## info.character: found methane(gas); also available in liq
## [1] 2782
info("CH4", "gas")
## [1] 2782
Use info()
recursively to return thermodynamic parameters:
# Get thermodynamic properties for aqueous methane
info(info("CH4"))
## info.character: found CH4(aq); also available in gas, liq
## check.EOS: calculated Cp° of CH4(aq) differs by -2.61 cal K-1 mol-1 from database value
## name abbrv formula state ref1 ref2 date model E_units G H S
## 925 CH4 <NA> CH4 aq PS01 <NA> 2000-10-04 HKF cal -8140 -20930 21
## Cp V a1 a2 a3 a4 c1 c2 omega Z
## 925 60.23 36 1.769 -1530 -67.88 114700 40.87 64500 -40000 0
You can access fuzzy search functionality by using partial names:
# Search for ribose-related entries
info("ribose+")
## info.approx: 'ribose+' is ambiguous; has approximate matches to 8 species:
## [1] "ribose" "deoxyribose" "ribose-5-phosphate" ## [4] "ribose-5-phosphate-1" "ribose-5-phosphate-2" "ribose" ## [7] "deoxyribose" "ribose-5-phosphate"
## [1] NA
The subcrt()
function (loosely named after SUPCRT; Johnson et al., 1992Johnson JW, Oelkers EH, Helgeson HC. 1992. SUPCRT92: A software package for calculating the standard molal thermodynamic properties of minerals, gases, aqueous species, and reactions from 1 to 5000 bar and 0 to 1000°C. Computers & Geosciences 18(7): 899–947. doi: 10.1016/0098-3004(92)90029-Q) calculates standard thermodynamic properties:
# Properties of aqueous methane at default T and P
subcrt("CH4")
## info.character: found CH4(aq); also available in gas, liq
## subcrt: 1 species at 15 values of T (ºC) and P (bar) (wet) [energy units: J]
## $species
## name formula state ispecies model
## 925 CH4 CH4 aq 925 HKF
##
## $out
## $out$CH4
## T P rho logK G H S V
## 1 0.01 1.000000 0.9998289 6.146120 -32141.64 -94362.48 64.02537 28.43371
## 2 25.00 1.000000 0.9970614 5.966661 -34057.76 -87571.12 87.86400 36.32929
## 3 50.00 1.000000 0.9880295 5.898249 -36490.28 -81887.10 106.18347 40.21802
## 4 75.00 1.000000 0.9748643 5.903292 -39346.90 -76565.80 122.04828 42.65169
## 5 100.00 1.013220 0.9583926 5.960475 -42580.84 -71362.35 136.48265 44.45100
## 6 125.00 2.320144 0.9390726 6.055409 -46157.27 -66152.32 149.98108 45.99448
## 7 150.00 4.757169 0.9170577 6.179107 -50057.59 -60843.92 162.88237 47.51660
## 8 175.00 8.918049 0.8923427 6.325065 -54267.31 -55334.98 175.48219 49.23207
## 9 200.00 15.536499 0.8647434 6.488877 -58778.47 -49481.61 188.11556 51.41530
## 10 225.00 25.478603 0.8338733 6.667868 -63591.20 -43048.65 201.24975 54.51539
## 11 250.00 39.736493 0.7990719 6.861076 -68717.66 -35603.67 215.66282 59.40754
## 12 275.00 59.431251 0.7592362 7.069725 -74191.11 -26241.56 232.89165 68.08973
## 13 300.00 85.837843 0.7124075 7.298795 -80088.35 -12702.80 256.64410 86.10141
## 14 325.00 120.457572 0.6545772 7.562218 -86598.26 12491.49 298.92128 134.18673
## 15 350.00 165.211289 0.5746875 7.907228 -94333.67 93338.02 429.08080 355.39612
## Cp
## 1 320.2540
## 2 241.0868
## 3 217.6954
## 4 209.4693
## 5 207.5653
## 6 209.5788
## 7 215.2640
## 8 225.6772
## 9 243.4995
## 10 274.5815
## 11 332.5458
## 12 454.1118
## 13 764.2170
## 14 1893.9587
## 15 11384.8458
# Custom T,P grid for water in supercritical region
subcrt("H2O", T = c(400, 500), P = c(250, 300))
## info.character: found H2O(liq) [water]; also available in cr, gas
## subcrt: 1 species at 2 values of T (ºC) and P (bar) (wet) [energy units: J]
## $species
## name formula state ispecies model
## 1 water H2O liq 1 water.SUPCRT92
##
## $out
## $out$water
## T P rho logK G H S V Cp
## 1 400 250 0.1666279 21.47188 -276714.5 -241279.6 155.8919 108.1164 239.06019
## 2 500 300 0.1152589 19.74903 -292320.6 -232176.0 167.6886 156.3021 77.68281
Unit settings for subcrt()
are handled by T.units()
, P.units()
, and E.units()
:
# Change energy units from Joules to calories
E.units("cal")
## changed energy units to cal
subcrt("CH4", T = 25)
## info.character: found CH4(aq); also available in gas, liq
## subcrt: 1 species at 25 ºC and 1 bar (wet) [energy units: cal]
## $species
## name formula state ispecies model
## 925 CH4 CH4 aq 925 HKF
##
## $out
## $out$CH4
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 5.966661 -8140 -20930 21 36.32929 57.62112
reset() # Restore defaults
## reset: resetting "thermo" object
## OBIGT: loading default database with 2026 aqueous, 3580 total species
Define reactions with species names, states (optional), and coefficients:
# CO2 dissolution reaction
subcrt(c("CO2", "CO2"), c("gas", "aq"), c(-1, 1), T = 25)
## subcrt: 2 species at 25 ºC and 1 bar (wet) [energy units: J]
## $reaction
## coeff name formula state ispecies model
## 2764 -1 carbon dioxide CO2 gas 2764 CGL
## 690 1 CO2 CO2 aq 690 HKF
##
## $out
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 -1.468942 8384.736 -20288.22 -96.16924 32.63308 159.1761
Reactions can be automatically balanced using basis species:
subcrt()
for calculating standard thermodynamic properties.basis(c("CO2", "H2O", "H+", "e-"))
## C H O Z ispecies logact state
## CO2 1 0 2 0 690 0 aq
## H2O 0 2 1 0 1 0 liq
## H+ 0 1 0 1 3 0 aq
## e- 0 0 0 -1 2 0 aq
subcrt(c("CH4", "acetate"), c("aq", "aq"), c(1, -1), T = 25)
## subcrt: 2 species at 25 ºC and 1 bar (wet) [energy units: J]
## subcrt: reaction is not balanced; it is missing this composition:
## C H O Z ## 1 -1 2 -1
## subcrt: adding missing composition from basis definition and restarting...
## subcrt: 4 species at 25 ºC and 1 bar (wet) [energy units: J]
## $reaction
## coeff name formula state ispecies model
## 925 1 CH4 CH4 aq 925 HKF
## 1115 -1 acetate C2H3O2- aq 1115 HKF
## 690 1 CO2 CO2 aq 690 HKF
## 3 -1 H+ H+ aq 3 HKF
##
## $out
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 8.884021 -50710.08 -15355.28 119.244 28.86502 410.4372
species()
for calculating affinities and making diagrams.basis(c("CO2", "H2O", "H+", "e-"))
## C H O Z ispecies logact state
## CO2 1 0 2 0 690 0 aq
## H2O 0 2 1 0 1 0 liq
## H+ 0 1 0 1 3 0 aq
## e- 0 0 0 -1 2 0 aq
species(c("CH4", "acetate"))
## CO2 H2O H+ e- ispecies logact state name
## 1 1 -2 8 8 925 -3 aq CH4
## 2 2 -2 7 8 1115 -3 aq acetate
There are some keywords (e.g. CHNOS+
, CHNOSe
and QEC
) for loading predefined sets of basis species.
See the help page of basis()
for more information.
The affinity()
function calculates chemical affinities over ranges of T, P, and activities:
Eh-pH (Pourbaix) diagram for S
# Set up the C-H-N-O-S basis system with electron
basis("CHNOSe")
# Define aqueous sulfur species
species(c("SO4-2", "HSO4-", "HS-", "H2S"))
# Calculate affinities on an Eh-pH grid
a <- affinity(pH = c(0, 14), Eh = c(-0.8, 1.2))
# Create an Eh-pH (Pourbaix) diagram
diagram(a, col = 4, col.names = 4, italic = TRUE)
NOTE: diagram()
automatically adds shading to regions of water instability with respect to O2 or H2.
For more sophisticated diagrams involving speciation of basis species, use the mosaic()
function:
Mosaic diagram showing effect of aqueous S speciation on the relative stabilities of Cu minerals and aqueous species
# Create a mosaic diagram for Cu-S-Cl-H2O system
basis(c("Cu", "H2S", "Cl-", "H2O", "H+", "e-"))
basis("H2S", -6)
basis("Cl-", -1)
species(c("CuCl", "CuCl2-", "CuCl3-2", "CuCl+", "CuCl2", "CuCl3-", "CuCl4-2"))
species(c("chalcocite", "tenorite", "cuprite", "copper"), add = TRUE)
bases <- c("H2S", "HS-", "HSO4-", "SO4-2")
m <- mosaic(bases, pH = c(0, 12), Eh = c(-1, 1), T = 200)
diagram(m$A.species, lwd = 2, bold = species()$state == "cr")
diagram(m$A.bases, add = TRUE, col = 4, col.names = 4, italic = TRUE)
water.lines(m$A.species, lty = 3, col = 8)
Here we’ve added dotted lines to help visualize the water stability limits.
Calculate equilibrium distributions of species:
Carbonate speciation as a function of pH and temperature
# Carbonate speciation vs pH
basis(c("CO2", "H2O", "H+", "e-"))
species(c("CO2", "HCO3-", "CO3-2"))
# 25 degrees C
a <- affinity(pH = c(0, 14))
e <- equilibrate(a)
diagram(e, alpha = TRUE)
# 100 degrees C
a <- affinity(pH = c(4, 12), T = 100)
e <- equilibrate(a)
diagram(e, alpha = TRUE, add = TRUE, col = 2, names = NA)
Calculate solubility of minerals or gases:
Solubility of corundum (green line) and equilibrium concentrations of aqueous species (black lines)
# Corundum solubility vs pH
basis(c("Al+3", "H2O", "H+", "e-"))
species("corundum")
iaq <- info(c("Al(OH)4-", "Al(OH)3", "Al(OH)2+", "AlOH+2", "Al+3"))
s <- solubility(iaq, pH = c(2, 10), in.terms.of = "Al+3")
diagram(s, col = 3, lwd = 2, ylim = c(-10, -2))
diagram(s, type = "loga.equil", add = TRUE)
legend("topright", c("25 °C", "1 bar"), bty = "n")
Incorporate non-ideal behavior using the extended Debye-Hückel equation by setting the ionic strength parameter IS
:
Solubility of corundum dependent on ionic strength
# Corundum solubility again
basis(c("Al+3", "H2O", "H+", "e-"))
species("corundum")
iaq <- info(c("Al(OH)4-", "Al(OH)3", "Al(OH)2+", "AlOH+2", "Al+3"))
# Calculate with ionic strength of 0 and 1 molal
s0 <- solubility(iaq, pH = c(2, 10))
s1 <- solubility(iaq, pH = c(2, 10), IS = 1)
diagram(s1, col = 4, lwd = 3, ylim = c(-8, -2))
diagram(s0, col = 3, lwd = 2, add = TRUE)
legend("topleft", legend = c(1, 0), lwd = c(3, 2), col = c(4, 3), title = "I (mol/kg)", bty = "n")
legend("topright", c("25 °C", "1 bar"), bty = "n")
Functions that have the IS
parameter are subcrt()
, affinity()
, mosaic()
, and solubility()
.
When a value for IS
is specified, inputs and outputs labeled as activities are actually interpreted or reported as molalities.
The CHNOSZ package incorporates data and methods from various sources. Use thermo.refs()
to view citation information for data sources:
# Return data frame with references for one or more species
thermo.refs(info("CH4", c("aq", "gas")))
# View all references in a browser
thermo.refs()
For citing CHNOSZ itself, see “How should CHNOSZ be cited?” in the FAQ.
The idea for creating stability diagrams in CHNOSZ came from Prof. Harold Helgeson’s geochemistry course at UC Berkeley. There, we constructed diagrams that were “balanced on” a metal. For instance, in a system balanced on aluminum, Al is only present in the minerals on both sides of the reaction and is not free as an ion.
The reaction-based method, used for making diagrams by hand, looks at reactions between pairs of species (let’s call them transformation reactions), then draws a line between stability fields where the non-standard Gibbs energy of reaction is zero. The grid-based method, used in CHNOSZ, looks at reactions to compose individual species from the basis species (let’s call them formation reactions), then selects the most stable species according to their affinity values.
Affinity is just the opposite of non-standard Gibbs energy of reaction. “Standard Gibbs energy of reaction” and “Gibbs energy of reaction” - which are two different things - have unfortunately similar names except for an optional “overall” or “non-standard” in front of the latter (the word choice varies among authors, e.g. Amend and LaRowe, 2019Amend JP, LaRowe DE. 2019. Mini-review: Demystifying microbial reaction energetics. Environmental Microbiology 21(10): 3539–3547. doi: 10.1111/1462-2920.14778; Solel et al., 2019Solel E, Tarannam N, Kozuch S. 2019. Catalysis: Energy is the measure of all things. Chemical Communications 55(37): 5306–5322. doi: 10.1039/C9CC00754G). “Non-standard Gibbs energy of reaction” doesn’t lend itself to a short, unambiguous function name, which is why its opposite, “affinity”, is used in CHNOSZ.
In the reaction-based method, transformation reactions are said to be “balanced on” a metal. The grid-based method implements this balancing constraint by dividing the affinities of formation reactions by the coefficients of a basis species. CHNOSZ uses these normalized affinities for making relative stability diagrams, referred to as the maximum affinity method. Both the reaction- and grid-based methods have the same limitation: every candidate species must have non-zero stoichiometry of a given metal (or of a basis species with that metal).
Having seen basic examples of the main features of CHNOSZ, here are some ideas for building more complex calculations and diagrams.
Labeling diagrams is an important but often difficult step for creating publication-ready figures.
For general information on formatting mathematical expressions in R, see the documentation for plotmath()
as well as bquote()
, which allows substituting variables into an equation.
CHNOSZ has several helper functions for creating labels.
axis.label()
and expr.species()
are used to create formatted axis labels and chemical formulas.
Let’s revisit the CO2 dissolution example seen earlier and add two other gases (carbon monoxide and methane).
This plot is similar to Figure 18 of Manning et al. (2013Manning CE, Shock EL, Sverjensky DA. 2013. The chemistry of carbon in aqueous fluids at crustal and upper-mantle conditions: Experimental and theoretical constraints. Reviews in Mineralogy and Geochemistry 75(1): 109–148. doi: 10.2138/rmg.2013.75.5).
Equilibrium constants of dissolution reactions
T <- seq(0, 350, 10)
CO2 <- subcrt(c("CO2", "CO2"), c("gas", "aq"), c(-1, 1), T = T)$out$logK
CO <- subcrt(c("CO", "CO"), c("gas", "aq"), c(-1, 1), T = T)$out$logK
CH4 <- subcrt(c("CH4", "CH4"), c("gas", "aq"), c(-1, 1), T = T)$out$logK
logK <- data.frame(T, CO2, CO, CH4)
par(mar = c(5, 5, 1, 1))
matplot(logK[, 1], logK[, -1], type = "l", col = 1, lty = 1,
xlab = axis.label("T"), ylab = axis.label("logK"))
text(80, -1.7, expr.species("CO2"))
text(240, -2.37, expr.species("CO"))
text(300, -2.57, expr.species("CH4"))
See the help pages in CHNOSZ for additional functions for labeling diagrams, including describe.reaction()
to format a chemical reaction from the output of subcrt()
, and lT()
and related functions for compact representations of temperature and other variables for plot legends.
Want to find all the Al complexes in the database?
List them by calling retrieve()
with the main element optionally followed by the ligand elements and state.
# List aqueous Al species in the default database
iaq <- retrieve("Al", ligands = c("F", "H", "O"), state = "aq")
# Print the first few rows and columns
info(iaq)[1:3, 1:5]
## name abbrv formula state ref1
## 739 Al+3 <NA> Al+3 aq TS01
## 740 Al(OH)4- <NA> Al(OH)4- aq TS01
## 741 AlOH+2 <NA> AlOH+2 aq TS01
# Use the species index or name in subcrt()
subcrt(iaq[1:3], T = 25)
## $species
## name formula state ispecies model
## 739 Al+3 Al+3 aq 739 HKF
## 740 Al(OH)4- Al(OH)4- aq 740 HKF
## 741 AlOH+2 AlOH+2 aq 741 HKF
##
## $out
## $out$`Al+3`
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 85.40242 -487477.8 -538769.6 -339.7534 -45.33933 -119.3444
##
## $out$`Al(OH)4-`
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 228.7613 -1305772 -1503075 103.5456 46.28513 96.5402
##
## $out$`AlOH+2`
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 121.9904 -696322.2 -769864.3 -181.1254 -20.62274 -37.44297
#subcrt(names(iaq)[1:3], T = 25) # same as above
The result of retrieve()
can also be used to add species to a diagram; see the example in #3 below.
basis(c("Al+3", "H2O", "F-", "H+", "e-"))
species(iaq)
species(names(iaq)) # same as above
Perhaps you’d like to use data from older databases that have been superseded by later updates.
The OBIGT vignette briefly summarizes the superseded data for SUPCRT92 and SLOP98 (Shock and others, 1998Shock EL, others. 1998. slop98.dat: Sequential-access thermodynamic datafile used by PROGRAM supcrt92. Last updated on 1998-08-20. Available at https://doi.org/10.5281/zenodo.2630820. ).
Use add.OBIGT()
to load these old data entries.
add.OBIGT("SLOP98")
iaq_all <- retrieve("Al", ligands = c("F", "H", "O"), state = "aq")
# Use difference between sets to get the added species
info(setdiff(iaq_all, iaq))
## name abbrv formula state ref1 ref2 date model E_units
## 3613 AlO2- AlO2- AlO2-1 aq SSWS97.4 <NA> 1997-11-07 HKF cal
## 3614 AlO+ AlO+ AlO+1 aq SSWS97.4 <NA> 1997-11-13 HKF cal
## 3615 HAlO2 HAlO2(AQ) HAlO2 aq SSWS97.4 <NA> 1997-11-13 HKF cal
## G H S Cp V a1 a2 a3 a4 c1
## 3613 -198693 -222125 -7.22 -11.9 10.0 0.37221 399.54 -1.5879 -29441 15.2391
## 3614 -158188 -170900 -27.00 -30.0 0.6 0.21705 -248.11 6.7241 -26763 -2.5983
## 3615 -207700 -227500 5.00 -50.0 13.0 0.35338 84.85 5.4132 -28140 -23.4129
## c2 omega Z
## 3613 -54585 174180 -1
## 3614 -91455 95700 1
## 3615 -132195 -3000 0
The convention for SUPCRT-family databases is to use anhydrous species. For example, AlO2- in SLOP98 corresponds to Al(OH)4- in the default database (see output above). They are effectively the same species, which is why only the latter (taken from a more extensive compilation for Al species properties; Tagirov and Schott, 2001Tagirov B, Schott J. 2001. Aluminum speciation in crustal fluids revisited. Geochimica et Cosmochimica Acta 65(21): 3965–3992. doi: 10.1016/S0016-7037(01)00705-0) is used in the default database. Unless you have a specific reason to compare them, redundant species should not be used in the same equilibrium calculation.
OBIGT()
restores the default database without affecting other settings.
reset()
restores the default database and all other settings in CHNOSZ.
These functions are useful for both interactive use and scripts that compare different versions of data or plots for different systems or conditions.
Let’s put items #1-4 together to remake the corundum solubility plot using only species available in SLOP98.
To do this, we use add.OBIGT()
followed by retrieve()
to gather the species indices for all Al species, then take only those species sourced from Shock et al. (1997Shock EL, Sassani DC, Willis M, Sverjensky DA. 1997. Inorganic species in geologic fluids: Correlations among standard molal thermodynamic properties of aqueous ions and hydroxide complexes. Geochimica et Cosmochimica Acta 61(5): 907–950. doi: 10.1016/S0016-7037(96)00339-0).
Corundum solubility with species from SLOP98
# Add superseded species from SLOP98
add.OBIGT("SLOP98")
# List all aqueous Al species
iaq <- retrieve("Al", state = "aq")
# Keep only species from Shock et al. (1997)
iaq <- iaq[grepl("SSWS97", info(iaq)$ref1)]
# Plot corundum solubility vs pH
basis(c("Al+3", "H2O", "H+", "e-"))
species("corundum")
s <- solubility(iaq, pH = c(2, 10), in.terms.of = "Al+3")
## Alternatively, we could use the species names
#s <- solubility(names(iaq), pH = c(2, 10), in.terms.of = "Al+3")
diagram(s, col = 3, lwd = 2, ylim = c(-10, -2))
diagram(s, type = "loga.equil", add = TRUE)
legend("topright", c("25 °C", "1 bar"), bty = "n")
# Reset the database for subsequent calculations
reset()
A common question is: what are the basis species used for? The basis species define the compositional variables that can be added to a diagram. In more precise terms, they define the thermodynamic components of a chemical system. The composition of any possible species in that system can be represented by a linear combination of the basis species.
CHNOSZ requires that the number of basis species is equal to the number of different elements in the basis species (plus charge, if present). If you were studying the relative stability of F- and OH-complexes with Al, you might be tempted to try this basis definition:
basis(c("Al+3", "F-", "OH-"))
## Error in put.basis(ispecies, logact): the number of basis species is less than the number of elements and charge
According to the message, we don’t have enough basis species for the number of elements. Since hydroxide (OH-) is just water minus a proton, we could try this instead:
basis(c("Al+3", "F-", "H+", "H2O"))
## Error in put.basis(ispecies, logact): the number of basis species is less than the number of elements and charge
That’s still not enough species. As is often the case, we need to include a basis species representing oxidation-reduction (redox) reactions, even if there are no redox reactions between the formed species. Here are two possible basis definitions that do not give an error.
# Use "oxygen" to get oxygen gas (for logfO2 diagrams)
basis(c("Al+3", "F-", "H+", "H2O", "oxygen"))
# Use "e-" to get aqueous electron (for Eh diagrams)
basis(c("Al+3", "F-", "H+", "H2O", "e-"))
In order to make a diagram with stability fields for different species, CHNOSZ needs to know about the activities of all the species in the reaction.
The activities of the basis species start with constant values as shown in the output above (logact
column).
Selected basis species can be assigned to plot axes (with a range of values) in affinity()
.
NOTE: logact
is the logarithm of activity for aqueous species, solids, and liquids, or logarithm of fugacity for gases.
How about the formed species in the system - that is, the species whose stability fields we want to visualize?
We both list the species and set their activities using species()
.
The function defaults to activities of 1e-3 (logact
of -3) for aqueous species and unit activity (logact
= 0) for minerals, gases, and liquids.
Let’s change this to activities of 1e-6 for the formed species.
basis(c("Al+3", "F-", "H+", "H2O", "e-"))
iaq <- retrieve("Al", ligands = c("F", "H", "O"), state = "aq")
# Check that the data are from the same source
stopifnot(all(info(iaq)$ref1 == "TS01"))
species(iaq, -6)
This value for logact
defines a solubility contour, as we’ll see below.
There are two places where you might see add = TRUE
.
First, in species()
to add species not already in the list.
Without add = TRUE
, any existing species are discarded.
Second, in diagram()
to add to an existing plot.
Let’s put items #5-7 together to make a Pourbaix (Eh-pH) diagram for Al with two solubility contours.
Pourbaix diagram for Mn with two solubility contours
basis(c("Mn+2", "H+", "H2O", "e-"))
icr <- retrieve("Mn", ligands = c("H", "O"), state = "cr")
iaq <- retrieve("Mn", ligands = c("H", "O"), state = "aq")
# First layer, logact(aq) = -3
species(icr)
species(iaq, add = TRUE)
a <- affinity(pH = c(4, 14), Eh = c(-1, 1), T = 100)
# Use names = NA to avoid plotting labels twice
diagram(a, lty = 2, names = NA)
# Second layer, logact(aq) = -4
species(icr)
species(iaq, -4, add = TRUE)
a <- affinity(pH = c(4, 14), Eh = c(-1, 1), T = 100)
d <- diagram(a, bold = species()$state == "cr", add = TRUE)
# Add water stability limits
water.lines(d, lty = 3, col = 8)
# Add legends
legend("topright", legend = c(lT(100), lP("Psat")), bty = "n")
title = as.expression(quote(log~italic(a)*"Mn(aq)"))
legend("bottomleft", legend = c(-3, -4), lty = c(2, 1), title = title, bty = "n")
The shaded areas in the diagram represent water instability regions and are automatically added by diagram()
.
We use water.lines()
here to plot the water stability limits with dotted lines.
After defining the basis species and formed species (and their constant activities), you have some choices about what variables to put on the plot, the grid resolution, and values for a few other variables.
affinity()
accepts one or more named arguments that specifying ranges of variables using the default grid resolution of 256 (c(min, max)
) or ranges and a custom grid resolution (c(min, max, res)
).
The number of such arguments is the dimensionality of the final plot.
The grid resolution (res
) defaults to 256 and can be different for each variable.
The names of the variables can be the formulas of any of the basis species, or T
, P
, or IS
for temperature, pressure, or ionic strength.
These last three default to 25 °C, Psat
(1 bar below 100 °C and saturation pressure at higher temperatures), and 0 mol/kg.
I often start with a low grid resolution to quickly iterate a calculation, then switch to a higher resolution when I’m satisfied with the result.
Sodium chloride (NaCl) solutions are commonly used reference points for geochemical models.
The NaCl()
function provides a quick-and-dirty way to estimate ionic strength and activity of chloride (Cl-) for a given total amount of NaCl added to 1 kg of H2O.
These values can then be used in setting up a calculation that involves these variables.
This function does not use either the basis()
or species()
definitions.
The following example runs a calculation for 0.8 mol/kg NaCl and given T, P, and pH.
See demo('sum_S')
for the fully worked-out example that uses this code (based on a diagram in Skirrow and Walshe, 2002Skirrow RG, Walshe JL. 2002. Reduced and oxidized Au-Cu-Bi iron oxide deposits of the Tennant Creek inlier, Australia: An integrated geologic and chemical model. Economic Geology 97(6): 1167–1202. doi: 10.2113/gsecongeo.97.6.1167).
T <- 300
P <- 1000
pH <- 5
m_NaCl = 0.8
NaCl <- NaCl(m_NaCl = m_NaCl, T = T, P = P, pH = pH)
print(paste("mol NaCl added to 1 kg H2O:", m_NaCl))
## [1] "mol NaCl added to 1 kg H2O: 0.8"
print(paste("Ionic strength (mol/kg):", NaCl$IS))
## [1] "Ionic strength (mol/kg): 0.607065915465928"
print(paste("Chloride concentration (mol/kg):", NaCl$m_Clminus))
## [1] "Chloride concentration (mol/kg): 0.607061333179561"
There are many uses for “composite diagrams” (Garrels and Christ, 1965Garrels RM, Christ CL. 1965. Solutions, Minerals, and Equilibria. New York: Harper & Row. Available at https://search.worldcat.org/title/517586. ), where stability fields for minerals and predominance fields for aqueous species are both present. As mentioned above (#6), setting the activities of formed aqueous species defines a single solubility contour. This represents a concentration-dependent boundary between minerals and aqueous species on a composite diagram, a concept referred to either as “equisolubility” (Pourbaix, 1974Pourbaix M. 1974. Atlas of Electrochemical Equilibria in Aqueous Solutions. 2nd ed. Houston, Texas; Brussels: National Association of Corrosion Engineers; CEBELCOR. Available at https://search.worldcat.org/title/563921897. ) or “isosolubility” (Helgeson, 1964Helgeson HC. 1964. Complexing and Hydrothermal Ore Deposition. New York: Pergamon Press. Available at https://search.worldcat.org/title/8080200. and Garrels and Christ, 1965).
Composite diagrams are often drawn with multiple solubility contours in order to show the dependence of solubility on pH, redox, or other variables.
See examples of Eh-pH composite diagrams in demo("Pourbaix")
.
You could loop over constant activities to make a series of solubility contours (see the above example for Mn).
An easier solution is to use solubility()
to visualize multiple solubility contours in one go.
The basic idea is to first load the mineral(s) containing a single metal as the formed species()
.
Then, list the aqueous species with that metal as the first argument to solubility()
.
The remaining arguments to the function define the plot variables, just as in affinity()
and mosaic()
.
Let’s put together #8-10 to make a set of diagrams for a single metal. The example here uses Fe; try changing it to Cu, Zn, Pb, Au, or something else!
CHNOSZ generates warning messages about being above the Cp limits for various iron oxyhydroxides. If you see warning messages like this, it’s a good idea not to ignore them; instead, consider whether you might be pushing extrapolations of the Cp equation too far. For the present calculation, the warnings are probably harmless because the predicted set of stable minerals (pyrite, pyrrhotite, magnetite, and hematite) is consistent with many published diagrams.
## Warning in subcrt(species = c(2158L, 65L, 29L, 2772L, 1L, 3L, 2058L, 2066L, : ## above T limit of 400 K for the Cp equation for hydronium jarosite(cr)
## Warning in subcrt(species = c(2158L, 65L, 29L, 2772L, 1L, 3L, 2058L, 2066L, : ## above T limit of 375 K for the Cp equation for goethite(cr)
## Warning in subcrt(species = c(2158L, 65L, 29L, 2772L, 1L, 3L, 2058L, 2066L, : ## above T limit of 390 K for the Cp equation for lepidocrocite(cr)
## Warning in subcrt(species = c(2158L, 65L, 29L, 2772L, 1L, 3L, 2058L, 2066L, : ## above T limit of 400 K for the Cp equation for hydronium jarosite(cr)
## Warning in subcrt(species = c(2158L, 65L, 29L, 2772L, 1L, 3L, 2058L, 2066L, : ## above T limit of 375 K for the Cp equation for goethite(cr)
## Warning in subcrt(species = c(2158L, 65L, 29L, 2772L, 1L, 3L, 2058L, 2066L, : ## above T limit of 390 K for the Cp equation for lepidocrocite(cr)
## Warning in subcrt(species = c(2158L, 65L, 29L, 2772L, 1L, 3L, 2058L, 2066L, : ## above T limit of 400 K for the Cp equation for hydronium jarosite(cr)
## Warning in subcrt(species = c(2158L, 65L, 29L, 2772L, 1L, 3L, 2058L, 2066L, : ## above T limit of 375 K for the Cp equation for goethite(cr)
## Warning in subcrt(species = c(2158L, 65L, 29L, 2772L, 1L, 3L, 2058L, 2066L, : ## above T limit of 390 K for the Cp equation for lepidocrocite(cr)
Mineral stability diagram; aqueous species predominance diagram; composite diagram with one solubility contour; diagram with multiple solubility contours in units of log m
The convert()
function offers several unit conversions.
It implements reciprocal conversion between pairs of units, so only the destination unit needs to be specified.
Some simple uses are to convert temperature, pressure, or energy values:
# Convert 100 degrees C to value in Kelvin
convert(100, "K")
# Convert 273.15 K to value in degrees C
convert(273.15, "C")
# Convert 1 bar to value in MPa
convert(1, "MPa")
# Convert 100 MPa to value in bar
convert(100, "bar")
# Convert 1 cal/mol to value in J/mol
convert(1, "J")
# Convert 10 J/mol to value in cal/mol
convert(10, "cal")
Another use of convert()
is to convert the output of solubility()
from log activity to ppm, ppb, log ppm, or log ppb.
The following code continues from the example above:
Solubility in units of ppm
sppm <- convert(s, "ppm")
levels <- c(1e-6, 1e-3, 1e0, 1e3)
diagram(sppm, levels = levels)
Specify 4 or more values for one or more variables (each variable should have the same number of values, or be set to a constant) to activate the transect mode of affinity()
.
The transect mode allows defining an arbitrary path in multidimensional space.
Here’s a simple example:
basis("CHNOSe")
species(c("NO3-", "NO2-", "NH4+", "NH3"))
a <- affinity(pH = c(6, 8, 6, 8), Eh = c(0.5, 0.5, -0.5, -0.5))
# Print pH and Eh values used for calculation
a$vals
## $pH
## [1] 6 8 6 8
##
## $Eh
## [1] 0.5 0.5 -0.5 -0.5
# Print affinity values calculated for each species
a$values
## $`16`
## [1] 10.71165 28.71165 -124.52393 -106.52393
##
## $`17`
## [1] 9.573951 23.573951 -91.852732 -77.852732
##
## $`18`
## [1] 2.2409947 0.2409947 2.2409947 0.2409947
##
## $`64`
## [1] -1 -1 -1 -1
NOTE: affinity()
returns dimensionless values of affinity (i.e., A/2.303RT).
Below we’ll see how to convert these to energetic units.
diagram()
looks for the first basis species that has non-zero coefficients in each of the formed species.
This is called the conserved basis species in the documentation.
The affinity values are then divided by the coefficients of the conserved basis species to give normalized affinities.
This is how “balancing on a metal” is implemented.
Let’s put together #11-13 to calculate affinities of organic synthesis reactions in mixed seawater and hydrothermal fluid from the Rainbow vent field using speciation results from Shock and Canovas (2010Shock EL, Canovas P. 2010. The potential for abiotic organic synthesis and biosynthesis at seafloor hydrothermal systems. Geofluids 10(1-2): 161–192. doi: 10.1111/j.1468-8123.2010.00277.x):
Affinities of organic synthesis reactions per mole of C, H2, or formed species
Although affinity()
uses all of the variables in the transect, diagram()
treats a transect like a system of one variable, so we get a plot of affinity vs. the first variable (temperature).
We obtain three plots:
balance = 1
).
This just shows the affinity of each reaction as given (that is, per mole of formed species), which is how the results were presented by Shock and Canovas (2010).In normal use, subcrt()
returns standard molal properties (G, H, S, Cp, and V) for species in their standard states, defined as unit activity or fugacity.
Two deviations from this default setting are possible: non-standard properties for specified activity, and adjusted properties for activity coefficients.
First let’s look at how adjusted properties depend on activity coefficients. This example uses a particular nonideality model based on Alberty (2003Alberty RA. 2003. Thermodynamics of Biochemical Reactions. Hoboken, New Jersey: Wiley-Interscience. Available at https://search.worldcat.org/title/51242181. ):
# Set the nonideality model
old_ni <- nonideal("Alberty")
# Calculate standard and adjusted Gibbs energy at 25 °C
species <- c("MgH2ATP", "MgHATP-", "MgATP-2")
subcrt(species, T = 25, IS = c(0, 0.25), property = "G")$out
## $MgH2ATP
## T P G loggam IS
## 1 25 1 -3289461 0.000000000 0.00
## 2 25 1 -3289472 -0.001951595 0.25
##
## $`MgHATP-`
## T P G loggam IS
## 1 25 1 -3267679 0.0000000 0.00
## 2 25 1 -3268489 -0.1418484 0.25
##
## $`MgATP-2`
## T P G loggam IS
## 1 25 1 -3236780 0.0000000 0.00
## 2 25 1 -3240019 -0.5673936 0.25
# Restore the original nonideality model
nonideal(old_ni)
Notice how logarithms of the activity coefficients (loggam) are more negative for the higher-charged species. The activity coefficients have a stabilizing effect: adjusted Gibbs energies (at I > 0) are less than the standard Gibbs energies (at I = 0).
Now let’s change the activities to get non-standard properties.
species <- c("Mg+2", "ATP-4", "MgATP-2")
coeffs <- c(-1, -1, 1)
T <- c(25, 50, 100)
# Drop some columns for more compact output
idrop <- c(2:4, 6:9)
# Unit activity: affinity is opposite of standard Gibbs energy
subcrt(species, coeffs, T = T, logact = c(0, 0, 0))$out[, -idrop]
## T G logQ A
## 1 25 -33748.14 0 33748.14
## 2 50 -38662.41 0 38662.41
## 3 100 -51468.20 0 51468.20
# Non-unit activity: affinity is opposite of non-standard Gibbs energy
subcrt(species, coeffs, T = T, logact = c(-5, -4, -3))$out[, -idrop]
## T G logQ A
## 1 25 -33748.14 6 -499.9146
## 2 50 -38662.41 6 1542.6389
## 3 100 -51468.20 6 8604.9959
NOTE:
subcrt()
are always standard or adjusted thermodynamic properties, not non-standard ones.logact
argument is provided.logact
argument specifies the activities of species in the same order as the first argument.subcrt()
has the same units as G (J/mol by default); this differs from the output of affinity()
, which uses dimensionless values (A/2.303RT).The first call above specifies unit activities of all the species in the reaction.
The second call specifies non-unit activities of the species.
Above we used subcrt()
to calculate non-standard Gibbs energy, but doing it with affinity()
can be more convenient for making diagrams.
This example plots non-standard Gibbs energies for hydrogenotrophic methanogenesis, acetate oxidation, and acetate oxidation, and is based on Fig. 4b of Mayumi et al. (2013Mayumi D, Dolfing J, Sakata S, Maeda H, Miyagawa Y, Ikarashi M, Tamaki H, Takeuchi M, Nakatsu CH, Kamagata Y. 2013. Carbon dioxide concentration dictates alternative methanogenic pathways in oil reservoirs. Nature Communications 4: 1998. doi: 10.1038/ncomms2998).
We combine some of the features described above with new ones for swapping basis species and removing formed species:
swap.basis()
: Changes one species for another in an existing basis definitionspecies()
with delete = TRUE
: Removes one or more species from the set of formed speciesdescribe.reaction()
: Formats reactions for plots using the output of subcrt()
names
and srt
arguments of diagram()
: Use supplied reaction labels with rotation
Non-standard Gibbs energies of organic reactions as a function of CO2 fugacity
After running the code to make the diagram, we can print the formation reactions for each of the species. This shows how the two acetate reactions (acetate oxidation and acetoclastic methanogenesis) are implemented by swapping H2 and CH4 in the basis species.
a1$species
## CO2 H2 H2O H+ ispecies logact state name
## 1 1 4 -2 0 2782 -0.18 gas methane
## 2 2 4 -2 -1 1115 -3.40 aq acetate
a2$species
## CO2 CH4 H2O H+ ispecies logact state name
## 1 1 1 0 -1 1115 -3.4 aq acetate
NOTE: This example uses balance = 1
in the call to diagram()
to prevent normalizating the reactions by a balancing constraint (i.e., normalization by number of C) in order to reproduce the calculations of Mayumi et al. (2013). In most other cases (especially for making relative stability diagrams), this argument should not be used.
Sometimes it’s useful to make further computations on the results of a diagram()
call.
For example, a system might dominated by a few stable species, but you’d rather visualize the relative stabilities of less stable (i.e., metastable) species.
Here we do this for all the aqueous S species in the OBIGT database, accessed using retrieve()
.
We use plot.it = FALSE
to suppress the first plot (which would look like the Eh-pH diagram for S above),
but save the output with d <- diagram()
to access the identified stable species in d$predominant
.
After removing these stable species from the system, we recalculate affinities for the remaining metastable species and make a diagram for them.
Eh-pH diagram for metastable S species
basis("CHNOSe")
iaq <- retrieve("S", c("H", "O"), state = "aq")
species(iaq)
a <- affinity(pH = c(0, 14), Eh = c(-0.8, 1.2))
# Save results but don't plot the first diagram
d <- diagram(a, plot.it = FALSE)
# Remove the stable species
istable <- unique(as.numeric(d$predominant))
species(istable, delete = TRUE)
# Make a diagram for the metastable species
a <- affinity(pH = c(0, 14), Eh = c(-0.8, 1.2))
d <- diagram(a, col = 4, col.names = 4, italic = TRUE)
Other possibly useful parts of the diagram()
output are:
plotvals
: Affinity for each species, after normalizing by the balancing constraintpredominant.values
: Normalized affinity for the predominant species at each point on the diagramnames
: Names used for labeling lines or fields (includes formatting for chemical formulas)namesx
, namesy
: Locations for labelslinesout
: x, y coordinates of boundary lines between stability fieldsNOTE: If diagram()
was passed the output of equilibrate()
or solubility()
, then its output contains logarithms of activities instead of dimensionless affinities.
makeup()
is used internally by some functions in CHNOSZ but is also available for the user.
Buffers are assemblages of one or more species whose presence constrains the chemical activities (or fugacities) of basis species in a thermodynamic system. Buffers play a critical role in geochemical modeling by providing realistic constraints on system variables like pH and oxidation state. This section explores the implementation and application of buffers in CHNOSZ.
The mod.buffer()
function defines or modifies buffers by specifying the species that constitute the buffer and their activities:
# View available buffers
thermo()$buffer
# Define a new buffer or modify an existing one
mod.buffer("PPM", c("pyrite", "pyrrhotite", "magnetite"), "cr", 0)
# Buffer made of one species (acetic acid with log activity = -10)
mod.buffer("AC", "acetic acid", "aq", -10)
Use basis()
to associate one or more basis species with a buffer (all the elements in the buffer must be present in the basis species).
Then use affinity(return.buffer = TRUE)
to calculate and retrieve the buffered activities or fugacities of basis species:
# Specify basis species
basis(c("FeCHNOS"))
# Associate O2 with the PPM buffer
basis("O2", "PPM")
# Calculate and retrieve the buffered fugacity of O2
a <- affinity(T = 200, P = 2000, return.buffer = TRUE)
# Access the buffered O2 fugacity (logfO2)
log_fO2 <- a$O2 # -44.28
# Calculate buffered fugacities across temperature range
a <- affinity(T = c(200, 400, 11), P = 2000, return.buffer = TRUE)
Some buffers constrain multiple basis species simultaneously:
# Set up basis species
basis(c("FeS2", "H2S", "O2", "H2O"), c(0, -10, -50, 0))
# Associate both H2S and O2 with the PPM buffer
basis(c("H2S", "O2"), c("PPM", "PPM"))
# Retrieve values for both buffered species
buffer_values <- affinity(T = 300, P = 2000, return.buffer = TRUE)
log_aH2S <- buffer_values$H2S # -2.57
log_fO2 <- buffer_values$O2 # -37.16
The diagram()
function with the type
argument can solve for and display activities of basis species.
This example reproduces part of Fig. 6 in Schulte and Shock (1995Schulte MD, Shock EL. 1995. Thermodynamics of Strecker synthesis in hydrothermal systems. Origins of Life and Evolution of the Biosphere 25(1-3): 161–173. doi: 10.1007/BF01581580):
Equilibrium log H2 fugacity for 10-6 activity of HCN or formaldehyde with water, 1 bar of N2 and 10 bar of CO2
# Set up a system in terms of gases and liquid water
basis(c("hydrogen", "carbon dioxide", "nitrogen", "water"))
# Use 10 bar of CO2 and 1 bar of other gases (default)
basis("CO2", 1)
# Load aqueous species with given log activity
species(c("HCN", "formaldehyde"), c(-6, -6))
# Calculate affinities to form aqueous species from basis species
a <- affinity(T = c(0, 350), P = 300)
# Create diagram showing H2 activity where affinity = 0
d <- diagram(a, type = "H2", lty = c(2, 3))
legend("bottomright", c("HCN", "formaldehyde"), lty = c(2, 3))
See demo("buffer")
for a fully worked-out example based on the figure in Schulte and Shock (1995).
NOTE: This feature works independently from buffers defined in thermo()$buffer
, but produces equivalent results for certain systems; see test-diagram.R
in the package test directory.
Redox buffers like QFM, HM, and PPM can be used as inputs for subsequent calculations:
Gold solubility at 300 °C with PPM buffer for fO2 and aH2S
# Set up system for gold solubility calculation
basis(c("Au", "Fe", "H2S", "H2O", "oxygen", "H+"))
# Apply PPM buffer for fO2 and H2S
basis("O2", "PPM")
basis("H2S", "PPM")
# Calculate gold solubility using the buffered values
species("Au")
iaq <- info(c("Au(HS)2-", "AuHS", "AuOH"))
s <- solubility(iaq, pH = c(2, 8), T = 300, P = 1000)
# Create solubility diagram
diagram(s, ylim = c(-10, -5))
col <- c("#ED4037", "#F58645", "#0F9DE2") # Au(HS)2-, AuHS, AuOH
diagram(s, type = "loga.equil", add = TRUE, col = col, lwd = 2)
Buffers can be used in transect calculations to model changes across gradients.
An interesting application is to add a delta to values obtained with return.buffer = TRUE
:
# Set basis species
basis(c("Fe", "SiO2", "CO3-2", "H2O", "oxygen", "H+"))
# Calculate logfO2 in QFM buffer across temperature range
basis("O2", "QFM")
T <- seq(600, 1000, 100)
buf <- affinity(T = T, P = 5000, return.buffer = TRUE)
# Use buffered fO2 values in downstream calculations
species(c("CH4", "CO2", "HCO3-", "CO3-2"))
# Set values of pH for transect
pH <- seq(3.8, 4.3, length.out = length(T))
# Adding 2 log units below QFM buffer
a <- affinity(T = T, O2 = buf$O2 - 2, pH = pH, P = 5000)
Neutral pH at various temperatures and pressures can be determined from the dissociation constant of water:
# Calculate neutral pH at 300°C and 1000 bar
T <- 300
P <- 1000
neutral_pH <- -subcrt(c("H2O", "H+", "OH-"), c(-1, 1, 1), T = T, P = P)$out$logK/2
Mineral assemblages like K-feldspar–muscovite–quartz (KMQ) can buffer pH:
# Define the KMQ buffer
mod.buffer("KMQ", c("quartz", "muscovite", "K-feldspar"), "cr", 0)
# Set up the system
basis(c("Al2O3", "quartz", "K+", "H2O", "oxygen", "H+"))
# Set K+ molality for KCl solution
basis("K+", log10(0.5))
# Associate H+ with the KMQ buffer
basis("H+", "KMQ")
# Calculate buffered pH
pH_KMQ <- -affinity(T = 300, P = 1000, return.buffer = TRUE)$`H+`
Mineral buffers constrain both pH and redox state in geological systems, which in turn control metal solubility and transport in ore-forming fluids. This example illustrates how different buffers affect gold solubility in hydrothermal systems.
Effects of different buffers on gold solubility
The diagrams show:
Note the following limitation:
mosaic()
calculations currently aren’t support for basis species that are associated with a buffer.basis()
to use in the mosaic calculation.See also:
demo("gold")
for chloride and ionic strength effects (plots show molality instead of activity)Proteins in CHNOSZ are treated differently from other chemical species. Instead of direct thermodynamic data, CHNOSZ uses amino acid group additivity to calculate the thermodynamic properties of proteins. This approach requires knowledge of the amino acid composition of each protein.
In CHNOSZ, protein identifiers have a specific format that combines the protein name and organism with an underscore separator,
modeled after UniProt names (e.g., LYSC_CHICK
for chicken lysozyme C).
This naming convention uniquely identifies each protein in the database.
# Search by name in thermo()$protein
ip1 <- pinfo("LYSC_CHICK") # Using protein_organism format
ip2 <- pinfo("LYSC", "CHICK") # Using separate arguments
# Search for the same protein in different organisms
ip3 <- pinfo("MYG", c("HORSE", "PHYCA"))
CHNOSZ has a built-in database of amino acid compositions for selected proteins, but you can expand this by adding proteins from FASTA or CSV files.
# Reading amino acid compositions from a CSV file
file <- system.file("extdata/protein/POLG.csv", package = "CHNOSZ")
aa <- read.csv(file)
# Add the proteins to CHNOSZ and get their indices
iprotein <- add.protein(aa)
# For FASTA files, use the canprot package
fasta_file <- system.file("extdata/protein/rubisco.fasta", package = "CHNOSZ")
aa <- canprot::read_fasta(fasta_file)
# This is how you could read your own file
# aa <- canprot::read_fasta("proteins.fasta")
iprotein <- add.protein(aa)
The add.protein()
function adds the amino acid compositions to the database and returns the row indices of the added proteins.
Once proteins are added to the database, you can calculate various properties such as length, formula, and thermodynamic properties.
# Get protein length (number of amino acids)
pl <- protein.length("LYSC_CHICK")
# Get chemical formula
pf <- protein.formula("LYSC_CHICK")
# Display results
list(length = pl, formula = pf)
## $length
## [1] 129
##
## $formula
## C H N O S
## LYSC_CHICK 613 959 193 185 10
The average oxidation state of carbon, calculated with ZC()
, provides insight into the redox state of protein sequences:
# Calculate ZC for a protein
ZC_value <- ZC(protein.formula("LYSC_CHICK"))
# For multiple proteins
proteins <- c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN")
ZC_values <- ZC(protein.formula(proteins))
CHNOSZ provides several functions for calculating thermodynamic properties of proteins.
Calculate standard thermodynamic properties of non-ionized proteins using subcrt()
:
# Properties of non-ionized protein
subcrt("LYSC_CHICK")$out[[1]][1:6, ]
## T P rho logK G H S V
## 1 0.01 1.000000 0.9998289 3465.966 -18125553 -44827095 16093.36 10049.21
## 2 25.00 1.000000 0.9970614 3250.225 -18552315 -44238615 18149.61 10420.95
## 3 50.00 1.000000 0.9880295 3076.728 -19034574 -43528036 20436.77 10600.23
## 4 75.00 1.000000 0.9748643 2936.705 -19573864 -42770559 22693.98 10708.15
## 5 100.00 1.013220 0.9583926 2823.190 -20168495 -41989264 24860.88 10782.93
## 6 125.00 2.320144 0.9390726 2730.687 -20814620 -41191982 26925.13 10840.94
## Cp
## 1 18448.59
## 2 26842.53
## 3 29597.54
## 4 30863.62
## 5 31582.69
## 6 32071.19
For more accurate calculations, especially in biological systems, protein ionization must be considered (Dick et al., 2006Dick JM, LaRowe DE, Helgeson HC. 2006. Temperature, pressure, and electrochemical constraints on protein speciation: Group additivity calculation of the standard molal thermodynamic properties of ionized unfolded proteins. Biogeosciences 3(3): 311–336. doi: 10.5194/bg-3-311-2006).
CHNOSZ handles this through the ionize.aa()
function, which allows specifying the temperature, pressure and pH conditions:
# Calculate ionization properties
aa <- pinfo(pinfo("LYSC_CHICK"))
charge <- ionize.aa(aa, pH = c(4, 7, 9))
# Calculate heat capacity of ionization
Cp_ionization <- ionize.aa(aa, property = "Cp", pH = 7, T = c(25, 100))
To include proteins in a chemical system, first define the basis species, then add proteins to the system:
# Define the basis species with H+
basis("CHNOS+")
# Add proteins to the system
species(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
The affinity()
function accounts for ionization effects when calculating affinities of formation reactions:
# Calculate affinity as a function of pH
basis("CHNOS+")
species(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
a1 <- affinity(pH = c(0, 14))
For performance optimization, use protein indices directly with the iprotein
argument to affinity()
.
This doesn’t require proteins to be added with species()
:
species(delete = TRUE)
ip <- pinfo(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
# Set logarithm of activity with loga.protein
a2 <- affinity(pH = c(0, 14), iprotein = ip, loga.protein = -3)
# Check that both methods produce equivalent results
for(i in 1:3) stopifnot(all.equal(a1$values[[i]], a2$values[[i]]))
Calculate the relative abundance of proteins in metastable equilibrium using equilibrate()
.
This example uses averaged amino acid compositions of protein sequences in metagenomes from a temperature and chemical gradient in a hot spring (Dick and Shock, 2011Dick JM, Shock EL. 2011. Calculation of the relative chemical stabilities of proteins as a function of temperature and redox chemistry in a hot spring. PLOS One 6(8): e22782. doi: 10.1371/journal.pone.0022782):
Metastable equilibrium of proteins from hot-spring metagenomes
# Calculate equilibrium distribution as a function of Eh
basis("CHNOSe")
proteins <- paste("overall", paste0("bison", c("N", "S", "R", "Q", "P")), sep = "_")
ip <- pinfo(proteins)
a <- affinity(Eh = c(-0.35, -0.15), iprotein = ip)
# Normalize by protein length to get residue-equivalent distribution
# Set loga.balance to distribute proteins with total activity of residues equal to 1
e <- equilibrate(a, normalize = TRUE, loga.balance = 0)
col <- c("darkred", "red", "darkgray", "blue", "darkblue")
diagram(e, ylim = c(-4, -2.5), col = col, lwd = 2)
legend("bottomleft", c("High-T reducing", NA, NA, NA, "Low-T oxidizing"),
lty = 1:5, col = col, title = "Environmental Conditions", inset = c(0.1, 0))
The normalize = TRUE
option is important for proteins because their large size leads to extreme separation of activities in metastable equilibrium.
Normalizing by protein length (calculating per-residue equivalents) compresses the range of relative abundances to be more experimentally realistic.
Use unitize()
to scale abundances of proteins so that number of residues sums to unity:
# Sample protein data from YeastGFP study
protein <- c("YDL195W", "YHR098C", "YLR208W", "YNL049C", "YPL085W")
abundance <- c(1840, 12200, 21400, 1720, 358)
# Find protein indices in CHNOSZ database
ip <- match(protein, thermo()$protein$protein)
# Get protein lengths
pl <- protein.length(ip)
# Scale protein abundance so total abundance of residues is unity
scaled_abundance <- 10^unitize(log10(abundance), pl)
# Check that sum for residues is unity
stopifnot(sum(scaled_abundance * pl) == 1)
Unit total activity of residues is set by equilibrate(loga.balance = 0)
, allowing comparison between experimental and predicted abundances:
Unoptimized predicted abundances of proteins compared to experimental abundances, both scaled to unit total activity of residues; dashed line is 1:1 line
basis("CHNOSe")
# Make a guess for Eh
basis("Eh", -0.5)
a <- affinity(iprotein = ip)
e <- equilibrate(a, normalize = TRUE, loga.balance = 0)
# Check for unit total abundance of residues
predicted_abundance <- 10^unlist(e$loga.equil)
stopifnot(all.equal(sum(predicted_abundance * pl), 1))
plot(log10(scaled_abundance), log10(predicted_abundance), pch = 19)
# Show 1:1 line
lims <- range(par("usr"))
lines(c(lims[1], lims[2]), c(lims[1], lims[2]), lty = 2)
MAE <- mean(abs(log10(scaled_abundance) - log10(predicted_abundance)))
title(paste("MAE =", round(MAE, 2)))
The canprot package provides a different interface for calculating Zc and other chemical analyses of proteins from their amino acid composition:
# Load canprot package
library(canprot)
# Get amino acid compositions
ip <- pinfo(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
aa <- pinfo(ip)
# canprot has Zc(); CHNOSZ has ZC()
Zc(aa)
# Stoichiometric oxygen and water content
nO2(aa)
nH2O(aa)
# Isoelectric point and GRAVY
pI(aa)
GRAVY(aa)
Let’s analyze the relative abundances of proteins from the ER-to-Golgi location in S. cerevisiae (yeast) and compare theoretical predictions with experimental measurements from the YeastGFP study (Ghaemmaghami et al., 2003Ghaemmaghami S, Huh W-K, Bower K, Howson RW, Belle A, Dephoure N, O’Shea EK, Weissman JS. 2003. Global analysis of protein expression in yeast. Nature 425(6959): 737–741. doi: 10.1038/nature02046):
Optimizing redox potential to fit experimental protein abundances
The diagrams show:
normalize = FALSE
are highly divergent.The correspondence between theoretical predictions and experimental measurements depends on normalization of protein formulas and optimizing physicochemical parameters. The metastable equilibrium model provides a framework for predicting how chemical conditions influence relative protein abundances.
Explore demos with demo(package = "CHNOSZ")
.
You can also use demos()
to run all the demos without pausing or just one (e.g. demos("mosaic")
).
mosaic
: Speciating more than one set of basis speciessum_S
, uranyl
: Using summed activities of speciated basis speciescomproportionation
: Non-standard Gibbs energy of reaction with speciated basis speciesarsenic
, copper
: More examples of Eh-pH diagramssphalerite
, contour
, minsol
: Solubility calculations with speciated basis speciesPourbaix
: Isosolubility lines for various metals (try Fe, Cu, Mn)contour
: Solubility contours for goldminsol
Solubility contours for multiple mineralssolubility
: CO2 and calcitesaturation
: Saturation lines (where affinity = 0) and labels for activity ratiosionize
: Protein ionization propertiesTCA
: Citric acid cycle energeticscomproportionation
: Using a color scale (image map)buffer
: Place labels next to linesMgATP
: Calculate number of protons bound per ATP moleculebuffer
: Plotting buffers as a function of temperatureDEW
: Applying calculated values of logfO2 in affinity()
gold
: Settting pH and fO2 buffers in basis()
for solubility of goldprotbuff
: Using proteins as buffer speciesDEW
: Deep Earth Water model (extension of HKF to high pressures)AD
: Akinfiev-Diamond model for aqueous nonelectrolytesShh
: Relative stabilities of transcription factors along redox and water activity gradientcarboxylase
: Predicted rank abundance with varying temperature and redoxrank.affinity
: Affinity ranking for groupwise stability comparisonsAdditional vignettes cover:
The FAQ is a non-comprehensive collection of questions and answers about CHNOSZ.
The OBIGT vignette is generated from reference information in the database and lists all literature citations for species arranged by default and optional data files.
The custom_data vignette describes add.OBIGT()
for adding data from files, mod.OBIGT()
for updating or adding parameters of particular species, and logK.to.OBIGT()
for generating parameters from logK values.
The eos-regress vignette shows how to fit experimental data (volume and heat capacity) using constructed equation-of-state models.
The multi-metal vignette has some techniques for overcoming the limitation of balancing reactions on a single metal.
R provides convenient access to documentation in a local browser:
install.packages("CHNOSZ")
help.start()
to open a browser window for the R help systemYou can also:
help(package = "CHNOSZ")
maintainer("CHNOSZ")
to get contact info
###### ## ## ## ## ###### ##### #####
## ##===## ## \\## ## ## \\ //
###### ## ## ## ## ###### ##### #####