JMDplots vignettes

Adaptations of microbial genomes to human body chemistry (2024)

This vignette runs the functions to make the plots from the following manuscript:

Dick JM. 2024. Adaptations of microbial genomes to human body chemistry. bioRxiv preprint doi: 10.1101/2023.02.12.528246

Update: In July 2024, the taxonomic classifications and reference database were updated to use GTDB release 220. There are small changes in the figures compared to the preprint, which used GTDB release 207.

This vignette was compiled on 2024-12-02 with JMDplots 1.2.20-13, chem16S 1.1.0-11, and canprot 2.0.0-1.

library(JMDplots)

Consistency between shotgun metagenomes and community reference proteomes (Figure 1)

microhum_1()

Data source: The Human Microbiome Project Consortium (2012).

Amount of putative human DNA removed from HMP metagenomes in screening step (Figure 1–figure supplement 1)

microhum_1_1()

Chemical metrics are broadly different among genera and are similar between GTDB and low-contamination genomes from UHGG (Figure 2)

microhum_2()

Data sources: Genome Taxonomy Database (GTDB release 220) (Parks et al., 2022); Unified Human Gastrointestinal Genome (UHGG v2.0.1) (Almeida et al., 2021).

Chemical variation of microbial proteins across body sites (multi-omics comparison) (Figure 3)

microhum_3()

Data sources: (A) Boix-Amorós et al. (2021); (B) See Table 1 in manuscript; (C) Liu et al. (2021), de Castilhos et al. (2022), Zuo et al. (2020); (D) Thuy-Boun et al. (2022), Maier et al. (2017), Jiang et al. (2022), Granato et al. (2021).

Differences of nO2 and nH2O between untreated and viral-inactivated samples (Figure 3–figure supplement 1)

microhum_3_1()

Data source: Boix-Amorós et al. (2021).
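
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how nO2 and nH2O can be obtained for a single molecule by writing its elemental formula as a combination of basis species. It assumes the basis species glutamine, glutamic acid, cysteine, H2O, and O2, which this vignette does not itself state, and it uses only base R rather than whatever microhum_3_1() calls internally. Alanine (C3H7NO2) is an arbitrary illustrative input, and the object names are hypothetical.

```r
# Elemental composition of the ASSUMED basis species (rows: C, H, N, O, S)
basis <- cbind(
  glutamine     = c(C = 5, H = 10, N = 2, O = 3, S = 0),
  glutamic_acid = c(C = 5, H = 9,  N = 1, O = 4, S = 0),
  cysteine      = c(C = 3, H = 7,  N = 1, O = 2, S = 1),
  H2O           = c(C = 0, H = 2,  N = 0, O = 1, S = 0),
  O2            = c(C = 0, H = 0,  N = 0, O = 2, S = 0)
)

# Illustrative input: elemental formula of alanine, C3H7NO2
alanine <- c(C = 3, H = 7, N = 1, O = 2, S = 0)

# Solve basis %*% coefficients = alanine for the stoichiometric coefficients
coefficients <- setNames(as.numeric(solve(basis, alanine)), colnames(basis))

# The coefficients on H2O and O2 are the nH2O and nO2 of this molecule
nH2O <- coefficients[["H2O"]]
nO2 <- coefficients[["O2"]]
print(round(c(nH2O = nH2O, nO2 = nO2), 3))
```

Under these assumptions, alanine has nH2O = 0.6 and nO2 = -0.3. Values for whole proteins would be built from amino acid compositions, which is not shown here.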

Differences of chemical metrics between controls and COVID-19 or IBD patients (Figure 4)

microhum_4()

Data sources: (A) See Table 1 in manuscript; (B) MAGs (Ke et al., 2022), based on data from Yeoh et al. (2021) and Zuo et al. (2020); (C) He et al. (2021) and Grenga et al. (2022); (D) See Table 1 in manuscript; (E) Lloyd-Price et al. (2019).

Differences of relative abundances of genera between controls and patients (Figure 5)

Figure_5_genera <- microhum_5()

Chemical metrics of reference proteomes for genera with known oxygen tolerance (Figure 5–figure supplement 1)

microhum_5_1()
## [1] "Reference proteomes are available for 200 of 234 obligately anaerobic genera in the list"
## [1] "Reference proteomes are available for 363 of 399 aerotolerant genera in the list"
## [1] "Oxygen tolerance is listed for 1033 of 24959 genera in GTDB"

Data source: List of Prokaryotes according to their Aerotolerant or Obligate Anaerobic Metabolism (Million and Raoult, 2018) with modifications made in this study.

Oxygen tolerance of genera in body sites, COVID-19, and IBD (Figure 6)

microhum_6()

Data sources: (A)–(B) List of Prokaryotes according to their Aerotolerant or Obligate Anaerobic Metabolism (Million and Raoult, 2018) with modifications made in this study; (C) See Table 1 in manuscript.

Differences of nO2 and nH2O between subcommunities of obligate anaerobes and aerotolerant genera in controls and patients (Figure 6–figure supplement 1)

microhum_6_1()

Summary of metagenome sequence processing (part of Supplementary File 1)

# Directory with the per-dataset sequence processing statistics shipped with JMDplots
datadir <- system.file("extdata/microhum/ARAST", package = "JMDplots")
dataset <- c("HMP12", "HMP12_no_screening", "LLZ+21", "CZH+22", "ZZL+20", "LAA+19")
# Build one row of summary statistics for each dataset
summary_list <- lapply(dataset, function(this_dataset) {
  file <- file.path(datadir, paste0(this_dataset, "_stats.csv"))
  dat <- read.csv(file)
  # Each row of the stats file corresponds to one sequencing run
  runs <- nrow(dat)
  average_input_sequences <- sum(dat$input_sequences) / runs
  # Ratio of predicted coding sequences to input sequences, in percent, for each run
  protein_prediction_rate <- dat$coding_sequences / dat$input_sequences * 100
  runs_with_low_protein_prediction <- sum(protein_prediction_rate < 40)
  # Averages are taken over runs, not pooled over all sequences
  average_protein_prediction_rate <- round(sum(protein_prediction_rate) / runs, 2)
  average_protein_length <- round(sum(dat$protein_length) / runs, 2)
  data.frame(runs, average_input_sequences, average_protein_prediction_rate,
    runs_with_low_protein_prediction, average_protein_length)
})
description <- c("HMP with human DNA screening", "HMP without screening",
  "COVID-19 nasopharyngeal", "COVID-19 oropharyngeal", "COVID-19 gut", "IBD gut")
# Combine the per-dataset rows and label them
summary_table <- cbind(description, dataset, do.call(rbind, summary_list))
col.names <- c("Description", "Dataset", "Number of sequencing runs", "Average number of input sequences",
  "Average protein prediction rate (%)", "Runs with protein prediction rate < 40%",
  "Average protein length")
# Call kable() through the knitr namespace so this chunk runs without library(knitr)
knitr::kable(summary_table, col.names = col.names)
| Description | Dataset | Number of sequencing runs | Average number of input sequences | Average protein prediction rate (%) | Runs with protein prediction rate < 40% | Average protein length |
|:---|:---|---:|---:|---:|---:|---:|
| HMP with human DNA screening | HMP12 | 101 | 13928960 | 44.70 | 40 | 30.74 |
| HMP without screening | HMP12_no_screening | 101 | 13928960 | 50.04 | 31 | 30.69 |
| COVID-19 nasopharyngeal | LLZ+21 | 9 | 40593115 | 26.70 | 7 | 47.32 |
| COVID-19 oropharyngeal | CZH+22 | 122 | 13884977 | 59.15 | 15 | 31.81 |
| COVID-19 gut | ZZL+20 | 73 | 17245724 | 59.12 | 12 | 46.30 |
| IBD gut | LAA+19 | 139 | 9722610 | 56.03 | 4 | 31.48 |
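
The per-run averaging used for this summary can be checked on a small example. The sketch below uses a made-up data frame (toy_stats and its numbers are hypothetical, not taken from any dataset) with the same column names as the stats files. It shows that the mean of per-run prediction rates can differ from a rate pooled over all sequences.

```r
# Hypothetical statistics for three sequencing runs (not real data)
toy_stats <- data.frame(
  input_sequences  = c(1000, 2000, 4000),
  coding_sequences = c(300, 1000, 2400),
  protein_length   = c(30, 32, 34)
)

# Per-run protein prediction rate in percent: 30, 50, 60
rate <- toy_stats$coding_sequences / toy_stats$input_sequences * 100

# Mean of the per-run rates, matching the calculation used for the table
mean_rate <- round(mean(rate), 2)

# Rate pooled over all sequences, shown only for comparison
pooled_rate <- round(sum(toy_stats$coding_sequences) / sum(toy_stats$input_sequences) * 100, 2)

# Number of runs below the 40% threshold
n_low <- sum(rate < 40)

print(c(mean_rate = mean_rate, pooled_rate = pooled_rate, n_low = n_low))
```

In this toy case the per-run mean is 46.67% while the pooled rate is 52.86%, because the pooled rate gives more weight to runs with more input sequences.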

Number of genomes and genera in GTDB and UHGG (Unified Human Gastrointestinal Genome from MGnify)

GTDB_taxonomy <- read.csv(system.file("RefDB/GTDB_220/taxonomy.csv.xz", package = "JMDplots"))
print(paste("GTDB:", nrow(GTDB_taxonomy), "genomes and", length(unique(GTDB_taxonomy$genus)), "genera"))
## [1] "GTDB: 113104 genomes and 24959 genera"
UHGG_taxonomy_full <- read.csv(system.file("RefDB/UHGG_2.0.1/fullset/taxonomy.csv.xz", package = "JMDplots"))
print(paste("UHGG (contamination < 5% and completeness > 50%):", nrow(UHGG_taxonomy_full), "species-level clusters and", length(unique(UHGG_taxonomy_full$genus)), "genera"))
## [1] "UHGG (contamination < 5% and completeness > 50%): 4744 species-level clusters and 1031 genera"
UHGG_taxonomy <- read.csv(system.file("RefDB/UHGG_2.0.1/taxonomy.csv.xz", package = "JMDplots"))
print(paste("UHGG (contamination < 2% and completeness > 95%):", nrow(UHGG_taxonomy), "species-level clusters and", length(unique(UHGG_taxonomy$genus)), "genera"))
## [1] "UHGG (contamination < 2% and completeness > 95%): 2350 species-level clusters and 643 genera"
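
The effect of the two quality thresholds on these counts can be illustrated with a toy table. This is a minimal sketch only: toy_taxonomy, its contamination and completeness columns, and the count_taxa() helper are hypothetical and do not reflect the layout of the taxonomy files in JMDplots, which are already filtered.

```r
# Hypothetical taxonomy table with quality estimates in percent (not real data)
toy_taxonomy <- data.frame(
  genome = paste0("genome_", 1:6),
  genus = c("Genus_A", "Genus_A", "Genus_B", "Genus_C", "Genus_C", "Genus_D"),
  contamination = c(0.5, 3.0, 1.2, 4.5, 1.9, 0.1),
  completeness = c(99, 80, 96, 60, 97, 90)
)

# Count rows (genomes) and distinct genera, as in the print() calls above
count_taxa <- function(taxonomy) {
  c(genomes = nrow(taxonomy), genera = length(unique(taxonomy$genus)))
}

# Lenient filter: contamination < 5% and completeness > 50%
lenient <- subset(toy_taxonomy, contamination < 5 & completeness > 50)
# Strict filter: contamination < 2% and completeness > 95%
strict <- subset(toy_taxonomy, contamination < 2 & completeness > 95)

print(count_taxa(lenient))
print(count_taxa(strict))
```

Here the lenient filter keeps 6 genomes in 4 genera and the strict filter keeps 3 genomes in 3 genera, mirroring the direction of the difference between the two UHGG sets.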

List genera identified in Figure 5 that are not present in UHGG

not_in_UHGG <- paste(Figure_5_genera[!Figure_5_genera %in% UHGG_taxonomy_full$genus], collapse = " ")
paste("Figure 5 shows", length(Figure_5_genera), "genera; of these,", not_in_UHGG, "is not in UHGG")
## [1] "Figure 5 shows 27 genera; of these, Vescimonas is not in UHGG"
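
The membership test above hard-codes the verb "is", which reads correctly only when exactly one genus is missing. The following sketch, using hypothetical genus names and a hypothetical describe_missing() helper, shows one way to keep the sentence grammatical for zero, one, or several missing genera.

```r
# Hypothetical helper: describe which genera are absent from a reference set
describe_missing <- function(genera, reference_genera) {
  missing_genera <- genera[!genera %in% reference_genera]
  if (length(missing_genera) == 0) {
    return("all genera are in the reference set")
  }
  # Choose the verb according to the number of missing genera
  verb <- if (length(missing_genera) == 1) "is" else "are"
  paste(paste(missing_genera, collapse = ", "), verb, "not in the reference set")
}

# Illustrative inputs (not real data)
reference_genera <- c("Genus_A", "Genus_B", "Genus_C")
one_missing <- describe_missing(c("Genus_A", "Genus_X"), reference_genera)
two_missing <- describe_missing(c("Genus_X", "Genus_B", "Genus_Y"), reference_genera)
none_missing <- describe_missing(c("Genus_A", "Genus_C"), reference_genera)

print(one_missing)
print(two_missing)
print(none_missing)
```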

References

Almeida A, Nayfach S, Boland M, Strozzi F, Beracochea M, Shi ZJ, Pollard KS, Sakharova E, Parks DH, Hugenholtz P, et al. 2021. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nature Biotechnology 39(1): 105–114. doi: 10.1038/s41587-020-0603-3

Boix-Amorós A, Piras E, Bu K, Wallach D, Stapylton M, Fernández-Sesma A, Malaspina D, Clemente JC. 2021. Viral inactivation impacts microbiome estimates in a tissue-specific manner. mSystems 6(5): e00674–21. doi: 10.1128/mSystems.00674-21

de Castilhos J, Zamir E, Hippchen T, Rohrbach R, Schmidt S, Hengler S, Schumacher H, Neubauer M, Kunz S, Müller-Esch T, et al. 2022. Severe dysbiosis and specific Haemophilus and Neisseria signatures as hallmarks of the oropharyngeal microbiome in critically ill coronavirus disease 2019 (COVID-19) patients. Clinical Infectious Diseases 75(1): e1063–e1071. doi: 10.1093/cid/ciab902

Granato DC, Neves LX, Trino LD, Carnielli CM, Lopes AFB, Yokoo S, Pauletti BA, Domingues RR, Sá JO, Persinoti G, et al. 2021. Meta-omics analysis indicates the saliva microbiome and its proteins associated with the prognosis of oral cancer patients. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 1869(8): 140659. doi: 10.1016/j.bbapap.2021.140659

Grenga L, Pible O, Miotello G, Culotta K, Ruat S, Roncato M-A, Gas F, Bellanger L, Claret P-G, Dunyach-Remy C, et al. 2022. Taxonomical and functional changes in COVID-19 faecal microbiome could be related to SARS-CoV-2 faecal load. Environmental Microbiology 24(9): 4299–4316. doi: 10.1111/1462-2920.16028

He F, Zhang T, Xue K, Fang Z, Jiang G, Huang S, Li K, Gu Z, Shi H, Zhang Z, et al. 2021. Fecal multi-omics analysis reveals diverse molecular alterations of gut ecosystem in COVID-19 patients. Analytica Chimica Acta 1180: 338881. doi: 10.1016/j.aca.2021.338881

Jiang X, Zhang Y, Wang H, Wang Z, Hu S, Cao C, Xiao H. 2022. In-depth metaproteomics analysis of oral microbiome for lung cancer. Research 2022: 9781578. doi: 10.34133/2022/9781578

Ke S, Weiss ST, Liu Y-Y. 2022. Dissecting the role of the human microbiome in COVID-19 via metagenome-assembled genomes. Nature Communications 13(1): 5235. doi: 10.1038/s41467-022-32991-w

Liu J, Liu S, Zhang Z, Lee X, Wu W, Huang Z, Lei Z, Xu W, Chen D, Wu X, et al. 2021. Association between the nasopharyngeal microbiome and metabolome in patients with COVID-19. Synthetic and Systems Biotechnology 6(3): 135–143. doi: 10.1016/j.synbio.2021.06.002

Lloyd-Price J, Arze C, Ananthakrishnan AN, Schirmer M, Avila-Pacheco J, Poon TW, Andrews E, Ajami NJ, Bonham KS, Brislawn CJ, et al. 2019. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569(7758): 655–662. doi: 10.1038/s41586-019-1237-9

Maier TV, Lucio M, Lee LH, VerBerkmoes NC, Brislawn CJ, Bernhardt J, Lamendella R, McDermott JE, Bergeron N, Heinzmann SS, et al. 2017. Impact of dietary resistant starch on the human gut microbiome, metaproteome, and metabolome. mBio 8(5): e01343–17. doi: 10.1128/mBio.01343-17

Million M, Raoult D. 2018. Linking gut redox to human microbiome. Human Microbiome Journal 10: 27–32. doi: 10.1016/j.humic.2018.07.002

Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, Hugenholtz P. 2022. GTDB: An ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research 50(D1): D785–D794. doi: 10.1093/nar/gkab776

The Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486(7402): 207–214. doi: 10.1038/nature11234

Thuy-Boun PS, Wang AY, Crissien-Martinez A, Xu JH, Chatterjee S, Stupp GS, Su AI, Coyle WJ, Wolan DW. 2022. Quantitative metaproteomics and activity-based protein profiling of patient fecal microbiome identifies host and microbial serine-type endopeptidase activity associated with ulcerative colitis. Molecular & Cellular Proteomics 21(3): 100197. doi: 10.1016/j.mcpro.2022.100197

Yeoh YK, Zuo T, Lui GC-Y, Zhang F, Liu Q, Li AYL, Chung ACK, Cheung CP, Tso EYK, Fung KSC, et al. 2021. Gut microbiota composition reflects disease severity and dysfunctional immune responses in patients with COVID-19. Gut 70(4): 698–706. doi: 10.1136/gutjnl-2020-323020

Zuo T, Zhang F, Lui GCY, Yeoh YK, Li AYL, Zhan H, Wan Y, Chung ACK, Cheung CP, Chen N, et al. 2020. Alterations in gut microbiota of patients with COVID-19 during time of hospitalization. Gastroenterology 159(3): 944–955. doi: 10.1053/j.gastro.2020.05.048