This vignette runs the functions that make the plots in the following manuscript:
Dick JM. 2024. Adaptations of microbial genomes to human body chemistry. bioRxiv preprint doi: 10.1101/2023.02.12.528246
Update: In July 2024, the taxonomic classifications and reference database were updated to use GTDB release 220. There are small changes in the figures compared to the preprint, which used GTDB release 207.
This vignette was compiled on 2024-12-02 with JMDplots 1.2.20-13, chem16S 1.1.0-11, and canprot 2.0.0-1.
Data source: The Human Microbiome Project Consortium (2012).
Data sources: Genome Taxonomy Database (GTDB release 220) (Parks et al., 2022); Unified Human Gastrointestinal Genome (UHGG v2.0.1) (Almeida et al., 2021).
Data sources: (A) Boix-Amorós et al. (2021); (B) See Table 1 in the manuscript; (C) Liu et al. (2021), de Castilhos et al. (2022), Zuo et al. (2020); (D) Thuy-Boun et al. (2022), Maier et al. (2017), Jiang et al. (2022), Granato et al. (2021).
Data sources: (A) See Table 1 in the manuscript; (B) MAGs (Ke et al., 2022), based on data from Yeoh et al. (2021) and Zuo et al. (2020); (C) He et al. (2021) and Grenga et al. (2022); (D) See Table 1 in the manuscript; (E) Lloyd-Price et al. (2019).
## [1] "Reference proteomes are available for 200 of 234 obligately anaerobic genera in the list"
## [1] "Reference proteomes are available for 363 of 399 aerotolerant genera in the list"
## [1] "Oxygen tolerance is listed for 1033 of 24959 genera in GTDB"
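Counts like these can be obtained by matching genus names between the oxygen tolerance list and a reference taxonomy. The code that produced the numbers above is not shown here, so the following is only a minimal sketch, assuming a character vector of genus names and a taxonomy table with one row per genome and a `genus` column (like `GTDB_taxonomy` below). All genus lists and the placeholder names `GenusWithoutProteome` and `UnlistedGenus` are hypothetical toy values, not the study's data.

```r
# Hypothetical toy inputs (not the study's data): genera classified by oxygen
# tolerance, and a taxonomy table with one row per genome and a 'genus' column
anaerobic_genera <- c("Bacteroides", "Blautia", "GenusWithoutProteome")
aerotolerant_genera <- c("Escherichia")
taxonomy <- data.frame(
  genome = c("g1", "g2", "g3", "g4", "g5"),
  genus = c("Bacteroides", "Bacteroides", "Blautia", "Escherichia", "UnlistedGenus")
)
reference_genera <- unique(taxonomy$genus)
# Listed genera that have at least one genome in the reference table
n_anaerobic_available <- sum(anaerobic_genera %in% reference_genera)
print(paste("Reference proteomes are available for", n_anaerobic_available,
            "of", length(anaerobic_genera), "obligately anaerobic genera in the list"))
# Reverse direction: reference genera whose oxygen tolerance is listed
n_listed <- sum(reference_genera %in% c(anaerobic_genera, aerotolerant_genera))
print(paste("Oxygen tolerance is listed for", n_listed, "of",
            length(reference_genera), "genera in the toy taxonomy"))
```

With these toy values the counts are 2 of 3 and 3 of 4. Using `unique()` on the taxonomy column matters for the second count, because a genus usually has many genomes.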
Data source: List of Prokaryotes according to their Aerotolerant or Obligate Anaerobic Metabolism (Million and Raoult, 2018) with modifications made in this study.
Data sources: (A)–(B) List of Prokaryotes according to their Aerotolerant or Obligate Anaerobic Metabolism (Million and Raoult, 2018) with modifications made in this study; (C) See Table 1 in the manuscript.
datadir <- system.file("extdata/microhum/ARAST", package = "JMDplots")
dataset <- c("HMP12", "HMP12_no_screening", "LLZ+21", "CZH+22", "ZZL+20", "LAA+19")
summary_list <- lapply(dataset, function(this_dataset) {
  file <- file.path(datadir, paste0(this_dataset, "_stats.csv"))
  dat <- read.csv(file)
  runs <- nrow(dat)  # one row per sequencing run
  average_input_sequences <- sum(dat$input_sequences) / runs
  protein_prediction_rate <- dat$coding_sequences / dat$input_sequences * 100  # percent, per run
  runs_with_low_protein_prediction <- sum(protein_prediction_rate < 40)  # runs below 40%
  average_protein_prediction_rate <- round(sum(protein_prediction_rate) / runs, 2)
  average_protein_length <- round(sum(dat$protein_length) / runs, 2)
  data.frame(runs, average_input_sequences, average_protein_prediction_rate,
             runs_with_low_protein_prediction, average_protein_length)
})
description <- c("HMP with human DNA screening", "HMP without screening",
                 "COVID-19 nasopharyngeal", "COVID-19 oropharyngeal", "COVID-19 gut", "IBD gut")
summary_table <- cbind(description, dataset, do.call(rbind, summary_list))
col.names <- c("Description", "Dataset", "Number of sequencing runs", "Average number of input sequences",
               "Average protein prediction rate (%)", "Runs with protein prediction rate < 40%",
               "Average protein length")
knitr::kable(summary_table, col.names = col.names)
Description | Dataset | Number of sequencing runs | Average number of input sequences | Average protein prediction rate (%) | Runs with protein prediction rate < 40% | Average protein length |
---|---|---|---|---|---|---|
HMP with human DNA screening | HMP12 | 101 | 13928960 | 44.70 | 40 | 30.74 |
HMP without screening | HMP12_no_screening | 101 | 13928960 | 50.04 | 31 | 30.69 |
COVID-19 nasopharyngeal | LLZ+21 | 9 | 40593115 | 26.70 | 7 | 47.32 |
COVID-19 oropharyngeal | CZH+22 | 122 | 13884977 | 59.15 | 15 | 31.81 |
COVID-19 gut | ZZL+20 | 73 | 17245724 | 59.12 | 12 | 46.30 |
IBD gut | LAA+19 | 139 | 9722610 | 56.03 | 4 | 31.48 |
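In the code above, `sum(x) / runs` is the same as `mean(x)`, so the average protein prediction rate is an unweighted mean of per-run percentages: a run with few input sequences counts as much as a large run. A toy example with two hypothetical runs (made-up numbers, using the same column names as the `*_stats.csv` files) shows how this differs from a pooled rate computed from summed counts.

```r
# Hypothetical per-run statistics (made-up numbers) using the column names
# read from the *_stats.csv files above
dat <- data.frame(
  input_sequences = c(1000, 3000),
  coding_sequences = c(600, 900)
)
protein_prediction_rate <- dat$coding_sequences / dat$input_sequences * 100  # 60, 30
# Unweighted mean of per-run rates (same as sum(rate) / runs in the code above)
mean_of_rates <- mean(protein_prediction_rate)  # 45
# Pooled rate: each run is weighted by its number of input sequences
pooled_rate <- sum(dat$coding_sequences) / sum(dat$input_sequences) * 100  # 37.5
runs_below_40 <- sum(protein_prediction_rate < 40)  # 1
print(c(mean_of_rates = mean_of_rates, pooled_rate = pooled_rate,
        runs_below_40 = runs_below_40))
```

The two summaries differ here because the larger run has the lower rate; with runs of equal size they would agree.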
GTDB_taxonomy <- read.csv(system.file("RefDB/GTDB_220/taxonomy.csv.xz", package = "JMDplots"))
print(paste("GTDB:", nrow(GTDB_taxonomy), "genomes and", length(unique(GTDB_taxonomy$genus)), "genera"))
## [1] "GTDB: 113104 genomes and 24959 genera"
UHGG_taxonomy_full <- read.csv(system.file("RefDB/UHGG_2.0.1/fullset/taxonomy.csv.xz", package = "JMDplots"))
print(paste("UHGG (contamination < 5% and completeness > 50%):", nrow(UHGG_taxonomy_full), "species-level clusters and", length(unique(UHGG_taxonomy_full$genus)), "genera"))
## [1] "UHGG (contamination < 5% and completeness > 50%): 4744 species-level clusters and 1031 genera"
UHGG_taxonomy <- read.csv(system.file("RefDB/UHGG_2.0.1/taxonomy.csv.xz", package = "JMDplots"))
print(paste("UHGG (contamination < 2% and completeness > 95%):", nrow(UHGG_taxonomy), "species-level clusters and", length(unique(UHGG_taxonomy$genus)), "genera"))
## [1] "UHGG (contamination < 2% and completeness > 95%): 2350 species-level clusters and 643 genera"
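The two UHGG tables differ in the genome quality thresholds stated in their labels. The following is a minimal sketch of applying such thresholds and then counting species-level clusters and genera as above. It assumes a hypothetical table with `completeness` and `contamination` columns in percent; these column names and values are illustrative assumptions, and the actual `taxonomy.csv.xz` files in JMDplots may not contain them.

```r
# Hypothetical genome table; column names and values are assumptions for illustration
genomes <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  genus = c("Bacteroides", "Bacteroides", "Blautia", "Alistipes"),
  completeness = c(99.1, 80.0, 97.5, 60.0),  # percent
  contamination = c(0.5, 1.0, 3.0, 4.0)      # percent
)
# Looser and stricter filters matching the thresholds in the labels above
full_set <- subset(genomes, contamination < 5 & completeness > 50)
high_quality <- subset(genomes, contamination < 2 & completeness > 95)
# Count species-level clusters (rows) and distinct genera
count_taxa <- function(x) c(clusters = nrow(x), genera = length(unique(x$genus)))
print(count_taxa(full_set))      # 4 clusters, 3 genera
print(count_taxa(high_quality))  # 1 cluster, 1 genus
```

Because both conditions must hold, a genome that is nearly complete but moderately contaminated (`sp3`) passes only the looser filter.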
not_in_UHGG <- paste(Figure_5_genera[!Figure_5_genera %in% UHGG_taxonomy_full$genus], collapse = " ")
paste("Figure 5 shows", length(Figure_5_genera), "genera; of these,", not_in_UHGG, "is not in UHGG")
## [1] "Figure 5 shows 27 genera; of these, Vescimonas is not in UHGG"
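The expression for `not_in_UHGG` keeps the genera in `Figure_5_genera` that are absent from the UHGG genus list, and the message wording ("is") assumes exactly one such genus. A minimal sketch with hypothetical genus vectors uses `setdiff()` (which, unlike the subsetting above, also drops duplicates) and adjusts the wording when zero or several genera are missing.

```r
# Hypothetical stand-ins for Figure_5_genera and UHGG_taxonomy_full$genus
figure_genera <- c("Bacteroides", "Blautia", "Vescimonas")
uhgg_genera <- c("Bacteroides", "Blautia", "Alistipes")
# Unique elements of figure_genera that do not occur in uhgg_genera
missing <- setdiff(figure_genera, uhgg_genera)
# Choose wording that fits zero, one, or several missing genera
tail_text <- if (length(missing) == 0) {
  "all are in UHGG"
} else {
  paste(paste(missing, collapse = ", "),
        if (length(missing) == 1) "is" else "are", "not in UHGG")
}
msg <- paste("Figure 5 shows", length(figure_genera), "genera; of these,", tail_text)
print(msg)
```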
Almeida A, Nayfach S, Boland M, Strozzi F, Beracochea M, Shi ZJ, Pollard KS, Sakharova E, Parks DH, Hugenholtz P, et al. 2021. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nature Biotechnology 39(1): 105–114. doi: 10.1038/s41587-020-0603-3
Boix-Amorós A, Piras E, Bu K, Wallach D, Stapylton M, Fernández-Sesma A, Malaspina D, Clemente JC. 2021. Viral inactivation impacts microbiome estimates in a tissue-specific manner. mSystems 6(5): e00674-21. doi: 10.1128/mSystems.00674-21
de Castilhos J, Zamir E, Hippchen T, Rohrbach R, Schmidt S, Hengler S, Schumacher H, Neubauer M, Kunz S, Müller-Esch T, et al. 2022. Severe dysbiosis and specific Haemophilus and Neisseria signatures as hallmarks of the oropharyngeal microbiome in critically ill coronavirus disease 2019 (COVID-19) patients. Clinical Infectious Diseases 75(1): e1063–e1071. doi: 10.1093/cid/ciab902
Granato DC, Neves LX, Trino LD, Carnielli CM, Lopes AFB, Yokoo S, Pauletti BA, Domingues RR, Sá JO, Persinoti G, et al. 2021. Meta-omics analysis indicates the saliva microbiome and its proteins associated with the prognosis of oral cancer patients. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 1869(8): 140659. doi: 10.1016/j.bbapap.2021.140659
Grenga L, Pible O, Miotello G, Culotta K, Ruat S, Roncato M-A, Gas F, Bellanger L, Claret P-G, Dunyach-Remy C, et al. 2022. Taxonomical and functional changes in COVID-19 faecal microbiome could be related to SARS-CoV-2 faecal load. Environmental Microbiology 24(9): 4299–4316. doi: 10.1111/1462-2920.16028
He F, Zhang T, Xue K, Fang Z, Jiang G, Huang S, Li K, Gu Z, Shi H, Zhang Z, et al. 2021. Fecal multi-omics analysis reveals diverse molecular alterations of gut ecosystem in COVID-19 patients. Analytica Chimica Acta 1180: 338881. doi: 10.1016/j.aca.2021.338881
Jiang X, Zhang Y, Wang H, Wang Z, Hu S, Cao C, Xiao H. 2022. In-depth metaproteomics analysis of oral microbiome for lung cancer. Research 2022: 9781578. doi: 10.34133/2022/9781578
Ke S, Weiss ST, Liu Y-Y. 2022. Dissecting the role of the human microbiome in COVID-19 via metagenome-assembled genomes. Nature Communications 13(1): 5235. doi: 10.1038/s41467-022-32991-w
Liu J, Liu S, Zhang Z, Lee X, Wu W, Huang Z, Lei Z, Xu W, Chen D, Wu X, et al. 2021. Association between the nasopharyngeal microbiome and metabolome in patients with COVID-19. Synthetic and Systems Biotechnology 6(3): 135–143. doi: 10.1016/j.synbio.2021.06.002
Lloyd-Price J, Arze C, Ananthakrishnan AN, Schirmer M, Avila-Pacheco J, Poon TW, Andrews E, Ajami NJ, Bonham KS, Brislawn CJ, et al. 2019. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569(7758): 655–662. doi: 10.1038/s41586-019-1237-9
Maier TV, Lucio M, Lee LH, VerBerkmoes NC, Brislawn CJ, Bernhardt J, Lamendella R, McDermott JE, Bergeron N, Heinzmann SS, et al. 2017. Impact of dietary resistant starch on the human gut microbiome, metaproteome, and metabolome. mBio 8(5): e01343-17. doi: 10.1128/mBio.01343-17
Million M, Raoult D. 2018. Linking gut redox to human microbiome. Human Microbiome Journal 10: 27–32. doi: 10.1016/j.humic.2018.07.002
Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, Hugenholtz P. 2022. GTDB: An ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research 50(D1): D785–D794. doi: 10.1093/nar/gkab776
The Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486(7402): 207–214. doi: 10.1038/nature11234
Thuy-Boun PS, Wang AY, Crissien-Martinez A, Xu JH, Chatterjee S, Stupp GS, Su AI, Coyle WJ, Wolan DW. 2022. Quantitative metaproteomics and activity-based protein profiling of patient fecal microbiome identifies host and microbial serine-type endopeptidase activity associated with ulcerative colitis. Molecular & Cellular Proteomics 21(3): 100197. doi: 10.1016/j.mcpro.2022.100197
Yeoh YK, Zuo T, Lui GC-Y, Zhang F, Liu Q, Li AYL, Chung ACK, Cheung CP, Tso EYK, Fung KSC, et al. 2021. Gut microbiota composition reflects disease severity and dysfunctional immune responses in patients with COVID-19. Gut 70(4): 698–706. doi: 10.1136/gutjnl-2020-323020
Zuo T, Zhang F, Lui GCY, Yeoh YK, Li AYL, Zhan H, Wan Y, Chung ACK, Cheung CP, Chen N, et al. 2020. Alterations in gut microbiota of patients with COVID-19 during time of hospitalization. Gastroenterology 159(3): 944–955. doi: 10.1053/j.gastro.2020.05.048