JMDplots vignettes

Chemical features of proteins in microbial genomes (2025)

This vignette runs the functions that make the plots in the following paper:

Dick JM. 2025. Chemical features of proteins in microbial genomes associated with body sites and gut inflammation. Biomed. Inform. 1(1): 0005. doi: 10.55092/bi20250003

This vignette was compiled on 2025-05-29 with JMDplots 1.2.22-1, chem16S 1.2.0, and canprot 2.0.0.

library(JMDplots)

Consistency between shotgun metagenomes and community reference proteomes (Figure 1)

microhum_1()

Data source: The Human Microbiome Project Consortium (2012).

Chemical metrics are broadly different among genera and are similar between GTDB and low-contamination genomes from UHGG (Figure 2)

microhum_2()

Data sources: Genome Taxonomy Database (GTDB release 220) (Parks et al., 2022); Unified Human Gastrointestinal Genome (UHGG v2.0.1) (Almeida et al., 2021).

Chemical variation of microbial proteins across body sites (multi-omics comparison) (Figure 3)

microhum_3()

Data sources: (A) Boix-Amorós et al. (2021); (B) See Table 1 in manuscript; (C) Liu et al. (2021), de Castilhos et al. (2022), Zuo et al. (2020); (D) Thuy-Boun et al. (2022), Maier et al. (2017), Jiang et al. (2022), Granato et al. (2021).

Differences of chemical metrics between controls and COVID-19 or IBD patients (Figure 4)

microhum_4()

Data sources: (A) See Table 1 in manuscript; (B) MAGs (Ke et al., 2022), based on data from Yeoh et al. (2021) and Zuo et al. (2020); (C) He et al. (2021) and Grenga et al. (2022); (D) See Table 1 in manuscript; (E) Lloyd-Price et al. (2019).

Differences of relative abundances of genera between controls and patients (Figure 5)

Figure_5_genera <- microhum_5()

Oxygen tolerance of genera in body sites, COVID-19, and IBD (Figure 6)

microhum_6()

Data sources: (a)–(b) List of Prokaryotes according to their Aerotolerant or Obligate Anaerobic Metabolism (Million and Raoult, 2018) with modifications made in this study; (c) See Table 1 in manuscript.

Amount of putative human DNA removed from HMP metagenomes in the screening step (Figure S1)

microhum_S1()

Differences of oxygen and water content of proteins between untreated and viral-inactivated samples (Figure S2)

microhum_S2()

Data source: Boix-Amorós et al. (2021).

Chemical features of reference proteomes for genera with known oxygen tolerance (Figure S3)

microhum_S3()
## [1] "Reference proteomes are available for 201 of 235 obligately anaerobic genera in the list"
## [1] "Reference proteomes are available for 363 of 399 aerotolerant genera in the list"
## [1] "Oxygen tolerance is listed for 1034 of 24959 genera in GTDB"

Data source: List of Prokaryotes according to their Aerotolerant or Obligate Anaerobic Metabolism (Million and Raoult, 2018) with modifications made in this study.
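The counts printed by microhum_S3() can be expressed as percentages. The snippet below is only a worked calculation: the numbers are typed in by hand from the printed messages, not read from the package data.

```r
# Counts typed in from the messages printed by microhum_S3() above
with_proteome <- c(anaerobic = 201, aerotolerant = 363)
in_list <- c(anaerobic = 235, aerotolerant = 399)
# Percentage of listed genera that have a reference proteome
proteome_coverage <- round(with_proteome / in_list * 100, 1)
print(proteome_coverage)   # anaerobic 85.5, aerotolerant 91.0
# Percentage of GTDB genera that have an assigned oxygen tolerance
gtdb_coverage <- round(1034 / 24959 * 100, 1)
print(gtdb_coverage)       # 4.1
```

So most genera in the oxygen tolerance list have a reference proteome, while only a small fraction of all GTDB genera have a listed oxygen tolerance.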

Differences of chemical features of proteins between subcommunities of obligate anaerobes and aerotolerant genera from controls and patients (Figure S4)

microhum_S4()

Summary of metagenome sequence processing (part of Supplementary File 1)

datadir <- system.file("extdata/microhum/ARAST", package = "JMDplots")
dataset <- c("HMP12", "HMP12_no_screening", "LLZ+21", "CZH+22", "ZZL+20", "LAA+19")
# Summarize the per-run processing statistics for each dataset
summary_list <- lapply(dataset, function(this_dataset) {
  file <- file.path(datadir, paste0(this_dataset, "_stats.csv"))
  dat <- read.csv(file)
  runs <- nrow(dat)
  average_input_sequences <- mean(dat$input_sequences)
  # Ratio of coding sequences to input sequences (%) for each run
  protein_prediction_rate <- dat$coding_sequences / dat$input_sequences * 100
  runs_with_low_protein_prediction <- sum(protein_prediction_rate < 40)
  # Unweighted mean over runs
  average_protein_prediction_rate <- round(mean(protein_prediction_rate), 2)
  average_protein_length <- round(mean(dat$protein_length), 2)
  data.frame(runs, average_input_sequences, average_protein_prediction_rate,
    runs_with_low_protein_prediction, average_protein_length)
})
description <- c("HMP with human DNA screening", "HMP without screening",
  "COVID-19 nasopharyngeal", "COVID-19 oropharyngeal", "COVID-19 gut", "IBD gut")
summary_table <- cbind(description, dataset, do.call(rbind, summary_list))
col.names <- c("Description", "Dataset", "Number of sequencing runs", "Average number of input sequences",
  "Average protein prediction rate (%)", "Runs with protein prediction rate < 40%",
  "Average protein length")
# kable() comes from knitr, which is not attached by library(JMDplots)
knitr::kable(summary_table, col.names = col.names)
| Description | Dataset | Number of sequencing runs | Average number of input sequences | Average protein prediction rate (%) | Runs with protein prediction rate < 40% | Average protein length |
|:--|:--|--:|--:|--:|--:|--:|
| HMP with human DNA screening | HMP12 | 101 | 13928960 | 44.70 | 40 | 30.74 |
| HMP without screening | HMP12_no_screening | 101 | 13928960 | 50.04 | 31 | 30.69 |
| COVID-19 nasopharyngeal | LLZ+21 | 9 | 40593115 | 26.70 | 7 | 47.32 |
| COVID-19 oropharyngeal | CZH+22 | 122 | 13884977 | 59.15 | 15 | 31.81 |
| COVID-19 gut | ZZL+20 | 73 | 17245724 | 59.12 | 12 | 46.30 |
| IBD gut | LAA+19 | 139 | 9722610 | 56.03 | 4 | 31.48 |
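In the summary code, the average protein prediction rate is the unweighted mean of the per-run rates, so every run counts equally regardless of its size. The sketch below applies the same calculations to a made-up three-run table (hypothetical values; only the column names follow the *_stats.csv files) and, for contrast, also computes a pooled rate that weights runs by their number of input sequences. The pooled rate is not part of the vignette's summary.

```r
# Hypothetical per-run statistics (made-up numbers, not from any dataset)
toy_stats <- data.frame(
  input_sequences = c(1000, 2000, 4000),
  coding_sequences = c(300, 1000, 2400),
  protein_length = c(30, 32, 34)
)
# Per-run protein prediction rate (%)
rate <- toy_stats$coding_sequences / toy_stats$input_sequences * 100
print(rate)                                # 30 50 60
# Runs below the 40% threshold used in the summary table
runs_with_low_protein_prediction <- sum(rate < 40)
print(runs_with_low_protein_prediction)    # 1
# Unweighted mean of per-run rates, as in the summary code
average_protein_prediction_rate <- round(mean(rate), 2)
print(average_protein_prediction_rate)     # 46.67
# Pooled rate over all runs, for comparison only
pooled_rate <- round(sum(toy_stats$coding_sequences) /
  sum(toy_stats$input_sequences) * 100, 2)
print(pooled_rate)                         # 52.86
```

The two values diverge here because the largest run also has the highest rate; the summary table reports the per-run mean.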

Number of genomes and genera in GTDB and UHGG (Unified Human Gastrointestinal Genome from MGnify)

GTDB_taxonomy <- read.csv(system.file("RefDB/GTDB_220/taxonomy.csv.xz", package = "JMDplots"))
print(paste("GTDB:", nrow(GTDB_taxonomy), "genomes and", length(unique(GTDB_taxonomy$genus)), "genera"))
## [1] "GTDB: 113104 genomes and 24959 genera"
UHGG_taxonomy_full <- read.csv(system.file("RefDB/UHGG_2.0.1/fullset/taxonomy.csv.xz", package = "JMDplots"))
print(paste("UHGG (contamination < 5% and completeness > 50%):", nrow(UHGG_taxonomy_full), "species-level clusters and", length(unique(UHGG_taxonomy_full$genus)), "genera"))
## [1] "UHGG (contamination < 5% and completeness > 50%): 4744 species-level clusters and 1031 genera"
UHGG_taxonomy <- read.csv(system.file("RefDB/UHGG_2.0.1/taxonomy.csv.xz", package = "JMDplots"))
print(paste("UHGG (contamination < 2% and completeness > 95%):", nrow(UHGG_taxonomy), "species-level clusters and", length(unique(UHGG_taxonomy$genus)), "genera"))
## [1] "UHGG (contamination < 2% and completeness > 95%): 2350 species-level clusters and 643 genera"
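The two UHGG subsets differ only in their quality thresholds. As a sketch of how such thresholds partition genomes, the snippet below applies both to a small made-up table and counts genomes and genera with the same nrow() and length(unique()) pattern used above. The completeness and contamination column names are assumptions for illustration, not columns of the taxonomy files read above.

```r
# Hypothetical genome statistics (made-up values); the columns
# completeness and contamination (%) are assumed for illustration only
genomes <- data.frame(
  genus = c("A", "A", "B", "C", "C"),
  completeness = c(98, 70, 96, 55, 99),
  contamination = c(0.5, 3, 1.5, 4.9, 6)
)
# Broader tier: contamination < 5% and completeness > 50%
full <- subset(genomes, contamination < 5 & completeness > 50)
# Stricter tier: contamination < 2% and completeness > 95%
strict <- subset(genomes, contamination < 2 & completeness > 95)
counts <- c(full_genomes = nrow(full), full_genera = length(unique(full$genus)),
  strict_genomes = nrow(strict), strict_genera = length(unique(strict$genus)))
print(counts)   # 4 3 2 2
# Every genome passing the stricter thresholds also passes the broader ones
all(rownames(strict) %in% rownames(full))   # TRUE
```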

List genera identified in Figure 5 that are not present in UHGG

not_in_UHGG <- paste(Figure_5_genera[!Figure_5_genera %in% UHGG_taxonomy_full$genus], collapse = " ")
paste("Figure 5 shows", length(Figure_5_genera), "genera; of these,", not_in_UHGG, "is not in UHGG")
## [1] "Figure 5 shows 27 genera; of these, Vescimonas is not in UHGG"
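The filtering with `!Figure_5_genera %in% UHGG_taxonomy_full$genus` is a set difference. The toy example below (hypothetical genus vectors, for illustration only) shows that base R's setdiff() gives the same result; the difference is that setdiff() also drops duplicated names, whereas the `%in%` filter keeps them.

```r
# Hypothetical genus vectors, for illustration only
shown <- c("Bacteroides", "Vescimonas", "Faecalibacterium")
reference <- c("Bacteroides", "Faecalibacterium", "Blautia")
# Logical-index form, as in the vignette code
missing_in <- shown[!shown %in% reference]
# Equivalent set difference (returns unique values)
missing_set <- setdiff(shown, reference)
print(missing_in)    # "Vescimonas"
print(missing_set)   # "Vescimonas"
```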

References

Almeida A, Nayfach S, Boland M, Strozzi F, Beracochea M, Shi ZJ, Pollard KS, Sakharova E, Parks DH, Hugenholtz P, et al. 2021. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nature Biotechnology 39(1): 105–114. doi: 10.1038/s41587-020-0603-3

Boix-Amorós A, Piras E, Bu K, Wallach D, Stapylton M, Fernández-Sesma A, Malaspina D, Clemente JC. 2021. Viral inactivation impacts microbiome estimates in a tissue-specific manner. mSystems 6(5): e00674-21. doi: 10.1128/mSystems.00674-21

de Castilhos J, Zamir E, Hippchen T, Rohrbach R, Schmidt S, Hengler S, Schumacher H, Neubauer M, Kunz S, Müller-Esch T, et al. 2022. Severe dysbiosis and specific Haemophilus and Neisseria signatures as hallmarks of the oropharyngeal microbiome in critically ill coronavirus disease 2019 (COVID-19) patients. Clinical Infectious Diseases 75(1): e1063–e1071. doi: 10.1093/cid/ciab902

Granato DC, Neves LX, Trino LD, Carnielli CM, Lopes AFB, Yokoo S, Pauletti BA, Domingues RR, Sá JO, Persinoti G, et al. 2021. Meta-omics analysis indicates the saliva microbiome and its proteins associated with the prognosis of oral cancer patients. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 1869(8): 140659. doi: 10.1016/j.bbapap.2021.140659

Grenga L, Pible O, Miotello G, Culotta K, Ruat S, Roncato M-A, Gas F, Bellanger L, Claret P-G, Dunyach-Remy C, et al. 2022. Taxonomical and functional changes in COVID-19 faecal microbiome could be related to SARS-CoV-2 faecal load. Environmental Microbiology 24(9): 4299–4316. doi: 10.1111/1462-2920.16028

He F, Zhang T, Xue K, Fang Z, Jiang G, Huang S, Li K, Gu Z, Shi H, Zhang Z, et al. 2021. Fecal multi-omics analysis reveals diverse molecular alterations of gut ecosystem in COVID-19 patients. Analytica Chimica Acta 1180: 338881. doi: 10.1016/j.aca.2021.338881

Jiang X, Zhang Y, Wang H, Wang Z, Hu S, Cao C, Xiao H. 2022. In-depth metaproteomics analysis of oral microbiome for lung cancer. Research 2022: 9781578. doi: 10.34133/2022/9781578

Ke S, Weiss ST, Liu Y-Y. 2022. Dissecting the role of the human microbiome in COVID-19 via metagenome-assembled genomes. Nature Communications 13(1): 5235. doi: 10.1038/s41467-022-32991-w

Liu J, Liu S, Zhang Z, Lee X, Wu W, Huang Z, Lei Z, Xu W, Chen D, Wu X, et al. 2021. Association between the nasopharyngeal microbiome and metabolome in patients with COVID-19. Synthetic and Systems Biotechnology 6(3): 135–143. doi: 10.1016/j.synbio.2021.06.002

Lloyd-Price J, Arze C, Ananthakrishnan AN, Schirmer M, Avila-Pacheco J, Poon TW, Andrews E, Ajami NJ, Bonham KS, Brislawn CJ, et al. 2019. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569(7758): 655–662. doi: 10.1038/s41586-019-1237-9

Maier TV, Lucio M, Lee LH, VerBerkmoes NC, Brislawn CJ, Bernhardt J, Lamendella R, McDermott JE, Bergeron N, Heinzmann SS, et al. 2017. Impact of dietary resistant starch on the human gut microbiome, metaproteome, and metabolome. mBio 8(5): e01343-17. doi: 10.1128/mBio.01343-17

Million M, Raoult D. 2018. Linking gut redox to human microbiome. Human Microbiome Journal 10: 27–32. doi: 10.1016/j.humic.2018.07.002

Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, Hugenholtz P. 2022. GTDB: An ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research 50(D1): D785–D794. doi: 10.1093/nar/gkab776

The Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486(7402): 207–214. doi: 10.1038/nature11234

Thuy-Boun PS, Wang AY, Crissien-Martinez A, Xu JH, Chatterjee S, Stupp GS, Su AI, Coyle WJ, Wolan DW. 2022. Quantitative metaproteomics and activity-based protein profiling of patient fecal microbiome identifies host and microbial serine-type endopeptidase activity associated with ulcerative colitis. Molecular & Cellular Proteomics 21(3): 100197. doi: 10.1016/j.mcpro.2022.100197

Yeoh YK, Zuo T, Lui GC-Y, Zhang F, Liu Q, Li AYL, Chung ACK, Cheung CP, Tso EYK, Fung KSC, et al. 2021. Gut microbiota composition reflects disease severity and dysfunctional immune responses in patients with COVID-19. Gut 70(4): 698–706. doi: 10.1136/gutjnl-2020-323020

Zuo T, Zhang F, Lui GCY, Yeoh YK, Li AYL, Zhan H, Wan Y, Chung ACK, Cheung CP, Chen N, et al. 2020. Alterations in gut microbiota of patients with COVID-19 during time of hospitalization. Gastroenterology 159(3): 944–955. doi: 10.1053/j.gastro.2020.05.048