pdat_ {JMDplots} | R Documentation |
These functions are used to retrieve the amino acid compositions of differentially expressed proteins or proteins corresponding to differentially expressed genes.
pdat_breast(dataset = 2020)
pdat_colorectal(dataset = 2020)
pdat_liver(dataset = 2020)
pdat_lung(dataset = 2020)
pdat_pancreatic(dataset = 2020)
pdat_prostate(dataset = 2020)
pdat_hypoxia(dataset = 2020)
pdat_secreted(dataset = 2020)
pdat_3D(dataset = 2020)
pdat_glucose(dataset = 2020)
pdat_osmotic_bact(dataset = 2020)
pdat_osmotic_euk(dataset = 2020)
pdat_osmotic_halo(dataset = 2020)
.pdat_multi(dataset = 2020)
.pdat_osmotic(dataset = 2017)
pdat_osmotic_gene(dataset = 2020)
pdat_TCGA(dataset = 2020)
pdat_HPA(dataset = 2020)
pdat_aneuploidy(dataset = 2020)
pdat_yeast_stress(dataset = 2020)
pdat_fly(dataset = NULL)
dataset |
character, dataset name |
The pdat_ functions assemble lists of up- and down-regulated proteins and retrieve their amino acid compositions using human_aa.
The result can be used with get_comptab to make a table of chemical metrics that can then be plotted with diffplot.
If dataset is ‘2020’ (the default) or ‘2017’, the function returns the names of all datasets in the compilation for the respective year.
Each dataset name starts with a reference key indicating the study (i.e. paper or other publication) where the data were reported. The reference keys are made by combining the first characters of the authors' family names with the 2-digit year of publication.
If a study has more than one dataset, the reference key is followed by an underscore and an identifier for the particular dataset.
This identifier is saved in the variable named stage in the functions, but can be any descriptive text.
To retrieve the data, provide a single dataset name in the dataset argument.
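As a rough illustration of the naming convention described above, the following base R sketch splits a dataset name into its reference key and optional identifier. The helper split_dataset_name is hypothetical and not part of JMDplots, and splitting at the first underscore is an assumption rather than documented behavior.

```r
# Hypothetical helper (not part of JMDplots): split a dataset name into
# the reference key and the optional dataset identifier ("stage").
# Assumption: the reference key itself never contains an underscore,
# so the name is split at the first underscore.
split_dataset_name <- function(dataset) {
  pos <- regexpr("_", dataset, fixed = TRUE)
  if (pos == -1) {
    # No underscore: the study contributes a single dataset
    list(key = dataset, stage = NA_character_)
  } else {
    list(key = substr(dataset, 1, pos - 1),
         stage = substr(dataset, pos + 1, nchar(dataset)))
  }
}

# Dataset names taken from the examples on this page
split_dataset_name("JKMF10")      # key only, stage is NA
split_dataset_name("GEPIA2_ACC")  # key "GEPIA2", stage "ACC"
```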
Protein expression data are read from the CSV files stored in extdata/expression/ or extdata/canH2O, under the subdirectory corresponding to the name of the pdat_ function.
Some of the functions also read amino acid compositions (e.g. for non-human proteins) from the files in extdata/aa/.
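To see where these files might live in an installed copy of the package, a sketch along the following lines could be used. It assumes that the subdirectory is the function name without the pdat_ prefix (e.g. colorectal for pdat_colorectal); that mapping is a guess based on the description above, so check the installed package if the directory is not found.

```r
# Assumption: the subdirectory is the function name without the "pdat_" prefix
fun <- "pdat_colorectal"
subdir <- sub("^pdat_", "", fun)

# system.file() returns "" when the package or the directory is not found
datadir <- system.file("extdata", "expression", subdir, package = "JMDplots")

if (nzchar(datadir)) {
  # Show the first few CSV files (possibly compressed) for this function
  print(head(list.files(datadir, pattern = "\\.csv")))
} else {
  message("Data directory not found; is JMDplots installed?")
}
```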
The following functions are used in the canH2O paper:
pdat_colorectal, pdat_pancreatic, pdat_breast, pdat_lung, pdat_prostate, and pdat_liver retrieve data for protein expression in different cancer types.
pdat_hypoxia gets data for cellular extracts in hypoxia and pdat_secreted gets data for secreted proteins (e.g. exosomes) in hypoxia.
pdat_3D retrieves data for 3D (e.g. tumor spheroids and aggregates) compared to 2D (monolayer) cell culture.
.pdat_osmotic retrieves data for hyperosmotic stress, for the 2017 compilation only.
In 2020, this compilation was expanded and split into pdat_osmotic_bact (bacteria), pdat_osmotic_euk (eukaryotic cells) and pdat_osmotic_halo (halophilic bacteria and archaea).
pdat_glucose gets data for high-glucose experiments in eukaryotic cells.
.pdat_multi retrieves data for studies that have multiple types of datasets (e.g. both cellular and secreted proteins in hypoxia), and is used internally by the specific functions (e.g. pdat_hypoxia and pdat_secreted).
pdat_osmotic_gene, which is used in the gradH2O paper, gets data for differentially expressed genes in transcriptomic experiments for bacteria.
pdat_fly, which is used in the evdevH2O paper, gets data for differentially expressed genes or proteins in adult flies compared to embryos (Fabre et al., 2019).
The following functions are also used in the canH2O paper:
pdat_TCGA calculates chemical composition for proteins corresponding to differentially expressed genes listed by GEPIA2 (Tang et al., 2019).
Expression levels for normal tissue and cancer are from the Genotype-Tissue Expression project (GTEx Consortium, 2017) and The Cancer Genome Atlas (TCGA Research Network et al., 2013; Hutter and Zenklusen, 2018).
pdat_HPA calculates chemical composition for protein expression data from the Human Protein Atlas (Uhlen et al., 2015).
pdat_aneuploidy gets data for differentially expressed genes in aneuploid vs haploid yeast cells (Tsai et al., 2019).
pdat_yeast_stress gets data for differentially expressed genes in hyper- and hypo-osmotic stress in yeast (Gasch et al., 2000).
Except for pdat_aneuploidy and pdat_fly, each of the functions has a corresponding vignette.
These vignettes are located in inst/vignettes, so they are not built with the package, but can be run using mkvig.
The output files (CSV) from the vignettes are included in the package, and they are used in some figures: gradH2O7 (osmotic_gene), canH2O3 (TCGA, HPA), canH2O5 (yeast_stress).
A list consisting of:
dataset
Name of the dataset
description
Descriptive text for the dataset, used for making the tables in the vignettes
pcomp
UniProt IDs together with amino acid compositions obtained using human_aa
up2
Logical vector with length equal to the number of proteins; TRUE for up-regulated proteins and FALSE for down-regulated proteins
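The following base R sketch shows how a list with these four components could be split into up- and down-regulated groups. The object is a mock with placeholder IDs and made-up counts, and representing pcomp as a data frame is an assumption; the actual structure returned by the pdat_ functions may differ.

```r
# Mock return value with placeholder IDs and made-up counts (not real data).
# Assumption: pcomp is shown here as a data frame with one row per protein.
pdat <- list(
  dataset = "MOCK20_example",
  description = "placeholder description",
  pcomp = data.frame(
    protein = c("ID1", "ID2", "ID3", "ID4"),
    Ala = c(10, 25, 8, 14),
    Gly = c(12, 20, 5, 9)
  ),
  up2 = c(TRUE, FALSE, TRUE, FALSE)
)

# up2 has one element per protein, so it can index the rows directly
stopifnot(length(pdat$up2) == nrow(pdat$pcomp))
up <- pdat$pcomp[pdat$up2, ]     # up-regulated proteins
down <- pdat$pcomp[!pdat$up2, ]  # down-regulated proteins
c(n_up = nrow(up), n_down = nrow(down))
```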
GEPIA2.csv.xz
This file has gene names, Ensembl Gene IDs, UniProt IDs, and log2 fold change values for differentially expressed genes in each cancer type available on the GEPIA2 server (http://gepia2.cancer-pku.cn/#degenes) run using default settings (ANOVA, log2FC cutoff = 1, q-value cutoff = 0.01). Ensembl Gene IDs were converted to UniProt IDs using the UniProt mapping tool (https://www.uniprot.org/mapping/).
HPA.csv.xz
This file has gene names, Ensembl Gene IDs, UniProt IDs, and expression level values for cancer and normal tissue derived from the pathology.tsv.zip and normal_tissue.tsv.zip data files in the Human Protein Atlas, version 19 (https://proteinatlas.org).
To calculate the expression level values, antibody staining intensities were converted to a numeric scale (not detected: 0, low: 1, medium: 3, high: 5), and values for all available samples were averaged (including all cell types for normal tissues).
Ensembl Gene IDs were converted to UniProt IDs using the UniProt mapping tool (https://www.uniprot.org/mapping/).
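The conversion of staining intensities to expression levels described above can be sketched in base R as follows. The staining calls are hypothetical values for a single gene, not taken from the Human Protein Atlas files, and the variable names are illustrative only.

```r
# Numeric scale described above for antibody staining intensities
intensity_scale <- c("not detected" = 0, "low" = 1, "medium" = 3, "high" = 5)

# Hypothetical staining calls for one gene across four samples (not real HPA data)
staining <- c("high", "medium", "low", "not detected")

# Look up the numeric value for each sample and average over all samples
expression_level <- mean(unname(intensity_scale[staining]))
expression_level  # (5 + 3 + 1 + 0) / 4 = 2.25
```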
GTEx Consortium (2017) Genetic effects on gene expression across human tissues. Nature 550, 204–213. doi:10.1038/nature24277
Fabre B, Korona D, Lees JG, Lazar I, Livneh I, Brunet M, Orengo CA, Russell S and Lilley KS (2019) Comparison of Drosophila melanogaster embryo and adult proteome by SWATH-MS reveals differential regulation of protein synthesis, degradation machinery, and metabolism modules. J. Proteome Res. 18, 2525–2534. doi:10.1021/acs.jproteome.9b00076
Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D and Brown PO (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241–4257. doi:10.1091/mbc.11.12.4241
Hutter C and Zenklusen JC (2018) The Cancer Genome Atlas: Creating lasting value beyond its data. Cell 173, 283–285. doi:10.1016/j.cell.2018.03.042
Tang Z, Kang B, Li C, Chen T and Zhang Z (2019) GEPIA2: an enhanced web server for large-scale expression profiling and interactive analysis. Nucleic Acids Res. 47, W556–W560. doi:10.1093/nar/gkz430
The Cancer Genome Atlas Research Network et al. (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genetics 45, 1113–1120. doi:10.1038/ng.2764
Tsai H-J et al. (2019) Hypo-osmotic-like stress underlies general cellular defects of aneuploidy. Nature 570, 117–121. doi:10.1038/s41586-019-1187-2
Uhlen M et al. (2015) A pathology atlas of the human cancer transcriptome. Science 357, eaan2507. doi:10.1126/science.aan2507
# list datasets for cancer types in TCGA
pdat_TCGA(2020)
# process one dataset (adrenocortical carcinoma)
pdat_TCGA("GEPIA2_ACC")
# show the median values and differences of
# ZC and nH2O between groups of proteins
# corresponding to up- and down-regulated genes
(get_comptab(pdat_TCGA("GEPIA2_ACC")))
# list datasets for cancer types in HPA
pdat_HPA(2020)
# process one dataset (breast cancer)
pdat_HPA("HPA19_1")
# List datasets in the 2017 compilation for colorectal cancer
pdat_colorectal(2017)
# Get proteins and amino acid compositions for one dataset
pdat_colorectal("JKMF10")