pdat_ {canprot}  R Documentation

Get Protein Data


Description

Get data on protein expression and chemical composition.


Usage

  pdat_CRC(dataset = NULL, basis = "QEC")



Arguments

dataset    character, specifies which dataset to retrieve

basis      character, keyword for basis species to use


Details

The pdat_ functions read CSV files containing data on relatively up- and down-expressed proteins reported in proteomic experiments, then use protcomp to calculate chemical compositional metrics.

The data files available in the package are stored in the extdata directory, with subdirectories corresponding to the names of the functions. Use pdat_CRC to retrieve data for protein expression in colorectal cancer, pdat_pancreatic for data on pancreatic cancer, pdat_hypoxia for data on hypoxia or 3D culture, and pdat_osmotic for data on hyperosmotic stress.
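As a rough sketch of how the bundled files might be located, the snippet below asks R for the extdata directory of an installed copy of the package and lists its subdirectories. It assumes that canprot is installed and that the installed version still uses the layout described above; system.file returns an empty string otherwise, which the code checks for.

```r
# Locate the extdata directory of an installed canprot package.
# system.file() returns "" when the package or directory is not found.
extdata <- system.file("extdata", package = "canprot")

if (nzchar(extdata)) {
  # Subdirectories are expected to correspond to the pdat_ function names
  # (an assumption based on the description above, not verified here).
  subdirs <- list.dirs(extdata, full.names = FALSE, recursive = FALSE)
  print(subdirs)
} else {
  message("canprot extdata directory not found; is the package installed?")
}
```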

If dataset is NULL, the return value gives the names of all datasets that can be retrieved using the function. Provide one of these names as the dataset argument to retrieve the data. Each dataset name identifies the study (publication) where the data were reported; it is constructed by combining the first characters of the family names of the first three or four authors with the 2-digit year of publication. This coincides with the key-generation scheme used in some bibliography manager software. The same abbreviation is also used to name the CSV file containing the data. If more than one dataset is available from a single study (for example, for relative protein expression in different stages of cancer), the dataset name is suffixed by an underscore followed by a short abbreviation indicating the particular dataset.
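The naming scheme can be illustrated with a small helper. This is only a sketch: make_dataset_key is a hypothetical function, not part of canprot, and it assumes that "first characters" means the first letter of each of up to four family names. The author names and the suffix are placeholders.

```r
# Hypothetical helper (not part of canprot): build a dataset key from
# author family names, a publication year, and an optional suffix.
make_dataset_key <- function(family_names, year, suffix = NULL, max_authors = 4) {
  n <- min(length(family_names), max_authors)
  # First letter of each of the first n family names
  initials <- toupper(substr(family_names[seq_len(n)], 1, 1))
  # Two-digit year, zero-padded (2005 gives "05")
  yy <- sprintf("%02d", as.integer(year) %% 100L)
  key <- paste0(paste(initials, collapse = ""), yy)
  # Optional suffix distinguishes several datasets from one study
  if (!is.null(suffix)) key <- paste(key, suffix, sep = "_")
  key
}

# Placeholder author names, used only to show the shape of the key
key1 <- make_dataset_key(c("Alpha", "Beta", "Gamma", "Delta"), 2010)
key2 <- make_dataset_key(c("Alpha", "Beta", "Gamma"), 2015, suffix = "stage2")
print(key1)  # "ABGD10"
print(key2)  # "ABG15_stage2"

# Split a suffixed name back into the study key and the dataset suffix
parts <- strsplit(key2, "_", fixed = TRUE)[[1]]
print(parts)  # "ABG15" "stage2"
```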

Tables listing mean compositional differences between up- and down-expressed proteins for each dataset are saved in extdata/summary/. These files were created using the second example below.


Value

A list consisting of dataset (the name of the dataset), basis (the basis species used for the calculations), description (descriptive text), pcomp (compositional data generated by protcomp), up2 (a logical vector with length equal to the number of proteins; TRUE if the protein is up-expressed in group 2 compared to group 1, for example cancer compared to normal, and FALSE otherwise), and names (gene names for the proteins, if available).
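To show how the up2 and names elements might be used together, here is a minimal sketch built on a mock list with the documented element names. Every value is a placeholder rather than data from any study, and pcomp is left empty because the structure of the protcomp output is not described here.

```r
# Mock object mimicking the documented return structure.
# All values are placeholders, not real data.
pdat <- list(
  dataset = "XXXX00",
  basis = "QEC",
  description = "placeholder description",
  pcomp = list(),  # stands in for protcomp output (structure not shown here)
  up2 = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  names = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5")
)

# Count up- and down-expressed proteins
n_up <- sum(pdat$up2)
n_down <- sum(!pdat$up2)

# Gene names in each group
up_names <- pdat$names[pdat$up2]
down_names <- pdat$names[!pdat$up2]

print(c(up = n_up, down = n_down))
print(up_names)
```

Because up2 is a logical vector aligned with the proteins, it can index names directly, and the same indexing should carry over to any per-protein values of equal length.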

See Also



Examples

pdat_CRC("JKMF10")  # same result as get_pdat("JKMF10")

## Not run: 
# how the extdata/summary/summary_*.csv files were made
for(what in c("CRC", "pancreatic", "hypoxia", "osmotic")) {
  pdat_fun <- paste0("pdat_", what)
  datasets <- get(pdat_fun)()
  comptab <- lapply_canprot(datasets, function(dataset) {
    pdat <- get_pdat(dataset, pdat_fun)
    ZC_nH2O(pdat, plot.it = FALSE)
  }, varlist = "pdat_fun")
  # write summary table
  comptab <- do.call(rbind, comptab)
  comptab <- cbind(set = c(letters, LETTERS)[1:nrow(comptab)], comptab)
  comptab[, 6:15] <- signif(comptab[, 6:15], 4)
  filename <- paste0("summary_", what, ".csv")
  write.csv(comptab, filename, row.names = FALSE, quote = 3)
}
## End(Not run)

[Package canprot version 0.1.0 Index]