canprot {canprot}    R Documentation

Amino Acid Compositions of Human Proteins


Data for amino acid compositions, and updates to UniProt IDs.




The columns of human_aa are compatible with the layout used for amino acid compositions in CHNOSZ (see thermo):

protein      character   Identification of protein
organism     character   Identification of organism
ref          character   Reference key for source of compositional data
abbrv        character   Abbreviation or other ID for protein
chains       numeric     Number of polypeptide chains in the protein
Ala...Tyr    numeric     Number of each amino acid in the protein

Here, the protein column contains the UniProt ID (accession), possibly with a suffix indicating the isoform of the protein (especially for entries from human_additional.Rdata).
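
As a rough illustration of this layout, the following sketch builds a one-row data frame with the same column names. Every value in it (the accession, the organism label, the reference key, the abbreviation and the counts) is made up for illustration and is not taken from human_aa.

```r
# Three-letter abbreviations ordered by one-letter code (A, C, D, ..., Y),
# so the amino acid columns run from Ala to Tyr.
aa3 <- c("Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu",
         "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr", "Val", "Trp", "Tyr")

# Identification columns; all values are hypothetical.
id_cols <- data.frame(
  protein = "P00000-2",   # hypothetical accession with an isoform suffix
  organism = "HUMAN",     # assumed organism label
  ref = "toy_ref",        # hypothetical reference key
  abbrv = "TOY",          # hypothetical abbreviation
  chains = 1,
  stringsAsFactors = FALSE
)

# One numeric column per amino acid, initialised to zero.
counts <- as.data.frame(matrix(0, nrow = 1, ncol = length(aa3),
                               dimnames = list(NULL, aa3)))
counts$Ala <- 12
counts$Tyr <- 3

# Five identification columns followed by twenty amino acid columns.
toy_aa <- cbind(id_cols, counts)
str(toy_aa[, 1:7])
```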


These amino acid compositions were compiled from amino acid sequences downloaded from UniProt. Amino acid sequences of human proteins were obtained from files in the UniProt reference proteome, dated 2016-04-03, downloaded from

The amino acid compositions of human proteins are stored in three files. human_base.Rdata contains amino acid compositions of proteins in the UniProt reference proteome (UP000005640_9606.fasta.gz containing canonical, manually reviewed sequences). human_additional.Rdata contains amino acid compositions of additional proteins in the UniProt reference proteome (UP000005640_9606_additional.fasta.gz containing isoforms and unreviewed sequences). human_extra.csv contains amino acid compositions of other (“extra”) proteins identified in proteomic experiments but not listed in one of the files above.
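
Compiling an amino acid composition from a sequence comes down to counting residues. The sketch below shows one way to do this in base R; count_aa is a hypothetical helper and the sequence is a made-up peptide, so this is not necessarily how the package files were generated.

```r
# One-letter codes and the matching three-letter column names.
aa1 <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
         "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")
aa3 <- c("Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu",
         "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr", "Val", "Trp", "Tyr")

# Hypothetical helper: count the 20 standard amino acids in one sequence.
# Characters outside aa1 (e.g. X or U) become NA in the factor and are
# dropped by table(), so they are not counted.
count_aa <- function(sequence) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  counts <- table(factor(residues, levels = aa1))
  setNames(as.numeric(counts), aa3)
}

comp <- count_aa("MKVLAAGIVK")   # made-up 10-residue peptide
comp[comp > 0]
```

Because the levels of the factor are fixed, amino acids that are absent from the sequence still get a count of zero, which keeps every row the same width.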

On loading the data (with data(canprot)), the individual data files are read and combined using rbind, and the result is assigned to the human_aa object in the canprot environment.
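
The combining step can be sketched with toy data frames standing in for the three files. The helper toy_rows, the accessions and the reference keys are hypothetical, and only two amino acid columns are shown to keep the example short.

```r
# Hypothetical helper that builds rows in the shared column layout.
toy_rows <- function(ids, ref) {
  data.frame(protein = ids, organism = "HUMAN", ref = ref,
             abbrv = NA_character_, chains = 1,
             Ala = seq_along(ids), Tyr = 2 * seq_along(ids),
             stringsAsFactors = FALSE)
}

# Stand-ins for the base, additional and extra files.
base       <- toy_rows(c("B00001", "B00002"), "toy_base")
additional <- toy_rows("B00001-2", "toy_additional")
extra      <- toy_rows("E00001", "toy_extra")

# rbind() stacks the rows; it requires identical column names in each piece.
toy_human_aa <- rbind(base, additional, extra)

# Look up compositions for a set of IDs, preserving the query order.
wanted <- c("E00001", "B00002")
toy_human_aa[match(wanted, toy_human_aa$protein),
             c("protein", "ref", "Ala", "Tyr")]
```

With match(), an ID that is absent from the table yields NA, so missing proteins show up as NA rows rather than being silently dropped.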

As an aid for processing some datasets that use old (obsoleted) UniProt IDs, the corresponding new (current) IDs are stored in uniprot_updates. uniprot_updates also lists the source (i.e. reference key) that uses each old ID.
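
A minimal sketch of how such a table could be used to update IDs follows. The column names old, new and source are assumptions (the actual column names of uniprot_updates are not given here), and update_ids, the IDs and the reference keys are hypothetical.

```r
# Toy stand-in for uniprot_updates; column names are assumed.
updates <- data.frame(
  old = c("OLD001", "OLD002"),
  new = c("NEW001", "NEW002"),
  source = c("study_A", "study_B"),   # hypothetical reference keys
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace obsolete IDs, leave all others unchanged.
update_ids <- function(ids, updates) {
  idx <- match(ids, updates$old)
  ifelse(is.na(idx), ids, updates$new[idx])
}

update_ids(c("OLD002", "KEEP01", "OLD001"), updates)
```

IDs without an entry in the table pass through untouched, so the function can be applied to a whole dataset without first separating old IDs from current ones.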



[Package canprot version 0.1.0 Index]