add.protein {CHNOSZ} | R Documentation |
Functions to get amino acid compositions and add them to protein list for use by other functions.
add.protein(aa, as.residue = FALSE)
seq2aa(sequence, protein = NA)
aasum(aa, abundance = 1, average = FALSE, protein = NULL, organism = NULL)
aa | data frame, amino acid composition in the format of thermo()$protein |
as.residue | logical, normalize by protein length? |
sequence | character, protein sequence |
protein | character, name of protein; numeric, indices of proteins (rownumbers of thermo()$protein) |
abundance | numeric, abundances of proteins |
average | logical, return the weighted average of amino acid counts? |
organism | character, name of organism |
A ‘protein’ in CHNOSZ is defined by its identifying information and its amino acid composition, both stored in thermo()$protein.
The names of proteins in CHNOSZ are distinguished from those of other chemical species by an underscore character ("_") that separates two identifiers, referred to as the protein and the organism.
An example is ‘LYSC_CHICK’.
The purpose of the functions described here is to identify proteins and work with their amino acid compositions.
From the amino acid compositions, the thermodynamic properties of the proteins can be estimated by group additivity.
seq2aa returns a data frame of amino acid composition for the provided sequence, in the format of thermo()$protein.
In this function, the value of the protein argument is put into the protein column of the result.
If there is an underscore (e.g. ‘LYSC_CHICK’), it is used to split the text, and the two parts are put into the protein and organism columns.
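The counting and name-splitting behavior described above can be illustrated with a minimal base R sketch. It is not the CHNOSZ implementation: count_residues is a hypothetical helper, its one-letter column names are an assumption made for brevity, and the result does not reproduce the column layout of thermo()$protein.

```r
# Hypothetical helper (not part of CHNOSZ): count residues in a sequence
# and split a name such as "GAJU_HUMAN" into protein and organism parts.
count_residues <- function(sequence, name = NA) {
  # One-letter codes of the 20 standard amino acids
  letters20 <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
                 "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")
  residues <- strsplit(toupper(sequence), "")[[1]]
  # Characters outside letters20 become NA and are not counted
  counts <- as.integer(table(factor(residues, levels = letters20)))
  names(counts) <- letters20
  # Split the name at the first underscore, if there is one
  if (!is.na(name) && grepl("_", name, fixed = TRUE)) {
    protein <- sub("_.*$", "", name)
    organism <- sub("^[^_]*_", "", name)
  } else {
    protein <- name
    organism <- NA
  }
  cbind(
    data.frame(protein = protein, organism = organism, stringsAsFactors = FALSE),
    as.data.frame(t(counts))
  )
}

res <- count_residues("LAAGKVEDSD", "GAJU_HUMAN")
res[, c("protein", "organism", "A", "D", "L")]
```

For real work, seq2aa should be preferred, because its output has the columns that add.protein and the other CHNOSZ functions expect.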
Given amino acid compositions returned by seq2aa, add.protein adds them to thermo()$protein for use by other functions in CHNOSZ.
The amino acid compositions of proteins in aa with the same name as one in thermo()$protein are replaced.
Set as.residue to TRUE to normalize by protein length; each input amino acid composition is divided by the corresponding number of residues, with the result that the sum of amino acid frequencies for each protein is 1.
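The per-residue normalization can be checked by hand with a small base R sketch. The table below is hypothetical: it has only three amino acid columns and made-up protein names, and it illustrates the arithmetic only, not the code path used by add.protein.

```r
# Hypothetical composition table: two proteins, three amino acids for brevity
comp <- data.frame(
  protein = c("P1", "P2"),
  Ala = c(2, 1),
  Gly = c(1, 3),
  Ser = c(1, 4),
  stringsAsFactors = FALSE
)
aa_cols <- c("Ala", "Gly", "Ser")

# Protein length is the number of residues in each row
len <- rowSums(comp[, aa_cols])

# Divide each row by its length to get amino acid frequencies
freq <- comp
freq[, aa_cols] <- comp[, aa_cols] / len

# The frequencies of each protein sum to 1
rowSums(freq[, aa_cols])
```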
aasum returns a data frame representing the sum of amino acid compositions in the rows of the input aa data frame.
The amino acid compositions are multiplied by the indicated abundance; that argument is recycled to match the number of rows of aa.
If average is TRUE the final sum is divided by the number of input compositions.
The name used in the output is taken from the first row of aa or from protein and organism if they are specified.
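The summation described above can be sketched in base R as follows. weighted_sum is a hypothetical function written from the description in this section, not the source of aasum; it works on a table of counts only and ignores the identifying columns.

```r
# Hypothetical helper (not part of CHNOSZ): abundance-weighted sum of
# amino acid counts, following the description of aasum in the text.
weighted_sum <- function(comp, abundance = 1, average = FALSE) {
  # Recycle abundance to one value per row of comp
  abundance <- rep(abundance, length.out = nrow(comp))
  # Multiply each row by its abundance, then sum down the columns
  total <- colSums(comp * abundance)
  # Optionally divide by the number of input compositions
  if (average) total <- total / nrow(comp)
  total
}

# Hypothetical counts for three proteins and two amino acids
comp <- data.frame(Ala = c(2, 1, 3), Gly = c(1, 3, 0))

weighted_sum(comp)                          # plain sum
weighted_sum(comp, abundance = c(1, 2, 1))  # second protein counted twice
weighted_sum(comp, average = TRUE)          # sum divided by 3
```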
For seq2aa, a data frame of amino acid composition and identifying information for proteins.
For add.protein, the rownumbers of thermo()$protein that are added and/or replaced.
For aasum, a one-row data frame of amino acid composition and identifying information.
read.fasta for another way of getting amino acid compositions that can be used with add.protein.
pinfo for protein-level functions (length, chemical formulas, reaction coefficients of basis species).
# Get the amino acid composition of a protein sequence
# (Human Gastric juice peptide 1)
aa <- seq2aa("LAAGKVEDSD", "GAJU_HUMAN")
# Add the protein to CHNOSZ
ip <- add.protein(aa)
# Calculate the protein length and chemical formula
protein.length(ip) # 10
as.chemical.formula(protein.formula(ip)) # "C41H69N11O18"
# Calculate a formula without using add.protein
aa <- seq2aa("ANLSG", "pentapeptide_test")
as.chemical.formula(protein.formula(aa))
# Sum the amino acid compositions of several poliovirus protein subunits
file <- system.file("extdata/protein/POLG.csv", package = "CHNOSZ")
aa <- read.csv(file, as.is = TRUE)
aasum(aa, protein = "POLG_sum")