canprot

Chemical analysis of proteins based on their amino acid compositions. Amino acid compositions can be read from FASTA files and used to calculate chemical metrics including carbon oxidation state and stoichiometric hydration state as described in Dick et al. (2020). Other properties that can be calculated include protein length, grand average of hydropathy (GRAVY), isoelectric point (pI), molecular weight (MW), standard molal volume (V0), and metabolic costs (Akashi and Gojobori, 2002; Wagner, 2005; Zhang et al., 2018). A database of amino acid compositions of human proteins derived from UniProt is provided.
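As a rough illustration of what the carbon oxidation state measures, the sketch below applies the general formula for the average oxidation state of carbon to an elemental formula C_c H_h N_n O_o S_s with net charge Z. It uses base R only and is not canprot's implementation, which works from amino acid compositions. The function name `zc_from_formula` and its argument names are hypothetical and chosen for this example.

```r
# Average oxidation state of carbon for a species with elemental formula
# C_c H_h N_n O_o S_s and net charge Z:
#   Zc = (Z - h + 3n + 2o + 2s) / c
# NOTE: zc_from_formula is a hypothetical helper, not part of canprot.
zc_from_formula <- function(nC, nH, nN, nO, nS = 0, Z = 0) {
  (Z - nH + 3 * nN + 2 * nO + 2 * nS) / nC
}

# Glycine, C2H5NO2
zc_gly <- zc_from_formula(nC = 2, nH = 5, nN = 1, nO = 2)

# Alanine, C3H7NO2
zc_ala <- zc_from_formula(nC = 3, nH = 7, nN = 1, nO = 2)

# Gly-Ala dipeptide: sum of the two amino acids minus one H2O, C5H10N2O3
zc_gly_ala <- zc_from_formula(nC = 5, nH = 10, nN = 2, nO = 3)

print(c(Gly = zc_gly, Ala = zc_ala, GlyAla = zc_gly_ala))
```

Glycine comes out more oxidized than alanine because alanine's extra methyl group adds reduced carbon. The dipeptide value falls between the two, weighted by the number of carbon atoms each residue contributes.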

See the vignettes at https://chnosz.net/canprot/vignettes/.

Installation

First install the remotes package from CRAN, then install canprot from GitHub. This also installs several other R packages as dependencies:

install.packages("remotes")
remotes::install_github("jedick/canprot")

Demo

Three demos are available. One of them is shown below.

demo("thermophiles")
#demo("locations")
#demo("redoxins")

Figure: Specific entropy, Zc, and pI for Nitrososphaeria MAGs

This is a scatter plot of standard specific entropy (S° per gram) and carbon oxidation state (ZC) for proteins in Nitrososphaeria (syn. Thaumarchaeota) metagenome-assembled genomes (MAGs) reported by Luo et al. (2024). S° is calculated using amino acid group contributions (Dick et al., 2006) via canprot::S0g(). The plot shows that, at similar carbon oxidation state, proteins in MAGs from thermal habitats tend to have higher specific entropy than proteins in MAGs from nonthermal habitats. This implies that, after correcting for ZC, proteins in thermophiles have a more negative derivative of the standard Gibbs energy per gram of protein with respect to temperature. See the "Demos for canprot" vignette for a similar plot for genomes of methanogenic archaea.
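
The step from higher specific entropy to a more negative temperature derivative rests on a standard thermodynamic identity. It is written out here as a brief reminder, not as a derivation taken from the cited papers. At constant pressure:

```latex
\left( \frac{\partial G^\circ}{\partial T} \right)_P = -S^\circ
```

The same relation holds when both quantities are expressed per gram of protein, so a larger S° per gram corresponds to a standard Gibbs energy per gram that decreases more steeply with increasing temperature.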