
Amino acids
Isoelectric point & amino acids
All 20 amino acids that make up proteins have ionizable groups in their carbon skeleton (the α-amino and α-carboxyl groups), and some also have them in their side chains. It is therefore possible to predict the isoionic point (pI) of an amino acid from the pKa values of these ionizable groups. The pI is also commonly called the isoelectric point, although that term carries a more complex theoretical scope.
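For example, glutamic acid (Glu, E) has an ionizable carboxylate in its side chain, in addition to the amine (-H\(_2\)N) and carboxylate groups of the carbon skeleton (Figure 1). The net charge of an ionizable group can be written as the charge of its base form, qb, plus the fraction of the group in its protonated (acid) form, qa: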

\[ q_{net} = q_b + q_a \tag{1}\]
From the Henderson-Hasselbalch equation, the protonated fraction is \(q_a = 1/(1+10^{pH-pK_a})\), so that:
\[ q_{net} = q_b+\frac{1}{1+10^{pH-pK_a}} \tag{2}\]
For a molecule with n ionizable groups, the net charge is the sum of the contributions of each group:
\[ q_{net} = \sum_{i=1}^{n} \left(q_{b,i}+\frac{1}{1+10^{pH-pK_i}}\right) \tag{3}\]
where \(pK_i\) is the pKa of the i-th group. In this way, the titration curve of glutamic acid can be computed programmatically in terms of its charge, rather than its acid fraction. Here, \(q_{b,i}\) is the charge of the base form of each group, which for Glu is -1 for the two carboxylates and 0 for the amine group, so an additional vector of qb values is needed.
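As a worked instance of equation (3), assuming pKa values of 2.2 (α-carboxyl), 9.7 (α-amino) and 4.3 (side-chain carboxyl) for Glu, the net charge at pH 7 is approximately:

\[ q_{net}(7) = \left(-1+\frac{1}{1+10^{7-2.2}}\right)+\left(0+\frac{1}{1+10^{7-9.7}}\right)+\left(-1+\frac{1}{1+10^{7-4.3}}\right) \approx -1.000+0.998-0.998 \approx -1 \]

so at neutral pH both carboxylates are essentially deprotonated, the amine is essentially protonated, and Glu carries a net charge close to -1.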
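```r
# Glu Titration
qNet <- function(pH, qB, pKa) {
  x <- 0
  for (i in seq_along(qB)) {
    x <- x + qB[i] + 1 / (1 + 10^(pH - pKa[i]))
  }
  return(x)
}
qB <- c(-1, 0, -1)      # base-form charges: alpha-COO-, alpha-NH2, side-chain COO-
pKa <- c(2.2, 9.7, 4.3) # pKa values of the same three groups
curve(qNet(x, qB, pKa), 1, 12, xlab = "pH", ylab = "qNet")
abline(0, 0, lty = "dotted") # reference line at qNet = 0
locator() # click on the plot to read coordinates (interactive sessions only)
```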
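A quick numerical check of the function above evaluates the net charge of Glu at a few pH values. This sketch redefines `qNet` so it runs on its own; because the function uses only vectorized arithmetic, a whole vector of pH values can be passed at once, which is also what allows `curve()` to call it directly.

```r
# Net charge of Glu from equation (3), same logic as the titration code
qNet <- function(pH, qB, pKa) {
  x <- 0
  for (i in seq_along(qB)) {
    x <- x + qB[i] + 1 / (1 + 10^(pH - pKa[i]))
  }
  x
}
qB <- c(-1, 0, -1)      # base-form charges of alpha-COOH, alpha-NH3, side-chain COOH
pKa <- c(2.2, 9.7, 4.3) # matching pKa values

# Evaluate several pH values in one call (vectorized arithmetic)
charges <- qNet(c(1, 7, 12), qB, pKa)
print(round(charges, 2))
```

The charge goes from roughly +0.94 at pH 1 to about -1 at pH 7 and close to -2 at pH 12; the sign change between pH 1 and pH 7 brackets the pI.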
As seen previously, the pI can be estimated graphically by clicking, with `locator()`, on the point where the curve crosses the dotted line (qNet = 0). It is also possible to obtain this value automatically, with a command that finds the root of the function, that is, the pH at which qnet is zero. For this, the `uniroot` function is used, specifying the function whose root is sought and the lower and upper limits of the search interval, as follows:

```r
# Calculation of pI
f <- function(pH) {
  qNet(pH, qB, pKa)
}
str(uniroot(f, c(2, 5)))
```
```
List of 5
 $ root      : num 3.25
 $ f.root    : num -4.8e-06
 $ iter      : int 4
 $ init.it   : int NA
 $ estim.prec: num 6.1e-05
```
The pI of Glu is therefore given by `root` (3.25), obtained in 4 iterations with an estimated precision of \(6.1 \times 10^{-5}\); the value of the function at that point (`f.root`) is \(-4.8 \times 10^{-6}\), essentially zero. A value obtained by numerical calculation in this way is called a numerical solution. Alternatively, the pI of Glu can be obtained by a simpler procedure, usually found in textbooks on the subject, which takes the form below:
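\[ pI = \frac{pK_{a1}+pK_{a2}}{2} \tag{4}\]
where \(pK_{a1}\) and \(pK_{a2}\) are the pKa values of the two groups that ionize immediately below and above the electrically neutral form. For Glu these are the α-carboxyl (2.2) and the side-chain carboxyl (4.3), giving pI = (2.2 + 4.3)/2 = 3.25, in agreement with the numerical solution.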
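The agreement between the numerical solution and equation (4) can be checked directly. The sketch below redefines `qNet` for Glu so it is self-contained and computes both values; `pKa[1]` and `pKa[3]` are the α-carboxyl and side-chain carboxyl pKa values.

```r
# Glu net charge, equation (3)
qNet <- function(pH, qB, pKa) {
  x <- 0
  for (i in seq_along(qB)) x <- x + qB[i] + 1 / (1 + 10^(pH - pKa[i]))
  x
}
qB <- c(-1, 0, -1)
pKa <- c(2.2, 9.7, 4.3)

# Numerical solution: root of qNet between pH 2 and 5
pI_numeric <- uniroot(function(pH) qNet(pH, qB, pKa), c(2, 5))$root
# Textbook shortcut, equation (4): mean of the two carboxyl pKa values
pI_formula <- (pKa[1] + pKa[3]) / 2
print(c(numeric = pI_numeric, formula = pI_formula))
```

The two values differ only in the fourth decimal place or beyond, because the α-amino group (pKa 9.7) is almost fully protonated near pH 3.25 and barely affects the root.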
Isoionic point & biopolymers
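For a protein, the same reasoning applies, but each type of ionizable group i (α-carboxyl, α-amino, and each ionizable side chain) may occur \(n_i\) times. A natural extension of equation (3), and the form evaluated by the lysozyme code in this section, weights each term by \(n_i\) and sums over the k group types:

\[ q_{net} = \sum_{i=1}^{k} n_i\left(q_{b,i}+\frac{1}{1+10^{pH-pK_i}}\right) \]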
```r
# Lysozyme Titration and pI Determination
# Define function for qNet (each group weighted by its number of occurrences, n)
qNet <- function(pH, qB, pKa, n) {
  x <- 0
  for (i in seq_along(qB)) {
    x <- x + n[i] * qB[i] + n[i] / (1 + 10^(pH - pKa[i]))
  }
  return(x)
}
# Define pKas of aCOOH, aNH3 and the 7 side chains of AA
pKa <- c(2.2, 9.6, 3.9, 4.1, 6.0, 8.5, 10.1, 10.8, 12.5)
# Define qB, the charges of each amino acid in the base form
qB <- c(-1, 0, -1, -1, 0, -1, -1, 0, 0)
ionizable <- c(
  "aCOOH", "aNH3", "Asp", "Glu", "His", "Cys", "Tyr",
  "Lys", "Arg"
)
# Number of ionizable groups in lysozyme (each element is the amount
# of aCOOH, aNH3, and of each ionizable AA in the enzyme)
n <- c(1, 1, 7, 3, 1, 8, 6, 5, 14)
# Calculation of pI
f <- function(pH) {
  qNet(pH, qB, pKa, n)
}
str(uniroot(f, c(1, 13))) # search for the pI between pH 1 and 13
```
```
List of 5
 $ root      : num 9.46
 $ f.root    : num 3.3e-07
 $ iter      : int 7
 $ init.it   : int NA
 $ estim.prec: num 6.1e-05
```
```r
# Titration graph
curve(qNet(x, qB, pKa, n), 1, 12, xlab = "pH", ylab = "qNet")
abline(0, 0, lty = 3) # reference line at qNet = 0
```
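Because R arithmetic is vectorized, the weighted sum can also be written without an explicit loop. The sketch below uses a hypothetical `qNetVec` helper (not part of the original code) and reuses the `ionizable` labels to show how much each group type contributes to the net charge at pH 7; wrapping the sum in `sapply` over `pH` keeps the function usable with `curve()`.

```r
# Hypothetical loop-free variant of qNet: same weighted sum over group types
qNetVec <- function(pH, qB, pKa, n) {
  sapply(pH, function(p) sum(n * (qB + 1 / (1 + 10^(p - pKa)))))
}
pKa <- c(2.2, 9.6, 3.9, 4.1, 6.0, 8.5, 10.1, 10.8, 12.5)
qB  <- c(-1, 0, -1, -1, 0, -1, -1, 0, 0)
n   <- c(1, 1, 7, 3, 1, 8, 6, 5, 14)
ionizable <- c("aCOOH", "aNH3", "Asp", "Glu", "His", "Cys", "Tyr", "Lys", "Arg")

# Contribution of each group type to the net charge at pH 7
contrib <- setNames(n * (qB + 1 / (1 + 10^(7 - pKa))), ionizable)
print(round(contrib, 2))
print(sum(contrib)) # net charge of lysozyme at pH 7

# The loop-free form gives the same pI with uniroot
pI_vec <- uniroot(function(pH) qNetVec(pH, qB, pKa, n), c(1, 13))$root
print(pI_vec)
```

At pH 7 the sum is roughly +8.8, dominated by the 14 Arg and 5 Lys residues, which is consistent with a pI above 9.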
Isoionic point & R libraries
R offers packages for a wide range of specialized tasks, and the determination of biopolymer properties, such as pI, is no exception. Among the existing libraries for physicochemical properties of proteins and nucleic acids, the `seqinr` package (Biological Sequences Retrieval and Analysis) [1], designed for exploratory analysis and visualization of biopolymers, is one example. To use this package, however, the primary sequence of the protein must be available in one-letter code. The primary sequence of lysozyme can be obtained from the website of the National Center for Biotechnology Information, NCBI [2]. A quick way to do this is to:

1. type the name of the protein;
2. select it from the resulting options;
3. click on FASTA to obtain the one-letter primary sequence;
4. copy the presented protein sequence into R for use with `seqinr`.
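The copy step above can also be done without any package. As a minimal base-R sketch, the hypothetical helper `fasta_to_chars` below drops FASTA header lines (those starting with `>`), joins the remaining lines and splits them into single letters, producing the same kind of character vector as `seqinr::s2c`. The header text is illustrative; the sequence is the lysozyme sequence used in this section.

```r
# Hypothetical FASTA record as pasted from NCBI (header line is illustrative)
fasta_text <- c(
  ">CAA32175 lysozyme [Homo sapiens]",
  "KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDR",
  "STDYGIFQINSRYWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVV",
  "RDPQGIRAWVAWRNRCQNRDVRQYVQGCGV"
)
# Keep only sequence lines (drop headers starting with ">"), join and split
fasta_to_chars <- function(lines) {
  seq_lines <- lines[!startsWith(lines, ">")]
  strsplit(paste(trimws(seq_lines), collapse = ""), "")[[1]]
}
lysozyme <- fasta_to_chars(fasta_text)
print(length(lysozyme))
print(head(lysozyme))
```

For sequences saved to a file, `seqinr::read.fasta(file, seqtype = "AA")` reads FASTA records directly.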
Assuming that the `seqinr` library is installed and that the lysozyme sequence has been obtained (search for CAA32175 or lysozyme [Homo sapiens]), its pI can be found using the following code:

```r
library(seqinr)
# convert string sequence to character vector
lysozyme <- s2c(paste0(
  "KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDR",
  "STDYGIFQINSRYWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVV",
  "RDPQGIRAWVAWRNRCQNRDVRQYVQGCGV"
))
computePI(lysozyme)
```
```
[1] 9.2778
```
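The composition vector `n` used in the lysozyme titration can be cross-checked against the sequence itself. This base-R sketch counts each ionizable one-letter code; the names `codes`, `counts` and `n_from_seq` are illustrative, and the order follows the `ionizable` vector of the lysozyme code.

```r
# Lysozyme sequence used in this section, as a single string
seq_string <- paste0(
  "KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDR",
  "STDYGIFQINSRYWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVV",
  "RDPQGIRAWVAWRNRCQNRDVRQYVQGCGV"
)
residues <- strsplit(seq_string, "")[[1]]
# One-letter codes of ionizable side chains, in the order used for n
codes <- c(Asp = "D", Glu = "E", His = "H", Cys = "C", Tyr = "Y", Lys = "K", Arg = "R")
counts <- sapply(codes, function(aa) sum(residues == aa))
# Prepend the single alpha-carboxyl and alpha-amino termini
n_from_seq <- c(aCOOH = 1, aNH3 = 1, counts)
print(n_from_seq)
```

With this sequence the count gives 8 Asp residues, one more than the 7 listed in `n` for the lysozyme titration; the counts for Glu, His, Cys, Tyr, Lys and Arg agree.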
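The pI returned by `seqinr` (9.28) differs from the value calculated earlier (9.46), partly because pKa values of ionizable groups vary with the source consulted. As an example of this variation, `seqinr` itself provides different pKa values, depending on the database. To verify this, run the commands below and view the resulting `pK` variable.

```r
library(seqinr)
data(pK)
```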
The table below lists the pKa values available in the `seqinr` package.

```r
library(knitr)
knitr::kable(pK, "pipe",
  caption = "Table of pKa values for amino acids from various sources, extracted from the seqinr package."
)
```
|   | Bjellqvist | EMBOSS | Murray | Sillero | Solomon | Stryer |
|---|---|---|---|---|---|---|
| C | 9.00 | 8.5 | 8.33 | 9.0 | 8.3 | 8.5 |
| D | 4.05 | 3.9 | 3.68 | 4.0 | 3.9 | 4.4 |
| E | 4.45 | 4.1 | 4.25 | 4.5 | 4.3 | 4.4 |
| H | 5.98 | 6.5 | 6.00 | 6.4 | 6.0 | 6.5 |
| K | 10.00 | 10.8 | 11.50 | 10.4 | 10.5 | 10.0 |
| R | 12.00 | 12.5 | 11.50 | 12.0 | 12.5 | 12.0 |
| Y | 10.00 | 10.1 | 10.07 | 10.0 | 10.1 | 10.0 |
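Since the pI depends on the pKa set, the weighted-sum approach used for lysozyme can be repeated with two columns of the table above. In this sketch the side-chain values are copied from the EMBOSS and Bjellqvist columns; the terminal pKa values (2.2 and 9.6) and the composition are assumptions taken from the lysozyme code earlier in the section, since the table lists only side chains. `pI_for` is a hypothetical helper.

```r
# Side-chain pKa values for two sources, copied from the table above
pK_sources <- list(
  EMBOSS     = c(D = 3.90, E = 4.10, H = 6.50, C = 8.50, Y = 10.10, K = 10.80, R = 12.50),
  Bjellqvist = c(D = 4.05, E = 4.45, H = 5.98, C = 9.00, Y = 10.00, K = 10.00, R = 12.00)
)
# Composition and base-form charges from the lysozyme example (n, qB)
n_side  <- c(D = 7, E = 3, H = 1, C = 8, Y = 6, K = 5, R = 14)
qB_side <- c(D = -1, E = -1, H = 0, C = -1, Y = -1, K = 0, R = 0)

# Assumption: terminal pKa values 2.2 (alpha-COOH) and 9.6 (alpha-NH3)
pI_for <- function(pK_side) {
  pKa <- c(2.2, 9.6, pK_side[names(n_side)])
  qB  <- c(-1, 0, qB_side)
  n   <- c(1, 1, n_side)
  f <- function(pH) sum(n * (qB + 1 / (1 + 10^(pH - pKa))))
  uniroot(f, c(1, 13))$root
}
res <- sapply(pK_sources, pI_for)
print(res)
```

The two sets give slightly different pI values for the same composition, which illustrates why pI values reported by different programs or databases should be compared with care.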