# Definition of matrices
<- matrix(c(1, -308, 1, -318), ncol = 2, byrow = TRUE) # matrix A;
A # the negative sign is due to the function having a negative slope
<- matrix(c(4.4, -5.2), ncol = 1) # matrix b b
Biothermodynamics
Bioenergetics
- Stability of biopolymers;
- Ligand-receptor interaction;
- Transport of biomolecules and ions;
- Conformational changes in biomacromolecules;
- Association of biopolymers;
- Electron transfer in proteins;
- Combustion and synthesis of biomolecules;
- Generation of metabolic energy.
\[ \Delta G = \Delta H - T * \Delta S \tag{1}\]
\[ N \rightleftarrows D \tag{2}\]
\[ K_{eq} = \frac{[D]}{[N]} \tag{3}\]
\[ \Delta G = - R*T*ln(K_{eq}) \tag{4}\]
Where \(K_{eq}\), [D], and [N] represent, respectively, the equilibrium constant for protein denaturation, as well as its concentrations in the denatured and native forms.
At \(35^{o}\)C: \(\Delta\)G\(_{desn}\) = \(\Delta\)H\(_{desn}\) - 308 * \(\Delta\)S\(_{desn}\) = +4.4 kJ \(mol^{-1}\)
At \(45^{o}\)C: \(\Delta\)G\(_{desn}\) = \(\Delta\)H\(_{desn}\) - 318 * \(\Delta\)S\(_{desn}\) = -5.2 kJ \(mol^{-1}\)
Another solution, more practical and computational, involves solving the system of linear equations, as follows:
\[ a_{11}*x_1 + a_{12} * x_2 = b_1 \\ a_{21}*x_1 + a_{22} * x_2 = b_2 \tag{5}\]
Where \(x_1\) and \(x_2\) represent, respectively, \(\Delta\)H\(_{desn}\) and \(\Delta\)S\(_{desn}\).
\[ a_{11}*x_1 + a_{12} * x_2 = b_1 \\ a_{21}*x_1 + a_{22} * x_2 = b_2 \tag{6}\]
\[ A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} , \]
\[ x = \begin{bmatrix} b_1\\ b_2 \end{bmatrix} , \]
\[ b = \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \]
\[ A * x = b \tag{7}\]
\[ x = A^{-1} * b \tag{8}\]
Solution of a system of linear equations in R
R
, we first define the matrix for A and the matrix for b such that:\[ A = \begin{bmatrix} 1 & -308\\ 1 & -318 \end{bmatrix} , \]
\[ b = \begin{bmatrix} +4.4\\ -5.2 \end{bmatrix} \]
solve
command:# Matrix solution for linear system
solve(A) %*% b # # the %*% operation indicates the scalar product of two
[,1]
[1,] 300.08
[2,] 0.96
# vectors ("dot product")
\[ \begin{pmatrix} 1 \\ 2 \\ 3 \\ \end{pmatrix} * \begin{pmatrix} 1 & 2 & 3 \\ \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \\ \end{pmatrix} \tag{9}\]
Thus, as seen in the Enzymes chapter, we could solve for the parameters \(\Delta\)H\(_{desn}\) and \(\Delta\)SH\(_{desn}\) by linear fit. In fact, one of the Van’t Hoff expressions that define this linear relationship is:
\[ ln \, K_{eq} = - \frac{\Delta H}{R} * \frac{1}{T} + \frac{\Delta S}{R} \tag{10}\]
In fact, the linear adjustment matrix solution can be obtained from the relation below:
\[ \beta = (X^T \; X)^{-1} \; X^T*y \tag{11}\]
lm
function are extracted by other functions of the matrix/statistical calculation algorithm. In Equation 11 the term in parentheses involves the inversion operation of the matrix. In linear algebra there is no division operation for matrices, but only the multiplication of a matrix by a scalar or by the inverse of another. And even then, only if it is a square matrix. Thus, the term (X\(^{T}\) X)\(^{-1}\) can only be calculated with matrix inversion. In R
this action is performed by the solve
command.As before, it is also vitally important that the matrix X containing the independent variable is created with unit values on the left, as follows:
\[ X = \begin{bmatrix} 1 & x_{1}\\ 1 & x_{2}\\ 1 & x_{3}\\ ... & ... \end{bmatrix} \]
# Matrix solution for the Lineweaver-Burk kinetic parameters
# Repeating the data for the Lineweaver-Burk variables
<- seq(0.1, 1, length.out = 20) # generates a sequence with 20 points
S # between 0 and 1 for substrate values
<- 10
Vm <- 0.5 # kinetic parameters
Km set.seed(1500) # establishes the same random seed as the direct graph
# of Michaelis-Menten, for reproducibility of the points
<- runif(20, 0, 1) # command for uniform error (no. of points, min, max)
error <- Vm * S / (Km + S) + error # Michaelis-Menten equation
v <- 1 / S # create variables for the double reciprocal
inv.S <- 1 / v
inv.v
# Creation of matrices A and b
<- matrix(c(rep(1, 20), inv.S), nrow = 20, byrow = FALSE) # create matrix
A2 # with unit value required before the independent variable
<- as.matrix(inv.v, nrow = 1, byrow = FALSE) # vector b
b2
# Matrix solution of the linear fit
<- solve(t(A2) %*% A2) %*% t(A2) %*% b2
beta beta
[,1]
[1,] 0.11363419
[2,] 0.03277244
lm
function of R
.Matrices and R
Therefore, it is interesting to have a quick overview of the matrix potential that
R
has.# Some manipulations with matrices
## Identifying rows and columns
<- matrix(c(-308, -318),
res nrow = 2, byrow = TRUE, # matrix definition
dimnames = list(c("Delta H", "Delta S"), "kJ/mol")
) res
kJ/mol
Delta H -308
Delta S -318
## Arithmetic operations
<- matrix(c(1, 2, 3, 4), nrow = 2, byrow = T)
m1 <- matrix(c(4, 5, 6, 7), nrow = 2, byrow = T)
m2 + 5 m1
[,1] [,2]
[1,] 6 7
[2,] 8 9
- 7 # addition or subtraction in scalar m2
[,1] [,2]
[1,] -3 -2
[2,] -1 0
^2 m1
[,1] [,2]
[1,] 1 4
[2,] 9 16
sin(m2) # power or trigonometry
[,1] [,2]
[1,] -0.7568025 -0.9589243
[2,] -0.2794155 0.6569866
+ m2 # addition of elements in matrices of equal dimension m1
[,1] [,2]
[1,] 5 7
[2,] 9 11
* m2 # multiplication of elements in matrices of equal dimension m1
[,1] [,2]
[1,] 4 10
[2,] 18 28
%*% m2 # dot product of vectors m1
[,1] [,2]
[1,] 16 19
[2,] 36 43
det(m1) # determinant of a matrix
[1] -2
t(m2) # transposition of a matrix
[,1] [,2]
[1,] 4 6
[2,] 5 7
diag(m1) # diagonal matrix
[1] 1 4
solve(m2) # inverse of a matrix
[,1] [,2]
[1,] -3.5 2.5
[2,] 3.0 -2.0
eigen(m1) # eigenvalue and eigenvector of a matrix
eigen() decomposition
$values
[1] 5.3722813 -0.3722813
$vectors
[,1] [,2]
[1,] -0.4159736 -0.8245648
[2,] -0.9093767 0.5657675
R
also supports several other operations used in numerical and symbolic calculations that use matrices, such as the functions kronecker
(matrix multiplication), svd
(Single Value Decomposition), qr
(QR Decomposition), and chol
(Cholesky Decomposition).Solving thermodynamic parameters of enzyme stability
\[ \begin{cases}\Delta G^\ddagger = \Delta H^\ddagger-T* \Delta S^\ddagger \\ ln(\frac{kcat*h}{K_B*T})=-\frac{1}{RT}*\Delta H^\ddagger + \frac{1}{R}*\Delta S^\ddagger \end{cases} \tag{12}\]
Where the terms with \(\ddagger\) symbolize the variations of quantities related to the activation (or deactivation) of the enzyme (transition state of the activated complex), \(K_{B}\) represents the Boltzmann constant (1.381x10\(^{-23}\) JK\(^{-1}\)), h the Planck constant (6.686x10\(^{-34}\) J*s), and R the general gas constant (8.314 JK\(^{-1}\) mol\(^{-1}\)). It is not always possible to converge a matrix solution by simply using cross multiplication (dot product).
\[ \Delta G^{\ddagger}=65920\, J\,mol^{-1} \\ T = 328 K \\ kcat = 217 s^{-1} \tag{13}\]
\[ A = \begin{bmatrix} 1 & -328\\ -3.67e-4 & 0.120\\ \end{bmatrix} , \]
\[ b = \begin{bmatrix} 65920\\ -24.17\\ \end{bmatrix} \]
# Attempt at a simple matrix solution in published data
# (Appl. Microbiol. Biotechnol, 73, 1290, 2007):
<- matrix(c(1, -3.67e-4, -328, 0.120), nrow = 2, byrow = TRUE)
A <- matrix(c(65921, -24.17), nrow = 2)
b solve(A) %*% b
[,1]
[1,] -21038593
[2,] -57505488910
solve
command also suffers from solving the problem, incurring an error if executed, as in the following excerpt. Also note the distinct possibility for constructing the matrices.<- data.frame(c(1, -3.67e-4), c(-328, 0.120))
Aframe <- as.matrix(Aframe)
A <- as.matrix(c(65921, -24.17))
b solve(t(A) %*% A) %*% t(A) %*% b
R
packages, such as rootSolve
already used or nleqslv
used previously. In this sense, the root search minimization for the system of equations can be demonstrated as follows:# Minimization for roots of a system of thermodynamic equations
library(rootSolve)
<- 328
T <- 8.314
R <- 6.626e-34
h <- 1.381e-23
kb <- 217
kcat <- -R * T * log((kcat * h) / (kb * T)) # 65920 J/mol
DG <- function(x) c(x[1] - T * x[2] - DG, x[2] / R - x[1] / (R * T) -
model log((kcat * h) / (kb * T)))
<- multiroot(model, start = c(50000, 50000))) (ss
$root
[1] 40843.50837 -76.45441
$f.root
[1] 2.110028e-09 -7.744916e-13
$iter
[1] 3
$estim.precis
[1] 1.055401e-09
Comparing the values, the authors found \(\Delta\) H\(^{\ddagger}\) = 33.3 kJmol\(^{-1}\) and \(\Delta\) S\(^{\ddagger}\) = -99.8 Jmol\(^{-1}\)K\(^{-1}\). Note the similarity of the results obtained by minimizing roots with the parameters found by the authors. The value of \(\Delta\) H\(^{\ddagger}\) for these, however, was obtained only from obtaining the experimental value of Arrhenius activation energy (Ea), by the slope of a linearized graph of the reaction rate, as follows:
\[ k = A *e^{-Ea/RT} \\ ln(k) = \frac{\Delta S^{\ddagger}}{R} - \frac{\Delta H^{\ddagger}}{R} * \frac{1}{T} \tag{14}\]
\[ k = f(kcat) = e^{-\Delta G^{\ddagger}/RT} \tag{15}\]
require(rootSolve)
<- 328
T <- 8.314
R <- 6.626e-34
h <- 1.381e-23
kb <- 217
kcat <- 66000
DG <- function(x) c(x[1] - T * x[2] - DG, x[2] / R - x[1] / (R * T) -
model log((kcat * h) / (kb * T)))
<- multiroot(model, start = c(50000, 50000))) (ss
$root
[1] 51579492242 157254348
$f.root
[1] 177.50881195 -0.09422566
$iter
[1] 3
$estim.precis
[1] 88.80152
Enthalpy of Reaction by Matrices
\[ 0 = \sum_{i=1}^{N} \nu_i B_i \tag{16}\]
\[ 2 C_2H_2(g)+ 5 O_2(g) \rightleftarrows 4 CO_2(g) + 2 H_2O(l), \\ \Delta_fH^o = -2600 \, kJ/mol\\ 2 C_2H6(g) + 7 O_2(g) \rightleftarrows 4 CO_2(g) + 6 H_2O(l), \\ \; \Delta_fH^o = -3210 \, kJ/mol \\ H_2(g) + \frac{1}{2} O_2(g) \rightleftarrows H_2O(l), \\ ; \Delta_fH^o = -286 \, kJ/mol\\ C_2H_2(g) + 2H_2(g) \rightleftarrows C_2H_6(g), \\ \; \Delta_fH^o = ? \tag{17}\]
Mathematically, Hess’s Law can be expressed as:
\[ \Delta_fH^o = \sum_{n=1}^{\infty} \nu \Delta_fH^o_P - \sum_{n=1}^{\infty} \nu \Delta_fH^o_R \tag{18}\]
Where \(\nu\) represents the stoichiometry of the reaction, that is, the number of moles of each compound/element, while P and R refer to the Product and Reactant.
To do this, it is necessary to 1) compose the matrices A and b, 2) calculate the vector of coefficients beta, and 3) perform the scalar product (%*%) of beta with a matrix formed by the formation enthalpy values. The rationale for composing the matrices involves listing each compound with its reaction stoichiometry, and requires that a negative value be given to reactants, while a positive value is given to products.
The table below illustrates this construction for the problem in question.
library(knitr) # to generate the table
<- c("C2H2", "O2", "CO2", "H2O", "C2H6", "H2") # list of compounds
comp # involved
<- c(-2, -5, +4, +2, 0, 0) # stoichiometry (reaction1)
rea1 <- c(0, -7, +4, +6, -2, 0) # stoichiometry (reaction2)
rea2 <- c(0, -0.5, 0, +1, 0, -1) # stoichiometry (reaction3)
rea3 <- c(-1, 0, 0, 0, +1, -2) # stoichiometry of the reaction with enthalpy
incog # unknown
<- data.frame(comp, rea1, rea2, rea3, incog) # dataframe with the
tab_esteq # results
colnames(tab_esteq) <- c("compound", "reaction 1", "reaction 2", "reaction 3",
"ethane") # name the columns
::kable(tab_esteq, caption = "Reaction stoichiometry for a matrix solution of ethane formation (C2H6).", "pipe") # table knitr
compound | reaction 1 | reaction 2 | reaction 3 | ethane |
---|---|---|---|---|
C2H2 | -2 | 0 | 0.0 | -1 |
O2 | -5 | -7 | -0.5 | 0 |
CO2 | 4 | 4 | 0.0 | 0 |
H2O | 2 | 6 | 1.0 | 0 |
C2H6 | 0 | -2 | 0.0 | 1 |
H2 | 0 | 0 | -1.0 | -2 |
# Matrix solution for enthalpy of formation
<- matrix(c(-2, 0, 0, -5, -7, -0.5, 4, 4, 0, 2, 6, 1, 0, -2, 0, 0, 0, -1),
A nrow = 6, byrow = T) # create matrix of reactions with known
# enthalpy variation
<- matrix(c(-1, 0, 0, 0, 1, -2), nrow = 6, byrow = T) # create matrix of
b # stoichiometric coefficients of reaction with unknown
# enthalpy variation
<- solve(t(A) %*% A) %*% t(A) %*% b # beta calculation
beta <- matrix(c(-2600, -3210, -286), nrow = 1, byrow = T) # create matrix
energ # with enthalpy values
<- energ %*% beta
ethane cat("Value for deltaHr ethane: ", ethane, " kJ/mol")
Value for deltaHr ethane: -267 kJ/mol
Thermodynamic Quantities by Polynomial Fit
However, we could not use linear matrix relationships or linear fits to solve quantitative parameters in situations that do not rely on linear behavior between variables, as reported by Equation 10, for example.
# Dependence of T on deltaG for insulin and receptor
<- c(5.29, 10.07, 15.23, 20.21, 25.11, 30.29, 37.39) + 273
T # temperature data, in degrees Kelvin
<- c(11.74, 12.17, 12.46, 12.73, 12.88, 12.98, 13.13) * -1e3
dG # -deltaG data, in kcal/mol
plot(T, dG,
xlab = "T, K", ylab = expression(paste(Delta, "G, kcal/mol"))
)

# Polynomial fit of thermodynamic parameters
<- lm(dG ~ poly(T, 3, raw = TRUE)) # fit to 3rd degree polynomial; # "raw=TRUE" is essential
pol3 # Alternatively, one can also fit polynomials as
# pol3<-lm(dG ~ T + I(T^2)+I(T^3))
summary(pol3)
Call:
lm(formula = dG ~ poly(T, 3, raw = TRUE))
Residuals:
1 2 3 4 5 6 7
2.350 -13.432 26.731 -16.407 -8.577 12.164 -2.829
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.970e+05 2.449e+05 3.663 0.0352 *
poly(T, 3, raw = TRUE)1 -8.836e+03 2.498e+03 -3.537 0.0385 *
poly(T, 3, raw = TRUE)2 2.866e+01 8.492e+00 3.375 0.0433 *
poly(T, 3, raw = TRUE)3 -3.105e-02 9.613e-03 -3.230 0.0482 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.6 on 3 degrees of freedom
Multiple R-squared: 0.999, Adjusted R-squared: 0.9981
F-statistic: 1045 on 3 and 3 DF, p-value: 5.02e-05
plot(T, dG,
xlab = "T, K", ylab = expression(paste(Delta, "G, kcal/mol"))
# graph of T x deltaG
)lines(T, fitted(pol3), col = "red") # curve fitted to the data

\[ A = \begin{bmatrix} 1 & 278.29\\ 1 & 283.07\\ 1 & 288.23\\ 1 & 293.21\\ 1 & 298.11\\ 1 & 303.29\\ 1 & 310.39\\ \end{bmatrix} , \] \[ b = \begin{bmatrix} -11740\\ -12170\\ -12460\\ -12730\\ -12880\\ -12980\\ -13130\\ \end{bmatrix} \]
\[ matrix \,V = \begin{bmatrix} 1 & x_1 & x_1^2 & x_1^3 & ... \\ 1 & x_2 & x_2^2 & x_2^3 & ... \\ 1 & x_3 & x_3^2 & x_3^3 & ...\\ 1 & ...& ... & ... & ...\\ \end{bmatrix} , \]
R
has a package to automate this transformation, matrixcalc
, exemplified in the code snippet below:Now just apply the same matrix relation of Equation 8, in this case, for four interleaved points of the experimental data above, and therefore producing a 4th degree polynomial:
# Polynomial adjustment 4th degree for thermodynamic parameters
<- c(5.29, 15.23, 25.11, 37.39) + 273 # temperature data,
T # in degrees Kelvin
<- c(11.74, 12.46, 12.88, 13.13) * -1e3 #
dG
library(matrixcalc)
# Creating matrices A (Vandermonde) and b
<- as.matrix(dG, nrow = 4, byrow = TRUE) # vector b
b <- vandermonde.matrix(alpha = T, n = 4)
A # function to create alternation matrix (Vandermonde) A
[,1] [,2] [,3] [,4]
[1,] 1 278.29 77445.32 21552259
[2,] 1 288.23 83076.53 23945149
[3,] 1 298.11 88869.57 26492908
[4,] 1 310.39 96341.95 29903579
<- solve(A) %*% b
sol.vnd # polynomial coefficients (4th degree) sol.vnd
[,1]
[1,] 5.095658e+05
[2,] -4.886799e+03
[3,] 1.525183e+01
[4,] -1.589352e-02
polynom
package:library(polynom) # converts vector of coefficients into symbolic polynomial
<- as.polynomial(sol.vnd)
p <- as.function(p) # allows converting the polynomial to the curve function
p2 plot(T, dG)
curve(p2, from = 273, to = 315, add = TRUE) # smooth polynomial curve
R
eigenfunction (lm
) or by the matrix solution above, reveals good adherence of the model to the experimental data, as represented by Figure 2 and the results table above, there is no correlation of thermodynamic parameters, since it is an empirical mathematical model, and not an analytical one for the system.However, it is possible to obtain a good approximation of the quantities \(\Delta\)H (enthalpy), \(\Delta\)S (entropy) and \(\Delta\)Cp (heat capacity) that phenomenologically model the thermodynamic behavior at a given temperature, by specific relations between these quantities (Edelhoch and Osborne Jr 1976).
\[ \Delta S = -(\frac{\partial \Delta G}{\partial T})_p \tag{19}\]
\[ \Delta G = a+bT+cT^2+dT^3 \tag{20}\]
Thus, the value of \(\Delta\)S can be obtained by the first derivative of the empirical relation above (Equation 20):
\[ \Delta S = -(\frac{\partial \Delta G}{\partial T})_p = -b-2cT-3dT^2 \tag{21}\]
The value of \(\Delta\)H, in turn, can now be extracted from the Equation 22 repeated here, together with the Equation 20:
\[ \Delta G = \Delta H - T * \Delta S \tag{22}\]
\[ \Delta H = \Delta G +T * \Delta S \tag{23}\]
\[ \Delta H = (a+bT+cT^2+dT^3) +T(-b-2cT-3dT^2) \tag{24}\]
\[ \Delta H = a-cT^2-2dT^3 \tag{25}\]
\[ \Delta Cp = -(\frac{\partial \Delta H}{\partial T})_p \tag{26}\]
That is, the value of \(\Delta\)Cp can be approximated by the first derivative of \(\Delta\)H (Equation 24) over T. That is:
\[ \Delta Cp = -2cT-6dT^2 \tag{27}\]
# Polynomial solution of thermodynamic parameters for interaction of
# insulin with receptor
<- c(5.29, 10.07, 15.23, 20.21, 25.11, 30.29, 37.39) +
T 273 # temperature data, in degrees Kelvin
<- c(11.74, 12.17, 12.46, 12.73, 12.88, 12.98, 13.13) *
dG -1e3 # -deltaG data, in kcal/mol
# Fit to 2nd degree polynomial
<- lm(dG ~ poly(T, 3, raw = TRUE)) # fit to 3rd degree polynomial degree
pol3
<- 298 # reference temperature, in degrees Kelvin
Tref
# Calculations
<- coef(pol3)[1] + coef(pol3)[2] * Tref + coef(pol3)[3] *
dG ^2 + coef(pol3)[4] * Tref^3 # deltaG
Tref<- -coef(pol3)[2] - 2 * coef(pol3)[3] * Tref - 3 * coef(pol3)[4] *
dS ^2 # deltaS
Tref<- coef(pol3)[1] - coef(pol3)[3] * Tref^2 - 2 * coef(pol3)[4] *
dH ^3 # deltaH
Tref<- -2 * coef(pol3)[3] * Tref - 6 * coef(pol3)[4] * Tref^2 # deltaCp
dCp
# Parameters in 298 K
cat("deltaG value: ", dG, "cal/mol", "\n")
deltaG value: -12868.43 cal/mol
cat("deltaS value: ", dS, "cal/mol/K", "\n")
deltaS value: 27.29257 cal/mol/K
cat("deltaH value: ", dH, "cal/mol", "\n")
deltaH value: -4735.248 cal/mol
cat("deltaCp value: ", dCp, "cal/mol/K", "\n")
deltaCp value: -537.5956 cal/mol/K
Thermodynamic Stability of Biopolymers
\[ f_D+f_N = 1 \tag{28}\]
Which results in:
\[ f_N = 1 - f_D \tag{29}\]
\[ S = f_N * S_N +f_D * S_D \tag{30}\]
\[ f_D = \frac{S_i-S_N}{S_D-S_N} \tag{31}\]
Where Si represents the signal at point i.
\[ K_D = \frac{[D]}{[N]}=\frac{f_D}{f_N} \tag{32}\]
\[ K_D = \frac{f_D}{1-f_D} \tag{33}\]
And, therefore,
\[ \Delta G_D = -RT*ln\;K_D \tag{34}\]
\[ \Delta G(T) = \Delta H_m(\frac{Tm-T}{Tm})-\Delta Cp[Tm-T(1-ln \; \frac{Tm}{T})]) \tag{35}\]
# Denaturation curve for protein
<- 85
Tm <- 180
dHm <- 3
dCp <- 0:80
x
curve(dHm * (1 - x / Tm) + dCp * ((x - Tm - x * log(x / Tm)))
xlim = c(0, 80)) # Nicholson1996; Sholz2009 ,
Quantitative Structure-Function Relationship (QSAR) and Multilinear Fitting
\[ pIC_{50}=x_0+x_1*S+x_2*W \] {#eq:eqTIBO}
Where pIC\(_{50}\) represents the measured biological activity (-log IC\(_{50}\)), S indexes solubility values, and W refers to the width parameter of the first atom of the substituent group. These data are tabulated below:
# Tabulation data for QSAR
<- c("H", "Cl", "SCH3", "OCH3", "CN", "CHO", "Br", "CH3", "CCH")
group # substituent groups in TIBO
<- c(3.53, 4.24, 4.09, 3.45, 2.96, 2.89, 4.39, 4.03, 3.8) # solubility parameter
S <- c(1, 1.8, 1.7, 1.35, 1.6, 1.6, 1.95, 1.6, 1.6)
W # group width parameter
<- c(7.36, 8.37, 8.3, 7.47, 7.25, 6.73, 8.52, 7.87, 7.53)
pIC50 # biological activity of TIBO
<- data.frame(group, S, W, pIC50)
tab.tibo ::kable(tab.tibo, caption = "multivariate data of biological activity of TIBO and predictive parameters.", "pipe") # table knitr
group | S | W | pIC50 |
---|---|---|---|
H | 3.53 | 1.00 | 7.36 |
Cl | 4.24 | 1.80 | 8.37 |
SCH3 | 4.09 | 1.70 | 8.30 |
OCH3 | 3.45 | 1.35 | 7.47 |
CN | 2.96 | 1.60 | 7.25 |
CHO | 2.89 | 1.60 | 6.73 |
Br | 4.39 | 1.95 | 8.52 |
CH3 | 4.03 | 1.60 | 7.87 |
CCH | 3.80 | 1.60 | 7.53 |
R
allows this to be done in at least two ways: internal linear fit function (lm
) or matrix algebra.Multiple linear fit by lm
function:
# Multilinear fitting in QSAR
<- lm(tab.tibo$pIC50 ~ tab.tibo$S + tab.tibo$W)
lm.tibo # command for multilinear fitting;
# Alternatively,
# lm.tibo <- lm(cbind(S,W)~pIC50)
summary(lm.tibo)
Call:
lm(formula = tab.tibo$pIC50 ~ tab.tibo$S + tab.tibo$W)
Residuals:
Min 1Q Median 3Q Max
-0.27636 -0.15649 0.02922 0.08911 0.24761
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.5903 0.5435 6.606 0.000579 ***
tab.tibo$S 0.9571 0.1519 6.300 0.000746 ***
tab.tibo$W 0.3619 0.3020 1.199 0.275888
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2045 on 6 degrees of freedom
Multiple R-squared: 0.912, Adjusted R-squared: 0.8826
F-statistic: 31.07 on 2 and 6 DF, p-value: 0.0006826
R
function (*dataframe $ vector). This is the simplest way, since it does not depend on extra packages (like dplyr
), although it is less readable.\[ y=5.75+0.14*S+0.95*W \tag{36}\]
Multiple linear adjustment by matrices:
\[ X = \begin{bmatrix} 1 & S_{1} & W_{1} \\ 1 & S_{2} & W_{2}\\ 1 & S_{3} & W_{3}\\ ... & ... \end{bmatrix} \]
# Creating matrices A and b
<- matrix(c(rep(1, 9), S, W), nrow = 9, byrow = FALSE)
A # creates matrix with unit value required before the independent variable
<- as.matrix(pIC50, nrow = 1, byrow = FALSE) # vector b
b
# Matrix solution of linear adjustment
<- solve(t(A) %*% A) %*% t(A) %*% b
beta beta
[,1]
[1,] 3.5902556
[2,] 0.9571092
[3,] 0.3619292
This multilinear matrix procedure can also be applied to other types of multivariate analysis, such as factorial experiment and response surface methodology. This is due to the very nature of these systems, when linear. See the applications below. Even for quadratic response surface methodology (where the parameters vary with the square of the predictor variables), the matrix solution (Equation 11) is also possible.
\[ y = b_0+b_1*x, \, linear fit\ \\ y = b_0+b_1*x_1+b_2*x_2+...+b_n*x_n, \, fit \, multilinear \\ y = b_0+b_1*x_1+b_2*x_2+...+b_n*x_n, \, methodology \, of \, surface \, of \, response \, linear \\ y = b_0+b_1*x_1+b_2*x_2+b_{12}*x_1*x_2, \, planning \, factorial \, 2^2 \\ y = b_0+b_1*x_1+b_2*x_2+b_3*x_3+b_{12}*x_1*x_2+,b_{13}*x_1*x_3+b_{23}*x_2*x_3+b_{123}*x_1*x_2*x3 \, experiment \, factorial \, 2^3 \\ y = b_0+b_1*x_1+b_2*x_2+b_{11}*x_1^2+b_{22}*x_2^2+b_{12}*x_1*x_2, \, quadratic \, response \, surface \, methodology \\ \tag{37}\]
A word about matrices and applications
\[ y = \sum_{n=1}^{\infty} (x_1*x_2) \tag{38}\]