Interactive Simulations for Biophysical Chemistry with JSPlotly & GSPlotly

To illustrate the potential use of JSPlotly for biochemistry and related fields, here are some examples of simulations and data analyses whose graphs are often found in textbooks and related content. To get the most out of each topic, try following its suggestions for parametric manipulation.


Instructions

1. Choose a topic;
2. Click on the corresponding graph;
3. Click on "Add Plot";
4. Use the mouse for interactivity and/or edit the code.

Reminder: the editor supports unlimited undo/redo in the code (Ctrl+Z / Shift+Ctrl+Z)!

1 Acid-base equilibrium and buffer systems

Context:

The example illustrates the addition of a base to a system containing a weak acid and its conjugate base. The titration equation refers to a triprotic acid, the phosphoric acid of the phosphate buffer, but it would also work for many other polyprotic acids, such as those present in the Krebs cycle (citrate, isocitrate).

Equation:

\[ fa = \frac{1}{1 + 10^{pK_{a1} - pH}} + \frac{1}{1 + 10^{pK_{a2} - pH}} + \frac{1}{1 + 10^{pK_{a3} - pH}} \]

Where:

  • fa = acid fraction (protonatable groups).


Suggestion:

"A. Converting the phosphate buffer (triprotic) curve to bicarbonate buffer (diprotic)"

1. Change the pKa values for the bicarbonate buffer: pKa1 = 6.1, and pKa2 = 10.3;
2. Set a very large value for pKa3 (e.g., 1e20). 
3. Click "add plot."

Explanation: pKa represents the negative base-10 logarithm of the acid dissociation constant (−log Ka). With such an extreme value, 10^(pKa3 − pH) becomes enormous, so the denominator grows without bound and the term that carries pKa3 effectively vanishes. In JavaScript and other languages, "e" denotes power-of-ten notation (1e20 = 10\(^{20}\)).
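As a minimal standalone sketch of this idea (not the editor's built-in code), assuming the usual phosphate pKa values of 2.15, 7.20, and 12.35, the curve could be generated as follows; setting pKa3 = 1e20 reproduces suggestion A:

```javascript
// Minimal sketch (not the editor's built-in code): titratable fraction of a
// triprotic buffer versus pH, using the usual phosphate pKa values as assumptions.
const pKa1 = 2.15, pKa2 = 7.20, pKa3 = 12.35;  // set pKa3 = 1e20 to "remove" the third group

const pH = [], fa = [];
for (let p = 0; p <= 14; p += 0.1) {
  const f = 1 / (1 + 10 ** (pKa1 - p))
          + 1 / (1 + 10 ** (pKa2 - p))
          + 1 / (1 + 10 ** (pKa3 - p));        // this term tends to 0 when pKa3 is huge
  pH.push(p);
  fa.push(f);
}

// Plotly.js-style trace that could feed the graph
const trace = { x: pH, y: fa, mode: 'lines', name: 'titration curve' };
console.log('fraction near pH 7:', fa[70].toFixed(2));
```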

"B. Converting the bicarbonate buffer curve to acetate"

1. Simply repeat the above procedure with pKa1 = 4.75 and eliminate pKa2 in the same way (by assigning it a very large value, e.g., 1e20).


2 Net charge network in peptides

Context:

The code refers to the net charge network present in any sequence of amino acid residues. Here we illustrate angiotensin II, a peptide important for blood pressure regulation and electrolyte balance, whose converting enzyme is associated with the mechanism of cell invasion by SARS-CoV-2.

Equation:

There is no single equation for this simulation, since the algorithm must decide each charge based on the basic or acidic nature of the residue at a given pH value (see the script). Thus:

\[ q = \begin{cases} -\dfrac{1}{1 + 10^{pK_a - pH}} & \text{(acid group)} \\\\ \dfrac{1}{1 + 10^{pH - pK_a}} & \text{(basic group)} \end{cases} \]

Where:

  • pKa = negative base-10 logarithm of the acid dissociation equilibrium constant Ka (−log Ka).
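A minimal sketch of how such a per-residue decision can be coded (approximate textbook side-chain pKa values assumed; terminal groups omitted for brevity; this is not the editor's actual script):

```javascript
// Minimal sketch (approximate side-chain pKa values assumed; terminal groups and the
// editor's exact table are omitted): charge of each ionizable residue at a given pH.
const pKa = { Asp: 3.65, Glu: 4.25, His: 6.0, Cys: 8.3, Tyr: 10.5, Lys: 10.5, Arg: 12.5 };
const acidic = new Set(['Asp', 'Glu', 'Cys', 'Tyr']);   // deprotonate to a negative charge

function residueCharge(res, pH) {
  if (!(res in pKa)) return 0;                          // non-ionizable side chain
  return acidic.has(res)
    ? -1 / (1 + 10 ** (pKa[res] - pH))                  // acid group
    :  1 / (1 + 10 ** (pH - pKa[res]));                 // basic group
}

const peptide = ['Ala','Lys','Arg','Leu','Phe','Glu','Cys','Asp','His'];
const pH = 7.4;
const charges = peptide.map(r => residueCharge(r, pH));
const net = charges.reduce((s, q) => s + q, 0);
console.log(charges.map(q => q.toFixed(2)), 'net =', net.toFixed(2));
```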


Suggestion:

1. Select the peptide sequence below and observe the charge distribution:

"Ala,Lys,Arg,Leu,Phe,Glu,Cys,Asp,His"

2. Simulate the pH condition of the stomach ("const pH = 1.5") and check the change in charges in the peptide. 

3. Select a physiological peptide (e.g., oxytocin), observe its charge in blood (pH 7.5), and reflect on its potential for electrostatic interaction with cellular components.

"Cys,Tyr,Ile,Gln,Asn,Cys,Pro,Leu,Gly" - oxytocin


3 Interaction of oxygen with myoglobin and hemoglobin

Context:

Molecular oxygen can combine with the heme group of myoglobin and hemoglobin in different ways, depending on the cooperativity exhibited by the latter due to its quaternary structure. The following model illustrates this interaction using the Hill equation.

Equation:

\[ Y= \frac{pO_2^{nH}}{p_{50}^{nH} + pO_2^{nH}} \]

Where:

  • Y = degree of oxygen saturation in the protein;
  • pO\(_{2}\) = oxygen pressure;
  • p\(_{50}\) = oxygen pressure at 50% saturation;
  • nH = Hill coefficient for the interaction.
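A minimal sketch of the Hill curve with illustrative values (p50 = 26 mmHg is an assumption; the editor's code may use different defaults):

```javascript
// Minimal sketch of the Hill equation (illustrative parameter values).
const p50 = 26;   // mmHg, assumed half-saturation pressure
const nH = 1;     // no cooperativity; try 2.8 for hemoglobin

const pO2 = [], Y = [];
for (let p = 0; p <= 100; p += 1) {
  pO2.push(p);
  Y.push(p ** nH / (p50 ** nH + p ** nH));
}
const trace = { x: pO2, y: Y, mode: 'lines', name: `nH = ${nH}` };
console.log('saturation at p50:', Y[26].toFixed(2));   // 0.50 by definition
```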


Suggestion:

1. Run the application ("add plot"). Note that the Hill coefficient "nH" is set to "1," meaning there is no cooperativity effect.

2. Now replace the value of "nH" with the Hill coefficient for hemoglobin, 2.8, and run again!


4 Bohr effect on hemoglobin (pH)

Context:

Some physiological or pathological conditions can alter the binding affinity of oxygen to hemoglobin, such as temperature, certain metabolites (2,3-BPG), and the hydrogen ion content of the solution.

Equation:

\[ Y(pO_2) = \frac{{pO_2^n}}{{P_{50}^n + pO_2^n}}, \quad \text{with } P_{50} = P_{50,\text{ref}} + \alpha (pH_{\text{ref}} - pH) \]
Where,

  • Y = hemoglobin saturation,
  • pO\(_{2}\) = partial pressure of oxygen (in mmHg),
  • P\(_{50}\) = pressure of O\(_{2}\) at which hemoglobin is 50% saturated,
  • P\(_{50,ref}\) = 26 mmHg (standard value),
  • \(\alpha\) = 50 (Bohr effect intensity),
  • pH\(_{ref}\) = 7.4,
  • n = 2.8 = Hill coefficient for hemoglobin.
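A minimal sketch of the P50 shift with the parameters listed above (the pH value below is an assumption for illustration):

```javascript
// Minimal sketch of the Bohr shift: P50 recomputed from pH, then the Hill curve.
const P50ref = 26, pHref = 7.4, alpha = 50, n = 2.8;
const pH = 7.2;                              // assumed acidic condition (e.g., active muscle)
const P50 = P50ref + alpha * (pHref - pH);   // 26 + 50 * 0.2 = 36 mmHg

const Y = pO2 => pO2 ** n / (P50 ** n + pO2 ** n);
console.log('saturation at 40 mmHg:', Y(40).toFixed(2));   // lower than at pH 7.4
```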

5 Isoelectric Point in Proteins

Context:

The script for this simulation computes the distribution of net charge for a polyelectrolyte and identifies the pH value at which this net charge is zero, i.e., the isoelectric point (or isoionic point), pI. The example uses the primary sequence of lysozyme, a hydrolase that breaks down the microbial cell wall.

Equation:

\[ q_{\text{net}}(pH) = \sum_{i=1}^{N} \left[ n_i \cdot q_{B_i} + \frac{n_i}{1 + 10^{pH - pK_{a_i}}} \right] \]
Where,

  • q\(_{net}\) = total net charge;
  • q\(_{B_i}\) = charge of the deprotonated (basic) form of group i (e.g., 0 for Lys, −1 for Asp);
  • n\(_{i}\) = number of groups of residue i;
  • pK\(_{a_i}\) = acid dissociation pKa of group i.
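A minimal sketch of how a pI can be located by bisection on the net-charge function; the composition below is a hypothetical toy example, not the lysozyme sequence used in the editor:

```javascript
// Minimal sketch: locate the pI by bisection on a net-charge function.
// The composition below is a hypothetical toy example (counts, deprotonated-form
// charges qB, and pKa values), not the lysozyme sequence used in the editor.
function netCharge(pH) {
  const groups = [
    { n: 1, qB: 0,  pKa: 9.0 },    // N-terminus
    { n: 1, qB: -1, pKa: 2.0 },    // C-terminus
    { n: 6, qB: 0,  pKa: 10.5 },   // Lys side chains
    { n: 7, qB: -1, pKa: 3.65 },   // Asp side chains
  ];
  return groups.reduce((q, g) => q + g.n * (g.qB + 1 / (1 + 10 ** (pH - g.pKa))), 0);
}

let lo = 0, hi = 14;
for (let k = 0; k < 60; k++) {     // net charge decreases monotonically with pH
  const mid = (lo + hi) / 2;
  if (netCharge(mid) > 0) lo = mid; else hi = mid;
}
console.log('pI ≈', ((lo + hi) / 2).toFixed(2));
```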

Suggestion:

"Finding the pI for other proteins"

1. You can check the titration of any other protein or peptide sequence by simply replacing the primary sequence contained in the code. A reliable way to perform this replacement involves:
a. Search for the "FASTA" sequence of the protein in NCBI ("https://www.ncbi.nlm.nih.gov/protein/") - e.g., "papain";
b. Click on "FASTA" and copy the sequence obtained;
c. Paste the sequence into a website for residue quantification (e.g., "https://www.protpi.ch/Calculator/ProteinTool");
d. Replace the sequence in the code.


6 Catalysis and enzyme inhibition

Context:

The simulation aims to provide a general equation for enzyme inhibition studies, covering the competitive, uncompetitive, and non-competitive (pure or mixed) models, while also allowing the study of enzyme catalysis in the absence of an inhibitor.

Equation:

\[ v = \frac{V_m \cdot S}{K_m\left(1+\frac{I}{K_{ic}}\right)+S\left(1+\frac{I}{K_{iu}}\right)} \]
Where

  • S = substrate concentration;
  • I = inhibitor concentration;
  • Vm = limiting rate of the reaction (in textbooks, maximum velocity);
  • Km = Michaelis-Menten constant;
  • Kic = inhibitor dissociation equilibrium constant for the competitive component;
  • Kiu = inhibitor dissociation equilibrium constant for the uncompetitive component.
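A minimal sketch of the general equation with illustrative parameter values (Kiu is set very high so the behavior is purely competitive):

```javascript
// Minimal sketch of the general inhibition equation (illustrative values).
const Vm = 100, Km = 5;             // arbitrary kinetic constants
const I = 10, Kic = 3, Kiu = 1e20;  // huge Kiu => purely competitive behavior

const v = S => (Vm * S) / (Km * (1 + I / Kic) + S * (1 + I / Kiu));

const Sval = [], vval = [];
for (let S = 0; S <= 100; S += 1) { Sval.push(S); vval.push(v(S)); }
const trace = { x: Sval, y: vval, mode: 'lines', name: 'competitive inhibition' };
console.log('v at S = Km:', v(Km).toFixed(1), '(well below Vm/2 because of the inhibitor)');
```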


Suggestion:

"A. Enzymatic catalysis in the absence of inhibitor."
1. Just run the application with the general equation. Note that the values for Kic and Kiu are high (1e20). Thus, with such high "dissociation constants," the interaction of the inhibitor with the enzyme is irrelevant, and the model reduces to the classic Michaelis-Menten equation.
2. Try changing the values of Vm and Km, comparing graphs.
3. Use the spike-lines feature on the icon bar ("Toggle Spike Lines") to consolidate the mathematical meaning of Km, as well as to observe the effect of different values on the graph.

"B. Competitive inhibition model."
1. To observe or compare the Michaelis model with the competitive inhibition model, simply replace the value of Kic with a consistent number (e.g., Kic= 3).

"C. Incompetitive inhibition model."
1. The same suggestion above applies to the incompetitive model, this time replacing the value for Kiu.

"D. Pure non-competitive inhibition model."
1. In this model, the simulation is performed using equal values for Kic and Kiu.

"E. Mixed non-competitive inhibition model."
1. For this model, simply assign different values to Kic and Kiu.


7 Thermodynamic stability of nucleic acids

Context:

Several factors contribute to the thermodynamic stability of biopolymers, either stabilizing or destabilizing them. This simulation deals with a thermal denaturation curve for DNA in the presence or absence of some of these compounds: trehalose (stabilizing osmolyte) and guanidine chloride (destabilizing).

Equation:

\[ y(T) = \frac{1}{1 + e^{-\frac{T - T_m}{\beta}}} \]

Where,

  • y(T): fraction of denatured DNA at a given temperature T;
  • Tm: transition temperature (melting, temperature at which 50% of molecules are double-stranded and 50% are single-stranded);
  • \(\beta\): parameter that adjusts the slope of the curve (affected by trehalose and guanidine).

Note:

  1. Trehalose as a stabilizer (reduces \(\beta\));
  2. Guanidine as a denaturant (increases \(\beta\));
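A minimal sketch of the melting sigmoid with assumed Tm and \(\beta\) values (the editor's code additionally links \(\beta\) to the trehalose and guanidine contents):

```javascript
// Minimal sketch of the melting sigmoid (assumed Tm and beta values).
const Tm = 85;     // °C, assumed melting temperature
const beta = 2.5;  // slope parameter; smaller values give a sharper transition

const T = [], y = [];
for (let t = 60; t <= 110; t += 0.5) {
  T.push(t);
  y.push(1 / (1 + Math.exp(-(t - Tm) / beta)));
}
const trace = { x: T, y: y, mode: 'lines', name: 'fraction denatured' };
console.log('fraction denatured at Tm:', y[T.indexOf(Tm)]);   // 0.5 at T = Tm
```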

Suggestion:

1. Try testing various conditions involved in the simulation, such as:
a) variation of Tm;
b) variation of the \(\beta\) parameter;
c) variation of trehalose content;
d) variation of guanidine chloride content.

8 Van der Waals equation for real gases

Context:

The van der Waals equation is an adaptation of the ideal gas law that relates the thermodynamic quantities of pressure, volume, and temperature for real gases. Its coefficients account for the finite volume of the particles and the interactions between them: the equation corrects the ideal gas law with a term that compensates for intermolecular forces (a/V\(^{2}\)) and with the available volume, which discounts the volume occupied by the gas molecules themselves (V − b).
The simulation offers additional interactivity through sliders for the temperature, the finite-volume coefficient (b), and the particle-interaction coefficient (a).

Equation:

\[ P = \frac{RT}{V - b} - \frac{a}{V^2} \]

  • P = gas pressure (atm);
  • V = molar volume (L);
  • T = temperature (K);
  • R = 0.0821 = ideal gas constant (L·atm/mol·K);
  • a = intermolecular attraction constant (L\(^{2}\)·atm/mol\(^{2}\));
  • b = excluded volume constant (L/mol).
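A minimal sketch of one isotherm, assuming approximate CO2 values for the a and b coefficients:

```javascript
// Minimal sketch of a van der Waals isotherm (approximate CO2 coefficients assumed).
const R = 0.0821;            // L·atm/(mol·K)
const a = 3.59, b = 0.0427;  // approximate CO2 values for a and b
const T = 300;               // K

const Pvdw = V => (R * T) / (V - b) - a / (V * V);   // excluded-volume and attraction terms

const V = [], P = [];
for (let v = 0.07; v <= 1.0; v += 0.005) { V.push(v); P.push(Pvdw(v)); }
const trace = { x: V, y: P, mode: 'lines', name: `vdW, T = ${T} K` };
console.log('P at V = 0.5 L/mol:', Pvdw(0.5).toFixed(1), 'atm');
```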

Suggestion:

1. Try varying the parameters of the equation using the slider for temperature, as well as for coefficients "a and b."
2. Discuss which of the coefficients has the greatest effect on the curve profile and the reason for this.


9 Equilibrium of ATP production from reactants, temperature, and pH

Context:

Intracellular ATP production involves the classic chemical equilibrium relationship between reactants and products as a function of reaction temperature, adjusted to a specific pH value. By varying one or both reactant concentrations, or the physical-chemical parameters, it is possible to quantify the product using the following thermodynamic relation:

Equation:

\[ \Delta G = \Delta G^{\circ'} + RT \ln\left(\frac{[\text{ADP}] \cdot [\text{P}_i]}{[\text{ATP}]}\right) + 2.303 \cdot RT \cdot n_H \cdot \text{pH} \]

Where,

  • \(\Delta\)G = Gibbs energy of the reaction (positive for the thermodynamically unfavorable synthesis direction; kJ/mol);
  • \(\Delta\)G\(^{o'}\) = 30.5 kJ/mol, standard biological Gibbs energy for ATP synthesis;
  • R = 8.314 J/mol/K (universal gas constant);
  • T = 310 K (physiological temperature);
  • n\(_{H}\) = 1 (number of protons involved in the reaction);
  • [ADP], [Pi], [ATP] = molar concentrations of reactants and product.
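A minimal sketch of the expression above, with assumed, purely illustrative concentrations:

```javascript
// Minimal sketch of the expression above (assumed, illustrative concentrations).
const R = 8.314e-3;            // kJ/(mol·K)
const T = 310, pH = 7.0, nH = 1;
const dG0 = 30.5;              // kJ/mol, standard biological value for ATP synthesis

const ADP = 0.25e-3, Pi = 1.65e-3, ATP = 2.25e-3;   // assumed molar concentrations

const dG = dG0 + R * T * Math.log((ADP * Pi) / ATP) + 2.303 * R * T * nH * pH;
console.log('ΔG ≈', dG.toFixed(1), 'kJ/mol');
```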

Suggestion:

1. Change the quantities involved in the expression and compare with previous visualizations. For example, temperature, pH, and ADP and Pi levels.

10 Variation of Gibbs energy with temperature

Context:

The Gibbs-Helmholtz relation predicts that Gibbs energy varies nonlinearly with temperature in reactions involving a change in the heat capacity of the system (\(\Delta\)Cp). This expanded form of the Gibbs equation for variable heat capacity appears in various biochemical phenomena, such as the phase transition of biomembrane structures challenged with a compound, or the conformational change that a protein structure undergoes on heating.

Equation:

\[ \Delta G(T) = \Delta H^\circ - T\,\Delta S^\circ + \Delta C_p \left(T - T_0 - T \ln\left(\frac{T}{T_0}\right)\right) \]

Where,

  • \(\Delta\)G(T) = Gibbs energy of the reaction at each temperature value (kJ/mol);
  • \(\Delta\)H\(^{o}\) = standard enthalpy of the reaction at T\(_{0}\), normally 298 K (J/mol);
  • \(\Delta\)S\(^{o}\) = standard entropy of the reaction at T\(_{0}\) (J/mol·K);
  • \(\Delta\)Cp = variation in the heat capacity of the reaction (J/mol·K), assumed to be constant with temperature;
  • T = temperature of interest (K);
  • T\(_{0}\) = reference temperature, usually 298 K.
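A minimal sketch of \(\Delta\)G(T) with illustrative, protein-unfolding-like parameter values:

```javascript
// Minimal sketch of ΔG(T) with constant ΔCp (illustrative, protein-unfolding-like values).
const dH0 = 300e3;   // J/mol at T0
const dS0 = 900;     // J/(mol·K) at T0
const dCp = 8e3;     // J/(mol·K), assumed constant
const T0 = 298;      // K

const dG = T => dH0 - T * dS0 + dCp * (T - T0 - T * Math.log(T / T0));

const Ts = [], Gs = [];
for (let T = 273; T <= 373; T += 1) { Ts.push(T); Gs.push(dG(T) / 1000); }   // in kJ/mol
const trace = { x: Ts, y: Gs, mode: 'lines', name: 'ΔG(T)' };
console.log('ΔG at 298 K:', (dG(298) / 1000).toFixed(1), 'kJ/mol');
```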

Suggestion:

1. Try varying one or more parameters of the expression;
2. Test the behavior of the Gibbs curve at a high reference temperature (simulation for extremophile organism);
3. Simulate the situation where the variation in heat capacity is zero.

11 Electrochemical impedance spectroscopy (EIS)

Context:

Electrochemical impedance spectroscopy (EIS) is an analytical technique that measures the response of an electrochemical system to a small amplitude sinusoidal disturbance (e.g., 5-10 mV) applied over a wide frequency range (e.g., 100 kHz-1 mHz). EIS provides information about interfacial processes, such as charge transfer, ion diffusion, and passive film formation, by analyzing the complex impedance of the system. It is a non-destructive technique widely used in areas such as corrosion, batteries, fuel cells, biosensors, and protective coatings.

Equation:

An electrochemical impedance (EIS) model with typical equivalent circuit components constitutes the modified Randles model: series resistor Rs, a resistor Rp in parallel with a CPE (Constant Phase Element), and a Warburg element. A common representation for EIS is the Nyquist plot, where the real axis represents resistance (Zre) and the imaginary axis represents negative reactance (−Zim). The formalism involved is in the equation below:

\[ Z_{\text{total}}(\omega) = R_s + \left[ \left( \frac{1}{R_p} + Q (j\omega)^n \right)^{-1} \right] + \frac{\sigma}{\sqrt{\omega}} (1 - j) \]

Where:

  • Z\(_{total}\) = total impedance at a given frequency;
  • \(\omega\) = angular frequency (rad/s), \(\omega\) = 2πf (f in Hz);
  • R\(_{s}\) = ohmic resistance (resistance of the electrolytic solution, wires, contacts, etc.);
  • R\(_{p}\) = Polarization resistance (associated with charge transfer processes, such as electrochemical reactions);
  • Q = Constant associated with the constant phase element (CPE), replaces an ideal capacitor to represent non-ideal behaviors;
  • n = CPE exponent, between 0 and 1; defines the degree of ideality of the capacitive behavior (n = 1: ideal capacitor; n < 1: dispersion);
  • \(\sigma\) = Warburg coefficient, associated with ion diffusion in the electrochemical system;
  • j = Imaginary unit, j\(^{2}\) = −1;
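A minimal sketch of how the Nyquist arrays can be built from this expression (illustrative parameter values; complex numbers handled as separate real and imaginary parts):

```javascript
// Minimal sketch of the modified Randles impedance and its Nyquist arrays
// (illustrative parameter values).
const Rs = 20, Rp = 500, Q = 1e-5, n = 0.9, sigma = 30;

const Zre = [], Znegim = [];
for (let k = -2; k <= 5; k += 0.05) {            // frequency sweep, 10^-2 to 10^5 Hz
  const w = 2 * Math.PI * 10 ** k;
  // CPE admittance Q(jw)^n = Q w^n [cos(nπ/2) + j sin(nπ/2)]
  const yRe = 1 / Rp + Q * w ** n * Math.cos(n * Math.PI / 2);
  const yIm = Q * w ** n * Math.sin(n * Math.PI / 2);
  const den = yRe * yRe + yIm * yIm;             // invert the parallel admittance
  const zRe = Rs + yRe / den + sigma / Math.sqrt(w);
  const zIm = -yIm / den - sigma / Math.sqrt(w); // Warburg term: σ/√ω (1 − j)
  Zre.push(zRe);
  Znegim.push(-zIm);                             // Nyquist convention: plot −Zim
}
const nyquist = { x: Zre, y: Znegim, mode: 'lines', name: 'Nyquist' };
console.log('real impedance at the low-frequency end:', Zre[0].toFixed(0), 'Ω');
```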

Suggestion

1. Check the effect of Rs on the graph by setting its value to zero (solution resistance);
2. Observe the deformation of the semicircle by varying the values of the constant phase element (e.g., Q = 1e-3; n = 0.6 - dispersion of capacitive behavior);
3. Try combining other values of the code header parameters to highlight common situations in electroanalysis: Rs, Rp, Q, n, and sigma;
4. Reduce the Warburg model with constant phase element to a simple Randles model, consisting only of two resistors in series, the second in parallel with an ideal capacitor, and without Warburg diffusion (sigma = 0; n = 1).

12 Cyclic voltammetry

Context:

Cyclic voltammetry (CV) is an electroanalytical technique for characterizing interfacial (electrode-solution) redox processes. When a standard analyte in quantitative equilibrium between the oxidized and reduced forms is used, the resulting graph acquires an unusual profile, called a duck shape. This results from the peculiar response obtained when a potentiostat performs a bidirectional potential scan on a set of electrodes, producing cycles of oxidation and subsequent reduction (application of increasingly positive potentials followed by increasingly negative ones). The shape, value, and distance between the resulting peaks allow the analyte to be identified and characterized, as well as the redox processes involved in the experiment.

Equation:

Although the formalism for cyclic voltammetry (as for electroanalytical techniques in general) is complex and involves the solution of differential equations, which is beyond the scope of this material, the resulting current flow can be summarized by the Butler-Volmer equation below:

\[ j = j_0 \left[ \exp\left(\frac{\alpha n F (E - E^0)}{RT}\right) - \exp\left(\frac{-(1 - \alpha) n F (E - E^0)}{RT}\right) \right] \] Where:

  • j = current density;
  • j\(_{0}\) = exchange current density;
  • \(\alpha\) = charge transfer coefficient;
  • E = potential applied to the electrode;
  • E\(^{0}\) = standard electrode potential;
  • n = number of electrons;
  • F = Faraday constant (96485 C·mol⁻¹)
  • R = universal gas constant (8.314 J·mol⁻¹·K⁻¹);
  • T = temperature (K).
For a model of diffusion and accumulation of the electroactive species at the electrode surface, where the redox conversion rate is proportional to the diffusion current, mass transport can be represented by Fick's 2nd law:

\[ \frac{\partial C(x,t)} {\partial t} = D \frac{\partial^2 C(x,t)} {\partial x^2} \]

Where:

  • C(x,t) = concentration of the electroactive species (mol/cm³), as a function of position x and time t;
  • D = diffusion coefficient of the species (cm²/s);
  • t = time (s);
  • x = distance from the electrode surface (cm);
  • \(\partial\) = notation for the partial derivative.
The chemical equilibrium of the redox reaction is given by the ratio between the concentrations of the oxidized and reduced forms at the electrode/solution interface when the system is in equilibrium (or near equilibrium), according to the Nernst equation:

\[ E = E^0 + \frac{RT}{nF} \ln \left( \frac{[\text{Ox}]}{[\text{Red}]} \right) \]

The resulting current is obtained by Faraday’s law:

\[ i(t) = n F A D \left. \frac{\partial C(x,t)} {\partial x} \right|_{x=0} \] Where:

  • A = electrode surface area (cm²);
  • i(t) = current at time t (A, Amperes).

Cyclic Voltammetry Simulator

To illustrate the potential of the JSPlotly/GSPlotly platform for developing complex virtual experiments, here is a cyclic voltammetry simulator resulting from the above procedures (as well as a few hours of prompt engineering!!).
The example illustrates typical behavior in cyclic voltammetry (and the curious duck shape), and includes a model of diffusion and accumulation of electroactive species near the electrode surface. The code uses a numerical solution to obtain the current density over time, considering the effects of cyclic scanning of the applied potential, mass transport, and reaction kinetics.
In short, it numerically solves Fick's equation with dynamic boundary conditions, applies the Nernst equation at the boundary (electrode surface) to determine the conversion between species, uses an explicit finite-difference scheme to advance in time, and obtains the current by applying Faraday's Law, through the derivative of the flux at the interface.
Does that sound complex? Put simply, the code combines Fick (transport) and Nernst (local redox equilibrium) to model the shape of the voltammogram accurately, yielding a physical simulation with significant realism for cyclic voltammetry.
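For readers who want to see the recipe in compact form, here is a schematic sketch of that numerical scheme, with all parameter values assumed for illustration (it is not the simulator's actual code and omits several refinements):

```javascript
// Schematic sketch (not the simulator's actual code) of the recipe described above:
// explicit finite differences for Fick's second law, the Nernst condition at the
// electrode surface, and Faraday's law for the current. All values are illustrative.
const F = 96485, Rg = 8.314, T = 298, n = 1;
const D = 1e-5;                      // cm²/s
const dx = 1e-3, nx = 150;           // space grid (cm) and number of nodes
const dt = 0.02;                     // s  (D·dt/dx² = 0.2 < 0.5: explicit scheme stays stable)
const lam = D * dt / (dx * dx);

const E0 = 0, scan = 0.05;           // formal potential (V) and scan rate (V/s)
const Estart = -0.3, Erev = 0.3;     // switching potentials (V)
const A = 0.1, Cbulk = 1e-6;         // electrode area (cm²), bulk conc. of R (mol/cm³)

let cR = new Array(nx).fill(Cbulk);  // only the reduced form is present initially
let cO = new Array(nx).fill(0);

const Evals = [], Ivals = [];
const tTotal = 2 * (Erev - Estart) / scan;
for (let t = 0; t <= tTotal; t += dt) {
  // triangular potential program: forward (anodic) sweep, then reverse
  const E = t < tTotal / 2 ? Estart + scan * t : Erev - scan * (t - tTotal / 2);

  // explicit diffusion step (Fick's 2nd law); the outer boundary stays at bulk values
  const nR = cR.slice(), nO = cO.slice();
  for (let i = 1; i < nx - 1; i++) {
    nR[i] = cR[i] + lam * (cR[i + 1] - 2 * cR[i] + cR[i - 1]);
    nO[i] = cO[i] + lam * (cO[i + 1] - 2 * cO[i] + cO[i - 1]);
  }
  cR = nR; cO = nO;

  // Nernst condition at x = 0: cO(0)/cR(0) = exp[nF(E - E0)/RT]
  const r = Math.exp(n * F * (E - E0) / (Rg * T));
  const s = cR[1] + cO[1];           // simple surface mass balance (equal D assumed)
  cO[0] = s * r / (1 + r);
  cR[0] = s / (1 + r);

  // Faraday's law: current from the surface flux of the reduced species
  Evals.push(E);
  Ivals.push(n * F * A * D * (cR[1] - cR[0]) / dx);
}
const voltammogram = { x: Evals, y: Ivals, mode: 'lines', name: 'simulated CV' };
console.log('points computed:', Evals.length);
```

The stability condition D·dt/dx² < 0.5 is what keeps the explicit scheme from blowing up: halving dx requires roughly quartering dt.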


Suggestion:

1. Note the set of parameter values at the beginning of the code, which are open to "parametric manipulation." Try to understand what they represent, and vary their values deliberately, aiming to enrich what you learn from the simulation. This is the essence of the "parametric manipulation" that underlies "reproducible teaching"!


13 Diagrams and flowcharts

Context:

Due to the nature of the Plotly.js library, it is also possible to generate code in JSPlotly without involving an equation, for example to represent diagrams, flowcharts, and pedigree charts (heredograms), among others. The example below illustrates a representation of the Krebs cycle.

Suggestion:

1. Try repositioning enzymes and metabolites by clicking and dragging the terms;
2. Try replacing the names in the code to produce another metabolic cycle, such as the urea cycle.

Context: Flowchart

Suggestion:

1. Reposition terms and connectors by dragging with the mouse;
2. For a flowchart with different content, modify the terms in the code;
3. For a different flowchart format, change the font and connector characteristics in the "annotations" constant.


Data analysis

14 User data insertion

Sometimes it is useful to work directly with discrete data, seeking a mathematical or simply visual trend for the relationship between variables. Here is an example of an interactive graph with user data.


Suggestion:

1. Try changing the data entered, overlaying the graph or not;
2. Try changing the data representation in "mode" and "type" to points, lines, points+lines, bars.


15 Uploading files for analysis

Context - CSV file

Sometimes, it is necessary to analyze results related to a set of data in a spreadsheet. For this situation, the following example illustrates the loading of two data vectors from a CSV (comma separated value) file, a format commonly used for exporting/importing in spreadsheets, and the resulting graph.

Instructions:

  1. Click on add plot and select a CSV file using the browse button at the top. Note: variable X goes in the first column of the file, and variable Y in the second column;
  2. Click on add plot again to view the resulting graph;
  3. Try other CSV files;
  4. Vary aspects of the graph, such as type, color, marker size, etc.

16 Linear data adjustment

Context:

In various experimental research situations, it is necessary to correlate the data of a dependent variable with a predictor variable using the following equation.

Equation:

\[ y = \alpha x + \beta + \varepsilon \] Where:

  • y = dependent variable;
  • x = independent variable;
  • \(\alpha\) = slope of the adjusted line;
  • \(\beta\) = intercept of the adjusted line;
  • \(\epsilon\) = measurement error.
The slope and intercept parameters can be obtained by several procedures, such as least squares, matrix algebra, or simple summations. In the latter case, the parameters and the error can be computed as follows:

\[ \alpha = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} \]

\[ \beta = \frac{\sum y_i - \alpha \sum x_i}{n} \]

\[ \hat{y}_i = \alpha x_i + \beta \]

\[ \varepsilon_i = \left| y_i - \hat{y}_i \right| \]
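A minimal sketch of these summation formulas with arbitrary data:

```javascript
// Minimal sketch of the slope/intercept summation formulas above (arbitrary data).
const x = [1, 2, 3, 4, 5, 6];
const y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2];
const n = x.length;

const sum = a => a.reduce((s, v) => s + v, 0);
const Sx = sum(x), Sy = sum(y);
const Sxy = sum(x.map((xi, i) => xi * y[i]));
const Sxx = sum(x.map(xi => xi * xi));

const alpha = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx);   // slope
const beta = (Sy - alpha * Sx) / n;                        // intercept
const residuals = y.map((yi, i) => Math.abs(yi - (alpha * x[i] + beta)));

console.log('slope ≈', alpha.toFixed(3), 'intercept ≈', beta.toFixed(3));
```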

A representation for linear adjustment with arbitrary data is provided below:


Suggestion:

1. Show the points with or without the fitted line overlaid. For the overlay, choose "showPoints = true" and "showLine = true";
2. Change the data and perform a new fit to obtain other parameters of the line.

17 Polynomial regression

Context:

In some situations, linear fitting does not provide an adequate trend for modeling experimental data, which leads to the use of other models, such as exponential, hyperbolic, logarithmic, power, and polynomial. Polynomial trends are common in natural phenomena, with mathematical treatment of data occurring by different methods, such as simple sums, least squares, or linear algebra. Here is an example of polynomial regression with adjustable degrees in the code.

17.1 Equation

Error minimization for the polynomial curve is obtained by generalized least squares regression and application of the Vandermonde matrix, as follows.
Let x = [x\(_{1}\), x\(_{2}\), …, x\(_{n}\)] and y = [y\(_{1}\), y\(_{2}\), …, y\(_{n}\)] be the data vectors, and let the desired polynomial be:

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_g x^g \]

Thus, the Vandermonde matrix is constructed as:

\[ X = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^g \\\\ 1 & x_2 & x_2^2 & \cdots & x_2^g \\\\ \vdots & \vdots & \vdots & & \vdots \\\\ 1 & x_n & x_n^2 & \cdots & x_n^g \end{bmatrix} \]

To find the coefficients \(\beta\) that minimize the quadratic error, solve the following linear system (the normal equations):

\[ \boldsymbol{\beta} = (X^T X)^{-1} X^T \mathbf{y} \]

Where:

  • the superscript T denotes the transposed matrix.
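A minimal sketch of the whole procedure for degree g = 2 with arbitrary data, solving the normal equations by Gaussian elimination:

```javascript
// Minimal sketch: build the Vandermonde matrix and solve the normal equations
// (XᵀX)β = Xᵀy by Gaussian elimination (arbitrary data, degree g = 2).
const x = [0, 1, 2, 3, 4, 5];
const y = [1.1, 2.0, 5.2, 10.1, 16.9, 26.3];
const g = 2;

// Vandermonde matrix: row i = [1, xi, xi², …, xi^g]
const X = x.map(xi => Array.from({ length: g + 1 }, (_, p) => xi ** p));

// XᵀX and Xᵀy
const XtX = Array.from({ length: g + 1 }, (_, r) =>
  Array.from({ length: g + 1 }, (_, c) =>
    X.reduce((s, row) => s + row[r] * row[c], 0)));
const Xty = Array.from({ length: g + 1 }, (_, r) =>
  X.reduce((s, row, i) => s + row[r] * y[i], 0));

// Gaussian elimination with partial pivoting, then back substitution
function solve(A, b) {
  const m = b.length, M = A.map((row, i) => [...row, b[i]]);
  for (let k = 0; k < m; k++) {
    let p = k;
    for (let i = k + 1; i < m; i++) if (Math.abs(M[i][k]) > Math.abs(M[p][k])) p = i;
    [M[k], M[p]] = [M[p], M[k]];
    for (let i = k + 1; i < m; i++) {
      const f = M[i][k] / M[k][k];
      for (let j = k; j <= m; j++) M[i][j] -= f * M[k][j];
    }
  }
  const beta = new Array(m).fill(0);
  for (let i = m - 1; i >= 0; i--) {
    const tail = M[i].slice(i + 1, m).reduce((s, v, j) => s + v * beta[i + 1 + j], 0);
    beta[i] = (M[i][m] - tail) / M[i][i];
  }
  return beta;
}

const beta = solve(XtX, Xty);
console.log('coefficients β0..βg:', beta.map(b => b.toFixed(3)));
```

The same solve() routine works for any degree g, which is one way the "adjustable degree" mentioned above can be explored.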

Suggestion:

1. Try degree 1 for the polynomial, i.e., a reduction of the treatment to linear adjustment;
2. Change the formatting of labels, colors, sizes, etc., in the code;
3. Overlay some fits, and edit and reposition the legend;
4. Test the code with another data vector.

18 Multilinear regression

Context:

Multiple linear adjustment is very useful when you want to predict behavior based on two or more predictor variables. The procedure involves a design matrix (or projection operator) similar to the Vandermonde matrix.

Equation:

The analytical solution for the design matrix involves solving a matrix of coefficients \(\beta\) for a set of predictor values of the response vector y, as follows:

\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i \]

\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y} \]

Where:

  • \(\beta\) = coefficient vector;
  • y = response vector;
  • X = design matrix;
  • \(\epsilon\) = random noise.

Thus, predicted values and associated error are obtained by:

\[ \hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol{\beta}} \] \[ \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} \]

Obviously, the graphical representation of a multilinear fit is limited to two independent variables, x1 and x2, in addition to the response variable (a 3D graph), although the matrix calculation allows for several predictor variables (a hyperdimensional space). For this first situation, here is an example.

Instructions:

  1. Note that there is a Boolean flag (showFit) at the beginning of the code: false for data only, and true for the fit;
  2. You can click on add plot to view the data with the flag set to false, followed by another add plot with the flag set to true.


Rotate the graph and observe the surface around the points. Unlike in a simple linear fit, the points are scattered in space around the fitted surface.

Suggestion:

1. QSAR ("Quantitative Structure-Activity Relationship") studies use multilinear adjustment to identify the strength of predictor variables. Try finding a dataset with two predictor variables (e.g., concentration, pH, compound A, B, etc.) on the internet.
2. For more predictor variables, see the code below!

18.1 Context: Multiple linear adjustment with 3 or more predictor variables

Although JSPlotly was designed for producing interactive graphs and maps using the Plotly.js library on top of the JavaScript language, it extends the application possibilities beyond graphical representations. An example of this can be found in the code below, for a multilinear adjustment for a situation with 4 predictor variables.

Suggestion:

1. Try varying the number of predictors (xi). To do this:
a. To reduce, simply remove the desired vector(s) and correct their quantity in the line: “const X = x1.map((_, i)”;
b. To increase, add a new vector in the corresponding line and update the quantity in the mapping of “const X = x1.map((_, i)”.

19 Response Surface Methodology (RSM)

A variation of multilinear adjustment involves the response surface methodology, commonly linear (flat) or quadratic (curvilinear), applied in modeling and optimization of experimental processes with multiple independent variables. Its objective is to fit a mathematical equation that represents the relationship between the experimental factors and the response variable, allowing behaviors to be predicted and optimal conditions to be identified.
For experiments with two factors, a second-order quadratic model is commonly used, which allows surfaces with curvature to be represented. The central idea of the experimental design is to select values and their coded levels (-1, 0, +1), representing, respectively, a low, an intermediate, and a high level of a given predictor variable.

Equation:

The general quadratic model for two factors is represented by:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon \]

With the matrix form of the model represented by:

\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

And the estimation of the coefficients \(\beta\) by least squares, similar to multilinear adjustment, by:

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y} \]

Suggestion:

1. As suggested earlier, try using data from the literature or another source whose responses are already known, replacing the respective vectors at the beginning of the code. This allows you to compare the effectiveness of using the tool presented.

20 Data smoothing - Spline and Savitzky-Golay filter

In experimental research, signal detection occasionally occurs in the presence of significant levels of noise, as occurs in continuous spectral scanning under wavelength variation, or in the detection of electrical signals in time. In this case, there is no analytical equation that describes the behavior of the signal, although some digital processing tools can rely on a trend curve in data processing, significantly reducing the noise level, such as by spline curve and Savitzky-Golay filter.

Context - Cubic spline

A very common interpolation technique involves the use of cubic splines: degree-3 polynomials interconnecting windows of data. The treatment consists of applying polynomial functions to subintervals of the points, promoting smoothing between the data windows. Mathematically, the spline is applied in such a way that:

\[ S_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3 \]

Where:

  • S\(_{i}\)(x) = cubic spline function in the interval [x\(_{i}\), x\(_{i+1}\)];
  • a\(_{i}\), b\(_{i}\), c\(_{i}\), d\(_{i}\) = specific coefficients for segment i;
  • x\(_{i}\) = starting point of the interval;
  • x = independent variable.

20.1 Context - Savitzky-Golay filter

The Savitzky-Golay filter smoothing technique applies a local polynomial regression to a moving window, using least squares and a predetermined degree, preserving peak, valley, and band signals. Unlike cubic splines, the filter produces a continuous global curve, rather than a set of interconnected curves. Mathematically:

\[ \tilde{y}_i = \sum_{j=-m}^{m} c_j \cdot y_{i+j} \]

Where:

  • \(\tilde{y}_i\) = smoothed value at position i;
  • y\(_{i+j}\) = actual values of the series within the window;
  • c\(_{j}\) = filter coefficients derived from a polynomial regression;
  • m = number of points on each side of the central window.
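A minimal standalone sketch of just the convolution step, assuming the classic 5-point quadratic Savitzky-Golay coefficients (-3, 12, 17, 12, -3)/35; the editor's example instead lets you choose the window and polynomial degree:

```javascript
// Minimal sketch of the Savitzky-Golay convolution step, assuming the classic
// 5-point quadratic coefficients (-3, 12, 17, 12, -3)/35.
const c = [-3, 12, 17, 12, -3].map(v => v / 35);
const m = 2;                                       // points on each side of the window

function savGol(yRaw) {
  const out = yRaw.slice();                        // edges left untouched in this sketch
  for (let i = m; i < yRaw.length - m; i++) {
    out[i] = c.reduce((s, cj, j) => s + cj * yRaw[i + j - m], 0);
  }
  return out;
}

// noisy example signal (a Gaussian band plus random noise)
const yRaw = Array.from({ length: 50 }, (_, i) =>
  Math.exp(-((i - 25) ** 2) / 40) + 0.1 * (Math.random() - 0.5));
console.log(savGol(yRaw).map(v => v.toFixed(3)).join(' '));
```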
Here is an example of code for JSPlotly that illustrates the use of the two interpolation methods.


Instructions:

    1. The code has two true/false flags, one for "useGolay" and another for "useSpline";
    2. View the raw data by setting both flags to false;
    3. To overlay a cubic spline, change its constant to true;
    4. To overlay the Savitzky-Golay filter, change its constant to true;
    5. To adjust the filter, modify its parameters in the constants window (moving window) and/or degree (degree of the polynomial).

Suggestion:

1. Compare the smoothing effects of the two polynomial interpolation treatments;
2. Apply only the filter, adjusting the moving window and polynomial degree parameters;
3. Try interpolation with other raw data in the code, replacing the constants “x_values” and “y_raw” with numeric vectors;

21 ANOVA and Tukey-HSD test

Context:

In research, it is common to analyze a set of data by comparing the variance between groups with that obtained within each group, or Analysis of Variance (ANOVA). Once statistical information indicating that there is a difference between groups greater than the internal difference has been obtained, a post hoc test is performed to compare means, such as the Tukey-HSD test (Honest Significant Difference).

Equation:

The equation that characterizes a Tukey-HSD test is provided below (correction for unbalanced groups):

\[ \text{HSD}_{ij} = q_{\alpha} \cdot \sqrt{\frac{\text{MS}_{\text{within}}}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} \]

Where:

  • HSD = minimum value of the difference between two means for it to be considered significant;
  • q\(_{\alpha}\) = critical value of the studentized distribution q (Tukey distribution), depending on the number of groups and the significance level \(\alpha\);
  • MS\(_{within}\) = mean square within groups (error) obtained from ANOVA;
  • n\(_{i}\), n\(_{j}\) = number of observations in groups i and j (which may differ when groups have different sizes).
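A minimal sketch of the HSD computation for three small groups of made-up values; the critical q is assumed here and should be taken from a studentized-range table:

```javascript
// Minimal sketch of the Tukey-HSD formula above (made-up data; q_alpha assumed).
const groups = [
  [5.1, 5.4, 4.9, 5.2],
  [6.0, 6.3, 5.8, 6.1],
  [5.5, 5.2, 5.6],
];

const mean = a => a.reduce((s, v) => s + v, 0) / a.length;

// mean square within groups (the error term of a one-way ANOVA)
const dfWithin = groups.reduce((s, g) => s + g.length - 1, 0);
const ssWithin = groups.reduce((s, g) => {
  const m = mean(g);
  return s + g.reduce((q, v) => q + (v - m) ** 2, 0);
}, 0);
const MSwithin = ssWithin / dfWithin;

const qAlpha = 4.04;   // assumed here; take the exact value from a table (k groups, df_within, α)
const HSD = (ni, nj) => qAlpha * Math.sqrt((MSwithin / 2) * (1 / ni + 1 / nj));

const diff12 = Math.abs(mean(groups[0]) - mean(groups[1]));
console.log('|mean1 - mean2| =', diff12.toFixed(2), '; HSD =', HSD(4, 4).toFixed(2));
```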

Suggestion:

1. You can click and drag the minimum difference markers (a, b, etc.) above the graph for better visual adjustment, as well as the ANOVA table and the statistical test result;
2. Change the values of the vectors, redo the analysis and the graph simultaneously, and notice the difference in the values obtained. Note that, as stated at the bottom, you need to refresh the website page to clear previous data retained in the cache;
3. Enter other data in the respective vectors, either your own or from another source;
4. Enter new data vectors, or delete some, for a customized calculation and graph.

22 Nonlinear data adjustment

Modeling is widely employed in the natural sciences when mathematical prediction or correlation of experimental data is desired. Models can be linear or nonlinear. When nonlinear, they can be applied empirically to the data (e.g., quadratic, logarithmic, exponential, or power functions), or derived from an explanatory mathematical function of the model.
In this case, techniques for nonlinear adjustment of the data to the model are used, such as Simplex, Gauss-Newton, Newton-Raphson, or Levenberg-Marquardt, generally under a least-squares criterion. Regardless of the method, nonlinear adjustment differs from linear adjustment in its assumptions and treatment of data (see a quick explanation here).

Context: Ligand-protein interaction (grid search algorithm)

The following example illustrates the application of a nonlinear adjustment of molecular interaction data using the Langmuir model and treatment by manual grid search. In this treatment, the sum of squared residuals (SSR) is calculated for combinations of a and b, and the pair with the lowest calculated error is selected. For simplicity, although JSPlotly works with more specialized libraries, the nonlinear regression here is performed by least squares using a grid search rather than classic iterative methods. Thus, a set of values is tested for the pair (a, b) in a previously defined grid (combination table), allowing the identification of the parameters that best describe the experimental data according to the minimum-error criterion.


Equation:

The Langmuir equation is represented below:

\[ Y = \frac{aX}{K_d + X} \] Where:

  • Y = fraction of occupied sites;
  • X = concentration of free ligand;
  • a = saturation value (sites of the same affinity completely occupied);
  • Kd = dissociation equilibrium constant of the complex.
In this case, the nonlinear least squares function is:

\[ SSR(a, K_d) = \sum_{i=1}^n \left( y_i - \frac{a x_i}{K_d + x_i} \right)^2 \]

Where:

SSR = sum of the squares of the residuals (the lower its value, the better the fit).
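A minimal sketch of the grid search itself, with arbitrary synthetic binding data:

```javascript
// Minimal sketch of the manual grid search described above (arbitrary synthetic data).
const X = [0.5, 1, 2, 4, 8, 16, 32];                       // free ligand
const Y = [0.09, 0.17, 0.29, 0.44, 0.62, 0.76, 0.86];      // fraction of occupied sites

const model = (a, Kd, x) => (a * x) / (Kd + x);
const ssr = (a, Kd) =>
  X.reduce((s, xi, i) => s + (Y[i] - model(a, Kd, xi)) ** 2, 0);

// test every (a, Kd) pair on a predefined grid and keep the one with the lowest SSR
let best = { a: null, Kd: null, ssr: Infinity };
for (let a = 0.5; a <= 1.5; a += 0.01) {
  for (let Kd = 0.5; Kd <= 20; Kd += 0.1) {
    const e = ssr(a, Kd);
    if (e < best.ssr) best = { a, Kd, ssr: e };
  }
}
console.log('best a ≈', best.a.toFixed(2),
            'Kd ≈', best.Kd.toFixed(1),
            'SSR ≈', best.ssr.toExponential(2));
```

Refining the step sizes of the grid improves the resolution of the recovered parameters, at the cost of more combinations to evaluate.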

Suggestion:

1. As in the previous example, alternately display the points and the adjustment curve using the “Boolean” operation (“showPoints = true/false”; “showCurve = true/false”);
2. Change the data and perform a new fit to obtain other parameters of the equation (Kd, SSR, R², adjusted R²). Note: R² adj corresponds to the value of R² corrected for the number of parameters in the model (in this case, “2”), directly affecting the “degrees of freedom” for the fit.


Context - Competitive inhibition in enzyme kinetics (Gauss-Newton algorithm)

The Gauss-Newton method iteratively fits nonlinear models to experimental data sets. It is especially efficient when the model is quasi-linear with respect to the parameters and the residuals are moderately small. The goal is to find the parameters \(\theta\) that minimize the sum of the squares of the residuals between the observed data y\(_{i}\) and the values predicted by the model f(x\(_{i}\), \(\theta\)). Unlike the full Newton-Raphson method, Gauss-Newton avoids the explicit calculation of the Hessian by using an approximation based on the Jacobian of the residuals.

Equation:

This approximation results in a linear system that can be solved at each iteration to update the model parameters. The central step of the method consists of multiplying the transposed Jacobian J\(^{T}\) by the Jacobian J itself, forming a symmetric matrix that approximates the Hessian. The residual vector r is then used to determine the update direction \(\Delta\theta\), according to the following equation:

\[ \theta_{k+1} = \theta_k + \left(J^T J\right)^{-1} J^T \cdot r \]
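A minimal sketch of this update, reduced to a two-parameter Michaelis-Menten model with an analytic Jacobian (not the editor's competitive-inhibition code, which has more parameters and reports regression statistics):

```javascript
// Minimal sketch of the Gauss-Newton update θ <- θ + (JᵀJ)⁻¹Jᵀr,
// reduced to a two-parameter Michaelis-Menten model (arbitrary data).
const S = [0.5, 1, 2, 4, 8, 16];
const v = [12, 20, 29, 38, 44, 48];               // arbitrary rate data

const f = (Vm, Km, s) => (Vm * s) / (Km + s);

let Vm = 30, Km = 1;                              // initial seeds
for (let it = 0; it < 50; it++) {
  // residuals r = y - f(θ) and analytic Jacobian columns of the model
  const r = S.map((s, i) => v[i] - f(Vm, Km, s));
  const J1 = S.map(s => s / (Km + s));            // ∂f/∂Vm
  const J2 = S.map(s => -Vm * s / (Km + s) ** 2); // ∂f/∂Km

  // normal equations (JᵀJ)Δ = Jᵀr for the 2×2 case, solved analytically
  const a11 = J1.reduce((q, x) => q + x * x, 0);
  const a12 = J1.reduce((q, x, i) => q + x * J2[i], 0);
  const a22 = J2.reduce((q, x) => q + x * x, 0);
  const g1 = J1.reduce((q, x, i) => q + x * r[i], 0);
  const g2 = J2.reduce((q, x, i) => q + x * r[i], 0);
  const det = a11 * a22 - a12 * a12;
  const dVm = (a22 * g1 - a12 * g2) / det;
  const dKm = (a11 * g2 - a12 * g1) / det;

  Vm += dVm; Km += dKm;
  if (Math.abs(dVm) + Math.abs(dKm) < 1e-9) break;   // convergence tolerance
}
console.log('Vm ≈', Vm.toFixed(2), 'Km ≈', Km.toFixed(2));
```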

To illustrate the potential of the application in iterative nonlinear fitting, here is an example for enzyme kinetics using a competitive inhibition model.
The code allows you to adjust the seeds, tolerance, and number of iterations, and provides the graph (points/fitted line, manually), and statistical parameters of the regression.


Instructions

    1. Note that the code has a flag for true/false in the ajuste constant. I left it at false for data representation only (vectors x and y, or S and v);
    2. In the Plot area of the code, set the trace constant so that it also includes the data vectors;
    3. Click add plot to view the experimental points;
    4. To superimpose the fitted curve, update the constants ajuste and trace (vectors S_fit and v_fit), and choose lines in mode.
