Interactive Simulations for Biophysical Chemistry with JSPlotly & GSPlotly

To illustrate the potential use of JSPlotly for Biochemistry and related areas, here are some examples of simulations and data analysis, whose graphs are often found in textbooks and related content. To get the most out of each topic, try following the suggestions for parametric manipulation of the codes in each topic.


Instructions

1. Choose a topic;
2. Click on the corresponding graph;
3. Click on "Add Plot";
4. Use the mouse for interactivity and/or edit the code. 

Reminder: the editor uses infinite undo/redo in the code (Ctrl+Z / Shift+Ctrl+Z) !

General Biophysical Chemistry

1 Acid-base equilibrium and buffer system

Context:

The example illustrates the addition of a base in a system containing a weak acid and its conjugate base. The titration equation refers to a triprotic acid, the phosphoric acid of the buffer of the same name, but it would also work for many others, such as those present in the Krebs cycle (citrate, isocitrate).

Equation:

\[ fa= \frac{1}{1 + 10^{pKa1 - pH}} + \frac{1}{1 + 10^{pKa2 - pH}} + \frac{1}{1 + 10^{pKa3 - pH}} \] Where, fa = acid fraction (protonatable groups)


Suggestion:

"A. Converting the phosphate buffer (triprotic) curve to bicarbonate buffer (diprotic)"

1. Change the pKa values for the bicarbonate buffer: pKa1 = 6.1, and pKa2 = 10.3;
2. Set a very large value for pKa3 (e.g., 1e20).
3. Click "add plot."

Explanation: pKa is a term that represents the logarithm of a dissociation constant (-log[Ka]). With an extreme value, the denominator becomes equally large, canceling out the term that carries pKa3. In JavaScript and other languages, "e" represents the notation for power of 10.

"B. Converting the bicarbonate buffer curve to acetate"

1. Simply repeat the above procedure, with pKa1 = 4.75, and eliminating pKa2.


2 Ionization state of biomolecule chemical groups

Context:

JSPlotly also allows the observation of some concepts facilitated by animation-based learning together with graphical representations obtained by mathematical functions. In the following example, the interactive object illustrates the protonation state for the main chemical groups found in biomolecules.


Instruction:

  1. After clicking on the related image below, click on “add” and observe the acid-base titration graph generated. The abscissa axis (x-axis) has a wide range of pH values, and the ordinate axis shows the charge that the molecule with ionization potential manifests at each pH, as a function of its pKa value;

  2. Click on “Animate” to observe the displacement of a small ball along the titration curve, up to a pH value predefined in the simulator script, and as a function of its pKa;

  3. Modify the type of ionizable group (in “let grupo_edit =”) and/or the desired final pH (“let pH_final_edit =”) in the script to evaluate the ionization state of a new group;

  4. Click “Reset” and ‘Animate’ to view the new animation.

Suggestion:

1. Change the pH and the type of group you want to observe;
2. Simulate the ionization state of different molecules, such as drugs with carboxylic acid (e.g., acetylsalicylic acid...aspirin), in blood pH and in the gastric cavity;
3. Create a new ionizable group and predict its ionization state. To do this, simply change the information in 3 constants at the beginning of the script: groups, pKas, and types.
a.


3 Net charge network in peptides

Context:

The code refers to the net charge network present in any sequence of amino acid residues. Here we illustrate angiotensin II, an important peptide for regulating blood pressure and electrolyte balance, whose converting enzyme is associated with the mechanism of cell invasibility by SARS-CoV-2.

Equation:

There is no direct equation for this simulation, since the algorithm needs to decide the charge based on the basic or acidic nature of the residue at a given pH value (see the script). Thus:

\[ q = \begin{cases} -\dfrac{1}{1 + 10^{pK_a - pH}} & \text{(acid group)} \\\\ \dfrac{1}{1 + 10^{pH - pK_a}} & \text{ {(basic group)} \end{cases} \]

Where,

  • pKa = base 10 antilogarithm value for the acid dissociation equilibrium constant, Ka (or log[Ka]).


Suggestion:

1. Select the peptide sequence below and observe the charge distribution:

"Ala,Lys,Arg,Leu,Phe,Glu,Cys,Asp,His"

2. Simulate the pH condition of the stomach ("const pH = 1.5") and check the change in charges in the peptide. 

3. Select a physiological peptide (e.g., oxytocin), observe its charge in blood (pH 7.5), and reflect on its potential for electrostatic interaction with cellular components.

"Cys,Tyr,Ile,Gln,Asn,Cys,Pro,Leu,Gly" - oxytocin


4 Isoelectric Point in Proteins

Context:

The script for this simulation is based on the distribution of the charge network for a polyelectrolyte, and the identification of the pH value at which this network is zero, i.e., the isoelectric (or isoionic) point, pI. The example uses the primary sequence of lysozyme, a hydrolase that acts to break down the microbial wall.

Equation:

\[ q_{\text{net}}(pH) = \sum_{i=1}^{N} \left[ n_i \cdot q_{B_i} + \frac{n_i}{1 + 10^{pH - pK_{a_i}}} \right] \]
Where,

  • qnet = total net charge;
  • qB$_{i} = charge of the basic form for residue i (e.g., +1 for Lys, 0 for Asp);
  • n\(_{i}\) = number of groups of residue i.

Suggestion:

"Finding the pI for other proteins"

1. You can check the titration of any other protein or peptide sequence by simply replacing the primary sequence contained in the code. An assertive way to perform this replacement involves:
a. Searching for the ‘FASTA’ sequence of the protein in NCBI ("https://www.ncbi.nlm.nih.gov/protein/") - e.g., "papain";
b. Clicking on ‘FASTA’ and copying the sequence obtained in step 1;
c. Pasting the sequence into a website for residue quantification (e.g., "https://www.protpi.ch/Calculator/ProteinTool");
4. Replacing the sequence in the code.


5 Hydrophobicity in amino acid sequences

Context:

Observing amino acid hydrophobicity levels can be useful for studying and predicting residue sequences, such as those found in the central portion of transmembrane protein helices, as well as for identifying specific sites in enzymes, such as hydrophobic clefts and pockets.
These levels are classified relatively, giving rise to indices such as the Kyte–Doolittle scale, which assigns each amino acid a hydrophobicity value estimated from experimental and observational data. Positive values indicate a hydrophobic tendency (e.g., Ile, Val, Leu) and negative values indicate a hydrophilic tendency (e.g., Asp, Glu, Lys, Arg). The index allows us to highlight sections prone to nonpolar environments, as well as regions exposed to the aqueous solvent.

Equation:

\[ \tilde{y}_i \;=\; \frac{1}{w} \sum_{k=i-m}^{i+m} y_k, \qquad w = 2m+1 \]

Where,

\[\begin{aligned} y_i &:\; \text{hydrophobicity value at position $i$ of the sequence} \\ \tilde{y}_i &:\; \text{smoothed value (moving average) at position $i$} \\ m &:\; \text{half-window (number of neighbors on each side)} \\ w &:\; \text{total window width ($w=2m+1$)} \\ i &:\; \text{position of the amino acid in the sequence ($1 \leq i \leq N$)} \end{aligned}\]

Suggestion:

1. Try changing the sequence to a known one;
2. Compare a polar sequence with a nonpolar one by overlaying ("add");
3. Transcribe a transmembrane helix sequence of a protein by accessing its data on the PDB website


6 Interaction of oxygen with myoglobin and hemoglobin

Context:

The oxygen molecule can combine with the heme group of myoglobin and hemoglobin in different ways, depending on the cooperativity exhibited in the latter due to its quaternary structure. The following model exemplifies this interaction using the Hill equation.

Equation:

\[ Y= \frac{pO_2^{nH}}{p_{50}^{nH} + pO_2^{nH}} \]

Where

  • Y = degree of oxygen saturation in the protein;
  • pO₂ = oxygen pressure;
  • p₅₀ = oxygen pressure at 50% saturation;
  • nH = Hill coefficient for the interaction;


Suggestion:

1. Run the application ("add plot"). Note that the value of ‘nH’ for the Hill constant is "1," meaning there is no cooperativity effect.

2. Now replace the value of "nH" with the Hill coefficient for hemoglobin, 2.8, and run again!


7 Bohr effect on hemoglobin (pH)

Context:

Some physiological or pathological conditions can alter the binding affinity of oxygen to hemoglobin, such as temperature, certain metabolites (2,3-BPG), and the hydrogen ion content of the solution.

Equation:

\[ Y(pO_2) = \frac{{pO_2^n}}{{P_{50}^n + pO_2^n}}, \quad \text{with } P_{50} = P_{50,\text{ref}} + \alpha (pH_{\text{ref}} - pH) \]
Where,

  • Y = hemoglobin saturation,
  • pO₂ = partial pressure of oxygen (in mmHg),
  • P₅₀ = pressure of O₂ at which hemoglobin is 50% saturated,
  • P₅₀,ref = 26 mmHg (standard value),
  • \(\alpha\) = 50 (intensity of the Bohr effect),
  • pH\(_{ref}\) = 7.4,
  • n = 2.8 = Hill coefficient for hemoglobin.

8 Enzyme catalysis and inhibition

Context:

The simulation aims to provide a general equation for enzyme inhibition studies, covering competitive, non-competitive, and competitive (pure or mixed) models, also allowing the study of enzyme catalysis in the absence of inhibitors.

Equation:

\[ v=\frac{Vm*S}{Km(1+\frac{I}{Kic})+S(1+\frac{I}{Kiu})} \]
Where

  • S = substrate content for reaction;
  • Vm = reaction rate limit (in books, maximum rate);
  • Km = Michaelis-Mentem constant;
  • Kic = inhibitor dissociation equilibrium constant for competitive model;
  • Kiu = inhibitor dissociation equilibrium constant for incompetitive model


Suggestion:

"A. Enzymatic catalysis in the absence of inhibitor."
1. Just run the application with the general equation. Note that the values for Kic and Kiu are high (1e20). Thus, with high "dissociation constants," the interaction of the inhibitor with the enzyme is irrelevant, returning the model to the classic Michaelis-Mentem equation.
2. Try changing the values of Vm and Km, comparing graphs.
3. Use the geographic coordinates feature in the icon bar ("Toggle Spike Lines") feature to consolidate the mathematical meaning of Km, as well as to observe the effect of different values on the graph visualization.

"B. Competitive inhibition model."
1. To observe or compare the Michaelis model with the competitive inhibition model, simply replace the Kic value with a consistent number (e.g., Kic= 3).

"C. Incompetitive inhibition model."
1. The same suggestion above applies to the incompetitive model, this time replacing the value for Kiu.

"D. Pure non-competitive inhibition model."
1. In this model, the simulation is based on equal values for Kic and Kiu.

"E. Mixed non-competitive inhibition model."
1. For this model, simply allocate different values for Kic and Kiu.


9 Lineweaver-Burk treatment of Michaelis-Mentem data

Context

One of the (many) difficulties in learning biochemistry is the literacy required to interpret graphs. Among these, the hyperbolic nature of the Michaelis model is treated alternatively by various transformations of the variables S and v, with the double reciprocal linearization, or Lineweaver-Burk transformation, standing out in the literature.
The following example illustrates the relationship between the dependent and independent variables of the original and transformed models, using another feature of the Plotly.js library: the possibility of cross-selection of points.
To observe this functionality, select a set of points in one of the subplots of the graph below, and observe their automatic selection in the other subplot.

9.1 Equation

The Lineweaver-Burk linear transformation is given below:

\[ \frac{1}{v} = \frac{1}{S}*\frac{Km}{Vm} + \frac{1}{Vm} \]

Suggestion

1. Try selecting the extremes of the Michaelis-Mentem graph (hyperbolic curve), and observe the same points in the double reciprocal. Which values are more reliable in the latter, the first or the last?

2. Research other forms of linearization (e.g., Eadie-Hofstee; “v/S” X “v”), and see how the transformation and selection of points presents itself. To do this, change the constants below:
const invS = S.map(s => 1/s);
const invV = V.map(v => 1/v);


10 Thermodynamic stability of nucleic acids

Context:

Several factors contribute to the thermodynamic stability of biopolymers, either stabilizing or destabilizing them. This simulation deals with a thermal denaturation curve for DNA in the presence or absence of some of these compounds: trehalose (stabilizing osmolyte) and guanidine chloride (destabilizer).

Equation:

\[ y(T) = \frac{1}{1 + e^{-\frac{T - T_m}{\beta}}} \]

Where,

  • y(T): fraction of denatured DNA at a given temperature T;
  • Tm: transition temperature (melting, temperature at which 50% of molecules are double-stranded and 50% are single-stranded);
  • \(\beta\): parameter that adjusts the slope of the curve (affected by trehalose and guanidine).

Note:

  1. Trehalose as a stabilizer (reduces \(\beta\));
  2. Guanidine as a denaturant (increases \(\beta\));

Suggestion:

1. Try testing various conditions involved in the simulation, such as:
a) variation of Tm;
b) variation of the $\beta$ parameter;
c) variation of trehalose content;
d) variation of guanidine chloride content.

11 Van der Waals equation for ideal gases

Context:

An adaptation that relates the thermodynamic quantities of pressure, volume, and temperature for ideal gases is the van der Waals equation. In this equation, coefficients are computed that estimate the existence of a volume and interparticle interactions. Thus, the van der Waals equation corrects the ideal gas equation by considering a term to compensate for intermolecular forces (a/V\(^{2}\)) and the available volume, discounting the volume occupied by the gas molecules themselves.
The simulation offers additional interactivity through the presence of sliders for temperature and for the finite volume (b) and particle interaction (a) coefficients.

Equation:

\[ P = \frac{RT}{V - b} - \frac{a}{V^2} \]

  • P = gas pressure (atm);
  • V = molar volume (L);
  • T = temperature (K);
  • R = 0.0821 = ideal gas constant (L·atm/mol·K);
  • a = intermolecular attraction constant (L\(^{2}\)·atm/mol$^{2})
  • b = excluded volume constant (L/mol)

Suggestion:

1. Try varying the parameters of the equation using the slider for temperature, as well as for coefficients "a and b".
2. Discuss which of the coefficients has the greatest effect on the curve profile, and the reason for this.


12 Balance of ATP production from reagents, temperature, and pH

Context:

Intracellular ATP production involves the classic chemical equilibrium relationship between reagents and products as a function of reaction temperature, adjusted to a specific pH value. By varying one and/or other reagent content, or physical-chemical parameter, it is possible to quantify the product by the following thermodynamic reaction:

Equation:

\[ \Delta G = \Delta G^{\circ'} + RT \ln\left(\frac{[\text{ADP}] \cdot [\text{P}_i]}{[\text{ATP}]}\right) + 2{,}303 \cdot RT \cdot n_H \cdot \text{pH} \]

Where,

  • \(\Delta\)G = Gibbs energy of the reaction (positive for spontaneously unfavorable synthesis, kJ/mol);
  • \(\Delta\)G\(^{o'}\) = 30.5 kJ/mol standard biological Gibbs energy for ATP synthesis;
  • R = 8.314 J/mol/K (universal gas constant);
  • T=310 K (physiological temperature);
  • nH\(^{+}\) = 1 (number of protons involved in the reaction);
  • [ADP], [Pi], [ATP] = molar concentrations of reactants and product

Suggestion:

1. Change the quantities involved in the expression and compare with previous visualizations. For example, temperature, pH, and ADP and Pi levels.

13 Variation of Gibbs energy with temperature

Context:

The Gibbs-Helmholtz relation predicts that Gibbs energy varies nonlinearly with temperature in reactions involving a change in the heat capacity of the system (\(\Delta\)Cp). This expanded form of the Gibbs equation for variable heat capacity is present in various biochemical phenomena, such as in the phase transition of biomembrane structures subjected to a challenge from compounds, or in the conformational change that accompanies the protein structure under heating. \

Equation:

\[ \Delta G(T) = \Delta H^\circ - T\,\Delta S^\circ + \Delta C_p \left(T - T_0 - T \ln\left(\frac{T}{T_0}\right)\right) \]

Where,

  • \(\Delta\)G(T) = Gibbs energy of the reaction at each temperature value, kJ/mol);
  • \(\Delta\)H\(^{o}\) = standard enthalpy of the reaction at T\(_{0}\), usually 298 K (J/mol);
  • \(\Delta\)S\(^{o}\) = standard entropy of the reaction at T\(_{0}\);
  • \(\Delta\)Cp = change in heat capacity of the reaction (J/mol·K), assumed to be constant with temperature;
  • T = temperature of interest (K);
  • T\(_{0}\) = reference temperature, usually 298 K.
  • R = 8.314 J/mol/K (universal gas constant);

Suggestion:

1. Try varying one or more parameters of the expression;
2. Test the behavior of the Gibbs curve at a high reference temperature (simulation for extremophile organism);
3. Simulate the situation where the variation in heat capacity is zero

14 Electrochemical impedance spectroscopy (EIS)

Context:

Electrochemical impedance spectroscopy (EIS) is an analytical technique that measures the response of an electrochemical system to a small amplitude sinusoidal disturbance (e.g., 5-10 mV), applied over a wide range of frequencies (e.g., 100 kHz-1mHz).. EIS provides information about interfacial processes, such as charge transfer, ion diffusion, and passive film formation, through the analysis of the complex impedance of the system. It is a non-destructive technique widely used in areas such as corrosion, batteries, fuel cells, biosensors, and protective coatings.

Equation:

An electrochemical impedance (EIS) model with typical equivalent circuit components constitutes the modified Randles model: series resistor Rs, a resistor Rp in parallel with a CPE (Constant Phase Element), and a Warburg element. A common representation for EIS is the Nyquist plot, where the real axis represents resistance (Zre) and the imaginary axis represents negative reactance (−Zim). The formalism involved is in the equation below:

\[ Z_{\text{total}}(\omega) = R_s + \left[ \left( \frac{1}{R_p} + Q (j\omega)^n \right)^{-1} \right] + \frac{\sigma}{\sqrt{\omega}} (1 - j) \]

Where:

  • Z\(_{total}\) = total impedance at a given frequency;
  • \(\omega\) = Angular frequency (rad/s), \(\omega\)=2πf or \(\omega\)=2πf;
  • R\(_{p}\) = Ohmic resistance (resistance of the electrolytic solution, wires, contacts, etc.);
  • R\(_{p}\) = Polarization resistance (associated with charge transfer processes, such as electrochemical reactions);
  • Q = Constant associated with the constant phase element (CPE), replaces an ideal capacitor to represent non-ideal behaviors;
  • n = CPE exponent, between 0 and 1; defines the degree of ideality of the capacitive behavior (n = 1: ideal capacitor; n < 1: dispersion);
  • \(\sigma\) = Warburg coefficient, associated with ion diffusion in the electrochemical system;
  • j = Imaginary unit, j\(^{2}\) = −1;
An electrochemical impedance model (EIS) with typical equivalent circuit components constitutes the modified Randles model: series resistor Rs, a resistor Rp in parallel with a CPE (Constant Phase Element), and a Warburg element. A common representation for EIS is the Nyquist plot, where the real axis represents resistance (Zre) and the imaginary axis represents negative reactance (−Zim).

Suggestion

1. Check the effect of Rs on the graph by canceling its value (solution resistance);
2. Observe the deformation of the semicircle by varying the values of the constant phase element (e.g., Q = 1e-3; n = 0.6 - dispersion of capacitive behavior); 
2. Try combining other values of the code header parameters to highlight common situations in electroanalysis: Rs, Rp, Q, n, and sigma;
3. Reduce the Warburg model with constant phase element to a simple Randles model, consisting only of two resistors in series, the second in parallel with an ideal capacitor, and without Warburg diffusion (sigma = 0; n = 1).

15 Cyclic voltammetry

Context:

Cyclic voltammetry (CV) is an electroanalytical technique for characterizing interfacial (electrode-solution) redox processes. When a standard analyte in quantitative equilibrium between the oxidized and reduced forms is used, the resulting graph acquires an unusual profile, called a duck shape. This results from the peculiar shape obtained when a bidirectional potentiometric scan is performed on a set of electrodes by a potentiostat, resulting in cycles of oxidation and subsequent reduction (application of positive potentials followed by negative potentials). The shape, value, and distance between the resulting peaks allow the analyte to be identified and characterized, as well as the redox processes involved in the experiment.

Equation:

Although the formalism for cyclic voltammetry is complex and involves solving ordinary differential equations (as well as electroanalytical techniques in general), which is beyond the scope of this material, the resulting current flow can be summarized by the following Buttler-Volmer equation:

\[ j = j_0 \left[ \exp\left(\frac{\alpha n F (E - E^0)}{RT}\right) - \exp\left(\frac{-(1 - \alpha) n F (E - E^0)}{RT}\right) \right] \] Where:

  • j = current density;
  • j\(_{0}\) = exchange current;
  • \(\alpha\) = charge transfer coefficient;
  • E = potential applied to the electrode;
  • E\(^{0}\) = standard electrode potential;
  • n = number of electrons;
  • F = Faraday constant (96485 C·mol⁻¹)
  • R = universal gas constant (8.314 J·mol⁻¹·K⁻¹);
  • T = temperature
For a model of electroanalyte diffusion and accumulation at the electrode surface, the general equation for the boundary condition where the redox conversion rate is proportional to the diffusion current can be represented by Fick’s 2nd Law for mass transport:

\[ \frac{\partial C(x,t)} {\partial t} = D \frac{\partial^2 C(x,t)} {\partial x^2} \]

Where:

  • C(x,t) = concentration of the electroanalytical species (mol/cm³), as a function of position x and time t;
  • D = diffusion coefficient of the species (cm²/s);
  • t = time (s);
  • x = distance from the electrode surface (cm);
  • \(\delta\) = notation for partial derivative.
The chemical equilibrium of the redox reaction is given by the ratio between the concentrations of the oxidized and reduced forms at the electrode/solution interface when the system is in equilibrium (or near equilibrium), according to the Nernst equation:

\[ E = E^0 + \frac{RT}{nF} \ln \left( \frac{[\text{Ox}]}{[\text{Red}]} \right) \]

The resulting current is obtained by Faraday’s law:

\[ i(t) = n F A D \left. \frac{\partial C(x,t)} {\partial x} \right|_{x=0} \] Where:

  • A = electrode surface area (cm²);
  • i(t) = current at time t (A, Ampere).

Cyclic Voltametry Simulator

To illustrate the potential of the JSPlotly/GSPlotly platform for developing complex virtual experiments, here is a cyclic voltametry simulator resulting from the above procedures (as well as a few hours of prompt engineering!!).
The example illustrates typical behavior in cyclic voltammetry (and the curious duck shape), and includes a model of diffusion and accumulation of electroactive species near the electrode surface. The code uses a numerical solution to obtain the current density over time, considering the effects of cyclic scanning of the applied potential, mass transport, and reaction kinetics.
In short, it numerically solves Fick’s equation with dynamic boundary conditions, applies the Nernst equation at the boundary (electrode surface) to determine the conversion between species, uses the finite difference method or explicit scheme to advance in time, and obtains the current by applying Faraday’s Law, through the derivative of the flux at the interface.
Find it complex? Then how about this: the code combines Fick (transport) and Nernst (local redox equilibrium) to accurately model the shape of the voltammogram, obtaining a physical simulation with significant realism for cyclic voltammetry.


Suggestion:

1. Note the number of values at the beginning of the code, which are tangible to "parametric manipulation." Try to understand what they represent, and consciously vary their values, aiming to add value to the learning of the simulation. This is the essence of "parametric manipulation" that involves "reproducible teaching"!

16 Diagrams and flowcharts

Context: Diagram

Due to the nature of the Plotly.js library, it is also possible to generate code in JSPlotly without involving an equation, such as for representing diagrams and flowcharts, among others. The example below illustrates a representation of the Krebs cycle, and the following is an experimental flowchart.

Suggestion:

1. Try to better reposition enzymes and metabolites by simply clicking and dragging the terms;
2. Try replacing the names in the code to produce another metabolic cycle, such as the urea cycle.

Context: Flowchart

Suggestion:

1. Reposition terms and connectors by dragging with the mouse;
2. For a different flowchart in the content, modify the terms in the code;
2. For a different flowchart in the format, change the font and connector characteristics in the "annotations" constant.



Data Analysis

17 User data insertion

Context

Sometimes it is interesting to work with the discrete data itself, seeking a trend, mathematical or simply visual, for the relationship between variables. Here is an example of an interactive graph with user data.


Suggestion:

1. Try changing the data entered, overlaying the graph or not;
2. Try changing the data representation in "mode" and "type" to points, lines, points+lines, bars.


18 Loading file for analysis

Context - CSV file

Sometimes, it is necessary to analyze results related to a data set in a spreadsheet. For this situation, the following example illustrates the loading of two data vectors from a CSV (comma separated value) file, a format commonly used in exporting/importing spreadsheets, and its resulting graph.

Instructions:

  • 1 Click add plot and select a CSV file from the browse button at the top. Note: variable X in the 1st column of the file, and variable Y in the 2nd column;
    1. Click add plot again to view the resulting graph.
1. Try other CSV files;
2. Vary the aspects of the graph, such as type, color, marker size, etc.

19 Linear data adjustment

Context:

In various experimental research situations, it is necessary to correlate the data of a dependent variable with a predictor variable using the following equation.

Equation:

\[ y = \alpha x + \beta + \varepsilon \] Where:

  • y = dependent variable;
  • x = independent variable;
  • \(\alpha\) = slope of the adjusted line;
  • \(\beta\) = intercept of the adjusted line;
  • \(\epsilon\) = measurement error.
The slope and intercept parameters can be obtained through several procedures, such as least squares, matrix algebra, or simple summaries. In the latter case, the calculation of parameters and error can be computed as follows:

\[ \alpha = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} \]

\[ \beta = \frac{\sum y_i - \alpha \sum x_i}{n} \]

\[ \hat{y}_i = \alpha x_i + \beta \]

\[ \varepsilon_i = \left| y_i - \hat{y}_i \right| \]

A representation for linear fitting with arbitrary data is provided below:


Suggestion:

1. Display the points overlapping or not overlapping the adjustment line. For overlap, choose "showPoints = true" and "showLine = true";
2. Change the data and perform a new adjustment to obtain other parameters of the line.

20 Polynomial regression

Context:

In some situations, linear adjustment does not provide an adequate trend for modeling experimental data, which leads to the use of other models, such as exponential, hyperbolic, logarithmic, power, and polynomial. Polynomial trends are common in natural phenomena, with mathematical treatment of data occurring by different methods, such as simple sums, least squares, or linear algebra. Here is an example of polynomial regression with adjustable degrees in the code.

20.1 Equation

Error minimization for the polynomial curve is obtained by generalized least squares regression and application of the Vandermonde matrix, as follows.
Let the vectors be x=[x\(_{1}\),x\(_{2}\),…,x\(_{n}\)] and y=[y\(_{1}\),y\(_{2}\),…,y\({_n}\)], we want to adjust the polynomial:

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_g x^g \] | Thus, the Vandermonde matrix is constructed as:

\[ X = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^g \\\\ 1 & x_2 & x_2^2 & \cdots & x_2^g \\\\ \vdots & \vdots & \vdots & & \vdots \\\\ 1 & x_n & x_n^2 & \cdots & x_n^g \end{bmatrix} \]

To find the coefficients \(\beta\) that minimize the quadratic error, solve the linear system in sequence:

\[ \boldsymbol{\beta} = (X^T X)^{-1} X^T \mathbf{y} \]

Where:

  • T represents the transposed matrix

Suggestion:

1. Try degree 1 for the polynomial, i.e., a reduction of the treatment to linear adjustment;
2. Change the formatting of labels, colors, sizes, etc., in the code;
3. Overlap some adjustments, edit, and reposition the legend;
4. Test the code with another data vector.

21 Multilinear regression

Context:

Multiple linear adjustment is very useful when you want to predict behavior based on two or more predictor variables. The procedure involves a design matrix (or projection operator) similar to the Vandermonde matrix.

Equation:

The analytical solution for the design matrix involves solving a matrix of coefficients \(\beta\) for a set of predictor values of the response vector y, as follows:

\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i \]

\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y} \]

Where: - \(\beta\) = coefficient vector; - y = response vector; - X = design matrix; - \(\epsilon\) = random noise

Thus, predicted values and associated error are obtained by:

\[ \hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol{\beta}} \] \[ \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} \]

Obviously, the graphical representation for a multilinear fit is limited to two independent variables, x1 and x2, in addition to the response variable (3D graph), although the matrix calculation allows for several predictor variables (hyperdimensional space). Tangent to the first situation, here is an example.

Instructions - 1. Note that there is a Boolean flag (showFit) at the beginning of the code: false for data only, and true for the fit. - 2. You can click on add plot to view the data with the flag set to false, followed by another add plot with the flag set to true.


Rotate the graph and observe the surface around the points. Unlike a linear fit, the points are distributed non-continuously in space.

Suggestion:

1. QSAR ("Quantitative Structure-Activity Relationship") results use multilinear adjustment analysis to identify the strength of predictor variables. Try a dataset on the internet that has 2 variables (e.g., concentration, pH, compound A, B, etc.).
2. For more predictor variables, see the code below !!

21.1 Context: Multiple linear adjustment with 3 or more predictor variables

Although JSPlotly was designed to produce interactive graphs and maps using the Plotly.js library on top of the JavaScript language, it extends the application possibilities beyond graphical representations. An example of this permeates the following code, for a multilinear adjustment for a situation with 4 predictor variables.

Suggestion:

1. Try varying the number of predictors (xi). To do this:
a. Without reducing, simply remove the desired vector(s) and correct their quantity in the line: "const X = x1.map((_, i)";
b. If you want to increase, add a new vector in the corresponding line and update the quantity in the mapping of "const X = x1.map((_, i)".

22 Response Surface Methodology (RSM)

A multilinear adjustment variation involves the response surface methodology, commonly linear (flat) or quadratic (curvilinear), applied in modeling and optimization of experimental processes with multiple independent variables. Its objective is to adjust a mathematical equation that represents the relationship between the experimental factors and the response variable, allowing behaviors to be predicted and optimal conditions to be identified.
For experiments with two factors, a second-order quadratic model is commonly used, which allows for the representation of curved surfaces. The central idea of the experiment is to select values and their codes (-1,0,1), representing, respectively, a quantity below, an average quantity, and a quantity above, for a given predictor variable.

Equation:

The general quadratic model for two factors is represented by:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon \]

With the matrix form of the model represented by:

\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

And the estimation of the coefficients \(\beta\) by least squares, similar to multilinear adjustment, by:

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y} \]

Suggestion:

1. As suggested earlier, try using data from the literature or another source whose responses are already known, replacing the respective vectors at the beginning of the code. This allows you to compare the effectiveness of using the tool presented.

23 Data smoothing - Spline and Savitzky-Golay filter

In experimental research, signal detection occasionally occurs in the presence of significant levels of noise, such as in continuous spectral scanning under wavelength variation, or in the detection of electrical signals over time. In this case, there is no analytical equation that describes the behavior of the signal, although some digital processing tools can rely on a trend curve in data processing, significantly reducing the noise level, such as by spline curve and Savitzky-Golay filter.

Context - Cubic spline

A very common interpolation technique involves the use of cubic splines by degree 3 polynomials interconnecting data windows. The treatment consists of applying polynomial functions in subintervals of the points, promoting smoothing between data windows. Mathematically, the spline is applied in such a way that:

\[ S_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3 \]

Where:

  • S\(_{i}\)(x) = cubic spline function in the interval [xi,xi+1];
  • a\(_{i, b\){i}\(, c\){i}\(, d\)_{i}$ = specific coefficients for segment i;
  • x\(_{i}\) = starting point of the interval;
  • x = independent variable.

23.1 Context - Savitzky-Golay filter

The Savitzky-Golay filter smoothing technique applies a local polynomial regression to a moving window, using least squares and a predetermined degree, preserving peak, valley, and band signals. Unlike the cubic spline, the filter produces a continuous global curve, rather than a set of interconnected curves. Mathematically:

\[ \tilde{y}_i = \sum_{j=-m}^{m} c_j \cdot y_{i+j} \]

Where:

  • y\(_{i}\) = smoothed value at position i;
  • y\(_{i+j}\) = actual values of the series within the window;
  • c\(_{i}\) = filter coefficients derived from a polynomial regression;
  • m = number of points on each side of the central window.
Here is an example of code for JSplotly that illustrates the use of the two interpolation methods.


Instructions:

    1. The code has two true/false flags, one for “useGolay” and another for “useSpline”;
    1. View the raw data by setting both flags to false;
    1. To overlay a cubic spline, change its constant to true;
    1. To overlay the Savitzky-Golay filter, change its constant to true;
    1. To adjust the filter, modify its parameters in the window (sliding window) and/or degree (degree of the polynomial) constants.

Suggestion:

1. Compare the smoothing effects of the two polynomial interpolation treatments;
2. Apply only the filter, adjusting the moving window and polynomial degree parameters;
3. Try interpolation with other raw data in the code, replacing the constants "x_values" and "y_raw" with numerical vectors;

24 ANOVA and Tukey-HSD test

Context:

In research, it is common to analyze a data set by comparing the variance between groups with that obtained within each group, or Analysis of Variance (ANOVA). Once statistical information has been obtained showing that there is a difference between groups that is greater than the internal difference, a post-hoc test is performed to compare means, such as the Tukey-HSD test (Honest Significant Difference).

Equation:

The equation that characterizes a Tukey-HSD test is provided below (correction for unbalanced groups):

\[ \text{HSD}_{ij} = q_{\alpha} \cdot \sqrt{\frac{\text{MS}_{\text{within}}}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} \]

Where:

  • HSD = minimum value of the difference between two means for it to be considered significant;
  • q\(_{\alpha}\) = critical value of the studentized distribution q (Tukey distribution), depending on the number of groups and the significance level \(\alpha\);
  • MS\(_{within}\) = mean square within groups (error) obtained from ANOVA;
  • n: number of observations per group (i and j if there are groups of different sizes).

Suggestion:

1. You can click and drag the minimum difference markers (a, b, etc.) above the graph for better visual adjustment, as well as the ANOVA table and the statistical test result;
2. Change the vector values, redo the simultaneous analysis and graph, and notice the difference in the values obtained. Note that, as stated at the bottom, you need to refresh the website page to clear previous data retained in the cache;
3. Enter other data in the respective vectors, either your own or from another source;
4. Enter new data vectors, or delete some, for a customized calculation and graph.

25 Nonlinear data adjustment

Modeling is a common area of application in the natural sciences when mathematical prediction or correlation of experimental data is desired. Models can be linear or nonlinear. When nonlinear, they can be applied empirically to the data (e.g., quadratic, logarithmic, exponential, or power functions) or based on a mathematical function that explains the model.

In this case, techniques for nonlinear adjustment of the data to the model are used, such as Simplex, Gauss, Newton-Raphson, Levenberg-Marquardt, or least squares, for example. Regardless of the method, nonlinear fitting differs from linear fitting in its assumptions and treatment of data (see a quick explanation here.

Context: Ligand-protein interaction (grid search algorithm) {.unnumbered}

The following example illustrates the application of a nonlinear adjustment of molecular interaction data using the Langmuir model and treatment by quadratic error minimization by manual scanning. In this treatment, the sum of quadratic errors (RMSE) is calculated for combinations of a and b, selecting the pair with the lowest calculated error. In this case, for simplicity, although JSPlotly works with more specific libraries, the nonlinear regression is performed by least squares, but using grid search rather than classic iterative methods. Thus, a set of values is tested for the pair (a, b) in a previously defined grid (combination table), allowing the identification of the parameters that best describe the experimental data according to the minimum error criterion.


Equation:

The Langmuir equation is represented below:

\[ Y = \frac{aX}{K_d + X} \] Where:

  • Y = fraction of occupied sites;
  • X = concentration of free ligand;
  • a = saturation value (sites of the same affinity completely occupied);
  • Kd = equilibrium dissociation constant of the complex.
In this case, the nonlinear least squares function is:

\[ SSR(a, K_d) = \sum_{i=1}^n \left( y_i - \frac{a x_i}{K_d + x_i} \right)^2 \]

Where:

SSR = sum of the squares of the residuals (inversely proportional to the quality of the fit)

Suggestion:

1. As in the previous example, alternately display the points and the adjustment curve using the "Boolean" operation ("showPoints = true/false"; "showCurve = true/false");
2. Change the data and perform a new fit to obtain other parameters of the equation (Kd, SSR, R², adjusted R²). Note: Adjusted R² corresponds to the value of R² corrected for the number of parameters in the model (in this case, "2"), directly affecting the "degrees of freedom" for the fit.


Context - Competitive inhibition in enzyme kinetics (Gauss-Newton algorithm)

The Gauss-Newton method iteratively adjusts nonlinear models to experimental data sets. It is especially efficient when the model is quasi-linear with respect to the parameters and the residuals are moderately small. The goal is to find the parameters \(\theta\) that minimize the sum of the squares of the residuals between the observed data yi and the values predicted by the model f(xi,\(\theta\). Unlike the full Newton-Raphson method, Gauss-Newton avoids the explicit calculation of the Hessian, using an approximation based on the Jacobian of the residuals.

Equation:

This approximation results in a linear system that can be solved at each iteration to update the model parameters. The central formula of the method consists of multiplying the transposed Jacobian JT by the Jacobian J itself, forming a symmetric matrix that approximates the Hessian. The residual vector r is then used to determine the update direction \(\Delta\theta\), with the following equation:

\[ \theta_{k+1} = \theta_k + \left(J^T J\right)^{-1} J^T \cdot r \]

To illustrate the potential of the application in iterative nonlinear fitting, here is an example for enzyme kinetics using a competitive inhibition model.
The code allows you to adjust the seeds, tolerance, and number of iterations, and provides the graph (points/fitted line, manually) and statistical parameters of the regression.


Instructions

    1. Note that the code has a flag for true/false in the ajuste constant. Leave it at false for data representation only (x and y vectors, or S and v);
    1. In the Plot area of the code, set the trace constant so that it also includes the data vectors;
    1. Click add plot to view the experimental points;
    1. To superimpose the fitted curve, update the ajuste and trace constants (vectors S_fit and v_fit), and choose lines in mode.

Experimental Simulation

26 Characterization of amino acids and chains

Context:

JSPlotly allows you to work with a set of interactions that are not necessarily linked to equations and their graphical representations. The following example illustrates its potential for simulating a classic experiment in biochemistry: the characterization of amino acids, peptides, and proteins by chromogenic reactions in test tubes. As the scope of this material is to present and facilitate the use of various OIERs (interactive reproducible teaching objects), users are advised to search for sources of information related to the reactions briefly described below.

Simulator instructions

The simulator allows two options: a) discover the nature of a sample or b) discover the color pattern of reactions for a chosen sample.


Option A: discover the nature of a sample:

Here you can “play” to check whether you are dealing with a sample of protein/peptide (chain), an amino acid (AA), a specific mixture of AAs, or AA(s) present in a chain. To do this:
  1. Click on the image below and select “add.” A set of seven colored bars will be generated, representing the colors obtained in test tubes for seven different reactions for protein characterization: ninhydrin, biuret, Pauly, xanthoprotein, Sakagushi, Millon, and sulfur (Cys).

  2. Discover the type of sample based on the color pattern of the tubes presented, providing a guess. The guess must be by type (in the “Guess (type)” menu - AA, chain, or mixture of AAs) and by profile (in the “Guess (profile)” menu - random, specific AA(s), or specific mixture).

  3. After selecting from the two menus, click “Check.” If you are correct, a ‘Score’ just below will calculate the number of correct answers/number of samples “played.”

  4. If you are wrong, try again or look for an explanation of the color pattern in “Explain”;

  1. There are two other possibilities: “Hard mode”, which hides the names of the reactions, while maintaining their sequence, and “Shuffle”, which changes the order of the tubes;

  2. For a new sample, just click on “New sample”;

Option B: discover the color pattern of a reaction for a selected sample:

  1. Select a sample in ‘Generate’, and click on “New sample”. A set of tubes will be generated with the expected reaction pattern for the sample.

Suggestion:

1. Try clicking on "New sample" to check the frequency of possibilities;
2. Try the two "game" operation options (yes, it's a game... after all, it has a scoreboard!);
3. Try "hard mode" (no reference to reactions in sequence) or "shuffle" (swap the order of reactions). Obviously, however, avoid both simultaneously!

Application

27 ZeMol - 3D viewer for molecular models

Context:

Through the combined features of JavaScript and the Plotly.js library, it is possible to extend the application potential of JSPlotly to build true HTML applications. A clear example of this scope is illustrated below, for a molecular viewer similar in nature to some pre-existing ones used in teaching and research (Jmol, Pymol, etc.).
Although it is obviously much smaller in scope compared to other applications, the so-called “ZeMol” has less than 70 kB of memory and offers features such as:
  • models loaded online from the PDB or PubChem websites;
  • models loaded from the device’s physical memory (PDB, MOL, SDF, XYZ, TXT formats);
  • PDB file identification information (or PubChem file name);
  • header containing the number of chains, ligands, ions, and disulfide bonds;
  • identification of only the \(\alpha\) carbons of each residue of the polypeptide skeleton, facilitating model visualization;
  • automatic visualization of disulfide bonds;
  • identification of residues, ligands, and ions by hovering the mouse over the model;
  • checkbox for viewing the secondary structure of proteins (helix, \(\beta\)-sheet);
  • checkbox for residue polarity index;
  • sliders for CA carbon size and model opacity (useful for highlighting crevices, 2a structure, carbon chain, ligands, ions);
  • hide/show different parts of the model by clicking on its legend (chain, ions, helix/sheets, ligands);
  • Highlighting of AAs (individual, sequence, all of one type, predefined groups in the script);
  • Self-sufficient HTML when saving, “freezing” the model in the characteristics you want to present;
  • Zoom on mobile devices (one finger fixed on the screen and the other dragging the model) and Pan (moving the model with two fingers on the screen) - actions also present in the icons above the model;
  • optional adjustments in the script itself (sizes, colors, thicknesses, AA groups, line type and thickness, for example).

Suggestion:

1. Load a model with PDB code, or a molecule from the PubChem website;
2. If it is a protein, try the separate and combined effects of the sliders for CA size and opacity;
3. Observe the polarity of protein residues in the checkbox of the same name;
4. Try zooming in/out or moving it around the screen with the mouse, or do it on a smartphone as instructed above;
5. Show or hide each chain or all chains, ligands, and ions by clicking on the respective terms in the legend;
6. Select residues of interest from the model, such as catalytic sites, coenzyme binding sites, or activity regulation sites. To do this, type in the corresponding field and click "highlight." Some suggestions:
a. Individual: Arg15, ASP117, 15, A:15, A:ARG15;
 
b. Ranges: 15-30, A:15-30, Arg15-Asp30; 
c. Multiple residues separated by commas: "Arg15, Asp117";
d. Groups: "aromatic," "polar," "nonpolar," "basic," "acidic," "small."
Back to top