HIV-1 Protease Dataset

Let's consider the HIV-1 protease dataset (Pintro & de Azevedo, 2017), for which both crystallographic structures and inhibition constant (Ki) data are available. A total of 71 structures satisfy both criteria; a table listing them is available. Evaluating binding affinity with the scoring functions available in the program Molegro Virtual Docker (Thomsen & Christensen, 2006) yielded Spearman's correlation coefficients (ρ) ranging from -0.245 to 0.38, with the highest correlation obtained for the Interaction Score. The figure below shows the scatter plot of predicted vs experimental binding affinity for all 71 structures.

Scatter plot for Interaction Score vs log(Ki) for the HIV-1 protease dataset. In the plot, au represents arbitrary units.
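Spearman's ρ is the Pearson correlation computed on ranks instead of raw values, so it measures how well a scoring function orders the complexes by log(Ki) rather than whether the relationship is linear. A minimal pure-Python sketch follows, assuming tied values receive the average of their ranks; the function names and the five score/log(Ki) pairs in the usage lines are hypothetical and are not taken from the HIV-1 protease dataset.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j become ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical docking scores (au) and log(Ki) values for five complexes.
scores = [-120.5, -98.2, -135.0, -110.3, -87.9]
log_ki = [-8.1, -7.0, -9.2, -6.5, -6.9]
print(round(spearman_rho(scores, log_ki), 3))  # 0.7
```

Because only ranks enter the calculation, any monotonic rescaling of a score (for example, dividing by the number of heavy atoms when all ligands have the same size) leaves ρ unchanged.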

The Polscore methodology implemented in the program SAnDReS (Xavier et al., 2016) makes it possible to test different scoring schemes using polynomial equations whose terms are taken from the original scoring functions generated by molecular docking programs. Here, we consider polynomial equations involving the PLANTS (Korb et al., 2009), Interaction, and Ligand Efficiency 3 (LE3) Scores. We generated a total of 511 new polynomial scoring functions using SAnDReS. The table below summarizes the results for the training and test sets, for the original scoring functions and for the top-ranked polynomial equation. The best result was obtained for polynomial equation 504 (Score504), with ρ = 0.525 (p-value < 0.001) for the training set (51 structures) and ρ = 0.368 (p-value = 0.1106) for the test set (20 structures). The figure below shows the scatter plot for Score504 vs log(Ki) for the training set.
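The count of 511 is consistent with one simple construction, which is an assumption here rather than the documented Polscore scheme: a second-order polynomial in three scores (x1, x2, x3) has nine non-constant terms (three linear, three squared, and three cross products), and there are 2^9 - 1 = 511 non-empty subsets of nine terms. The sketch enumerates those subsets, fits each candidate to log(Ki) on the training set by ordinary least squares (also an assumption), and keeps the candidate with the highest training Spearman's ρ. All names (TERMS, fit, best_equation, and so on) are hypothetical.

```python
import itertools
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Triple = Sequence[float]  # hypothetical (x1, x2, x3) scores for one complex

# Assumed term set: the nine non-constant monomials of a quadratic in three
# scores. Illustrates how 2**9 - 1 = 511 candidates can arise; it is not
# the documented Polscore term set.
TERMS: Dict[str, Callable[[Triple], float]] = {
    "x1": lambda s: s[0],
    "x2": lambda s: s[1],
    "x3": lambda s: s[2],
    "x1^2": lambda s: s[0] ** 2,
    "x2^2": lambda s: s[1] ** 2,
    "x3^2": lambda s: s[2] ** 2,
    "x1*x2": lambda s: s[0] * s[1],
    "x1*x3": lambda s: s[0] * s[2],
    "x2*x3": lambda s: s[1] * s[2],
}


def candidate_term_sets() -> List[Tuple[str, ...]]:
    """Every non-empty subset of TERMS (511 subsets for nine terms)."""
    names = list(TERMS)
    return [combo for k in range(1, len(names) + 1)
            for combo in itertools.combinations(names, k)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(terms: Tuple[str, ...], data: Sequence[Triple], y: Sequence[float]) -> List[float]:
    """Least squares via the normal equations; returns [intercept, c1, ...]."""
    rows = [[1.0] + [TERMS[t](s) for t in terms] for s in data]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(ata, aty)


def predict(terms: Tuple[str, ...], coef: List[float], s: Triple) -> float:
    return coef[0] + sum(c * TERMS[t](s) for c, t in zip(coef[1:], terms))


def average_ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks with ties averaged (O(n^2), fine for small datasets)."""
    s = sorted(v)
    rev = s[::-1]
    return [(s.index(x) + len(s) - 1 - rev.index(x)) / 2 + 1 for x in v]


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of ranks; returns 0.0 if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / norm if norm else 0.0


def best_equation(data: Sequence[Triple], y: Sequence[float]
                  ) -> Optional[Tuple[float, Tuple[str, ...], List[float]]]:
    """Fit every candidate on the training data; keep the highest training rho."""
    best = None
    for terms in candidate_term_sets():
        try:
            coef = fit(terms, data, y)
        except ValueError:
            continue  # skip term sets whose normal equations are singular
        rho = spearman_rho([predict(terms, coef, s) for s in data], y)
        if best is None or rho > best[0]:
            best = (rho, terms, coef)
    return best
```

Here best_equation(train_scores, train_log_ki) would return (ρ, terms, coefficients), with train_scores holding hypothetical (PLANTS, Interaction, LE3) triples for the training complexes. Since the equation is chosen on the training data, its training ρ is optimistic, which is why the selected equation is then scored on a held-out test set, as in the table.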

Correlation (Spearman's ρ) between scoring functions and log(Ki)

Scoring Function    Training set (n = 51) Test set (n = 20)
                     ρ       p-value       ρ       p-value
PLANTS Score         0.264   0.06162       0.010   0.9674
MolDock Score        0.218   0.1247        0.086   0.7193
Re-rank Score        0.350   0.1184       -0.086   0.7169
Interaction Score    0.479   0.00038       0.080   0.7383
Co-factor Score     -0.143   0.3176       -0.384   0.09459
Protein Score        0.223   0.1154        0.165   0.4877
Water Score          0.043   0.766         0.214   0.3658
H-Bond Score         0.027   0.8525       -0.288   0.2181
LE1 Score            0.187   0.1886        0.256   0.2750
LE3 Score            0.045   0.7559       -0.140   0.5563
Score504             0.525   0.000077      0.368   0.1106

Scatter plot for polynomial equation 504 (Score504) vs log(Ki) for the 51 structures in the HIV-1 protease training set. In the plot, au represents arbitrary units.
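The p-values in the correlation table appear consistent with the common t approximation for Spearman's ρ, t = ρ√((n - 2)/(1 - ρ²)) with n - 2 degrees of freedom (n = 51 for the training set, n = 20 for the test set); whether SAnDReS computes them exactly this way is an assumption. The sketch evaluates the two-sided tail of Student's t through the regularized incomplete beta function, using the standard continued-fraction expansion, so it needs only the Python standard library. The function names are our own.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the expansion directly where it converges fast, otherwise the
    # symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def spearman_p_value(rho: float, n: int) -> float:
    """Two-sided p-value for Spearman's rho via the t approximation (n - 2 df)."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(rho) >= 1.0:
        return 0.0
    df = n - 2
    # With t^2 = rho^2 * df / (1 - rho^2), df / (df + t^2) simplifies to 1 - rho^2,
    # and the two-sided tail of Student's t is I_{df/(df+t^2)}(df/2, 1/2).
    return regularized_beta(df / 2.0, 0.5, 1.0 - rho * rho)


print(f"{spearman_p_value(0.479, 51):.5f}")
```

For example, spearman_p_value(0.479, 51) returns roughly 4 × 10⁻⁴, close to the 0.00038 listed for the Interaction Score. For small samples, an exact permutation test is the more rigorous alternative to this approximation.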

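As the table shows, Score504 achieved the highest Spearman's correlation with log(Ki) in both the training set (ρ = 0.525) and the test set (ρ = 0.368). The machine-learning approach therefore generated a model with superior predictive power compared with the original scoring functions, although the test-set correlation did not reach statistical significance (p-value = 0.1106).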


Korb O, Stützle T, Exner TE. Empirical scoring functions for advanced protein-ligand docking with PLANTS. J Chem Inf Model. 2009; 49(1): 84-96.

Pintro VO, Azevedo WF. Optimized Virtual Screening Workflow: Towards Target-Based Polynomial Scoring Functions for HIV-1 Protease. Comb Chem High Throughput Screen. 2017; 20(9): 820-827.

Thomsen R, Christensen MH. MolDock: a new technique for high-accuracy molecular docking. J Med Chem. 2006; 49: 3315-3321.

Xavier MM, Heck GS, de Avila MB, Levin NM, Pintro VO, Carvalho NL, Azevedo WF Jr. SAnDReS: a Computational Tool for Statistical Analysis of Docking Results and Development of Scoring Functions. Comb Chem High Throughput Screen. 2016; 19(10): 801-812.