
**Please cite the following paper Xavier et al. 2016 when using SAnDReS**

## Machine Learning

The use of machine-learning methods to study biological systems is not new. For instance, we can find applications of artificial neural networks as early as 1985 [1]. As for the application of supervised machine-learning techniques to the prediction of ligand-binding affinity, there are studies dating back to 1995 [2].

So, what is new about SAnDReS? SAnDReS uses supervised machine-learning techniques to generate polynomial equations that predict ligand-binding affinity, which allows improvement over native scoring functions. A model can be trained to make it specific for a biological system. Consider cyclin-dependent kinases (CDKs): we could take a standard scoring function, such as the PLANTS score [3], and fine-tune its terms so that it predicts log(Ki) for CDKs. In this sense, we are integrating computational systems biology and machine-learning techniques to improve the predictive power of scoring functions, which gives you the flexibility to test different scenarios for the biological system you are interested in.
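The idea of re-weighting the terms of a scoring function can be sketched with scikit-learn. This is a minimal illustration, not the SAnDReS implementation: the three "terms", the synthetic log(Ki) values, and the degree-2 polynomial are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Hypothetical data: 40 protein-ligand complexes, each described by
# 3 scoring-function terms (placeholders, not real PLANTS output).
X = rng.normal(size=(40, 3))

# Synthetic "experimental" log(Ki): a weighted sum of the terms plus noise.
true_weights = np.array([0.8, -0.5, 0.3])
y = X @ true_weights - 6.0 + rng.normal(scale=0.1, size=40)

# Polynomial equation of degree 2 in the terms; the weights of every
# polynomial term are fitted by ordinary least squares.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X, y)

predicted_log_ki = model.predict(X)
r2 = model.score(X, y)  # coefficient of determination on the training data
print(f"R^2 on training data: {r2:.3f}")
```

In practice the rows of `X` would come from the scoring-function terms computed for complexes with known binding affinity, and the fit would be judged on data not used for training.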

We can think of the protein space and the chemical space, the latter containing all potential binders to elements of the protein space. SAnDReS allows the construction of a third space, which we call the virtual scoring function space, where we find infinitely many mathematical functions to predict ligand-binding affinity. SAnDReS applies machine-learning techniques to explore this virtual scoring function space, looking for the function that predicts the experimental binding affinity as closely as possible.
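One way to picture a search through such a function space is to enumerate candidate models and keep the one that best reproduces held-out experimental values. The sketch below does this for every non-empty subset of four hypothetical terms, using plain linear regression; the term names, the synthetic data, and the selection criterion (R² on a test set) are assumptions for illustration and are not taken from SAnDReS.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical data: 60 complexes, 4 scoring-function terms.
# Only term_a and term_c actually influence the synthetic affinity.
term_names = ["term_a", "term_b", "term_c", "term_d"]
X = rng.normal(size=(60, 4))
y = 1.2 * X[:, 0] - 0.7 * X[:, 2] - 5.0 + rng.normal(scale=0.1, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Each subset of terms defines one candidate function in the search space.
results = []
for k in range(1, len(term_names) + 1):
    for subset in combinations(range(len(term_names)), k):
        cols = list(subset)
        candidate = LinearRegression().fit(X_train[:, cols], y_train)
        score = candidate.score(X_test[:, cols], y_test)  # R^2 on the test set
        results.append((score, subset))

best_score, best_subset = max(results)
print("Best terms:", [term_names[i] for i in best_subset])
print(f"Test R^2: {best_score:.3f}")
```

With four terms there are only 15 candidates; the space grows quickly once polynomial terms and different regression methods are included, which is why an automated search is useful.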

SAnDReS has a flexible interface for testing the predictive power of regression models generated by machine-learning techniques, such as:

SGDRegressor, and

All these methods are available from the scikit-learn library [4] and implemented as an intuitive workflow in SAnDReS.
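As an illustration of one of these methods, the following sketch fits scikit-learn's `SGDRegressor` directly, outside SAnDReS. The data are synthetic and the hyperparameters are assumptions; the features are standardized first because stochastic gradient descent is sensitive to feature scale.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Hypothetical data: 200 complexes, 3 unscaled scoring-function terms.
X = rng.normal(loc=10.0, scale=5.0, size=(200, 3))
y = (
    0.4 * X[:, 0] - 0.2 * X[:, 1] + 0.1 * X[:, 2] - 8.0
    + rng.normal(scale=0.1, size=200)
)

# Standardize the terms, then fit a linear model by stochastic gradient descent.
sgd_model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-4, random_state=0),
)
sgd_model.fit(X, y)

sgd_r2 = sgd_model.score(X, y)  # R^2 on the training data
print(f"SGDRegressor R^2: {sgd_r2:.3f}")
```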

#### References

[1] Nanard M, Nanard J. A user-friendly biological workstation. Biochimie 1985; 67(5): 429-32.

[2] King RD, Hirst JD, Sternberg MJE. Comparison of artificial intelligence methods for modeling pharmaceutical QSARs. Appl Artif Intell 1995; 9: 213-34.

[3] Korb O, Stützle T, Exner TE. Empirical scoring functions for advanced protein-ligand docking with PLANTS. J Chem Inf Model 2009; 49(1): 84-96.

[4] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: Machine Learning in Python. J Mach Learn Res 2011; 12: 2825-30.