Original article
Biomath 3 (2014), 1410061, 1–7
BIOMATH, ISSN 1314-684X, Editor-in-Chief: Roumen Anguelov
Biomath Forum, http://www.biomathforum.org/biomath/index.php/biomath/

Descriptor-based Fitting of Lysophosphatidic Acid Receptor 3 Antagonists into a Single Predictive Mathematical Model

Olaposi Idowu Omotuyi, Hiroshi Ueda
Department of Pharmacology and Therapeutic Innovation
University Graduate School of Biomedical Sciences, 852-8521 Nagasaki, Japan
Email: bbis11r104@cc.nagasaki-u.ac.jp

Received: 6 June 2013, accepted: 6 October 2014, published: 16 October 2014

Abstract: Sixty-six diverse compounds previously reported as lysophosphatidic acid receptor 3 (LPA3) inhibitors were used to derive a mathematical model by partial least squares (PLS) fitting of 41 molecular descriptors to pIC50 values. The correlation coefficients (R²) before and after cross-validation are 0.94462 (RMSE = 0.21390) and 0.74745 (RMSE = 0.49055), respectively. The bivariate contingency analysis tool implemented in MOE was used to prune the descriptors at a descriptor-pIC50 correlation coefficient cutoff of 0.8, and the equation was refitted. The new equation has R² and RMSE values of 0.88074 and 0.31388, respectively. Both equations correctly predicted 95% of the pIC50 values of the test dataset. Principal component analysis (PCA) was also used to reduce the dimensionality of the raw data and transform it linearly; 8 principal components account for more than 98% of the variance of the dataset. The numerical model derived here may be adapted for screening chemical databases for LPA3 antagonism.

Keywords: upscaling; LPA3; LPA3 antagonists; Mathematical Model; PCA; Molecular descriptors

I. INTRODUCTION

Quantitative structure-activity relationship (QSAR) modeling enables statistical analysis of experimental data and the construction of predictive mathematical models from the dataset.
Numerical models built using this approach have been successfully applied to screening large databases of chemical compounds for hit-compound detection [1]. Given an experimental dataset [2], the success of QSAR depends on two key factors: an array of descriptors that optimally represents the structural parameters required for molecular interactions or reactions [3], and appropriate statistical learning and validation algorithms [4]. In practice, physical-property descriptors (1D descriptors), pharmacophore descriptors (2D descriptors) and geometrical descriptors (3D descriptors, which often require prior knowledge of the target protein binding pocket) are the descriptor types most commonly used for QSAR modeling [5,6,7]. We seek to answer a single question here: what combination of molecular predictors would numerically and accurately predict the experimental antagonist activities of LPA3 inhibitors? Once this is answered, the mathematical relationship derived from the descriptors will enable screening of chemical databases for compounds exhibiting the LPA3 antagonism required for treating conditions with LPA3 etiology, such as ovarian cancer [8] and neuropathic pain [9].

Citation: O. Omotuyi, H. Ueda, Descriptor-based Fitting of Lysophosphatidic Acid Receptor 3 Antagonists into a Single Predictive Mathematical Model, Biomath 3 (2014), 1410061, http://dx.doi.org/10.11145/j.biomath.2014.10.061

II. STATISTICAL BASIS OF QSAR MODELING USING PARTIAL LEAST SQUARES METHOD

The QSAR/PLS modeling equations and algorithms are described in detail in the MOE documentation [10]. Suppose a training dataset contains $m$ molecules and that molecule $i$ is described by an $n$-vector of descriptors $x_i = (x_{i1}, \ldots, x_{in})$.
Let $y_i$ be the experimental result (pIC50) for molecule $i$. A linear model for $y$ is given by Eq. (1) [11]:

$$ y = a_0 + a^{T} x , \tag{1} $$

where $a_0$ is a scalar and $a$ is an $n$-vector. Suppose each molecule has a non-negative importance weight $w_i$ representing the relative probability that the molecule will be encountered, and let $W = \sum_{i=1}^{m} w_i$ denote the sum of all the weights. The mean square error is then given by Eq. (2) [12]:

$$ \mathrm{MSE}(a_0, a) = \frac{1}{W} \sum_{i=1}^{m} w_i \left[ y_i - \left( a_0 + a^{T} x_i \right) \right]^2 . \tag{2} $$

Setting the derivatives of the MSE with respect to $a_0$ and $a$ to zero yields the normal equations (3) to (7), which are solvable by matrix diagonalization:

$$ a_0 = y_0 - a^{T} x_0 , \tag{3} $$

$$ y_0 = \frac{1}{W} \sum_{i=1}^{m} w_i y_i , \tag{4} $$

$$ x_0 = \frac{1}{W} \sum_{i=1}^{m} w_i x_i , \tag{5} $$

$$ S a = b = \frac{1}{W} \sum_{i=1}^{m} w_i y_i \left( x_i - x_0 \right) , \tag{6} $$

$$ S = \frac{1}{W} \sum_{i=1}^{m} w_i \left( x_i - x_0 \right) \left( x_i - x_0 \right)^{T} . \tag{7} $$

Starting from these normal equations, an estimate of $a$ can be computed once the columns of the weight matrix $G_A$ (Eq. (8)) have been obtained by Gram-Schmidt orthogonalization [13] of the Krylov sequence $b, Sb, S^2 b, \ldots, S^{A-1} b$ [14]. The $A$th PLS coefficient vector is then estimated using Eq. (9):

$$ G_A = \left( g_1, g_2, \ldots, g_A \right) , \tag{8} $$

$$ a = G_A \left( G_A^{T} S G_A \right)^{-1} G_A^{T} b , \tag{9} $$

where each $g_j$ is a column vector of length $n$, and $A$, the degree of the PLS fit, is an integer less than or equal to $n$.

The MOE [10] descriptor calculator was used to generate the numerical representations of 41 descriptors (a_aro, ASA, ASA_H, a_hyd, SlogP, SlogP_VSA0, SlogP_VSA1, SlogP_VSA2, SlogP_VSA3, SlogP_VSA4, SlogP_VSA5, SlogP_VSA6, SlogP_VSA7, SlogP_VSA8, SlogP_VSA9, SMR_VSA0, SMR_VSA1, SMR_VSA2, SMR_VSA3, SMR_VSA4, SMR_VSA5, SMR_VSA6, SMR_VSA7, a_acc, Kier1, Kier2, Kier3, KierA1, KierA2, KierA3, KierFlex, chi0, chi0v, chi0v_C, chi0_C, chi1, chi1v, chi1v_C, chi1_C, chiral, chiral_u) for the 66 randomly selected LPA3 antagonists (Supplementary Fig. 1) retrieved from the European Bioinformatics Institute ChEMBL database (https://www.ebi.ac.uk/chembl/), which constitute our training dataset (CHEMBL3250).
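The estimate of Eqs. (3) to (9) can be illustrated with a short NumPy sketch. It is not the MOE implementation: the function name weighted_pls, the tolerance used to detect an exhausted Krylov space, and the choice to generate each new direction as S times the latest orthonormal vector (which spans the same Krylov subspace as b, Sb, ..., S^(A-1)b but is numerically better behaved than raw matrix powers) are our own assumptions.

```python
import numpy as np


def weighted_pls(X, y, w, A):
    """Weighted PLS fit of degree A following Eqs. (3)-(9).

    X : (m, n) descriptor matrix, y : (m,) pIC50 values,
    w : (m,) non-negative importance weights, A : PLS degree (A <= n).
    Returns the intercept a0 and the coefficient vector a.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    W = w.sum()
    y0 = w @ y / W                      # Eq. (4): weighted mean activity
    x0 = w @ X / W                      # Eq. (5): weighted mean descriptor vector
    Xc = X - x0                         # centred descriptors
    b = Xc.T @ (w * y) / W              # right-hand side b of Eq. (6)
    S = (Xc.T * w) @ Xc / W             # Eq. (7): weighted covariance matrix

    # Orthonormal basis g_1..g_A of the Krylov space spanned by
    # b, Sb, ..., S^(A-1) b, built with (modified) Gram-Schmidt.
    basis = []
    v = b
    for _ in range(A):
        g = v.copy()
        for q in basis:
            g = g - (q @ g) * q         # remove components along earlier g_j
        norm = np.linalg.norm(g)
        if norm < 1e-12:                # Krylov space exhausted early
            break
        basis.append(g / norm)
        v = S @ basis[-1]               # next Krylov direction
    G = np.column_stack(basis)          # Eq. (8): G_A = (g_1, ..., g_A)

    a = G @ np.linalg.solve(G.T @ S @ G, G.T @ b)   # Eq. (9)
    a0 = y0 - a @ x0                    # Eq. (3)
    return a0, a
```

With A = n (and a full-rank Krylov space) the fit reduces to ordinary weighted least squares, a = S^(-1) b; choosing A < n gives the lower-degree PLS fit.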
Using the PLS method described above, Eq. (10) was generated, relating the descriptors to pIC50 with a correlation coefficient (R²) of 0.94462 (RMSE = 0.21390) (Fig. 1, blue circles and line); when cross-validated, R² was estimated as 0.74745 (RMSE = 0.49055).

Fig. 1: Scatter plot of the experimental pIC50 vs. pIC50 predictions of Eq. (10) (blue) and Eq. (12) (green).

$$
\begin{aligned}
\mathrm{pIC_{50}} = {}& 3.57363 - 0.25353\,\mathrm{a\_aro} - 0.00361\,\mathrm{ASA} + 0.23510\,\mathrm{a\_hyd} + 0.05890\,\mathrm{SlogP} \\
& - 0.02287\,\mathrm{SlogP\_VSA0} + 0.00032\,\mathrm{SlogP\_VSA1} + 0.03125\,\mathrm{SlogP\_VSA2} - 0.02059\,\mathrm{SlogP\_VSA3} \\
& + 0.02954\,\mathrm{SlogP\_VSA4} + 0.07226\,\mathrm{SlogP\_VSA5} + 0.02879\,\mathrm{SlogP\_VSA6} + 0.04687\,\mathrm{SlogP\_VSA7} \\
& + 0.03836\,\mathrm{SlogP\_VSA8} + 0.06880\,\mathrm{SlogP\_VSA9} + 0.04912\,\mathrm{SMR\_VSA0} + 0.02536\,\mathrm{SMR\_VSA1} \\
& + 0.08743\,\mathrm{SMR\_VSA2} + 0.00289\,\mathrm{SMR\_VSA3} - 0.01524\,\mathrm{SMR\_VSA4} + 0.04694\,\mathrm{SMR\_VSA5} \\
& + 0.09067\,\mathrm{SMR\_VSA6} - 0.01442\,\mathrm{SMR\_VSA7} + 0.18393\,\mathrm{a\_acc} - 0.77650\,\mathrm{Kier1} \\
& - 0.43968\,\mathrm{Kier2} - 0.30735\,\mathrm{Kier3} - 0.43752\,\mathrm{KierA1} - 0.03578\,\mathrm{KierA2} \\
& + 0.76916\,\mathrm{KierA3} - 0.09573\,\mathrm{KierFlex} + 0.00332\,\mathrm{chi0} + 0.55223\,\mathrm{chi0v} \\
& + 0.13554\,\mathrm{chi0v\_C} - 0.16530\,\mathrm{chi0\_C} + 0.59498\,\mathrm{chi1} + 0.05911\,\mathrm{chi1v} \\
& - 0.93262\,\mathrm{chi1v\_C} - 1.22808\,\mathrm{chi1\_C} - 0.16986\,\mathrm{chiral} - 0.56204\,\mathrm{chiral\_u} .
\end{aligned}
\tag{10}
$$

Fig. 2: Bar chart of the residuals (experimental pIC50 minus predicted pIC50) for the test dataset. Only 1 of the 23 tested compounds (compound 23; see Supplementary Fig. 2 for structural details) showed a residual > 1.0 pIC50 unit (an indication of wrong prediction).

Here the root mean square error (RMSE) is the square root of the MSE function (Eq. (2)) at the fitted parameter values, and the correlation coefficient is R² = 1 - MSE/YVAR, where YVAR is the sample variance of the $y_i$ values; R² ranges between 0 (no fit) and 1 (perfect fit).
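The fit statistics defined above can be computed as in the following sketch. We assume that YVAR uses the same 1/W weighting as the MSE of Eq. (2), and that the 1.0 pIC50 unit criterion applies to the absolute residual; both function names are hypothetical.

```python
import numpy as np


def fit_statistics(y_obs, y_pred, w=None):
    """Return (RMSE, R^2) with RMSE = sqrt(MSE) and R^2 = 1 - MSE / YVAR.

    MSE follows Eq. (2); YVAR is assumed to use the same 1/W weighting.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.ones_like(y_obs) if w is None else np.asarray(w, dtype=float)
    W = w.sum()
    mse = w @ (y_obs - y_pred) ** 2 / W
    y_mean = w @ y_obs / W
    yvar = w @ (y_obs - y_mean) ** 2 / W
    return np.sqrt(mse), 1.0 - mse / yvar


def flag_poor_predictions(y_obs, y_pred, cutoff=1.0):
    """Indices (0-based) of compounds whose |residual| exceeds `cutoff` pIC50 units."""
    residual = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.flatnonzero(np.abs(residual) > cutoff)


if __name__ == "__main__":
    # Training-set compounds 1, 3 and 59 (experimental and predicted pIC50)
    y_exp = [7.5229, 6.4157, 4.5229]
    y_hat = [7.5301, 6.4861, 5.0926]
    print(fit_statistics(y_exp, y_hat))
    print(flag_poor_predictions(y_exp, y_hat))   # no residual exceeds 1.0
```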
The predictive suitability of our equation was tested on 23 compounds (Supplementary Fig. 2) with experimentally determined IC50 values for LPA3 antagonism. Assuming that a residual above 1.0 pIC50 unit represents a poor fit, our data (Fig. 2) suggest that Eq. (10) accurately predicted 22 of the 23 test compounds.

III. DESCRIPTOR CONTINGENCY ANALYSIS

To determine the significance of each descriptor to the overall equation, we performed a contingency analysis; its results indicate whether the descriptor set needs to be pruned. In MOE [10], the QSAR-contingency tool performs a bivariate contingency analysis between each descriptor and the experimental activity values and produces a table of correlation coefficients (Eq. (11)), one per descriptor. Let $X$ denote the value of a molecular descriptor and $Y$ the activity value for a randomly selected sample molecule, with variances $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$; the covariance of the random variables $X$ and $Y$ is then defined as $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$ [10, 15], and

$$ R^2 = \frac{\left[ E(XY) - E(X)E(Y) \right]^2}{\mathrm{Var}(X)\,\mathrm{Var}(Y)} . \tag{11} $$

Because R² ranges from 0 to 1, with 1 representing a perfectly linear correlation, we proposed that only descriptors with R² ≥ 0.8 are useful and that descriptors below this cutoff can be pruned. Our data show that 31 of the original 41 descriptors have R² ≥ 0.8 (Fig. 3, Supplementary Table 1). After excluding the descriptors with unsatisfactory coefficients, the QSAR model was recalculated using the remaining descriptors, yielding a new numerical relationship (Eq. (12)) with R² = 0.88074 and RMSE = 0.31388.
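A minimal sketch of the contingency screen: Eq. (11) computed from population moments, followed by the R² ≥ 0.8 cutoff. As an illustration, the cutoff is applied to the coefficients transcribed from Supplementary Table 1 rather than recomputed from raw descriptor values; the function and variable names are hypothetical, and MOE's QSAR-contingency tool may compute its statistics differently.

```python
import numpy as np


def contingency_r2(x, y):
    """Eq. (11): [E(XY) - E(X)E(Y)]^2 / (Var(X) Var(Y)), using population moments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean(x * y) - np.mean(x) * np.mean(y)
    return cov ** 2 / (np.var(x) * np.var(y))


def prune_descriptors(scores, cutoff=0.8):
    """Keep only descriptor names whose coefficient is >= cutoff."""
    return [name for name, r2 in scores.items() if r2 >= cutoff]


# Coefficients transcribed from Supplementary Table 1
TABLE_S1 = {
    "SlogP_VSA6": 0.57623, "chiral_u": 0.65734, "SlogP_VSA4": 0.66609,
    "SlogP_VSA5": 0.6996, "chiral": 0.72218, "SMR_VSA2": 0.78566,
    "SlogP_VSA8": 0.78621, "a_acc": 0.78922, "SlogP_VSA3": 0.79094,
    "KierA1": 0.79264, "a_aro": 0.80122, "SlogP_VSA9": 0.80481,
    "SlogP_VSA1": 0.80575, "chi0_C": 0.806, "chi1v": 0.80603,
    "KierFlex": 0.80836, "chi1v_C": 0.81041, "KierA3": 0.81376,
    "SlogP_VSA2": 0.81493, "SMR_VSA7": 0.81623, "ASA": 0.81908,
    "ASA_H": 0.81908, "chi0v": 0.82223, "chi0v_C": 0.82394,
    "SMR_VSA4": 0.82512, "chi1_C": 0.82535, "KierA2": 0.82725,
    "chi0": 0.82827, "SlogP_VSA7": 0.82933, "SMR_VSA5": 0.82941,
    "Kier2": 0.83257, "SlogP_VSA0": 0.83519, "Kier1": 0.83644,
    "SMR_VSA1": 0.83839, "chi1": 0.84721, "SMR_VSA6": 0.84762,
    "SMR_VSA3": 0.8525, "Kier3": 0.85924, "SlogP": 0.86886,
    "SMR_VSA0": 0.87545, "a_hyd": 0.88264,
}

if __name__ == "__main__":
    kept = prune_descriptors(TABLE_S1)
    print(len(kept), "of", len(TABLE_S1), "descriptors retained")
    # prints: 31 of 41 descriptors retained
```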
The scatter plot of the predicted versus experimental pIC50 values for the new Eq. (12) is given in Fig. 1 (green circles and line).

$$
\begin{aligned}
\mathrm{pIC_{50}} = {}& 2.23199 - 0.00516\,\mathrm{ASA} - 0.00516\,\mathrm{ASA\_H} - 0.48596\,\mathrm{a\_hyd} - 0.33917\,\mathrm{SlogP} \\
& - 0.05298\,\mathrm{SlogP\_VSA0} - 0.03967\,\mathrm{SlogP\_VSA1} - 0.02243\,\mathrm{SlogP\_VSA2} + 0.01681\,\mathrm{SlogP\_VSA7} \\
& + 0.02107\,\mathrm{SlogP\_VSA9} - 0.00757\,\mathrm{SMR\_VSA0} - 0.00087\,\mathrm{SMR\_VSA1} - 0.00089\,\mathrm{SMR\_VSA3} \\
& - 0.01173\,\mathrm{SMR\_VSA4} + 0.00955\,\mathrm{SMR\_VSA5} - 0.01412\,\mathrm{SMR\_VSA6} - 0.02508\,\mathrm{SMR\_VSA7} \\
& - 0.26771\,\mathrm{Kier1} + 0.15306\,\mathrm{Kier2} \;?\; 0.56650\,\mathrm{Kier3} - 0.30504\,\mathrm{KierA2} + 0.98837\,\mathrm{KierA3} \\
& - 0.28849\,\mathrm{KierFlex} + 0.48535\,\mathrm{chi0} + 0.90693\,\mathrm{chi0v} + 0.10234\,\mathrm{chi0v\_C} \\
& + 0.24407\,\mathrm{chi0\_C} + 0.66154\,\mathrm{chi1} + 0.36006\,\mathrm{chi1v} - 1.03589\,\mathrm{chi1v\_C} \\
& - 0.62474\,\mathrm{chi1\_C} - 0.36725\,\mathrm{a\_aro} .
\end{aligned}
\tag{12}
$$

When this equation was used to predict the pIC50 values of the test set, only one compound lay above the 1.0 pIC50 unit cutoff (data not shown). Thus, Eq. (12) is more compact than Eq. (10) and as accurate in predicting LPA3 antagonism.

Fig. 3: Bar chart of the descriptor-experimental pIC50 correlation coefficients. Only 31 of the 41 descriptors lie above the 0.8 coefficient cutoff.

Fig. 4: 3D plot of the first three principal components. Each point represents a compound in the training dataset, and each colour represents a distinct cluster of pIC50 values.

IV. PRINCIPAL COMPONENT ANALYSIS OF EQUATION

We further studied the dataset descriptors along their principal components by reducing the dimensionality of the raw data and transforming it linearly [13]. For the $m = 66$ training-set compounds, the descriptors of compound $i$ are represented by an $n$-vector of real numbers $x_i = (x_{i1}, \ldots, x_{in})$, where $n = 31$ (the descriptors of the new Eq. (12)).
Each molecule $i$ is assumed to have an associated importance weight $w_i$ (a non-negative real number) representing the relative probability that the molecule $x_i$ will be encountered (the weights adding up to 1), and $W$ denotes the sum of all the weights. The eigenvalues and eigenvectors are then estimated from the raw data using Eqs. (13) and (14). Since the sample covariance matrix $S$ is symmetric and positive semi-definite, it can be diagonalized as $S = Q^{T} D Q$, where $Q$ is orthogonal and $D$ is diagonal, sorted in descending order from top left to bottom right [13, 14].

$$ E(x) \approx \bar{x} = x_0 = \frac{1}{W} \sum_{i=1}^{m} w_i x_i , \tag{13} $$

$$ \mathrm{Cov}(x) \approx S = \frac{1}{W} \sum_{i=1}^{m} w_i x_i x_i^{T} - \bar{x}\,\bar{x}^{T} . \tag{14} $$

Examining the effect of each principal component (eigenvector) on the condition and the variance shows that eight (8) principal components account for more than 98% of the variance in the dataset [15]. The 3D scatter plot of the first three principal components (PCA1, PCA2 and PCA3) with respect to pIC50 values is shown in Fig. 4; each point in the plot corresponds to a dataset molecule colored according to its clustered pIC50 value.

V. CONCLUSION

Given the good mathematical correlation between the set of descriptors and LPA3 antagonism, it is reasonable to suspect that the equation is biased toward compounds with closely related descriptor properties and therefore may not be a universal formula for LPA3 antagonist screening. Nevertheless, it should accurately capture compounds whose structural properties fall within those of the dataset, and may therefore be incorporated into a ligand-based screening protocol for more successful hit-compound identification.

ACKNOWLEDGMENT

This work was supported by the Platform for Drug Discovery, Informatics, and Structural Life Science from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

APPENDIX

Supplementary Table 1: Correlation coefficient of each descriptor

S/N  Descriptor   Corr. coefficient
1    SlogP_VSA6   0.57623
2    chiral_u     0.65734
3    SlogP_VSA4   0.66609
4    SlogP_VSA5   0.6996
5    chiral       0.72218
6    SMR_VSA2     0.78566
7    SlogP_VSA8   0.78621
8    a_acc        0.78922
9    SlogP_VSA3   0.79094
10   KierA1       0.79264
11   a_aro        0.80122
12   SlogP_VSA9   0.80481
13   SlogP_VSA1   0.80575
14   chi0_C       0.806
15   chi1v        0.80603
16   KierFlex     0.80836
17   chi1v_C      0.81041
18   KierA3       0.81376
19   SlogP_VSA2   0.81493
20   SMR_VSA7     0.81623
21   ASA          0.81908
22   ASA_H        0.81908
23   chi0v        0.82223
24   chi0v_C      0.82394
25   SMR_VSA4     0.82512
26   chi1_C       0.82535
27   KierA2       0.82725
28   chi0         0.82827
29   SlogP_VSA7   0.82933
30   SMR_VSA5     0.82941
31   Kier2        0.83257
32   SlogP_VSA0   0.83519
33   Kier1        0.83644
34   SMR_VSA1     0.83839
35   chi1         0.84721
36   SMR_VSA6     0.84762
37   SMR_VSA3     0.8525
38   Kier3        0.85924
39   SlogP        0.86886
40   SMR_VSA0     0.87545
41   a_hyd        0.88264

Scatter plot of the experimental pIC50 vs. pIC50 predictions of Eq. (10) (blue) and Eq. (12).
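The weighted principal component analysis of Section IV (Eqs. (13) and (14)) can be sketched in NumPy as follows. numpy.linalg.eigh returns S = V diag(λ) V^T, so the columns of V correspond to the rows of Q in S = Q^T D Q; the eigenvalues are re-sorted in descending order to match D. The function names and the "more than 98%" rule encoded in n_components_for are illustrative assumptions, not the MOE procedure.

```python
import numpy as np


def weighted_pca(X, w):
    """Weighted PCA following Eqs. (13)-(14).

    Returns eigenvalues sorted in descending order (the diagonal of D),
    the matching eigenvectors as columns (the rows of Q), and the weighted mean.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    W = w.sum()
    x_bar = w @ X / W                                  # Eq. (13)
    S = (X.T * w) @ X / W - np.outer(x_bar, x_bar)     # Eq. (14)
    evals, evecs = np.linalg.eigh(S)                   # S is symmetric
    order = np.argsort(evals)[::-1]                    # descending, as for D
    return evals[order], evecs[:, order], x_bar


def n_components_for(evals, fraction=0.98):
    """Smallest number of leading components explaining more than `fraction` of the variance."""
    share = np.cumsum(evals) / np.sum(evals)
    k = int(np.searchsorted(share, fraction, side="right")) + 1
    return min(k, len(evals))


def project(X, evecs, x_bar, k=3):
    """Scores on the first k principal components (e.g. PCA1-PCA3 for a 3D plot)."""
    return (np.asarray(X, dtype=float) - x_bar) @ evecs[:, :k]
```

In this sketch the scores returned by project with k = 3 are the coordinates that a plot such as Fig. 4 would display, one point per compound.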
Supplementary Fig. 2: Structures of the 23 test-set compounds.

Supplementary Fig. 1: LPA3 INHIBITORS: TRAINING SET FOR QSAR MODELING

No.  pIC50   Predicted  Residual
1    7.5229  7.5301     -0.0072
2    7.5229  7.5301     -0.0072
3    6.4157  6.4861     -0.0705
4    4.7305  4.7125      0.0179
5    6.6840  6.8724     -0.1883
6    6.5200  6.4167      0.1033
7    6.1925  6.1022      0.0902
8    5.1637  5.0749      0.0888
9    5.9101  5.9231     -0.0131
10   5.9101  5.9231     -0.0131
11   6.4157  6.3266      0.0890
12   5.0000  5.0393     -0.0393
13   5.2366  5.4671     -0.2305
14   6.3747  6.6310     -0.2563
15   6.0292  6.1727     -0.1435
16   4.5229  4.5776     -0.0547
17   5.6308  5.6943     -0.0635
18   6.6003  6.4167      0.1836
19   5.1649  5.1271      0.0379
20   5.1904  5.5739     -0.3834
21   5.6364  5.7584     -0.1220
22   4.5229  4.4398      0.0830
23   4.5229  4.4673      0.0556
24   6.9136  6.9600     -0.0463
25   6.8447  6.6932      0.1515
26   6.0269  6.4243     -0.3974
27   5.8962  5.8462      0.0500
28   4.6588  4.2947      0.3641
29   5.0334  5.2622     -0.2288
30   5.1931  5.3161     -0.1230
31   7.5528  7.3033      0.2495
32   6.4685  6.8160     -0.3474
33   5.1124  4.9684      0.1439
34   5.0830  4.9372      0.1458
35   6.1135  6.1509     -0.0374
36   5.1593  5.3161     -0.1568
37   5.0000  4.7783      0.2217
38   6.1238  6.1151      0.0087
39   7.5686  7.1552      0.4134
40   6.3206  6.3223     -0.0018
41   7.6198  7.6400     -0.0202
42   7.6198  7.6400     -0.0202
43   6.0809  6.3361     -0.2551
44   5.0788  4.8447      0.2341
45   7.0706  6.8724      0.1982
46   7.5528  7.0811      0.4717
47   6.1844  6.2144     -0.0299
48   6.9872  6.7202      0.2669
49   5.1415  5.0991      0.0424
50   6.8447  6.6623      0.1824
51   7.0177  6.7353      0.2824
52   4.5229  4.7671     -0.2443
53   5.5240  5.5696     -0.0456
54   6.7905  7.2006     -0.4101
55   5.6364  5.6100      0.0264
56   5.5800  5.2622      0.3179
57   6.3830  6.1727      0.2103
58   7.1871  7.1826      0.0045
59   4.5229  5.0926     -0.5698
60   6.7352  6.9852     -0.2500
61   5.9259  5.7730      0.1529
62   6.7352  7.0541     -0.3189
63   5.2541  5.5111     -0.2570
64   6.7570  6.4243      0.3327
65   5.9208  6.0663     -0.1455
66   6.5214  6.2442      0.2772

REFERENCES

[1] A.M. Helguera, A. Pérez-Garrido, A. Gaspar, J. Reis, F. Cagide, D. Viña, M. Cordeiro, F. Borges, "Combining QSAR classification models for predictive modeling of human monoamine oxidase inhibitors", Eur. J. Med. Chem. 59 (2013) 75-90. http://dx.doi.org/10.1016/j.ejmech.2012.10.035
[2] P.P. Roy, J.T. Leonard, K. Roy, "Exploring the impact of size of training sets for the development of predictive QSAR models", Chemometrics and Intelligent Laboratory Systems 90(1) (2008) 31-42.
[3] R. Todeschini, V. Consonni, Molecular Descriptors for Chemoinformatics (2 volumes), Wiley-VCH, 2009. http://dx.doi.org/10.1002/9783527628766
[4] T. Scior, J.L. Medina-Franco, Q.T. Do, K. Martínez-Mayorga, J.A. Yunes-Rojas, P. Bernard, "How to recognize and workaround pitfalls in QSAR studies: a critical review", Curr. Med. Chem. 16(32) (2009) 4297-4313.
[5] B.K. Shoichet, I.D. Kuntz, D.L. Bodian, "Molecular docking using shape descriptors", Journal of Computational Chemistry 13(3) (1992) 380-397.
[6] R.J. Morris, J. Najmanovich, A. Kahraman, J.M. Thornton, "Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons",
Bioinformatics 21(10) (2005) 2347-2355.
[7] B.B. Goldman, W.T. Wipke, "QSD quadratic shape descriptors. Molecular docking using quadratic shape descriptors (QSDock)", Proteins 38(1) (2000) 79-94.
[8] P. Wang, X.H. Wu, W.X. Chen, B.E. Shan, Q. Guo, "Expression of lysophosphatidic acid receptor in human ovarian cancer cell lines 3AO, SKOV3, OVCAR3 and its significance", Di Yi Jun Yi Da Xue Xue Bao 25(11) (2005) 1422-1424, 1431.
[9] H. Ueda, H. Matsunaga, O.I. Omotuyi, J. Nagai, "Lysophosphatidic acid: chemical signature of neuropathic pain", Biochim. Biophys. Acta 1831(1) (2013) 61-73. http://dx.doi.org/10.1016/j.bbalip.2012.08.014
[10] Molecular Operating Environment (MOE), 2012.10, Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite 910, Montreal, QC, Canada, H3A 2R7, 2012.
[11] M.J. Wichura, The Coordinate-Free Approach to Linear Models, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, 2006, pp. xiv+199. ISBN 978-0-521-86842-6. MR 2283455.
[12] D. Wackerly, W. Scheaffer, Mathematical Statistics with Applications, 7th ed., Thomson Higher Education, Belmont, CA, USA, 2008. ISBN 0-49538508-5.