Original article Biomath 3 (2014), 1410061, 1–7


BIOMATH

 ISSN 1314-684X

Editor–in–Chief: Roumen Anguelov  

http://www.biomathforum.org/biomath/index.php/biomath/ Biomath Forum

Descriptor-based Fitting of Lysophosphatidic Acid Receptor 3 Antagonists into a Single Predictive Mathematical Model
Olaposi Idowu Omotuyi, Hiroshi Ueda

Department of Pharmacology and Therapeutic Innovation
University Graduate School of Biomedical Sciences, 852-8521 Nagasaki, Japan
Email: bbis11r104@cc.nagasaki-u.ac.jp

Received: 6 June 2013, accepted: 6 October 2014, published: 16 October 2014

Abstract—Sixty-six diverse compounds previously reported as Lysophosphatidic Acid Receptor 3 (LPA3) inhibitors have been used to derive a mathematical model based on partial least squares (PLS) regression of 41 molecular descriptors against pIC50 values. The correlation coefficients (R²) before and after cross-validation are 0.94462 (RMSE = 0.21390) and 0.74745 (RMSE = 0.49055), respectively. Bivariate contingency analysis tools implemented in MOE were used to prune the descriptors at a descriptor-pIC50 correlation coefficient cutoff of 0.8 and to refit the equation. The new equation has R² and RMSE values of 0.88074 and 0.31388, respectively. Both equations correctly predicted the pIC50 values of about 95% (22 of 23) of the test compounds. Principal component analysis (PCA) was also used to reduce the dimensionality of the raw data by a linear transformation; 8 principal components account for more than 98% of the variance of the dataset. The numerical model derived here may be adapted for screening chemical databases for LPA3 antagonism.

Keywords-upscaling; LPA3; LPA3 antagonists;
Mathematical Model; PCA; Molecular descriptors

I. INTRODUCTION

Quantitative structure-activity relationship (QSAR) modeling allows statistical analysis of experimental data and the construction of predictive mathematical models from the dataset. Numerical models built with this approach have been used successfully to screen large databases of chemical compounds for hit compounds [1]. Given an experimental dataset [2], the success of QSAR depends on two key factors: an array of descriptors that optimally represents the structural parameters required for molecular interactions or reactions [3], and appropriate statistical learning and validation algorithms [4]. In practice, physical-property descriptors (1D descriptors), pharmacophore descriptors (2D descriptors) and geometrical descriptors (3D descriptors, which often require prior knowledge of the target protein binding pocket) are the descriptor types most commonly used for QSAR modeling [5,6,7]. Here we seek to answer a single question: what combination of molecular predictors numerically and accurately predicts the experimental antagonist activities of LPA3 inhibitors? Once this is answered, the mathematical relationship derived from the descriptors will enable screening of chemical databases for compounds exhibiting the LPA3 antagonism required for treating disease conditions with an LPA3 etiology, such as ovarian cancer [8] and neuropathic pain [9].

Citation: O. Omotuyi, H. Ueda, Descriptor-based Fitting of Lysophosphatidic Acid Receptor 3 Antagonists into a Single Predictive Mathematical Model, Biomath 3 (2014), 1410061, http://dx.doi.org/10.11145/j.biomath.2014.10.061

II. STATISTICAL BASIS OF QSAR MODELING USING THE PARTIAL LEAST SQUARES METHOD

The QSAR/PLS modeling equations and algorithms are described in detail in the MOE documentation [10]. Consider a training dataset of m molecules, and suppose that each molecule i is described by an n-vector of descriptors $x_i = (x_{i1}, \ldots, x_{in})$. Let $y_i$ denote the experimental result (pIC50) for molecule i. A linear model for y is then given by Eq. (1) [11]:

$$
y = a_0 + a^{T} x \tag{1}
$$

where $a_0$ is a scalar and $a$ is an n-vector of coefficients. Suppose each molecule i has a non-negative importance weight $w_i$ representing the relative probability that the molecule will be encountered, and let W denote the sum of all the weights. The mean square error is then given by Eq. (2) [12]:

$$
\mathrm{MSE}(a_0, a) = \frac{1}{W} \sum_{i=1}^{m} w_i \left[ y_i - \left( a_0 + a^{T} x_i \right) \right]^2 \tag{2}
$$

Differentiating the MSE with respect to the parameters and setting the derivatives to zero yields the normal equations, Eqs. (3) to (7), which can be solved by matrix diagonalization:

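$$
a_0 = y_0 - a^{T} x_0 \tag{3}
$$

$$
y_0 = \frac{1}{W} \sum_{i=1}^{m} w_i y_i \tag{4}
$$

$$
x_0 = \frac{1}{W} \sum_{i=1}^{m} w_i x_i \tag{5}
$$

$$
S a = b = \frac{1}{W} \sum_{i=1}^{m} w_i y_i \left( x_i - x_0 \right) \tag{6}
$$

$$
S = \frac{1}{W} \sum_{i=1}^{m} w_i \left( x_i - x_0 \right) \left( x_i - x_0 \right)^{T} \tag{7}
$$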
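To make Eqs. (2) to (7) concrete, the following is a minimal NumPy sketch that solves the weighted normal equations directly. The function names `weighted_linear_fit` and `weighted_mse` are hypothetical, and `np.linalg.solve` is used in place of matrix diagonalization for brevity; it assumes S is non-singular, which is not guaranteed when descriptors are collinear (the PLS construction that follows avoids inverting S directly).

```python
import numpy as np


def weighted_linear_fit(X, y, w):
    """Solve the weighted normal equations, Eqs. (3)-(7), for a0 and a.

    X: (m, n) descriptor matrix, one row x_i per molecule.
    y: (m,) experimental activities (pIC50).
    w: (m,) non-negative importance weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    W = w.sum()                                # total weight
    y0 = (w @ y) / W                           # Eq. (4): weighted mean activity
    x0 = (w @ X) / W                           # Eq. (5): weighted mean descriptor vector
    Xc = X - x0                                # centred descriptors x_i - x0
    b = ((w * y) @ Xc) / W                     # Eq. (6): right-hand side b
    S = ((Xc * w[:, None]).T @ Xc) / W         # Eq. (7): weighted covariance S
    a = np.linalg.solve(S, b)                  # S a = b (assumes S is non-singular)
    a0 = y0 - a @ x0                           # Eq. (3)
    return a0, a


def weighted_mse(X, y, w, a0, a):
    """Weighted mean square error, Eq. (2)."""
    resid = np.asarray(y, dtype=float) - (a0 + np.asarray(X, dtype=float) @ a)
    w = np.asarray(w, dtype=float)
    return float((w * resid**2).sum() / w.sum())
```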

Starting from the normal equations above, an estimate of $a$ can be computed once the columns of the weight matrix $G_A$ (Eq. (8)) have been obtained by Gram-Schmidt orthogonalization [13] of the Krylov sequence $b, Sb, S^{2}b, \ldots, S^{A-1}b$ [14]. The A-th PLS coefficient vector is then estimated using Eq. (9).

$$
G_A = (g_1, g_2, \ldots, g_A) \tag{8}
$$

$$
a = G_A \left( G_A^{T} S G_A \right)^{-1} G_A^{T} b \tag{9}
$$
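A minimal sketch of Eqs. (8) and (9), assuming S and b have already been computed as in Eqs. (6) and (7). The function name `pls_coefficients` is hypothetical; the orthogonalization uses the modified Gram-Schmidt variant, and this is an illustration of the formulas rather than MOE's implementation.

```python
import numpy as np


def pls_coefficients(S, b, A):
    """A-th PLS coefficient vector, Eq. (9): a = G (G^T S G)^{-1} G^T b.

    The columns of G (Eq. (8)) are obtained by Gram-Schmidt
    orthonormalisation of the Krylov sequence b, Sb, ..., S^(A-1) b.
    """
    S = np.asarray(S, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    if not 1 <= A <= n:
        raise ValueError("A must satisfy 1 <= A <= n")
    G = np.zeros((n, A))
    v = b.copy()                      # current Krylov vector S^k b
    for k in range(A):
        g = v.copy()
        for j in range(k):            # remove components along earlier columns
            g -= (G[:, j] @ g) * G[:, j]
        norm = np.linalg.norm(g)
        if norm == 0.0:
            raise ValueError("Krylov sequence became linearly dependent")
        G[:, k] = g / norm
        v = S @ v                     # next Krylov vector
    M = G.T @ S @ G                   # A x A projected matrix
    return G @ np.linalg.solve(M, G.T @ b)
```

When A = n and the Krylov vectors span the whole descriptor space, G is square and orthogonal, so the result coincides with the full least-squares solution $S^{-1}b$; a smaller A restricts the fit to the leading Krylov directions.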

Here each $g_i$ is a column vector of length n, and A, the degree of the PLS fit, is an integer less than or equal to n. The MOE [10] descriptor calculator was used to generate the 41 numerical descriptors (a_aro, ASA, ASA_H, a_hyd, SlogP, SlogP_VSA0, SlogP_VSA1, SlogP_VSA2, SlogP_VSA3, SlogP_VSA4, SlogP_VSA5, SlogP_VSA6, SlogP_VSA7, SlogP_VSA8, SlogP_VSA9, SMR_VSA0, SMR_VSA1, SMR_VSA2, SMR_VSA3, SMR_VSA4, SMR_VSA5, SMR_VSA6, SMR_VSA7, a_acc, Kier1, Kier2, Kier3, KierA1, KierA2, KierA3, KierFlex, chi0, chi0v, chi0v_C, chi0_C, chi1, chi1v, chi1v_C, chi1_C, chiral, chiral_u) for the 66 randomly selected LPA3 antagonists (Supplementary Fig. 1) retrieved from the European Bioinformatics Institute ChEMBL database (https://www.ebi.ac.uk/chembl/), which constitute our training dataset (CHEMBL3250). Using the PLS method described above, Eq. (10) was generated, relating the descriptors to pIC50 with a correlation coefficient R² = 0.94462 (RMSE = 0.21390) (Fig. 1, blue circles and line); after cross-validation, R² was estimated as 0.74745 (RMSE = 0.49055).
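The paper does not state which cross-validation scheme produced R² = 0.74745. The sketch below assumes leave-one-out cross-validation and, for brevity, substitutes an ordinary least-squares fit for the PLS fit; `fit_ols` and `loo_cv_r2` are hypothetical helpers.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with intercept (stand-in for the PLS fit)."""
    Xa = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return coef                                   # [a0, a_1, ..., a_n]


def loo_cv_r2(X, y):
    """Leave-one-out cross-validated R^2 (often written q^2) and RMSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y)
    preds = np.empty(m)
    for i in range(m):
        keep = np.arange(m) != i                  # drop molecule i from training
        coef = fit_ols(X[keep], y[keep])
        preds[i] = coef[0] + X[i] @ coef[1:]      # predict the held-out molecule
    mse = np.mean((y - preds) ** 2)
    return 1.0 - mse / np.var(y), np.sqrt(mse)
```

Because every molecule is predicted by a model that never saw it, the cross-validated R² is normally lower than the fitted R², which matches the drop from 0.94462 to 0.74745 reported above.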

Fig. 1: Scatter plot of the experimental pIC50 vs.
pIC50-predictions of Eq. (10) (blue) and Eq. (12)
(green).

$$
\begin{aligned}
\mathrm{pIC_{50}} = {} & 3.57363 - 0.25353\,\text{a\_aro} - 0.00361\,\text{ASA} + 0.23510\,\text{a\_hyd} + 0.05890\,\text{SlogP} \\
& - 0.02287\,\text{SlogP\_VSA0} + 0.00032\,\text{SlogP\_VSA1} + 0.03125\,\text{SlogP\_VSA2} \\
& - 0.02059\,\text{SlogP\_VSA3} + 0.02954\,\text{SlogP\_VSA4} + 0.07226\,\text{SlogP\_VSA5} \\
& + 0.02879\,\text{SlogP\_VSA6} + 0.04687\,\text{SlogP\_VSA7} + 0.03836\,\text{SlogP\_VSA8} \\
& + 0.06880\,\text{SlogP\_VSA9} + 0.04912\,\text{SMR\_VSA0} + 0.02536\,\text{SMR\_VSA1} \\
& + 0.08743\,\text{SMR\_VSA2} + 0.00289\,\text{SMR\_VSA3} - 0.01524\,\text{SMR\_VSA4} \\
& + 0.04694\,\text{SMR\_VSA5} + 0.09067\,\text{SMR\_VSA6} - 0.01442\,\text{SMR\_VSA7} \\
& + 0.18393\,\text{a\_acc} - 0.77650\,\text{Kier1} - 0.43968\,\text{Kier2} - 0.30735\,\text{Kier3} \\
& - 0.43752\,\text{KierA1} - 0.03578\,\text{KierA2} + 0.76916\,\text{KierA3} - 0.09573\,\text{KierFlex} \\
& + 0.00332\,\text{chi0} + 0.55223\,\text{chi0v} + 0.13554\,\text{chi0v\_C} - 0.16530\,\text{chi0\_C} \\
& + 0.59498\,\text{chi1} + 0.05911\,\text{chi1v} - 0.93262\,\text{chi1v\_C} - 1.22808\,\text{chi1\_C} \\
& - 0.16986\,\text{chiral} - 0.56204\,\text{chiral\_u}
\end{aligned}
\tag{10}
$$
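As an illustration of how Eq. (10) could be applied to a new compound, the sketch below stores its transcribed coefficients in a dictionary. The function `predict_pic50` is hypothetical, and the descriptor values passed to it are assumed to come from MOE's descriptor calculator under the same names used in Supplementary Table 1.

```python
# Coefficients of Eq. (10), transcribed from the text (40 descriptors; ASA_H does not appear).
EQ10_INTERCEPT = 3.57363
EQ10_COEFFS = {
    "a_aro": -0.25353, "ASA": -0.00361, "a_hyd": 0.23510, "SlogP": 0.05890,
    "SlogP_VSA0": -0.02287, "SlogP_VSA1": 0.00032, "SlogP_VSA2": 0.03125,
    "SlogP_VSA3": -0.02059, "SlogP_VSA4": 0.02954, "SlogP_VSA5": 0.07226,
    "SlogP_VSA6": 0.02879, "SlogP_VSA7": 0.04687, "SlogP_VSA8": 0.03836,
    "SlogP_VSA9": 0.06880, "SMR_VSA0": 0.04912, "SMR_VSA1": 0.02536,
    "SMR_VSA2": 0.08743, "SMR_VSA3": 0.00289, "SMR_VSA4": -0.01524,
    "SMR_VSA5": 0.04694, "SMR_VSA6": 0.09067, "SMR_VSA7": -0.01442,
    "a_acc": 0.18393, "Kier1": -0.77650, "Kier2": -0.43968, "Kier3": -0.30735,
    "KierA1": -0.43752, "KierA2": -0.03578, "KierA3": 0.76916,
    "KierFlex": -0.09573, "chi0": 0.00332, "chi0v": 0.55223, "chi0v_C": 0.13554,
    "chi0_C": -0.16530, "chi1": 0.59498, "chi1v": 0.05911, "chi1v_C": -0.93262,
    "chi1_C": -1.22808, "chiral": -0.16986, "chiral_u": -0.56204,
}


def predict_pic50(descriptors, intercept=EQ10_INTERCEPT, coeffs=EQ10_COEFFS):
    """Evaluate a linear QSAR equation on a {descriptor name: value} mapping."""
    missing = set(coeffs) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return intercept + sum(c * descriptors[name] for name, c in coeffs.items())
```

Raising on missing descriptors, rather than silently treating them as zero, keeps a partially computed descriptor set from producing a plausible-looking but wrong prediction.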

Fig. 2: Bar chart of the residuals (experimental pIC50 minus predicted pIC50) for the test dataset. Only 1 of the 23 tested compounds (compound 23; see Supplementary Fig. 2 for structural details) shows a residual > 1.0 pIC50 unit (indicating a wrong prediction).

Here the root mean square error (RMSE) is the square root of the MSE function (Eq. (2)) at the fitted parameter values, and the correlation coefficient is R² = 1 − MSE/YVAR, where YVAR is the sample variance of the $y_i$ values; R² ranges between 0 (no fit) and 1 (perfect fit). The predictive suitability of our equation was tested on 23 compounds (Supplementary Fig. 2) with experimentally determined IC50 values for LPA3 antagonism. Assuming that a residual above 1.0 pIC50 unit represents a poor fit, our data (Fig. 2) suggest that Eq. (10) accurately predicted 22 of the 23 test compounds.
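The fit statistics used above can be sketched as follows, assuming unit weights so that Eq. (2) reduces to a plain mean. `fit_statistics` is a hypothetical helper that returns the RMSE, R² = 1 − MSE/YVAR, and the number of residuals exceeding the 1.0 pIC50 cutoff.

```python
import numpy as np


def fit_statistics(y_obs, y_pred, cutoff=1.0):
    """Return (RMSE, R^2 = 1 - MSE/YVAR, count of |residual| > cutoff)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_obs - y_pred                    # experimental minus predicted pIC50
    mse = np.mean(resid ** 2)                 # unweighted form of Eq. (2)
    yvar = np.var(y_obs)                      # sample variance of the observed y_i
    n_poor = int(np.sum(np.abs(resid) > cutoff))
    return float(np.sqrt(mse)), float(1.0 - mse / yvar), n_poor
```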

III. DESCRIPTOR CONTINGENCY ANALYSIS

To determine the contribution of each descriptor to the overall equation, we performed a contingency analysis. The results indicate whether the descriptor set needs pruning. In MOE [10], the QSAR-Contingency tool performs a bivariate contingency analysis between each descriptor and the experimental activity value and produces a table of correlation coefficients (Eq. (11)), one per descriptor. Let X be a molecular descriptor and Y the activity value of a randomly selected sample, with variances Var(X) and Var(Y); the covariance of the random variables X and Y is then defined as Cov(X, Y) = E(XY) − E(X)E(Y) [10, 15].

$$
R^2 = \frac{\left[ E(XY) - E(X)E(Y) \right]^2}{\mathrm{Var}(X)\,\mathrm{Var}(Y)} \tag{11}
$$
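Eq. (11) can be transcribed directly by estimating the expectations with sample means. This is a sketch of the formula only, not of MOE's QSAR-Contingency tool, and `squared_correlation` is a hypothetical name.

```python
import numpy as np


def squared_correlation(x, y):
    """Eq. (11): R^2 = Cov(X, Y)^2 / (Var(X) Var(Y)), using sample means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean(x * y) - np.mean(x) * np.mean(y)    # E(XY) - E(X)E(Y)
    return float(cov ** 2 / (np.var(x) * np.var(y)))
```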

Because R² ranges from 0 to 1, with 1 representing a perfectly linear correlation, we proposed that only descriptors with R² ≥ 0.8 are useful and that descriptors below this cutoff can be pruned. Our data show that 31 of the original 41 descriptors have R² ≥ 0.8 (Fig. 3, Supplementary Table 1). After excluding the descriptors with unsatisfactory coefficients, the QSAR model was recalculated using the remaining set of descriptors. This yielded a new numerical relationship (Eq. (12)) with R² = 0.88074 and RMSE = 0.31388. The scatter plot of the predicted versus experimental pIC50 values for Eq. (12) is given in Fig. 1 (green circles and line).

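$$
\begin{aligned}
\mathrm{pIC_{50}} = {} & 2.23199 - 0.00516\,\text{ASA} - 0.00516\,\text{ASA\_H} - 0.48596\,\text{a\_hyd} - 0.33917\,\text{SlogP} \\
& - 0.05298\,\text{SlogP\_VSA0} - 0.03967\,\text{SlogP\_VSA1} - 0.02243\,\text{SlogP\_VSA2} \\
& + 0.01681\,\text{SlogP\_VSA7} + 0.02107\,\text{SlogP\_VSA9} \\
& - 0.00757\,\text{SMR\_VSA0} - 0.00087\,\text{SMR\_VSA1} - 0.00089\,\text{SMR\_VSA3} \\
& - 0.01173\,\text{SMR\_VSA4} + 0.00955\,\text{SMR\_VSA5} - 0.01412\,\text{SMR\_VSA6} \\
& - 0.02508\,\text{SMR\_VSA7} - 0.26771\,\text{Kier1} + 0.15306\,\text{Kier2} \;?\; 0.56650\,\text{Kier3} \\
& - 0.30504\,\text{KierA2} + 0.98837\,\text{KierA3} - 0.28849\,\text{KierFlex} + 0.48535\,\text{chi0} \\
& + 0.90693\,\text{chi0v} + 0.10234\,\text{chi0v\_C} + 0.24407\,\text{chi0\_C} + 0.66154\,\text{chi1} \\
& + 0.36006\,\text{chi1v} - 1.03589\,\text{chi1v\_C} - 0.62474\,\text{chi1\_C} - 0.36725\,\text{a\_aro}
\end{aligned}
\tag{12}
$$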

When this equation was used to predict the pIC50 values of the test set, only one compound lay above the 1.0 pIC50 unit cutoff (data not shown). Thus, Eq. (12) is more compact than Eq. (10) and, by this criterion, as accurate in predicting LPA3 antagonism.

Fig. 3: Bar chart of the descriptor-experimental pIC50 correlation coefficients. Only 31 of the 41 descriptors lie above the 0.8 coefficient cutoff.

Fig. 4: 3D plot of the first three principal components. Each point represents a compound in the training dataset, and each colour represents a distinct cluster of pIC50 values.
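The pruning step, a 0.8 cutoff applied to the coefficients in Supplementary Table 1, can be reproduced in a few lines. The dictionary below transcribes that table; `prune_descriptors` is a hypothetical helper.

```python
# Descriptor-pIC50 correlation coefficients transcribed from Supplementary Table 1.
TABLE1 = {
    "SlogP_VSA6": 0.57623, "chiral_u": 0.65734, "SlogP_VSA4": 0.66609,
    "SlogP_VSA5": 0.6996, "chiral": 0.72218, "SMR_VSA2": 0.78566,
    "SlogP_VSA8": 0.78621, "a_acc": 0.78922, "SlogP_VSA3": 0.79094,
    "KierA1": 0.79264, "a_aro": 0.80122, "SlogP_VSA9": 0.80481,
    "SlogP_VSA1": 0.80575, "chi0_C": 0.806, "chi1v": 0.80603,
    "KierFlex": 0.80836, "chi1v_C": 0.81041, "KierA3": 0.81376,
    "SlogP_VSA2": 0.81493, "SMR_VSA7": 0.81623, "ASA": 0.81908,
    "ASA_H": 0.81908, "chi0v": 0.82223, "chi0v_C": 0.82394,
    "SMR_VSA4": 0.82512, "chi1_C": 0.82535, "KierA2": 0.82725,
    "chi0": 0.82827, "SlogP_VSA7": 0.82933, "SMR_VSA5": 0.82941,
    "Kier2": 0.83257, "SlogP_VSA0": 0.83519, "Kier1": 0.83644,
    "SMR_VSA1": 0.83839, "chi1": 0.84721, "SMR_VSA6": 0.84762,
    "SMR_VSA3": 0.8525, "Kier3": 0.85924, "SlogP": 0.86886,
    "SMR_VSA0": 0.87545, "a_hyd": 0.88264,
}


def prune_descriptors(coeffs, cutoff=0.8):
    """Split descriptor names into kept (coefficient >= cutoff) and pruned lists."""
    kept = sorted(name for name, r2 in coeffs.items() if r2 >= cutoff)
    pruned = sorted(name for name, r2 in coeffs.items() if r2 < cutoff)
    return kept, pruned
```

Applied to the transcribed table, this keeps 31 descriptors and prunes 10 (SlogP_VSA6, chiral_u, SlogP_VSA4, SlogP_VSA5, chiral, SMR_VSA2, SlogP_VSA8, a_acc, SlogP_VSA3 and KierA1); the 31 retained names are exactly the descriptors that appear in Eq. (12).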

IV. PRINCIPAL COMPONENT ANALYSIS OF EQUATION

We further studied the dataset descriptors along their principal components, reducing the dimensionality of the raw data by a linear transformation [13]. Consider the m = 66 training compounds, and let compound i be described by an n-vector of real numbers $x_i = (x_{i1}, \ldots, x_{in})$, where n = 31 is the number of descriptors in the new Eq. (12). Assume that each molecule i has an associated non-negative, real importance weight $w_i$ representing the relative probability that molecule $x_i$ will be encountered, and let W denote the sum of all the weights (W = 1 when the weights are normalized to add up to 1). The mean and covariance of the data can then be estimated from the raw data using Eqs. (13) and (14), and the eigenvalues and eigenvectors follow from the covariance matrix. Because the sample covariance matrix S is symmetric and positive semi-definite, it can be diagonalized as $S = Q^{T} D^{2} Q$, where Q is orthogonal and D is diagonal, sorted in descending order from top left to bottom right [13, 14].

$$
E(x) \approx \bar{x} = x_0 = \frac{1}{W} \sum_{i=1}^{m} w_i x_i \tag{13}
$$

$$
\mathrm{Cov}(x) \approx S = \frac{1}{W} \sum_{i=1}^{m} w_i x_i x_i^{T} - \bar{x}\,\bar{x}^{T} \tag{14}
$$
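The weighted mean and covariance of Eqs. (13) and (14), followed by diagonalization of S, can be sketched with NumPy's symmetric eigensolver `np.linalg.eigh`. `weighted_pca` is a hypothetical name, and the weights are whatever non-negative values the caller supplies.

```python
import numpy as np


def weighted_pca(X, w):
    """Weighted PCA following Eqs. (13)-(14).

    Returns eigenvalues of S in descending order and the matching
    eigenvectors as columns.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    W = w.sum()
    xbar = (w @ X) / W                                   # Eq. (13): weighted mean
    S = (X.T * w) @ X / W - np.outer(xbar, xbar)         # Eq. (14): weighted covariance
    evals, evecs = np.linalg.eigh(S)                     # S is symmetric, so eigh applies
    order = np.argsort(evals)[::-1]                      # largest variance first
    return evals[order], evecs[:, order]
```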

Examining the effect of each principal component (eigenvector) on the condition number and the variance shows that eight principal components are sufficient to account for more than 98% of the variance in the dataset [15]. The 3D scatter plot of the first three principal components (PCA1, PCA2 and PCA3) with respect to pIC50 values is shown in Fig. 4; each point in the plot corresponds to a dataset molecule, colored according to clustered pIC50 values.
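Choosing how many components to keep from the sorted eigenvalues reduces to a cumulative-sum threshold. The sketch below, with the hypothetical helper `n_components_for`, returns the smallest number of leading components whose eigenvalues explain at least the requested fraction (0.98 by default) of the total variance.

```python
import numpy as np


def n_components_for(evals, threshold=0.98):
    """Smallest k whose k largest eigenvalues explain >= threshold of the variance."""
    evals = np.sort(np.asarray(evals, dtype=float))[::-1]   # largest first
    cumulative = np.cumsum(evals) / evals.sum()             # explained-variance fractions
    k = int(np.searchsorted(cumulative, threshold)) + 1     # first index reaching threshold
    return min(k, len(evals))                               # guard against round-off near 1.0
```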

V. CONCLUSION

Given the good mathematical correlation between the descriptor set and LPA3 antagonism, it is reasonable to expect that the equation is biased toward compounds with closely related descriptor properties; it may therefore not be a universal formula for LPA3 antagonist screening. Nevertheless, it should accurately capture compounds whose structural properties fall within those of the dataset, and it may therefore be incorporated into a ligand-based screening protocol for more successful hit-compound identification.

ACKNOWLEDGMENT

This work was supported by the Platform for Drug Discovery, Informatics, and Structural Life Science from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

APPENDIX

Supplementary Table 1: Correlation coefficient of each descriptor with the experimental pIC50

S/N Descriptor Corr. coefficient
1 SlogP_VSA6 0.57623
2 chiral_u 0.65734
3 SlogP_VSA4 0.66609
4 SlogP_VSA5 0.6996
5 chiral 0.72218
6 SMR_VSA2 0.78566
7 SlogP_VSA8 0.78621
8 a_acc 0.78922
9 SlogP_VSA3 0.79094
10 KierA1 0.79264
11 a_aro 0.80122
12 SlogP_VSA9 0.80481
13 SlogP_VSA1 0.80575
14 chi0_C 0.806
15 chi1v 0.80603
16 KierFlex 0.80836
17 chi1v_C 0.81041
18 KierA3 0.81376
19 SlogP_VSA2 0.81493
20 SMR_VSA7 0.81623
21 ASA 0.81908
22 ASA_H 0.81908
23 chi0v 0.82223
24 chi0v_C 0.82394
25 SMR_VSA4 0.82512
26 chi1_C 0.82535
27 KierA2 0.82725
28 chi0 0.82827
29 SlogP_VSA7 0.82933
30 SMR_VSA5 0.82941
31 Kier2 0.83257
32 SlogP_VSA0 0.83519
33 Kier1 0.83644
34 SMR_VSA1 0.83839
35 chi1 0.84721
36 SMR_VSA6 0.84762
37 SMR_VSA3 0.8525
38 Kier3 0.85924
39 SlogP 0.86886
40 SMR_VSA0 0.87545
41 a_hyd 0.88264


Supplementary Fig. 2: Structures of the 23 LPA3 antagonists of the test dataset (compounds 1 to 23).

Supplementary Fig. 1: LPA3 inhibitors, training set for QSAR modeling (compounds 1 to 66). Experimental pIC50, predicted pIC50 ($PRED) and residual ($RES = pIC50 − $PRED) for each compound:

No. pIC50 Predicted Residual
1 7.5229 7.5301 -0.0072
2 7.5229 7.5301 -0.0072
3 6.4157 6.4861 -0.0705
4 4.7305 4.7125 0.0179
5 6.6840 6.8724 -0.1883
6 6.5200 6.4167 0.1033
7 6.1925 6.1022 0.0902
8 5.1637 5.0749 0.0888
9 5.9101 5.9231 -0.0131
10 5.9101 5.9231 -0.0131
11 6.4157 6.3266 0.0890
12 5.0000 5.0393 -0.0393
13 5.2366 5.4671 -0.2305
14 6.3747 6.6310 -0.2563
15 6.0292 6.1727 -0.1435
16 4.5229 4.5776 -0.0547
17 5.6308 5.6943 -0.0635
18 6.6003 6.4167 0.1836
19 5.1649 5.1271 0.0379
20 5.1904 5.5739 -0.3834
21 5.6364 5.7584 -0.1220
22 4.5229 4.4398 0.0830
23 4.5229 4.4673 0.0556
24 6.9136 6.9600 -0.0463
25 6.8447 6.6932 0.1515
26 6.0269 6.4243 -0.3974
27 5.8962 5.8462 0.0500
28 4.6588 4.2947 0.3641
29 5.0334 5.2622 -0.2288
30 5.1931 5.3161 -0.1230
31 7.5528 7.3033 0.2495
32 6.4685 6.8160 -0.3474
33 5.1124 4.9684 0.1439
34 5.0830 4.9372 0.1458
35 6.1135 6.1509 -0.0374
36 5.1593 5.3161 -0.1568
37 5.0000 4.7783 0.2217
38 6.1238 6.1151 0.0087
39 7.5686 7.1552 0.4134
40 6.3206 6.3223 -0.0018
41 7.6198 7.6400 -0.0202
42 7.6198 7.6400 -0.0202
43 6.0809 6.3361 -0.2551
44 5.0788 4.8447 0.2341
45 7.0706 6.8724 0.1982
46 7.5528 7.0811 0.4717
47 6.1844 6.2144 -0.0299
48 6.9872 6.7202 0.2669
49 5.1415 5.0991 0.0424
50 6.8447 6.6623 0.1824
51 7.0177 6.7353 0.2824
52 4.5229 4.7671 -0.2443
53 5.5240 5.5696 -0.0456
54 6.7905 7.2006 -0.4101
55 5.6364 5.6100 0.0264
56 5.5800 5.2622 0.3179
57 6.3830 6.1727 0.2103
58 7.1871 7.1826 0.0045
59 4.5229 5.0926 -0.5698
60 6.7352 6.9852 -0.2500
61 5.9259 5.7730 0.1529
62 6.7352 7.0541 -0.3189
63 5.2541 5.5111 -0.2570
64 6.7570 6.4243 0.3327
65 5.9208 6.0663 -0.1455
66 6.5214 6.2442 0.2772

REFERENCES

[1] A.M. Helguera, A. Pérez-Garrido, A. Gaspar, J. Reis, F. Cagide, D. Viña, M. Cordeiro, F. Borges, Combining QSAR classification models for predictive modeling of human monoamine oxidase inhibitors, Eur J Med Chem 59 (2013) 75-90. http://dx.doi.org/10.1016/j.ejmech.2012.10.035

[2] P.P. Roy, J.T. Leonard, K. Roy, Exploring the impact of size of training sets for the development of predictive QSAR models, Chemometrics and Intelligent Laboratory Systems 90 (1) (2008) 31-42.

[3] R. Todeschini, V. Consonni, Molecular Descriptors for Chemoinformatics (2 volumes), Wiley-VCH, 2009. http://dx.doi.org/10.1002/9783527628766

[4] T. Scior, J.L. Medina-Franco, Q.T. Do, K. Martínez-Mayorga, J.A. Yunes-Rojas, P. Bernard, How to recognize and workaround pitfalls in QSAR studies: a critical review, Curr Med Chem 16 (32) (2009) 4297-4313.

[5] B.K. Shoichet, I.D. Kuntz, D.L. Bodian, Molecular docking using shape descriptors, Journal of Computational Chemistry 13 (3) (1992) 380-397.

[6] R.J. Morris, J. Najmanovich, A. Kahraman, J.M. Thornton, Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons, Bioinformatics 21 (10) (2005) 2347-2355.

[7] B.B. Goldman, W.T. Wipke, QSD quadratic shape descriptors. Molecular docking using quadratic shape descriptors (QSDock), Proteins 38 (1) (2000) 79-94.

[8] P. Wang, X.H. Wu, W.X. Chen, B.E. Shan, Q. Guo, Expression of lysophosphatidic acid receptor in human ovarian cancer cell lines 3AO, SKOV3, OVCAR3 and its significance, Di Yi Jun Yi Da Xue Xue Bao 25 (11) (2005) 1422-1424, 1431.

[9] H. Ueda, H. Matsunaga, O.I. Omotuyi, J. Nagai, Lysophosphatidic acid: chemical signature of neuropathic pain, Biochim Biophys Acta 1831 (1) (2013) 61-73. http://dx.doi.org/10.1016/j.bbalip.2012.08.014

[10] Molecular Operating Environment (MOE), 2012.10, Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite 910, Montreal, QC, Canada, H3A 2R7, 2012.

[11] M.J. Wichura, The Coordinate-Free Approach to Linear Models, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, 2006, pp. xiv+199. ISBN 978-0-521-86842-6. MR 2283455.

[12] D. Wackerly, W. Scheaffer, Mathematical Statistics with Applications, 7th ed., Thomson Higher Education, Belmont, CA, USA, 2008. ISBN 0-49538508-5.
