CUBO, A Mathematical Journal, Vol. 20, No. 3, (01-11), October 2018
http://dx.doi.org/10.4067/S0719-06462018000300001

Quantitative Approximation by a Kantorovich-Shilkret quasi-interpolation neural network operator

George A. Anastassiou
Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152, U.S.A.
ganastss@memphis.edu

ABSTRACT

In this article we present multivariate basic approximation by a Kantorovich-Shilkret type quasi-interpolation neural network operator with respect to the supremum norm. This is done with rates, using the multivariate modulus of continuity. We approximate continuous and bounded functions on $\mathbb{R}^N$, $N \in \mathbb{N}$. When they are additionally uniformly continuous we derive pointwise and uniform convergences.

RESUMEN

En este artículo presentamos un resultado de aproximación básico multivariado a través de un operador de cuasi-interpolación en red neuronal de tipo Kantorovich-Shilkret con respecto a la norma del supremo. Esto se realiza con tasas usando el módulo de continuidad multivariado. Aproximamos funciones continuas y acotadas en $\mathbb{R}^N$, $N \in \mathbb{N}$. Cuando ellas son adicionalmente uniformemente continuas, derivamos convergencias puntuales y uniformes.

Keywords and Phrases: error function based activation function, multivariate quasi-interpolation neural network approximation, Kantorovich-Shilkret type operator.

2010 AMS Mathematics Subject Classification: 41A17, 41A25, 41A30, 41A35.

1 Introduction

The author here performs multivariate error function based neural network approximation of continuous functions over $\mathbb{R}^N$, $N \in \mathbb{N}$, and then extends the results to complex valued functions. The convergences here are with rates, expressed via the multivariate modulus of continuity of the involved function, and are given by very tight Jackson-type inequalities.

The author comes up with the "right", precisely defined, flexible quasi-interpolation Kantorovich-Shilkret type integral-coefficient neural network operator associated to the error function.

The feed-forward neural networks (FNNs) with one hidden layer that we deal with are expressed mathematically as
\[
N_n(x) = \sum_{j=0}^{n} c_j \, \sigma(\langle a_j \cdot x \rangle + b_j), \quad x \in \mathbb{R}^s, \ s \in \mathbb{N},
\]
where for $0 \le j \le n$, $b_j \in \mathbb{R}$ are the thresholds, $a_j \in \mathbb{R}^s$ are the connection weights, $c_j \in \mathbb{R}$ are the coefficients, $\langle a_j \cdot x \rangle$ is the inner product of $a_j$ and $x$, and $\sigma$ is the activation function of the network. In many fundamental neural network models the activation function is generated by the error function. About neural networks in general you may read [4], [5], [6].

In recent years non-additive integrals, like the N. Shilkret one [7], have become fashionable and increasingly useful, e.g. in economic theory.

2 Background

Here we follow [7].

Let $\mathcal{F}$ be a $\sigma$-field of subsets of an arbitrary set $\Omega$. An extended non-negative real valued function $\mu$ on $\mathcal{F}$ is called maxitive if $\mu(\emptyset) = 0$ and
\[
\mu\left(\bigcup_{i \in I} E_i\right) = \sup_{i \in I} \mu(E_i), \tag{1}
\]
where the index set $I$ is at most countable. We also call $\mu$ a maxitive measure. Here $f$ stands for a non-negative measurable function on $\Omega$.

In [7], Niel Shilkret developed his non-additive integral, defined as follows:
\[
(N^*)\int_D f \, d\mu := \sup_{y \in Y} \left\{ y \cdot \mu\left(D \cap \{f \ge y\}\right) \right\}, \tag{2}
\]
where $Y = [0, m]$ or $Y = [0, m)$ with $0 < m \le \infty$, and $D \in \mathcal{F}$. Here we take $Y = [0, \infty)$. It is easily proved that
\[
(N^*)\int_D f \, d\mu = \sup_{y > 0} \left\{ y \cdot \mu\left(D \cap \{f > y\}\right) \right\}. \tag{3}
\]
The Shilkret integral takes values in $[0, \infty]$.
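To make (3) concrete, here is a minimal numerical sketch (ours, not part of the original paper) that approximates the Shilkret integral on finite grids. The maxitive measure used, $\mu(A) = 1$ for $A \ne \emptyset$ and $\mu(\emptyset) = 0$, and the helper names shilkret_integral and mu_sup, are our own illustrative choices.

```python
import numpy as np

def shilkret_integral(f, mu, D, y_grid):
    """Grid approximation of (N*) int_D f dmu = sup_{y>0} y * mu(D ∩ {f > y}).

    f      : non-negative callable
    mu     : callable taking an array of points (a subset of D) and returning its measure
    D      : 1-D array of sample points representing the set D
    y_grid : 1-D array of positive levels y over which the supremum is taken
    """
    fD = f(D)
    values = [y * mu(D[fD > y]) for y in y_grid]
    return max(values) if values else 0.0

# Example maxitive measure: mu(A) = 1 for A nonempty, mu(empty set) = 0.
def mu_sup(points):
    return 0.0 if points.size == 0 else 1.0

D = np.linspace(0.0, 1.0, 2001)          # discretization of D = [0, 1]
y_grid = np.linspace(1e-6, 2.0, 2001)    # levels y > 0

# For f(t) = t on [0, 1] and this mu, {f > y} ∩ [0, 1] is non-empty exactly when y < 1,
# so (3) gives sup_{0 < y < 1} y * 1 = 1.
print(shilkret_integral(lambda t: t, mu_sup, D, y_grid))   # ≈ 1.0 (up to grid resolution)
```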
The Shilkret integral ([7]) has the following properties:
\[
(N^*)\int_\Omega \chi_E \, d\mu = \mu(E), \tag{4}
\]
where $\chi_E$ is the indicator function of $E \in \mathcal{F}$;
\[
(N^*)\int_D c f \, d\mu = c \, (N^*)\int_D f \, d\mu, \quad c \ge 0; \tag{5}
\]
\[
(N^*)\int_D \sup_{n \in \mathbb{N}} f_n \, d\mu = \sup_{n \in \mathbb{N}} (N^*)\int_D f_n \, d\mu, \tag{6}
\]
where $f_n$, $n \in \mathbb{N}$, is an increasing sequence of elementary (countably valued) functions converging uniformly to $f$. Furthermore we have
\[
(N^*)\int_D f \, d\mu \ge 0, \tag{7}
\]
and $f \ge g$ implies
\[
(N^*)\int_D f \, d\mu \ge (N^*)\int_D g \, d\mu, \tag{8}
\]
where $f, g : \Omega \to [0, \infty]$ are measurable.

If $a \le f(\omega) \le b$ for almost every $\omega \in E$, then $a\mu(E) \le (N^*)\int_E f \, d\mu \le b\mu(E)$; $(N^*)\int_E 1 \, d\mu = \mu(E)$; $f > 0$ almost everywhere together with $(N^*)\int_E f \, d\mu = 0$ implies $\mu(E) = 0$; $(N^*)\int_\Omega f \, d\mu = 0$ if and only if $f = 0$ almost everywhere; and $(N^*)\int_\Omega f \, d\mu < \infty$ implies that
\[
N(f) := \{\omega \in \Omega \mid f(\omega) \ne 0\} \ \text{has $\sigma$-finite measure}. \tag{9}
\]
Moreover,
\[
(N^*)\int_D (f + g) \, d\mu \le (N^*)\int_D f \, d\mu + (N^*)\int_D g \, d\mu,
\]
and
\[
\left| (N^*)\int_D f \, d\mu - (N^*)\int_D g \, d\mu \right| \le (N^*)\int_D |f - g| \, d\mu. \tag{10}
\]

From now on in this article we assume that $\mu : \mathcal{F} \to [0, +\infty)$.

3 Main Results

We consider here the (Gauss) error special function ([1], [3])
\[
\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt, \quad x \in \mathbb{R}, \tag{11}
\]
which is a sigmoidal type function and a strictly increasing function. It has the properties
\[
\mathrm{erf}(0) = 0, \quad \mathrm{erf}(-x) = -\mathrm{erf}(x), \quad \mathrm{erf}(+\infty) = 1, \quad \mathrm{erf}(-\infty) = -1,
\]
\[
(\mathrm{erf}(x))' = \frac{2}{\sqrt{\pi}} e^{-x^2}, \quad x \in \mathbb{R},
\]
and
\[
\int \mathrm{erf}(x) \, dx = x \, \mathrm{erf}(x) + \frac{e^{-x^2}}{\sqrt{\pi}} + C,
\]
where $C$ is a constant.

The error function is related to the cumulative distribution function of the standard normal distribution by
\[
\Phi(x) = \frac{1}{2} + \frac{1}{2} \, \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right).
\]

We consider the activation function
\[
\chi(x) = \frac{1}{4}\left(\mathrm{erf}(x + 1) - \mathrm{erf}(x - 1)\right), \quad x \in \mathbb{R}, \tag{12}
\]
and we notice that
\[
\chi(-x) = \chi(x), \tag{13}
\]
i.e. $\chi$ is an even function. Clearly $\chi(x) > 0$ for all $x \in \mathbb{R}$. We see that
\[
\chi(0) = \frac{\mathrm{erf}(1)}{2} \simeq 0.4215. \tag{14}
\]
For $x > 0$ we have that
\[
\chi'(x) < 0. \tag{15}
\]
That is, $\chi$ is strictly decreasing on $[0, \infty)$ and strictly increasing on $(-\infty, 0]$, and $\chi'(0) = 0$. Clearly the $x$-axis is the horizontal asymptote of $\chi$. In conclusion, $\chi$ is a bell-shaped symmetric function with maximum $\chi(0) \simeq 0.4215$.

We further need

Theorem 3.1. ([2]) We have that
\[
\sum_{i=-\infty}^{\infty} \chi(x - i) = 1, \quad \text{for all } x \in \mathbb{R}. \tag{16}
\]

Theorem 3.2. ([2]) It holds
\[
\int_{-\infty}^{\infty} \chi(x) \, dx = 1. \tag{17}
\]

So $\chi(x)$ is a density function on $\mathbb{R}$.

Theorem 3.3. ([2]) Let $0 < \alpha < 1$, and $n \in \mathbb{N}$ with $n^{1-\alpha} \ge 3$. It holds
\[
\sum_{\substack{k = -\infty \\ |nx - k| \ge n^{1-\alpha}}}^{\infty} \chi(nx - k) < \frac{1}{2\sqrt{\pi}\,(n^{1-\alpha} - 2)\, e^{(n^{1-\alpha} - 2)^2}}. \tag{18}
\]

Remark 3.4. We introduce
\[
Z(x_1, \ldots, x_N) := Z(x) := \prod_{i=1}^{N} \chi(x_i), \tag{19}
\]
$x = (x_1, \ldots, x_N) \in \mathbb{R}^N$, $N \in \mathbb{N}$. It has the properties:

(i)
\[
Z(x) > 0, \quad \forall \, x \in \mathbb{R}^N, \tag{20}
\]

(ii)
\[
\sum_{k=-\infty}^{\infty} Z(x - k) := \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} \cdots \sum_{k_N=-\infty}^{\infty} Z(x_1 - k_1, \ldots, x_N - k_N) = 1, \tag{21}
\]
where $k := (k_1, \ldots, k_N) \in \mathbb{Z}^N$, $\forall \, x \in \mathbb{R}^N$, hence

(iii)
\[
\sum_{k=-\infty}^{\infty} Z(nx - k) = 1, \quad \forall \, x \in \mathbb{R}^N, \ n \in \mathbb{N}, \tag{22}
\]
and

(iv)
\[
\int_{\mathbb{R}^N} Z(x) \, dx = 1, \tag{23}
\]
that is, $Z$ is a multivariate density function.

Here $\|x\|_\infty := \max\{|x_1|, \ldots, |x_N|\}$, $x \in \mathbb{R}^N$; we also set $\infty := (\infty, \ldots, \infty)$, $-\infty := (-\infty, \ldots, -\infty)$ in the multivariate context. It is also clear that (see (18))

(v)
\[
\sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty > \frac{1}{n^\beta}}}^{\infty} Z(nx - k) \le \frac{1}{2\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}}, \tag{24}
\]
for $0 < \beta < 1$, $n \in \mathbb{N}$ with $n^{1-\beta} \ge 3$, $x \in \mathbb{R}^N$.

For $f \in C_B^+(\mathbb{R}^N)$ (continuous and bounded functions from $\mathbb{R}^N$ into $\mathbb{R}_+$), we define the first modulus of continuity
\[
\omega_1(f, h) := \sup_{\substack{x, y \in \mathbb{R}^N \\ \|x - y\|_\infty \le h}} |f(x) - f(y)|, \quad h > 0. \tag{25}
\]
Given that $f \in C_U^+(\mathbb{R}^N)$ (uniformly continuous functions from $\mathbb{R}^N$ into $\mathbb{R}_+$), we have that
\[
\lim_{h \to 0} \omega_1(f, h) = 0. \tag{26}
\]
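Before introducing the operator, here is a small numerical sketch (ours, not part of the original paper) that evaluates the activation function $\chi$ of (12) and the kernel $Z$ of (19) via Python's math.erf, and checks (14) together with truncated versions of the partition-of-unity identities (16) and (21); the helper names chi and Z_density are hypothetical.

```python
import math
from itertools import product

def chi(x):
    """Activation function (12): chi(x) = (erf(x + 1) - erf(x - 1)) / 4."""
    return 0.25 * (math.erf(x + 1.0) - math.erf(x - 1.0))

def Z_density(x):
    """Multivariate kernel (19): Z(x) = prod_i chi(x_i), with x a tuple of reals."""
    p = 1.0
    for xi in x:
        p *= chi(xi)
    return p

# Check (14): chi(0) = erf(1)/2
print(chi(0.0))                                   # ≈ 0.42135

# Truncated check of the partition of unity (16) at an arbitrary point:
x = 0.37
print(sum(chi(x - i) for i in range(-30, 31)))    # ≈ 1.0

# Truncated check of (21) for N = 2:
x2 = (0.37, -1.2)
print(sum(Z_density((x2[0] - k1, x2[1] - k2))
          for k1, k2 in product(range(-30, 31), repeat=2)))   # ≈ 1.0
```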
We make

Definition 3.5. Let $\mathcal{L}$ be the Lebesgue $\sigma$-algebra on $\mathbb{R}^N$, $N \in \mathbb{N}$, and let $\mu : \mathcal{L} \to [0, +\infty)$ be a maxitive measure such that for any $A \in \mathcal{L}$ with $A \ne \emptyset$ we get $\mu(A) > 0$. For $f \in C_B^+(\mathbb{R}^N)$, we define the multivariate Kantorovich-Shilkret type neural network operator for any $x \in \mathbb{R}^N$:
\[
T_n^\mu(f, x) = T_n^\mu(f, x_1, \ldots, x_N) := \sum_{k=-\infty}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k)
\]
\[
= \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} \cdots \sum_{k_N=-\infty}^{\infty} \left( \frac{(N^*)\int_0^{\frac{1}{n}} \cdots \int_0^{\frac{1}{n}} f\left(t_1 + \frac{k_1}{n}, t_2 + \frac{k_2}{n}, \ldots, t_N + \frac{k_N}{n}\right) d\mu(t_1, \ldots, t_N)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) \left( \prod_{i=1}^{N} \chi(nx_i - k_i) \right), \tag{27}
\]
where $x = (x_1, \ldots, x_N) \in \mathbb{R}^N$, $k = (k_1, \ldots, k_N)$, $t = (t_1, \ldots, t_N)$, $n \in \mathbb{N}$. Clearly here $\mu\left(\left[0, \frac{1}{n}\right]^N\right) > 0$, $\forall \, n \in \mathbb{N}$.

Above we notice that
\[
\|T_n^\mu(f)\|_\infty \le \|f\|_\infty, \tag{28}
\]
so that $T_n^\mu(f, x)$ is well-defined.

Remark 3.6. Let $t \in \left[0, \frac{1}{n}\right]^N$ and $x \in \mathbb{R}^N$; then
\[
f\left(t + \frac{k}{n}\right) = f\left(t + \frac{k}{n}\right) - f(x) + f(x) \le \left| f\left(t + \frac{k}{n}\right) - f(x) \right| + f(x), \tag{29}
\]
hence
\[
(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t) \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t) + f(x)\, \mu\left(\left[0, \frac{1}{n}\right]^N\right). \tag{30}
\]
That is,
\[
(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t) - f(x)\, \mu\left(\left[0, \frac{1}{n}\right]^N\right) \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t). \tag{31}
\]
Similarly we have
\[
f(x) = f(x) - f\left(t + \frac{k}{n}\right) + f\left(t + \frac{k}{n}\right) \le \left| f\left(t + \frac{k}{n}\right) - f(x) \right| + f\left(t + \frac{k}{n}\right),
\]
hence
\[
(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f(x) \, d\mu(t) \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t) + (N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t).
\]
That is,
\[
f(x)\, \mu\left(\left[0, \frac{1}{n}\right]^N\right) - (N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t) \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t). \tag{32}
\]
By (31) and (32) we derive
\[
\left| (N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t) - f(x)\, \mu\left(\left[0, \frac{1}{n}\right]^N\right) \right| \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t). \tag{33}
\]
In particular it holds
\[
\left| \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} - f(x) \right| \le \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)}. \tag{34}
\]

We present

Theorem 3.7. Let $f \in C_B^+(\mathbb{R}^N)$, $0 < \beta < 1$, $x \in \mathbb{R}^N$; $N, n \in \mathbb{N}$ with $n^{1-\beta} \ge 3$. Then

i)
\[
\sup_\mu |T_n^\mu(f, x) - f(x)| \le \omega_1\left(f, \frac{1}{n} + \frac{1}{n^\beta}\right) + \frac{\|f\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}} =: \lambda_n, \tag{35}
\]

ii)
\[
\sup_\mu \|T_n^\mu(f) - f\|_\infty \le \lambda_n. \tag{36}
\]
Given that $f \in \left(C_U^+(\mathbb{R}^N) \cap C_B^+(\mathbb{R}^N)\right)$, we obtain $\lim_{n \to \infty} T_n^\mu(f) = f$, uniformly.

Proof. We observe that
\[
|T_n^\mu(f, x) - f(x)| = \left| \sum_{k=-\infty}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k) - \sum_{k=-\infty}^{\infty} f(x) Z(nx - k) \right|
\]
\[
= \left| \sum_{k=-\infty}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} - f(x) \right) Z(nx - k) \right| \le \tag{37}
\]
\[
\sum_{k=-\infty}^{\infty} \left| \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} - f(x) \right| Z(nx - k) \overset{(34)}{\le} \sum_{k=-\infty}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k) = \tag{38}
\]
\[
\sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty \le \frac{1}{n^\beta}}}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k) + \sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty > \frac{1}{n^\beta}}}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k)
\]
\[
\le \sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty \le \frac{1}{n^\beta}}}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \omega_1\left(f, \|t\|_\infty + \left\|\frac{k}{n} - x\right\|_\infty\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k) \tag{39}
\]
\[
+ \, 2\|f\|_\infty \left( \sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty > \frac{1}{n^\beta}}}^{\infty} Z(nx - k) \right) \overset{(24)}{\le} \omega_1\left(f, \frac{1}{n} + \frac{1}{n^\beta}\right) + \frac{\|f\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}}, \tag{40}
\]
proving the claim.
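As a purely illustrative check of Theorem 3.7 (ours, not part of the original paper), the following Python sketch implements a truncated, one-dimensional ($N = 1$) version of $T_n^\mu$ from (27) for one admissible maxitive measure, namely $\mu(A) = 1$ for $A \ne \emptyset$ and $\mu(\emptyset) = 0$; for this particular $\mu$ the Shilkret coefficient in (27) reduces to the supremum of $f$ over the cell $[k/n, k/n + 1/n]$. The names chi and T_n, the test function, the Lipschitz bound used in place of $\omega_1$, and the truncation parameters K and cell_pts are our own assumptions.

```python
import math
import numpy as np

def chi(x):
    """Activation function (12)."""
    return 0.25 * (math.erf(x + 1.0) - math.erf(x - 1.0))

def T_n(f, n, x, K=60, cell_pts=50):
    """Truncated 1-D version of the operator (27) for the maxitive measure
    mu(A) = 1 (A nonempty), mu(empty set) = 0; the Shilkret coefficient is then
    the supremum of f over the cell [k/n, k/n + 1/n]."""
    k0 = int(round(n * x))
    t_grid = np.linspace(0.0, 1.0 / n, cell_pts)
    total = 0.0
    for k in range(k0 - K, k0 + K + 1):
        coeff = float(np.max(f(t_grid + k / n)))   # Kantorovich-Shilkret coefficient
        total += coeff * chi(n * x - k)
    return total

f = lambda t: 1.0 / (1.0 + t**2)      # positive, bounded, uniformly continuous
L = 0.65                              # a Lipschitz constant of f, so omega_1(f, d) <= L * d
norm_f = 1.0                          # sup |f|
n, beta = 100, 0.5                    # n^{1 - beta} = 10 >= 3

# lambda_n from (35), with omega_1(f, d) bounded above by L * d:
d = 1.0 / n + 1.0 / n**beta
lam = L * d + norm_f / (math.sqrt(math.pi) * (n**(1 - beta) - 2)
                        * math.exp((n**(1 - beta) - 2) ** 2))

for x in [0.0, 0.3, 1.0, 2.5]:
    err = abs(T_n(f, n, x) - f(x))
    print(f"x = {x}: |T_n f(x) - f(x)| = {err:.3e}  <=  lambda_n = {lam:.3e}")
```

The printed errors should stay below $\lambda_n$, consistent with (35); truncating the series at $|k - nx| \le K$ only discards the rapidly decaying tail controlled by (24).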
Additionally we give

Definition 3.8. Denote by $C_B^+(\mathbb{R}^N, \mathbb{C}) := \{f : \mathbb{R}^N \to \mathbb{C} \mid f = f_1 + i f_2, \ \text{where } f_1, f_2 \in C_B^+(\mathbb{R}^N)\}$, $N \in \mathbb{N}$. We set for $f \in C_B^+(\mathbb{R}^N, \mathbb{C})$ that
\[
T_n^\mu(f, x) := T_n^\mu(f_1, x) + i \, T_n^\mu(f_2, x), \tag{41}
\]
$\forall \, n \in \mathbb{N}$, $x \in \mathbb{R}^N$, $i = \sqrt{-1}$.

Theorem 3.9. Let $f \in C_B^+(\mathbb{R}^N, \mathbb{C})$, $f = f_1 + i f_2$, $N \in \mathbb{N}$, $0 < \beta < 1$, $x \in \mathbb{R}^N$; $n \in \mathbb{N}$ with $n^{1-\beta} \ge 3$. Then

i)
\[
\sup_\mu |T_n^\mu(f, x) - f(x)| \le \left[ \omega_1\left(f_1, \frac{1}{n} + \frac{1}{n^\beta}\right) + \omega_1\left(f_2, \frac{1}{n} + \frac{1}{n^\beta}\right) \right] + \frac{\|f_1\|_\infty + \|f_2\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}} =: \psi_n, \tag{42}
\]
and

ii)
\[
\sup_\mu \|T_n^\mu(f) - f\|_\infty \le \psi_n. \tag{43}
\]

Proof. We have
\[
|T_n^\mu(f, x) - f(x)| = |T_n^\mu(f_1, x) + i \, T_n^\mu(f_2, x) - f_1(x) - i f_2(x)| = |(T_n^\mu(f_1, x) - f_1(x)) + i (T_n^\mu(f_2, x) - f_2(x))|
\]
\[
\le |T_n^\mu(f_1, x) - f_1(x)| + |T_n^\mu(f_2, x) - f_2(x)| \overset{(35)}{\le} \tag{44}
\]
\[
\left( \omega_1\left(f_1, \frac{1}{n} + \frac{1}{n^\beta}\right) + \frac{\|f_1\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}} \right) + \left( \omega_1\left(f_2, \frac{1}{n} + \frac{1}{n^\beta}\right) + \frac{\|f_2\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}} \right)
\]
\[
= \left[ \omega_1\left(f_1, \frac{1}{n} + \frac{1}{n^\beta}\right) + \omega_1\left(f_2, \frac{1}{n} + \frac{1}{n^\beta}\right) \right] + \frac{\|f_1\|_\infty + \|f_2\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}}, \tag{45}
\]
proving the claim.

References

[1] M. Abramowitz and I.A. Stegun, eds., Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, New York, 1972.

[2] G.A. Anastassiou, Univariate error function based neural network approximation, Indian J. of Math., Vol. 57, No. 2 (2015), 243-291.

[3] L.C. Andrews, Special Functions of Mathematics for Engineers, Second edition, McGraw-Hill, New York, 1992.

[4] S. Haykin, Neural Networks: A Comprehensive Foundation (2nd ed.), Prentice Hall, New York, 1998.

[5] W. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, 7 (1943), 115-133.

[6] T.M. Mitchell, Machine Learning, WCB-McGraw-Hill, New York, 1997.

[7] N. Shilkret, Maxitive measure and integration, Indagationes Mathematicae, 33 (1971), 109-116.