CUBO, A Mathematical Journal, Vol. 20, No. 3, (01-11), October 2018
http://dx.doi.org/10.4067/S0719-06462018000300001

Quantitative Approximation by a Kantorovich-Shilkret quasi-interpolation neural network operator

George A. Anastassiou
Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152, U.S.A.
ganastss@memphis.edu

ABSTRACT

In this article we present multivariate basic approximation by a Kantorovich-Shilkret type quasi-interpolation neural network operator with respect to the supremum norm. This is done with rates, using the multivariate modulus of continuity. We approximate continuous and bounded functions on $\mathbb{R}^N$, $N \in \mathbb{N}$. When they are additionally uniformly continuous we derive pointwise and uniform convergences.

RESUMEN

En este artículo presentamos un resultado de aproximación básico multivariado a través de un operador de cuasi-interpolación en red neuronal de tipo Kantorovich-Shilkret con respecto a la norma del supremo. Esto se realiza con tasas usando el módulo de continuidad multivariado. Aproximamos funciones continuas y acotadas en $\mathbb{R}^N$, $N \in \mathbb{N}$. Cuando ellas son adicionalmente uniformemente continuas, derivamos convergencias puntuales y uniformes.

Keywords and Phrases: error function based activation function, multivariate quasi-interpolation neural network approximation, Kantorovich-Shilkret type operator.

2010 AMS Mathematics Subject Classification: 41A17, 41A25, 41A30, 41A35.

1 Introduction

The author here performs multivariate error function based neural network approximation of continuous functions over $\mathbb{R}^N$, $N \in \mathbb{N}$, and then extends the results to complex valued functions. The convergences here are with rates, expressed via the multivariate modulus of continuity of the involved function, and are given by very tight Jackson-type inequalities.

The author comes up with the "right", precisely defined, flexible quasi-interpolation Kantorovich-Shilkret type integral-coefficient neural network operator associated to the error function.

The feed-forward neural networks (FNNs) with one hidden layer that we deal with are expressed mathematically as
\[
N_n(x) = \sum_{j=0}^{n} c_j \, \sigma(\langle a_j \cdot x \rangle + b_j), \quad x \in \mathbb{R}^s, \ s \in \mathbb{N},
\]
where for $0 \le j \le n$, $b_j \in \mathbb{R}$ are the thresholds, $a_j \in \mathbb{R}^s$ are the connection weights, $c_j \in \mathbb{R}$ are the coefficients, $\langle a_j \cdot x \rangle$ is the inner product of $a_j$ and $x$, and $\sigma$ is the activation function of the network. In many fundamental neural network models the activation function is generated by the error function. About neural networks in general you may read [4], [5], [6].

In recent years non-additive integrals, like the N. Shilkret one [7], have become fashionable and increasingly useful, e.g. in economic theory.

2 Background

Here we follow [7].

Let $\mathcal{F}$ be a $\sigma$-field of subsets of an arbitrary set $\Omega$. An extended non-negative real valued function $\mu$ on $\mathcal{F}$ is called maxitive if $\mu(\emptyset) = 0$ and
\[
\mu\left(\bigcup_{i \in I} E_i\right) = \sup_{i \in I} \mu(E_i), \tag{1}
\]
where the index set $I$ is at most countable. We also call $\mu$ a maxitive measure. Here $f$ stands for a non-negative measurable function on $\Omega$.

In [7], Niel Shilkret developed his non-additive integral, defined as follows:
\[
(N^*)\int_D f \, d\mu := \sup_{y \in Y} \left\{ y \cdot \mu\left(D \cap \{f \ge y\}\right) \right\}, \tag{2}
\]
where $Y = [0, m]$ or $Y = [0, m)$ with $0 < m \le \infty$, and $D \in \mathcal{F}$. Here we take $Y = [0, \infty)$. It is easily proved that
\[
(N^*)\int_D f \, d\mu = \sup_{y > 0} \left\{ y \cdot \mu\left(D \cap \{f > y\}\right) \right\}. \tag{3}
\]
The Shilkret integral takes values in $[0, \infty]$.
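To make (3) concrete, here is a minimal numerical sketch (ours, not part of the original paper) that approximates the Shilkret integral on finite grids. The maxitive measure used, $\mu(A) = 1$ for $A \ne \emptyset$ and $\mu(\emptyset) = 0$, and the helper names shilkret_integral and mu_sup, are our own illustrative choices.

```python
import numpy as np

def shilkret_integral(f, mu, D, y_grid):
    """Grid approximation of (N*) int_D f dmu = sup_{y>0} y * mu(D ∩ {f > y}).

    f      : non-negative callable
    mu     : callable taking an array of points (a subset of D) and returning its measure
    D      : 1-D array of sample points representing the set D
    y_grid : 1-D array of positive levels y over which the supremum is taken
    """
    fD = f(D)
    values = [y * mu(D[fD > y]) for y in y_grid]
    return max(values) if values else 0.0

# Example maxitive measure: mu(A) = 1 for A nonempty, mu(empty set) = 0.
def mu_sup(points):
    return 0.0 if points.size == 0 else 1.0

D = np.linspace(0.0, 1.0, 2001)          # discretization of D = [0, 1]
y_grid = np.linspace(1e-6, 2.0, 2001)    # levels y > 0

# For f(t) = t on [0, 1] and this mu, {f > y} ∩ [0, 1] is non-empty exactly when y < 1,
# so (3) gives sup_{0 < y < 1} y * 1 = 1.
print(shilkret_integral(lambda t: t, mu_sup, D, y_grid))   # ≈ 1.0 (up to grid resolution)
```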
The Shilkret integral ([7]) has the following properties:
\[
(N^*)\int_\Omega \chi_E \, d\mu = \mu(E), \tag{4}
\]
where $\chi_E$ is the indicator function of $E \in \mathcal{F}$;
\[
(N^*)\int_D c f \, d\mu = c \, (N^*)\int_D f \, d\mu, \quad c \ge 0; \tag{5}
\]
\[
(N^*)\int_D \sup_{n \in \mathbb{N}} f_n \, d\mu = \sup_{n \in \mathbb{N}} (N^*)\int_D f_n \, d\mu, \tag{6}
\]
where $f_n$, $n \in \mathbb{N}$, is an increasing sequence of elementary (countably valued) functions converging uniformly to $f$. Furthermore we have
\[
(N^*)\int_D f \, d\mu \ge 0, \tag{7}
\]
and $f \ge g$ implies
\[
(N^*)\int_D f \, d\mu \ge (N^*)\int_D g \, d\mu, \tag{8}
\]
where $f, g : \Omega \to [0, \infty]$ are measurable.

If $a \le f(\omega) \le b$ for almost every $\omega \in E$, then $a\mu(E) \le (N^*)\int_E f \, d\mu \le b\mu(E)$; $(N^*)\int_E 1 \, d\mu = \mu(E)$; $f > 0$ almost everywhere together with $(N^*)\int_E f \, d\mu = 0$ implies $\mu(E) = 0$; $(N^*)\int_\Omega f \, d\mu = 0$ if and only if $f = 0$ almost everywhere; and $(N^*)\int_\Omega f \, d\mu < \infty$ implies that
\[
N(f) := \{\omega \in \Omega \mid f(\omega) \ne 0\} \ \text{has $\sigma$-finite measure}. \tag{9}
\]
Moreover,
\[
(N^*)\int_D (f + g) \, d\mu \le (N^*)\int_D f \, d\mu + (N^*)\int_D g \, d\mu,
\]
and
\[
\left| (N^*)\int_D f \, d\mu - (N^*)\int_D g \, d\mu \right| \le (N^*)\int_D |f - g| \, d\mu. \tag{10}
\]

From now on in this article we assume that $\mu : \mathcal{F} \to [0, +\infty)$.

3 Main Results

We consider here the (Gauss) error special function ([1], [3])
\[
\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt, \quad x \in \mathbb{R}, \tag{11}
\]
which is a sigmoidal type function and a strictly increasing function. It has the properties
\[
\mathrm{erf}(0) = 0, \quad \mathrm{erf}(-x) = -\mathrm{erf}(x), \quad \mathrm{erf}(+\infty) = 1, \quad \mathrm{erf}(-\infty) = -1,
\]
\[
(\mathrm{erf}(x))' = \frac{2}{\sqrt{\pi}} e^{-x^2}, \quad x \in \mathbb{R},
\]
and
\[
\int \mathrm{erf}(x) \, dx = x \, \mathrm{erf}(x) + \frac{e^{-x^2}}{\sqrt{\pi}} + C,
\]
where $C$ is a constant.

The error function is related to the cumulative distribution function of the standard normal distribution by
\[
\Phi(x) = \frac{1}{2} + \frac{1}{2} \, \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right).
\]

We consider the activation function
\[
\chi(x) = \frac{1}{4}\left(\mathrm{erf}(x + 1) - \mathrm{erf}(x - 1)\right), \quad x \in \mathbb{R}, \tag{12}
\]
and we notice that
\[
\chi(-x) = \chi(x), \tag{13}
\]
i.e. $\chi$ is an even function. Clearly $\chi(x) > 0$ for all $x \in \mathbb{R}$. We see that
\[
\chi(0) = \frac{\mathrm{erf}(1)}{2} \simeq 0.4215. \tag{14}
\]
For $x > 0$ we have that
\[
\chi'(x) < 0. \tag{15}
\]
That is, $\chi$ is strictly decreasing on $[0, \infty)$ and strictly increasing on $(-\infty, 0]$, and $\chi'(0) = 0$. Clearly the $x$-axis is the horizontal asymptote of $\chi$. In conclusion, $\chi$ is a bell-shaped symmetric function with maximum $\chi(0) \simeq 0.4215$.

We further need

Theorem 3.1. ([2]) We have that
\[
\sum_{i=-\infty}^{\infty} \chi(x - i) = 1, \quad \text{for all } x \in \mathbb{R}. \tag{16}
\]

Theorem 3.2. ([2]) It holds
\[
\int_{-\infty}^{\infty} \chi(x) \, dx = 1. \tag{17}
\]

So $\chi(x)$ is a density function on $\mathbb{R}$.

Theorem 3.3. ([2]) Let $0 < \alpha < 1$, and $n \in \mathbb{N}$ with $n^{1-\alpha} \ge 3$. It holds
\[
\sum_{\substack{k = -\infty \\ |nx - k| \ge n^{1-\alpha}}}^{\infty} \chi(nx - k) < \frac{1}{2\sqrt{\pi}\,(n^{1-\alpha} - 2)\, e^{(n^{1-\alpha} - 2)^2}}. \tag{18}
\]

Remark 3.4. We introduce
\[
Z(x_1, \ldots, x_N) := Z(x) := \prod_{i=1}^{N} \chi(x_i), \tag{19}
\]
$x = (x_1, \ldots, x_N) \in \mathbb{R}^N$, $N \in \mathbb{N}$. It has the properties:

(i)
\[
Z(x) > 0, \quad \forall \, x \in \mathbb{R}^N, \tag{20}
\]

(ii)
\[
\sum_{k=-\infty}^{\infty} Z(x - k) := \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} \cdots \sum_{k_N=-\infty}^{\infty} Z(x_1 - k_1, \ldots, x_N - k_N) = 1, \tag{21}
\]
where $k := (k_1, \ldots, k_N) \in \mathbb{Z}^N$, $\forall \, x \in \mathbb{R}^N$, hence

(iii)
\[
\sum_{k=-\infty}^{\infty} Z(nx - k) = 1, \quad \forall \, x \in \mathbb{R}^N, \ n \in \mathbb{N}, \tag{22}
\]
and

(iv)
\[
\int_{\mathbb{R}^N} Z(x) \, dx = 1, \tag{23}
\]
that is, $Z$ is a multivariate density function.

Here $\|x\|_\infty := \max\{|x_1|, \ldots, |x_N|\}$, $x \in \mathbb{R}^N$; we also set $\infty := (\infty, \ldots, \infty)$, $-\infty := (-\infty, \ldots, -\infty)$ in the multivariate context. It is also clear that (see (18))

(v)
\[
\sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty > \frac{1}{n^\beta}}}^{\infty} Z(nx - k) \le \frac{1}{2\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}}, \tag{24}
\]
for $0 < \beta < 1$, $n \in \mathbb{N}$ with $n^{1-\beta} \ge 3$, $x \in \mathbb{R}^N$.

For $f \in C_B^+(\mathbb{R}^N)$ (continuous and bounded functions from $\mathbb{R}^N$ into $\mathbb{R}_+$), we define the first modulus of continuity
\[
\omega_1(f, h) := \sup_{\substack{x, y \in \mathbb{R}^N \\ \|x - y\|_\infty \le h}} |f(x) - f(y)|, \quad h > 0. \tag{25}
\]
Given that $f \in C_U^+(\mathbb{R}^N)$ (uniformly continuous functions from $\mathbb{R}^N$ into $\mathbb{R}_+$), we have that
\[
\lim_{h \to 0} \omega_1(f, h) = 0. \tag{26}
\]
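Before introducing the operator, here is a small numerical sketch (ours, not part of the original paper) that evaluates the activation function $\chi$ of (12) and the kernel $Z$ of (19) via Python's math.erf, and checks (14) together with truncated versions of the partition-of-unity identities (16) and (21); the helper names chi and Z_density are hypothetical.

```python
import math
from itertools import product

def chi(x):
    """Activation function (12): chi(x) = (erf(x + 1) - erf(x - 1)) / 4."""
    return 0.25 * (math.erf(x + 1.0) - math.erf(x - 1.0))

def Z_density(x):
    """Multivariate kernel (19): Z(x) = prod_i chi(x_i), with x a tuple of reals."""
    p = 1.0
    for xi in x:
        p *= chi(xi)
    return p

# Check (14): chi(0) = erf(1)/2
print(chi(0.0))                                   # ≈ 0.42135

# Truncated check of the partition of unity (16) at an arbitrary point:
x = 0.37
print(sum(chi(x - i) for i in range(-30, 31)))    # ≈ 1.0

# Truncated check of (21) for N = 2:
x2 = (0.37, -1.2)
print(sum(Z_density((x2[0] - k1, x2[1] - k2))
          for k1, k2 in product(range(-30, 31), repeat=2)))   # ≈ 1.0
```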
We make

Definition 3.5. Let $\mathcal{L}$ be the Lebesgue $\sigma$-algebra on $\mathbb{R}^N$, $N \in \mathbb{N}$, and let $\mu : \mathcal{L} \to [0, +\infty)$ be a maxitive measure such that for any $A \in \mathcal{L}$ with $A \ne \emptyset$ we get $\mu(A) > 0$. For $f \in C_B^+(\mathbb{R}^N)$, we define the multivariate Kantorovich-Shilkret type neural network operator for any $x \in \mathbb{R}^N$:
\[
T_n^\mu(f, x) = T_n^\mu(f, x_1, \ldots, x_N) := \sum_{k=-\infty}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k)
\]
\[
= \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} \cdots \sum_{k_N=-\infty}^{\infty} \left( \frac{(N^*)\int_0^{\frac{1}{n}} \cdots \int_0^{\frac{1}{n}} f\left(t_1 + \frac{k_1}{n}, t_2 + \frac{k_2}{n}, \ldots, t_N + \frac{k_N}{n}\right) d\mu(t_1, \ldots, t_N)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) \left( \prod_{i=1}^{N} \chi(nx_i - k_i) \right), \tag{27}
\]
where $x = (x_1, \ldots, x_N) \in \mathbb{R}^N$, $k = (k_1, \ldots, k_N)$, $t = (t_1, \ldots, t_N)$, $n \in \mathbb{N}$. Clearly here $\mu\left(\left[0, \frac{1}{n}\right]^N\right) > 0$, $\forall \, n \in \mathbb{N}$.

Above we notice that
\[
\|T_n^\mu(f)\|_\infty \le \|f\|_\infty, \tag{28}
\]
so that $T_n^\mu(f, x)$ is well-defined.

Remark 3.6. Let $t \in \left[0, \frac{1}{n}\right]^N$ and $x \in \mathbb{R}^N$; then
\[
f\left(t + \frac{k}{n}\right) = f\left(t + \frac{k}{n}\right) - f(x) + f(x) \le \left| f\left(t + \frac{k}{n}\right) - f(x) \right| + f(x), \tag{29}
\]
hence
\[
(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t) \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t) + f(x)\, \mu\left(\left[0, \frac{1}{n}\right]^N\right). \tag{30}
\]
That is,
\[
(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t) - f(x)\, \mu\left(\left[0, \frac{1}{n}\right]^N\right) \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t). \tag{31}
\]
Similarly we have
\[
f(x) = f(x) - f\left(t + \frac{k}{n}\right) + f\left(t + \frac{k}{n}\right) \le \left| f\left(t + \frac{k}{n}\right) - f(x) \right| + f\left(t + \frac{k}{n}\right),
\]
hence
\[
(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f(x) \, d\mu(t) \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t) + (N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t).
\]
That is,
\[
f(x)\, \mu\left(\left[0, \frac{1}{n}\right]^N\right) - (N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t) \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t). \tag{32}
\]
By (31) and (32) we derive
\[
\left| (N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t) - f(x)\, \mu\left(\left[0, \frac{1}{n}\right]^N\right) \right| \le (N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t). \tag{33}
\]
In particular it holds
\[
\left| \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} - f(x) \right| \le \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)}. \tag{34}
\]

We present

Theorem 3.7. Let $f \in C_B^+(\mathbb{R}^N)$, $0 < \beta < 1$, $x \in \mathbb{R}^N$; $N, n \in \mathbb{N}$ with $n^{1-\beta} \ge 3$. Then

i)
\[
\sup_\mu |T_n^\mu(f, x) - f(x)| \le \omega_1\left(f, \frac{1}{n} + \frac{1}{n^\beta}\right) + \frac{\|f\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}} =: \lambda_n, \tag{35}
\]

ii)
\[
\sup_\mu \|T_n^\mu(f) - f\|_\infty \le \lambda_n. \tag{36}
\]
Given that $f \in \left(C_U^+(\mathbb{R}^N) \cap C_B^+(\mathbb{R}^N)\right)$, we obtain $\lim_{n \to \infty} T_n^\mu(f) = f$, uniformly.

Proof. We observe that
\[
|T_n^\mu(f, x) - f(x)| = \left| \sum_{k=-\infty}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k) - \sum_{k=-\infty}^{\infty} f(x) Z(nx - k) \right|
\]
\[
= \left| \sum_{k=-\infty}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} - f(x) \right) Z(nx - k) \right| \le \tag{37}
\]
\[
\sum_{k=-\infty}^{\infty} \left| \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} f\left(t + \frac{k}{n}\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} - f(x) \right| Z(nx - k) \overset{(34)}{\le} \sum_{k=-\infty}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k) = \tag{38}
\]
\[
\sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty \le \frac{1}{n^\beta}}}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k) + \sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty > \frac{1}{n^\beta}}}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \left| f\left(t + \frac{k}{n}\right) - f(x) \right| d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k)
\]
\[
\le \sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty \le \frac{1}{n^\beta}}}^{\infty} \left( \frac{(N^*)\int_{\left[0, \frac{1}{n}\right]^N} \omega_1\left(f, \|t\|_\infty + \left\|\frac{k}{n} - x\right\|_\infty\right) d\mu(t)}{\mu\left(\left[0, \frac{1}{n}\right]^N\right)} \right) Z(nx - k) \tag{39}
\]
\[
+ \, 2\|f\|_\infty \left( \sum_{\substack{k = -\infty \\ \left\|\frac{k}{n} - x\right\|_\infty > \frac{1}{n^\beta}}}^{\infty} Z(nx - k) \right) \overset{(24)}{\le} \omega_1\left(f, \frac{1}{n} + \frac{1}{n^\beta}\right) + \frac{\|f\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}}, \tag{40}
\]
proving the claim.
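As a purely illustrative check of Theorem 3.7 (ours, not part of the original paper), the following Python sketch implements a truncated, one-dimensional ($N = 1$) version of $T_n^\mu$ from (27) for one admissible maxitive measure, namely $\mu(A) = 1$ for $A \ne \emptyset$ and $\mu(\emptyset) = 0$; for this particular $\mu$ the Shilkret coefficient in (27) reduces to the supremum of $f$ over the cell $[k/n, k/n + 1/n]$. The names chi and T_n, the test function, the Lipschitz bound used in place of $\omega_1$, and the truncation parameters K and cell_pts are our own assumptions.

```python
import math
import numpy as np

def chi(x):
    """Activation function (12)."""
    return 0.25 * (math.erf(x + 1.0) - math.erf(x - 1.0))

def T_n(f, n, x, K=60, cell_pts=50):
    """Truncated 1-D version of the operator (27) for the maxitive measure
    mu(A) = 1 (A nonempty), mu(empty set) = 0; the Shilkret coefficient is then
    the supremum of f over the cell [k/n, k/n + 1/n]."""
    k0 = int(round(n * x))
    t_grid = np.linspace(0.0, 1.0 / n, cell_pts)
    total = 0.0
    for k in range(k0 - K, k0 + K + 1):
        coeff = float(np.max(f(t_grid + k / n)))   # Kantorovich-Shilkret coefficient
        total += coeff * chi(n * x - k)
    return total

f = lambda t: 1.0 / (1.0 + t**2)      # positive, bounded, uniformly continuous
L = 0.65                              # a Lipschitz constant of f, so omega_1(f, d) <= L * d
norm_f = 1.0                          # sup |f|
n, beta = 100, 0.5                    # n^{1 - beta} = 10 >= 3

# lambda_n from (35), with omega_1(f, d) bounded above by L * d:
d = 1.0 / n + 1.0 / n**beta
lam = L * d + norm_f / (math.sqrt(math.pi) * (n**(1 - beta) - 2)
                        * math.exp((n**(1 - beta) - 2) ** 2))

for x in [0.0, 0.3, 1.0, 2.5]:
    err = abs(T_n(f, n, x) - f(x))
    print(f"x = {x}: |T_n f(x) - f(x)| = {err:.3e}  <=  lambda_n = {lam:.3e}")
```

The printed errors should stay below $\lambda_n$, consistent with (35); truncating the series at $|k - nx| \le K$ only discards the rapidly decaying tail controlled by (24).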
Additionally we give

Definition 3.8. Denote by $C_B^+(\mathbb{R}^N, \mathbb{C}) := \{f : \mathbb{R}^N \to \mathbb{C} \mid f = f_1 + i f_2, \ \text{where } f_1, f_2 \in C_B^+(\mathbb{R}^N)\}$, $N \in \mathbb{N}$. We set for $f \in C_B^+(\mathbb{R}^N, \mathbb{C})$ that
\[
T_n^\mu(f, x) := T_n^\mu(f_1, x) + i \, T_n^\mu(f_2, x), \tag{41}
\]
$\forall \, n \in \mathbb{N}$, $x \in \mathbb{R}^N$, $i = \sqrt{-1}$.

Theorem 3.9. Let $f \in C_B^+(\mathbb{R}^N, \mathbb{C})$, $f = f_1 + i f_2$, $N \in \mathbb{N}$, $0 < \beta < 1$, $x \in \mathbb{R}^N$; $n \in \mathbb{N}$ with $n^{1-\beta} \ge 3$. Then

i)
\[
\sup_\mu |T_n^\mu(f, x) - f(x)| \le \left[ \omega_1\left(f_1, \frac{1}{n} + \frac{1}{n^\beta}\right) + \omega_1\left(f_2, \frac{1}{n} + \frac{1}{n^\beta}\right) \right] + \frac{\|f_1\|_\infty + \|f_2\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}} =: \psi_n, \tag{42}
\]
and

ii)
\[
\sup_\mu \|T_n^\mu(f) - f\|_\infty \le \psi_n. \tag{43}
\]

Proof. We have
\[
|T_n^\mu(f, x) - f(x)| = |T_n^\mu(f_1, x) + i \, T_n^\mu(f_2, x) - f_1(x) - i f_2(x)| = |(T_n^\mu(f_1, x) - f_1(x)) + i (T_n^\mu(f_2, x) - f_2(x))|
\]
\[
\le |T_n^\mu(f_1, x) - f_1(x)| + |T_n^\mu(f_2, x) - f_2(x)| \overset{(35)}{\le} \tag{44}
\]
\[
\left( \omega_1\left(f_1, \frac{1}{n} + \frac{1}{n^\beta}\right) + \frac{\|f_1\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}} \right) + \left( \omega_1\left(f_2, \frac{1}{n} + \frac{1}{n^\beta}\right) + \frac{\|f_2\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}} \right)
\]
\[
= \left[ \omega_1\left(f_1, \frac{1}{n} + \frac{1}{n^\beta}\right) + \omega_1\left(f_2, \frac{1}{n} + \frac{1}{n^\beta}\right) \right] + \frac{\|f_1\|_\infty + \|f_2\|_\infty}{\sqrt{\pi}\,(n^{1-\beta} - 2)\, e^{(n^{1-\beta} - 2)^2}}, \tag{45}
\]
proving the claim.

References

[1] M. Abramowitz and I.A. Stegun, eds., Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, New York, 1972.

[2] G.A. Anastassiou, Univariate error function based neural network approximation, Indian J. of Math., Vol. 57, No. 2 (2015), 243-291.

[3] L.C. Andrews, Special Functions of Mathematics for Engineers, Second edition, McGraw-Hill, New York, 1992.

[4] S. Haykin, Neural Networks: A Comprehensive Foundation (2nd ed.), Prentice Hall, New York, 1998.

[5] W. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, 7 (1943), 115-133.

[6] T.M. Mitchell, Machine Learning, WCB-McGraw-Hill, New York, 1997.

[7] N. Shilkret, Maxitive measure and integration, Indagationes Mathematicae, 33 (1971), 109-116.