International Journal of Analysis and Applications
Volume 18, Number 5 (2020), 689-698
DOI: 10.28924/2291-8639-18-2020-689

SOME NOTES ON ERROR ANALYSIS FOR KERNEL BASED REGULARIZED INTERPOLATION

QING ZOU*

Applied Mathematics and Computational Sciences, The University of Iowa, IA 52242, USA
*Corresponding author: zou-qing@uiowa.edu

Abstract. Kernel based regularized interpolation is one of the most important methods for approximating functions. The theory behind it is the well-known Representer Theorem, which gives the form of the approximating function in a reproducing kernel Hilbert space. Because of its advantages, kernel based regularized interpolation is widely used in many mathematical and engineering applications, for example dimension reduction and dimension estimation. However, the performance of the approximation is not fully understood from a theoretical perspective; in other words, error analysis for kernel based regularized interpolation is lacking. In this paper, some error bounds in terms of the reproducing kernel Hilbert space norm and the Sobolev space norm are given to describe the behavior of the approximating function.

1. Introduction

Approximating functions in high-dimensional spaces is one of the central problems in both mathematics and engineering. Many real-world problems can be viewed as function approximation problems. For example, the classification problem in engineering can be viewed as approximating a function whose values give the classes that the inputs belong to [7], and in image processing, patch-based image denoising can be seen as approximating a function from noisy patches to clean pixels.

Received May 12th, 2020; accepted June 4th, 2020; published June 17th, 2020.
2010 Mathematics Subject Classification. 65D05, 41A05, 41A10, 65J05, 65G99.
Key words and phrases.
error bound; kernel based regularized interpolation; reproducing kernel Hilbert space; Sobolev space.
©2020 Authors retain the copyrights of their papers, and all open access articles are distributed under the terms of the Creative Commons Attribution License.

Mathematically speaking, the function approximation problem is to approximate an unknown continuous function f : X → R from the knowledge of some observations {(x_i, y_i)}_{i=1}^n ⊂ X × R, where X ⊂ R^d, d ≥ 1, is the input space and n ∈ N is the number of observations.

One of the classical ways of approximating functions to explain real-world phenomena is Kriging (a.k.a. Gaussian process regression) [4]. However, Kriging requires a large number of observations, which are expensive to obtain. When only a few samples are given, Kriging yields non-stationary behavior [10]. An alternative for approximating multivariate functions is the inverse distance weighting (IDW) method, originally proposed in [9]. The key assumption of IDW is that points close to each other are more alike than points far apart. IDW therefore builds the prediction at a point from the observed points close to it; in other words, the measured values close to the prediction location have more influence on the predicted value than those far away. Based on this, the approximating function can be represented as

    f̂(x) = ( Σ_{i=1}^n ||x − x_i||^{−α} y_i ) / ( Σ_{i=1}^n ||x − x_i||^{−α} ),

where α is a parameter that usually takes the value 2. IDW achieves good accuracy at points near the observations, but it does not work well at points far from the observations. This is one drawback of IDW. Another drawback is that the gradient of the IDW surface vanishes at the observations.
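The IDW formula above can be sketched in a few lines. The following is a minimal illustration (not from the paper), with α = 2 and an explicit special case for prediction points that coincide with an observation, where the weights would otherwise blow up:

```python
import numpy as np

def idw_interpolate(x, xs, ys, alpha=2.0, eps=1e-12):
    """Inverse distance weighting prediction at a single point x.

    xs: (n, d) array of observation locations, ys: (n,) observed values.
    alpha is the power parameter (commonly 2). If x coincides with an
    observation, return that observed value directly.
    """
    d = np.linalg.norm(xs - x, axis=1)
    hit = d < eps
    if hit.any():
        return ys[hit][0]
    w = d ** (-alpha)                    # closer points get larger weights
    return float(w @ ys / w.sum())

xs = np.array([[0.0], [1.0], [2.0]])
ys = np.array([0.0, 1.0, 4.0])           # samples of f(x) = x^2
print(idw_interpolate(np.array([1.0]), xs, ys))  # → 1.0 (exact at an observation)
```

Note that between observations the prediction is always a convex combination of the observed values, which is why IDW cannot extrapolate beyond the range of the data and why its gradient vanishes at the observation points.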
Considering the drawbacks of these methods, the kernel based regularized interpolation (see (2.2) for details) was proposed. In one dimension it is similar to spline interpolation, and it has several advantages. First of all, it works in any dimension and in many different settings [1, 2]. It also enjoys a solid theoretical foundation — the representer theorem — which we state below. Furthermore, compared with other methods, one can obtain the same approximation accuracy by solving a better-conditioned linear system. Kernel based regularized interpolation also works when the observations are corrupted by noise (see Section 2 for the discussion), which is one of its biggest advantages over other interpolation methods. Last but not least, we have considerable flexibility in the choice of kernel. Kernel based regularized interpolation is also widely used in other engineering problems, for example clustering, dimension reduction [1] and dimension estimation [11].

In view of these advantages, there is a good deal of work on applying kernel based regularized interpolation in different settings, as mentioned above. However, work on its error analysis is very limited. It is worthwhile to reflect on the kind of error bounds needed to characterize the behavior of functions learned from observations. In this paper, we provide some error analysis for the kernel based regularized interpolation from the function space point of view; in other words, we focus on deriving Hilbert space type and Sobolev space type error estimates.

2. Preliminary

Let us recall some basic facts about kernel methods that we need for the analysis. We consider a positive definite kernel K : X × X → R, i.e., for all n ∈ N, x_1, ..., x_n ∈ X and a_1, ..., a_n ∈ R,

    Σ_{i=1}^n Σ_{j=1}^n a_i a_j K(x_i, x_j) ≥ 0.

Associated with the kernel K there exists a unique native Hilbert space H_K of functions from X to R. The elements of H_K are of the form

    f(·) = Σ_{i∈I} α_i K(·, x_i),

where I is a countable set. The inner product on H_K is given by

    ⟨f, g⟩_{H_K} := Σ_{i∈I} Σ_{j∈J} α_i β_j K(x_i, x_j),

where g = Σ_{j∈J} β_j K(·, x_j) ∈ H_K, and the norm is ||f||_{H_K} = ⟨f, f⟩_{H_K}^{1/2}. The Hilbert space H_K is called a reproducing kernel Hilbert space (RKHS) because K acts as a reproducing kernel on H_K, i.e.,

    f(x) = ⟨f(·), K(·, x)⟩_{H_K}   ∀f ∈ H_K.

We require the kernel to be positive definite because linear systems are involved when we approximate functions: positive definiteness guarantees that these linear systems are well behaved and that the approximation problem can be solved in H_K from the given observations {(x_i, y_i)}_{i=1}^n. Indeed, one can solve the function approximation problem by considering the regularization problem with regularization parameter λ > 0:

    min_{f∈H_K} Σ_{i=1}^n (f(x_i) − y_i)² + λ ||f||²_{H_K}.   (2.1)

The solution of this regularization problem is characterized by the Representer Theorem [8], which states that the solution of (2.1) has the form

    f(x) = Σ_{i=1}^n α_i K(x, x_i).   (2.2)

This expression is called the kernel based regularized interpolation. In the rest of this paper, we write I_X f for the kernel based regularized interpolation, i.e.,

    I_X f = Σ_{i=1}^n α_i K(·, x_i).

To carry out the error analysis, we need the true function. We use f to denote the true function, which is known only at the sampling locations, i.e., we only have the information y_i = f(x_i). The goal of this work is to bound the error between the true function f and the kernel based regularized interpolation I_X f.
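For the squared loss in (2.1), substituting the representer form (2.2) reduces the minimization to a finite linear system: (K + λI)α = y, where K_{ij} = K(x_i, x_j). The following is a minimal sketch of this reduction; the Gaussian kernel and the test data are illustrative choices, not prescribed by the paper:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=0.15):
    # Illustrative choice; any (strictly) positive definite kernel works.
    return np.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def fit(xs, ys, lam):
    """Solve (K + lam*I) alpha = y, the finite system that the squared-loss
    problem (2.1) reduces to under the representer form (2.2)."""
    n = len(xs)
    K = np.array([[gaussian_kernel(xs[i], xs[j]) for j in range(n)]
                  for i in range(n)])
    return np.linalg.solve(K + lam * np.eye(n), ys)

def interpolant(x, xs, alpha):
    """Evaluate I_X f(x) = sum_i alpha_i K(x, x_i)."""
    return sum(a * gaussian_kernel(x, xi) for a, xi in zip(alpha, xs))

xs = np.linspace(0.0, 1.0, 8)
ys = np.sin(2 * np.pi * xs)          # samples of the "true" function
alpha = fit(xs, ys, lam=1e-8)
# With a tiny lambda the fit is nearly interpolatory at the nodes:
print(abs(interpolant(xs[0], xs, alpha) - ys[0]))
```

Increasing λ moves the residual at the nodes away from zero while shrinking ||f||_{H_K}, which is exactly the accuracy/stability trade-off discussed below.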
It is worth mentioning that the kernel based regularized interpolation is well defined for positive definite kernels, but it is not a pure interpolation: we may not have I_X f(x_i) = y_i = f(x_i). In the case of a strictly positive definite kernel, the regularization problem (2.1) becomes an interpolation problem that matches f exactly at the observations, and the resulting kernel interpolation is a pure interpolation. However, there is no need to restrict to strictly positive definite kernels. Using a positive definite kernel to obtain the kernel based regularized interpolation is more general: the tunable parameter λ in the regularization problem provides a trade-off between pointwise accuracy and stability. Another important reason is that in real-world applications the observations may be corrupted by noise, in which case a pure interpolation no longer makes sense.

3. Hilbert space type error analysis

In this section we provide a special case of the error analysis for the kernel based regularized interpolation. We say "special case" because in this section we assume that the true function f lives in the RKHS; in general, we cannot guarantee that the true function is within the RKHS. The general case is treated in the next section.

Before proceeding to the error analysis, we first recall an important concept from numerical analysis: best approximation [5]. Given f ∈ H_K, define

    ρ := inf_{p∈S} ||f − p||_{H_K},

where S ⊂ H_K. The number ρ is called the minimax error for the approximation of f by functions in S. In fact [5], there is a unique p̂ ∈ S such that

    ||f − p̂||_{H_K} = ρ = inf_{p∈S} ||f − p||_{H_K}.

The function p̂ ∈ S is called the best approximation of f with respect to the H_K-norm.

Now let us look at the inner product of the RKHS. By the Cauchy–Schwarz inequality [3], we have

    |⟨h, K(·, x)⟩_{H_K}| ≤ ||h||_{H_K} · ||K(·, x)||_{H_K},   ∀h ∈ H_K.   (3.1)

Inequality (3.1) shows that the ratio below is at most ||h||_{H_K}; we assume further that there exists a constant 0 < β ≤ 1 such that

    sup_{x∈X} |⟨h, K(·, x)⟩_{H_K}| / ||K(·, x)||_{H_K} ≥ β · ||h||_{H_K}.   (3.2)

We then have the first error estimate.

Theorem 3.1. Let S = span{K(·, x_i)}_{i=1}^n ⊂ H_K. Assume that the true function f ∈ H_K, and let I_X f be the kernel based regularized interpolation. Then we have the error bound

    ||f − I_X f||_{H_K} ≤ (1 + C/β) · ρ,

where C > 1 and 0 < β ≤ 1 are constants and ρ is the minimax error for the approximation of f by functions in S in the H_K-norm.

Proof. First consider the trivial case f ∈ S ⊂ H_K. In this case the kernel based regularized interpolation is exact, meaning I_X f = f, so both ||f − I_X f||_{H_K} and ρ are zero and the conclusion holds trivially.

Next, assume f ∈ H_K but f ∉ S. Then for any g ∈ S,

    ||f − I_X f||_{H_K} ≤ ||f − g||_{H_K} + ||I_X f − g||_{H_K}.   (3.3)

Replacing h by I_X f − g in (3.2), we get

    β ||I_X f − g||_{H_K} ≤ sup_{x∈X} |⟨I_X f − g, K(·, x)⟩_{H_K}| / ||K(·, x)||_{H_K}.

Since f ≠ g, for the right-hand side (RHS) of this inequality there exists a constant C > 1 such that

    RHS = sup_{x∈X} |⟨f − g + I_X f − f, K(·, x)⟩_{H_K}| / ||K(·, x)||_{H_K}
        ≤ sup_{x∈X} [ |⟨f − g, K(·, x)⟩_{H_K}| / ||K(·, x)||_{H_K} + |⟨I_X f − f, K(·, x)⟩_{H_K}| / ||K(·, x)||_{H_K} ]
        ≤ sup_{x∈X} |⟨f − g, K(·, x)⟩_{H_K}| / ||K(·, x)||_{H_K} + sup_{x∈X} |⟨I_X f − f, K(·, x)⟩_{H_K}| / ||K(·, x)||_{H_K}
        ≤ C · sup_{x∈X} |⟨f − g, K(·, x)⟩_{H_K}| / ||K(·, x)||_{H_K}
        ≤ C · ||f − g||_{H_K},

where the last inequality is the Cauchy–Schwarz inequality. Therefore

    ||I_X f − g||_{H_K} ≤ (C/β) ||f − g||_{H_K}.

Then by (3.3),

    ||f − I_X f||_{H_K} ≤ (1 + C/β) ||f − g||_{H_K}   ∀g ∈ S,

which implies

    ||f − I_X f||_{H_K} ≤ (1 + C/β) inf_{g∈S} ||f − g||_{H_K} = (1 + C/β) · ρ.

This completes the proof. □

In fact, this error bound can be further tightened.
The improvement comes from the observation that if the true function f is in H_K, the kernel based regularized interpolation is actually a projection. Let S = span{K(·, x_i)}_{i=1}^n ⊂ H_K as in Theorem 3.1. If f ∈ H_K, the kernel based regularized interpolation can be viewed as a projection ϕ : H_K → S. The most important property of a projection is idempotence: if a function is already in S, the projection leaves it unchanged, i.e.,

    ϕ(g) = g   ∀g ∈ S,

which implies

    ϕ²(f) = ϕ(ϕ(f)) = ϕ(f)   ∀f ∈ H_K.

Define the operator norm of ϕ by

    ||ϕ|| := sup_{0≠g∈H_K} ||ϕ(g)||_{H_K} / ||g||_{H_K}.

Then we have the identity ||ϕ|| = ||I − ϕ||, where I is the identity operator. This follows from the result below, given in [6].

Lemma 3.1. Let H be a Hilbert space and U ⊂ H. Let P : H → U be an (oblique) projection (P² = P) with 0 ≠ P ≠ I. Then ||P|| = ||I − P||.

Using the observation that the kernel based regularized interpolation is a projection when f ∈ H_K, we have the following tightened error bound.

Theorem 3.2. Suppose S = span{K(·, x_i)}_{i=1}^n ⊂ H_K and the kernel based regularized interpolation is given by I_X f = ϕ(f), where ϕ is a projection from H_K onto S. Then

    ||f − I_X f||_{H_K} ≤ ρ/β,

where 0 < β ≤ 1 is a constant and ρ is the minimax error for the approximation of f by functions in S in the H_K-norm.

Proof. Since ϕ is a projection, ||ϕ|| = ||I − ϕ||. From this we derive that for all g ∈ S,

    ||f − I_X f||_{H_K} = ||f − g − I_X f + g||_{H_K}
                        = ||f − g − ϕ(f − g)||_{H_K}
                        = ||(I − ϕ)(f − g)||_{H_K}
                        ≤ ||I − ϕ|| · ||f − g||_{H_K}
                        = ||ϕ|| · ||f − g||_{H_K},

where the second equality uses ϕ(g) = g for g ∈ S. To obtain the error bound, it remains to show that ||ϕ|| ≤ 1/β. First, if f ∈ H_K then ||I_X f||_{H_K} ≤ ||f||_{H_K}: indeed, f itself is feasible for (2.1) with objective value λ||f||²_{H_K}, so the minimizer satisfies λ||I_X f||²_{H_K} ≤ λ||f||²_{H_K}. Since 0 < β ≤ 1, we have 1/β ≥ 1, and so for f ∈ H_K,

    ||I_X f||_{H_K} ≤ (1/β) ||f||_{H_K}.

Therefore

    ||ϕ|| = sup_{0≠f∈H_K} ||ϕ(f)||_{H_K} / ||f||_{H_K} = sup_{0≠f∈H_K} ||I_X f||_{H_K} / ||f||_{H_K} ≤ 1/β.

This gives the desired bound. □

4. Sobolev space type error analysis

In the previous section we gave a special case of the error analysis for kernel based regularized interpolation, where the true function f was assumed to live in the reproducing kernel Hilbert space H_K. In general we do not have this guarantee; the true function usually lives in a larger function space. In this section we consider the case where the true function f is in a Sobolev space, i.e., a space consisting of all f ∈ L^p(X) with certain additional properties. The formal definition is as follows.

Definition 4.1. Let k be a nonnegative integer and p ∈ [1, ∞]. The Sobolev space W^{k,p}(X) is the set of all functions f ∈ L^p(X) such that for each multi-index α with |α| ≤ k, the αth weak derivative ∂^α f exists and ∂^α f ∈ L^p(X). The norm on the Sobolev space is defined as

    ||f||_{W^{k,p}(X)} = [ Σ_{|α|≤k} ||∂^α f||^p_{L^p(X)} ]^{1/p},   1 ≤ p < ∞,
    ||f||_{W^{k,∞}(X)} = max_{|α|≤k} ||∂^α f||_{L^∞(X)},   p = ∞.

When p = 2, we usually write W^{k,2}(X) := H^k(X). For simplicity, we abbreviate ||f||_{W^{k,p}(X)} as ||f||_{k,p,X} when no confusion arises, and we drop the p when p = 2.

Besides the standard norm on the Sobolev space, we also need the seminorm. The standard seminorm on W^{k,p}(X) is

    |f|_{W^{k,p}(X)} = [ Σ_{|α|=k} ||∂^α f||^p_{L^p(X)} ]^{1/p},   1 ≤ p < ∞,
    |f|_{W^{k,∞}(X)} = max_{|α|=k} ||∂^α f||_{L^∞(X)},   p = ∞.

Similarly, we write |f|_{W^{k,p}(X)} as |f|_{k,p,X} when no confusion arises and drop the p when p = 2.

We now define continuous embedding between two Banach spaces. Let W, V be Banach spaces with V ⊂ W. We say that V is continuously embedded in W, written V ↪→ W, if there exists a constant c > 0 such that ||v||_W ≤ c ||v||_V for all v ∈ V.
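The quantities appearing in the Sobolev-type bound can be sketched numerically. The following hypothetical 1-D illustration (not from the paper) builds a kernel regularized interpolant of f(x) = |x − 1/2|, which lies in H¹ but is not smooth, and approximates the error norms ||f − I_X f||_{L²} and |f − I_X f|_{H¹} on a fine grid by Riemann sums and finite differences; the Gaussian kernel, node count, and λ are illustrative choices:

```python
import numpy as np

def l2_norm(v, h):
    # Riemann-sum approximation of the L^2 norm on a uniform grid.
    return float(np.sqrt(np.sum(v ** 2) * h))

def h1_seminorm(v, h):
    # |g|_{H^1} ≈ (sum |g'|^2 h)^{1/2} with forward differences.
    return float(np.sqrt(np.sum((np.diff(v) / h) ** 2) * h))

def kmat(a, b, s=0.1):
    # Gaussian kernel matrix (illustrative positive definite kernel).
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * s ** 2))

xs = np.linspace(0.0, 1.0, 15)
f = lambda t: np.abs(t - 0.5)            # in H^1 but not smooth at 0.5
alpha = np.linalg.solve(kmat(xs, xs) + 1e-6 * np.eye(xs.size), f(xs))

grid = np.linspace(0.0, 1.0, 801)
h = grid[1] - grid[0]
err = f(grid) - kmat(grid, xs) @ alpha   # f - I_X f on the fine grid
print(l2_norm(err, h), h1_seminorm(err, h))
```

As the theory suggests, the L² error is small while the H¹-seminorm of the error is dominated by the neighborhood of the kink, where the smooth interpolant cannot match the derivative jump.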
With these concepts from the theory of Sobolev spaces, we can state the error bound for the kernel based regularized interpolation. We assume that the true function f is in H^{k+1}(X).

Theorem 4.1. Let k and m be nonnegative integers with k + 1 ≥ m. Suppose H_K ⊃ S = span{K(·, x_i)}_{i=1}^n. Then there exists a constant c > 0 such that

    |f − I_X f|_{m,X} ≤ c inf_{g∈S} ||f + g||_{k+1,X}.

When the positive definite kernel is chosen to be the polynomial kernel of degree d and k + 1 > d, we have

    |f − I_X f|_{m,X} ≤ c |f|_{k+1,X}.

Proof. Since k ≥ 0, we have H^{k+1}(X) ↪→ C(X), where C(X) is the set of all continuous functions on X. So f ∈ H^{k+1}(X) is continuous and I_X f is well defined. Moreover, there exists a constant c_1 > 0 such that

    ||I_X f||_{m,X} ≤ Σ_{i=1}^n |α_i| · ||K(·, x_i)||_{m,X} ≤ c_1 ||f||_{C(X)}.

Since H^{k+1}(X) ↪→ C(X), there exists a constant c_2 > 0 such that ||f||_{C(X)} ≤ c_2 ||f||_{k+1,X}. Therefore there exists c_3 > 0 such that

    ||I_X f||_{m,X} ≤ c_3 ||f||_{k+1,X}.   (4.1)

Now, for all f ∈ H^{k+1}(X) and g ∈ S, noting that I_X g = g for g ∈ S, we obtain from (4.1) that there exists a constant c > 0 with

    |f − I_X f|_{m,X} ≤ ||f − I_X f||_{m,X}
                     = ||f − I_X f + g − I_X g||_{m,X}
                     = ||(f + g) − I_X(f + g)||_{m,X}
                     ≤ ||f + g||_{m,X} + ||I_X(f + g)||_{m,X}
                     ≤ c ||f + g||_{k+1,X}.

Hence

    |f − I_X f|_{m,X} ≤ c inf_{g∈S} ||f + g||_{k+1,X}.

We now consider the case where the positive definite kernel K is the polynomial kernel. First, the norm equivalence theorem [12] gives

    ||f||_{k+1,X} ≤ c_4 [ |f|_{k+1,X} + Σ_{|α|≤k} | ∫_X ∂^α f(x) dx | ]   (4.2)

for all f ∈ H^{k+1}(X). Since the kernel is a polynomial kernel and k + 1 > d, we have ∂^α g = 0 for all g ∈ S and |α| = k + 1. Replacing f by f + g in (4.2) gives, for all f ∈ H^{k+1}(X) and g ∈ S,

    ||f + g||_{k+1,X} ≤ c_4 [ |f + g|_{k+1,X} + Σ_{|α|≤k} | ∫_X ∂^α(f + g) dx | ]
                     = c_4 [ |f|_{k+1,X} + Σ_{|α|≤k} | ∫_X ∂^α(f + g) dx | ].
For the term ∫_X ∂^α(f + g) dx, note that for any f ∈ H^{k+1}(X) one can always find a g ∈ S such that

    ∫_X ∂^α(f + g) dx = 0   for all |α| ≤ k.   (4.3)

Indeed, letting |α| = k, k − 1, k − 2, ..., equation (4.3) yields a system of equations, which we can solve for the coefficients of the polynomial g ∈ S. This system is always solvable because we assumed k + 1 > d. The g constructed in this way annihilates the integral ∫_X ∂^α(f + g) dx for |α| ≤ k. Therefore

    inf_{g∈S} ||f + g||_{k+1,X} ≤ c_4 |f|_{k+1,X}

for some constant c_4 > 0. Then, by the conclusion obtained in the first part, there exists c > 0 such that

    |f − I_X f|_{m,X} ≤ c |f|_{k+1,X}.

This completes the proof. □

From this error bound we see that if we consider the polynomial kernel and the true function is in the RKHS H_K, then

    |f − I_X f|_{m,X} ≤ c ||f||_{H_K}

for some constant c > 0. This is because the RKHS H_K is continuously embedded in H^{k+1}(X), so |f|_{k+1,X} ≤ c ||f||_{H_K}.

5. Conclusion

In this paper, some error bounds for the kernel based regularized interpolation were provided. We derived these error bounds in terms of the reproducing kernel Hilbert space norm and the Sobolev space norm. We also gave an error bound for a commonly used kernel, the polynomial kernel. These error bounds provide a theoretical understanding of the approximation performance.

Conflicts of Interest: The author(s) declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] M. Belkin and P. Niyogi, Semi-supervised learning on Riemannian manifolds, Mach. Learn. 56 (2004), 209-239.
[2] M. Belkin, P. Niyogi, and V. Sindhwani, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, J. Mach. Learn. Res. 7 (2006), 2399-2434.
[3] R. Bhatia and C. Davis, A Cauchy-Schwarz inequality for operators with applications, Linear Algebra Appl. 223 (1995), 119-129.
[4] N. Cressie, The origins of kriging, Math. Geol. 22 (1990), 239-252.
[5] F. R. Deutsch, Best approximation in inner product spaces, Springer Science & Business Media, 2012.
[6] T. Kato, Estimation of iterated matrices, with application to the von Neumann condition, Numer. Math. 2 (1960), 22-29.
[7] H.-J. Rong, G.-B. Huang, N. Sundararajan, and P. Saratchandran, Online sequential fuzzy extreme learning machine for function approximation and classification problems, IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 39 (2009), 1067-1072.
[8] B. Schölkopf, R. Herbrich, and A. J. Smola, A generalized representer theorem, in: International Conference on Computational Learning Theory, Springer, 2001, 416-426.
[9] D. Shepard, A two-dimensional interpolation function for irregularly-spaced data, in: Proceedings of the 1968 23rd ACM National Conference, 1968, 517-524.
[10] M. Thakur, B. Samanta, and D. Chakravarty, A non-stationary geostatistical approach to multigaussian kriging for local reserve estimation, Stoch. Environ. Res. Risk Assess. 32 (2018), 2381-2404.
[11] J. J. Thiagarajan, P.-T. Bremer, and K. N. Ramamurthy, Multiple kernel interpolation for inverting non-linear dimensionality reduction and dimension estimation, in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2014, 6751-6755.
[12] A. Wannebo, Equivalent norms for the Sobolev space W_0^{m,p}(Ω), Ark. Mat. 32 (1994), 245-254.