CUBO A Mathematical Journal Vol. 11, No. 05, (99–115). December 2009

On Tikhonov Functionals Penalized by Bregman Distances

I.R. Bleyer and A. Leitão
Department of Mathematics, Federal University of St. Catarina, P.O. Box 476, 88040-900, Florianópolis, Brazil
emails: ismaelbleyer@gmail.com, acgleitao@gmail.com

ABSTRACT

We investigate Tikhonov regularization methods for linear and nonlinear ill-posed problems in Banach spaces, where the penalty term is described by Bregman distances. We prove convergence and stability results. Moreover, using appropriate source conditions, we are able to derive rates of convergence in terms of Bregman distances. We also analyze an iterated Tikhonov method for nonlinear problems, where the penalization is given by an appropriate convex functional.

RESUMEN

We investigate Tikhonov regularization methods for nonlinear ill-posed problems in Banach spaces, where the penalty term is described by Bregman distances. We prove convergence and stability results. Moreover, using appropriate source conditions, we are able to obtain convergence rates in terms of Bregman distances. We also analyze an iterated Tikhonov method for nonlinear problems, where the penalization is given by an appropriate convex functional.

Key words and phrases: Tikhonov functionals, Bregman distances, Total variation regularization.

Math. Subj. Class.: 65J20, 47J06, 47J25.

1 Introduction

In this paper we study non-quadratic regularization methods for solving ill-posed operator equations of the form
$$ F(u) = y, \qquad (1) $$
where $F : D(F) \subset \mathcal{U} \to \mathcal{H}$ is an operator between infinite dimensional Banach spaces. Both linear and nonlinear problems are considered.

The Tikhonov method is widely used to approximate solutions of inverse problems modeled by operator equations in Hilbert spaces [11, 5]. In this article we investigate Tikhonov-type methods, which consist in the minimization of functionals of the type
$$ J^\delta_\alpha(u) = \tfrac{1}{2}\|F(u) - y^\delta\|^2 + \alpha h(u), \qquad (2) $$
where $\alpha \in \mathbb{R}^+$ is called the regularization parameter, $h(\cdot)$ is a proper convex functional, and the noisy data $y^\delta$ satisfy
$$ \|y - y^\delta\| < \delta. \qquad (3) $$

The method presented above generalizes classical Tikhonov regularization. Therefore, the following questions arise:

• For $\alpha > 0$, does a minimizer of (2) exist? Does it depend continuously on the data $y^\delta$?
• Is the method convergent? (i.e., if the data $y$ is exact and $\alpha \to 0$, do the minimizers of (2) converge to a solution of (1)?)
• Is the method stable in the sense that, if $\alpha = \alpha(\delta)$ is chosen appropriately, the minimizers of (2) converge to a solution of (1) as $\delta \to 0$?
• What is the rate of convergence? How should the parameter $\alpha = \alpha(\delta)$ be chosen in order to obtain optimal convergence rates?

The first question above is answered in [6].
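To fix ideas, the following is a minimal finite-dimensional sketch of the minimization of (2), assuming a discretized linear model $F(u) = Au$ and a smoothed $\ell^1$ penalty as a stand-in for a generic proper convex $h$; the matrix, the penalty, and all parameter values are illustrative choices, not part of the paper's analysis.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative discrete setup: F(u) = A u with an ill-conditioned A,
# and a smooth convex penalty h (a smoothed l1 norm as a stand-in
# for a generic non-quadratic convex functional).
rng = np.random.default_rng(0)
n = 20
A = np.vander(np.linspace(0.0, 1.0, n), n, increasing=True)  # ill-conditioned
u_true = np.zeros(n)
u_true[[3, 7]] = 1.0                                         # sparse "solution"
y = A @ u_true
delta = 1e-3
y_delta = y + delta * rng.standard_normal(n) / np.sqrt(n)    # ||y - y_delta|| ~ delta

eps = 1e-6
def h(u):                      # smoothed l1: proper, convex, differentiable
    return np.sum(np.sqrt(u**2 + eps))

def J(u, alpha):               # Tikhonov functional (2)
    r = A @ u - y_delta
    return 0.5 * r @ r + alpha * h(u)

alpha = delta                  # a-priori choice alpha ~ delta (cf. Theorem 2.1)
res = minimize(J, np.zeros(n), args=(alpha,), method="L-BFGS-B")
u_alpha = res.x
print("residual:", np.linalg.norm(A @ u_alpha - y_delta))
```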
Throughout this article we work under the following assumptions.

Assumption 1.1.
(A1) Given the Banach spaces $\mathcal{U}$ and $\mathcal{H}$, one associates topologies $\tau_U$ and $\tau_H$, respectively, which are weaker than the corresponding norm topologies;
(A2) The topological duals of $\mathcal{U}$ and $\mathcal{H}$ are denoted by $\mathcal{U}^*$ and $\mathcal{H}^*$, respectively;
(A3) The norm $\|\cdot\|$ in $\mathcal{H}$ is sequentially lower semi-continuous with respect to $\tau_H$, i.e., for $y_k \to y$ with respect to the $\tau_H$ topology, $\|y\| \le \liminf_k \|y_k\|$;
(A4) $D(F)$ has non-empty interior with respect to the norm topology and is $\tau_U$-closed. Moreover, $D(F) \cap \operatorname{dom} h \ne \emptyset$;
(A5) $F : D(F) \subseteq \mathcal{U} \to \mathcal{H}$ is continuous from $(\mathcal{U}, \tau_U)$ to $(\mathcal{H}, \tau_H)$;
(A6) The functional $h : \mathcal{U} \to [0, +\infty]$ is proper, convex, bounded from below and $\tau_U$ lower semi-continuous;
(A7) For every $M > 0$ and $\alpha > 0$, the sets $\mathcal{M}_\alpha(M) = \{ u \in \mathcal{U} \mid J^\delta_\alpha(u) \le M \}$ are $\tau_U$-compact, i.e., every sequence $(u_k)$ in $\mathcal{M}_\alpha(M)$ has a subsequence which converges in $\mathcal{U}$ with respect to the $\tau_U$ topology.

The goal of this paper is to answer the remaining three questions posed above. We obtain convergence rates and error estimates with respect to generalized Bregman distances (see Definition A.1), originally introduced in [3]. Even though this tool satisfies neither the symmetry requirement nor the triangle inequality, it is the main ingredient of this work.

This paper is organized as follows: in Section 2 we consider the linear case and give quantitative estimates for the minimizers of (2), for exact and for noisy data. Section 3 contains results for nonlinear problems analogous to those of Section 2. In Section 4 we briefly discuss an iterative method for the nonlinear case; the main results concern its convergence analysis.

2 Convergence Analysis for Linear Problems

In this section we consider only the linear case. Equation (1) is then written as $Fu = y$, where the operator $F$ is defined from a Banach space into a Hilbert space. The main results of this section were originally proposed in [4, 8].

2.1 Rates of convergence for source conditions of type I

Error estimates for the solution error can be obtained only under an additional smoothness assumption on the data, a so-called source condition. For the moment we assume that $y \in R(F)$ and let $\bar{u}$ be an $h$-minimizing solution in the sense of Definition A.2. We assume that there exists at least one element $\xi \in \partial h(\bar{u})$ which belongs to the range of the adjoint of the operator $F$. Note that $R(F^*) \subseteq \mathcal{U}^*$ and $\partial h(\bar{u}) \subseteq \mathcal{U}^*$. Summarizing, we have
$$ \xi \in R(F^*) \cap \partial h(\bar{u}) \ne \emptyset, \qquad (4) $$
where $\bar{u}$ is such that
$$ F\bar{u} = y. \qquad (5) $$

We can rewrite the source condition (4) as follows: there exists an element $\omega \in \mathcal{H}$ such that $\xi = F^*\omega$. Note that under this assumption we can define the dual pairing for $(\psi, u) \in \mathcal{U}^* \times \mathcal{U}$ with $\psi \in R(F^*)$ by
$$ \langle \psi, u \rangle = \langle F^*\nu, u \rangle := \langle \nu, Fu \rangle_{\mathcal{H}}, $$
for some $\nu \in \mathcal{H}$.

Theorem 2.1 (Stability). Let (3) hold and let $\bar{u}$ be an $h$-minimizing solution of (1) such that the source condition (4) and (5) are satisfied. Then, for each minimizer $u^\delta_\alpha$ of (2), the estimate
$$ D^{F^*\omega}_h(u^\delta_\alpha, \bar{u}) \le \frac{1}{2\alpha}(\alpha\|\omega\| + \delta)^2 \qquad (6) $$
holds for $\alpha > 0$. In particular, if $\alpha \sim \delta$, then $D^{F^*\omega}_h(u^\delta_\alpha, \bar{u}) = O(\delta)$.

Proof. We note that $\|F\bar{u} - y^\delta\|^2 \le \delta^2$, by (5) and (3). Since $u^\delta_\alpha$ is a minimizer of the regularized problem (2), we have
$$ \tfrac{1}{2}\|Fu^\delta_\alpha - y^\delta\|^2 + \alpha h(u^\delta_\alpha) \le \tfrac{\delta^2}{2} + \alpha h(\bar{u}). $$
Let $D^{F^*\omega}_h(u^\delta_\alpha, \bar{u})$ be the Bregman distance between $u^\delta_\alpha$ and $\bar{u}$; the above inequality then becomes
$$ \tfrac{1}{2}\|Fu^\delta_\alpha - y^\delta\|^2 + \alpha\left( D^{F^*\omega}_h(u^\delta_\alpha, \bar{u}) + \langle F^*\omega, u^\delta_\alpha - \bar{u}\rangle \right) \le \tfrac{\delta^2}{2}. $$
Hence, using (3) and the Cauchy–Schwarz inequality, we derive the estimate
$$ \tfrac{1}{2}\|Fu^\delta_\alpha - y^\delta\|^2 + \langle \alpha\omega, Fu^\delta_\alpha - y^\delta\rangle_{\mathcal{H}} + \alpha D^{F^*\omega}_h(u^\delta_\alpha, \bar{u}) \le \tfrac{\delta^2}{2} + \alpha\|\omega\|\delta. $$
Using the equality $\|a + b\|^2 = \|a\|^2 + 2\langle a, b\rangle + \|b\|^2$, it is easy to see that
$$ \tfrac{1}{2}\|Fu^\delta_\alpha - y^\delta + \alpha\omega\|^2 + \alpha D^{F^*\omega}_h(u^\delta_\alpha, \bar{u}) \le \tfrac{\alpha^2}{2}\|\omega\|^2 + \alpha\delta\|\omega\| + \tfrac{\delta^2}{2}, $$
which yields (6) for $\alpha > 0$.

Theorem 2.2 (Convergence). If $\bar{u}$ is an $h$-minimizing solution of (1) such that the source condition (4) and (5) are satisfied, then for each minimizer $u_\alpha$ of (2) with exact data, the estimate
$$ D^{F^*\omega}_h(u_\alpha, \bar{u}) \le \tfrac{\alpha}{2}\|\omega\|^2 $$
holds true.

Proof. The proof is analogous to the proof of Theorem 2.1, taking $\delta = 0$.
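For orientation (this example is not part of the original text): in the classical quadratic setting $h(u) = \tfrac{1}{2}\|u\|^2$ on a Hilbert space $\mathcal{U}$, one has $\partial h(\bar{u}) = \{\bar{u}\}$ and
$$ D^{\xi}_h(v, \bar{u}) = \tfrac{1}{2}\|v\|^2 - \tfrac{1}{2}\|\bar{u}\|^2 - \langle \bar{u}, v - \bar{u}\rangle = \tfrac{1}{2}\|v - \bar{u}\|^2. $$
The source condition (4) then reads $\bar{u} = F^*\omega$, and Theorem 2.1 with $\alpha \sim \delta$ gives $\tfrac{1}{2}\|u^\delta_\alpha - \bar{u}\|^2 = O(\delta)$, i.e., $\|u^\delta_\alpha - \bar{u}\| = O(\sqrt{\delta})$, the classical convergence rate for Tikhonov regularization under this source condition.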
2.2 Rates of convergence for source conditions of type II

In this subsection we use another source condition, which is stronger than the one used in the previous subsection. This condition corresponds to the existence of some element $\xi \in \partial h(\bar{u}) \subset \mathcal{U}^*$ in the range of the operator $F^*F$, i.e.,
$$ \xi \in R(F^*F) \cap \partial h(\bar{u}) \ne \emptyset, \qquad (7) $$
where $\bar{u}$ is such that
$$ F^*F\bar{u} = F^*y. \qquad (8) $$

Note that in (8) we do not require $y \in R(F)$. Moreover, Definition A.2 is then understood in the context of least-squares solutions. Condition (7) is equivalent to the existence of $\omega \in \mathcal{U}\setminus\{0\}$ such that $\xi = F^*F\omega$, where $F^*$ is the adjoint operator of $F$ and $F^*F : \mathcal{U} \to \mathcal{U}^*$.

Theorem 2.3 (Stability). Let (3) hold and let $\bar{u}$ be an $h$-minimizing solution of (1) such that the source condition (7) as well as (8) are satisfied. Then the following inequalities hold for any $\alpha > 0$:
$$ D^{F^*F\omega}_h(u^\delta_\alpha, \bar{u}) \le D^{F^*F\omega}_h(\bar{u} - \alpha\omega, \bar{u}) + \frac{\delta^2}{\alpha} + \frac{\delta}{\alpha}\sqrt{\delta^2 + 2\alpha D^{F^*F\omega}_h(\bar{u} - \alpha\omega, \bar{u})}, \qquad (9) $$
$$ \|Fu^\delta_\alpha - F\bar{u}\| \le \alpha\|F\omega\| + \delta + \sqrt{\delta^2 + 2\alpha D^{F^*F\omega}_h(\bar{u} - \alpha\omega, \bar{u})}. \qquad (10) $$

Proof. Since $u^\delta_\alpha$ is a minimizer of (2), it follows from algebraic manipulation and from the definition of the Bregman distance that, for every $u \in \mathcal{U}$,
$$ 0 \ge \tfrac{1}{2}\left[\|Fu^\delta_\alpha - y^\delta\|^2 - \|Fu - y^\delta\|^2\right] + \alpha h(u^\delta_\alpha) - \alpha h(u) $$
$$ = \tfrac{1}{2}\left[\|Fu^\delta_\alpha\|^2 - \|Fu\|^2\right] - \langle F(u^\delta_\alpha - u), y^\delta\rangle_{\mathcal{H}} - \alpha D^{F^*F\omega}_h(u, \bar{u}) + \alpha\langle F\omega, F(u^\delta_\alpha - u)\rangle_{\mathcal{H}} + \alpha D^{F^*F\omega}_h(u^\delta_\alpha, \bar{u}). \qquad (11) $$
Notice that
$$ \|Fu^\delta_\alpha\|^2 - \|Fu\|^2 = \|F(u^\delta_\alpha - \bar{u} + \alpha\omega)\|^2 - \|F(u - \bar{u} + \alpha\omega)\|^2 + 2\langle Fu^\delta_\alpha - Fu, F\bar{u} - \alpha F\omega\rangle_{\mathcal{H}}. $$
Moreover, by (8), we have
$$ \langle F(u^\delta_\alpha - u), y^\delta - F\bar{u}\rangle_{\mathcal{H}} = \langle F(u^\delta_\alpha - u), y^\delta - y\rangle_{\mathcal{H}}. $$
Therefore, it follows from (11) that
$$ \tfrac{1}{2}\|F(u^\delta_\alpha - \bar{u} + \alpha\omega)\|^2 + \alpha D^{F^*F\omega}_h(u^\delta_\alpha, \bar{u}) \le \langle F(u^\delta_\alpha - u), y^\delta - y\rangle_{\mathcal{H}} + \alpha D^{F^*F\omega}_h(u, \bar{u}) + \tfrac{1}{2}\|F(u - \bar{u} + \alpha\omega)\|^2 $$
for every $u \in \mathcal{U}$, $\alpha \ge 0$ and $\delta \ge 0$.

Replacing $u$ by $\bar{u} - \alpha\omega$ in the last inequality, using (3), the relations $\langle a, b\rangle \le |\langle a, b\rangle| \le \|a\|\|b\|$, and defining $\gamma = \|F(u^\delta_\alpha - \bar{u} + \alpha\omega)\|$, we obtain
$$ \tfrac{1}{2}\gamma^2 + \alpha D^{F^*F\omega}_h(u^\delta_\alpha, \bar{u}) \le \delta\gamma + \alpha D^{F^*F\omega}_h(\bar{u} - \alpha\omega, \bar{u}). $$
We estimate each term on the left-hand side separately by the right-hand side. One of the estimates is an inequality in the form of a second-degree polynomial in $\gamma$, which gives
$$ \gamma \le \delta + \sqrt{\delta^2 + 2\alpha D^{F^*F\omega}_h(\bar{u} - \alpha\omega, \bar{u})}. $$
This inequality, together with the other estimate, gives (9). Finally, (10) follows from the fact that $\|F(u^\delta_\alpha - \bar{u})\| \le \gamma + \alpha\|F\omega\|$.

Theorem 2.4 (Convergence). Let $\alpha \ge 0$ be given. If $\bar{u}$ is an $h$-minimizing solution of (1) satisfying the source condition (7) as well as (8), then the following inequalities hold true:
$$ D^{F^*F\omega}_h(u_\alpha, \bar{u}) \le D^{F^*F\omega}_h(\bar{u} - \alpha\omega, \bar{u}), $$
$$ \|Fu_\alpha - F\bar{u}\| \le \alpha\|F\omega\| + \sqrt{2\alpha D^{F^*F\omega}_h(\bar{u} - \alpha\omega, \bar{u})}. $$

Proof. The proof is analogous to the proof of Theorem 2.3, taking $\delta = 0$. Notice that here $\alpha$ may be taken equal to zero.

Corollary 2.5. Let the assumptions of Theorem 2.3 hold true. Further, assume that $h$ is twice differentiable in a neighborhood $U$ of $\bar{u}$ and that there exists a number $M > 0$ such that for any $u \in U$ and $v \in \mathcal{U}$ the inequality
$$ \langle h''(u)v, v\rangle \le M\|v\|^2 \qquad (12) $$
holds true. Then, for the parameter choice $\alpha \sim \delta^{2/3}$, we have $D^\xi_h(u^\delta_\alpha, \bar{u}) = O(\delta^{4/3})$. Moreover, for exact data we have $D^\xi_h(u_\alpha, \bar{u}) = O(\alpha^2)$.

Proof. Using the Taylor expansion at the point $\bar{u}$ we obtain
$$ h(u) = h(\bar{u}) + \langle h'(\bar{u}), u - \bar{u}\rangle + \tfrac{1}{2}\langle h''(\mu)(u - \bar{u}), u - \bar{u}\rangle $$
for some $\mu$ on the segment $[u, \bar{u}]$. Let $u = \bar{u} - \alpha\omega$ in the above equality. For sufficiently small $\alpha$, it follows from assumption (12) and the definition of the Bregman distance, with $\xi = h'(\bar{u})$, that
$$ D^\xi_h(\bar{u} - \alpha\omega, \bar{u}) = \tfrac{1}{2}\langle h''(\mu)(-\alpha\omega), -\alpha\omega\rangle \le \frac{\alpha^2 M}{2}\|\omega\|^2_{\mathcal{U}}. $$
Thus $D^\xi_h(\bar{u} - \alpha\omega, \bar{u}) = O(\alpha^2)$, and the desired rates of convergence follow from Theorems 2.3 and 2.4.
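Remark (added for the reader's convenience; not in the original text). The final step of the proof of Theorem 2.3, and of several proofs below, uses the same elementary estimate: if $\gamma \ge 0$ satisfies $\tfrac{1}{2}\gamma^2 \le b\gamma + q$ with $b, q \ge 0$, then completing the square in $\gamma^2 - 2b\gamma - 2q \le 0$ gives
$$ \gamma \le b + \sqrt{b^2 + 2q}. $$
In Theorem 2.3 this is applied with $b = \delta$ and $q = \alpha D^{F^*F\omega}_h(\bar{u} - \alpha\omega, \bar{u})$.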
3 Convergence Analysis for Nonlinear Problems

This section presents the convergence analysis for nonlinear problems. We need to assume a nonlinearity condition. In contrast to other classical conditions, the following analysis covers the case where both $\mathcal{U}$ and $\mathcal{H}$ are Banach spaces.

Assumption 3.1. Assume that an $h$-minimizing solution $\bar{u}$ of (1) exists and that the operator $F : D(F) \subseteq \mathcal{U} \to \mathcal{H}$ is Gâteaux differentiable. Moreover, assume that there exists $\rho > 0$ such that, for every $u \in D(F) \cap B_\rho(\bar{u})$,
$$ \|F(u) - F(\bar{u}) - F'(\bar{u})(u - \bar{u})\| \le c\, D^\xi_h(u, \bar{u}), \qquad c > 0, \qquad (13) $$
where $\xi \in \partial h(\bar{u})$.

This assumption was originally proposed in [9].

3.1 Rates of convergence for source conditions of type I

For nonlinear operators we cannot define an adjoint operator. Therefore, the assumptions are formulated in terms of the linearization of the operator $F$. In analogy to the source condition (4) introduced in the previous section, we assume that
$$ \xi \in R(F'(\bar{u})^*) \cap \partial h(\bar{u}) \ne \emptyset, \qquad (14) $$
where $\bar{u}$ solves
$$ F(\bar{u}) = y. \qquad (15) $$

The derivative of the operator $F$ is a map from the Banach space $\mathcal{U}$ into $L(\mathcal{U}, \mathcal{H})$, the space of bounded linear transformations from $\mathcal{U}$ to $\mathcal{H}$. Evaluating the derivative at $\bar{u} \in \mathcal{U}$, we obtain a linear operator $F'(\bar{u}) : \mathcal{U} \to \mathcal{H}$, and so we can define its adjoint $F'(\bar{u})^* : \mathcal{H}^* \to \mathcal{U}^*$. The source condition (14) is stated as follows: there exists an element $\omega \in \mathcal{H}^*$ such that
$$ \xi = F'(\bar{u})^*\omega \in \partial h(\bar{u}). \qquad (16) $$

Theorem 3.2 (Stability). Let Assumptions 1.1 and 3.1 and relation (3) hold true. Moreover, assume that there exists $\omega \in \mathcal{H}^*$ such that (16) is satisfied and $c\|\omega\|_{\mathcal{H}^*} < 1$. Then the following estimates hold:
$$ \|F(u^\delta_\alpha) - F(\bar{u})\| \le 2\alpha\|\omega\|_{\mathcal{H}^*} + 2\left(\alpha^2\|\omega\|^2_{\mathcal{H}^*} + \delta^2\right)^{1/2}, $$
$$ D^{F'(\bar{u})^*\omega}_h(u^\delta_\alpha, \bar{u}) \le \frac{2}{1 - c\|\omega\|_{\mathcal{H}^*}}\left[\frac{\delta^2}{2\alpha} + \alpha\|\omega\|^2_{\mathcal{H}^*} + \|\omega\|_{\mathcal{H}^*}\left(\alpha^2\|\omega\|^2_{\mathcal{H}^*} + \delta^2\right)^{1/2}\right]. $$
In particular, if $\alpha \sim \delta$, then $\|F(u^\delta_\alpha) - F(\bar{u})\| = O(\delta)$ and $D^{F'(\bar{u})^*\omega}_h(u^\delta_\alpha, \bar{u}) = O(\delta)$.

Proof. Since $u^\delta_\alpha$ is a minimizer of (2), it follows from the definition of the Bregman distance that
$$ \tfrac{1}{2}\|F(u^\delta_\alpha) - y^\delta\|^2 \le \tfrac{1}{2}\delta^2 - \alpha\left(D^{F'(\bar{u})^*\omega}_h(u^\delta_\alpha, \bar{u}) + \langle F'(\bar{u})^*\omega, u^\delta_\alpha - \bar{u}\rangle\right). $$
By (3) and (15) we obtain
$$ \tfrac{1}{2}\|F(u^\delta_\alpha) - F(\bar{u})\|^2 \le \|F(u^\delta_\alpha) - y^\delta\|^2 + \delta^2. $$
Now, using the last two inequalities, the definition of the Bregman distance, the nonlinearity condition (13) and the assumption $c\|\omega\|_{\mathcal{H}^*} - 1 < 0$, we obtain
$$ \tfrac{1}{4}\|F(u^\delta_\alpha) - F(\bar{u})\|^2 \le \tfrac{1}{2}\left(\|F(u^\delta_\alpha) - y^\delta\|^2 + \delta^2\right) $$
$$ \le \delta^2 - \alpha D^{F'(\bar{u})^*\omega}_h(u^\delta_\alpha, \bar{u}) + \alpha\langle\omega, -F'(\bar{u})(u^\delta_\alpha - \bar{u})\rangle $$
$$ \le \delta^2 - \alpha D^{F'(\bar{u})^*\omega}_h(u^\delta_\alpha, \bar{u}) + \alpha\|\omega\|_{\mathcal{H}^*}\|F(u^\delta_\alpha) - F(\bar{u})\| + \alpha\|\omega\|_{\mathcal{H}^*}\|F(u^\delta_\alpha) - F(\bar{u}) - F'(\bar{u})(u^\delta_\alpha - \bar{u})\| $$
$$ \le \delta^2 + \alpha\left(c\|\omega\|_{\mathcal{H}^*} - 1\right)D^{F'(\bar{u})^*\omega}_h(u^\delta_\alpha, \bar{u}) + \alpha\|\omega\|_{\mathcal{H}^*}\|F(u^\delta_\alpha) - F(\bar{u})\| \qquad (17) $$
$$ \le \delta^2 + \alpha\|\omega\|_{\mathcal{H}^*}\|F(u^\delta_\alpha) - F(\bar{u})\|. \qquad (18) $$
From (18) we obtain an inequality in the form of a second-degree polynomial in the variable $\gamma = \|F(u^\delta_\alpha) - F(\bar{u})\|$. This gives the first estimate stated in the theorem. For the second estimate we use (17) and the previous bound on $\gamma$.

Theorem 3.3 (Convergence). Let Assumptions 1.1 and 3.1 hold true. Moreover, assume the existence of $\omega \in \mathcal{H}^*$ such that (16) is satisfied and $c\|\omega\|_{\mathcal{H}^*} < 1$. Then the following estimates hold:
$$ \|F(u_\alpha) - F(\bar{u})\| \le 4\alpha\|\omega\|_{\mathcal{H}^*}, $$
$$ D^{F'(\bar{u})^*\omega}_h(u_\alpha, \bar{u}) \le \frac{4\alpha\|\omega\|^2_{\mathcal{H}^*}}{1 - c\|\omega\|_{\mathcal{H}^*}}. $$

Proof. The proof is analogous to the proof of Theorem 3.2, taking $\delta = 0$.
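As a consistency check (not in the original text): if $F$ is linear, then $F'(\bar{u}) = F$ and the left-hand side of (13) vanishes,
$$ F(u) - F(\bar{u}) - F'(\bar{u})(u - \bar{u}) = Fu - F\bar{u} - F(u - \bar{u}) = 0, $$
so Assumption 3.1 holds for every $c > 0$, and conditions (14) and (16) reduce to the source condition (4) of Section 2. The estimates of Theorem 3.2 are then consistent with those of Theorem 2.1, up to constants.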
3.2 Rates of convergence for source conditions of type II

In this subsection we consider a source condition similar to the one in (7); namely, we assume the existence of
$$ \xi \in R(F'(\bar{u})^* F'(\bar{u})) \cap \partial h(\bar{u}) \ne \emptyset. $$
This assumption is equivalent to the existence of an element $\omega \in \mathcal{U}$ with
$$ \xi = F'(\bar{u})^* F'(\bar{u})\omega \in \partial h(\bar{u}). \qquad (19) $$

Theorem 3.4 (Stability). Let Assumptions 1.1 and 3.1 hold, as well as estimate (3). Moreover, let $\mathcal{H}$ be a Hilbert space and assume the existence of an $h$-minimizing solution $\bar{u}$ of (1) in the interior of $D(F)$. Assume also the existence of $\omega \in \mathcal{U}$ such that (19) is satisfied and $c\|F'(\bar{u})\omega\| < 1$. Then, for $\alpha$ sufficiently small, the following estimates hold:
$$ \|F(u^\delta_\alpha) - F(\bar{u})\| \le \alpha\|F'(\bar{u})\omega\| + g(\alpha, \delta), $$
$$ D^\xi_h(u^\delta_\alpha, \bar{u}) \le \frac{\alpha s + (cs)^2/2 + \delta g(\alpha, \delta) + cs\left(\delta + \alpha\|F'(\bar{u})\omega\|\right)}{\alpha\left(1 - c\|F'(\bar{u})\omega\|\right)}, \qquad (20) $$
where $g(\alpha, \delta) = \delta + \sqrt{(\delta + cs)^2 + 2\alpha s\left(1 + c\|F'(\bar{u})\omega\|\right)}$ and $s = D^\xi_h(\bar{u} - \alpha\omega, \bar{u})$.

Proof. Since $u^\delta_\alpha$ is the minimizer of (2), it follows that, for every $u \in D(F)$,
$$ 0 \ge \tfrac{1}{2}\|F(u^\delta_\alpha) - y^\delta\|^2 - \tfrac{1}{2}\|F(u) - y^\delta\|^2 + \alpha\left(h(u^\delta_\alpha) - h(u)\right) $$
$$ = \tfrac{1}{2}\|F(u^\delta_\alpha)\|^2 - \tfrac{1}{2}\|F(u)\|^2 + \langle F(u) - F(u^\delta_\alpha), y^\delta\rangle_{\mathcal{H}} + \alpha\left(h(u^\delta_\alpha) - h(u)\right) = \Phi(u^\delta_\alpha) - \Phi(u), \qquad (21) $$
where
$$ \Phi(u) = \tfrac{1}{2}\|F(u) - q\|^2 + \alpha D^\xi_h(u, \bar{u}) - \langle F(u), y^\delta - q\rangle_{\mathcal{H}} + \alpha\langle\xi, u\rangle, \qquad q = F(\bar{u}) - \alpha F'(\bar{u})\omega, $$
and $\xi$ is given by the source condition (19).

From (21) we have $\Phi(u^\delta_\alpha) \le \Phi(u)$. By the definition of $\Phi(\cdot)$, taking $u = \bar{u} - \alpha\omega$ and setting $v = F(u^\delta_\alpha) - F(\bar{u}) + \alpha F'(\bar{u})\omega$, we obtain
$$ \tfrac{1}{2}\|v\|^2 + \alpha D^\xi_h(u^\delta_\alpha, \bar{u}) \le \alpha s + T_1 + T_2 + T_3, \qquad (22) $$
where $s$ is given in the theorem and
$$ T_1 = \tfrac{1}{2}\|F(\bar{u} - \alpha\omega) - F(\bar{u}) + \alpha F'(\bar{u})\omega\|^2, $$
$$ T_2 = \left|\langle F(u^\delta_\alpha) - F(\bar{u} - \alpha\omega), y^\delta - y\rangle_{\mathcal{H}}\right|, $$
$$ T_3 = \alpha\langle F'(\bar{u})\omega, F(u^\delta_\alpha) - F(\bar{u} - \alpha\omega) - F'(\bar{u})\left(u^\delta_\alpha - (\bar{u} - \alpha\omega)\right)\rangle_{\mathcal{H}}. $$

The next step is to estimate the terms $T_j$, $j = 1, 2, 3$. We use the nonlinearity condition (13), the Cauchy–Schwarz inequality, and some algebraic manipulation to obtain
$$ T_1 \le \frac{c^2 s^2}{2}, $$
$$ T_2 \le \left|\langle v, y^\delta - y\rangle_{\mathcal{H}}\right| + \left|\langle F(\bar{u} - \alpha\omega) - F(\bar{u}) + \alpha F'(\bar{u})\omega, y^\delta - y\rangle_{\mathcal{H}}\right| \le \|v\|\|y^\delta - y\| + c D^\xi_h(\bar{u} - \alpha\omega, \bar{u})\|y^\delta - y\| \le \delta\|v\| + \delta c s, $$
and
$$ T_3 = \alpha\langle F'(\bar{u})\omega, F(u^\delta_\alpha) - F(\bar{u}) - F'(\bar{u})(u^\delta_\alpha - \bar{u})\rangle_{\mathcal{H}} + \alpha\langle F'(\bar{u})\omega, -\left(F(\bar{u} - \alpha\omega) - F(\bar{u}) + \alpha F'(\bar{u})\omega\right)\rangle_{\mathcal{H}} $$
$$ \le \alpha\|F'(\bar{u})\omega\|\,\|F(u^\delta_\alpha) - F(\bar{u}) - F'(\bar{u})(u^\delta_\alpha - \bar{u})\| + \alpha\|F'(\bar{u})\omega\|\,\|F(\bar{u} - \alpha\omega) - F(\bar{u}) + \alpha F'(\bar{u})\omega\| $$
$$ \le \alpha c\|F'(\bar{u})\omega\|\, D^\xi_h(u^\delta_\alpha, \bar{u}) + \alpha c s\|F'(\bar{u})\omega\|. $$

Using these estimates in (22), we obtain
$$ \|v\|^2 + 2\alpha\left(1 - c\|F'(\bar{u})\omega\|\right) D^\xi_h(u^\delta_\alpha, \bar{u}) \le 2\delta\|v\| + 2\alpha s + (cs)^2 + 2\delta c s + 2\alpha c s\|F'(\bar{u})\omega\|. $$
Analogously to the proof of Theorem 2.3, each term on the left-hand side of the last inequality is estimated separately by the right-hand side. This leads to an inequality described by a second-degree polynomial in $\|v\|$, from which the theorem follows.

Theorem 3.5 (Convergence). Let Assumptions 1.1 and 3.1 hold and assume $\mathcal{H}$ to be a Hilbert space. Moreover, assume the existence of an $h$-minimizing solution $\bar{u}$ of (1) in the interior of $D(F)$, and also the existence of $\omega \in \mathcal{U}$ such that (19) is satisfied and $c\|F'(\bar{u})\omega\| < 1$. Then, for $\alpha$ sufficiently small, the following estimates hold:
$$ \|F(u_\alpha) - F(\bar{u})\| \le \alpha\|F'(\bar{u})\omega\| + \sqrt{(cs)^2 + 2\alpha s\left(1 + c\|F'(\bar{u})\omega\|\right)}, $$
$$ D^\xi_h(u_\alpha, \bar{u}) \le \frac{\alpha s + (cs)^2/2 + \alpha c s\|F'(\bar{u})\omega\|}{\alpha\left(1 - c\|F'(\bar{u})\omega\|\right)}, \qquad (23) $$
where $s = D^\xi_h(\bar{u} - \alpha\omega, \bar{u})$.

Proof. The proof is analogous to the proof of Theorem 3.4, taking $\delta = 0$.
Corollary 3.6. Let the assumptions of Theorem 3.4 hold true. Moreover, assume that $h$ is twice differentiable in a neighborhood $U$ of $\bar{u}$, and that there exists a number $M > 0$ such that for all $u \in U$ and for all $v \in \mathcal{U}$ the inequality $\langle h''(u)v, v\rangle \le M\|v\|^2$ holds. Then, for the parameter choice $\alpha \sim \delta^{2/3}$, we have $D^\xi_h(u^\delta_\alpha, \bar{u}) = O(\delta^{4/3})$, while for exact data we obtain $D^\xi_h(u_\alpha, \bar{u}) = O(\alpha^2)$.

Proof. The proof is similar to the proof of Corollary 2.5 and is based on Theorems 3.4 and 3.5.

4 An Iterated Tikhonov Method for Nonlinear Problems

In this section we investigate an iterative method based on Bregman distances for nonlinear problems. We consider an operator $F : \mathcal{U} \to \mathcal{H}$ defined between a Banach space and a Hilbert space, Fréchet differentiable, with closed and convex domain $D(F)$. The operator equation (1) is ill-posed in the sense of Hadamard; in particular, the solution need not be unique, so we define
$$ S(y) = \{ u \in D(F) \mid F(u) = y \}. $$

The method was originally proposed by Osher et al. in [7], generalizing the ideas of the ROF model [10] (see [1] for further details). The method analyzed here generalizes the iterated Tikhonov method and is given by
$$ u_{k+1} \in \operatorname{argmin}\left\{ \tfrac{1}{2}\|F(u) - y^\delta\|^2 + \alpha_k D^{\xi_k}_h(u, u_k) \right\}, \qquad (24) $$
where the required subgradient is updated by the rule
$$ \xi_{k+1} = \xi_k - \frac{1}{\alpha_k} F'(u_{k+1})^*\left(F(u_{k+1}) - y^\delta\right). \qquad (25) $$

Algorithm 1: generalized Tikhonov method with Bregman distance
Require: $u_0 \in D(F) \cap \operatorname{dom} h$, $\xi_0 \in \partial h(u_0)$
1: $k = 0$
2: choose $\alpha_k > 0$
3: repeat
4:   $u_{k+1} \in \operatorname{argmin}\left\{ \tfrac{1}{2}\|F(u) - y^\delta\|^2 + \alpha_k D^{\xi_k}_h(u, u_k)\right\}$
5:   $\xi_{k+1} = \xi_k - \frac{1}{\alpha_k}F'(u_{k+1})^*\left(F(u_{k+1}) - y^\delta\right)$
6:   $k = k + 1$
7:   choose $\alpha_k > 0$
8: until convergence

Remark 4.1. It is easy to see that definition (25) is equivalent to
$$ \xi_{k+1} = \xi_0 - \sum_{j=0}^{k}\frac{1}{\alpha_j} F'(u_{j+1})^*\left(F(u_{j+1}) - y^\delta\right). \qquad (26) $$

We obtain monotonicity of the residuals directly from the above definitions.

Lemma 4.2. The iterates defined by Algorithm 1 satisfy the estimate
$$ \|y^\delta - F(u_{k+1})\| \le \|y^\delta - F(u_k)\|. $$

Proof. Defining $J^\delta_{\alpha_k}(u) = \tfrac{1}{2}\|F(u) - y^\delta\|^2 + \alpha_k D^{\xi_k}_h(u, u_k)$, the lemma follows from the fact that $u_{k+1}$ is a minimizer of (24), i.e., $J^\delta_{\alpha_k}(u_{k+1}) \le J^\delta_{\alpha_k}(u_k)$, together with $D^{\xi_k}_h(u_k, u_k) = 0$ and $D^{\xi_k}_h(u_{k+1}, u_k) \ge 0$.

Under a nonlinearity condition on $F$ we prove a monotonicity result for the Bregman distance, i.e., $D^{\xi_{k+1}}_h(\bar{u}, u_{k+1}) \le D^{\xi_k}_h(\bar{u}, u_k)$.

Lemma 4.3. Let $y^\delta \in \mathcal{H}$ be the given data. If, for some $u_k$ and $\xi_k$, the iterate $u_{k+1}$ in (24) satisfies
$$ \|y^\delta - F(u_{k+1}) - F'(u_{k+1})(\bar{u} - u_{k+1})\| \le c\|y^\delta - F(u_{k+1})\| $$
for some $0 < c < 1$, then
$$ D^{\xi_{k+1}}_h(\bar{u}, u_{k+1}) - D^{\xi_k}_h(\bar{u}, u_k) + D^{\xi_k}_h(u_{k+1}, u_k) \le -\frac{1 - c}{\alpha_k}\|y^\delta - F(u_{k+1})\|^2. \qquad (27) $$

Proof. This result follows from the identity (see [1] for details)
$$ D^{\xi_{k+1}}_h(\bar{u}, u_{k+1}) - D^{\xi_k}_h(\bar{u}, u_k) + D^{\xi_k}_h(u_{k+1}, u_k) = \langle \xi_{k+1} - \xi_k, u_{k+1} - \bar{u}\rangle. $$
Using (25) on the right-hand side, adding and subtracting $F(u_{k+1}) - y^\delta$ in the second argument of the inner product, and applying the Cauchy–Schwarz inequality together with the assumptions, we conclude that estimate (27) holds.

The subsequent results are obtained assuming that the nonlinear operator $F$ is such that $D(F) \subseteq L^2(\Omega)$, where $\Omega \subset \mathbb{R}^n$ is a bounded Lipschitz domain, and that the convex regularization functional is given by
$$ h(u) = \tfrac{1}{2}\|u\|^2_{L^2(\Omega)} + |u|_{BV(\Omega)}. \qquad (28) $$

Lemma 4.4. If $h(\cdot)$ is the convex functional defined by (28), then
$$ \tfrac{1}{2}\|v - u\|^2_{L^2(\Omega)} \le D^\xi_h(v, u) $$
for every $u, v \in D(F)$ and $\xi \in \partial h(u)$.

Proof. The proof is straightforward once some auxiliary properties concerning the calculus of subgradients are established. For a complete proof we refer the reader to [2].
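A discrete illustration of the penalty (28) and of Lemma 4.4 is sketched below (one-dimensional signals, forward differences); the discretization and the particular subgradient choice are assumptions made for this example only, not constructions from the text.

```python
import numpy as np

def tv(u):
    """Discrete total variation |u|_BV (1-D, forward differences)."""
    return np.sum(np.abs(np.diff(u)))

def h(u):
    """Discrete analogue of the penalty (28): 0.5*||u||^2 + |u|_BV."""
    return 0.5 * u @ u + tv(u)

def bregman(v, u, xi):
    """Bregman distance D_h^xi(v, u) = h(v) - h(u) - <xi, v - u>."""
    return h(v) - h(u) - xi @ (v - u)

def subgrad_h(u):
    """One subgradient of h at u: xi = u + D^T sign(D u), where D is the
    forward-difference matrix; sign(0) = 0 is an admissible choice."""
    s = np.sign(np.diff(u))
    g = np.zeros_like(u)
    g[:-1] -= s            # adjoint of forward differences applied to s
    g[1:] += s
    return u + g

# Numerical sanity check of Lemma 4.4: D_h^xi(v, u) >= 0.5*||v - u||^2.
rng = np.random.default_rng(1)
u = rng.standard_normal(8)
v = rng.standard_normal(8)
xi = subgrad_h(u)
print(bregman(v, u, xi) >= 0.5 * np.sum((v - u) ** 2) - 1e-12)  # True
```

The check reflects the structure of the proof: the quadratic part of (28) contributes exactly $\tfrac{1}{2}\|v - u\|^2$ to the Bregman distance, while the TV part contributes a nonnegative term by the subgradient inequality.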
Assumption 4.5. Let $F : D(F) \subset L^2(\Omega) \to \mathcal{H}$ be a weakly sequentially closed nonlinear operator and let $F'(\cdot)$ be locally bounded. Moreover, suppose that the nonlinearity condition
$$ \|F(v) - F(u) - F'(u)(v - u)\| \le \eta\|u - v\|_{L^2(\Omega)}\|F(u) - F(v)\| \qquad (29) $$
is satisfied for every $u, v \in B_\rho(\bar{u}) \cap D(F)$, where $\eta, \rho > 0$, $B_\rho(\bar{u})$ denotes the open ball of radius $\rho$ around $\bar{u}$ in $L^2(\Omega)$, and $\bar{u} \in S(y) \cap \operatorname{dom} h$.

Remark 4.6. By the triangle inequality, (29) implies the estimate
$$ \|F'(u)(v - u)\| \le \left(1 + \eta\|u - v\|_{L^2(\Omega)}\right)\|F(u) - F(v)\|. $$

The next result shows that the sequence of iterates generated by Algorithm 1 is well defined.

Proposition 4.7. Let Assumption 4.5 hold, let $k \in \mathbb{N}$, and let $u_k, \xi_k$ be a pair of iterates according to Algorithm 1. Then there exists a minimizer $u_{k+1}$ of (24), and $\xi_{k+1}$ given by (25) satisfies $\xi_{k+1} \in \partial h(u_{k+1})$.

Proof. If there exists a $u$ such that $J^\delta_{\alpha_k}(u)$ is finite, then there is a sequence $(u_j) \in D(F) \cap BV(\Omega)$ such that $J^\delta_{\alpha_k}(u_j) \to \beta$, where $\beta = \inf\{ J^\delta_{\alpha_k}(u) \mid u \in D(F) \}$. In particular, $D^{\xi_k}_h(u_j, u_k) \le \frac{M}{\alpha_k}$ for some constant $M > 0$. By the definition of the Bregman distance, together with (28), and observing that
$$ \tfrac{1}{2}\|u_j\|^2_{L^2(\Omega)} - \langle\xi_k, u_j\rangle = \tfrac{1}{2}\|u_j - \xi_k\|^2_{L^2(\Omega)} - \tfrac{1}{2}\|\xi_k\|^2_{L^2(\Omega)}, $$
we obtain $|u_j|_{BV(\Omega)} \le \tilde{M}_k$, where $\tilde{M}_k \ge 0$ depends on the current iterates. Thus the existence of a minimizer follows from compactness arguments.

It remains to prove that $\xi_{k+1} \in \partial h(u_{k+1})$. This follows from the inequality $\phi_2(v) \ge \phi_2(u_{k+1}) + \langle -\phi_1'(u_{k+1}), v - u_{k+1}\rangle$, where $\phi_1(u) = \tfrac{1}{2}\|y^\delta - F(u)\|^2$ and $\phi_2(u) = \alpha_k D^{\xi_k}_h(u, u_k)$ (see [1, 2] for details).
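Proposition 4.7 guarantees that the iterates of Algorithm 1 are well defined. Before stating the main results, we sketch the algorithm in a deliberately simplified setting, assuming a linear operator $F(u) = Au$ and the quadratic penalty $h(u) = \tfrac{1}{2}\|u\|^2$, for which $\xi_k = u_k$ and (24)–(25) reduce to the nonstationary iterated Tikhonov method; the matrix, noise level and parameter values below are illustrative only.

```python
import numpy as np

def iterated_tikhonov(A, y_delta, delta, alphas, tau=2.0):
    """Illustrative sketch of Algorithm 1 for linear F(u) = A u and
    quadratic penalty h(u) = 0.5*||u||^2.  Stops by the discrepancy
    principle ||A u_k - y_delta|| <= tau*delta, cf. (30) below."""
    n = A.shape[1]
    u = np.zeros(n)                     # u_0, with xi_0 = u_0
    AtA, Aty = A.T @ A, A.T @ y_delta
    for alpha_k in alphas:
        # Step 4 of Algorithm 1: the minimizer of
        #   0.5*||A u - y_delta||^2 + alpha_k * D_h^{xi_k}(u, u_k)
        # solves (A^T A + alpha_k I) u = A^T y_delta + alpha_k xi_k.
        u = np.linalg.solve(AtA + alpha_k * np.eye(n), Aty + alpha_k * u)
        # Step 5 is automatic here: by the optimality condition,
        # xi_{k+1} = xi_k - A^T (A u_{k+1} - y_delta) / alpha_k = u_{k+1}.
        if np.linalg.norm(A @ u - y_delta) <= tau * delta:
            break                        # discrepancy principle reached
    return u

# Toy usage with an ill-conditioned matrix (all values illustrative).
rng = np.random.default_rng(0)
A = np.vander(np.linspace(0.0, 1.0, 30), 30, increasing=True)
u_true = rng.standard_normal(30)
delta = 1e-4
y_delta = A @ u_true + delta * rng.standard_normal(30) / np.sqrt(30)
u_rec = iterated_tikhonov(A, y_delta, delta, alphas=[1.0] * 50)
print("final residual:", np.linalg.norm(A @ u_rec - y_delta))
```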
4.1 Main results

The main results of this section give sufficient conditions guaranteeing the existence of a convergent subsequence of the iterates produced by Algorithm 1, for both exact and noisy data. In the case of noisy data, we introduce a stopping rule based on the discrepancy principle. For complete proofs we refer the reader to [2].

Theorem 4.8 (Convergence). Let Assumption 4.5 hold, let $\gamma < \min\left\{\frac{1}{\eta}, \frac{\rho}{2}\right\}$ for $\eta, \rho$ as in (29), let $0 < \alpha_k < \bar{\alpha}$, and let $h(\bar{u}) < \infty$. Moreover, assume that the starting values $u_0, \xi_0 \in L^2(\Omega)$ satisfy $D^{\xi_0}_h(\bar{u}, u_0) < \frac{\gamma^2}{8}$ for some $\bar{u} \in S(y)$. Then, for exact data, the sequence $(u_k)$ has a subsequence converging to some $u \in S(y)$ in the weak-∗ topology of $BV(\Omega)$. Moreover, if $S(y) \cap B_\rho(\bar{u}) = \{\bar{u}\}$, then $u_k \stackrel{*}{\rightharpoonup} \bar{u}$ in $BV(\Omega)$.

Proof. Step 1: First we rewrite the assumption on the starting values in the form $2\sqrt{2 D^{\xi_0}_h(\bar{u}, u_0)} < \gamma$. Assuming that the same condition holds for a pair of iterates $u_k, \xi_k$, we prove by induction that it also holds for the index $k + 1$. Let $u_{k+1}$ be the minimizer of $J_{\alpha_k}(\cdot)$, so that $J_{\alpha_k}(u_{k+1}) \le J_{\alpha_k}(\bar{u})$. We rewrite this inequality, apply Lemma 4.4 twice, and conclude that $\|u_{k+1} - \bar{u}\|_{L^2(\Omega)} < \gamma$. Hence Assumption 4.5 is satisfied and Lemma 4.3 holds for all iterates.

Step 2: In this step we prove that $\sum_{i=0}^{\infty}\frac{1}{\alpha_i}\|y - F(u_{i+1})\|^2 < \infty$. As in the previous step, it follows from Lemma 4.3 that (27) holds for every $k$. Adding up the first $k$ terms and using the assumption on the starting values, we conclude that
$$ D^{\xi_k}_h(\bar{u}, u_k) + \sum_{i=0}^{k-1} D^{\xi_i}_h(u_{i+1}, u_i) + \sum_{i=0}^{k-1}\frac{1 - \eta\gamma}{\alpha_i}\|y - F(u_{i+1})\|^2 \le \frac{\gamma^2}{8}. $$
Since all terms on the left-hand side are positive, the claim follows from the third term by letting $k$ tend to infinity. Moreover, since the series converges and the $\alpha_i$ are bounded by $\bar{\alpha}$, its terms tend to zero, whence $F(u_k) \to y$.

Step 3: We show the uniform boundedness of the sequence $(h(u_k))$. Since the Bregman distance is always nonnegative, we have $h(u_k) \le h(\bar{u}) - \langle\xi_k, \bar{u} - u_k\rangle$. Thus, by Remark 4.1, the bound $\|u_{k+1} - \bar{u}\|_{L^2(\Omega)} < \gamma$ and the Cauchy–Schwarz inequality, we obtain
$$ h(u_k) \le h(\bar{u}) + \gamma\|\xi_0\|_{L^2(\Omega)} + \sum_{i=0}^{k-1}\frac{1}{\alpha_i}\|F(u_{i+1}) - y\|\,\|F'(u_{i+1})(\bar{u} - u_k)\|. $$
In order to estimate the term inside the sum, note that for $0 \le i \le k - 1$ the estimate
$$ \|F'(u_{i+1})(\bar{u} - u_k)\| \le \|F'(u_{i+1})(\bar{u} - u_{i+1})\| + \|F'(u_{i+1})(u_k - u_{i+1})\| $$
holds. Now, using Remark 4.6 twice, we find the bound $(3 + 5\eta\gamma)\|F(u_{i+1}) - y\|$ for the previous estimate. Substituting this estimate into the sum above and using Step 2, the desired boundedness of the sequence $(h(u_k))$ follows.

Step 4: We know that $|h(u_k)| = h(u_k) \le N$ for some $N > 0$ (see (28)). The remaining assertions of the theorem follow from standard compactness results (the Banach–Alaoglu theorem). We use the closed graph theorem to ensure that the limit of the obtained subsequence belongs to $S(y)$.

In the case of noisy data we use a generalized discrepancy principle as stopping rule. The stopping index is defined as the smallest integer $k^*$ satisfying
$$ \|F(u_{k^*}) - y^\delta\| \le \tau\delta, \qquad (30) $$
where $\tau > 1$ still has to be chosen.

Theorem 4.9 (Stability). Let Assumption 4.5 hold, let $\gamma < \min\left\{\frac{1}{\eta}, \frac{\rho}{2}\right\}$ for $\eta$ and $\rho$ as in (29), let $0 < \underline{\alpha} \le \alpha_k \le \bar{\alpha}$, $h(\bar{u}) < \infty$, and let the starting values $u_0, \xi_0 \in L^2(\Omega)$ satisfy $D^{\xi_0}_h(\bar{u}, u_0) < \frac{\gamma^2}{8}$ for some $\bar{u} \in S(y)$. Moreover, let $(\delta_m)$ be a sequence of positive noise levels with $\delta_m \to 0$, and let the corresponding stopping indices $k^*_m$ be chosen according to (30) with $\tau > (1 + \eta\gamma)/(1 - \eta\gamma)$. Then for every $\delta_m$ the stopping index is finite, and the sequence $(u_{k^*_m})$ has a subsequence converging to some $u \in S(y)$ in the weak-∗ topology of $BV(\Omega)$. Moreover, if $S(y) \cap B_\rho(\bar{u}) = \{\bar{u}\}$, then $u_{k^*_m} \stackrel{*}{\rightharpoonup} \bar{u}$ in $BV(\Omega)$.

Proof. Step 1: This step is analogous to Step 1 in the proof of Theorem 4.8. For each $k$ such that $k < k^* - 1$, we have $\|F(u_k) - y^\delta\| > \tau\delta$. By induction one can prove that $\|u_{k+1} - \bar{u}\|_{L^2(\Omega)} < \gamma$ and that the hypothesis of Lemma 4.3 holds. Therefore, Lemma 4.3 holds with $c = \frac{1}{\tau}(1 + \eta\gamma) + \eta\gamma$.

Step 2: We show that the stopping index $k^*$ is finite. Analogously to Step 2 in the proof of Theorem 4.8, we sum up the first $k^* - 1$ terms of (27), obtaining
$$ \sum_{i=0}^{k^*-2}\frac{1}{\alpha_i}\|y^\delta - F(u_{i+1})\|^2 < \frac{\gamma^2}{8(1 - c)}. \qquad (31) $$
Since for every $k < k^* - 1$ the inequality $\|F(u_k) - y^\delta\| > \tau\delta$ holds, we use this inequality on the left-hand side of the above estimate and conclude that
$$ k^* < \left(\frac{\gamma}{\tau\delta}\right)^2\frac{\bar{\alpha}}{8(1 - c)} + 1. $$

Step 3: In order to prove the convergence of the series in (31), notice that the right-hand side of (31) does not depend on $k^*$.

Step 4: Analogously to Step 3 in the proof of Theorem 4.8, we use the Bregman distance and Remark 4.1 to conclude that
$$ h(u_{k^*}) \le h(\bar{u}) + |\langle\xi_0, \bar{u} - u_{k^*}\rangle| + \sum_{i=0}^{k^*-2}\frac{1}{\alpha_i}\left|\langle F(u_{i+1}) - y^\delta, F'(u_{i+1})(\bar{u} - u_{k^*})\rangle_{\mathcal{H}}\right| + \frac{1}{\alpha_{k^*-1}}\left|\langle F(u_{k^*}) - y^\delta, F'(u_{k^*})(\bar{u} - u_{k^*})\rangle_{\mathcal{H}}\right|. $$
In the sequel we estimate the three terms on the right-hand side of this inequality. For the first of them we have $\|\bar{u} - u_{k^*}\|_{L^2(\Omega)} < \rho$. Indeed, at step $k^* - 1$ the element $u_{k^*}$ is a minimizer of $J^\delta_{\alpha_{k^*-1}}(\cdot)$, thus $J^\delta_{\alpha_{k^*-1}}(u_{k^*}) \le J^\delta_{\alpha_{k^*-1}}(\bar{u})$. Rearranging the terms and discarding some positive terms, it follows that
$$ D^{\xi_{k^*-1}}_h(u_{k^*}, u_{k^*-1}) < \frac{\delta^2}{2\alpha_{k^*-1}} + \frac{\gamma^2}{8}. $$
Finally, we apply Lemma 4.4 with $\delta < \bar{\delta} = \sqrt{3\gamma^2\underline{\alpha}/4}$. To estimate the last two terms we use the Cauchy–Schwarz inequality, Assumption 4.5, Lemmas 4.2 and 4.3, and Remark 4.6, together with Steps 1, 2 and 3 above. Summarizing, we obtain
$$ h(u_{k^*}) < h(\bar{u}) + \rho\|\xi_0\|_{L^2(\Omega)} + \frac{(c + 3 + 4\eta\rho)}{(1 - c)}\frac{\gamma^2}{8} + \frac{\tau\delta^2(1 + \eta\rho)(1 + \tau)}{\underline{\alpha}}. $$
Step 5: This step is very similar to Step 4 in the proof of Theorem 4.8. We only need to show that $F(u_{k^*_m}) \to y$. This convergence follows from the estimate
$$ \|F(u_{k^*_m}) - y\| \le \|F(u_{k^*_m}) - y^{\delta_m}\| + \|y^{\delta_m} - y\| \le (1 + \tau)\delta_m $$
as $\delta_m$ goes to zero.

A Definitions

Definition A.1. Given a convex functional $h(\cdot)$, one defines the Bregman distance with respect to $h$ between the elements $v, u \in \operatorname{dom} h$ as
$$ D_h(v, u) = \left\{ D^\xi_h(v, u) \mid \xi \in \partial h(u) \right\}, $$
where $\partial h(u)$ denotes the subdifferential of $h$ at $u$ and
$$ D^\xi_h(v, u) = h(v) - h(u) - \langle\xi, v - u\rangle. $$
We remark that $\langle\cdot, \cdot\rangle$ denotes the standard dual pairing (duality product) with respect to $\mathcal{U}^* \times \mathcal{U}$.

Another important concept is that of generalized solutions; we introduce the notion of $h$-minimizing solutions below.

Definition A.2. An element $\bar{u} \in \operatorname{dom} h \cap D(F)$ is called an $h$-minimizing solution of (1) if it minimizes the functional $h$ among all possible solutions, that is,
$$ \bar{u} = \operatorname{argmin}\{ h(u) \mid F(u) = y \}. $$
Whenever necessary, we may consider least-squares solutions instead of solutions of $F(u) = y$.

Acknowledgments

The work of A.L. is supported by the Brazilian National Research Council CNPq, grants 306020/2006-8 and 474593/2007-0, and by the Alexander von Humboldt Foundation AvH.

Received: December, 2008. Revised: April, 2009.

References

[1] M. Bachmayr, Iterative total variation methods for nonlinear inverse problems, master's thesis, Johannes Kepler Universität, Linz, January 2007.

[2] I.R. Bleyer, Tikhonov functional and penalty with Bregman distances (in Portuguese), master's thesis, Federal University of Santa Catarina, Florianópolis, December 2008.

[3] L. Bregman, The relaxation method for finding the common point of convex sets and its applications to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, 7 (1967), pp. 200–217.

[4] M. Burger and S. Osher, Convergence rates of convex variational regularization, Inverse Problems, 20 (2004), pp. 1411–1421.

[5] C.W. Groetsch, The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Pitman, Boston, 1984.

[6] B. Hofmann, B. Kaltenbacher, C. Pöschl, and O. Scherzer, A convergence rates result for Tikhonov regularization in Banach spaces with non-smooth operators, Inverse Problems, 23 (2007), pp. 987–1010.

[7] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, An iterative regularization method for total variation-based image restoration, Multiscale Modeling & Simulation, 4 (2005), pp. 460–489.

[8] E. Resmerita, Regularization of ill-posed problems in Banach spaces: convergence rates, Inverse Problems, 21 (2005), pp. 1303–1314.

[9] E. Resmerita and O. Scherzer, Error estimates for non-quadratic regularization and the relation to enhancement, Inverse Problems, 22 (2006), pp. 801–814.

[10] L.I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D, 60 (1992), pp. 259–268.

[11] A.N. Tikhonov, Solution of incorrectly formulated problems and the regularization method, Soviet Math. Dokl., 4 (1963), pp. 1035–1038. English translation of Dokl. Akad. Nauk SSSR, 151 (1963), pp. 501–504.