J. Nig. Soc. Phys. Sci. 3 (2021) 105–115
Journal of the Nigerian Society of Physical Sciences

Extension of ADMM Algorithm in Solving Optimal Control Model Governed by Partial Differential Equation

K. A. Dawodu*
Department of Mathematical Sciences, Federal University of Technology Akure, Nigeria

Abstract

This paper presents an algorithm for the numerical solution of an optimal control model constrained by a partial differential equation, using the Alternating Direction Method of Multipliers (ADMM) accelerated with a parameter factor in the sense of Nesterov. The ADMM tool was applied to a PDE-governed optimization problem of the one-dimensional heat-equation type. The constraint and objective functions of the optimal control model were discretized using the Crank-Nicolson and composite Simpson's methods respectively, into a derived discrete convex optimization form amenable to the ADMM. The primal-dual residuals were derived to ascertain the rate of convergence of the model for increasing iterates. An existing example was used to test the efficiency and degree of accuracy of the algorithm, and the results compared favorably with the existing method.

DOI: 10.46481/jnsps.2021.159

Keywords: Optimal Control model, Alternating Direction Method of Multipliers, Partial Differential Equation constraint, Crank-Nicolson, Composite Simpson's, primal-dual

Article History:
Received: 22 January 2021
Received in revised form: 05 May 2021
Accepted for publication: 10 May 2021
Published: 29 May 2021

©2021 Journal of the Nigerian Society of Physical Sciences. All rights reserved.
Communicated by: T. Latunde

*Corresponding author. Email address: dawodukazeem@futa.edu.ng (K. A. Dawodu)

1. Introduction

Optimal control problems (OCP) involving partial differential equations (PDE) arise in solid mechanics, fluid dynamics, engineering, economics and the sciences, amongst others. Most of these are formulated as second-order performance measures governed by PDE constraints of the elliptic type. The classical analytic techniques of Hamilton-Jacobi-Bellman (HJB) or Euler-Lagrange (E-L) may prove difficult because of the complexity of the formulated continuous nonlinear models. Betts and Campbell [1] worked on the "first discretize, then optimize" technique, a novel approach that yields a discretized optimization problem with a finite number of (large) vectors. Ghobadi et al. [2] extended this approach to the one-dimensional heat equation and enumerated several advantages over the historical "first optimize, then discretize" approach. These include (i) circumventing the need to establish certain properties of the problem, such as the active constraint interval and the number of touch points, a priori, and (ii) its amenability to well-structured discretization schemes and optimization solvers for inequality-constrained problems. The work in [2], however, suffered from conditional stability owing to the explicit discretization method used. This drawback is circumvented in the present research by using an implicit discretization method of higher order, with the ADMM extended to obtain the optimal solutions with better accuracy and a faster rate of convergence.
Moreover, many authors, such as Olotu and Dawodu [3, 4, 5], have worked extensively on proportional and multi-delay systems using the "first discretize, then optimize" approach. He and Yuan [6] and He et al. [7] worked on block-convex programming problems synonymous with discretized continuous distributed systems, though the implementation on PDE-based optimal control problems was not explored.

2. Statement of Problem

Consider a one-dimensional bar connected to a heating (cooling) device at both ends, with known initial and lower-bound conditions but unknown boundary conditions; the temperature functions on the boundaries are then taken as the control variables (u_i(t), i = 1, ..., q), as in [1]. The heat equation is given as

∂f(x,t)/∂t = ∂²f(x,t)/∂x²,  0 ≤ t ≤ T,  l_1 ≤ x ≤ l_2,  (1)

where f(x,t) is the temperature at position x and time t. The temperature profile f(x,t) at each point of the bar must not fall below a given temperature function g(x,t) (also known as the lower-bound function) during the time interval. Setting the temperature of the bar very high ensures that the lower-bound condition (inequality constraint) below is satisfied:

f(x,t) ≥ g(x,t)  for all x and t.  (2)

The objective of this modeling, however, is to choose an appropriate temperature profile such that the energy consumed (objective function) is minimized by reducing overheating; that is, to minimize the cost of energy consumed while keeping the temperature of the bar above the lower-bound profile. An appropriate objective function therefore keeps the temperature profile of the bar as close as possible to the lower-bound profile, expressed mathematically as ‖f(x,t) − g(x,t)‖, provided the heat equation and the initial condition are satisfied. Writing the squared temperature over time t and space x as f²(x,t), the function to be minimized is the integral of f²(x,t) in time and space. The optimization model of the heat-transfer problem can then be expressed as

min_{u,f}  ∫_0^T ∫_{l_1}^{l_2} f²(x,t) dx dt + ∫_0^T [p u_1²(t) + q u_2²(t)] dt,
s.t.  ∂²f(x,t)/∂x² = ∂f(x,t)/∂t,
      f(x,t) > g(x,t),
      f(x,0) = f_0(x),  (3)
      f(l_1,t) = u_1(t),  f(l_2,t) = u_2(t),  t ∈ [0,T],  x ∈ [l_1,l_2],

where f_0(x) is a known initial temperature function at the initial time (t = 0), while u_1(t) and u_2(t) are the temperatures at the two boundary points l_1 and l_2 of the bar, whose optimal values are to be obtained.

3. Materials and Methods

The continuous nonlinear model formulated above can now be solved using the "first discretize, then optimize" approach to obtain a quadratic optimization problem with a finite number of (large) variables. Implicit discretization methods are used for the continuous-time OCP because (i) their stability is unconditional, (ii) the time step length can be chosen independently of the spatial step length, and (iii) the coefficient matrix remains well-conditioned.
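To fix notation for the discretization sketches that follow, the continuous data of model (3) can be collected in a small container. This is an illustrative Python sketch, not the paper's own implementation (which is a MATLAB subroutine); the class and field names (HeatControlProblem, f0, g) are assumptions of this sketch, populated here with the data of the Section 5 example.

```python
import numpy as np

# Hypothetical container for the continuous problem data of eqn. (3);
# the names mirror the symbols in the text: horizon T, bar [l1, l2],
# control weights p, q, initial profile f0(x), lower bound g(x, t).
class HeatControlProblem:
    def __init__(self, T, l1, l2, p, q, f0, g):
        self.T, self.l1, self.l2 = T, l1, l2
        self.p, self.q = p, q
        self.f0, self.g = f0, g

# Data of the Section 5 example (as in Betts & Campbell / Ghobadi et al.):
example = HeatControlProblem(
    T=5.0, l1=0.0, l2=np.pi, p=1.0, q=2.0,
    f0=lambda x: np.sin(x) - 0.7,
    g=lambda x, t: np.sin(x) * np.sin(np.pi * t / 5.0) - 0.7,
)
```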
3.1. Discretization of the Constraints

In the transformation of the continuous-time PDE constraint, the second-order Crank-Nicolson method is used; it is derived by averaging the forward-difference approximation at the j-th step in time t and the backward-difference approximation at the (j+1)-th step, with local truncation error of order O(h²). The grid points are equally distributed over both the spatial interval [l_1, l_2] (the length of the bar) and the time interval [0, T], with step lengths δ = (l_2 − l_1)/n and h = T/N respectively; the n intervals in space and N intervals in time generate (n+1) and (N+1) grid points respectively. Let x_i = l_1 + iδ be the i-th point in space (i = 0, 1, ..., n) and t_j = jh the j-th point in time (j = 0, 1, ..., N). Provided the usual differentiability conditions are satisfied, the Crank-Nicolson (averaged-difference) discretization scheme in time and space, with local truncation error of order O(δ² + h²), is expressed as

∂f(x_i,t_j)/∂t = f'_t(x_i,t_j) ≈ (f_{i,j+1} − f_{i,j})/h,  (4)

∂²f(x_i,t_j)/∂x² ≈ [f''_x(x_i,t_j) + f''_x(x_i,t_{j+1})]/2
  ≈ (1/2)[ (f_{i−1,j} − 2f_{i,j} + f_{i+1,j})/δ² + (f_{i−1,j+1} − 2f_{i,j+1} + f_{i+1,j+1})/δ² ],  (5)

where f_{i,j} ≈ f(x_i,t_j) and u_{k,j} ≈ u_k(t_j) for i = 0, 1, ..., n, j = 0, 1, ..., N−1 and k = 1, 2. Discretizing the PDE constraint using eqns. (4) and (5), given that f(x_0,t_j) = f_{0,j} = u_{1,j}, f(x_n,t_j) = f_{n,j} = u_{2,j} and f_{i,0} = 0, yields

(f_{i,j+1} − f_{i,j})/h = (1/2δ²)[ f_{i−1,j} − 2f_{i,j} + f_{i+1,j} + f_{i−1,j+1} − 2f_{i,j+1} + f_{i+1,j+1} ],  (6)

for i = 1, 2, ..., n−1 and 0 ≤ j ≤ N−1. Substituting θ = h/δ² and collecting like terms gives the recurrence formula

−θf_{i−1,j} − θf_{i−1,j+1} + 2(θ−1)f_{i,j} + 2(θ+1)f_{i,j+1} − θf_{i+1,j} − θf_{i+1,j+1} = 0.  (7)

Setting i = 1 and f_{0,j} = u_{1,j} yields

−θu_{1,j} − θu_{1,j+1} + 2(θ−1)f_{1,j} + 2(θ+1)f_{1,j+1} − θf_{2,j} − θf_{2,j+1} = 0,  0 ≤ j ≤ N−1.  (8)

For j = 0, given f_{1,0} = f_{2,0} = u_{1,0} = 0, this yields

−θu_{1,1} + 2(θ+1)f_{1,1} − θf_{2,1} = 0.  (9)

For j = 1:

−θu_{1,1} − θu_{1,2} + 2(θ−1)f_{1,1} + 2(θ+1)f_{1,2} − θf_{2,1} − θf_{2,2} = 0.  (10)

For j = k (2 ≤ k ≤ N−1):

−θu_{1,k} − θu_{1,k+1} + 2(θ−1)f_{1,k} + 2(θ+1)f_{1,k+1} − θf_{2,k} − θf_{2,k+1} = 0.  (11)

Combining eqns. (9)-(11) for i = 1 and 0 ≤ j ≤ N−1 gives the matrix equation

\[
\begin{bmatrix} -θ & & & \\ -θ & -θ & & \\ & \ddots & \ddots & \\ & & -θ & -θ \end{bmatrix}
\begin{bmatrix} u_{1,1} \\ u_{1,2} \\ \vdots \\ u_{1,N} \end{bmatrix} +
\begin{bmatrix} 2(θ+1) & & & \\ 2(θ-1) & 2(θ+1) & & \\ & \ddots & \ddots & \\ & & 2(θ-1) & 2(θ+1) \end{bmatrix}
\begin{bmatrix} f_{1,1} \\ f_{1,2} \\ \vdots \\ f_{1,N} \end{bmatrix} +
\begin{bmatrix} -θ & & & \\ -θ & -θ & & \\ & \ddots & \ddots & \\ & & -θ & -θ \end{bmatrix}
\begin{bmatrix} f_{2,1} \\ f_{2,2} \\ \vdots \\ f_{2,N} \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},  (12)
\]

compactly written as

L U_1 + Ĝ F_1 + L F_2 = 0,  (13)

where F_1 = [f_{1,1}, f_{1,2}, ..., f_{1,N}]^T, F_2 = [f_{2,1}, f_{2,2}, ..., f_{2,N}]^T, U_1 = [u_{1,1}, u_{1,2}, ..., u_{1,N}]^T and 0 are column vectors of dimension N×1, while L and Ĝ are bi-diagonal square matrices of dimension N×N with respective entries

L_{ij} = { −θ for 1 ≤ i ≤ N, j = i;  −θ for 2 ≤ i ≤ N, j = i−1;  0 otherwise },  (14)

Ĝ_{ij} = { 2(θ+1) for 1 ≤ i ≤ N, j = i;  2(θ−1) for 2 ≤ i ≤ N, j = i−1;  0 otherwise }.  (15)
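As a minimal illustration, the bi-diagonal blocks of eqns. (14)-(15) can be built directly from their entry definitions. This Python sketch is an assumption of this presentation (the paper's implementation is in MATLAB); the helper name bidiagonal_operators is hypothetical.

```python
import numpy as np

def bidiagonal_operators(N, theta):
    """N x N bi-diagonal matrices of eqns. (14)-(15): L carries -theta on the
    main and first sub-diagonal; G-hat carries 2(theta+1) on the main diagonal
    and 2(theta-1) on the first sub-diagonal."""
    L = -theta * (np.eye(N) + np.eye(N, k=-1))
    Ghat = 2.0 * (theta + 1.0) * np.eye(N) + 2.0 * (theta - 1.0) * np.eye(N, k=-1)
    return L, Ghat

# Example: h = 0.2 and delta = 1/3 give theta = h / delta**2 = 1.8.
L, Ghat = bidiagonal_operators(N=25, theta=0.2 / (1.0 / 3.0) ** 2)
```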
Setting i = 2 gives the recurrence equation

−θf_{1,j} − θf_{1,j+1} + 2(θ−1)f_{2,j} + 2(θ+1)f_{2,j+1} − θf_{3,j} − θf_{3,j+1} = 0,  0 ≤ j ≤ N−1.  (16)

For j = 0, setting f_{1,0} = f_{2,0} = f_{3,0} = 0 yields

−θf_{1,1} + 2(θ+1)f_{2,1} − θf_{3,1} = 0.  (17)

For j = 1:

−θf_{1,1} − θf_{1,2} + 2(θ−1)f_{2,1} + 2(θ+1)f_{2,2} − θf_{3,1} − θf_{3,2} = 0.  (18)

For j = 2:

−θf_{1,2} − θf_{1,3} + 2(θ−1)f_{2,2} + 2(θ+1)f_{2,3} − θf_{3,2} − θf_{3,3} = 0.  (19)

⋮

For j = N−1:

−θf_{1,N−1} − θf_{1,N} + 2(θ−1)f_{2,N−1} + 2(θ+1)f_{2,N} − θf_{3,N−1} − θf_{3,N} = 0.  (20)

The matrix equation for eqns. (17) to (20) is then given as

L F_1 + Ĝ F_2 + L F_3 = 0.  (21)

For i = 2, 3, ..., n−2 and 1 ≤ j ≤ N−1 the general block equation is

L F_{i−1} + Ĝ F_i + L F_{i+1} = 0,  (22)

and for i = n−1, setting U_2 = F_n, gives

L F_{n−2} + Ĝ F_{n−1} + L U_2 = 0,  (23)

where F_i = [f_{i,1}, f_{i,2}, ..., f_{i,N}]^T ∈ R^{N×1} and U_2 = [u_{2,1}, u_{2,2}, ..., u_{2,N}]^T ∈ R^{N×1}, with L and Ĝ as described in eqns. (14) and (15) respectively. Combining eqns. (13), (22) and (23) gives the block-matrix formulation

\[
\begin{bmatrix} L & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & L \end{bmatrix}
\begin{bmatrix} U_1 \\ U_2 \end{bmatrix} +
\begin{bmatrix} \hat G & L & & \\ L & \hat G & \ddots & \\ & \ddots & \ddots & L \\ & & L & \hat G \end{bmatrix}
\begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_{n-1} \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},  (24)
\]

compactly written as

C U + D F = 0,  (25)

where C is a block matrix of dimension (n−1)N × 2N, D is a square block matrix of dimension (n−1)N × (n−1)N, U is a column vector of dimension 2N × 1, while F and 0 are column vectors of dimension (n−1)N × 1. The control variables u_1(t_j) and u_2(t_j) give access to the heating (cooling) resources, while the temperature variables f(x_i,t_j) of the other parts of the bar are computed from the heat-transfer equations.

3.2. Discretization of the Boundaries

Given f_{i,j} ≥ g(x_i,t_j) for i = 1, 2, ..., n−1 and j = 1, ..., N, then u_{1,j} ≥ g(l_1,t_j) and u_{2,j} ≥ g(l_2,t_j) for j = 1, 2, ..., N. Therefore

u_{1,j} + u_{2,j} ≥ g(l_1,t_j) + g(l_2,t_j),  j = 1, 2, ..., N.  (26)

Introducing non-negative slack variables z_j, for j = 1:

(u_{1,1} + u_{2,1}) − z_1 = g(l_1,t_1) + g(l_2,t_1) = g^1;  (27)

for j = 2:

(u_{1,2} + u_{2,2}) − z_2 = g(l_1,t_2) + g(l_2,t_2) = g^2;  (28)

⋮

for j = N:

(u_{1,N} + u_{2,N}) − z_N = g(l_1,t_N) + g(l_2,t_N) = g^N.  (29)

Combining eqns. (27) to (29) generates the matrix structure

\[
\begin{bmatrix} I & I \end{bmatrix}
\begin{bmatrix} U_1 \\ U_2 \end{bmatrix} -
\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_N \end{bmatrix} =
\begin{bmatrix} g^1 \\ g^2 \\ \vdots \\ g^N \end{bmatrix}.  (30)
\]

Equation (30) is compactly written as

M U − Z = G,  (31)

where I ∈ R^{N×N} is the identity matrix, M = [I, I] ∈ R^{N×2N} is the augmented double identity matrix, Z = [z_1, z_2, ..., z_N]^T ∈ R^{N×1} is the positive slack-variable vector, U = [U_1; U_2] = [u_{1,1}, ..., u_{1,N}, u_{2,1}, ..., u_{2,N}]^T ∈ R^{2N×1} is the unknown control-variable vector, and G = [g^1, g^2, ..., g^N]^T ∈ R^{N×1} is the known coefficient vector.
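The block layout of eqns. (24)-(25) and the boundary operators of eqn. (31) can be assembled mechanically. The sketch below is illustrative Python under the same assumptions as before (dense matrices for clarity; the helper name assemble_constraints is hypothetical, and g is the lower-bound function of eqn. (2)).

```python
import numpy as np

def assemble_constraints(N, n, theta, g, l1, l2, T):
    """Assemble C, D of eqn. (25) and M, G of eqn. (31) from the bi-diagonal
    blocks L and G-hat of eqns. (14)-(15), following the layout of eqn. (24)."""
    L = -theta * (np.eye(N) + np.eye(N, k=-1))
    Ghat = 2 * (theta + 1) * np.eye(N) + 2 * (theta - 1) * np.eye(N, k=-1)

    # C: (n-1)N x 2N, with L in the first block-row (acting on U1)
    # and L in the last block-row (acting on U2).
    C = np.zeros(((n - 1) * N, 2 * N))
    C[:N, :N] = L
    C[-N:, N:] = L

    # D: (n-1)N x (n-1)N block-tridiagonal, G-hat on the diagonal,
    # L on the sub- and super-block-diagonals.
    D = np.zeros(((n - 1) * N, (n - 1) * N))
    for b in range(n - 1):
        D[b*N:(b+1)*N, b*N:(b+1)*N] = Ghat
        if b > 0:
            D[b*N:(b+1)*N, (b-1)*N:b*N] = L
        if b < n - 2:
            D[b*N:(b+1)*N, (b+1)*N:(b+2)*N] = L

    # Boundary constraint MU - Z = G of eqn. (31).
    M = np.hstack([np.eye(N), np.eye(N)])
    t = np.arange(1, N + 1) * (T / N)
    G_vec = g(l1, t) + g(l2, t)        # g^j of eqns. (27)-(29)
    return C, D, M, G_vec
```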
3.3. Discretization of the Objective Function

For the discretization of the continuous-time objective function (performance measure), the composite Simpson's rule is used in space and the right-Riemann rule in time, as stated below:

∫_{l_1}^{l_2} f(x,t) dx ≈ (2δ/3)[ Σ_{i=1}^{n/2−1} f(x_{2i},t) + 2 Σ_{i=1}^{n/2} f(x_{2i−1},t) ] + (δ/3)[ f(x_0,t) + f(x_n,t) ],  (32)

∫_0^T f(x,t) dt ≈ h Σ_{j=1}^N f(x,t_j).  (33)

Applying eqn. (32) to the integrand of the optimal control problem (3) gives the spatially discretized objective

min_{u,f} ∫_0^T [ (2δ/3) J_1 + (δ/3) J_2 + J_3 ] dt,  (34)

where

J_1 = Σ_{i=1}^{n/2−1} f²(x_{2i},t) + 2 Σ_{i=1}^{n/2} f²(x_{2i−1},t),  (35)

J_2 = f²(x_0,t) + f²(x_n,t),  (36)

J_3 = p u_1²(t) + q u_2²(t).  (37)

Discretizing eqn. (34) in time using eqn. (33) and collecting like terms gives

≈ min_{u,f} [ (4hδ/3) Σ_{j=1}^N Σ_{i=1}^{n/2} f²_{2i−1,j} + (2hδ/3) Σ_{j=1}^N Σ_{i=1}^{n/2−1} f²_{2i,j} + m_1 Σ_{j=1}^N u²_{1,j} + m_2 Σ_{j=1}^N u²_{2,j} ]
= min_{u,f} Σ_{j=1}^N [ (4hδ/3)(f²_{1,j} + f²_{3,j} + ... + f²_{n−1,j}) + (2hδ/3)(f²_{2,j} + f²_{4,j} + ... + f²_{n−2,j}) + m_1 u²_{1,j} + m_2 u²_{2,j} ],  (38)

where m_1 = h(p + δ/3) and m_2 = h(q + δ/3). Recall that f²_{0,j} = f²(l_1,t_j) = u_1²(t_j) = u²_{1,j} and f²_{n,j} = f²(l_2,t_j) = u_2²(t_j) = u²_{2,j}. Writing out eqn. (38) for j = 1:

(4hδ/3)(f²_{1,1} + f²_{3,1} + ... + f²_{n−1,1}) + (2hδ/3)(f²_{2,1} + f²_{4,1} + ... + f²_{n−2,1}) + m_1 u²_{1,1} + m_2 u²_{2,1};

⋮

and for j = N:

(4hδ/3)(f²_{1,N} + f²_{3,N} + ... + f²_{n−1,N}) + (2hδ/3)(f²_{2,N} + f²_{4,N} + ... + f²_{n−2,N}) + m_1 u²_{1,N} + m_2 u²_{2,N}.

The matrix formulation of these equations is then written as

J(F,U) = (1/2) F^T Q F + (1/2) U^T H U,  (39)

where

\[
Q = \begin{bmatrix} \frac{8}{3}hδ I & & & \\ & \frac{4}{3}hδ I & & \\ & & \ddots & \\ & & & \frac{8}{3}hδ I \end{bmatrix},  (40)
\qquad
H = \begin{pmatrix} 2m_1 I & 0 \\ 0 & 2m_2 I \end{pmatrix},  (41)
\]

with F_i^T = [f_{i,1}, f_{i,2}, ..., f_{i,N}] ∈ R^{1×N} for i = 1, 2, ..., n−1 and F^T = [F_1^T, F_2^T, ..., F_{n−1}^T] ∈ R^{1×(n−1)N}. Similarly, U_k^T = [u_{k,1}, u_{k,2}, ..., u_{k,N}] ∈ R^{1×N} for k = 1, 2 with U^T = [U_1^T, U_2^T] ∈ R^{1×2N}. The matrices Q ∈ R^{(n−1)N×(n−1)N} and H ∈ R^{2N×2N} in the compactified convex quadratic form (39) are real, symmetric and positive definite, with diagonal entries

Q_{ij} = { (8/3)hδ for (k−1)N+1 ≤ i ≤ kN, j = i, k = 1, 3, ..., n−1;  (4/3)hδ for (k−1)N+1 ≤ i ≤ kN, j = i, k = 2, 4, ..., n−2;  0 otherwise },  (42)

H_{ij} = { 2m_1 for 1 ≤ i ≤ N, j = i;  2m_2 for N+1 ≤ i ≤ 2N, j = i;  0 otherwise }.  (43)

Combining eqns. (25), (31) and (39) gives, in matrix form, the compact convex quadratic optimization problem with linear constraints:

Min J = (1/2) U^T H U + (1/2) F^T Q F
s.t.  C U + D F = 0,  M U ≥ G.  (44)

This reformulated discrete model is a well-structured quadratic optimization problem whose objective function is convex and separable, while its constraint functions are linear and coupled, which makes the model amenable to the ADMM. The mixed-inequality constraint is reformulated in equality form by introducing the non-negative slack variables Z (⪰ 0) into the model.
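The diagonal weight matrices of eqns. (42)-(43) admit a direct construction. This Python sketch is again illustrative (the helper name assemble_objective is hypothetical), and it uses the coefficients m_1 = h(p + δ/3), m_2 = h(q + δ/3) in the corrected form given below eqn. (38).

```python
import numpy as np

def assemble_objective(N, n, h, delta, p, q):
    """Diagonal matrices Q and H of eqns. (42)-(43): the N-blocks of Q
    alternate 8/3*h*delta (odd space index) and 4/3*h*delta (even index)."""
    w = np.empty((n - 1) * N)
    for k in range(1, n):                    # space index i = k = 1, ..., n-1
        coeff = 8.0 / 3.0 if k % 2 == 1 else 4.0 / 3.0
        w[(k - 1) * N : k * N] = coeff * h * delta
    Q = np.diag(w)
    m1 = h * (p + delta / 3.0)
    m2 = h * (q + delta / 3.0)
    H = np.diag(np.concatenate([2 * m1 * np.ones(N), 2 * m2 * np.ones(N)]))
    return Q, H
```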
3.4. Stability and Feasibility of the Model

Assuming that the solution of the PDE satisfies the usual differentiability conditions, the second-order implicit Crank-Nicolson scheme, with local truncation error of order O(h² + δ²), was deployed for the discretization of the PDE constraint to avoid the conditional stability and feasibility problems associated with explicit discretization methods. The stability of the implicit discretization is reflected in the consistency and conditioning of the coefficient matrix operators. The composite Simpson's scheme used for the discretization of the objective function has truncation error of order O(δ⁴) and degree of accuracy (precision) three. This improves the accuracy and rate of convergence of the model, and is hence an improvement over the trapezoidal rule used in [1].

3.5. Sparsity of the Model

A suitable approach for ascertaining the sparsity of the model is to compactify it, as studied by Kameswaran and Biegler [8]. The optimization model (44) is restructured in the compact form

min J(W) = (1/2) W^T P W  s.t.  V W ≥ Ḡ,  (45)

where

W = (U, F)^T,  P = [ H  0̂ ; 0̂^T  Q ],  V = [ C  D ; M  0̄ ],  Ḡ = (0*, G)^T.

The matrices H and Q are the two coefficient sub-matrices of the objective function, while C and D are the two sub-matrices of the constraints, corresponding to the variables U and F respectively. The sizes of the augmented matrices, showing the numbers of zero and non-zero entries, are expressed in Table 1 below. The sparsity ratio is defined here as the proportion of the zero entries relative to the total number of entries of the matrix.

              Obj. Coef.             Constr. Coef.
Size          (n+3)N × (n+3)N        nN × (n+1)N
Non-zeros     (n+3)N                 3n(2N−1) + 3 − 4N
Zeros         (n+3)N[(n+3)N − 1]     (nN+N−6) + (3n−3+4N)

Table 1: Sparsity table for the compactified model

3.6. Positive Definiteness and Convexity of the Matrix Operator

Definition 1 (Sylvester criterion). A square n×n matrix is positive definite if it is real and symmetric, with all the eigenvalues (λ_i > 0, i = 1, 2, ..., n) or all the leading principal minors (M_i > 0, i = 1, 2, ..., n) strictly positive.

Definition 2 (Corollary to the Sylvester criterion). If the principal diagonal entries a_{ii} are the only nonzero entries of a square n×n matrix, so that a_{ij} = 0 for i ≠ j, then the leading principal minors are given by M_i = ∏_{j=1}^i a_{jj} for i = 1, 2, ..., n.

The constructed matrix operator P of the derived quadratic function (45) is an augmentation of the real symmetric matrices Q and H in (40) and (41) respectively, whose principal diagonal entries are all strictly positive. Therefore P is symmetric and positive definite by Definitions 1 and 2, since all its leading principal minors are strictly positive. This property is pertinent to guarantee convexity and a minimal solution of the quadratic function.
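Definitions 1 and 2 have a simple numerical counterpart: for a diagonal P the test reduces to strict positivity of the diagonal, and a Cholesky factorization gives the equivalent test in the general symmetric case. A small self-contained Python sketch, with illustrative stand-in values for H and Q:

```python
import numpy as np
from scipy.linalg import block_diag

def is_positive_definite(P):
    """Cholesky succeeds iff P is symmetric positive definite, a numerical
    analogue of the Sylvester test of Definitions 1-2."""
    try:
        np.linalg.cholesky(P)
        return True
    except np.linalg.LinAlgError:
        return False

# Illustrative small example of P = blkdiag(H, Q) from eqn. (45):
H = np.diag([2.0, 2.0, 4.0, 4.0])   # stands in for 2*m1*I, 2*m2*I of eqn. (41)
Q = np.diag([0.8, 0.8, 0.4, 0.4])   # stands in for the h*delta blocks of eqn. (40)
P = block_diag(H, Q)
assert np.all(np.diag(P) > 0) and is_positive_definite(P)
```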
The associ- ated augmented Lagrangian functional is expressed as Lρ(U , F, Z ,ξ1,ξ2) = 1 2 U T HU + 1 2 F T Q F + l+(Z )+ ρ 2 ∥ CU + D F +ξ1 ∥22 + ρ 2 ∥ MU −G − Z +ξ2 ∥22 (46) with the M-ADMM iterations stated thus: U k+1 ← arg min U { 1 2 U T HU + ρ 2 ∥ CU + D F +ξk1 ∥22 + ρ 2 ∥ MU −G − Z +ξ2 ∥22 } , (47) F k+1 ← arg min F { 1 2 F T Q F ++ρ 2 ∥ CU + D F +ξk1 ∥22 } , (48) Z k+1 ← arg min Z {ρ 2 ∥ MU −G − Z +ξ2 ∥22 } (49) s.t Z ≥ 0, ξk+11 = ∂ξ1 L(U k+1, F k+1, Z k+1,ξk1 ) (50) ξk+12 = ∂ξ2 L(U k+1, F k+1, Z k+1,ξk2 ) (51) where µ ∈ R(n−1)N and λ ∈ RN are Lagrange multipliers, ρ > 0 is the penalty parameter, ‖.‖2 denotes the euclidean (spec- tral) norm of a vector (matrix) argument, ξ1 = µ/ρ ∈ R(n−1)N and ξ2 = λ/ρ ∈ RN are the scaled dual variables, Z ∈ RN is the 110 Dawodu / J. Nig. Soc. Phys. Sci. 3 (2021) 105–115 111 introduced slack vector and l+(Z ) is the indicator function for the non-negative orthants defined as l+(Z ) = 0 for z ≥ 0 and l+(Z ) = +∞ otherwise. As the algorithm runs through the (k +1)-th iteration, the primal variables U k and F k as well as the dual variables µk and λk are updated, hence the need to derive the update formulas. Applying the Karush-Kuhn- Tucker (KKT) optimality conditions on the Augmented La- grangian Functional (46) for the sequential minimization of the variables requires updating all the critical variables as in- dicated in the ADMM iterations. Moreover the update formu- las of the variables U k and F k will be accelerated using the Gauss-seidel relaxation factor α ∈ [0, 2] as clearly illustrated in the works of Nesterov [9] and Ghadimi et. al [10]. 4.1. Derivations of the update formulas Applying the Karush-Kuhn-Tucker (KKT) optimality con- ditions on the Augmented Lagrangian Function in eqn. (46) for the sequential minimization of the variables requires up- dating all the critical variables as indicated in the ADMM it- erations. U-Update: Setting ∂U L(U , . k ) = 0 yields; U k+1 = ρB1 M T B2 (U − Update). (52) where, B1 = [ H +ρ(C T C + M T M ) ]−1 B2 = (G + Z k −ξk2 ) −C T (D F k +ξk1 ) F-Update: Setting ∂F L(U k+1, F, .k ) = 0 and replacing CU k+1, to relax the algorithm, with h(U k+1, F k ,α) = αCU k+1 + (1 −α)D F k (53) to yield F k+1 = −ρB3B4 (F − Update). (54) where, B3 = [ Q +ρ(D T D ) ]−1 B4 = [ αCU k+1 + (1 −α)D F k +ξk1 ] Z-Update: - Setting ∂Z L(. k+1, Z , .k ) = 0 and relaxing it with h(U k+1, F k+1,α) yields; Z k+1 = max { 0,−Z ∗ } (55) where Z ∗ = [ αCU k+1 + (1 −α)D F k+1 −G ] (56) Updating the Scaled-Dual variables - ξ1, ξ2 yields ξk+11 = ξk1 +CU k+1 + D F k+1, (57) ξk+12 = ξk2 + MU k+1 −G − Z k+1, (58) 4.2. Convergence Analysis Considering an optimization problem analogous to the form in (45) above;  Min u,y f1(u) + f2(y ) + l+(z ) s.t h1(u) + h2(y ) = 0 g 1(u) − z = 0 (59) where x ∈ Rn , u ∈ Rm while f1 : Rn → R ∪ {∞}, f2 : Rm → R ∪ {∞} and f3 : Rn → R ∪ {∞} are convex functions. The aug- mented lagrangian formulation is then stated thus: Lρ(u, y,µ,λ) = f1(u) + f2(y ) + l+(z ) +µT ( h1(u) + h2(y ) ) + ρ 2 ∥ h1(u) + h2(y ) ∥22 + λT ( g 1(u) − z ) + ρ 2 ∥ g 1(u) − z ∥22 (60) where µ ∈ Rn and λ ∈ Rm are Lagrange multipliers, ρ > 0 is the penalty parameter, ‖.‖2 denotes the euclidean (spectral) norm of a vector (matrix) argument, ξ1 = µ/ρ ∈ Rn and ξ2 = λ/ρ ∈ Rm are the scaled dual variables, z is the introduced slack vector and l+(z ) is the indicator function for the non- negative orthants. 
From duality theory it is well known that if u*, y*, µ* and λ* are the primal and dual variables that minimize the Lagrangian L_ρ, then for every pair (u^k, y^k) in the convex set,

L_ρ(u*, y*, µ*, λ*) ≤ L_ρ(u^k, y^k, µ*, λ*).  (61)

Let f^k = f_1(u^k) + f_2(y^k) be the objective function value at the k-th iterate (u^k, y^k) and f* = f_1(u*) + f_2(y*) the optimal function value. Let r_1^{k+1} = h_1(u^{k+1}) + h_2(y^{k+1}) and r_2^{k+1} = g_1(u^{k+1}) − z^{k+1} be the residuals of the convex equality constraints at the (k+1)-th iteration. The objective of this research is now to prove the following.

Definition 3 (Dual updates). At optimality, the update equations for the dual variables of the augmented Lagrangian (60) are given as

µ^{k+1} = µ^k + ρ [ h_1(u^{k+1}) + h_2(y^{k+1}) ],  (62)

λ^{k+1} = λ^k + ρ [ g_1(u^{k+1}) − z^{k+1} ].  (63)

Definition 4 (Lyapunov function). Let V^k be a non-negative quantity of the algorithm that decreases in each iteration by an amount depending on the norms of the residuals and on the change in the slack variable z over one iteration. The quantities are defined as

(i)  V^k = (1/ρ)‖µ^k − µ*‖_2² + (1/ρ)‖λ^k − λ*‖_2² + ρ‖z^k − z*‖_2²,  (64)

(ii)  ρ( ‖r_1^{k+1}‖_2² + ‖r_2^{k+1}‖_2² + ‖z^{k+1} − z^k‖_2² ) ≤ V^k − V^{k+1} ≤ V^0, so that summing over k gives ρ V_q ≤ V^0,  (65)

where

V_q = Σ_{k=0}^∞ ( ‖r_1^{k+1}‖_2² + ‖r_2^{k+1}‖_2² + ‖z^{k+1} − z^k‖_2² ).

See the proof in Boyd and Vandenberghe [11].

Theorem 1. Given the convex linearly constrained control problem (59) with optimal primal and dual variables u*, y*, z*, µ*, λ*, the residual terms

(i)  s_1^k = ρ [ ∂h_1^T(u*) ( h_2(y*) − h_2(y^k) ) − ∂g_1^T(u*) ( z* − z^k ) ],  (66)

(ii)  s_2^k = ∂f_2(y^k) + ∂h_2^T(y^k) µ^k,  (67)

converge to zero for a given penalty parameter ρ > 0.

Proof. From the augmented Lagrangian function (60), u^{k+1} minimizes L_ρ(u, y^k, µ^k, λ^k) with respect to u; hence the sub-differential optimality condition at u^{k+1} is expressed as

0 ∈ ∂f_1(u^{k+1}) + ∂h_1^T(u^{k+1}) µ^k + ρ ∂h_1^T(u^{k+1}) ( h_1(u^{k+1}) + h_2(y^k) ) + ∂g_1^T(u^{k+1}) λ^k + ρ ∂g_1^T(u^{k+1}) ( g_1(u^{k+1}) − z^k ).  (68)

Substituting for µ^k and λ^k from eqns. (62) and (63) into (68) yields

0 ∈ ∂f_1(u^{k+1}) + ∂h_1^T(u^{k+1}) [ µ^{k+1} − ρ ( h_2(y^{k+1}) − h_2(y^k) ) ] + ∂g_1^T(u^{k+1}) [ λ^{k+1} + ρ ( z^{k+1} − z^k ) ],  (69)

so that

∂f_1(u^{k+1}) + ∂h_1^T(u^{k+1}) µ^{k+1} + ∂g_1^T(u^{k+1}) λ^{k+1} = ρ [ ∂h_1^T(u^{k+1}) ( h_2(y^{k+1}) − h_2(y^k) ) − ∂g_1^T(u^{k+1}) ( z^{k+1} − z^k ) ].  (70)

If u^{k+1} = u* minimizes the Lagrangian functional, the left-hand side of (70) vanishes and the right-hand side becomes

s_1^k = ρ [ ∂h_1^T(u*) ( h_2(y*) − h_2(y^k) ) − ∂g_1^T(u*) ( z* − z^k ) ],

which converges to zero as (y^k, z^k) → (y*, z*) for any u in the convex set Ω. This completes the proof of (i). □

Similarly, y^{k+1} minimizes L_ρ(u^{k+1}, y, µ^k, λ^k) with respect to y, such that

0 ∈ ∂f_2(y^{k+1}) + ∂h_2^T(y^{k+1}) µ^k + ρ ∂h_2^T(y^{k+1}) ( h_1(u^{k+1}) + h_2(y^{k+1}) ).

Substituting µ^k = µ^{k+1} − ρ [ h_1(u^{k+1}) + h_2(y^{k+1}) ] yields the relation below, which completes the proof of (ii):

0 ∈ ∂f_2(y^{k+1}) + ∂h_2^T(y^{k+1}) µ^{k+1}.  (71)

Hence y^{k+1} minimizes the convex function

y ↦ f_2(y) + (µ^{k+1})^T h_2(y).  (72)

The alternating direction method of multipliers can now be applied to the optimization problem derived from the discretized PDE. By extension, the dual residual that vanishes as the iterates approach optimality is

s_1^k = ρ [ C^T D (F^{k+1} − F^k) − M^T (Z^{k+1} − Z^k) ].  (73)
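Definition 4 also suggests a practical convergence monitor. A short illustrative sketch (the reference values mu_star, lam_star, z_star would in practice come from a high-accuracy run; all names are hypothetical):

```python
import numpy as np

def lyapunov(mu, lam, z, mu_star, lam_star, z_star, rho):
    """Lyapunov quantity V^k of eqn. (64); by eqn. (65) it is monotonically
    non-increasing along the ADMM iterates."""
    return (np.linalg.norm(mu - mu_star) ** 2 / rho
            + np.linalg.norm(lam - lam_star) ** 2 / rho
            + rho * np.linalg.norm(z - z_star) ** 2)
```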
Algorithm: Extension of ADMM in Solving a PDE-Constrained Optimal Control Problem

  Input parameters and operators: ρ, α, M, G, C, D
  Initialize U^0, F^0, Z^0, ξ_1^0 = µ^0/ρ, ξ_2^0 = λ^0/ρ
  Set H ⪰ 0, Q ⪰ 0 (symmetric and positive definite)
  For k = 0, 1, 2, ... do
    Compute U^{k+1} using eqn. (52)
    Compute F^{k+1} using eqn. (54)
    Compute Z^{k+1} using eqn. (55)
    Update ξ_1^{k+1} and ξ_2^{k+1} using eqns. (57) and (58) respectively
  Until convergence according to eqn. (73)
  Return U^{k+1}, F^{k+1}, Z^{k+1}, J*, ξ_1^{k+1}, ξ_2^{k+1}

5. Numerical Experiments

Consider the PDE-constrained optimal control heat-transfer model

min_{u,f}  ∫_0^5 ∫_0^π f²(x,t) dx dt + ∫_0^5 [ u_1²(t) + 2u_2²(t) ] dt,
s.t.  ∂²f(x,t)/∂x² = ∂f(x,t)/∂t,
      f(x,t) > sin(x) sin(πt/5) − 0.7,
      f(x,0) = sin(x) − 0.7,
      f(0,t) = u_1(t),  f(π,t) = u_2(t),  t ∈ [0,5],  x ∈ [0,π],

as posed by Ghobadi et al. [2] and Betts and Campbell [1]. This section illustrates, via a number of computational experiments, the effect of different penalty-parameter selections on the number of iterations required to reach the stopping criteria for this problem. The experiments select the termination (stopping) criteria ε_rel = 10^{−k} and ε_abs = 10^{−k}, k = 3, 4, 5, as the relative and absolute tolerances respectively for the convergence of the algorithm. The algorithm was implemented as a MATLAB subroutine running on a Pentium V computer with 4.0 GB RAM and a 1.7 GHz CPU. The discretization step in time is kept much smaller than the spatial step to maintain feasibility, although the implicit Crank-Nicolson scheme used for the discretization of the PDE is unconditionally stable. The results of the experiment are presented below.
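Putting the pieces together, a minimal Python driver for the example above reuses the hypothetical helpers assemble_constraints, assemble_objective and admm_pde_ocp from the earlier sketches; the grid (n = 10, N = 25, so h = 0.2 and δ = π/10, close to the δ = 1/3 reported in the tables) follows the regime of Table 2. This is a sketch of the experiment, not the paper's MATLAB run.

```python
import numpy as np

# Example data from the problem statement above.
T, l1, l2, p, q = 5.0, 0.0, np.pi, 1.0, 2.0
n, N = 10, 25                          # delta = pi/10, h = T/N = 0.2
delta, h = (l2 - l1) / n, T / N
theta = h / delta**2
g = lambda x, t: np.sin(x) * np.sin(np.pi * t / 5.0) - 0.7

C, D, M, G = assemble_constraints(N, n, theta, g, l1, l2, T)
Q, H = assemble_objective(N, n, h, delta, p, q)
U, F, Z, iters = admm_pde_ocp(H, Q, C, D, M, G, rho=0.1, alpha=1.6, tol=1e-4)
J = 0.5 * U @ H @ U + 0.5 * F @ Q @ F  # objective of eqn. (44)
print(f"iterations: {iters}, objective: {J:.6f}")
```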
[Figure 1: The effects of tolerance on varying relaxation factors (iterations vs. relaxation factor α for Tol. = 10^{−4}, 10^{−5}, 10^{−6}).]

[Figure 2: The effects of tolerance on varying penalty parameters (iterations vs. penalty parameter ρ for Tol. = 10^{−4}, 10^{−5}, 10^{−6}).]

[Figure 3: The effects of relaxation factor on varying penalty parameters (iterations vs. ρ for α = 1.2, 1.6, 2.0 at Tol. = 10^{−4}).]

        h = 0.20           h = 0.10           h = 0.05
α       10^{−4}  10^{−5}   10^{−4}  10^{−5}   10^{−4}  10^{−5}
1.20    235      294       192      240       164      210
1.40     86      128       669      107        56       93
1.60     52       82        41       67        32       58
1.80     37       60        28       49        22       41
2.00     28       47        21       38        16       32

Table 2: Effects of tolerance and time steps h on varying relaxation factors (ρ = 0.1, δ = 1/3)

Table 2 shows the numbers of iterations for varying relaxation parameters α when the tolerance is fixed, for a particular time step h and penalty parameter ρ; only the step sizes h = 0.20, h = 0.10 and h = 0.05 were kept. For a fixed choice of the relaxation factor α = 2.0, tolerance Tol. = 10^{−3} and penalty parameter ρ = 0.1, the dual residual sum ‖r_3^{k+1}‖_2 = ‖r_1^{k+1} + r_2^{k+1}‖_2 and the distances ‖r_1^{k+1}‖_2 = ‖µ^k − µ*‖_2 and ‖r_2^{k+1}‖_2 = ‖λ^k − λ*‖_2 were plotted against the iteration count, as shown in Figures 4, 5 and 6 respectively. The convergence of the dual residual sum follows that of the monotonically decreasing Lyapunov quantity, that is, ‖r_3^{k+1}‖_2 ≤ V^k.

[Figure 4: The response of the dual residual sum to iteration (α = 2.0, ρ = 0.1, Tol. = 10^{−3}).]

[Figure 5: The response of the first dual variable (µ) to iteration.]

[Figure 6: The response of the second dual variable (λ) to iteration.]

        h = 0.20           h = 0.10           h = 0.05
ρ       10^{−4}  10^{−5}   10^{−4}  10^{−5}   10^{−4}  10^{−5}
0.10     91      124       111      149       184      245
0.20     39       85        48       66        79      107
0.40     19       30        25       38        43       65
0.60     15       22        18       29        32       51
0.80     20       38        13       23        25       43
1.00     84      156        13       20        19       35

Table 3: Effects of tolerance and time steps h on varying penalty parameters (α = 1.6, δ = 1/3)

Table 3 shows the iteration counts for varying penalty parameters ρ at the same fixed α and various time step sizes h. The dual residual ‖s^k‖_2, the primal variables (‖U^{k+1}‖_2, ‖F^{k+1}‖_2, p^k) and the distance ‖Z^k − Z*‖_2 were also plotted against the iteration count for a fixed choice of the relaxation factor α = 2.0, tolerance Tol. = 10^{−3} and penalty parameter ρ = 0.1. As shown in Figures 7, 8 and 9 respectively, the residuals of the algorithm decrease over successive iterations, which indicates convergence of the algorithm.

[Figure 7: The response of the dual residual s^k to iteration (α = 2.0, ρ = 0.1, Tol. = 10^{−3}).]

[Figure 8: The response of the primal variables ‖U^k‖, ‖F^k‖, p^k to iteration.]

[Figure 9: The effects of iterations on the distance norm ‖Z^k − Z*‖_2.]

                         h = 0.20    h = 0.10    h = 0.05
(N, n)                   (25, 10)    (50, 10)    (100, 10)
Obj. coefficient matrix:
  Non-zero               325         650         1300
  Zero                   105300      421850      1688700
  Size                   105625      422500      1690000
  Sparsity               99.69%      99.85%      99.92%
Constr. coefficient matrix:
  Non-zero               883         1783        3583
  Zero                   67867       273217      1096417
  Size                   68750       275000      1100000
  Sparsity               98.72%      99.35%      99.67%

Table 4: Sparsity from the compactification of the problem model with δ = 1/3

Table 4 shows the sparsity of the compact form of the model for varying time steps at a fixed spatial step, with the optimal results obtained within a few ADMM iterations. It was observed that the level of sparsity increases as the step size decreases, which consequently reduces the computational burden of the iterative process.

            h = 0.50              h = 0.10
            10^{−4}    10^{−5}    10^{−4}    10^{−5}
‖U‖         0.476610   0.476655   0.476600   0.476627
‖F‖         0.245101   0.245661   0.239198   0.239730
‖J*‖        0.476611   0.476713   0.476611   0.476659
Iter. (k)   70         96         33         49
Time (s)    0.09282    0.11094    0.08385    0.11052
(N, n)      (10, 4)    (10, 4)    (25, 4)    (25, 4)

Table 5: Summary of computational results at optimum parameters (ρ* = 1.60, α* = 2.00, δ = 1/3)

Table 5 shows the results of the problem for varying parameter values and step sizes, with the optimal results obtained within a few ADMM iterations. The results obtained with the MATLAB subroutine were compared with those of Ghobadi et al. [2] for numbers of time grid points ranging from N = 1000 to 8000, run on the MOSEK software. Table 6 shows the optimal objective functional values generated for varying numbers N of time grid points (TGP), with the number of grid points in space fixed at n = 10. At a tolerance of 10^{−4}, the objective values of [2] implemented on MOSEK agree with those obtained by the ADMM, which required more iteration cycles but a shorter Central Processing Unit (CPU) time. This clearly shows that the proposed algorithm reaches the optimum in less computing time than MOSEK.

        Obj. values               Iterations k (time/s)
N       [2]         proposed      [2]         proposed
1000    0.4741987   0.475758      16 (0.26)   112 (0.11)
2000    0.4741986   0.475781      18 (0.43)   120 (0.32)
4000    0.4744534   0.476010      18 (0.84)   132 (0.47)
8000    0.4745807   0.476620      24 (2.04)   140 (1.02)

Table 6: Comparison of computational results at optimum

6. Conclusions

The extension of the ADMM to a heat-transfer problem, as an example of PDE-based optimal control problems (OCP) with equality-inequality constraints, has been studied. The PDE-based OCP was transformed into a discretized convex quadratic optimization problem amenable to the ADMM.
The implicit Crank-Nicolson method and the third-order composite Simpson's rule were used for the discretization of the constraints and the cost functional respectively, yielding feasible, unconditionally stable and well-conditioned matrices. The ADMM algorithm converges quickly, with a high level of accuracy, low processing time and low memory requirements, when compared with that of Ghobadi et al. [2]. Moreover, this approach can equally be applied to other PDE-based optimization problems, such as those governed by wave equations or problems involving integral (entropy) functions.

Acknowledgments

I am grateful to Prof. Montaz Ali of the University of the Witwatersrand, South Africa, for introducing me to the ADMM algorithm and for his helpful comments in improving the presentation of this article.

References

[1] J. T. Betts & S. L. Campbell, "Discretize then Optimize", in Mathematics for Industry: Challenges and Frontiers, SIAM (2005).
[2] K. Ghobadi, N. Nedialkov & T. Terlaky, "On the Discretize then Optimize Approach", Preprint, Industrial and Systems Engineering (2009).
[3] O. Olotu & K. A. Dawodu, "On the Discretized Algorithm for Optimal Proportional Control Constrained by Delay Differential Equation", Journal of Mathematical Theory & Modeling 3 (2013) 157.
[4] O. Olotu, K. A. Dawodu & T. T. Yusuf, "Algorithm for Solving Multi-Delay Optimal Control Problems Using Modified Alternating Direction Method of Multipliers", Journal of Applied & Computational Mathematics 8 (2019) 445.
[5] O. Olotu & K. A. Dawodu, "Numerical Solution to One-Dimensional Multi-Delay System using the Modified Alternating Direction Method of Multipliers", Journal of NAMP 50 (2019) 107.
[6] B. He & X. Yuan, "Block-wise Alternating Direction Method of Multipliers for Multiple-block Convex Programming and Beyond", SIAM Journal of Computational Mathematics 691 (2015) 145.
[7] B. He, X. Hong & X. Yuan, "On the Proximal Jacobian Decomposition of ALM for Multiple-block Separable Convex Minimization Problems and its Relationship to ADMM", Journal of Scientific Computing 66 (2016) 1204.
[8] S. Kameswaran & L. T. Biegler, "Advantages of Nonlinear Programming-based Methodologies for Inequality Path-Constrained Optimal Control Problems: a Numerical Study", SIAM Journal on Scientific Computing 30 (2008) 957.
[9] Y. E. Nesterov, "A Method for Solving the Convex Programming Problem with Convergence Rate O(1/k²)", Dokl. Akad. Nauk SSSR 269 (1983) 543.
[10] E. Ghadimi, A. Teixeira, I. Shames & M. Johansson, "Optimal Parameter Selection for the Alternating Direction Method of Multipliers (ADMM): Quadratic Problems", IEEE Transactions on Automatic Control 60 (2014) 644. https://doi.org/10.1109/TAC.2014.2354892
[11] S. Boyd & L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, USA (2004).