SQU Journal for Science, 17 (1) (2012) 12-21
© 2012 Sultan Qaboos University

Global Search Strategies for Solving Multilinear Least-Squares Problems

Mats Andersson*, Oleg Burdakov**, Hans Knutsson* and Spartak Zikrin**

*Department of Biomedical Engineering, Linköping University, Linköping, Sweden.
**Department of Mathematics, Linköping University, Linköping, Sweden, Email: Oleg.Burdakov@liu.se

ABSTRACT: The multilinear least-squares (MLLS) problem is an extension of the linear least-squares problem. The difference is that a multilinear operator is used in place of a matrix-vector product. The MLLS is typically a large-scale problem characterized by a large number of local minimizers. It originates, for instance, from the design of filter networks. We present a global search strategy that allows for moving from one local minimizer to a better one. The efficiency of this strategy is illustrated by the results of numerical experiments performed for some problems related to the design of filter networks.

KEYWORDS: Filter networks, Global optimization, Global search strategies, Multilinear least-squares.

1. Introduction

Consider the following multilinear least-squares (MLLS) problem, in which $u \odot v$ denotes the component-wise product of vectors $u$ and $v$. Given a vector $b \in R^m$ and matrices $A_i \in R^{m \times n_i}$, $i = 1, 2, \ldots, L$, find $x^* \in R^N$ that solves the problem

$$\min_{x \in R^N} \; \| b - A_1 x_1 \odot A_2 x_2 \odot \cdots \odot A_L x_L \|^2, \qquad (1)$$

where $x_i \in R^{n_i}$, $i = 1, 2, \ldots, L$, $N = n_1 + n_2 + \cdots + n_L$ and $x = (x_1^T, x_2^T, \ldots, x_L^T)^T$. For the sake of simplicity, we consider here the standard Euclidean norm, although the subsequent reasoning holds also for the weighted Euclidean norm. Note that if $L = 1$, then (1) is a linear least-squares problem.

The MLLS problem occurs, for instance, in factor analysis, chemometrics and psychometrics (Leardi et al., 2000; Leurgans and Ross, 1992; Lopes and Menezes, 2003; Paatero, 1997; Wang et al., 2003). We study this problem in relation to the design of filter networks (Andersson et al., 1999; Norell et al., 2011; Svensson et al., 2005), specifically the sequential connection of sparse sub-filters presented in Figure 1. In this case, $x_i$ stands for the individual characteristics (design parameters) of sub-filter $i$, whose frequency response is $A_i x_i$. The ideal (desired) frequency response of the sub-filter sequence and the actual one are represented in (1) by $b$ and $A_1 x_1 \odot A_2 x_2 \odot \cdots \odot A_L x_L$, respectively. It is common in the design of filter networks that $N \ll m$. MLLS is a non-convex, typically large-scale, optimization problem with a very large number of local minimizers, each of which is singular and non-isolated.
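For concreteness, the objective in (1) costs one matrix-vector product per factor to evaluate. A minimal NumPy sketch (the identifiers and the random test data are ours, not from the paper):

```python
import numpy as np

def mlls_objective(b, A_list, x_list):
    """Value of (1): || b - (A_1 x_1) * ... * (A_L x_L) ||^2,
    where * is the component-wise (Hadamard) product."""
    y = np.ones_like(b)
    for A, x in zip(A_list, x_list):
        y *= A @ x                      # frequency response of sub-filter i
    r = b - y                           # residual against the ideal response
    return r @ r

# Tiny illustration with random data (m = 6, L = 2, n_i = 2).
rng = np.random.default_rng(0)
b = rng.random(6)
A_list = [rng.standard_normal((6, 2)) for _ in range(2)]
x_list = [rng.standard_normal(2) for _ in range(2)]
print(mlls_objective(b, A_list, x_list))
```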
The most typical approach to solving the MLLS problem consists in randomly generating a number of starting points and then refining them with local optimization methods. One major shortcoming of this approach is that a very large number of starting points must be generated in order to find a reasonably good fit in problem (1). Another major shortcoming is that the convergence of local methods on this problem is too slow.

Figure 1. Sequential connection of L sub-filters.

The most popular of the local algorithms used for solving the MLLS problem is called 'alternating least squares' (ALS). It exploits the feature of problem (1) that, if all the vectors $x_1, \ldots, x_L$ but one, say $x_i$, are fixed, then the resulting sub-problem of minimizing over $x_i$ is a linear least-squares problem. In the ALS algorithm, the linear least-squares sub-problems are solved for the alternating index $i$. This algorithm is also known as 'block-coordinate relaxation' or the 'nonlinear Gauss-Seidel algorithm' (Ortega and Rheinboldt, 2000). The major shortcomings of local search algorithms mentioned above are inherent in ALS.

The main aim of our work is to develop an effective global optimization approach to solving the MLLS problem and to justify it theoretically. Our work is organized as follows. In Section 2, we consider a new constrained optimization problem introduced in Norell et al. (2011). It is similar, in some sense, to the MLLS problem, and its solution gives a good starting point for running a local search on the MLLS problem. Global optimality conditions for the new problem are derived in Section 3. These conditions are then used in Section 4 for constructing a global search algorithm. In Section 5, we report and discuss results of applying our global search algorithm to MLLS problems related to the design of filter networks. In Section 6, we draw conclusions and discuss future work.

2. Problem reformulation

Problem (1) can be written in the equivalent form

$$\min_{x \in R^N,\, y \in R^{mL}} \; \| b - y_1 \odot y_2 \odot \cdots \odot y_L \|^2 \quad \text{s.t.} \quad y_i = A_i x_i, \; i = 1, \ldots, L,$$

where $y_i \in R^m$, $i = 1, \ldots, L$, $y = (y_1^T, y_2^T, \ldots, y_L^T)^T$ and 's.t.' stands for 'subject to'. This problem is characterized by the relations

$$b \approx y_1 \odot \cdots \odot y_L \quad \text{and} \quad y_i = A_i x_i, \; i = 1, \ldots, L.$$

Following Norell et al. (2011), we consider a similar, conceptually close, problem in which

$$b = y_1 \odot \cdots \odot y_L \quad \text{and} \quad y_i \approx A_i x_i, \; i = 1, \ldots, L.$$

We formulate it as

$$\min_{x \in R^N,\, y \in R^{mL}} \; \sum_{i=1}^{L} \| y_i - A_i x_i \|^2 \quad \text{s.t.} \quad y_1 \odot \cdots \odot y_L = b.$$

After minimizing this problem over $x$, we obtain

$$\min_{y \in R^{mL}} \; \sum_{i=1}^{L} y_i^T P_i y_i \quad \text{s.t.} \quad y_1 \odot \cdots \odot y_L = b, \qquad (2)$$

where $P_i$ is the matrix of orthogonal projection defined by $A_i$. In the numerical implementation, it may not be reasonable to compute $P_i$ explicitly; instead, it can be treated as a linear operator defined by $A_i$ in one of the standard ways (Björk, 1996). Moreover, since this problem may admit trivial asymptotic solutions, it must be regularized. This can be done by adding $\mu \| y_i \|^2$, with a small $\mu > 0$, to each term in the objective function. We assume further that all matrices $P_i$ have been slightly perturbed in this way, and hence they are positive definite. Observe that the regularized objective function in (2) is strictly convex with a unique minimizer at the origin. Unlike (1), this problem does not suffer from the bad property of having non-isolated minimizers. However, it inherits the multi-extremal nature of problem (1).
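Treating $P_i$ as an operator amounts to one least-squares solve per term: reading $P_i$ as the projector onto the orthogonal complement of the range of $A_i$ (our interpretation of the construction above), we have $y_i^T P_i y_i = \min_x \| y_i - A_i x \|^2$. A minimal sketch of the regularized objective of (2) under this assumption (the parameter name mu and all identifiers are ours):

```python
import numpy as np

def inner_objective(y_list, A_list, mu=0.5):
    """Regularized objective of (2): sum_i y_i^T P_i y_i + mu*||y_i||^2,
    with y_i^T P_i y_i evaluated as min_x ||y_i - A_i x||^2, i.e. P_i is
    treated as an operator via a least-squares solve and never formed
    explicitly. Interpreting P_i as the projector onto range(A_i)^perp
    is our reading of the text; mu is the regularization parameter."""
    total = 0.0
    for y, A in zip(y_list, A_list):
        x, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ x                   # component of y outside range(A_i)
        total += r @ r + mu * (y @ y)   # projected energy + regularization
    return total
```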
Without loss of generality, we can assume that $b \geq 0$ in (1) and (2). Indeed, if any component of $b$ is negative, the change of its sign to positive can be compensated by the change of sign in the corresponding row of, for instance, $A_1$. In Norell et al. (2011), it is discussed how to treat the case of zero components. From now on, we assume that $b > 0$.

Note that the feasible set in problem (2) consists of disjoint subsets. Each of these subsets is connected and is characterized by a certain feasible combination of signs of $y_1, \ldots, y_L$. The total number of the subsets is determined by the number of feasible combinations of signs, which is equal to $2^{m(L-1)}$.

Consider how to solve problem (2) on a given isolated subset of the feasible set, for instance, the subset associated with the positive orthant $R^{mL}_{+} = \{ y \in R^{mL} : y > 0 \}$. The problem in this case takes the form

$$\min_{y_1 > 0, \ldots, y_L > 0} \; \sum_{i=1}^{L} y_i^T P_i y_i \quad \text{s.t.} \quad y_1 \odot \cdots \odot y_L = b. \qquad (3)$$

The substitution $y_i = \exp(w_i)$ reduces this problem to

$$\min_{w_1, \ldots, w_L} \; \sum_{i=1}^{L} \exp(w_i)^T P_i \exp(w_i) \quad \text{s.t.} \quad w_1 + \cdots + w_L = \ln b, \qquad (4)$$

where $\exp(\cdot)$ and $\ln(\cdot)$ are component-wise operations. This linearly constrained problem can be efficiently solved by conventional methods (Nocedal and Wright, 2006) that are able to take advantage of the easily available derivatives of the objective function and the simple structure of the linear equality constraints in (4). In Norell et al. (2011), the computational time for solving this problem was approximately half the time of one run of the ALS algorithm on problem (1).

To study the general case of sign combinations, we divide problem (2) into an outer binary problem, which deals with the signs of $y$, and an inner subproblem, similar to (4), in which the minimization is performed over the corresponding subset of the feasible set. Notice that the feasible vectors $y_1, \ldots, y_L$ in (2) have no zero components, because $b > 0$. Following Norell et al. (2011), we present problem (2) as a specially enumerated set of subproblems of the form (3). We will use the following notation:

$$s_i = \mathrm{sign}(y_i), \quad \bar{y}_i = s_i \odot y_i, \quad \bar{P}_i = \mathrm{diag}(s_i)\, P_i \,\mathrm{diag}(s_i). \qquad (5)$$

Let $S^m$ be the set of all vectors in $R^m$ whose elements equal $1$ or $-1$. If $y_i$ is feasible, then $s_i \in S^m$ and $\bar{y}_i > 0$. Furthermore, for all feasible vectors $y_1, \ldots, y_L$ in (2) we have $s_1 \odot \cdots \odot s_L = e$, where $e = (1, \ldots, 1)^T \in R^m$. It is easy to verify that problem (2) is equivalent to

$$\min_{s_1 \in S^m, \ldots, s_L \in S^m} \; f(s_1, \ldots, s_L) \quad \text{s.t.} \quad s_1 \odot \cdots \odot s_L = e, \qquad (6)$$

with the objective function

$$f(s_1, \ldots, s_L) = \min_{y_1 > 0, \ldots, y_L > 0} \; \sum_{i=1}^{L} y_i^T \bar{P}_i y_i \quad \text{s.t.} \quad y_1 \odot \cdots \odot y_L = b. \qquad (7)$$

Here, the dependence of $\bar{P}_i$ on $s_i$ is given by (5). Note that the substitution $s_L = s_1 \odot \cdots \odot s_{L-1}$ eliminates the equality constraint in the outer problem (6), which is a binary problem with $2^{m(L-1)}$ feasible points. This number of feasible points defines the number of all inner problems (7).

The important feature of problem (6) is that it performs a partitioning of the feasible set in (2) and reduces this problem to a finite number of easy-to-solve inner problems (7) of the same form as (3). This allows us to capture the nature of the local minimizers of problem (2) and to enumerate them efficiently by combining the signs. Any optimal or close-to-optimal solution $y$ to problem (6), or equivalently, to problem (2), can be used as an initial point in problem (1), to be further refined by local search algorithms like ALS. Given $y$, the initial point $x$ is computed by the formula

$$x_i = A_i^{\dagger} y_i, \quad i = 1, \ldots, L,$$

where $A_i^{\dagger}$ is the pseudo-inverse of $A_i$ (Björk, 1996).
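In code, this initialization is a single least-squares solve per sub-filter. A minimal sketch (identifiers are ours), using the fact that the minimum-norm least-squares solution coincides with $A_i^{\dagger} y_i$:

```python
import numpy as np

def initial_point(y_list, A_list):
    """x_i = pinv(A_i) @ y_i for each sub-filter; np.linalg.lstsq returns
    the minimum-norm least-squares solution, i.e. exactly A_i^+ y_i."""
    return [np.linalg.lstsq(A, y, rcond=None)[0]
            for A, y in zip(A_list, y_list)]
```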
In our numerical experiments in Norell et al. (2011), we compared the performance of ALS started from the initial point generated by our approach with its performance from randomly generated points. It was necessary to run ALS from at least 500 random points in order to obtain a local minimizer in (1) with an approximation error comparable to that of only one run of ALS from the point generated by solving problem (6). Thus, the approach introduced in Norell et al. (2011) achieved an overall network design speedup factor of several hundred. Moreover, the randomly generated initial points did not guarantee any success. This speaks for the robustness of the approach.

It should be emphasized that binary problems are, in general, difficult to solve, but fortunately, the nature of the signs in the sub-filter outputs is often well understood. Prior knowledge of the filter's characteristics and structure helps to substantially facilitate the solution of the outer problem by focusing on a relatively small number of sign combinations (see Norell et al. (2011) for details).

3. Theoretical background for global search

Given an approximate solution to problem (2), our global search is aimed at finding a new combination of signs in (6) with a better value of the objective function defined by (7). It is based on solving problem (2) under the assumption that all components of the given feasible vectors $y_1, \ldots, y_L$ are fixed, except for their $k$th components, denoted here by $u_1, \ldots, u_L$, respectively. The value of $k$ changes in the process of global minimization. To justify our approach, we consider problem (2) rewritten in terms of these components.

Let $\hat{y}_i$ coincide with $y_i$ in all components but the $k$th one, which equals zero in $\hat{y}_i$. Let $P_i(k)$ and $(P_i)_{kk}$ stand for the $k$th column and the $k$th diagonal element of the matrix $P_i$, respectively. It can be easily verified, for $i = 1, \ldots, L$, that

$$y_i^T P_i y_i = \alpha_i (u_i - \beta_i)^2 + \gamma_i,$$

where

$$\alpha_i = (P_i)_{kk}, \quad \beta_i = -(P_i(k))^T \hat{y}_i / \alpha_i, \quad \gamma_i = \hat{y}_i^T P_i \hat{y}_i - \alpha_i \beta_i^2. \qquad (8)$$

Thus, the minimization over $u = (u_1, \ldots, u_L)^T$ in (2) results in the problem

$$\min_{u \in R^L} \; \sum_{i=1}^{L} \alpha_i (u_i - \beta_i)^2 \quad \text{s.t.} \quad u_1 u_2 \cdots u_L = c, \qquad (9)$$

where $c$ denotes the $k$th component of $b$. It is worth noting that this problem has at least one global minimizer, because the level sets of the objective function are compact and the function defining the constraint is smooth.

Let $U^*$ stand for the set of all global minimizers in problem (9). For this problem, the following notations will be used:

$$S^* = \{ s \in S^L : s = \mathrm{sign}(u^*), \; u^* \in U^* \}, \quad R^L_s = \{ u \in R^L : \mathrm{sign}(u) = s \},$$
$$\alpha = (\alpha_1, \ldots, \alpha_L)^T, \quad \beta = (\beta_1, \ldots, \beta_L)^T.$$

Let the multivariate function $\pi(\cdot)$ be defined as the product of the signs of all the variables, for instance, $\pi(u) = \mathrm{sign}(u_1) \cdots \mathrm{sign}(u_L)$.

Note that the feasible set in problem (9) consists of disjoint subsets. Each of these connected subsets belongs to the corresponding orthant $R^L_s$ determined by $\mathrm{sign}(u)$. Since such subsets are characterized by $\pi(u) = 1$, their total number is $2^{L-1}$. It grows exponentially with $L$. This is indicative of the highly multi-extremal nature of problem (9).

The result presented in Theorem 1 allows one to effectively locate the optimal combinations of signs $S^*$ or, equivalently, to find the orthants that contain the connected subsets of the feasible set on which the global optimum of problem (9) is attained.
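Formulas (8) need only the $k$th column and the $k$th diagonal element of each $P_i$, both of which are available from the action of $P_i$ on the $k$th unit vector. A hedged sketch (again treating $P_i$ as the $\mu$-regularized projector onto the orthogonal complement of the range of $A_i$, which is our reading; all identifiers are ours):

```python
import numpy as np

def reduced_coefficients(y_hat, A, k, mu=0.5):
    """alpha, beta, gamma of (8) for one factor i. P is taken as the
    (mu-regularized) projector onto range(A)^perp -- our interpretation.
    y_hat is y_i with its kth component zeroed."""
    m = A.shape[0]
    e_k = np.zeros(m)
    e_k[k] = 1.0

    def P(v):                       # action of P_i on a vector v
        x, *_ = np.linalg.lstsq(A, v, rcond=None)
        return v - A @ x + mu * v

    col = P(e_k)                    # kth column of P_i
    alpha = col[k]                  # (P_i)_{kk}
    beta = -(col @ y_hat) / alpha   # -(P_i(k))^T y_hat / alpha
    gamma = y_hat @ P(y_hat) - alpha * beta**2
    return alpha, beta, gamma
```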
Theorem 1. Let the coefficients $c$ and $\alpha$ in problem (9) be positive. Then $s^* \in S^*$ if and only if $\pi(s^*) = 1$ and one of the following conditions holds:

(i) $\pi(\beta) \geq 0$ and $s_i^* = \mathrm{sign}(\beta_i)$ for all $i$ such that $\beta_i \neq 0$;

(ii) $\pi(\beta) = -1$ and there exists $i^* \in I$ such that

$$s_i^* = \begin{cases} -\mathrm{sign}(\beta_i), & \text{if } i = i^*, \\ \mathrm{sign}(\beta_i), & \text{otherwise}, \end{cases} \qquad i = 1, \ldots, L,$$

with the set

$$I = \mathop{\mathrm{Arg\,min}}_{1 \leq i \leq L} \; \sqrt{\alpha_i}\, |\beta_i| .$$

Proof. We start by proving the "only if" part. Suppose that $s^* \in S^*$. Let $u^* \in U^*$ be such that $\mathrm{sign}(u^*) = s^*$. The feasibility of $u^*$ implies that $\pi(s^*) = 1$. Consider the linear space transformation given by the formula

$$v_i = \sqrt{\alpha_i}\, s_i^* u_i, \quad i = 1, \ldots, L.$$

This nonsingular transformation is aimed at easing our analysis because, in the new space, the objective function takes the form of a squared Euclidean distance between the two points $v = (v_1, \ldots, v_L)^T$ and

$$a = (a_1, \ldots, a_L)^T, \quad a_i = \sqrt{\alpha_i}\, s_i^* \beta_i, \; i = 1, \ldots, L.$$

Note that $\mathop{\mathrm{Arg\,min}}_{1 \leq i \leq L} |a_i| = I$. Another important feature of the transformation is that it does not change the multiplicative type of the constraint. The problem in the new space takes the form

$$\min_{v \in R^L} \; \| v - a \|^2 \quad \text{s.t.} \quad v_1 \cdots v_L = \tilde{c}, \qquad (10)$$

where $\tilde{c} = c \sqrt{\alpha_1 \cdots \alpha_L}$. Thus, the reformulated problem (10) is to find the shortest distance from $a$ to the feasible set.

Let $v^* = (v_1^*, \ldots, v_L^*)^T$ be the image of $u^*$ in the new space. Clearly, $v^*$ is a global minimizer for problem (10). Then, in view of the fact that $v^* > 0$, conditions (i) and (ii) can be reformulated in the new space as follows:

(i') if $\pi(a) \geq 0$, then $a \geq 0$;

(ii') if $\pi(a) < 0$, then there exists $i^* \in I$ such that $a_i > 0$ for all $i \neq i^*$, and $a_{i^*} < 0$.

We first show that there is no more than one negative component of $a$. Suppose, to the contrary, that at least two components of $a$ are negative, say, $a_i$ and $a_j$. It can be verified easily that the open linear segment $(v^*, a)$ intersects the hyperplane

$$H = \{ v \in R^L : v_i + v_j = 0 \}$$

at the point

$$v' = \lambda a + (1 - \lambda) v^*, \qquad (11)$$

where $\lambda \in (0, 1)$ is given by the formula

$$\lambda = \frac{v_i^* + v_j^*}{(v_i^* + v_j^*) - (a_i + a_j)}. \qquad (12)$$

Consider the point $\bar{v} = (\bar{v}_1, \ldots, \bar{v}_L)^T$ defined as follows:

$$\bar{v}_l = \begin{cases} -v_j^*, & \text{if } l = i, \\ -v_i^*, & \text{if } l = j, \\ v_l^*, & \text{otherwise}, \end{cases} \qquad l = 1, \ldots, L. \qquad (13)$$

This point is obviously feasible. The triangle inequality gives

$$\| \bar{v} - a \| < \| \bar{v} - v' \| + \| v' - a \|,$$

where $\| \bar{v} - v' \| = \| v^* - v' \|$, because $v' \in H$, $(\bar{v} + v^*)/2 \in H$ and $(\bar{v} - v^*) \perp H$. The inequality is strict because $\bar{v}_i + \bar{v}_j < 0$ and $a_i + a_j < 0$, so the segment $[\bar{v}, a]$ does not intersect $H$ and, in particular, does not contain $v'$. Then we obtain

$$\| \bar{v} - a \| < \| v^* - v' \| + \| v' - a \| = \| v^* - a \|, \qquad (14)$$

since $v' \in (v^*, a)$. Hence, the feasible point $\bar{v}$ gives a better objective function value in (10) than $v^*$. This contradicts the assumption that $v^*$ is a global minimizer for problem (10) and proves that $a$ can have at most one negative component. This result immediately proves (i') for $\pi(a) = 1$.

For $\pi(a) = 0$, suppose, contrary to (i'), that there exists $a_i < 0$. Such a component must be unique. There must exist an index $j$ such that $a_j = 0$. For these indices $i$ and $j$, consider the points $v'$ and $\bar{v}$ defined by formulas (11), (12) and (13). One can show, as above, that (14) holds for the two points. This contradicts the assumption that $v^*$ is a global minimizer. Thus, statement (i'), and consequently part (i) of the theorem, holds.

Consider now the case $\pi(a) < 0$. As shown above, exactly one component of $a$ must be negative, say, $a_i < 0$. Suppose, contrary to (ii'), that $i \notin I$. For this $i$ and any $j \in I$, consider the point $\bar{v}$ defined by (13).
For the point $v'$ defined by (11) and (12), the condition $\lambda \in (0, 1)$ is satisfied, because $a_i + a_j < 0$. For $\bar{v}$ and $v'$ one can show, as above, that (14) holds, which contradicts the assumption that $v^*$ is a global minimizer. This proves (ii') and completes the proof of the "only if" part of the theorem.

For the "if" part, let $s^*$ satisfy the conditions of the theorem. We choose any $u \in U^*$ and construct a point $u^* = (u_1^*, \ldots, u_L^*)^T$ individually for each of the cases (i) and (ii).

Suppose that (i) holds. Consider $u^*$ defined as follows:

$$u_i^* = s_i^* |u_i|, \quad i = 1, \ldots, L.$$

Obviously, $\mathrm{sign}(u^*) = s^*$ and $u^*$ is a feasible point. As proved in the "only if" part, $\mathrm{sign}(u)$ must satisfy (i). Thus, $s_i^* = \mathrm{sign}(u_i)$ for all $i$ such that $\beta_i \neq 0$. Therefore, $u^*$ has the same objective function value in (9) as $u$.

Suppose now that (ii) holds. Let $i^* \in I$ be such that $s_{i^*}^* = -\mathrm{sign}(\beta_{i^*})$. As proved in the "only if" part, $u$ must satisfy (ii). Suppose $j \in I$ is the index for which $\mathrm{sign}(u_j) = -\mathrm{sign}(\beta_j)$. This means that $\sqrt{\alpha_j}\, |\beta_j| = \sqrt{\alpha_{i^*}}\, |\beta_{i^*}|$. If $i^* = j$, then we define $u^* = u$. Otherwise, we define $u^*$ as follows:

$$u_l^* = \begin{cases} s_l^* \,|u_j|\, \sqrt{\alpha_j / \alpha_{i^*}}, & \text{if } l = i^*, \\ s_l^* \,|u_{i^*}|\, \sqrt{\alpha_{i^*} / \alpha_j}, & \text{if } l = j, \\ u_l, & \text{otherwise}, \end{cases} \qquad l = 1, \ldots, L.$$

It can be easily seen that $\mathrm{sign}(u^*) = s^*$ and also that $u^*$ is feasible and has the same objective function value in (9) as $u$. In each of the two cases, $u^* \in U^*$, and consequently $s^* \in S^*$. This completes the proof of the theorem. □

This result suggests ways in which the sign combinations intrinsic in the global minimizers of problem (9) can be effectively constructed for any given $\beta$. Our algorithm presented in the next section is based on this result.

4. Global search algorithm

Theorem 1 implies that $s^*$ is not unique when either of the following two cases occurs:

• $\beta$ has more than one zero component.
• $\pi(\beta) = -1$ and the set $I$ consists of more than one element.

Given $\beta$, the set $S^*$ can be constructed based on the optimality conditions as follows.

• If $\pi(\beta) = 1$, then the set $S^* = \{ \mathrm{sign}(\beta) \}$ is a singleton.
• If $\pi(\beta) = 0$, then $S^*$ is composed of all vectors $s^* \in S^L$ whose components $s_i^* = \mathrm{sign}(\beta_i)$ for all $i$ such that $\beta_i \neq 0$, while the rest of the components ensure $\pi(s^*) = 1$.
• If $\pi(\beta) = -1$, then $S^*$ is composed of the same number of elements as $I$. Each $i^* \in I$ determines an $s^* \in S^*$ in such a way that $s_{i^*}^* = -\mathrm{sign}(\beta_{i^*})$, and the remaining components of $s^*$ are the same as in $\mathrm{sign}(\beta)$.

Note that it is not necessary to construct the whole set $S^*$ when it is required to find only one $s^* \in S^*$. The same principles as above can be employed in this case.

We propose below a global search algorithm. It uses the procedures OPTSIGN, LOCAL and ALS. Procedure OPTSIGN(y, k) computes $\alpha$ and $\beta$ by formula (8), and then it returns an $s^*$ arbitrarily chosen from the set $S^*$. Another task of this procedure is to verify whether a given $s \in S^L$ is optimal for problem (9). The optimality conditions given by Theorem 1 can be used for checking whether $s = \mathrm{OPTSIGN}(y, k)$ holds. Procedure LOCAL($s_1, \ldots, s_L$) returns the $y$ that solves problem (7) for a given sign combination $s_1, \ldots, s_L$. Procedure ALS($x^0$) returns the result of running the ALS algorithm from a given starting point $x^0$.

The derived optimality conditions open the way to a successive improvement of the sign combination in the outer problem (6). The resulting global search strategies admit various implementations. The one that we present below is based on a sequential checking of the components of $s_1, \ldots, s_L$ for a possible improvement. It starts from a given sign combination $s_1, \ldots, s_L$ and returns an approximate solution of problem (1). Our global search proceeds as follows.
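A compact sketch of this strategy (an illustrative reconstruction based on the description above, not a verbatim reproduction of the published listing of Algorithm 1; the array representation of the signs, the acceptance test and all identifiers are our assumptions):

```python
import numpy as np

def global_search(s, y, optsign, local, als, initial_x, f):
    """Sketch of the sequential global search (Algorithm 1).
    s: L x m array of +/-1 signs (rows are s_1, ..., s_L);
    y: current feasible point of (2);
    optsign(y, k): a sign vector s* in S^L for the kth components,
    chosen from S* according to Theorem 1;
    local(s): solves inner problem (7) for the given signs;
    als(x0): runs ALS on problem (1) from x0;
    initial_x(y): computes x_i = pinv(A_i) y_i;
    f(y): regularized objective of (2), used as the acceptance test."""
    L, m = s.shape
    best = f(y)
    improved = True
    while improved:                    # sweep until no component improves
        improved = False
        for k in range(m):             # sequential check of components k
            s_star = optsign(y, k)
            if np.any(s[:, k] != s_star):
                s_old = s[:, k].copy()
                s[:, k] = s_star       # try the better sign combination
                y_new = local(s)       # re-solve inner problem (7)
                if f(y_new) < best:    # accept only a strict improvement
                    y, best, improved = y_new, f(y_new), True
                else:
                    s[:, k] = s_old    # revert the trial sign change
    return als(initial_x(y))           # final refinement by ALS in (1)
```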
In Algorithm 1, an initial sign combination $s_1, \ldots, s_L$ is required to be given. For this purpose, the choice of signs proposed in Norell et al. (2011) can be used. An alternative is to choose as the initial sign combination the best one produced by ALS, starting from a number of randomly generated points.

5. Numerical experiments

For generating MLLS test problems of the form (1), we considered two-dimensional filters of the monomial class (Knutsson et al., 2011) with the lognormal (Granlund and Knutsson, 1995; Knutsson, 1982) and logerf (Knutsson and Andersson, 2005; Norell et al., 2011) radial parts. We use the abbreviations LP, BP and HP for the Low-Pass, first-order Band-Pass and first-order High-Pass filters, respectively. They were approximated by a sequence of $L = 5$ sub-filters. The total number of coefficients $N$ of the sub-filters and the number of components $m$ of the discretized ideal frequency response $b$ are specified in Table 1 for each filter.

Our numerical experiments were performed on a PC with a 2.27 GHz Intel Xeon E5520 processor and a 32-bit Windows XP operating system. The results are shown in Table 1. The Matlab routine FMINCON was used to solve problem (4), which is a reformulation of (7). As mentioned earlier, the objective function in (2) is required to be regularized. For the regularization parameter, we used the value $\mu = 0.5$.

We use the term approximation error to refer to the objective function value in (1) and denote it by $\varepsilon$. In Table 1, $\min_j(\varepsilon_j)$ stands for the best approximation error obtained by running ALS from 500 randomly generated starting points. The CPU time (in seconds) spent on performing these 500 runs is denoted by $t_{als}$.

We use the term local search to refer to solving problem (7) only once, for the sign combination chosen as proposed in Norell et al. (2011). The approximation error $\varepsilon_{loc}$ is the result of one run of ALS from the starting point produced by the local search.

Our global search strategy is aimed at improving the local search results. To initialize it, we used the same sign combination as in the local search. The global search, as proposed in Algorithm 1, took $t_{glob}$ seconds of CPU time and yielded a relative improvement, calculated by the formula

$$\eta = \frac{\nu_{loc} - \nu_{glob}}{\nu_{loc}} \cdot 100\%,$$

where $\nu_{loc}$ and $\nu_{glob}$ are the values of the regularized objective function in (2) produced by the local and global search, respectively. ALS started from the point generated by our global search resulted in the error $\varepsilon_{glob}$.

The set of filters used in our experiments also included zero- and second-order band-pass and high-pass filters. For these filters, the initial sign combination proposed in Norell et al. (2011) was nearly optimal in the sense that there was practically no difference between the approximation errors $\varepsilon_{loc}$ and $\min_j(\varepsilon_j)$. For this reason, our global search strategy was unable to improve the initial sign combination.
Table 1. Performance of the ALS, local and global search strategies.

Filter        N    m    $\min_j(\varepsilon_j)$  $\varepsilon_{loc}$  $\varepsilon_{glob}$  $\eta$ (%)  $t_{als}$ (s)  $t_{glob}$ (s)
LP, lognorm   15   361  1.94e-4                  3.44e-4              3.31e-4               0.01        1262.4         73.0
BP, lognorm   13   360  1.66e-3                  3.16e-3              1.66e-3               0.19        1226.3         21.9
BP, logerf    13   360  1.23e-4                  5.63e-4              1.23e-4               0.19        1251.5         21.1
HP, logerf    13   360  1.05e-3                  3.19e-3              1.05e-3               0.10        1246.2         23.4

The results presented in Table 1 refer to the filters for which the global search strategy was able to improve the local search solutions in terms of the objective function values in problems (7) and (1). In the case of the high-pass and band-pass filters, the solution produced for problem (1) was as good as the best of those produced by 500 runs of ALS, but the global search achieved this in a CPU time that was over 50 times shorter. These results demonstrate the efficiency of our global search strategy and its capability to substantially speed up the filter design process.

6. Conclusions

The derived optimality conditions open up possibilities for performing a global search for a better sign combination. The implemented global search strategy is not a computationally demanding procedure. Its efficiency was demonstrated by the results of numerical experiments. For some filters, our global search ensured a faster process of optimizing the sub-filter parameters, with an overall speedup factor of over fifty. We plan to extend our approach to optimal filter design problems having more general sub-filter network structures.

7. Acknowledgment

This work was supported by the Swedish Research Council; the Linnaeus Center for Control, Autonomy, and Decision-making in Complex Systems (CADICS); the Swedish Foundation for Strategic Research (SSF); the Strategic Research Center MOVIII; and the Linköping University Center for Industrial Information Technology (CENIIT).

8. References

ANDERSSON, M., WIKLUND, J. and KNUTSSON, H. 1999. Filter networks. In N.M. NAMAZI, editor, Signal and Image Processing (SIP), Proceedings of the IASTED International Conferences, October 18-21, 1999, Nassau, The Bahamas, pages 213-217. IASTED/ACTA Press.

BJÖRK, Å. 1996. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, USA.

GRANLUND, G. and KNUTSSON, H. 1995. Signal Processing for Computer Vision. Kluwer, Dordrecht.

KNUTSSON, H. 1982. Filtering and Reconstruction in Image Processing. PhD thesis, Linköping University, Sweden, Diss. No. 88.

KNUTSSON, H. and ANDERSSON, M. 2005. Implications of invariance and uncertainty for local structure analysis filter sets. Signal Processing: Image Communications, 20(6):569-581.

KNUTSSON, H., WESTIN, C-F. and ANDERSSON, M. 2011. Representing local structure using tensors II. In A. HEYDEN and F. KAHL, editors, Image Analysis, Lecture Notes in Computer Science, 6688:545-556. Springer, Berlin/Heidelberg.

LEARDI, R., ARMANINO, C., LANTERI, S. and ALBEROTANZA, L. 2000. Three-mode principal component analysis of monitoring data from Venice lagoon. Journal of Chemometrics, 14(3):187-195.

LEURGANS, S. and ROSS, R.T. 1992. Multilinear models: applications in spectroscopy. Statistical Science, 7(3):289-310.

LOPES, J.A. and MENEZES, J.C. 2003. Industrial fermentation end-product modelling with multilinear PLS. Chemometrics and Intelligent Laboratory Systems, 68(1):75-81.

NOCEDAL, J. and WRIGHT, S.J. 2006. Numerical Optimization. Springer-Verlag, New York, US, 2nd Edition.

NORELL, B., BURDAKOV, O., ANDERSSON, M. and KNUTSSON, H. 2011. Approximate spectral factorization for design of efficient sub-filter sequences.
Technical Report LiTH-MAT-R-2011-14, Department of Mathematics, Linköping University.

ORTEGA, J.M. and RHEINBOLDT, W.C. 2000. Iterative Solution of Nonlinear Equations in Several Variables, Classics in Applied Mathematics 30. SIAM, Philadelphia, USA. Reprint of the 1970 original edition.

PAATERO, P. 1997. Least squares formulation of robust non-negative factor analysis. Chemometrics and Intelligent Laboratory Systems, 37(1):23-35.

SVENSSON, B., ANDERSSON, M. and KNUTSSON, H. 2005. A graph representation of filter networks. In Proceedings of the 14th Scandinavian Conference on Image Analysis (SCIA'05), pp. 1086-1095, Joensuu, Finland, June 2005.

WANG, J.H., HOPKE, P.K., HANCEWICZ, T.M. and ZHANG, S.L. 2003. Application of modified alternating least squares regression to spectroscopic image analysis. Analytica Chimica Acta, 476(1):93-109.

Received 27 October 2011
Accepted 29 November 2011