INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
ISSN 1841-9836, 11(6):755-775, December 2016.

Incremental and Decremental SVM for Regression

H. Gâlmeanu, L.M. Sasu, R. Andonie

Honorius Gâlmeanu*
1. Siemens Corporate Technology, honorius.galmeanu@siemens.com
2. Faculty of Mathematics and Informatics, Transilvania University of Braşov
*Corresponding author: galmeanu@unitbv.ro

Lucian Mircea Sasu
1. Faculty of Mathematics and Informatics, Transilvania University of Braşov, lmsasu@unitbv.ro
2. Siemens Corporate Technology, lucian.sasu@siemens.com

Răzvan Andonie
1. Computer Science Department, Central Washington University, Ellensburg, USA, andonie@cwu.edu
2. Electronics and Computers Department, Transilvania University of Braşov

Abstract: Training a support vector machine (SVM) for regression (function approximation) in an incremental/decremental way consists essentially in migrating the input vectors in and out of the support vector set, with specific modifications of the associated thresholds. We introduce such a method in full detail; it allows the exact increments or decrements of the thresholds to be determined before the vector migrations take place. Two delicate issues are addressed in particular: the variation of the regularization parameter (for tuning the model performance) and the extreme situations in which the support vector set becomes empty. We experimentally compare our method with several regression methods: the multilayer perceptron, two standard SVM implementations, and two models based on adaptive resonance theory.

Keywords: support vector machine, incremental and decremental learning, regression, function approximation

1 Introduction

The approximation of continuous functions known only at a certain number of discrete points (also known as regression), a standard procedure in statistics, can also be approached from a machine learning perspective. In the context of supervised training, incremental learning means learning each input-output sample pair without keeping it for subsequent processing. The topic addressed in this paper is supervised incremental/decremental learning of regression models, in particular SVM models. The fundamental issues in incremental/decremental learning are: i) how can a learning system adapt to new information without corrupting or forgetting previously learned information (the stability-plasticity dilemma addressed by Carpenter and Grossberg [2]), and ii) how can a learning system "forget" information without corrupting previously learned information and without adding new information. In other words, how can a system learn without unlearning and unlearn without learning.

Support Vector Regressions (SVRs) are SVM learning models used for regression. These algorithms solve the quadratic optimization problem either with a decomposition method based on Sequential Minimal Optimization [4] or with an incremental learning method. The incremental method adds input patterns to the solution (or removes them from it) one at a time, without affecting the learning process of the other patterns [13,14].

After introducing some basic notations, we review in this introductory section related work on incremental/decremental SVRs and highlight our contribution.

1.1 SVR notations

For a non-linear SVR, regression translates into the following optimization problem [14]: approximate a given set of training pairs (xi, yi), i = 1...N, xi ∈ Rm, yi ∈ R, with an SVM defined by
$$g(x_i) = w^T \Phi(x_i) + w_0, \quad (1)$$
where Φ(·) is a non-linear function and w0 is an offset. The objective function g(x) : Rm → R requires the minimization [21] of
$$\min_{w, w_0, \xi_i, \xi_i^*} J(w) = \min_{w, w_0, \xi_i, \xi_i^*} \left( \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \right), \quad (2)$$
subject to the constraints
$$-\epsilon - \xi_i^* \le g(x_i) - y_i \le \epsilon + \xi_i, \qquad \xi_i, \xi_i^* \ge 0, \quad i = 1 \ldots N. \quad (3)$$
SVR matches the output value of the objective function g(xi) as closely as possible to the given yi. The output value is constrained to lie inside a "tube" of values, given by −ε + yi ≤ g(xi) ≤ ε + yi, where ε is a given constant. The slack variables ξi and ξi* allow the system to cope with patterns that do not fall inside the "tube" and keep the resulting error for such patterns as small as possible.
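The penalty collected by the slack terms in equation (2) is the so-called ε-insensitive loss. The following minimal Python sketch is our own illustration (the function name and the toy numbers are assumptions, not part of the paper): deviations that stay inside the tube cost nothing, and only the excess over ε is penalized.

```python
import numpy as np

def eps_insensitive_loss(g, y, eps):
    """epsilon-insensitive loss max(0, |g - y| - eps).

    Deviations inside the [-eps, +eps] tube cost nothing; only the part of
    the error exceeding the tube (the slack xi or xi*) is penalized, which
    is what the C * sum(xi + xi*) term of the primal objective measures.
    """
    return np.maximum(0.0, np.abs(g - y) - eps)

# tiny usage example with made-up numbers
g = np.array([0.10, 0.45, 0.90])   # predictions g(x_i)
y = np.array([0.12, 0.30, 0.50])   # targets y_i
print(eps_insensitive_loss(g, y, eps=0.1))   # -> [0.   0.05 0.3 ]
```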
To minimize the expression of J(w) in equation (2) with respect to w and w0, after several manipulations, the Lagrangian can be written as:
$$L(w, w_0, \alpha_i, \alpha_i^*) = -\frac{1}{2} \sum_i \sum_j (\alpha_i^* - \alpha_i)\, \Phi(x_i)^T \Phi(x_j)\, (\alpha_j^* - \alpha_j) + \sum_i (\alpha_i^* - \alpha_i) y_i - \sum_i (\alpha_i^* - \alpha_i) w_0 - \sum_i (\alpha_i + \alpha_i^*)\, \epsilon \quad (4)$$
subject to αi, αi* ≥ 0. The Karush-Kuhn-Tucker (KKT) conditions (imposed by minimizing the Lagrangian) are given by:
$$w = \sum_i (\alpha_i^* - \alpha_i) \Phi(x_i) \quad (5)$$
$$\sum_i (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i \le C, \qquad 0 \le \alpha_i^* \le C \quad (6)$$
The Wolfe dual form states that this minimization with respect to w and w0 can be transformed into a maximization with respect to αi and αi*. This leads to the minimization of L'(w, w0, αi, αi*) = −L(w, w0, αi, αi*).
The expression of L'(w, w0, αi, αi*) depends only on the inner product Φ(xi)^T Φ(xj). Therefore, we use the notation Qij = Φ(xi)^T Φ(xj). SVMs rely on the "kernel trick": the inner products of the vectors Φ(x), which belong to an infinite-dimensional feature space, are replaced by the non-linear kernel function K(xi, xj) = exp(−σ(xi − xj)²) = Φ(xi)^T Φ(xj) [19]. Generally, kernel functions are used for non-linear SVMs, so we use the notation Qij = K(xi, xj) to designate the kernel function.
The extremum conditions for L'(w, w0, αi, αi*), considering also the conditions (5), (6), are given by:
$$\frac{\partial L'}{\partial \alpha_i} = -\sum_j Q_{ij} (\alpha_j^* - \alpha_j) - w_0 + y_i + \epsilon - \delta_i + u_i = 0 \quad (7)$$
$$\frac{\partial L'}{\partial \alpha_i^*} = \sum_j Q_{ij} (\alpha_j^* - \alpha_j) + w_0 - y_i + \epsilon - \delta_i^* + u_i^* = 0 \quad (8)$$
with the following additional KKT conditions:
$$\delta_i, \delta_i^* \ge 0, \qquad \delta_i \alpha_i = 0, \quad \delta_i^* \alpha_i^* = 0 \quad (9)$$
$$u_i, u_i^* \ge 0, \qquad u_i (C - \alpha_i) = 0, \quad u_i^* (C - \alpha_i^*) = 0 \quad (10)$$
From equation (1), using the condition (5) and the notation θj = αj* − αj, the expression of the "margin" function hi becomes:
$$h_i = g(x_i) - y_i = \sum_j Q_{ij} \theta_j + w_0 - y_i \quad (11)$$
where
$$-C \le \theta_i \le C \quad (12)$$
From equations (7)-(12), observing in the KKT conditions that at most one of the αi, αi* values is nonzero, it follows that there are three possible vector sets:
• error vectors (E), for the cases: hi < −ε (with θi = C), or hi > ε (with θi = −C);
• support vectors (S), for the cases: hi = −ε (with 0 < θi < C), or hi = ε (with −C < θi < 0);
• other vectors (O), for the cases where −ε < hi < ε, with θi = 0.
SVMs can be trained in an incremental/decremental way. Using equation (11), we can decompose hi into two parts: the contribution of the other vectors and the contribution of its own vector:
$$h_i = \sum_{j \ne i} Q_{ij} \theta_j + Q_{ii} \theta_i + w_0 - y_i \quad (13)$$
Given that Qii > 0, an increasingly negative contribution of the other vectors θj (for j ≠ i) would be compensated by increasing the value of θi, up to the point where θi = C. From this point onward, since the growing negative contribution of the θj can no longer be counteracted by an increase of θi, the net result is that hi decreases past the −ε value and xi is treated as an error vector.
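For bookkeeping during incremental training, every vector has to be assigned to one of the three sets above. The sketch below is our own illustration in Python (the class names, the tolerance handling and the example values are assumptions, not from the paper); it maps a pair (hi, θi) to a set following the case analysis just given.

```python
from enum import Enum

class VectorSet(Enum):
    SUPPORT = "S"
    ERROR = "E"
    OTHER = "O"

def classify_vector(h_i, theta_i, eps, C, tol=1e-9):
    """Assign a training vector to the S, E or O set from its margin h_i and
    threshold theta_i (a sketch; the tolerance handling is our own choice)."""
    if abs(theta_i) < tol:
        return VectorSet.OTHER        # -eps < h_i < eps, theta_i = 0
    if abs(abs(theta_i) - C) < tol:
        return VectorSet.ERROR        # h_i < -eps (theta_i = C) or h_i > eps (theta_i = -C)
    return VectorSet.SUPPORT          # h_i = -eps (0 < theta_i < C) or h_i = eps (-C < theta_i < 0)

# example: a vector sitting exactly on the tube boundary with 0 < theta < C
print(classify_vector(h_i=-0.1, theta_i=0.4, eps=0.1, C=1.0))   # VectorSet.SUPPORT
```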
In equation (13), only the error and support vectors contribute to the expression of hi. For all the sets, the associated threshold value θi is bounded by the regularization parameter C (equation (12)). R = E ∪ O is the set of reserve vectors. We denote by r, s, and e the reserve, support and error vectors, respectively. The whole training set is E ∪ S ∪ O.

1.2 Related work and our contribution

The adiabatic incremental/decremental training algorithm for SVMs (and SVRs) was introduced in [13,14] and follows from a method proposed by Cauwenberghs and Poggio [3,5]. For each pattern that is part of the solution (called a support vector), the associated margin value is hi ∈ {−ε, ε}. Following equation (13), as a new vector is "learned", its associated threshold θi increases or decreases, starting from 0, at the expense of the other vectors' thresholds. During this process, the rest of the vectors migrate between the sets of support, error and other vectors. When the same vector is "unlearned", its corresponding threshold value θi is decreased (when positive) or increased (when negative) until it reaches 0 (when the vector is removed). While unlearning a vector, the vectors that previously migrated between sets when learning it now migrate in the reverse order. An implementation of the adiabatic algorithm can be found in [12], with further developments in [7].

During incremental training, while the threshold of the newly introduced vector is changed, the other vectors migrate among the sets. The value of the threshold increment determines how many vectors migrate. Therefore, we are interested in determining the largest threshold increment ∆θi that is still small enough to cause only one vector to migrate. Proceeding with fixed small increments of ∆θi is not an efficient search technique. We discussed in [7] the conditions for vector migrations between the sets and how to determine the optimal increment. We gave the incremental/decremental context, but we did not describe particular situations, such as the empty support vector set and the variation of the regularization parameter.

An incremental/decremental SVR that can simultaneously add batches of new samples and remove obsolete ones was applied to online time series prediction [11]. The ν-SVR, presented in [21], determines a priori the proportion ν of support vectors in the total set. The algorithms in [9,11] are variations of the adiabatic algorithm.

For a classification problem, an incremental/decremental SVM operates in the following way: to add vector xc to the solution, the associated threshold θc is driven from 0 to C, whereas to remove xc, θc is driven back to 0. For SVRs, these operations are no longer symmetric: to add xc, θc is driven from 0 towards C (increase) or towards −C (decrease); to remove xc, θc is likewise either increased or decreased. The direction (incremental or decremental) is not given directly by the intention (adding or removing the vector), but by the margin hc. We discuss this aspect in Section 2.

We name our method Incremental/Decremental Support Vector Machine Regression (IDSVMR). It is based on the adiabatic training algorithm.
Compared to [13,14], the IDSVMR contains all aspects needed for an implementation: a) we provide the relations that give the exact amount by which the threshold θc of the new vector can be changed before any vector migrates; b) we determine the exact threshold variation that leads from one migration to the next, which produces a robust incremental algorithm that generates a sequence of vector migrations as part of the learning process; c) we derive the expressions for the variation of θc when the support vector set becomes empty during training, giving the exact expressions needed to continue updating θc in this particular case.

Compared to [7], we contribute with: i) a complete description of the increment/decrement operations on the regularization parameter, showing that the expressions to be maximized (minimized) are similar, and ii) an extension of the results from [12], giving an exact procedure for continuing the migrations when the support vector set is empty, both for the regular procedure and for the increment/decrement of the regularization parameter.

Section 2 describes the expressions of the threshold variables θi and the expressions of the vector margins before the first migration. Section 3 introduces the expressions of the maximum threshold variations that occur before vector migrations. Section 4 analyzes the maximum variation of the regularization parameter. Section 5 discusses the variations of the free parameter w0 and of the regularization parameter in the case of an empty support vector set. Section 6 presents the results of the IDSVMR, implemented as a Weka [10] plugin, compared with the multilayer perceptron (MLP), two models based on adaptive resonance theory, and two classical SVM-derived regression models. Finally, we present our conclusions.

2 Incremental and decremental updates of thresholds

We start this section by summarizing the adiabatic training algorithm, which is at the core of the IDSVMR method. We then introduce IDSVMR implementation details that are not included in the original adiabatic algorithm.

To adapt to a new vector xc, the adiabatic algorithm migrates specific vectors between the sets in order to reestablish the KKT conditions [5]. At the beginning, xc has the threshold θc = 0. The threshold can evolve in either the positive or the negative direction. First, this direction is determined. Then the threshold is modified, taking into account, with each update, the migrations between the sets. A migration is identified by observing the maximum (or minimum) allowed variation of the margin hi of each vector. Considering the KKT conditions, specifically the conditions for the thresholds (6) and the margin function (11), their variations can be expressed as:
$$\Delta h_i = \sum_j Q_{ij} \Delta\theta_j + \Delta w_0 \quad (14)$$
$$0 = \sum_j \Delta\theta_j \quad (15)$$
When vector xc is added to the set, the migrations that take place strive to keep the KKT conditions satisfied. Thus, the variations of the thresholds and margins can be further expanded to:
$$\Delta h_i = \sum_j Q_{ij} \Delta\theta_j + \Delta w_0 + Q_{ic} \Delta\theta_c \quad (16)$$
$$0 = \sum_j \Delta\theta_j + \Delta\theta_c \quad (17)$$
These relations can be written more compactly using e, the vector of ones:
$$\begin{bmatrix} \Delta h_S \\ \Delta h_R \\ \Delta h_c \\ 0 \end{bmatrix} = \begin{bmatrix} e_S & Q_{SS} \\ e_R & Q_{RS} \\ 1 & Q_{cS} \\ 0 & e_S^T \end{bmatrix} \begin{bmatrix} \Delta w_0 \\ \Delta\theta_S \end{bmatrix} + \Delta\theta_c \begin{bmatrix} Q_{Sc} \\ Q_{Rc} \\ Q_{cc} \\ 1 \end{bmatrix} \quad (18)$$
The modification ∆θc is absorbed by the modification of ∆θS (the thresholds of the support vectors), by ∆w0, and by the variation of the margin functions. The margins of the support vectors do not change: they are either −ε or +ε. Hence ∆hS = 0 and we obtain:
$$\begin{bmatrix} \Delta w_0 \\ \Delta\theta_S \end{bmatrix} = \underbrace{-\begin{bmatrix} 0 & e_S^T \\ e_S & Q_{SS} \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ Q_{Sc} \end{bmatrix}}_{\beta} \Delta\theta_c \quad (19)$$
Considering the definitions of ∆w0 and ∆θS, the margin functions of the current and reserve vectors vary as:
$$\begin{bmatrix} \Delta h_c \\ \Delta h_R \end{bmatrix} = \underbrace{\left( \begin{bmatrix} 1 & Q_{cS} \\ e_R & Q_{RS} \end{bmatrix} \beta + \begin{bmatrix} Q_{cc} \\ Q_{Rc} \end{bmatrix} \right)}_{\gamma} \Delta\theta_c \quad (20)$$
To avoid recalculating the inverse matrix in equation (19), the Sherman-Morrison-Woodbury formula was used in [12]. Used repeatedly, this leads to error accumulation. We therefore prefer to compute the inverse by LU decomposition, which is O(n³) but numerically stable [8].
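As a concrete illustration of equations (19) and (20), the following Python sketch computes β and γ with a single linear solve (which uses an LU factorization internally, in line with the numerical choice just discussed). The function name, the argument layout and the use of NumPy are our own assumptions; the paper's Weka plugin is written against a different API.

```python
import numpy as np

def path_coefficients(Q_SS, Q_Sc, Q_RS, Q_Rc, Q_cc):
    """Sensitivities beta (eq. 19) and gamma (eq. 20) of the KKT system.

    Q_SS : (s, s) kernel block between support vectors
    Q_Sc : (s,)   kernel values between support vectors and the new vector x_c
    Q_RS : (r, s) kernel block between reserve and support vectors
    Q_Rc : (r,)   kernel values between reserve vectors and x_c
    Q_cc : float  kernel value K(x_c, x_c)

    Returns beta = [dw0/dtheta_c, dtheta_S/dtheta_c] and
            gamma = [dh_c/dtheta_c, dh_R/dtheta_c].
    """
    s = Q_SS.shape[0]
    # KKT system matrix [[0, e_S^T], [e_S, Q_SS]]
    A = np.zeros((s + 1, s + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Q_SS
    rhs = np.concatenate(([1.0], Q_Sc))
    # solve instead of forming the inverse explicitly (LU factorization inside)
    beta = -np.linalg.solve(A, rhs)
    # margin sensitivities of the current vector and of the reserve vectors
    M = np.vstack([np.concatenate(([1.0], Q_Sc))[None, :],
                   np.hstack([np.ones((Q_RS.shape[0], 1)), Q_RS])])
    gamma = M @ beta + np.concatenate(([Q_cc], Q_Rc))
    return beta, gamma
```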
3 Learning direction and extremum increment/decrement before vector migration

In what follows, we describe the conditions for vector migration between the sets and establish the migration conditions for all possible situations. We also explain how we prevent the immediate cycling mentioned in [12]. Although laborious, we give full coverage of all particular situations.

To "learn" a new vector xc, the training process starts with the initial value θc = 0. Its margin hc is first determined using relation (11). Since ∆hc = γc ∆θc, the aim is to modify θc such that the margin function is updated to fit as well as possible into the "tube" of relation (3). The following cases occur:

1. if hc > ε, the margin must decrease to fit into the −ε < hc < ε "tube", so ∆hc < 0:
• if γc > 0, then ∆θc < 0, θc decreases towards −C, and the direction is decremental;
• else, if γc < 0, then ∆θc > 0, θc increases towards C, and the direction is incremental;
2. if hc < −ε, the margin must increase, so ∆hc > 0:
• if γc > 0, then ∆θc > 0, θc increases towards C, and the direction is incremental;
• else, if γc < 0, then ∆θc < 0, θc decreases towards −C, and the direction is decremental.

From these relations we can establish the direction:
• if sgn(hc) ≠ sgn(γc), the approach is incremental and ∆θc > 0;
• if sgn(hc) = sgn(γc), the approach is decremental and ∆θc < 0.

To "unlearn" a vector xc from the support or error set, the direction depends only on the value of θc: if θc < 0, the direction is incremental; if θc > 0, the direction is decremental.

Once the direction (the sign of the variation of θc) is established, we have to determine the variation increment ∆θc. We have the following cases (details are provided in Appendix A):

1. Migration of support (S) vectors. Considering only the ∆θS components of equation (19) and following the β notation, the increment for a single support vector is ∆θs = βs · ∆θc, where βs is the corresponding component of the β vector. This gives the limits for the θs updates. Conversely, considering the allowed variation interval for θs, we can determine the limit on ∆θc before the support vector associated with θs leaves the set:
• the incremental case:
– if sgn(hs) = sgn(βs), θs may reach 0, so we have the migration to set O: ∆θc ≤ −θs/βs (S → O)
– if sgn(hs) ≠ sgn(βs), θs may reach C or −C: ∆θc ≤ (sgn(βs)·C − θs)/βs (S → E)
• the decremental case:
– if sgn(hs) ≠ sgn(βs), θs may reach 0: ∆θc ≥ −θs/βs (S → O)
– if sgn(hs) = sgn(βs), θs may reach C or −C: ∆θc ≥ (−sgn(βs)·C − θs)/βs (S → E)

2. Migration of other (O) vectors. These are the vectors with −ε < hr < ε:
• for the incremental case: ∆θc ≤ (sgn(γr)·ε − hr)/γr (O → S)
• for the decremental case: ∆θc ≥ (−sgn(γr)·ε − hr)/γr (O → S)

3. Migration of error (E) vectors. These are the vectors with hr < −ε or hr > ε:
• for the incremental case:
– if sgn(hr) ≠ sgn(γr): ∆θc ≤ (−sgn(γr)·ε − hr)/γr (E → S)
– if sgn(hr) = sgn(γr): the vector does not migrate;
• for the decremental case:
– if sgn(hr) = sgn(γr): ∆θc ≥ (sgn(γr)·ε − hr)/γr (E → S)
– if sgn(hr) ≠ sgn(γr): the vector does not migrate.

4. Migration of the current vector xc. The vector xc, newly added to the set, is checked for a possible migration to the support set (S):
• for the incremental case:
– if sgn(hc) ≠ sgn(γc): ∆θc ≤ (−sgn(γc)·ε − hc)/γc (→ S)
– if sgn(hc) = sgn(γc): the vector does not migrate;
• for the decremental case:
– if sgn(hc) = sgn(γc): ∆θc ≥ (sgn(γc)·ε − hc)/γc (→ S)
– if sgn(hc) ≠ sgn(γc): the vector does not migrate.
For removing xc from the training set, the objective is to increase/decrease θc to zero, so this variation of the margin does not trigger a migration of xc itself.

5. The variation of θc from/towards zero must keep −C ≤ θc ≤ C, and we have the following conditions:
• for adding xc:
– for the incremental case, ∆θc ≤ C − θc;
– for the decremental case, ∆θc ≥ −C − θc;
• for removing xc:
– for the incremental case, where θc < 0, ∆θc ≤ −θc;
– for the decremental case, where θc > 0, ∆θc ≥ −θc.

The maximum admissible variation ∆θc for the incremental case, before any vector changes sets, is the minimum of all these values. Conversely, the minimum variation for the decremental case is the maximum of all the (negative) values deduced in the five cases above; this is the step taken before the next migration, as illustrated in the sketch below.
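The minimum described above can be computed directly from the quantities β and γ. The sketch below covers the incremental direction only and uses our own function name, argument layout and tolerance (removal of a vector and the decremental direction are analogous, with a maximum over the negative bounds); it gathers the candidate bounds from the five cases and returns the admissible step.

```python
import numpy as np

def max_step_incremental(h_c, theta_c, gamma_c,
                         theta_S, h_S, beta_S,
                         h_R, gamma_R, theta_R,
                         eps, C):
    """Largest Delta(theta_c) > 0 before the first vector migration
    (incremental direction, adding x_c; a sketch under our own conventions)."""
    candidates = []
    # 1. support vectors: theta_s moves by beta_s * dtheta_c
    for th, hs, b in zip(theta_S, h_S, beta_S):
        if b == 0.0:
            continue
        if np.sign(hs) == np.sign(b):            # theta_s heads towards 0   (S -> O)
            candidates.append(-th / b)
        else:                                    # theta_s heads towards +/-C (S -> E)
            candidates.append((np.sign(b) * C - th) / b)
    # 2./3. reserve vectors: margin moves by gamma_r * dtheta_c
    for hr, g, th in zip(h_R, gamma_R, theta_R):
        if g == 0.0:
            continue
        if abs(th) < 1e-12:                      # other vector may hit the tube (O -> S)
            candidates.append((np.sign(g) * eps - hr) / g)
        elif np.sign(hr) != np.sign(g):          # error vector moving back to the tube (E -> S)
            candidates.append((-np.sign(g) * eps - hr) / g)
    # 4. the current vector may itself reach the tube boundary
    if np.sign(h_c) != np.sign(gamma_c):
        candidates.append((-np.sign(gamma_c) * eps - h_c) / gamma_c)
    # 5. theta_c is bounded by C
    candidates.append(C - theta_c)
    return min(candidates)
```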
4 Incrementing and decrementing the regularization parameter

Training an SVM also requires, at some stage, a fine-tuning of the regularization parameter. Frequently this is done by trial and error. In order to avoid re-training from scratch, the adiabatic algorithm modifies the regularization parameter incrementally or decrementally, in small increments, watching for vector migrations between the sets. In the following, we determine the maximum values of these increments.

Starting from equations (14) and (15), and adopting the notation θk = bk · C for the thresholds of the error vectors, the relations for the margin functions (when only the regularization parameter is modified) can be written as:
$$\Delta h_i = \sum_{j \in S} Q_{ij} \Delta\theta_j + \Big( \sum_{k \in E} Q_{ik} b_k \Big) \Delta C + \Delta w_0 \quad (21)$$
$$0 = \sum_{j \in S} \Delta\theta_j + \Big( \sum_{k \in E} b_k \Big) \Delta C \quad (22)$$
Or, more compactly:
$$\begin{bmatrix} \Delta h_S \\ \Delta h_R \\ 0 \end{bmatrix} = \begin{bmatrix} e_S & Q_{SS} \\ e_R & Q_{RS} \\ 0 & e_S^T \end{bmatrix} \cdot \begin{bmatrix} \Delta w_0 \\ \Delta\theta_S \end{bmatrix} + \begin{bmatrix} \sum_{k \in E} b_k Q_{Sk} \\ \sum_{k \in E} b_k Q_{Rk} \\ \sum_{k \in E} b_k \end{bmatrix} \cdot \Delta C \quad (23)$$
Since the margins of the support vectors remain constant, ∆hS = 0, the variations of the support vector thresholds and of the margins can be computed as
$$\begin{bmatrix} \Delta w_0 \\ \Delta\theta_S \end{bmatrix} = \underbrace{-\begin{bmatrix} 0 & e_S^T \\ e_S & Q_{SS} \end{bmatrix}^{-1} \cdot \underbrace{\begin{bmatrix} \sum_{k \in E} b_k \\ \sum_{k \in E} b_k Q_{Sk} \end{bmatrix}}_{\eta}}_{\beta} \Delta C \quad (24)$$
$$\Delta h_R = \underbrace{\left( \begin{bmatrix} e_R & Q_{RS} \end{bmatrix} \beta + \sum_{k \in E} b_k Q_{Rk} \right)}_{\gamma} \Delta C \quad (25)$$
Let us summarize the migration conditions for each vector type (details can be found in Appendix B).

1. Migration of support vectors:
• for the incremental case (∆C > 0; the conditions may be cumulative):
– if sgn(βs + sgn(hs)) ≠ sgn(hs), the limit is imposed by (migration S → E): ∆C ≤ (−sgn(hs)·C − θs)/(βs + sgn(hs))
– if sgn(βs) = sgn(hs), the limit is imposed by (migration S → O): ∆C ≤ −θs/βs
• for the decremental case (∆C < 0; cumulative):
– if sgn(βs + sgn(hs)) = sgn(hs), the limit is imposed by (migration S → E): ∆C ≥ (−sgn(hs)·C − θs)/(βs + sgn(hs))
– if sgn(βs) ≠ sgn(hs), the limit is imposed by (migration S → O): ∆C ≥ −θs/βs

2. Migration of other vectors. For other (O) vectors, −ε < hr < ε:
• for the incremental case (∆C > 0): ∆C ≤ (sgn(γr)·ε − hr)/γr
• for the decremental case (∆C < 0): ∆C ≥ (−sgn(γr)·ε − hr)/γr

3. Migration of error (E) vectors:
• for the incremental case (∆C > 0):
(a) if sgn(hr) ≠ sgn(γr): ∆C ≤ (−sgn(γr)·ε − hr)/γr
(b) if sgn(hr) = sgn(γr), the limit is not active;
• for the decremental case (∆C < 0):
(a) if sgn(hr) = sgn(γr): ∆C ≥ (sgn(γr)·ε − hr)/γr
(b) if sgn(hr) ≠ sgn(γr), the limit is not active.

4. The variation of C is also limited by a target value Ctarget:
• for the incremental case: ∆C ≤ Ctarget − C
• for the decremental case: ∆C ≥ Ctarget − C

When varying C, the maximum variation before any vector changes sets is given by the minimum of the ∆C values described above for the incremental scenario, or by their maximum for the decremental case. A sketch of the corresponding coefficients is given below.
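For completeness, the same computational pattern as for the θc path applies to the C path of equations (24) and (25): the only difference is the right-hand side η, built from the error-vector signs bk. The function below is again our own illustration (names and argument layout are assumptions), not the plugin's code.

```python
import numpy as np

def c_path_coefficients(Q_SS, Q_SE, Q_RS, Q_RE, b_E):
    """Sensitivities of (w0, theta_S) and of the reserve margins to Delta C,
    following equations (24) and (25).

    Q_SS : (s, s) kernel block between support vectors
    Q_SE : (s, e) kernel block between support and error vectors
    Q_RS : (r, s) kernel block between reserve and support vectors
    Q_RE : (r, e) kernel block between reserve and error vectors
    b_E  : (e,)   signs b_k = theta_k / C of the error vectors (+1 or -1)
    """
    s = Q_SS.shape[0]
    A = np.zeros((s + 1, s + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Q_SS
    eta = np.concatenate(([b_E.sum()], Q_SE @ b_E))      # vector eta of eq. (24)
    beta = -np.linalg.solve(A, eta)                       # [dw0/dC, dtheta_S/dC]
    M = np.hstack([np.ones((Q_RS.shape[0], 1)), Q_RS])
    gamma = M @ beta + Q_RE @ b_E                         # dh_R/dC, eq. (25)
    return beta, gamma
```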
5 Migration when the support vector set is empty

During the training process, the support vector set may become empty. In this circumstance, the variations of the free parameter w0 and of the regularization parameter are very particular.

5.1 Regular training when the support vector set is empty

In this case, equations (19) and (20) cannot be used to determine new values for the threshold parameters. Equation (16) can be written considering that there are no support vectors and that the threshold values of the error and other vectors do not change:
$$\Delta h_c = \Delta h_r = \Delta w_0 \quad (26)$$
The variation of the margin of the current vector is the same as the variation of the margins of all the other vectors. Imposing limits on the allowed change of the margins, before some vector changes sets, identifies the specific vector that migrates first. Depending on the desired direction (increase or decrease of θc), we have two cases:

1. if hc < −ε, then θc increases (θc > 0), and we have the following limits on the margins:
• for an other vector xo, −ε < ho < ε: ∆ho ≤ ε − ho;
• for an error vector xe with he > ε: the restriction is not active;
• for an error vector xe with he < −ε: ∆he ≤ −ε − he;
• for xc: ∆hc ≤ −ε − hc.
2. if hc > ε, then θc decreases (θc < 0), and we have the following limits on the margins:
• for an other vector xo, −ε < ho < ε: ∆ho ≥ −ε − ho;
• for an error vector xe with he > ε: ∆he ≥ ε − he;
• for an error vector xe with he < −ε: the restriction is not active;
• for xc: ∆hc ≥ ε − hc.

These rules are valid for both the incremental and the decremental case. The margin of the xc vector is sought to either increase (first situation) or decrease (second situation). It is sufficient to take ∆w0 to be the smallest (or largest, depending on the desired direction) of these quantities. Varying the margins in this way guarantees that one of them reaches the −ε or ε limit, ensuring that the support vector set does not remain empty. A small sketch of this update is given below.
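Following the rules above, ∆w0 can be chosen in a few lines. The sketch below uses our own function name and array conventions and assumes the reserve vectors are described by their margins and thresholds; it is an illustration, not the plugin's code.

```python
import numpy as np

def empty_set_w0_step(h_c, h_R, theta_R, eps):
    """Choose Delta w0 when the support vector set is empty.  All margins
    shift by the same Delta w0, so the step is the smallest (or largest)
    admissible margin change among the current and the reserve vectors."""
    if h_c < -eps:                      # margins must increase
        bounds = [-eps - h_c]           # the current vector itself
        for hr, th in zip(h_R, theta_R):
            if abs(th) < 1e-12:         # other vector
                bounds.append(eps - hr)
            elif hr < -eps:             # error vector below the tube
                bounds.append(-eps - hr)
        return min(bounds)              # smallest positive shift
    else:                               # h_c > eps: margins must decrease
        bounds = [eps - h_c]
        for hr, th in zip(h_R, theta_R):
            if abs(th) < 1e-12:
                bounds.append(-eps - hr)
            elif hr > eps:              # error vector above the tube
                bounds.append(eps - hr)
        return max(bounds)              # largest (least negative) shift
```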
5.2 Updating the regularization parameter when the support vector set is empty

When the support vector set is empty, the regularization parameter can no longer be modified at the expense of the support vectors' threshold values. Relations (21) and (22) become:
$$\Delta h_i = \Big( \sum_{k \in E} Q_{ik} b_k \Big) \Delta C + \Delta w_0 \quad (27)$$
$$0 = \sum_{k \in E} b_k \Delta C \quad (28)$$
Relation (28) holds regardless of the variation of C. As opposed to how we proceeded previously, we assume that the w0 parameter is kept unchanged, so that ∆hi = µi ∆C, with the notation µi = Σ_{k∈E} Qik bk.

1. when C increases (∆C > 0, incremental):
(a) for other vectors xr ∈ O, −ε < hr < ε:
• if µr > 0 then ∆hr > 0, thus ∆hr ≤ ε − hr or µr ∆C ≤ ε − hr, which leads to ∆C ≤ (ε − hr)/µr;
• if µr < 0 then ∆hr < 0, thus ∆hr ≥ −ε − hr or µr ∆C ≥ −ε − hr, which leads to ∆C ≤ (−ε − hr)/µr;
(b) for error vectors xr ∈ E with θr = C, hr < −ε:
• if µr > 0 then ∆hr > 0, thus ∆hr ≤ −ε − hr or µr ∆C ≤ −ε − hr, which leads to ∆C ≤ (−ε − hr)/µr;
• if µr < 0 then ∆hr < 0, and the condition is not active;
(c) for error vectors xr ∈ E with θr = −C, hr > ε:
• if µr > 0 then ∆hr > 0, and the condition is not active;
• if µr < 0 then ∆hr < 0, thus ∆hr ≥ ε − hr or µr ∆C ≥ ε − hr, which leads to ∆C ≤ (ε − hr)/µr.
In brief, a minimum is computed over all the bounds imposed on the vectors by conditions of the form
$$\Delta C \le \frac{set \cdot \mathrm{sgn}(\mu_r)\, \epsilon - h_r}{\mu_r} \quad (29)$$
where set = 1 if xr ∈ O and set = −1 if xr ∈ E. The condition is inactive if sgn(µr) = sgn(hr) for xr ∈ E.

2. when C decreases (∆C < 0, decremental): proceeding in the same manner, a maximum is computed over all the bounds determined by the analogous conditions
$$\Delta C \ge \frac{-set \cdot \mathrm{sgn}(\mu_r)\, \epsilon - h_r}{\mu_r} \quad (30)$$
where set is defined as above. The condition is inactive if sgn(µr) ≠ sgn(hr), for xr ∈ E.

6 Experimental results

In our experiments, we compare the IDSVMR with the MLP, two SVM-based regression models, and two incremental learning models derived by us in previous work from adaptive resonance theory: the Fuzzy ARTMAP with Relevance (FAMR) and the Bayesian ARTMAP for Regression (BAR). Since the last two models are less known, we include here a short description of them.

FAMR was introduced in [1] as an extension of the classical Fuzzy ARTMAP (FAM) [15] and of PROBART [16]. FAMR can be used as a classifier, a regression model, and a posterior probability estimator. Each training pattern can come with its own relevance factor, which influences the probabilistic links formed between the input and output categories. A relevance factor allows the ranking of sample pairs according to the confidence one has in the information source. FAMR builds one-to-many mappings between input and output categories through maximum likelihood, approximating the probability of associations between input and output categories. FAMR comes with a stochastic approximation procedure, which makes use of the relevance factor associated with the training patterns. Under some mild constraints, FAMR's Mapfield values converge, both in mean square and with probability one, to the posterior probability P(k|j) between the jth input category and the kth output category.

Bayesian ARTMAP (BA) [17] is a neural architecture which combines FAM competitive learning with Bayesian learning. BA modifies some of the characteristics of the FAM algorithm, mainly by replacing the hyperrectangular categories with Gaussian categories. The categories may grow or shrink, and they are probabilistically associated with classes. BA probabilistically infers the classes associated with the input categories by using all input categories, unlike the competitive approach used by FAM and FAMR. In [18], the BA is extended to function approximation. The resulting architecture, the Bayesian ARTMAP for Regression (BAR), generalizes the BA algorithm using the clustering functionality of both the input and output modules. The BAR has the universal approximation capability and also the best approximation property, which is very important in regression [18].
Both the FAMR and the BAR can be trained incrementally, but not decrementally. This is typical for FAM models.

For benchmarking, we use the following public datasets [6]: CPU Computer Hardware (CPU), Boston Housing (BH), Wisconsin Breast Cancer (WBC), and Communities and Crime (CC). From the CPU dataset we removed the following input features: vendor name, model name, and estimated relative performance. From the WBC dataset, we removed the four instances with missing values and also the outcome attribute. The first five features of the CC dataset are acknowledged (by the dataset donors) not to be predictive; we removed them, together with the features with missing values.

The original datasets are described in [6]. The filtered datasets, obtained as explained above (available at http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html), are synthetically described in Table 1. To support performance comparisons across the various benchmarks, both input and output values are independently scaled between 0 and 1.

Table 1: Synthetic description of the four benchmark datasets used. Some instances and/or attributes were removed from the original datasets.

Dataset | Input attributes | Instances
CPU     | 6                | 209
BH      | 13               | 506
WBC     | 32               | 194
CC      | 99               | 1994

We compare the regression capabilities of IDSVMR, BAR, and FAMR. We are also interested in comparing these incremental models with the popular MLP model and with two classical SVM-based regression algorithms, namely ε-SVR [20] and ν-SVR [21]. This is of interest because IDSVMR, ε-SVR and ν-SVR have overlapping inductive biases, and together they cover both incremental and non-incremental adaptive models. We include the MLP in the group of non-incremental models: even if it is trained with mini-batches of data, the MLP does not learn new information without possibly corrupting previously learned information (the stability-plasticity condition is not addressed). For FAMR, BAR and IDSVMR, we use our own Weka plugins, whereas for ε-SVR, ν-SVR and MLP we use the Weka-provided implementations [10].

We use ten random permutations (shuffles) of each dataset. For each permutation, 66% of the data is used as the training/validation set and the rest serves exclusively as the test set. A ten-fold cross-validation on the training/validation set is performed for fine-tuning the hyperparameters of each model. For MLP, BAR and FAMR, paper [18] enumerates the sought hyperparameters and their corresponding search ranges. For IDSVMR, the hyperparameters are the regularization coefficient C (shown in equation (2)) and σ, the width of the Gaussian kernel (note that Section 1.1 and the subsequent ones "hide" this hyperparameter under the notation Qij; nevertheless, a proper value has to be sought for it as well). For the CC dataset, we seek an optimal value of C in {4, 4.5, ..., 8}, and σ is sought over the candidate values {0.5, 0.7, 0.9}. For the other three datasets, C is sought in the set {0.2, 0.4, ..., 10}, and the candidate values for σ are in the set {0.5, 0.6, ..., 0.9}. The same hyperparameters and corresponding search ranges as for IDSVMR are used for both ε-SVR and ν-SVR.

After cross-validation, we train the optimized models on the whole training/validation set, and their generalization capability is assessed on the test set. The test set is used solely in this final assessment. The reported scoring values are the root mean squared error (RMSE) and the mean absolute error (MAE).
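For the two standard baselines, a comparable cross-validation protocol can be reproduced, for instance, with scikit-learn. The sketch below is our own illustration: the paper's experiments were run as Weka plugins under the Weka Experimenter, whereas here we use scikit-learn's SVR and NuSVR, fabricated data in place of the benchmark files, and a grid similar to the one described above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR, NuSVR

# stand-in for one of the benchmark datasets, so the sketch runs on its own
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = rng.random(200)

# scale inputs and outputs to [0, 1], as in the experimental protocol
X = MinMaxScaler().fit_transform(X)
y = MinMaxScaler().fit_transform(y.reshape(-1, 1)).ravel()

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.66,
                                                    random_state=0)

# grid comparable to the one used for the smaller datasets (C and the RBF width)
param_grid = {"C": np.arange(0.2, 10.2, 0.2),
              "gamma": [0.5, 0.6, 0.7, 0.8, 0.9]}

for estimator in (SVR(kernel="rbf", epsilon=0.1), NuSVR(kernel="rbf")):
    search = GridSearchCV(estimator, param_grid, cv=10,
                          scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    test_rmse = -search.score(X_test, y_test)
    print(type(estimator).__name__, search.best_params_, round(test_rmse, 3))
```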
Table 2 contains the average RMSE and MAE values over the ten permutations, measured on the test sets. We split the table into two groups: the first contains only the incremental models, the second is devoted to the MLP and to the two SVM-based regression models. The results for MLP, FAMR, and BAR are the ones found in [18]. All the steps for cross-validation, learning, and performance assessment are performed under the Weka Experimenter framework [10].

Table 2: Performance assessment results for ten random permutations of each of the four benchmark datasets, split into two groups: the incremental models (FAMR, BAR and IDSVMR) and the non-incremental ones (ε-SVR, ν-SVR and MLP). Every cell contains the pair of RMSE and MAE values averaged on the test sets. The loss values for MLP, FAMR and BAR are the ones reported in [18].

Model  | CPU        | BH         | WBC        | CC
FAMR   | 0.09, 0.06 | 0.15, 0.12 | 0.31, 0.27 | 0.21, 0.14
BAR    | 0.12, 0.06 | 0.17, 0.12 | 0.27, 0.23 | 0.20, 0.14
IDSVMR | 0.12, 0.05 | 0.09, 0.05 | 0.26, 0.22 | 0.14, 0.09
ε-SVR  | 0.07, 0.05 | 0.09, 0.06 | 0.26, 0.22 | 0.16, 0.12
ν-SVR  | 0.05, 0.02 | 0.09, 0.05 | 0.26, 0.22 | 0.15, 0.11
MLP    | 0.19, 0.14 | 0.15, 0.11 | 0.30, 0.25 | 0.15, 0.12

For all four datasets, within the group of incremental models, the minimum MAE is obtained by IDSVMR. Except for the CPU dataset, the lowest RMSE is also produced by IDSVMR; for CPU, IDSVMR produces the median RMSE value, tied with the BAR. For the BH dataset, IDSVMR largely outperforms the other incremental models. Overall, IDSVMR exhibits a consistent tendency to produce the lowest error scores in its group.

Among the non-incremental models, ν-SVR always obtains the lowest scores. Except for the CPU dataset, the ε-SVR and ν-SVR models have similar performance. The MLP scores are close to the ones obtained by ν-SVR for CC, while for the other datasets the MLP shows rather modest performance.

Finally, we compare the incremental and non-incremental models. For CPU, ν-SVR obtains better performance scores than any incremental model. For BH and WBC, ν-SVR is on a par with IDSVMR. For CC, IDSVMR outperforms any non-incremental model.

Conclusion

The IDSVMR proves to be a performant and functional implementation of the adiabatic incremental/decremental SVR model. We have described it here with complete implementation details and we have implemented it as a Weka plugin.

In conclusion, the newly introduced IDSVMR is a promising incremental regression model, comparing favorably with the other incremental algorithms in terms of performance. Due to the differences between the number and ranges of the hyperparameters, it is rather meaningless to compare execution times for the cross-validation stage used for hyperparameter optimization.

SVMs (and SVRs) are known to be relatively slow during training. Incremental/decremental training is therefore very attractive when dealing with large datasets and with data streams.

Bibliography

[1] Andonie, R.; Sasu, L. (2006); Fuzzy ARTMAP with input relevances, IEEE Transactions on Neural Networks, 17: 929-941.

[2] Carpenter, G.A.; Grossberg, S. (1988); The ART of Adaptive Pattern Recognition by a Self-Organizing Neural Network, IEEE Computer, 77-88.

[3] Cauwenberghs, G.; Poggio, T. (2000); Incremental and Decremental Support Vector Machine Learning, Neural Information Processing Systems, 409-415.
[4] Chang, C.C.; Lin, C.J. (2001); LIBSVM: a Library for Support Vector Machines, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

[5] Diehl, C.P.; Cauwenberghs, G. (2003); SVM Incremental Learning, Adaptation and Optimization, Proceedings of the IJCNN, 4: 2685-2690.

[6] Frank, A.; Asuncion, A. (2010); UCI Machine Learning Repository, http://archive.ics.uci.edu/ml

[7] Galmeanu, H.; Andonie, R. (2008); Incremental/Decremental SVM for Function Approximation, Proceedings of the 11th Intl. Conf. on Optimization of Electrical and Electronic Equipment, 2: 155-160.

[8] Golub, G.H.; Van Loan, C.F. (1996); Matrix Computations, JHU Press, Baltimore and London.

[9] Gu, B.; Wang, J.D.; Yu, Y.; Zheng, G.S.; Yu, F.; Xu, T. (2012); Accurate on-line ν-support vector learning, Neural Networks, 27: 51-59.

[10] Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. (2009); The WEKA data mining software: an update, SIGKDD Explorations Newsletter, 11(1): 10-18.

[11] Karasuyama, M.; Takeuchi, I. (2010); Multiple incremental decremental learning of support vector machines, IEEE Transactions on Neural Networks, 21(7): 1048-1059.

[12] Laskov, P.; Gehl, C.; Krüger, S.; Müller, K.R. (2006); Incremental Support Vector Learning: Analysis, Implementation and Applications, Journal of Machine Learning Research, 7: 1909-1936.

[13] Martin, M. (2002); On-line Support Vector Machines for Function Approximation, Technical Report LSI-02-11-R, Software Department, Universitat Politecnica de Catalunya.

[14] Ma, J.; Theiler, J.; Perkins, S. (2003); Accurate On-line Support Vector Regression, Neural Computation, 15(11): 2683-2703.

[15] Carpenter, G.A.; Grossberg, S.; Markuzon, N.; Reynolds, J.H.; Rosen, D.B. (1992); Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Transactions on Neural Networks, 3: 698-713.

[16] Marriott, S.; Harrison, R.F. (1995); A modified fuzzy ARTMAP architecture for the approximation of noisy mappings, Neural Networks, 8(4): 619-641.

[17] Vigdor, B.; Lerner, B. (2007); The Bayesian ARTMAP, IEEE Transactions on Neural Networks, 18: 1628-1644.

[18] Sasu, L.M.; Andonie, R. (2013); Bayesian ARTMAP for regression, Neural Networks, 46: 23-31.

[19] Shashua, A. (2009); Introduction to Machine Learning: Class Notes 67577, https://arxiv.org/abs/0904.3664v1

[20] Vapnik, V. (1998); Statistical Learning Theory, New York: Wiley.

[21] Schölkopf, B.; Smola, A.J.; Williamson, R.C.; Bartlett, P.L. (2000); New support vector algorithms, Neural Computation, 12(5): 1207-1245.

Appendix A  Migration of vectors on learning and unlearning

1. Migration of support (S) vectors. From equation (19), ∆θs = βs · ∆θc, and the following cases are possible:
(a) hs = ε, for −C < θs < 0:
i. if incremental (∆θc > 0) and βs > 0: ∆θs > 0, θs can reach 0; βs · ∆θc = ∆θs ≤ 0 − θs, thus ∆θc ≤ −θs/βs (S → O)
ii. if incremental (∆θc > 0) and βs < 0: ∆θs < 0, θs can reach −C; βs · ∆θc = ∆θs ≥ −C − θs, thus ∆θc ≤ (−C − θs)/βs (S → E)
iii. if decremental (∆θc < 0) and βs > 0: ∆θs < 0, θs can reach −C; βs · ∆θc = ∆θs ≥ −C − θs, thus ∆θc ≥ (−C − θs)/βs (S → E)
iv. if decremental (∆θc < 0) and βs < 0: ∆θs > 0, θs can reach 0; βs · ∆θc = ∆θs ≤ 0 − θs, thus ∆θc ≥ −θs/βs (S → O)
(b) hs = −ε, for 0 < θs < C:
i. if incremental (∆θc > 0) and βs > 0: ∆θs > 0, θs can reach C; βs · ∆θc = ∆θs ≤ C − θs, thus ∆θc ≤ (C − θs)/βs (S → E)
ii. if incremental (∆θc > 0) and βs < 0: ∆θs < 0, θs can reach 0; βs · ∆θc = ∆θs ≥ 0 − θs, thus ∆θc ≤ −θs/βs (S → O)
iii. if decremental (∆θc < 0) and βs > 0: ∆θs < 0, θs can reach 0; βs · ∆θc = ∆θs ≥ 0 − θs, thus ∆θc ≥ −θs/βs (S → O)
iv. if decremental (∆θc < 0) and βs < 0: ∆θs > 0, θs can reach C; βs · ∆θc = ∆θs ≤ C − θs, thus ∆θc ≥ (C − θs)/βs (S → E)
2. Migration of other (O) vectors. These are the vectors with −ε < hr < ε, θr = 0, ∆hr = γr · ∆θc.
(a) if incremental (∆θc > 0) and γr > 0: ∆hr > 0, hr can reach ε; γr · ∆θc = ∆hr ≤ ε − hr, thus ∆θc ≤ (ε − hr)/γr (O → S)
(b) if incremental (∆θc > 0) and γr < 0: ∆hr < 0, hr can reach −ε; γr · ∆θc = ∆hr ≥ −ε − hr, thus ∆θc ≤ (−ε − hr)/γr (O → S)
(c) if decremental (∆θc < 0) and γr > 0: ∆hr < 0, hr can reach −ε; γr · ∆θc = ∆hr ≥ −ε − hr, thus ∆θc ≥ (−ε − hr)/γr (O → S)
(d) if decremental (∆θc < 0) and γr < 0: ∆hr > 0, hr can reach ε; γr · ∆θc = ∆hr ≤ ε − hr, thus ∆θc ≥ (ε − hr)/γr (O → S)

3. Migration of the error (E) vectors.
(a) for hr > ε, θr = −C, ∆hr = γr · ∆θc:
i. if incremental (∆θc > 0) and γr > 0: ∆hr > 0, hr increases even further, the vector does not migrate;
ii. if incremental (∆θc > 0) and γr < 0: ∆hr < 0, hr may reach ε; γr · ∆θc = ∆hr ≥ ε − hr, thus ∆θc ≤ (ε − hr)/γr (E → S)
iii. if decremental (∆θc < 0) and γr < 0: ∆hr > 0, hr increases even further, the vector does not migrate;
iv. if decremental (∆θc < 0) and γr > 0: ∆hr < 0, hr may reach ε; γr · ∆θc = ∆hr ≥ ε − hr, thus ∆θc ≥ (ε − hr)/γr (E → S)
(b) for hr < −ε, θr = C, ∆hr = γr · ∆θc:
i. if incremental (∆θc > 0) and γr > 0: ∆hr > 0, hr may reach −ε; γr · ∆θc = ∆hr ≤ −ε − hr, thus ∆θc ≤ (−ε − hr)/γr (E → S)
ii. if incremental (∆θc > 0) and γr < 0: ∆hr < 0, hr decreases even further, the vector does not migrate;
iii. if decremental (∆θc < 0) and γr > 0: ∆hr < 0, hr decreases even further, the vector does not migrate;
iv. if decremental (∆θc < 0) and γr < 0: ∆hr > 0, hr may reach −ε; γr · ∆θc = ∆hr ≤ −ε − hr, thus ∆θc ≥ (−ε − hr)/γr (E → S)

4. Migration of the current xc vector. The newly added vector xc is checked for a possible migration to the support set (S):
(a) for hc > ε:
i. if incremental (∆θc > 0) and γc > 0: ∆hc > 0, hc increases even further, the vector does not migrate;
ii. if incremental (∆θc > 0) and γc < 0: ∆hc < 0, hc may reach ε; γc · ∆θc = ∆hc ≥ ε − hc, thus ∆θc ≤ (ε − hc)/γc (→ S)
iii. if decremental (∆θc < 0) and γc > 0: ∆hc < 0, hc may reach ε; γc · ∆θc = ∆hc ≥ ε − hc, thus ∆θc ≥ (ε − hc)/γc (→ S)
iv. if decremental (∆θc < 0) and γc < 0: ∆hc > 0, hc increases even further, the vector does not migrate;
(b) for hc < −ε:
i. if incremental (∆θc > 0) and γc > 0: ∆hc > 0, hc may reach −ε; γc · ∆θc = ∆hc ≤ −ε − hc, thus ∆θc ≤ (−ε − hc)/γc (→ S)
ii. if incremental (∆θc > 0) and γc < 0: ∆hc < 0, hc decreases even further, the vector does not migrate;
iii. if decremental (∆θc < 0) and γc > 0: ∆hc < 0, hc decreases even further, the vector does not migrate;
iv. if decremental (∆θc < 0) and γc < 0: ∆hc > 0, hc may reach −ε; γc · ∆θc = ∆hc ≤ −ε − hc, thus ∆θc ≥ (−ε − hc)/γc (→ S)

Appendix B  Migration of vectors on varying the regularization parameter

1. Migration of the support vectors.
(a) for ∆C < 0, when C is decreasing (decremental):
i. for a support vector with 0 ≤ θs ≤ C, hs = −ε, the variation of C is limited by 0 ≤ θs + ∆θs ≤ C + ∆C:
A. for βs ≤ 0:
• βs · ∆C = ∆θs ≥ 0, so θs + ∆θs ≥ 0 is always fulfilled;
• θs + βs∆C ≤ C + ∆C, so (βs − 1)∆C ≤ C − θs leads to ∆C ≥ (C − θs)/(βs − 1); ∆θs ≥ 0, θs can reach C (migration S → E)
B. for 0 ≤ βs ≤ 1, both conditions are active:
• 0 ≤ θs + βs∆C, leading to ∆C ≥ −θs/βs; ∆θs ≤ 0, θs can reach 0 (migration S → O)
• θs + βs∆C ≤ C + ∆C, (βs − 1)∆C ≤ C − θs, leading to ∆C ≥ (C − θs)/(βs − 1); ∆θs = βs∆C, θs can reach C (migration S → E)
C. for βs ≥ 1:
• θs + βs∆C ≥ 0 leads to ∆C ≥ −θs/βs, θs decreases (migration S → O)
• θs + βs∆C ≤ C + ∆C, or (βs − 1)∆C ≤ C − θs, is always fulfilled.
Summary:
• if βs − 1 ≤ 0, we have an S → E migration: ∆C ≥ (C − θs)/(βs − 1)
• if βs ≥ 0, we have an S → O migration: ∆C ≥ −θs/βs
ii. for a support vector with −C ≤ θs ≤ 0, hs = +ε, the variation of C is limited by −C − ∆C ≤ θs + ∆θs ≤ 0:
A. for βs ≤ −1:
• −C − ∆C ≤ θs + βs∆C, or −C − θs ≤ (βs + 1)∆C, is always fulfilled;
• θs + βs∆C ≤ 0 leads to ∆C ≥ −θs/βs; ∆θs ≥ 0, θs can reach 0 (migration S → O)
B. for −1 ≤ βs ≤ 0, both conditions are active:
• −C − ∆C ≤ θs + βs∆C, leading to ∆C ≥ (−C − θs)/(βs + 1) (migration S → E)
• θs + βs∆C ≤ 0 leads to ∆C ≥ −θs/βs (migration S → O)
C. for βs ≥ 0:
• −C − ∆C ≤ θs + βs∆C, or −C − θs ≤ (βs + 1)∆C, leading to ∆C ≥ (−C − θs)/(βs + 1) (migration S → E)
• θs + βs∆C ≤ 0 is always fulfilled.
Summary:
• if βs ≤ 0, we have an S → O migration: ∆C ≥ −θs/βs
• if βs + 1 ≥ 0, we have an S → E migration: ∆C ≥ (−C − θs)/(βs + 1)
(b) for ∆C > 0, when C is increasing (incremental):
i. for a support vector with 0 ≤ θs ≤ C, hs = −ε, the variation of C is limited by 0 ≤ θs + ∆θs ≤ C + ∆C:
A. for βs ≤ 0:
• θs + βs∆C ≥ 0, or βs∆C ≥ −θs, leads to ∆C ≤ −θs/βs (migration S → O)
• θs + βs∆C ≤ C + ∆C, or (βs − 1)∆C ≤ C − θs, is always fulfilled;
B. for 0 ≤ βs ≤ 1:
• θs + βs∆C ≥ 0, or βs∆C ≥ −θs, is always fulfilled;
• θs + βs∆C ≤ C + ∆C, or (βs − 1)∆C ≤ C − θs, is also always fulfilled;
C. for βs ≥ 1:
• θs + βs∆C ≥ 0, or βs∆C ≥ −θs, is always fulfilled;
• θs + βs∆C ≤ C + ∆C, or (βs − 1)∆C ≤ C − θs, leads to ∆C ≤ (C − θs)/(βs − 1) (migration S → E)
Summary:
• if βs ≤ 0, we have an S → O migration: ∆C ≤ −θs/βs
• if βs − 1 ≥ 0, we have an S → E migration: ∆C ≤ (C − θs)/(βs − 1)
ii. for a support vector with −C ≤ θs ≤ 0, hs = +ε, the variation of C is limited by −C − ∆C ≤ θs + ∆θs ≤ 0:
A. for βs ≤ −1:
• −C − ∆C ≤ θs + βs∆C, or −C − θs ≤ (βs + 1)∆C, leads to ∆C ≤ (−C − θs)/(βs + 1) (migration S → E)
• θs + βs∆C ≤ 0, or βs∆C ≤ −θs, is always fulfilled;
B. for −1 ≤ βs ≤ 0:
• −C − ∆C ≤ θs + βs∆C is always fulfilled;
• θs + βs∆C ≤ 0 is also always fulfilled;
C. for βs ≥ 0:
• −C − ∆C ≤ θs + βs∆C is always fulfilled;
• θs + βs∆C ≤ 0, or βs∆C ≤ −θs, leads to ∆C ≤ −θs/βs (migration S → O)
Summary:
• if βs + 1 ≤ 0, we have an S → E migration: ∆C ≤ (−C − θs)/(βs + 1)
• if βs ≥ 0, we have an S → O migration: ∆C ≤ −θs/βs

For the migration of support vectors, these conditions are summarized in Section 4. Hence, we have (the conditions may be cumulative):
• for incremental (∆C > 0):
– if sgn(βs + sgn(hs)) ≠ sgn(hs), the limit is imposed by (migration S → E): ∆C ≤ (−sgn(hs)·C − θs)/(βs + sgn(hs))
– if sgn(βs) = sgn(hs), the limit is imposed by (migration S → O): ∆C ≤ −θs/βs
• for decremental (∆C < 0):
– if sgn(βs + sgn(hs)) = sgn(hs), the limit is imposed by (migration S → E): ∆C ≥ (−sgn(hs)·C − θs)/(βs + sgn(hs))
– if sgn(βs) ≠ sgn(hs), the limit is imposed by (migration S → O): ∆C ≥ −θs/βs

2. Migration of other vectors. For other (O) vectors, −ε < hr < ε, where θr = 0; ∆hr = γr · ∆C.
(a) for ∆C < 0, when C is decreasing (decremental):
i. for γr > 0: ∆hr < 0, hr may reach −ε; γr∆C ≥ −ε − hr leads to ∆C ≥ (−ε − hr)/γr (migration O → S)
ii. for γr < 0: ∆hr > 0, hr may reach ε; γr∆C ≤ ε − hr leads to ∆C ≥ (ε − hr)/γr (migration O → S)
(b) for ∆C > 0, when C is increasing (incremental):
i. for γr > 0: ∆hr > 0, hr may reach ε; γr∆C ≤ ε − hr leads to ∆C ≤ (ε − hr)/γr (migration O → S)
ii. for γr < 0: ∆hr < 0, hr may reach −ε; γr∆C ≥ −ε − hr leads to ∆C ≤ (−ε − hr)/γr (migration O → S)
Summarizing the O → S migration:
• for incremental (∆C > 0): ∆C ≤ (sgn(γr)·ε − hr)/γr
• for decremental (∆C < 0): ∆C ≥ (−sgn(γr)·ε − hr)/γr
3. Migration of error (E) vectors.
(a) for error vectors with hr > ε, where θr = −C; ∆hr = γr · ∆C:
i. for ∆C < 0 and γr < 0, or ∆C > 0 and γr > 0: ∆hr > 0, hr increases even further, there is no migration;
ii. for ∆C < 0 and γr > 0: ∆hr < 0, hr may reach ε; γr∆C ≥ ε − hr leads to ∆C ≥ (ε − hr)/γr (migration E → S)
iii. for ∆C > 0 and γr < 0: ∆hr < 0, hr may reach ε; γr∆C ≥ ε − hr leads to ∆C ≤ (ε − hr)/γr (migration E → S)
(b) for error vectors with hr < −ε, where θr = C; ∆hr = γr · ∆C:
i. for ∆C < 0 and γr > 0, or ∆C > 0 and γr < 0: ∆hr < 0, hr decreases even further, there is no migration;
ii. for ∆C < 0 and γr < 0: ∆hr > 0, hr may reach −ε; γr∆C ≤ −ε − hr leads to ∆C ≥ (−ε − hr)/γr (migration E → S)
iii. for ∆C > 0 and γr > 0: ∆hr > 0, hr may reach −ε; γr∆C ≤ −ε − hr leads to ∆C ≤ (−ε − hr)/γr (migration E → S)
Summarizing the E → S migration:
• for incremental (∆C > 0):
(a) if sgn(hr) ≠ sgn(γr): ∆C ≤ (−sgn(γr)·ε − hr)/γr
(b) if sgn(hr) = sgn(γr), the limit is not active;
• for decremental (∆C < 0):
(a) if sgn(hr) = sgn(γr): ∆C ≥ (sgn(γr)·ε − hr)/γr
(b) if sgn(hr) ≠ sgn(γr), the limit is not active.

4. The variation of C is also limited by a target value Ctarget:
• for incremental: ∆C ≤ Ctarget − C
• for decremental: ∆C ≥ Ctarget − C