Microsoft Word - 211.docx


 CHEMICAL ENGINEERING TRANSACTIONS  
 

VOL. 61, 2017 

A publication of 

 
The Italian Association 
of Chemical Engineering 
Online at www.aidic.it/cet 

Guest Editors: Petar S Varbanov, Rongxin Su, Hon Loong Lam, Xia Liu, Jiří J Klemeš 
Copyright © 2017, AIDIC Servizi S.r.l. 

ISBN 978-88-95608-51-8; ISSN 2283-9216 

Batch Process Scheduling under Uncertainty using Data-

Driven Multistage Adaptive Robust Optimization 

Chao Ning, Fengqi You*  

Cornell University, Ithaca, New York 14853, USA 

fengqi.you@cornell.edu 

This paper proposes a novel data-driven batch process scheduling approach based on multistage adaptive 

robust optimization coupled with robust kernel density estimation (RKDE). The kernelized iteratively re-weighted 

lease squares (KIRWLS) algorithm combined with kernel tricks are adopted to learn the probability density 

function from outlier-corrupted uncertain processing time data. We then propose a data-driven outlier-resilient 

uncertainty set for scheduling problem using the extracted distributional information. The proposed framework 

exhibits robustness to contamination of uncertainty data by integrating robust optimization with robust statistics. 

The batch process scheduling is then formulated as a data-driven multistage decision-making problem. By 

introducing affine decision rules for recourse variables, the resulting data-driven multistage adaptive robust 

optimization problem can be solved efficiently. We apply the proposed data-driven multistage adaptive robust 

optimization to a multipurpose batch process scheduling problem using a dataset to demonstrate the superiority 

of the proposed method. Our proposed approach generates $13,851 more profits than those of multistage 

adaptive robust optimization with box set. Compared with the multistage adaptive robust optimization using 

kernel density estimation (KDE), the result returned from the proposed method generates $4,064 more profits.  

1. Introduction 

In the past decades, batch processes are becoming increasingly important in modern manufacturing industries 

(Hegyháti and Friedler, 2010). Due to their customer-oriented nature, multiproduct batch plants have been 

widely employed in manufacturing high-value-added products, such as fine chemicals and pharmaceuticals. 

Therefore, considerable research effort has been made in batch process scheduling (Sun et al., 2016). The 

uncertain nature of some parameters in scheduling models necessitates the research in batch process 

scheduling under uncertainty (Shi and You, 2016). Among these methods, robust optimization gains popularity 

due to its strong ability to hedge against uncertainties and also because of its computational attractiveness (Ben-

Tal et al., 2004), and it has been applied in biomass processing network (Gong et al., 2016), biofuel supply 

chain (Tong et al., 2014) and microalgae production (Gong and You, 2017). Recently, a nested stochastic robust 

optimization framework was proposed to handle multi-scale uncertainties (Yue and You, 2016). However, most 

of the existing robust optimization based approaches assume a perfect knowledge on the region of uncertainty 

parameters or uncertainty set, which might not be realistic in some applications. In practice, what process 

operators have is the historical data of these uncertain parameters, and this is especially true in the era of big 

data. Besides, datasets in process industries are often contaminated with outliers. Data outliers arise for various 

reasons, such as sensor malfunction, human recording error and disturbance in processes (Liu et al., 2004). A 

main drawback of the existing robust scheduling methods is the lack of a proper mechanism to deal with these 

outliers. Thus, their uncertainty sets will be distorted in the presence of outliers. Consequently, their robust 

solutions could be very conservative, meaning a less profitable schedule when using real datasets. 

This paper proposes a novel data-driven multistage adaptive robust optimization (ARO) modeling framework 

and its solution strategy to address batch process scheduling under uncertainty. The kernelized iteratively re-

weighted lease squares (KIRWLS) algorithm is adopted to extract the probability density functions of uncertain 

parameters from corrupted uncertain processing time data (Kim and Scott, 2012). The RKDE can be interpreted 

as a weighted version of kernel density estimation (KDE), in which weights of outliers are diminished (Kim and 

Scott, 2012). In this way, the RKDE approach captures a reliable probability distribution in the presence of 

                               
DOI: 10.3303/CET1761259

 
Please cite this article as: Ning C., You F., 2017, Batch process scheduling under uncertainty using data-driven multistage adaptive robust 
optimization, Chemical Engineering Transactions, 61, 1567-1572  DOI:10.3303/CET1761259  

1567


outliers in uncertainty data. We then propose a data-driven uncertainty set for multistage ARO using quantile 

functions. The proposed data-driven multistage ARO exhibits robustness to contamination of uncertainty data 

by exploiting the idea from robust statistics in the context of robust optimization. To address the computational 

challenge, we introduce affine decision rules for recourse variables, and reformulate the problem as a 

deterministic problem. We apply the proposed data-driven multistage ARO to an industrial scale multipurpose 

batch process scheduling under processing time uncertainty.  

2. Robust kernel density estimation 

We assume uncertain parameters follow some unknown distribution with a density function f, and u1, …, uN are 

N realizations of uncertainties. The KDE is shown as follows. 

   
1

1ˆ ,
N

KDE h i

i

f K
N 

 u u u    (1) 

where Kh is a kernel function with a positive bandwidth h. The Gaussian kernel, which is given below. 

 
2

1
, exp

22

d

i

h i
K

hh

  
   
   

u u
u u    (2) 

Since Gaussian kernel is a positive semi-definite kernel, there exists a mapping   from Rd to H such that 

     , ,h i iK   u u u u (Kim and Scott, 2012). H is an infinite dimensional Hilbert space of functions, also 

known as reproducing kernel Hilbert space (RKHS). 

We employ RKDE, and it is casted as a minimization problem shown as follows (Kim and Scott, 2012). 

  
1

min
N

i
g H

i

g




  u    (3) 

where   is a robust loss function, and the notation   represents the first derivative of robust loss function  . 

The objective function is denoted as  J g . The KIRWLS algorithm is given below (Kim and Scott, 2012). 

 
Figure 1: Flowchart of the KIRWLS algorithm. 

3. Data-driven multistage ARO based batch process scheduling model 

In this section, the data-driven multistage ARO based process scheduling model is proposed. In a scheduling 

problem, the following information is given: the structure of the facility, recipe data, equipment units and their 

1568


sizes, storage capacities of intermediates, product prices and processing times (Chu and You, 2015). The data-

driven multistage ARO based process scheduling model is presented as follows. 

The objective function of the scheduling model is expressed as the revenue from selling products minus the 

cost of purchasing raw materials, and the expression is given as follows.  

 0max s sN s
s

price ST ST          (4)

where the index s presents state, N is the total number of time points, prices is the price of chemical s and 

STsn is a continuous decision variable representing the storage amount of chemical s at time point n. 

Constraint (5) specifies that the batch size of task i should not exceed its maximum batch size or fall below its 

minimum batch size when the task i is scheduled. 

min max
  , ,

i inn inn i inn n
b W B b W n N n N j J



  
              (5) 

where 
inn

B   represents the batch size of task i that starts at time point n and finished before time point n , innW   

is a binary variable, and is equal to 1 when the task i starts at time point n and finished before time point n . The 

parameters 
max

i
b  and 

min

i
b  are the maximum and minimum batch sizes of task i, respectively. 

Constraint (6) specifies the balance of equipment units.  

 1
  ,

j n n

jn inn in nj n
i I n N n N

E E W W n N j J
 

 
   

 
      

 
 

           (6)

where jnE  indicates whether equipment unit j is used at time point n, and its upper bound is given by, 

1 ,
jn

E j n                     (7)

Constraints (8) and (9) specify mass balance and storage capacities.  

 1
,

s sn n

O I

sn is in n is inns n
i TO i TIn N n N

ST ST B B s n 
 

 
   

           (8) 

max
,

sn s
ST C s n           (9)

where TIs and TOs are the sets of tasks that utilize chemical s as input and output, respectively. 
I

is
  and 

O

is
  

are the material balance coefficients for chemical s as input and output of task i, respectively. 
max

s
C  stands for 

the maximum storage capacity for chemical s. 

Constraint (10) enforces that the time duration between two time points is no less than the processing times in 

each equipment unit. 

       ,
j

n n

n n in inn in inn

i I

T T d W vd B U j


  



      d d d                      (10)

where dn represents the realizations of uncertain processing times available up to time point n. din is the fixed 

processing time of task i, and vdin is the variable processing time of task i. Note that Tn(dn) is a general function 

of the past uncertainty realizations. The set U is the data-driven uncertainty set defined as follows, which is 

decision-dependent. 

 
 

 
 

 

 1  1

0

ˆ ˆ 1   ,

1   
j jn n

i i

RKDE in RKDE

in

in n in in n i i

i I n N i I n Nn N n N

F d F i n

U d
W d W d j

 


 

 

 

     

    
  

     
         

     
   

                         (11)

where set nN


 represents the set of time points prior to n, 
0

i
d  and 

i
  are defined in (12) and (13), respectively. 

 
 

 
 

 1  10 ˆ ˆ 1 2
i i

i RKDE RKDE
d F F 

    
 

                              (12) 

 
   1 0 0ˆ 1ii RKDE i iF d d 


                                     (13)

Constraints (14)-(17) specify the initial and final conditions.  

0 , , \
inn n

W i n n N N



                        (14) 

0 , , \
inn n

B i n n N N



                     (15) 

1569


1
0T                     (16) 

0
jN

E j                                    (17)

The nonnegative and integral constraints are shown as follows. 

, , , 0 , , , ,
inn jn sn n

B E ST T i n n s j                     (18) 

 0,1 , ,innW i n n                    (19) 

The multistage ARO with general decision rules is known to be computationally intractable (Ben-Tal et al., 

2004). Therefore, we adopt the widely used affine decision rules to address this computational challenge. An 

affine decision rule for timing decision variables is adopted as follows. 

     0
n nn

n in in

i n n

T d   


  d                   (20) 

 
0 0

n

n

in n in

n N

W 



  



                      (21) 

By plugging the affine decision rule of (20) into (21), we can reformulate it as follows.  

 
 

 
          1  1 0 0ˆ ˆ1 1

j j

i i n njnn jnn jnn

RKDE i n RKDE i n j i n i inn

i I n N j J i I n N i I

F r F q s vd B    


   

       

         

                               (22) 

We adopt the Glover’s linearization technique to handle the bilinear term, which is a product of a continuous 

variable and a binary variable. 

1

n

jnn jnn

i n n j i n j

n N

W s p



 

      



                     (23) 

0 0

n

jnn

i n n j i n

n N

W s





     



                     (24) 

jnn jnn

j i n j
s p

 

                      (25) 

The resulting data-driven multistage adaptive robust optimization based scheduling problem can be solved 

efficiently using the state-of-the-art branch-and-cut methods implemented in solvers like CPLEX. 

4. Case study 

 
Figure 2: State task network of the multipurpose batch process. 

In this section, an industrial multipurpose batch process (Yue et al., 2013) is presented to demonstrate the 

advantages of the proposed data-driven multistage ARO based approach. Figure 1 depicts the state-task 

network of this batch process (Chu et al., 2013). This network consists of 14 chemicals and 8 operation tasks 

(Wassick et al., 2012).  In this application, the fixed processing time of all tasks are subject to uncertainty (Chu 

et al., 2013). The processing time data of Reaction 1 and Reaction 2 are used to construct uncertainty sets. It 

1570


is worth noting that these processing time data are contaminated by outliers (Chu et al., 2015). The fixed 

processing times of other tasks are assumed to change up to 20 % of their corresponding nominal values. 

In this case study, we demonstrate the superiority of the proposed data-driven multistage ARO based scheduling 

approach through comparing it with other methods. These methods include multistage ARO with box uncertainty 

set, multistage ARO based on KDE, multistage ARO based on RKDE using Huber loss function and multistage 

ARO based on RKDE using Hample loss function. All these optimization methods are modeled in GAMS 24.7.3, 

and are implemented on a computer with an Intel (R) Core (TM) i7-6700 CPU @ 3.40 GHz and 32 GB RAM. 

The optimality tolerance is set to be 0.001 % for the solver CPLEX 12.6.3. 

In the schedule results of multistage ARO with box uncertainty set and multistage ARO based on KDE, the idle 

periods are actually reserved to hedge against the outliers in the dataset. It is worth pointing out that these data 

outliers might be ascribed to human recording error or sensor malfunctions. Therefore, its robust scheduling 

result is over-conservative and less profitable. Table 1 shows that the multistage ARO based on KDE method 

generates 22.3 % more profits than the multistage ARO with box set. The reduction of conservatism is due to 

the probability distribution information leveraged by the multistage ARO based on KDE. 

 
Figure 3: Gantt charts of data-driven multistage ARO using RKDE with different robust loss function: (a) Huber 

loss function, (b) Hampel loss function. 

Table 1: Comparisons of model and solution statistics between different methods in the case study. 

 Multistage 
ARO with 
box set  

Data-driven 
multistage ARO 
with KDE 

Data-driven 
multistage ARO with 
RKDE (Huber loss) 

Data-driven multistage 
ARO with RKDE 
(Hampel loss) 

Binary variables 1,320 1,320 1,320 1,320 

Continuous 
variables 

95,401 95,401 95,401 95,401 

Constraints 215,457 215,457 215,457 215,457 

CPU (s) 1,294.8 363.7 299.0 261.6 

Max. Profit ($) 43,964 53,751 54,296 57,815 

Figure 3 shows the Gantt charts of the data-driven multistage ARO based on RKDE using two kinds of robust 

loss functions, including Huber loss function and Hampel loss function (Kim and Scott, 2012). Due to the 

integration of robust statistics into robust optimization, the robust schedule is able to tolerate the data outliers in 

1571


processing times. Comparing Figure 3 (a) and (b), we can readily conclude that the Hampel loss function is 

more suitable than the Huber loss function in the process scheduling problem. The data-driven multistage ARO 

based on RKDE (Hampel loss function) generates the highest profit, which is 31.5 % and 7.6 % higher than the 

results of the multistage ARO with box set and the multistage ARO based on KDE, respectively. 

5. Conclusions 

This paper proposed a novel data-driven multistage ARO coupled with RKDE batch process scheduling model. 

The proposed framework exhibited robustness to contamination of uncertainty data by integrating robust 

optimization with robust statistics. The novel contributions of this work include the novel data-driven multistage 

ARO based batch process scheduling model and its solution strategy, as well as the data-driven outlier-resistant 

uncertainty set based on RKDE for multistage ARO. We applied the proposed data-driven multistage ARO 

modelling framework and solution strategy to a multipurpose batch process scheduling to demonstrate the 

superiority of the proposed method. Future works may consider extensions of the proposed method for nonlinear 

and/or nonconvex mixed-integer ARO problems, and the comparison of the impacts of different loss functions 

on the results of data-driving multistage ARO frameworks. 

References 

Ben-Tal A., Goryashko A., Guslitzer E., Nemirovski A., 2004, Adjustable robust solutions of uncertain linear 

programs, Mathematical Programming, 99, 351-376. 

Chu Y., Wassick J.M., You F., 2013, Efficient scheduling method of complex batch processes with general 

network structure via agent-based modeling. AIChE Journal, 59, 2884-2906. 

Chu Y., You F., 2013, Integrated scheduling and dynamic optimization of sequential batch processes with online 

implementation. AIChE Journal, 59, 2379-2406. 

Chu Y., You F., 2015, Model-based integration of control and operations: Overview, challenges, advances, and 

opportunities, Computers & Chemical Engineering, 83, 2-20. 

Chu Y., You F., Wassick J.M., Agarwal A., 2015, Integrated planning and scheduling under production 

uncertainties: Bi-level model formulation and hybrid solution method. Computers & Chemical Engineering, 

72, 255-272. 

Gong J., Garcia D. J., You F., 2016, Unraveling Optimal biomass processing routes from bioconversion product 

and process networks under uncertainty: An adaptive robust optimization approach, ACS Sustainable 

Chemistry & Engineering, 4, 3160-3173. 

Gong J., You F., 2017, Optimal processing network design under uncertainty for producing fuels and value-

added bioproducts from microalgae: Two-stage adaptive robust mixed integer fractional programming model 

and computationally efficient solution algorithm, AIChE Journal, 63, 582-600. 

Hegyháti M., Friedler F., 2010, Overview of industrial batch process scheduling, Chemical Engineering 

Transactions, 21, 895-900. 

Kim J., Scott C., 2012, Robust kernel density estimation, Journal of Machine Learning Research, 13, 2529-

2565. 

Liu H., Shah S., Jiang, W., 2004, On-line outlier detection and data cleaning, Computers & Chemical 

Engineering, 28, 1635-1647. 

Shi H., You F., 2016, A computational framework and solution algorithms for two-stage adaptive robust 

scheduling of batch manufacturing processes under uncertainty, AIChE Journal, 62, 687-703. 

Sun L., Zou X., Dong H., Wang S., 2016, Simultaneous optimization of short-term scheduling and heat 

integration schemes for multipurpose batch plants, Chemical Engineering Transactions, 52, 355-360. 

Tong K., You F., Rong G., 2014, Robust design and operations of hydrocarbon biofuel supply chain integrating 

with existing petroleum refineries considering unit cost objective, Computers & Chemical Engineering, 68, 

128-139. 

Wassick J.M., Agarwal A., Akiya N., Ferrio J., Bury S., You F., 2012, Addressing the operational challenges in 

the development, manufacture, and supply of advanced materials and performance products. Computers & 

Chemical Engineering, 47, 157-169. 

Yue D., You F., 2013, Sustainable scheduling of batch processes under economic and environmental criteria 

with MINLP models and algorithms, Computers & Chemical Engineering, 54, 44-59. 

Yue D., You F., 2016, Optimal supply chain design and operations under multi-scale uncertainties: Nested 

stochastic robust optimization modeling framework and solution algorithm. AIChE Journal, 62, 3041-3055. 

1572