Mathematical Problems of Computer Science 53, 39--48, 2020 

 
 
 

UDC 519.6 
 

Application of Machine Learning-Based Electrochemical Deposition Models to CMP Modeling

 
Ruben G. Ghulghazaryan1, Davit G. Piliposyan1, Misak T. Shoyan1 and Hayk V. Nersisyan2 

1 Mentor, a Siemens Business, Yerevan, Armenia 
2 American University of Armenia, Yerevan, Armenia 

e-mail: ruben_ghulghazaryan@mentor.com 
 

Abstract 
 

Chemical mechanical polishing/planarization (CMP) is the primary process used for 
modern integrated circuits (IC) manufacturing. Modeling of the post-CMP surface profile 
is critical for detecting planarity hotspots prior to manufacturing and avoiding fatal 
failures of chips. Electrochemical deposition (ECD) is a key process for the void-free 
filling of interconnection wires and vias in modern chips. Large surface topography 
variations generated after ECD affect the post-CMP surface profile. In this paper, several 
machine learning approaches are used to model surface profiles after ECD that are used 
as input to CMP models. Different combinations of deep neural networks, long short-term
memory (LSTM) recurrent networks, convolutional neural networks, and XGBoost
algorithms are investigated and compared. The model based on the XGBoost library 
showed superior performance and accuracy. 

Keywords: CMP, ECD, Machine learning, Neural networks, LSTM, XGBoost. 
 
 
 

1. Introduction 
 
Chemical mechanical planarization/polishing (CMP) is a crucial technology used during the 
manufacturing of multi-level interconnections of semiconductor chips and electronic devices [1, 
2]. Many process steps used for chip manufacturing require planar surfaces for correct pattern 
printing during lithography to generate structures for the next layer. CMP is the key process used
in the chip production flow for achieving the surface planarity required to meet lithography depth of focus (DOF)
requirements and to support subsequent etch steps in the construction of multi-level interconnection wires,
high-k replacement metal gate transistors, 3D stacked chips, 3D NAND memory cells, etc.
During CMP, a polishing pad is pressed against the rotating wafer. A chemical slurry containing 
abrasive particles and chemical agents is added between the pad and the wafer. Combined 
mechanical and chemical interactions that take place simultaneously at the wafer pad contact area
result in material removal from the wafer surface, leading to wafer surface profile planarization
[1, 2]. The post-CMP surface profile depends on the applied pressure, slurry chemistry, and the
pattern printed on the wafer. An inappropriate combination of these parameters can cause
non-uniform material removal, which produces defects such as dishing of metal lines and erosion of
dielectrics. These defects in turn cause hotspots such as open contacts, circuit shorts, timing delays, and RC
violations.

In modern chips, copper interconnection lines are built using dual damascene processes.
In the copper dual damascene process, both wires and vias are filled with copper
simultaneously using the electrochemical deposition (ECD) process [3]. After ECD, copper is
deposited not only over trenches but also between
them. CMP is used after ECD to remove the excess copper over oxides and isolate wires to avoid
fatal electrical shorts. The large surface topography variation created by ECD introduces additional
challenges for Cu CMP (Fig. 1), making modeling of the ECD surface profile critical for
high-accuracy CMP modeling.

Modeling of the post-CMP surface profile enables the detection of possible CMP hotspots 
prior to manufacturing [4, 5]. On the other hand, modeling of deposition surfaces prior to CMP is 
crucial for correct CMP simulation, since the post-deposition surface profile is used as input for 
CMP simulation, and may affect the final surface profile. Even with advanced deposition 
processes, the pre-CMP profile on a patterned wafer is non-uniform and affects the surface 
planarity after CMP. A fully connected neural network (NN)-based full-chip deposition model for
predicting the post-deposition surface profile for shallow trench isolation (STI) and inter-level
dielectric (ILD) deposition steps has been developed [6]. However, the post-ECD surface profile
shows even more complicated topography variation, depending on the underlying layer pattern
geometries [3].

To predict these complicated topography variations, the use of more advanced machine 
learning (ML) modeling techniques should be investigated. ML is a sub-area of artificial 
intelligence (AI) that enables computer systems to recognize patterns using algorithms and data 
sets, and use that information to develop solutions. ML is used in a wide variety of applications, 
including fraud detection, computer vision, transportation, bioinformatics, etc. [7, 8]. The ML 
algorithms make predictions based on initially gathered (training) data without being explicitly 
programmed for that task. Typical ML applications use neural networks (NNs), which consist of 
layers of artificial “neuron” data processing elements with an input layer, several hidden 
processing layers, and an output layer, with weighted connections between the layers. Training an 
NN means finding values for the weights of connections (based on minimization of a cost function) 
that best fit the training and validation data. There are multiple types of NN architectures, including 
deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural 
networks (RNN) [7, 8].  

A gradient boosted decision tree (GBDT) is another class of prediction technique widely 
used in ML modeling [9]. A GBDT model consists of an ensemble of decision trees the predictions 
of which are combined and optimized based on the prediction error on training data to reduce the 
error of the overall model. Extreme gradient boosting (XGBoost) is one of the implementations of 
GBDT algorithms, popular for its ability to learn non-linear decision relations and used in both 
industry and academia alike [9]. 

In this paper, we apply ML methods such as CNN, long short-term memory (LSTM) RNN,
and XGBoost to model surface profiles after ECD for CMP modeling.
 
 
 
 


2. Electrochemical Deposition Process 

Dual damascene is a state-of-the-art technique, comprising copper deposition and CMP steps, used
in the manufacturing of back-end-of-line (BEOL) wiring. To date, ECD is the most suitable
method for the copper deposition step. Using ECD, the vias and trenches can be filled
simultaneously. To avoid air gaps, voids or defects during trench filling, chemical additives are 
used during the ECD process. These additives strongly affect the local deposition rate, leading to 
superfill, where the copper deposition rate at the trench bottom is higher than on trench sidewalls. 
These chemical additives are mainly categorized into two types: suppressors and accelerators. 
Accelerators consist of surfactant molecules that are adsorbed on the surface and can
displace more weakly adsorbed additives. For a given voltage, they increase the current,
resulting in a higher copper plating rate. Suppressors consist of polymer-like molecules. In
contrast to accelerators, they reduce the current and lower the copper plating rate. Levelers
are a subclass of suppressor additives that polarize the areas with high current densities
and even out the current distribution. They are commonly used to reduce overburden thickness or
bumps. During the ECD process, the trench bottom surface area shrinks, causing the accelerator
molecules to displace more weakly adsorbed suppressor molecules. This leads to a higher
concentration of accelerator molecules at the bottom of the trenches than on trench sidewalls and 
non-trench areas, resulting in the desirable void-free superfill effect. The side effect of this process 
is that the sites with narrow trenches fill up faster and continue to grow at a higher rate than sites 
with wider trenches. This causes bumps to form. When narrow trenches are located close
to each other, these bumps merge into a single large bump. These bumps lead to a large surface
profile variation (Fig. 1) and require excessive polishing, introducing additional challenges for
CMP. The effect of bump formation can be weakened by adding levelers; however, in general,
bump formation cannot be avoided. 
 
 

 

Fig. 1. Schematic plot of post-ECD surface profile. 

 
The ECD process is complicated and continuously evolving, and it is hard to capture all the
effects of chemistry and physics in physics-based models. Even minimalistic
chemistry- and physics-based models must keep track of deposition rate changes during ECD to
properly model the surface profile. This tracking is computationally expensive, because it requires
solving a large system of differential equations that describe the surface profile evolution during
ECD. The runtime of chemistry- and physics-based ECD models may range from several minutes to
hours for a layer, depending on the chip size.

To model the ECD process, the design is first divided into fixed-size tiles, and for each tile
the average values of geometric characteristics such as width, space, pattern density, and perimeter are
extracted. Using these characteristics, the dynamic transformation of the surface is modeled using the
“effective trench” (ET) approximation for each tile.

In the ET approximation, the tile surface profile is characterized by the previously extracted
characteristics together with parameters describing the height of the surface material inside
(ZT) and outside (ZNT) of the trenches. All these parameters are determined dynamically and
passed to the ECD model for simulation. During the simulation, ZT, ZNT, and the geometric data 
are updated for each tile. After the ECD simulation, the surface profile data for each tile are used 
as input for the CMP model. 
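
The tiling step described above can be illustrated with a minimal sketch. It assumes the layout is available as a binary raster (1 = metal, 0 = dielectric), which is a simplification introduced here; the function name `tile_pattern_density` and the raster input are hypothetical, and only pattern density is computed (width, space, and perimeter extraction are omitted).

```python
def tile_pattern_density(layout, tile):
    """Return the per-tile pattern density of a binary layout raster.

    layout: list of equal-length rows of 0/1 values (hypothetical input format).
    tile:   tile edge length in pixels; the raster size must be a multiple of it.
    """
    rows, cols = len(layout), len(layout[0])
    if rows % tile or cols % tile:
        raise ValueError("layout size must be a multiple of the tile size")
    grid = []
    for r0 in range(0, rows, tile):
        grid_row = []
        for c0 in range(0, cols, tile):
            # Count metal pixels inside the tile and normalize by the tile area.
            covered = sum(
                layout[r][c]
                for r in range(r0, r0 + tile)
                for c in range(c0, c0 + tile)
            )
            grid_row.append(covered / (tile * tile))
        grid.append(grid_row)
    return grid


layout = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
density = tile_pattern_density(layout, tile=2)
print(density)
```

Each entry of the resulting grid corresponds to one tile, which is the granularity at which the ET parameters are tracked.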
 

3. Training Data Generation for Machine Learning   

The ML training data should include topography height and geometry data of patterns collected 
from design layouts. When collecting geometry data for ML model input, both specially designed 
test chips and production design layouts were used to provide sufficient coverage of possible 
geometry patterns supported by the technology. Training an ML model requires a large amount of data,
which is split into training, validation, and test sets. We carried out a series of
experiments using Calibre™ CMP ModelBuilder and CMPAnalyzer tools. First, we created 
physics-based models using data collected at the factories. We then used these physics-based 
models to generate the training, validation, and test data for ML model building. 

To generate input and output data, we extracted six grids with geometry characteristics of 
patterns (width, space, pattern density, perimeter, etc.) for input, and surface profile grids ZT and 
ZNT for output, from different test chip and production design layouts.  
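
As an illustration of how such samples might be organized, the sketch below pairs six geometry features with the two target heights for each tile and splits the samples into training, validation, and test sets. The 70/15/15 split, the fixed random seed, and the helper name `split_samples` are assumptions made for this example; the paper does not state the proportions used.

```python
import random


def split_samples(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle samples reproducibly and split them into train/val/test lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder data: each sample is (six geometry features, (ZT, ZNT)).
samples = [([float(i)] * 6, (0.0, 0.0)) for i in range(100)]
train, val, test = split_samples(samples)
print(len(train), len(val), len(test))
```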

 

4. Results and Discussion 

We experimented with different architectures of DNN, CNN, RNN, and GBDT methods and their
combinations for modeling post-ECD surface profiles (Fig. 2). In this section, we review the 
architectures and models that provided the best combination of running time and accuracy. 
 
 

 
 Fig. 2. NN input and output data. 

 

5. RNN-Based Modeling   

For the first model, we considered a model based on the combination of DNN and LSTM RNN 
networks. The DNN part consists of a feed-forward neural network with three hidden layers and 
10 neurons per layer. Six-dimensional geometry data (width, space, pattern density, etc.) are used
as input to the network. The two-dimensional output (ZT and ZNT) models the post-ECD profile. A
sigmoid activation function is used for all layers. The model is trained using the stochastic gradient
descent optimization method to minimize the reverse Huber (berHu) loss function [7], which has the form

\[
B(x) =
\begin{cases}
|x|, & |x| \le c, \\[4pt]
\dfrac{x^2 + c^2}{2c}, & |x| > c.
\end{cases}
\]

 

Here, c is a scalar set to 10% of the maximum absolute error on the batch. This loss function has been
shown to have several advantages over other standard loss functions commonly used for regression
models, such as the mean squared error [7].
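
The berHu loss above can be written as a short sketch. The threshold c is taken as 10% of the largest absolute error in the batch, as stated in the text; averaging the per-sample values over the batch and the function name `berhu_loss` are assumptions made for this illustration.

```python
def berhu_loss(predictions, targets, fraction=0.1):
    """Mean reverse Huber (berHu) loss over a batch of scalar predictions."""
    errors = [abs(p - t) for p, t in zip(predictions, targets)]
    c = fraction * max(errors)  # threshold: 10% of the maximum batch error
    if c == 0.0:
        return 0.0  # all predictions are exact
    total = 0.0
    for e in errors:
        # Linear (L1) branch for small errors, quadratic branch for large ones.
        total += e if e <= c else (e * e + c * c) / (2.0 * c)
    return total / len(errors)


loss = berhu_loss([0.0, 0.5, 10.0], [0.0, 0.0, 0.0])
print(loss)
```

With these inputs c = 1, so the error 0.5 contributes linearly, while the error 10 contributes (100 + 1)/2 = 50.5. The two branches agree at |x| = c, so the loss is continuous.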

The output of the DNN model is then used as input to a bidirectional LSTM RNN. To take into
account the weak long-range effects inherent to the ECD process (i.e., the effects of neighbor site patterns on
the local surface profile heights after deposition), this input is extended to include the DNN output
values of several neighbor sites. For each site, we use an extended block containing
three neighbor sites on the left, right, top, and bottom (i.e., all neighbors within radius 3 of the given site), so
the new input dimension becomes 7 × 7 × 2. The
goal of this block of data is to improve the DNN predictions by taking into account
the effects of neighbors. Fig. 3 presents scatter plots of predicted vs. target height data
after the application of the LSTM RNN.
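
The construction of the 7 × 7 × 2 neighborhood block can be sketched as follows. The handling of sites near the chip boundary is not described in the text; here the nearest edge value is reused, which is an assumption, as are the function name `neighborhood_block` and the placeholder grids.

```python
def neighborhood_block(zt, znt, row, col, radius=3):
    """Return a (2*radius+1) x (2*radius+1) block of (ZT, ZNT) pairs around a site.

    zt, znt: 2D grids (lists of rows) of DNN-predicted heights.
    Sites beyond the grid edge reuse the nearest edge value (an assumption).
    """
    rows, cols = len(zt), len(zt[0])
    block = []
    for dr in range(-radius, radius + 1):
        r = min(max(row + dr, 0), rows - 1)  # clamp to the grid
        block_row = []
        for dc in range(-radius, radius + 1):
            c = min(max(col + dc, 0), cols - 1)
            block_row.append((zt[r][c], znt[r][c]))
        block.append(block_row)
    return block


# Placeholder grids standing in for the DNN outputs.
zt = [[float(10 * r + c) for c in range(8)] for r in range(8)]
znt = [[-v for v in row] for row in zt]
block = neighborhood_block(zt, znt, row=4, col=4)
print(len(block), len(block[0]), len(block[0][0]))
```

The center entry `block[3][3]` holds the values of the site itself, and the surrounding entries hold its radius-3 neighbors.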

 

 
Fig. 3. Scatter plots of combined DNN and LSTM RNN model predictions vs. targets for ZNT and ZT height data.

  

6. CNN-Based Modeling 

The next model was based on a CNN [7, 8]. The architecture we used consists of two blocks, as
shown in Fig. 4. The first block performs feature extraction separately for each input parameter using a
CNN. The collected outputs of the first block serve as input to the second block of
the network. The second block consists of a DNN with three hidden layers and 10 neurons per
layer. Again, the berHu function is used as the loss function.
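
The two-block structure can be illustrated with a forward-pass sketch in plain Python. Only the overall layout (one convolutional feature extractor per input parameter, followed by a DNN with three hidden layers of 10 neurons) comes from the text. The 5 × 5 input patches, the single 3 × 3 kernel per parameter, the sigmoid activations, the linear output layer, and the random untrained weights are all assumptions chosen to keep the example small.

```python
import math
import random


def conv2d_valid(patch, kernel):
    """Valid 2D cross-correlation of a square patch with a square kernel."""
    k = len(kernel)
    out = len(patch) - k + 1
    return [
        [
            sum(patch[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(out)
        ]
        for i in range(out)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weight row and one bias per output neuron."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def random_layer(rng, n_in, n_out):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def cnn_dnn_forward(patches, kernels, layers):
    # Block 1: a separate convolution per input parameter, then concatenation.
    features = []
    for patch, kernel in zip(patches, kernels):
        fmap = conv2d_valid(patch, kernel)
        features.extend(sigmoid(v) for row in fmap for v in row)
    # Block 2: sigmoid hidden layers followed by a linear two-value output.
    x = features
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, sigmoid)
    weights, biases = layers[-1]
    return dense(x, weights, biases)


rng = random.Random(0)
patches = [[[rng.random() for _ in range(5)] for _ in range(5)] for _ in range(6)]
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(6)]
sizes = [6 * 3 * 3, 10, 10, 10, 2]  # 54 features -> 10 -> 10 -> 10 -> (ZT, ZNT)
layers = [random_layer(rng, a, b) for a, b in zip(sizes, sizes[1:])]
heights = cnn_dnn_forward(patches, kernels, layers)
print(heights)
```

Keeping a separate kernel per parameter means that each geometry grid is filtered independently before the dense layers combine the extracted features.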
 




 

 

 

 

 

 

 

 

 

Fig. 4. Schematic plot of the combined CNN-DNN model. 

 
The accuracy of the model predictions on the test set is analyzed using scatter plots of prediction
vs. target data for surface heights, as shown in Fig. 5. Red lines show 10% deviations from the
target.
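
The 10% deviation band drawn in the scatter plots can also be summarized numerically. The sketch below computes the share of predictions that fall within ±10% of their targets; this summary metric and the sample values are illustrative assumptions and are not reported in the paper.

```python
def fraction_within_band(predictions, targets, tolerance=0.10):
    """Share of predictions that deviate from the target by at most `tolerance`."""
    inside = sum(
        1 for p, t in zip(predictions, targets)
        if abs(p - t) <= tolerance * abs(t)
    )
    return inside / len(targets)


# Made-up heights for illustration only.
targets = [100.0, 200.0, 300.0, 400.0]
predictions = [105.0, 170.0, 310.0, 445.0]
share = fraction_within_band(predictions, targets)
print(share)
```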
 
 

 
Fig. 5. Scatter plots of predictions vs targets for ZNT and ZT using NN model. 

 

 
7.  XGBoost Method for Modeling ECD Surface Profiles 
Finally, algorithms of the XGBoost library [9] were used for ECD surface profile modeling. The
motivation for using XGBoost is that it uses fewer resources compared to the DNN and CNN
approaches. XGBoost is an implementation of gradient tree boosting, also known as a gradient boosting machine. It is
highly optimized for fast execution speed and high
performance [9]. A gradient boosting algorithm is a sequential combination of several predictors,
where each predictor improves on its predecessor (e.g., by fitting to the errors of its
predecessor). This algorithm has a number of hyperparameters, which we tune using a
greedy search. We use the linear regression objective and choose the best
results by minimizing the root mean squared error. We use the same training, validation, and
test datasets for these experiments as for the previous models.
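
The idea of fitting each predictor to the errors of its predecessor can be shown with a toy gradient boosting sketch. This is not XGBoost itself: it uses one-split regression trees (stumps) on a single feature, plain residual fitting for the squared-error loss, and no regularization. The number of rounds, the learning rate, and the synthetic sine data are assumptions chosen for illustration.

```python
import math


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Sequentially add stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def rmse(preds, ys):
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))


xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]
model = fit_boosted(xs, ys)
baseline_rmse = rmse([sum(ys) / len(ys)] * len(ys), ys)
boosted_rmse = rmse([predict_boosted(model, x) for x in xs], ys)
print(baseline_rmse, boosted_rmse)
```

For the squared-error loss, the residuals are the negative gradient of the loss with respect to the current predictions, which is why fitting to the residuals is a form of gradient boosting. Comparing RMSE values across candidate settings, as done here against the mean-only baseline, mirrors the selection criterion described above.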
 

 

Fig. 6.  Scatter plots of predictions vs targets for ZNT and ZT using XGBoost. 

A significant improvement in prediction accuracy is obtained with the XGBoost algorithm, as
shown in the scatter plots in Fig. 6 and the linescans in Fig. 7. Moreover, we noticed a significant reduction in model training
time with XGBoost compared to the other approaches we tried.
 

 
Fig. 7.  Linescan (for fixed x value) predictions vs targets plots for ZNT and ZT using XGBoost. 

 
 
 

8. Conclusion 

In this paper, different ML methods were used to model surface profiles after electrochemical
deposition of copper over patterned wafers for the creation of interconnection wires for chips.
Weak long-range interactions of patterns on the design are inherent to post-ECD surface profiles,
which means that the surface height above a given pattern is not defined solely by the pattern
itself, but is also affected by neighboring patterns. Using physics- and chemistry-based electrochemical
deposition simulation, training, validation, and test data were generated for ML model building.
Several ML models were used for modeling, including combined DNN-LSTM RNN, combined 
CNN-DNN, and XGBoost-based models. It was found that the XGBoost-based model provided 
the best accuracy in surface height prediction, as well as correct data trends and high correlation 
with simulated linescans. It also has a much shorter training time (a couple of hours) compared to 
other methods (several hours or several days).    
 

Acknowledgements 

The authors would like to express their appreciation to Shelly Stalnaker for her editorial assistance 
in the preparation of this paper. 
 
 

References 
 

[1] D. Zhao and X. Lu, “Chemical mechanical polishing: Theory and experiment,” Friction,
vol. 1, no. 4, pp. 306-326, 2013.

[2] G. Banerjee and R. L. Rhoades, “Chemical mechanical planarization historical review and
future direction,” ECS Transactions, vol. 13, no. 4, pp. 1-19, 2008.

[3] J. Jhothiraman and R. Balachandran, “Electroplating: Applications in the Semiconductor
Industry,” Advances in Chemical Engineering and Science, vol. 9, pp. 239-261, 2019.

[4] R. Ghulghazaryan, J. Wilson and N. Takeshita, “CMP Model building and hotspot
detection by simulation,” Proceedings of 158th Meeting of Planarization CMP Committee,
Nagoya, Japan, vol. 55, pp. 55-59, 2017.

[5] R. Ghulghazaryan, J. Wilson and N. Takeshita, “Building CMP Models for CMP
Simulation and Hotspot Detection,” Mentor, a Siemens Business, mentor.com
(whitepaper), 2017.

[6] R. Ghulghazaryan, D. Piliposyan and J. Wilson, “Application of neural network-based
oxide deposition models to CMP modeling,” ECS Journal of Solid State Science and
Technology, vol. 8, no. 5, pp. 3154-3162, 2019.

[7] I. Laina, Ch. Rupprecht, V. Belagiannis, F. Tombari and N. Navab, “Deeper depth
prediction with fully convolutional residual networks,” CoRR, abs/1606.00373, 2016.

[8] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.

[9] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in KDD ’16:
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 785–794, 2016.
 

 
Submitted 18.12.2019, accepted 22.04.2020 

 




Application of Electrochemical Deposition Models Created by Means of Machine Learning
to the Modeling of Chemical-Mechanical Polishing

Ruben G. Ghulghazaryan1, Davit G. Piliposyan1,
Misak T. Shoyan1 and Hayk V. Nersisyan2

1 Mentor, a Siemens Business, Yerevan, Armenia
2 American University of Armenia, Yerevan, Armenia

e-mail: ruben_ghulghazaryan@mentor.com

Summary

Chemical-mechanical polishing/planarization (CMP) is the main process in the
manufacturing of modern integrated circuits. Modeling of the surface profile
produced by CMP is of decisive importance for detecting, prior to manufacturing,
planarity defects of chips, the occurrence of “hotspots”, and critical failures.
Electrochemical deposition (ECD) is a key process for the defect-free filling of
interconnections and vias in modern integrated circuits. Significant surface topography
variations arise after ECD, and they affect the surface profile produced after CMP.

In this paper, several machine learning approaches are studied for modeling the
surface profile obtained from ECD, which is to be used as input data for CMP modeling.
Different combinations of deep neural networks, long short-term memory
(LSTM) recurrent networks, convolutional neural networks, and XGBoost
gradient boosting algorithms are investigated. Compared with the other models,
the model based on the XGBoost library showed the highest
performance and accuracy.

Keywords: CMP, ECD, machine learning, neural networks,
LSTM, XGBoost.

 
 
 
 
 
 


Use of Machine Learning-Based Electrochemical Deposition Models
for Modeling Chemical-Mechanical Polishing

Ruben G. Ghulghazaryan1, Davit G. Piliposyan1, Misak T. Shoyan1 and Hayk V. Nersisyan2

1 Mentor, a Siemens Business, Yerevan, Armenia
2 American University of Armenia, Yerevan, Armenia

e-mail: ruben_ghulghazaryan@mentor.com

Abstract

Chemical-mechanical polishing/planarization (CMP) is the main process
used in the manufacturing of modern integrated circuits. Modeling of the
post-CMP surface profile is crucial for detecting planarity defects
(“hotspots”) before chip manufacturing begins and for identifying fatal chip
failures. Electrochemical deposition (ECD) is a key process for the defect-free
filling of interconnections and vias in modern integrated circuits.
Significant surface topography variations arise after ECD, and they affect
the post-CMP surface profile. In this paper, several machine learning approaches
are considered for modeling the post-ECD surface profiles that are used as input data for CMP
modeling. Different combinations of deep neural networks, long short-term
memory (LSTM) recurrent neural networks, convolutional neural networks, and
XGBoost gradient boosting algorithms are investigated. Compared with the other models,
the model based on the XGBoost library showed the highest performance and accuracy.

Keywords: CMP, ECD, machine learning, neural networks, LSTM,
XGBoost.

 

 
 
