Mathematical Problems of Computer Science 55, 44--53, 2021
UDC 004.932

Application of Deep Learning-Based Methods to the Single Image Non-Uniform Blind Motion Deblurring Problem

Misak T. Shoyan1, Robert G. Hakobyan1 and Mekhak T. Shoyan2
1 National Polytechnic University of Armenia
2 Yerevan State University, Armenia
e-mail: misakshoyan@gmail.com, rob.hakobyan@gmail.com, mexakshoyan@gmail.com

Abstract

In this paper, we present deep learning-based blind image deblurring methods for estimating and removing non-uniform motion blur from a single blurry image. We propose two fully convolutional neural networks (CNNs) for solving the problem. The networks are trained end-to-end to reconstruct the latent sharp image directly from the given blurry image, without estimating or making any assumptions about the blur kernel, its uniformity, or the noise. We demonstrate the performance of the proposed models and show that our approaches can effectively estimate and remove complex non-uniform motion blur from a single blurry image.

Keywords: Motion blur, Blind motion deblurring, Non-uniform blurring, Blur kernel.

1. Introduction

Motion blur is one of the most undesired types of image degradation when taking photos. Camera shake and object motion during the exposure produce motion-blurred images. Motion blur is particularly undesirable in photography and remains a significant source of image distortion. The process of recovering the latent sharp image from a single motion-blurred image or from a sequence of blurry video frames is called motion deblurring. In practice, there is a vast number of possible motion paths, and every motion-blurred image is uniquely blurred; motion deblurring therefore remains a common and challenging problem.

A high-level representation of the blurring process is the following model:

b = I \otimes f + n,    (1)

where I is the latent sharp image, f is the blur kernel, n denotes the noise, and \otimes is the convolution operator (a short synthetic illustration of this model is given below).

In the presence of only one blurry image, the problem is called single-image motion deblurring. In the case of multiple sequential blurry images, the problem is called multi-image/video motion deblurring. Our interest is mainly in single-image motion deblurring. If the blur kernel, or point spread function (PSF), is shift-invariant, so that the blurring is uniform, then the deblurring problem turns into an image deconvolution problem. When the PSF is shift-variant and the blurring is therefore non-uniform, the problem is treated as a general deblurring problem.

Image deblurring is categorized into non-blind and blind cases. In non-blind deblurring, the blur kernel is known, or there is a way to compute it from prior knowledge, so the problem reduces to estimating the latent sharp image given the known blur kernel. Although this may not seem difficult, there are obstacles to overcome: for example, the presence of noise and the ringing artifacts that can arise during deblurring make it a challenging problem.
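To make model (1) concrete, the following minimal sketch synthesizes a uniformly blurred, noisy observation from a sharp grayscale image. It assumes NumPy and SciPy are available; the horizontal motion kernel and the noise level are illustrative choices, not parameters used elsewhere in this paper.

```python
# Minimal sketch of the uniform blur model (1): b = I (*) f + n.
import numpy as np
from scipy.signal import convolve2d

def synthesize_blur(sharp, kernel, noise_sigma=0.01):
    """Blur a grayscale image in [0, 1] with a normalized PSF and add Gaussian noise."""
    kernel = kernel / kernel.sum()                                   # normalize the PSF
    blurred = convolve2d(sharp, kernel, mode="same", boundary="symm")
    noisy = blurred + np.random.normal(0.0, noise_sigma, blurred.shape)
    return np.clip(noisy, 0.0, 1.0)

# Illustrative horizontal motion kernel of length 9 (an assumption, not a kernel from the paper).
motion_kernel = np.zeros((9, 9))
motion_kernel[4, :] = 1.0
```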
There are some traditional methods, such as Wiener deconvolution [1], which is expressed as

G(f) = \frac{H^{*}(f) S(f)}{|H(f)|^{2} S(f) + N(f)},    (2)

where f is the frequency in the frequency domain, G is the Fourier transform of the estimated deconvolution filter, which is then convolved with the blurry image to estimate the latent sharp image, H is the Fourier transform of the blur kernel, N and S are the mean power spectral densities of the noise and of the latent sharp image, respectively, and * denotes complex conjugation. Iterative Richardson-Lucy (RL) deconvolution [2, 3] is another such method, expressed as

I_{t+1} = I_{t} \left( \mathrm{PSF}^{T} \otimes \frac{B}{I_{t} \otimes \mathrm{PSF}} \right),    (3)

where I_{t} and I_{t+1} are the t-th and (t+1)-th estimates of the latent sharp image I, B is the blurry image, and \mathrm{PSF}^{T} is the flipped version of the PSF. These methods were introduced decades ago. Later work on non-blind deblurring relies on well-known image priors, for example sparse priors [4] and total variation [5], which were introduced for regularization to improve the quality of deconvolution in the presence of noise.

Blind deblurring [6] is a more challenging problem, since the blur kernel (PSF) is unknown in addition to the latent sharp image. The blind deblurring problem consists of two stages: PSF estimation and non-blind deconvolution. In contrast to non-blind deblurring, more sophisticated priors have been introduced here, such as the L0 norm-based intensity and gradient prior [7], the dark channel prior [8], and the reweighted graph total variation prior [9].

Image deblurring methods are also categorized as deep learning-based (DL) and non-deep learning-based (non-DL), or optimization-based, methods. Non-DL (optimization-based) methods try to reconstruct the latent sharp image by minimizing an energy function [10, 11], using, for example, Gaussian or Poisson likelihoods within maximum a posteriori estimation [12]. Even though non-DL methods are effective for image deblurring, they usually rest on more simplified assumptions about the blur model than DL-based methods. Their time-consuming hyperparameter tuning is also worth mentioning, as it matters in real-world applications. In recent years, DL-based approaches have become increasingly widespread. DL-based methods use convolutional neural networks to reconstruct the latent sharp image [13]; recurrent neural networks have also been used for single-image deblurring [14]. In terms of both accuracy and efficiency, these methods surpass non-DL methods. In this paper, we present deep learning-based blind image deblurring methods for estimating and removing non-uniform motion blur from a single blurry image.

2. Dataset

A common practice for creating a dataset for supervised image deblurring is to synthetically generate blurry images by blurring latent sharp images with a kernel and then adding noise [15, 16]. However, blurry images generated this way may differ from real blurry images, and the dataset might not be representative enough. A kernel-free approach to dataset generation for supervised motion deblurring was proposed in [17]. The authors used a GOPRO4 Hero Black camera to record high-quality videos at 240 fps and then averaged sequences of sharp video frames to produce motion-blurred images [18]. The latent sharp image corresponding to a generated blurry image is chosen as the middle frame of the sequence that was averaged to produce it. When the motion blur is caused by the motion of an object, the blurriest part of the image should be the object itself, while the background remains largely the same as in the latent sharp image. Unlike kernel-based synthesis [15, 16], the kernel-free generation method of [17] naturally reproduces such spatially varying blur. We chose the GOPRO dataset [18] for training and evaluating our models; it contains 3214 pairs of blurry and sharp images.
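The frame-averaging idea can be sketched as follows. The snippet assumes NumPy and a list `frames` of consecutive sharp frames from a high-frame-rate video, each a float array in [0, 1]; the window length of 7 is an illustrative choice rather than the exact protocol of [17].

```python
# Sketch of kernel-free blur synthesis by averaging consecutive sharp frames.
import numpy as np

def make_pair(frames, window=7):
    """Return (blurry, sharp): the average of `window` consecutive frames and the middle frame."""
    assert len(frames) >= window
    stack = np.stack(frames[:window], axis=0)
    blurry = stack.mean(axis=0)        # temporal averaging approximates motion blur
    sharp = frames[window // 2]        # ground truth: the middle frame of the window
    return blurry, sharp
```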
3. Proposed Methods

We propose two fully convolutional neural networks based on the encoder-decoder architecture. The first one (ResnetEncDec) uses Resnet-50 [19] as the encoder. It receives a 3x256x256 RGB image as input. The first step is a convolution with a 7x7 kernel and stride 2, followed by max-pooling with stride 2. The Resnet-50 residual blocks follow, which use 1x1 and 3x3 convolutions. Each convolution layer is followed by a batch normalization layer [20] and a ReLU activation. The encoder outputs a 2048x8x8 feature map, which is used as the input to the decoder.

The decoder consists of transposed convolution and upsampling layers. First come 3 decoder blocks, each consisting of a transposed convolution layer followed by 2 convolutions. Then come 2 upsampling layers, each performing bilinear upsampling by a factor of 2 followed by 2 convolutions. A 1x1 convolution then reduces the number of channels of the activation map to 3, and a sigmoid activation outputs colors in the [0, 1] range for each pixel of the output image. All convolution and deconvolution layers are followed by batch normalization and ReLU activation, except the last convolution layer, which is followed by the sigmoid activation. Skip connections between the encoder and decoder layers are used, inspired by the U-Net architecture [21]. The architecture of the network is shown in Figure 1.

The second proposed network is inspired by the real-time style transfer method of [22], which uses an image transform network (TransformNet) to stylize an input content image with the style of a given style image (Fig. 2). Since that network performs well on an image-to-image task, generating an image that is a modified version of its input, we adopt it for the motion deblurring problem.

Fig. 1. The architecture of the ResnetEncDec fully convolutional network.

Fig. 2. The architecture of the style transfer network [22].

The first layer of the proposed TransformNet is a 9x9 convolution with stride 1, followed by two 3x3 convolutions with stride 2. Then come 5 residual blocks, each consisting of two 3x3 convolutions followed by batch normalization and ReLU activation (Fig. 3); each residual block contains a residual connection between its input and output. After the 5 residual blocks come two 3x3 transposed convolution layers with stride 2 and then a 9x9 convolution with stride 1. Finally, a sigmoid activation outputs colors in the [0, 1] range for each pixel of the output image. Each convolution layer is followed by batch normalization and ReLU activation, except the last convolution layer, which is followed by the sigmoid activation.

Fig. 3. (a) The architecture of the TransformNet [23]. (b) The architecture of each residual block [23].
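As a concrete reference, below is a minimal PyTorch sketch of the TransformNet variant described above, reconstructed from the textual description. The channel widths (32, 64, 128) follow the style transfer network of [22, 23] and are assumptions; the authors' exact configuration and training code (available in the repository linked in the conclusion) may differ.

```python
# Minimal PyTorch sketch of the TransformNet described above (an assumption-laden reconstruction).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel, stride, transpose=False):
    """Convolution (or transposed convolution) + batch normalization + ReLU."""
    pad = kernel // 2
    if transpose:
        conv = nn.ConvTranspose2d(in_ch, out_ch, kernel, stride,
                                  padding=pad, output_padding=stride - 1)
    else:
        conv = nn.Conv2d(in_ch, out_ch, kernel, stride, padding=pad)
    return nn.Sequential(conv, nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection from input to output."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class TransformNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3, 32, 9, 1),                      # 9x9 convolution, stride 1
            conv_block(32, 64, 3, 2),                     # 3x3 convolution, stride 2 (downsample)
            conv_block(64, 128, 3, 2),                    # 3x3 convolution, stride 2 (downsample)
            *[ResidualBlock(128) for _ in range(5)],      # 5 residual blocks
            conv_block(128, 64, 3, 2, transpose=True),    # 3x3 transposed convolution, stride 2
            conv_block(64, 32, 3, 2, transpose=True),     # 3x3 transposed convolution, stride 2
            nn.Conv2d(32, 3, 9, 1, padding=4),            # 9x9 convolution back to 3 channels
            nn.Sigmoid(),                                 # pixel values in [0, 1]
        )

    def forward(self, x):
        return self.net(x)

# Sanity check: a 3x256x256 input yields a 3x256x256 output.
# out = TransformNet()(torch.rand(1, 3, 256, 256))
```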
4. Training

Both proposed networks are trained on the GOPRO dataset with images resized to 256x256. Since we want to minimize the pixel-wise difference between the network output and the latent sharp image, we chose MSE [24] and MAE [25] as loss functions:

\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^{2},    (4)

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_{i} - y_{i}|,    (5)

where N is the number of pixels in the image, y_{i} is a pixel value of the sharp image, and \hat{y}_{i} is the predicted pixel value. Our experiments showed that MSE performs better for both networks, at least in the early stages of training, so we used MSE in the further experiments. As evaluation metrics we chose PSNR (peak signal-to-noise ratio) [26] and MSE:

\mathrm{PSNR} = 20 \log_{10} \left( \frac{\mathrm{MAX}_{I}}{\sqrt{\mathrm{MSE}}} \right),    (6)

where \mathrm{MAX}_{I} is the maximum possible pixel value of the image. The Adam optimizer [27] was used with a learning rate of 0.001. Both networks were trained for 350 epochs with batch sizes of 15 and 44 for ResnetEncDec and TransformNet, respectively, on a GeForce GTX 1070 Ti GPU. ImageNet [19] pre-trained weights were used to initialize the encoder of ResnetEncDec. For TransformNet, training was continued for an additional 250 epochs with the SGD optimizer [28] without momentum and a learning rate of 0.0001; however, this did not lead to significant improvements. The learning curves of both networks are shown in Figure 4.

Fig. 4. The learning curves of ResnetEncDec (a, b) and TransformNet (c, d).
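For illustration, the following condensed sketch reflects the training setup described above (MSE loss, Adam with learning rate 0.001, PSNR for evaluation). It assumes PyTorch and a DataLoader named train_loader that yields (blurry, sharp) pairs of 3x256x256 tensors in [0, 1]; it is a simplified outline, not the authors' exact training script.

```python
# Condensed sketch of the training setup: MSE loss, Adam (lr = 0.001), PSNR metric.
import torch
import torch.nn as nn

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio, Eq. (6), for images with values in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 20.0 * torch.log10(max_val / torch.sqrt(mse))

def train(model, train_loader, epochs=350, device="cuda"):
    model = model.to(device)
    model.train()
    criterion = nn.MSELoss()                                # Eq. (4)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        for blurry, sharp in train_loader:
            blurry, sharp = blurry.to(device), sharp.to(device)
            optimizer.zero_grad()
            loss = criterion(model(blurry), sharp)          # pixel-wise MSE against the sharp frame
            loss.backward()
            optimizer.step()
```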
5. Results

We evaluate the performance of the proposed models on the GOPRO dataset. The results are compared with one of the state-of-the-art methods [17]. The quantitative comparison of the models is shown in Table 1 (note that we use 256x256 resized images, while [17] uses images at their original size of 1280x720).

Table 1: Quantitative performance comparison of the models.

Metric    ResnetEncDec    TransformNet    Nah et al. [17]
PSNR      24.98           26.26           28.93
MSE       0.0033          0.00245         -

Some deblurring results are shown in Fig. 5. In terms of computation and memory usage, TransformNet and ResnetEncDec are lightweight networks compared to [17], which relies on a deep multi-scale architecture. At the same time, as is evident from the architectures of the proposed networks, TransformNet is more lightweight and requires less computation time and fewer resources than ResnetEncDec.

Fig. 5. Results on the GOPRO test dataset (columns: input image, ResnetEncDec result, TransformNet result).

6. Conclusion

In this paper, two deep learning-based blind motion deblurring methods were presented for reconstructing the latent sharp image from a single motion-blurred image without any information about the blur kernel, its uniformity, or the present noise. The proposed methods, fully convolutional neural networks based on the encoder-decoder architecture, were trained, validated, and evaluated on the GOPRO dataset [18] (using 256x256 resized images) and compared with one of the state-of-the-art methods, presented in [17]. The results in Table 1 and Figure 5 show that the proposed methods can effectively remove complex non-uniform motion blur, demonstrating acceptable results. The code and results are available at https://github.com/Mekhak/motion_deblur_dl. Future work should address improving the accuracy of the proposed methods.

References

[1] Wikipedia, (2008) Wiener deconvolution. [Online]. Available: https://en.wikipedia.org/wiki/Wiener_deconvolution
[2] W. Richardson, "Bayesian-based iterative method of image restoration", Journal of the Optical Society of America, vol. 62, no. 1, pp. 55-59, 1972.
[3] L. Lucy, "An iterative technique for the rectification of observed distributions", The Astronomical Journal, vol. 79, no. 6, pp. 745-754, 1974.
[4] D. Krishnan and R. Fergus, "Fast image deconvolution using hyper-Laplacian priors", Proceedings of the 23rd International Conference on Neural Information Processing Systems, Vancouver, Canada, pp. 1033-1041, 2009.
[5] L. Rudin, S. Osher and E. Fatemi, "Nonlinear total variation based noise removal algorithms", Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259-268, 1992.
[6] A. Levin, Y. Weiss, F. Durand and W. Freeman, "Understanding blind deconvolution algorithms", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2354-2367, 2011.
[7] J. Pan, Z. Hu, Z. Su and M. Yang, "L0-regularized intensity and gradient prior for deblurring text images and beyond", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 2, pp. 342-355, 2017.
[8] J. Pan, D. Sun, H. Pfister and M. Yang, "Blind image deblurring using dark channel prior", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, pp. 1628-1636, 2016.
[9] Y. Bai, G. Cheung, X. Liu and W. Gao, "Graph-based blind image deblurring from a single photograph", IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1404-1418, 2019.
[10] S. Cho and S. Lee, "Fast motion deblurring", ACM Transactions on Graphics, vol. 28, no. 5, article 145, pp. 1-8, 2009.
[11] S. Zheng, L. Xu and J. Jia, "Forward motion deblurring", Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, pp. 1465-1472, 2013.
[12] Wikipedia, (2016) Maximum a posteriori estimation. [Online]. Available: https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation
[13] L. Xu, J. Ren, C. Liu and J. Jia, "Deep convolutional neural network for image deconvolution", Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Canada, pp. 1790-1798, 2014.
[14] J. Zhang, J. Pan, J. Ren, et al., "Dynamic scene deblurring using spatially variant recurrent neural networks", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, pp. 2521-2529, 2018.
[15] T. Nimisha, V. Rengarajan and R. Ambasamudram, "Semi-supervised learning of camera motion from a blurred image", Proceedings of the 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, pp. 803-807, 2018.
[16] J. Sun, W. Cao, Z. Xu and J. Ponce, "Learning a convolutional neural network for non-uniform motion blur removal", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, USA, pp. 769-777, 2015.
[17] S. Nah, T. Kim and K. Lee, "Deep multi-scale convolutional neural network for dynamic scene deblurring", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, pp. 257-265, 2017.
[18] S. Nah, (2017) The GOPRO dataset. [Online]. Available: https://seungjunnah.github.io/Datasets/gopro
[19] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, pp. 770-778, 2016.
[20] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift", Proceedings of the 32nd International Conference on Machine Learning, Lille, France, pp. 448-456, 2015.
[21] O. Ronneberger, P. Fischer and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation", Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, pp. 234-241, 2015.
[22] J. Johnson, A. Alahi and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution", Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, pp. 694-711, 2016.
[23] J. Johnson, (2016) Perceptual losses for real-time style transfer and super-resolution: supplementary material (source of Fig. 3 (a)-(b)). [Online]. Available: https://cs.stanford.edu/people/jcjohns/papers/fast-style/fast-style-supp.pdf
[24] Wikipedia, (2019) Mean squared error. [Online]. Available: https://en.wikipedia.org/wiki/Mean_squared_error
[25] Wikipedia, (2017) Mean absolute error. [Online]. Available: https://en.wikipedia.org/wiki/Mean_absolute_error
[26] Wikipedia, (2013) Peak signal-to-noise ratio. [Online]. Available: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
[27] D. Kingma and J. Ba, (2017) "Adam: A method for stochastic optimization", arXiv preprint. [Online]. Available: https://arxiv.org/abs/1412.6980v5
[28] Wikipedia, (2020) Stochastic gradient descent. [Online]. Available: https://en.wikipedia.org/wiki/Stochastic_gradient_descent
Submitted 18.12.2020, accepted 22.03.2021