Hardware Acceleration of Video Edge Detection with High-Level Synthesis on the Xilinx Zynq Platform

Taoufik Saidani
Department of Computer Science, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
and Laboratory of Electronics and Microelectronics, Faculty of Sciences of Monastir, University of Monastir, Monastir, Tunisia
taoufik.saidan@nbu.edu.sa

Refka Ghodhbani
Department of Computer Science, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
and Laboratory of Electronics and Microelectronics, Faculty of Sciences of Monastir, University of Monastir, Monastir, Tunisia
Refka.Ghodhbani@nbu.edu.sa

Corresponding author: Taoufik Saidani

Abstract-The study presented in this paper validates an original design flow for the rapid prototyping of real-time image and video processing applications on FPGAs. A video edge detection application was designed with the Simulink HDL Coder and Vivado High-Level Synthesis (HLS) as if the code were going to be executed on a conventional processor. The developed tools automatically translate the code into the VHDL hardware description language using an advanced compilation technique, which amounts to embedding the processing on a Xilinx Zynq-7000 System-on-Chip (SoC) device in an optimal manner. This automated hardware design flow reduces the time needed to create a prototype, since only the high-level description is required. The video edge detection system was implemented on the Xilinx Zynq-7000 platform. The implementation achieved efficient resource utilization and a good frame rate (95 FPS) at a 170 MHz clock frequency.

Keywords-high-level synthesis; automated hardware design; co-design; Xilinx Zynq-7000

I. INTRODUCTION

Over the past ten years, several architectures combining processors and/or reconfigurable circuits (Field Programmable Gate Arrays, FPGAs) have been proposed to accelerate the execution of increasingly complex applications [1]. Dedicated signal and image processing systems currently use general-purpose or dedicated processors, hard-wired solutions built on application-specific circuits (ASICs), or a combination of the two. However, the combination of growing algorithmic complexity and the large volume of data to process, usually under strong real-time constraints, requires performance that processors cannot provide [2]. Parallel processing can provide speed and programming flexibility, but at the expense of cost and implementation complexity. In addition, ASICs, which offer more processing speed, suffer from rigidity and expensive development: adding a new functionality to a hard-wired system will usually require the redesign of one or more ASICs and will increase the cost [3]. Reconfigurable architectures represent an appropriate response, as they offer better performance than programmable architectures and more flexibility than hard-wired solutions. They use reconfigurable logic components that allow the user to modify the architecture in software after manufacturing, unlike ASICs, whose algorithms are wired in silicon [2, 3]. Video and image processing are now booming and play an increasingly important role in our everyday life [4, 6].
The integration of embedded systems in the field of modern wireless communications allows us to respond quickly to its ever-increasing demands. The invention of the FPGA made the concept of reconfigurable hardware for video and image processing systems possible. The traditional design of video and image system architectures around the FPGA does not allow high productivity [3, 4]. The latter can be improved by using a new design technique based on the Model-Based Design (MBD) approach [5]. Tools based on this technique are intended for embedded systems, the development of signal processing algorithms, the rapid integration of systems, and the analysis of the behavior of complex digital systems in a wide variety of cases. To address this constraint, many high-level synthesis tools have been developed that take advantage of this technique to achieve rapid FPGA prototyping [3]. The FPGA, with its great integration and reconfiguration capabilities, is a key component for rapidly developing prototypes. To encourage the wide adoption of this type of circuit, the development environments must be improved to make them more accessible to non-electronics experts [8, 9].

The current study proposes and validates a design flow for the rapid prototyping of real-time image processing applications on FPGAs. A model is programmed with the HDL Coder library as if the code were going to be executed on a Zynq-7000 [8, 14]. The developed tools automatically generate hardware (VHDL) code using an advanced compilation technique. This amounts to embedding the processing in the FPGA in an optimal way.

II. ACCELERATION OF THE PROPOSED ARCHITECTURE

Figure 1 shows the functional diagram of the proposed system architecture. The design is applicable to both video and image processing, since the input pixels are processed successively. Embedded video processing has to handle megapixel frames, so the system is built around a video block that can be programmed with suitable algorithms, thereby accelerating complex and heavy (in terms of resources and execution time) algorithms on the integrated blocks of the Xilinx SoC platform. The output of the video and image system is passed directly to a video processing block and a display component [10]. The various video and image processing accelerators are implemented in the FPGA fabric of the Xilinx Zynq-7000 platform [5, 8].

Fig. 1. The proposed global diagram for the video and image system architecture.

In the architecture shown in Figure 1, the Advanced eXtensible Interface (AXI) interconnect links the processor, which transmits the input video, to the video processing pipeline. The USB camera shares the interfaces and communicates with the ARM processor. The frame data are stored through the memory controller. Once the video IP core has finished processing a frame, it is displayed via the HDMI output, and the pipeline continues with the next iterations [14].
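Such a streaming stage can also be expressed at a high level in C++ and synthesized to RTL. The fragment below is only an illustrative sketch in the Vivado HLS style, not code from the paper's flow (which generates VHDL from a Simulink model): the function name video_stage, the 24-bit RGB pixel format, and the frame-size arguments are assumptions made for the example.

```cpp
// Illustrative Vivado HLS-style sketch of one streaming video stage.
// AXI4-Stream carries the pixels, AXI4-Lite carries the control parameters,
// mirroring the bus structure of Figure 1. Assumed names and widths only.
#include <hls_stream.h>
#include <ap_axi_sdata.h>

typedef ap_axiu<24, 1, 1, 1> pixel_t;   // 24-bit RGB pixel plus AXI4-Stream sideband signals

void video_stage(hls::stream<pixel_t> &pix_in,
                 hls::stream<pixel_t> &pix_out,
                 int rows, int cols) {
#pragma HLS INTERFACE axis port=pix_in
#pragma HLS INTERFACE axis port=pix_out
#pragma HLS INTERFACE s_axilite port=rows
#pragma HLS INTERFACE s_axilite port=cols
#pragma HLS INTERFACE s_axilite port=return

    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
#pragma HLS PIPELINE II=1
            pixel_t p = pix_in.read();   // one pixel per clock cycle when pipelined
            // A real stage would buffer lines here for 3x3 neighborhood processing;
            // this sketch simply forwards the pixel unchanged.
            pix_out.write(p);
        }
    }
}
```

In the actual design, the equivalent stream and control interfaces are produced automatically by the code generation step described in Section V.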
III. MODEL-BASED DESIGN FOR VIDEO AND IMAGE PROCESSING

The MBD technique for FPGAs results from the need to design complex DSP systems that require specific arithmetic units, such as the add-compare-select unit of the Viterbi decoder. These specific computing units require a finer level of FPGA-based circuit optimization [11], a level generally associated with the traditional digital design of video and image processing systems. Model-based design is essentially a way of describing how a system will interact with the analog world in real time. This FPGA prototyping technique consists of converting the model of the system in question from its mathematical formulation into an executable specification [12]. The MBD technique provides a common framework that spans the different phases of the development process, which reduces the time to market needed to create a complete model. The design phases associated with this technique allow the designer to locate and correct errors before prototyping the system [13].

There are many tools available for designing FPGA systems using the MBD technique. Most of these tools take advantage of the standardized Unified Modeling Language (UML). These tools differ in the way they describe a system and define its characteristics. Some implementation techniques used by these tools may be less effective than others; however, they guarantee rapid prototyping of the system and thus time efficiency. The choice of a tool depends on many factors, such as the level of flexibility, the availability of pre-built libraries/blocks, and the overall understanding of these blocks. Some UML-based tools are Artisan Real-time Studio, I-Logix Rhapsody, and the MATLAB/Simulink Real-Time Workshop [9]. These tools are used for the design of embedded multiprocessor environments.

The tools that generate HDL code for FPGAs can be classified into two categories: block-based tools and tools based on the C language. Block-based tools generate HDL code from a block diagram, which is then used by the hardware synthesis tool to implement the system design on the FPGA. Most of these tools, such as Synplify DSP, Xilinx System Generator, Altera DSP Builder, and Simulink HDL Coder [5], are based on the Simulink and MATLAB environment. They provide a high-level modeling environment for signal processing algorithms. Blocks from the Simulink library are used together with IP cores from FPGA vendors to create HDL code specific to the target platform. Tools such as the Simulink HDL Coder give the designer more flexibility, as they integrate MATLAB functions and M-file blocks. With these tools, the designer develops a Simulink model and then translates it to the FPGA environment. The second category of MBD tools uses the C programming language to create an abstraction for system design. Among these tools are Mentor Graphics Catapult C and Celoxica's Handel-C. In this work, the main tool of interest is the Simulink HDL Coder, which is commonly used for the implementation of embedded systems on FPGAs [13].

Fig. 2. A complete solution for embedded vision.

The Simulink HDL Coder is an MBD tool that enables the modeling, analysis, and simulation of systems. It provides a well-structured graphical environment that allows the designer to create system designs of high complexity. In addition, this tool allows the user to create flexible custom blocks from MATLAB functions. The Simulink HDL Coder lets the designer generate bit-accurate, synthesizable HDL code from the model developed with Simulink blocks. The obtained HDL code can be synthesized and mapped to the target FPGA board using tools such as Altera Quartus II, Synplify, and Xilinx Vivado. The Simulink HDL Coder also provides many built-in libraries [15].
Some of these predefined libraries include adders, multipliers, accumulators, integrators, multi-port switches, lookup tables, etc.

IV. MODEL-BASED DESIGN FOR FPGA PROTOTYPING

A typical MBD design flow for implementing an FPGA-based video and image processing system is shown in Figure 3.

A. System Specifications
The design of any DSP block always begins with the definition of the design specifications. This definition is an abstract way to describe the main function of the block. At the end of this step, different parameters are obtained, such as the type of input/output and the basic mathematical function of the block. It can also be determined whether the design of the block can be reused or not [12].

B. Analysis of Design Needs
As mentioned above, the basic mathematical function of the block is determined in the specification phase. The next step in the design process is to identify the algorithm required for the practical implementation. This algorithm is chosen based on many factors, such as complexity, computation time, and the resources used. The algorithm is then subdivided into functions [12].

Fig. 3. Model-based design for embedded vision.

C. Simulation and Modeling
In this step, the design specifications are converted into a high-level model using MBD tools such as the Simulink HDL Coder and the Xilinx System Generator (XSG). This step consists in representing the mathematical equations in the form of models, and the libraries and blocks provided by the tool facilitate this representation. The resulting model is then tested by observing the output for various test input signals. Despite the wide variety of features and blocks provided by Simulink, only certain blocks can be converted by its HDL conversion environment, which means that the other blocks must be replaced or built from the primitive libraries of the HDL conversion tools in an HDL such as VHDL [12].

D. System Implementation
This is the most important step. It relies on the automatic code generation mechanism provided by the MBD tools. The HDL Workflow Advisor provided by the Simulink HDL Coder helps to check the model compatibility for code generation, convert the floating-point model into a fixed-point model, set the clock, set the input/output data types of the fixed-point model, and handle data scaling. Once the HDL code is generated, it can be synthesized using the Simulink HDL Coder tool, which supports simulation tools as well as the synthesis tools of Xilinx and Altera [14].

V. EXPERIMENTAL APPLICATIONS

A. Sobel Edge Detector
The Sobel operator locally evaluates the norm of the two-dimensional spatial gradient of a grayscale image. The regions of strong local variation in intensity, which correspond to the contours, are amplified. Along the Ox and Oy axes, the Sobel operator approximates the directional derivatives by convolving the image f(x,y) with 3×3 masks. The Sobel masks correspond in fact to the application of a smoothing operation by the (1 2 1) operator followed by a derivation operation by the (1 0 -1) operator in the orthogonal direction.
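To make this decomposition explicit (a standard identity, written out here for clarity rather than reproduced from the paper), each mask is the outer product of the smoothing vector and the derivative vector:

G_x = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} -1 & 0 & +1 \end{bmatrix} = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \qquad G_y = \begin{bmatrix} +1 \\ 0 \\ -1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}

which are exactly the masks G_x and G_y used in (1) and (2) below.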
The 3×3 masks are convolved with the image to compute the approximate horizontal and vertical gradients Gx and Gy as follows:

G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}    (1)

G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}    (2)

The approximate absolute gradient magnitude and the gradient angle of each pixel are given by:

|G| = \sqrt{G_x^2 + G_y^2}    (3)

\theta = \arctan\left(\frac{G_y}{G_x}\right)    (4)

B. Edge Detection Hardware Design Using Simulink
The proposed video edge detection hardware prototype uses the Sobel filter algorithm and was designed with the Simulink HDL Coder toolbox, so the design can be inserted flexibly as an FPGA IP core within any video processing pipeline. A detailed description of the proposed hardware architecture follows. Figure 4 presents the top-level module for video edge detection based on the Sobel filter. The "Video From File" block reads the input video for the video edge detection system from the directory. This input video is converted into frames, and the frames are serialized into pixels. Since the pixel values are of type double, the "Convert" block is used. HDL implementations of image and video processing algorithms perform pixel-streaming processing; therefore, a pixel stream is created with the Simulink "serialize" block [12]. The inverse operation at the output of the video processing system is performed by a "deserialize" block, in order to verify the processed output in image format. These two blocks are depicted in Figures 6 and 7, respectively.

C. Synthesis and FPGA Implementation
Finally, in the third step (Figure 8), the Simulink HDL Coder converts the video edge detection software/hardware model into an AXI4-Stream compliant IP core in the form of HDL (VHDL) source code. To verify the functionality of this IP core in real time in a practical environment, a hardware-software (HW-SW) co-design has been implemented directly on a Xilinx Zynq-7000 AP SoC XC7Z020-CLG484 FPGA running at 170 MHz. The Simulink HDL Coder generates a full Vivado project for the HW-SW design. This project can be implemented in the Xilinx Vivado tool (version 2019.1) with all the software and hardware peripherals. The color transform IP core and the Sobel core are connected across a single bus.

Fig. 4. Simulink HDL Coder model for video edge detection.
Fig. 5. Simulink HDL Coder model for the video edge detection core.
Fig. 6. Frame pixel serialization.
Fig. 7. Frame pixel deserialization.
Fig. 8. HDL workflow for the video edge detection system.

The HLS tool allows the generation of stream interfaces incorporating the hardware accelerator (IP). Once the high-level synthesis is completed, a compressed file (.zip) containing all the hardware components is generated. This file is exported as a Vivado package to generate the SW/HW design. At this level, the Vivado tool constitutes a complete environment for the design of the finalized SW/HW architecture. It allows the instantiation of an on-board processor connected to one or several hardware accelerators through the AXI interconnection buses specific to the selected platform. For example, to reset the IP core, 0x1 must be written to bit 0 of the IPCore_Reset register; to enable or disable the IP core, 0x1 or 0x0, respectively, must be written to the IPCore_Enable register.
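On the processor side, these control registers are accessed through plain memory-mapped reads and writes over the AXI4-Lite interface. The bare-metal C++ fragment below is a minimal sketch only: the base address and the register offsets are placeholders, since the actual address map is produced by the IP core generation step and the Vivado address editor.

```cpp
// Minimal sketch of driving the generated IP core from the ARM processor.
// Base address and offsets are hypothetical placeholders; take the real values
// from the HDL Coder IP core report / Vivado address editor.
#include <cstdint>

static const uintptr_t SOBEL_IP_BASE     = 0x43C00000u; // placeholder AXI4-Lite base address
static const uintptr_t IPCORE_RESET_OFF  = 0x0;         // placeholder offset of IPCore_Reset
static const uintptr_t IPCORE_ENABLE_OFF = 0x4;         // placeholder offset of IPCore_Enable

// Volatile access so the compiler actually issues the AXI4-Lite write.
static inline void reg_write(uintptr_t base, uintptr_t offset, uint32_t value) {
    *reinterpret_cast<volatile uint32_t *>(base + offset) = value;
}

void start_sobel_ip() {
    reg_write(SOBEL_IP_BASE, IPCORE_RESET_OFF, 0x1);   // write 0x1 to bit 0 of IPCore_Reset
    reg_write(SOBEL_IP_BASE, IPCORE_ENABLE_OFF, 0x1);  // 0x1 enables the core, 0x0 disables it
}
```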
Fig. 9. Software-Hardware interface via AXI4-Lite.

To access the data ports of the MATLAB/Simulink algorithm, the software reads or writes the associated data registers. For this model, the AXI4 slave port to pipeline register ratio is set to 35 in task 3.2 of the workflow. The default delay to read an AXI4 register is one clock cycle. Depending on the selected ratio and on the I/Os connected to the AXI4 interface, pipeline registers are introduced in the read logic of the AXI4 registers. For this model, the AXI4 pipeline register ratio of 35 is larger than the number of readable AXI4 slave registers. Since there is only one readable AXI4 slave register, no pipelining is added to the AXI4 register read-back logic. Figure 11 shows the design created with the Xilinx Vivado CAD tool.

Fig. 10. The AXI Zynq interface.
Fig. 11. HW/SW video edge detection architecture.

In Figure 11, the blue rectangle encircles the ZYNQ7 Processing System IP, which represents the processor part of the Xilinx Zynq-7000 SoC. This IP is used for the system configuration, such as configuring the clocks and enabling the propagation of the security status from the ARM cores (the security state of the processor part) to the reconfigurable part. The remaining blocks constitute the elements that are implemented in the reconfigurable part of the SoC. The black rectangle encircles the AXI Interconnect, which allows the IPs to be connected to each other. The green rectangle surrounds the Sobel filter edge detection IPs, and the red rectangle surrounds the direct and inverse color transformation IP.

D. Resource Utilization
In the next step, the video processing system cores were implemented on the Zynq-7020 SoC. The resources used are presented in Table I. Data values, instruction memories, and the firmware are stored in BRAM. The consumed LUTs and flip-flops implement the architecture of the processor, including the control signals, internal registers, and microcode. The maximum frequency of the proposed system core is approximately 170 MHz, with a throughput of 95 FPS. The complexity of the implemented design determines the resource usage on the Zynq-7000 SoC. The proposed design operates on an input resolution of 1080×1920. These results validate the reconfigurable SoC platform flow using Vivado.

TABLE I. RESOURCE UTILIZATION AND MAXIMUM FREQUENCY FOR THE VIDEO SOBEL EDGE DETECTION MODULE

Maximum frequency: 170 MHz
Resource          Used        Utilization
LUT-FF pairs      3816.667    7.38%
LUTs as logic     2727.667    5.12%
LUTs as memory    648.667     3.73%
Slice registers   2465        2.32%
RAMB36/RAMB18     0           0%
DSP48             0           0%

VI. CONCLUSION

In this paper, a high-level, model-based hardware design flow using MBD tools was presented. The proposed design flow was validated by prototyping a video edge detection application on a Zynq-7000 SoC board. Each step of the adopted high-level synthesis prototyping design, from the top-level description to the hardware implementation, was described. The global design process can reduce the processing time by 65%. The experimental results of the proposed video design based on Sobel edge detection satisfy the hardware constraints while reducing the complexity of embedded video processing on FPGAs.

REFERENCES
[1] H. M. Abdelgawad, M. Safar, and A. M. Wahba, "High Level Synthesis of Canny Edge Detection Algorithm on Zynq Platform," International Journal of Computer and Information Engineering, vol. 9, no. 1, pp. 148–152, Jan. 2015.
[2] T. T. Duong, J. H. Seo, T. D. Tran, B. J. Young, and J. W. Jeon, "Evaluation of Embedded Systems for Automotive Image Processing," in 2018 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Busan, Korea (South), Jun. 2018, pp. 123–128, https://doi.org/10.1109/SNPD.2018.8441073.
[3] C. Li, Y. Bi, F. Marzani, and F. Yang, "Fast FPGA prototyping for real-time image processing with very high-level synthesis," Journal of Real-Time Image Processing, vol. 16, no. 5, pp. 1795–1812, Oct. 2019, https://doi.org/10.1007/s11554-017-0688-1.
[4] M. B. Ayed, S. Elkosantini, and M. Abid, "An Automated Surveillance System Based on Multi-Processor and GPU Architecture," Engineering, Technology & Applied Science Research, vol. 7, no. 6, pp. 2319–2323, Dec. 2017, https://doi.org/10.48084/etasr.1645.
[5] R. Arjona and I. Baturone, "Using Simulink HDL Coder to implement a Fingerprint Recognition Algorithm into an FPGA," in 2020 XIV Technologies Applied to Electronics Teaching Conference (TAEE), Porto, Portugal, 2020, https://doi.org/10.1109/TAEE46915.2020.9163790.
[6] H. Mestiri, I. Barraj, and M. Machhout, "AES High-Level SystemC Modeling using Aspect Oriented Programming Approach," Engineering, Technology & Applied Science Research, vol. 11, no. 1, pp. 6719–6723, Feb. 2021, https://doi.org/10.48084/etasr.3971.
[7] L. Zouari, S. Chtourou, M. B. Ayed, and S. A. Alshaya, "A Comparative Study of Computer-Aided Engineering Techniques for Robot Arm Applications," Engineering, Technology & Applied Science Research, vol. 10, no. 6, pp. 6526–6532, Dec. 2020, https://doi.org/10.48084/etasr.3885.
[8] T. Han, G. W. Liu, H. Cai, and B. Wang, "The face detection and location system based on Zynq," in 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Xiamen, China, Aug. 2014, pp. 835–839, https://doi.org/10.1109/FSKD.2014.6980946.
[9] A. Alsheikhy and Y. F. Said, "Design of Embedded Vision System based on FPGA-SoC," International Journal of Advanced Computer Science and Applications, vol. 10, no. 10, 2019, https://doi.org/10.14569/IJACSA.2019.0101013.
[10] J. Jiang, C. Liu, and S. Ling, "An FPGA implementation for real-time edge detection," Journal of Real-Time Image Processing, vol. 15, no. 4, pp. 787–797, Dec. 2018, https://doi.org/10.1007/s11554-015-0521-7.
[11] R. Ghodhbani, L. Horrigue, T. Saidani, and M. Atri, "Fast FPGA Prototyping based Real-Time Image and Video Processing with High-Level Synthesis," International Journal of Advanced Computer Science and Applications, vol. 11, no. 2, 2020, https://doi.org/10.14569/IJACSA.2020.0110215.
[12] "HDL Coder Evaluation Reference Guide," MathWorks. https://nl.mathworks.com/matlabcentral/fileexchange/58941-hdl-coder-evaluation-reference-guide (accessed Dec. 08, 2021).
[13] "Accelerate Design Space Exploration Using HDL Coder Optimizations - Video," MathWorks. https://nl.mathworks.com/videos/accelerate-design-space-exploration-using-hdl-coder-optimizations-81998.html (accessed Dec. 08, 2021).
[14] L. H. Crockett, R. A. Elliot, M. A. Enderwitz, and R. W. Stewart, The Zynq Book: Embedded Processing with the ARM Cortex-A9 on the Xilinx Zynq-7000 All Programmable SoC. Glasgow, UK: Strathclyde Academic Media, 2014.
[15] Introduction to FPGA Design with Vivado High-Level Synthesis (UG998). Xilinx, 2019.