FACTA UNIVERSITATIS
Series: Electronics and Energetics Vol. 35, No 2, June 2022, pp. 243-252
https://doi.org/10.2298/FUEE2202243D
© 2022 by University of Niš, Serbia | Creative Commons License: CC BY-NC-ND

Original scientific paper

AREA AND POWER-EFFICIENT RECONFIGURABLE DIGITAL DOWN CONVERTER ON FPGA

Debarshi Datta1, Himadri Sekhar Dutta2
1Electronics & Communication Engineering Department, MAKAUT Kolkata, West Bengal, India
2Electronics & Communication Engineering Department, Kalyani Government Engineering College, Nadia, West Bengal, India

Abstract. This paper presents a field-programmable gate array (FPGA)-based digital down converter (DDC) that reduces the signal bandwidth from about 70 MHz to 182.292 kHz. The proposed DDC consists of a polyphase COordinate Rotation DIgital Computer (CORDIC) processor and a multirate filter. The polyphase CORDIC processor handles input data at a high sample rate and produces a clean baseband spectrum in a computationally efficient way. The pipelined multirate filter works at a high clock speed and uses a cubic B-spline Farrow filter to realize a fractional sample rate factor. The proposed DDC is coded in optimized hardware description language (HDL) and tested on a Xilinx Kintex-7 FPGA as the target device. Experimental results indicate that the proposed design reduces chip area and power consumption and operates at high speed without loss of functionality. Additionally, the proposed design offers sufficient spurious-free dynamic range (SFDR) and a frequency resolution of less than 1 Hz at the output.

Key words: Digital down converter (DDC), COordinate Rotation DIgital Computer (CORDIC), Half-band (HB) filter, Field programmable gate array (FPGA), MATLAB

1. INTRODUCTION
High-performance digital down converters (DDCs) are essential in modern communication systems [1].
Sample rate reduction plays an important role in data communication systems because they operate at various data rates. A field-programmable gate array (FPGA)-based DDC architecture is therefore attractive, since it is more flexible than an application-specific integrated circuit (ASIC) [2]. Furthermore, a DDC implemented on an FPGA can achieve good frequency response and phase characteristics with a high-precision output.

Received September 3, 2021; received in revised form December 23, 2021
Corresponding author: Debarshi Datta
Electronics & Communication Engineering Department, MAKAUT Kolkata, West Bengal, India
E-mail: debarshidatta7@gmail.com

In the last decade, several researchers have reported hardware-efficient DDC architectures on FPGA devices. L. L. Motta et al. [3] proposed a digital up/down converter using polyphase cascaded integrator-comb (CIC) filters. Their simulation results verify the filters functionally, and the design achieves high performance using fixed-point filter coefficients. L. Guo et al. [4] suggested a parallel DDC architecture using a numerically controlled oscillator (NCO). The NCO is decomposed into several sinusoidal sequences, which are multiplied by the input signals to produce complex waveforms. The design was verified in MATLAB and tested on an FPGA board. X. Liu et al. [5] proposed a reconfigurable DDC architecture that down-converts an input signal at about 3.6 GHz to output rates from 1 kS/s to 225 MS/s. The design was implemented on the Xilinx Kintex-7 device, and the synthesis results were reported in terms of resources and power consumption. B. Tietche et al. [6] described FPGA-based resampling circuits for software-defined radio applications, with implementation schemes that control the spurious-free dynamic range (SFDR).
V. Obradović et al. [7] discussed a flexible DDC architecture for a wideband direction finder. The DDC was tested on a Xilinx Kintex-7 using Xilinx IP cores to implement the filter chain. J. Thabet et al. [8] presented a reconfigurable DDC design implemented on a Virtex-7 FPGA board to obtain high speed and low power consumption; the design reduced complexity for use in multi-standard GNSS receivers. A. Agarwal et al. [9] suggested a COordinate Rotation DIgital Computer (CORDIC)-based DDC on a Xilinx Virtex-6 FPGA for multi-standard radio communications and achieved a maximum operating speed of 240 MHz.

However, the existing designs consume a large area and considerable power on the FPGA platform. A cost-efficient, hardware-efficient and flexible reconfigurable DDC architecture that can serve a wide range of practical applications is therefore required. The proposed design uses a polyphase and pipelined architecture to improve the operating speed, and truncation in each unit reduces the area requirements. The proposed design is tested on the Xilinx Kintex-7 FPGA board. The implementation results indicate that the proposed DDC uses fewer hardware resources and less power than existing architectures without losing any significant information.

The organization of this paper is as follows: Section 2 describes the proposed architecture and its components. Results are discussed in Section 3. Section 4 concludes the paper.

2. PROPOSED ARCHITECTURE
The proposed DDC consists of a polyphase CORDIC processor and a multirate filter, as shown in Fig. 1. The polyphase CORDIC processor handles a high data rate input signal, beyond 1 GHz.
The multirate filter, consisting of multi-stage CIC, half-band (HB), and cubic B-spline Farrow filters connected in cascade, achieves a high decimation factor and produces the correct baseband spectrum.

Fig. 1 Proposed DDC architecture

The total sample rate change factor (R) is calculated as
$$R = \frac{f_s}{f_{out}} = R_1 \times R_2 \times 2 \times R_3 \tag{1}$$
where $f_s$ and $f_{out}$ are the input and output sampling rates, respectively, $R_1$ is the decimation factor of the polyphase CORDIC processor, $R_2$ is the decimation factor of the multi-stage CIC filter, the factor 2 is the decimation factor of the HB filter, and $R_3$ is the decimation factor of the cubic B-spline Farrow filter. The sample rate factors can be changed dynamically in real time to match the application, so the design offers maximum flexibility. The frequency resolution at the output is $f_s/2^{32}$ (approximately 0.838 Hz for $f_s$ = 3.6 GHz). The following sub-sections describe each component of the proposed design.

2.1. Polyphase CORDIC Processor
The polyphase CORDIC processor can work with a high sample rate signal coming from an analog-to-digital converter (ADC), typically the ADC12D1800. The proposed polyphase CORDIC processor is shown in Fig. 2. Each polyphase component operates at a rate of $f_s/R_1$, which makes the polyphase CORDIC processor feasible on the FPGA platform [10]. To achieve a correct output, the relation between $f_s$ and $R_1$ is expressed as
$$R_1 \le \frac{f_s}{W} \tag{2}$$
where W is the bandwidth of the input signal.

Fig. 2 Polyphase CORDIC processor

From the polyphase algorithm, the signal $g_i(n)$ can be represented as
$$g_i(n) = x(nR_1 + i) \tag{3}$$
where $i = 0, 1, \ldots, R_1 - 1$, and $x(n)$ is the input sequence.

Hence, the in-phase ($y_I(n)$) and quadrature ($y_Q(n)$) parts of the polyphase CORDIC processor are expressed as [11]
$$y_I(n) = \sum_{i=0}^{R_1-1} g_i(n)\, IC_i(n) = \sum_{i=0}^{R_1-1} x(nR_1 + i)\cos\left[\frac{2\pi (nR_1 + i) f_0}{f_s}\right] \tag{4}$$
and
$$y_Q(n) = \sum_{i=0}^{R_1-1} g_i(n)\, QC_i(n) = \sum_{i=0}^{R_1-1} x(nR_1 + i)\sin\left[\frac{2\pi (nR_1 + i) f_0}{f_s}\right] \tag{5}$$
respectively, where $f_0$ is the center frequency. To eliminate unwanted frequency components and further reduce the sample rate, both $y_I(n)$ and $y_Q(n)$ are passed through multirate decimation filters.

2.2. CIC Filter
The CIC filter performs low-pass filtering to remove the image copies of the spectrum and produces a very narrow passband for the DDC system [12]. The CIC is a highly efficient decimation filter placed directly after the polyphase CORDIC processor. A multi-stage CIC filter is typically used to reduce the sidelobes while maximizing the main-lobe gain [13]. This work uses a pipelined 4-stage CIC decimation filter, shown in Fig. 3. The additional registers in the integrator and comb sections reduce the critical path delay.

Fig. 3 4-stage truncated pipeline-based CIC filter

The filter gain is calculated as [14]
$$G = (R_2 D)^N = 65536 \quad (R_2 = 8,\ N = 4 \text{ stages},\ \text{comb delay } D = 2) \tag{6}$$
which corresponds to 48.16 dB when expressed as $10\log_{10} G$.

The full-resolution data width at the output stage is
$$B_{out} = \lceil B_{in} + N\log_2(R_2 D) \rceil = 36 \text{ bits} \quad (B_{in} = 20) \tag{7}$$
Fig. 4 depicts the magnitude response of the CIC filter.

Fig. 4 Magnitude response of the CIC filter for R2 = 8, D = 2, N = 4

Generally, the integrators work at a high sample rate with a large data width, so truncation is necessary to reduce the word length without losing the desired information. The five least significant bits (LSBs) are truncated from the 36-bit output of the first integrator, so the second integrator works with only 31 bits. Following the same procedure, the third and fourth integrators work with only 26 bits and 21 bits, respectively.
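The arithmetic of Eqs. (6) and (7) and the five-LSB truncation per integrator can be reproduced with a short Python sketch. It is only an illustration, not the authors' HDL: the function names are hypothetical, the ceiling in the width formula is an assumed reading of the brackets in Eq. (7), and a uniform five-bit cut after every integrator is assumed from the widths quoted in the text.

```python
import math


def cic_gain(r2: int, d: int, n: int) -> int:
    """DC gain of an N-stage CIC decimator, G = (R2 * D)^N, as in Eq. (6)."""
    return (r2 * d) ** n


def cic_full_width(b_in: int, r2: int, d: int, n: int) -> int:
    """Full-resolution register width, as in Eq. (7) (ceiling assumed)."""
    return math.ceil(b_in + n * math.log2(r2 * d))


def truncation_schedule(full_width: int, lsbs_per_stage: int, stages: int) -> list[int]:
    """Word widths seen by each integrator, then the final output width.

    Assumes the same number of LSBs is dropped after every integrator.
    """
    widths = [full_width]
    for _ in range(stages):
        widths.append(widths[-1] - lsbs_per_stage)
    return widths


if __name__ == "__main__":
    gain = cic_gain(r2=8, d=2, n=4)
    width = cic_full_width(b_in=20, r2=8, d=2, n=4)
    print("gain:", gain)
    print("gain as 10*log10(G) in dB:", round(10 * math.log10(gain), 2))
    print("full-resolution width:", width)
    print("width per stage:", truncation_schedule(width, lsbs_per_stage=5, stages=4))
```

With the values used in this section (R2 = 8, D = 2, N = 4, Bin = 20), the sketch gives a gain of 65536, a full-resolution width of 36 bits, and the width sequence 36, 31, 26, 21, 16.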
As a consequence, the truncation process reduces the output data width to 16 bits. The data length in each stage is usually provided by the Matlab tool. The passband frequency ($\omega_p$) is $\pi N / R_2$.

2.3. HB Filter
It is important to note that the CIC filter does not provide a flat passband response, and this droop must be compensated in other processing stages. After the CIC filter, the HB filter is used to correct the passband droop [15]. The HB filter is symmetric about the cut-off frequency π/2. Fig. 5 shows a 31-tap symmetric HB filter with decimation factor 2.

Fig. 5 31-tap transpose symmetric HB filter

The pass-band frequency ($\omega_p$) is 0.45π, and the stop-band frequency ($\omega_s$) is 0.55π. The transposed symmetric HB architecture reduces the number of multiplication units [16], which significantly reduces the computational workload. For this work, the HB filter coefficients are 16-bit fixed-point values generated using the "firhalfband" Matlab function [17].

2.4. Cubic B-spline Farrow structure
Finally, a cubic B-spline Farrow structure is used to realize the fractional sample rate factor of 3/2. This type of implementation provides a better reconstruction of the signal than conventional Lagrange interpolation [18], [19]. The calculation of the cubic B-spline Farrow structure is described below.

The N-th degree B-spline in the time domain is expressed as [14]
$$\beta_N(t) = \frac{1}{N!}\sum_{k=0}^{N+1} (-1)^k \binom{N+1}{k}\left(t - k + \frac{N+1}{2}\right)_+^N \tag{8}$$
where $\beta_N$ denotes the N-th degree B-spline and $(u)_+^N$ equals $u^N$ for $u \ge 0$ and zero otherwise. For N = 3, the cubic spline, the polynomial becomes
$$\beta_3(t) = \frac{1}{6}\sum_{k=0}^{4} (-1)^k \binom{4}{k}(t - k + 2)_+^3 \tag{9}$$
$$= \frac{1}{6}(t+2)_+^3 - \frac{2}{3}(t+1)_+^3 + t_+^3 - \frac{2}{3}(t-1)_+^3 + \frac{1}{6}(t-2)_+^3 \tag{10}$$
The reconstruction spline is the sum of weighted B-spline sequences and is expressed as
$$y(t) = \sum_{k} x(k)\,\beta_3(t - k) \tag{11}$$
Considering samples taken at times t = -1, 0, 1, 2, Eq. (11) gives the four B-spline segments as
$$\begin{aligned}
y(d) &= x(n+2)\,\beta_3(d-2) + x(n+1)\,\beta_3(d-1) + x(n)\,\beta_3(d) + x(n-1)\,\beta_3(d+1) \\
&= x(n+2)\,\frac{d^3}{6} + x(n+1)\left[\frac{1}{6}(d+1)^3 - \frac{2}{3}d^3\right] + x(n)\left[d^3 - \frac{2}{3}(d+1)^3 + \frac{1}{6}(d+2)^3\right] + x(n-1)\left[-\frac{1}{6}(d-1)^3\right] \\
&= x(n+2)\,\frac{d^3}{6} + x(n+1)\left[-\frac{d^3}{2} + \frac{d^2}{2} + \frac{d}{2} + \frac{1}{6}\right] + x(n)\left[\frac{d^3}{2} - d^2 + \frac{2}{3}\right] + x(n-1)\left[-\frac{d^3}{6} + \frac{d^2}{2} - \frac{d}{2} + \frac{1}{6}\right]
\end{aligned} \tag{12}$$
To realize the above equations in a Farrow structure, the coefficients of the powers $d^k$ of the fractional delay are generated by the following four equations:
$$\begin{aligned}
d^0 &: \quad 0 + \tfrac{1}{6}x(n+1) + \tfrac{2}{3}x(n) + \tfrac{1}{6}x(n-1) = C_0 \\
d^1 &: \quad 0 + \tfrac{1}{2}x(n+1) + 0 - \tfrac{1}{2}x(n-1) = C_1 \\
d^2 &: \quad 0 + \tfrac{1}{2}x(n+1) - x(n) + \tfrac{1}{2}x(n-1) = C_2 \\
d^3 &: \quad \tfrac{1}{6}x(n+2) - \tfrac{1}{2}x(n+1) + \tfrac{1}{2}x(n) - \tfrac{1}{6}x(n-1) = C_3
\end{aligned} \tag{13}$$
where $C_0$, $C_1$, $C_2$, and $C_3$ are the spline matrix coefficients and d lies between 0 and 1. The coefficients in Eq. (13) are transformed into the z-domain to realize the transfer functions of the Farrow filter architecture, as shown in Fig. 6. Farrow filters are well suited to fractional sample rate conversion because only one programmable fractional delay component changes, while the filter coefficients stay fixed [19].

Fig. 6 Cubic B-spline Farrow structure [20]

3. RESULT ANALYSIS
The following sub-sections describe the result analysis in detail.

3.1. Design Specifications
The proposed system targets mobile communication specifications. All floating-point data are converted to fixed-point data while meeting the stopband specifications. The specifications of the proposed DDC are summarized as follows:
i. Input signal bandwidth: 70 MHz
ii. Output signal bandwidth: 182.292 kHz
iii. Decimation factor: 384 (R1 = 4, R2 = 2^5, HB = 2, R3 = 3/2)
iv. Input data width: 16-bit
v. Output data width: 20-bit
vi. Passband ripple ≤ 0.1 dB
vii. Stopband attenuation ≥ 80 dB

3.2. Data Truncation
Truncation is applied in each signal path to protect against overflow errors.
Each polyphase branch can be represented as an FIR filter, whose multiplication-accumulation is described as follows. An M-bit binary word in signed 2's complement fixed-point rational format can take values from the subset S given by [21]
$$S = \left\{ s/2^{y_1} \;\middle|\; -2^{M-1} \le s \le 2^{M-1} - 1,\ s \in \mathbb{Z} \right\} \tag{14}$$
which is denoted $P(x_1, y_1)$, with $x_1 = M - y_1 - 1$ integer bits and $y_1$ fractional bits. Using fixed-point arithmetic, the multiplication is calculated as
$$P(x_1, y_1) \times P(x_2, y_2) = P(x_1 + x_2 + 1,\ y_1 + y_2) = P(x_3, y_3) \tag{15}$$
Let the multiplication and accumulation results be denoted by $P(x_3, y_3)$ and $P(x_4, y_4)$, respectively, so that
$$P(x_4, y_4) = P\left(x_3 + \lfloor \log_2(R - 1) \rfloor,\ y_3\right), \quad \text{where } R = R_1 + 1 \tag{16}$$
For example, with input data in P(8, 7) and coefficients in P(3, 12), the multiplication and accumulation results are P(12, 19) and P(14, 19), respectively, for R1 = 4. After word-length reduction, the output data is P(14, 19 - 14) = P(14, 5), i.e. a word length of 14 + 5 + 1 = 20 bits, which is the input of the CIC filter. The output word length of the CIC filter is 16 bits (described in Section 2.2). The Farrow filter output word length is 20 bits.

3.3. FPGA Implementation
The proposed DDC design is simulated in the Xilinx Vivado 2017.4 tool and implemented on a Kintex-7 XC7K70T-FBG676 with 16-bit input precision to meet the desired specifications. The design is coded in the Verilog hardware description language (HDL). Additionally, code optimization techniques reduce the logic resources and power [22], [23]. The compilation report contains slices, LUTs, IOB blocks, maximum frequency, and power consumption. Table 1 lists the synthesis results for each component of the proposed design.
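The word-length bookkeeping of Section 3.2 can be traced with a small Python sketch. It is a hypothetical helper, not part of the authors' design flow: the class name QFormat and the function names are assumptions, the P(x, y) format is modeled only by its integer and fractional bit counts, and the accumulation growth follows Eq. (16) as written.

```python
import math
from typing import NamedTuple


class QFormat(NamedTuple):
    """Signed fixed-point format P(x, y): x integer bits, y fractional bits."""
    int_bits: int
    frac_bits: int

    @property
    def word_length(self) -> int:
        # One extra bit for the sign, so M = x + y + 1.
        return self.int_bits + self.frac_bits + 1


def multiply(a: QFormat, b: QFormat) -> QFormat:
    """Product format per Eq. (15): P(x1 + x2 + 1, y1 + y2)."""
    return QFormat(a.int_bits + b.int_bits + 1, a.frac_bits + b.frac_bits)


def accumulate(p: QFormat, r1: int) -> QFormat:
    """Accumulator format per Eq. (16), with R = R1 + 1 so that R - 1 = R1."""
    growth = math.floor(math.log2(r1))
    return QFormat(p.int_bits + growth, p.frac_bits)


def truncate(p: QFormat, lsbs: int) -> QFormat:
    """Drop `lsbs` fractional LSBs; the integer part is left untouched."""
    if lsbs > p.frac_bits:
        raise ValueError("cannot drop more LSBs than there are fractional bits")
    return QFormat(p.int_bits, p.frac_bits - lsbs)


if __name__ == "__main__":
    data = QFormat(8, 7)      # 16-bit input samples
    coeff = QFormat(3, 12)    # 16-bit coefficients
    product = multiply(data, coeff)
    acc = accumulate(product, r1=4)
    out = truncate(acc, lsbs=14)
    print(product, acc, out, out.word_length)
```

For the example in Section 3.2, the sketch yields P(12, 19) for the product, P(14, 19) for the accumulator with R1 = 4, and P(14, 5) after dropping 14 LSBs, which is a 20-bit word.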
Table 1 Resource utilization of each component of the proposed DDC architecture

| Synthesis parameters | Polyphase CORDIC processor (R1 = 4) | Polyphase CORDIC processor (R1 = 8) | Polyphase CORDIC processor (R1 = 16) | CIC filter (R2 = 2^5) | HB filter (31-tap, factor 2) | Cubic B-spline Farrow filter (R3 = 3/2) |
|---|---|---|---|---|---|---|
| Slice Registers | 1758 | 3650 | 8521 | 1290 | 1948 | 3182 |
| 6-input LUTs | 832 | 1975 | 3932 | 556 | 878 | 2185 |
| IOBs | 62 | 62 | 62 | 65 | 80 | 86 |
| BRAMs | 2 | 4 | 8 | 0 | 0 | 8 |
| DSP48Es | 4 | 8 | 16 | 0 | 0 | 36 |

3.4. Validation
For verification, the ChipScope outputs are sent back to the Matlab R2015a tool. Fig. 7 shows an SFDR of 88 dB, obtained using 1024 samples with unity signal amplitude.

Fig. 7 Power spectrum of proposed DDC

3.5. Comparison
Table 2 compares the proposed DDC design with existing designs. The proposed design uses data truncation to reduce resources, and this area reduction also lowers power. Moreover, the pipelined version of the proposed architecture enhances the operating speed. Compared with Liu et al. [5], the area (slices) and power are reduced by 39.65% and 32.92%, respectively. The polyphase CORDIC processor improves the SFDR to 88 dB. The results suggest that the proposed DDC is an energy-efficient architecture suitable for real-time signal processing applications.

Table 2 Comparison report of existing architectures and proposed solution

| Synthesis parameters | Obradović et al. [7] (Kintex-7), fs = 120 MHz, R = 6 | Liu et al. [5] (Kintex-7), fs = 3.6 GHz, R = 20 | Proposed solution |
|---|---|---|---|
| Slices | 37066 | 13552 | 8178 |
| LUTs | 69499 | 7269 | 4451 |
| BRAMs | Not available | 22 | 10 |
| DSP48Es | 1034 | 83 | 40 |
| Fmax (MHz) | Not available | 454.5 | 512 |
| Power (W) | Not available | 1.446 | 0.970 |
| SFDR (dB) | Not available | 83.3 | 88 |

4. CONCLUSION
This paper presents a flexible FPGA-based DDC architecture that can be matched to a range of digital radio specifications. The proposed design uses a polyphase and pipelined structure, which saves area and improves the operating speed.
The multirate filter performs sample rate reduction and channel filtering with enhanced sensitivity and selectivity. These design techniques increase the operating speed. Furthermore, truncation and an optimized coding style improve area efficiency and reduce power. Additionally, the proposed design achieves an SFDR of 88 dB. Thus, the presented DDC design is well suited to real-time applications.

Acknowledgement: The authors express their sincere gratitude to MAKAUT for providing the valuable Xilinx tools and FPGA board.

REFERENCES
[1] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Third Edition. Prentice Hall, 2010.
[2] W. Wolf, FPGA-Based System Design. Englewood Cliffs, NJ: Prentice-Hall, 2004.
[3] L. L. Motta, B. A. Acurio, N. F. T. Aniceto and L. G. P. Meloni, "Design and implementation of a digital down/up conversion directly from/to RF channels in HDL", Integration, vol. 68, pp. 30–37, Sept. 2019.
[4] L. Guo, F. Tan, P. Zhan and H. Zeng, "Decomposing numerically controlled oscillator in parallel digital down conversion architecture", J. Circuits, Syst. Comput., vol. 26, no. 9, p. 1750126, Feb. 2017.
[5] X. Liu, X. Yan, Z. Wang and Q. Deng, "Design and FPGA implementation of a reconfigurable digital down converter for wideband applications", IEEE Trans. on VLSI Systems, vol. 25, no. 12, Dec. 2017.
[6] B. H. Tietche, O. Romain and B. Denby, "A Practical FPGA-Based Architecture for Arbitrary-Ratio Sample Rate Conversion", J. Sign. Process. Syst., vol. 78, pp. 147–154, Feb. 2015.
[7] V. Obradović, P. Okiljević, N. Kozić and D. Ivković, "Practical implementation of digital down conversion for wideband direction finder on FPGA", Sci. Tech. Rev., vol. 66, no. 4, pp. 40–46, Jan. 2016.
[8] J. Thabet, R. Barrak, N. Kamoun, N. Khouja and A. Ghazel, "A reconfigurable Digital Down Converter architecture for multistandard GNSS receiver", In Proceedings of the 14th International Symposium on Communications and Information Technologies (ISCIT), Incheon, 2014, pp. 404–408.
[9] A. Agarwal, L. Boppana and K. R. Kodali, "A factorization method for FPGA implementation of Sample Rate Converter for a multi-standard radio communications", In Proceedings of the 2013 Tencon - Spring, Sydney, NSW, 2013, pp. 530–534.
[10] D. Datta, P. Mitra and H. S. Dutta, "FPGA implementation of high performance digital down converter for software defined radio", Microsyst. Technol., vol. 28, pp. 533–542, Aug. 2019.
[11] J. E. Volder, "The CORDIC trigonometric computing technique", IRE Trans. Electron. Comput., vol. EC–8, pp. 330–334, Sept. 1959.
[12] E. B. Hogenauer, "An economical class of digital filters for decimation and interpolation", IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 2, pp. 155–162, April 1981.
[13] Q. Jing, Y. Li and J. Tong, "Performance analysis of multi-rate signal processing digital filters on FPGA", EURASIP J. Wirel. Commun. Netw., p. 31, Feb. 2019. https://doi.org/10.1186/s13638-019-1349-9.
[14] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, Springer, Third Edition, 2007.
[15] P. P. Vaidyanathan and T. Q. Nguyen, "A “TRICK” for the Design of FIR Half-Band Filters", IEEE Trans. Circuits Syst., vol. CAS–34, no. 3, Mar. 1987.
[16] A. N. Willson, "Desensitized Half-Band Filters", IEEE Trans. Circuits Syst.–I: Regul. Pap., vol. 57, no. 1, pp. 152–167, Jan. 2010.
[17] MathWorks HDL Coder, https://www.mathworks.com/products/hdl-coder.html. Accessed 14 Aug. 2019.
[18] R. Ratan, S. Sharma and A. K. Kohli, "Cubic Lagrange polynomial-based designing of efficient interpolators", Int. J. Electron. Lett., vol. 2, no. 1, pp. 8–16, Nov. 2013.
[19] C. Farrow, "A continuously variable digital delay element", In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS88), 1988, pp. 2642–2645.
[20] D. Datta, P. Mitra and H. S. Dutta, "Implementation of Fractional Sample Rate Digital Down Converter for Radio Receiver Applications", In Proceedings of the Devices for Integrated Circuit (DevIC), Kalyani, 2021, pp. 94–98. http://dx.doi.org/10.1109/DevIC50843.2021.9455805.
[21] R. Yates, "Fixed-Point Arithmetic: An Introduction", 2007. Available at: https://courses.cs.washington.edu/courses/cse467/08au/labs/l5/fp.pdf.
[22] S. N. Shahrouzi and D. G. Perera, "HDL Code Optimizations: Impact on Hardware Implementations and CAD Tools", In Proceedings of the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), Canada, 2019, pp. 1–9.
[23] Z. Zulfikar, "Novel area optimization in FPGA implementation using efficient vhdl code", Jurnal Rekayasa Elektrika, vol. 10, no. 2, pp. 61–66, Oct. 2012.