AP07_4-5.vp 1 Introduction Optical coherence tomography (OCT) is a rather new im- aging technique based on broadband interferometry. Like ultrasonic imaging, OCT generates a cross-sectional image of the scanned tissue. The penetration depth of OCT ranges from about 1–2 mm in opaque tissue (e.g. skin) to 2 cm in transparent tissue (e.g. eye). The achievable lateral resolution mainly depends on the focusing optics, while the axial reso- lution is based on the light source of the OCT scanner. Currently, resolutions down to a few micrometers are possible, allowing OCT images to compete with histological examina- tion of tissue. The main drawback of histology is that for each observation a tissue sample has to be taken surgically. Thus, the main advantages of OCT are its non-invasive nature and real-time feasibility. Similar to ultrasonic images, OCT speckle arises from constructive and destructive interference of light backscat- tered from reflectors smaller than a wavelength. Speckle noise degrades contrast, and makes it difficult to identify small reflectors. Additionally, post-processing algorithms (such as deconvolution or segmentation) that are used to enhance image resolution and quality may suffer from the presence of speckle in OCT images. Thus, prior to image enhance- ment algorithms, a speckle reduced image is available that can be blended with the original image to enhance viewing convenience. This paper is organized as follows: In section 2 the basic principles of OCT are described and the high resolution OCT system is presented in detail. Section 3 deals with the theory of speckle reduction algorithms and gives an overview of the algorithmic complexity. Based on these investigations, section 4 evaluates real time implementation feasibility on an FPGA. 2 High resolution OCT This section gives details about the OCT system used throughout this work, and a flexible digital post processing re- alized on an FPGA is presented. 2.1 OCT principle A general OCT system is depicted in Fig. 1. The light from a broadband light source is split into a reference beam and a probe beam directed onto the object of interest. By combining the backscattered light with the reference beam, an interfer- ence pattern is created and detected by a photodetector. The strength of the interference is dependent on the optical path difference and on the reflection. The envelope of the inter- ference pattern shows the reflectivity in a certain depth of the probe [1]. Hence, by altering the length of the reference beam, a re- flectivity profile of the probe can be recorded, called A Scan. The photodetector signal is amplified and analog-to-digital converted to enable digital filtering, demodulation and ad- vanced post processing. By deflecting successively the probe beam in lateral direction, multiple scans can be combined to a 2-dimensional B-Scan, showing a high resolution cross-sec- tional view of the probe. 2.2 OCT system Interference of the detected light depends on the dif- ference of the optical length of the reference arm and probe arm. Unlike narrowband light sources, broadband light © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 91 Acta Polytechnica Vol. 47 No. 4–5/2007 Stick Based Speckle Reduction for Real-Time Processing of OCT Images on an FPGA H. Luecken, G. Tech, R. Schwann, G. Kappen This paper presents an FPGA based real-time implementation of an adaptive speckle reduction algorithm. Applied to the log-compressed image of a high-resolution optical coherence tomography (OCT) system, all related signal processing steps from envelope detection to VGA video signal generation are executed on a single chip. Images from measured OCT data show that the chosen algorithm produces a smooth, detailed image with fewer image artifacts than comparable approaches. An estimation of the hardware effort, the possible throughput rate and the resulting image frame rate is given for different window sizes used here in speckle reduction. Keywords: optical coherence tomography, real-time processing, speckle reduction, FPGA. Fig. 1: OCT principle sources show interference only for small differences up to the coherence length. The applied source emits light with a spec- tral bandwidth of 255 nm at a center wavelength of 800 nm, which leads to a coherence length of 1.6 �m. The coherence length is the limit for the axial resolution of the system. Resolution in the lateral direction of the OCT-System depends basically on the focusing optics and lenses. The sys- tem considered here provides up to 0.01 degrees of lateral resolution. The length of the reference arm can be varied on a mi- crometer scale with a piezoelectric crystal. Adaptive control enables a linear movement of the reference mirror at constant speed. Therefore the recorded signal shows the interference pattern as a sine-wave carrier signal. The depth-dependent reflectivity of the probe shapes a superimposed envelope. The carrier frequency depends on the mirror velocity and center frequency of the light source, and is about 350 KHz for the considered system, mainly limited by the mechanical set-up of the reference mirror. 2.3 Digital post-processing After amplifying the photodiode signal, it is sampled and converted to the digital domain. A standard analog-to-digital converter is used, providing precision of 12 Bits at a sampling frequency of 1 MHz. This data is fed to an Altera Stratix II FPGA, which serves as a test-bed to evaluate different speckle reduction algorithms and the real-time abilities of the system. Fig. 3 shows a block diagram of the architecture. Standard signal processing for an OCT system consists of filtering and amplitude demodulation to determine the signal’s envelope. The implemented system starts with a bandpass filter. Subsequently a Hilbert transformation is cho- sen to demodulate the signal by adding the imaginary part to the OCT signal and taking the absolute value. The bandpass and Hilbert transform can be performed in one step, in Fig. 3 denoted as complex bandpass. To compute the absolute value, i.e. the squared sum of the imaginary and real part of the signal, an efficient coordinate rotation (CORDIC) algo- rithm has been implemented as proposed in [5]. Subsequent to envelope detection, the signal is compres- sed to 8 Bit by computing the logarithm. To adapt the sampling rate to the axial resolution given by the coherence length, down-sampling is done after low-pass filtering to avoid aliasing. The demodulated and compressed OCT data of each scan line is written to a memory buffer, and the adjacent scans are combined in lateral direction to a B-Scan. A VGA proto- col converter allows visualization of the OCT picture on a standard VGA display with a resolution of 1024×768 pixels at 75 Hz. The architecture makes use of a 2-stage memory structure (Table 1). For advanced processing and speckle re- duction, up to 63 adjacent A Scans can be stored in a buffer and can be processed in both the axial and the lateral direc- tion. A 32 MByte SDRAM is used as frame buffer by the video controller. The required Arithmetical Logical Units (ALUTs) for an FPGA implementation of this post processing scheme are summarized in Table 2. As can be seen, only 9 % of the overall FPGA (48,352 ALUTs) are used in this post processing approach, leaving sufficient space for extension and enhancements. 3 Speckle reduction algorithm This section details with the speckle reduction technique considered here. Speckle reduction aims to reduce noise-like speckle dis- tortion in B-Scans, and it should also generally preserve structures like edges, borders and small features of interest. A large variety of speckle reducing filters has been published, particularly from research in ultrasonic image processing. Approaches using region growing, diffusion and wavelet de- composition have been proposed. This paper considers the adaptive speckle filter intro- duced in [3] using asymmetric sticks, called ASF. Similar to other speckle reduction techniques, the basic concept is to compute the output from windows which are adapted to the local content of the source image. In contrast to algorithms working with rectangular windows and/or weights [6, 7, 8] 92 © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ Acta Polytechnica Vol. 47 No. 4–5/2007 Fig. 2: Spectrum of the broadband light source and interference pattern of an ideal mirror Fig. 3: Digital post-processing of OCT data Block A-Scan Buffer SDRAM Memory Size 524,288 Bits 804,864 Bytes Table 1: Memory usage of OCT post-processing Block Band -pass Abs Log LP VGA SDRAM- Controller � ALUTs 936 469 109 763 1562 429 4268 Table 2: Resource requirements of basic OCT post-processing blocks which depend on the distance to the center of the window, ASF resolves the window in angular steps. This method shows superior results at the expense of high computational load and is impracticable to implement on a general purpose pro- cessor for real-time processing. The algorithm will now be explained in detail. The filter uses directional features (called sticks, Fig. 5) and allows iterative application, since it behaves like a non- linear diffusion model. The method is based on a weighted sum of means, which are calculated in the stick direction. The weights are given by the reciprocal of the local variance in each direction. According to [3], the filter algorithm is given by � ( )I W g V Ii i i N � � � �1 1 4 4 and W g Vi i N � � � � ( ) 1 4 4 , where Î denotes the output for each pixel. Vi and Ii are the variance and mean of the original image data I along the ith stick pointing from the center of the N×N filter window in one of 4N � 4 different directions. The sticks can be con- structed by hardware, making efficient use of Bresenham’s line algorithm [9]. Given the ith stick as a filter kernel Hi(x, y), the mean and variance are computed by I x y I x y H x yi i i( , ) ( , ) ( , )� � and � V x y I x y H x y Ii i i( , ) ( , ) ( , )� � �2 2 with * denoting a 2-dimensional correlation. As variation function g( ) this paper applies g V k V V k i i i( ) � � � � �� 255 255 if else. Sticks across an edge show high variance and therefore the mean in this direction is weighted to have less influence on the computed output. Hence, smoothing only happens in homogeneous directions whereas it is suppressed across edges. In homogeneous speckle regions, all sticks have ap- proximately the same variance and weight, and as a result all directions are smoothed evenly. Because of the high quantization of angular steps, ASF is able to adapt its kernel very flexibly to the image content. Inhomogeneities with various shapes can not only be re- tained, but can be smoothed without blurring edges. Fig. 4 shows a measured OCT image of a spheroid placed on an agar substrate and the enhanced images for two different stick lengths. The speckle pattern and additive noise are significantly reduced. In [3], it is stated that due to the overlapping sticks the algorithm works like a Gaussian smoothing filter in homoge- neous regions. However, a closer look at the filter kernel in these regions shows that the weights are w r c N r c N r ( ) ( ) � � � � � � �� 4 4 0 4 4 8 for else. (Assuming Vi equal for all directions i, with r denoting the number of stick pixels from the window center and a constant c) The kernel is not Gaussian, but decreases with 1/(8r). For better smoothing, the influence of the center pixel can be reduced. A straightforward way to do this is to subtract a frac- tion of the original pixel value from the output � ( ) �� � � �I I I1 � � . An empirically found value of � is 0.125. This adjustment is especially important if only one iteration is applied. To compare the image quality of ASF to an existing FPGA implementation of the ARGF-based [8] algorithm by Mazumdar et al. [4], both filters have been implemented in C�� and Matlab and adjusted to OCT-image data. Fig. 6 shows the resulting images. © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 93 Acta Polytechnica Vol. 47 No. 4–5/2007 Fig. 4: High resolution OCT image of a spheroid: (a) Original image, (b) Asymmetric stick filter (11×11 window), (c) Asymmetric stick filter (21×21) Fig. 5: Kernels Hi (x,y) showing sticks for a window size N � 11 The right image is slightly degraded by artifacts (marked with arrows). These are caused by hard switching between square filter kernels and outliers found in speckle statistics. They do not appear in the left image, because of the soft thresholding and high angular quantization of ASF. Further- more, ASF needs only one parameter k, which can be adjusted very easily. The filter introduced in [4] needs 4 parameters, which are difficult to adjust empirically. 4 Real-time implementation This section describes the real-time FPGA implementa- tion of the ASF algorithm introduced in section 3. This work focuses on meeting real-time requirements. Therefore, the timing requirements for the ASF algorithm are derived based on the A-Scan rate and pixels per A-Scan. Subsequently the architecture of the ASF algorithm will be shown and the throughput rate for a serial and a parallel implementation will be estimated. In the second part, implementation results are presented for a Stratix II FPGA for different sets of parameters of the algorithm. Here, the dependency of different stick-lengths corresponding to varying window sizes as well as the number of sticks on the required hardware will be shown, measured in Logic Elements (LEs). A possible parallel execution of the ASF algorithm to achieve highest throughput rate will be described. 4.1 Signal processing concept Fig. 7 shows the signal processing concept of the ASF algo- rithm. As can be seen, the algorithm is divided into three blocks that realize different steps of the algorithm at different sample rates. This architecture allows pipelining of the algo- rithm to increase the throughput rate. In Fig. 7, the first block receives a window of N×N pixels and calculates the sum along each stick and the sum of the squared elements for NS sticks. Because the N×N window pixels are prefetched and stored in an array, they are available without latency. In each clock cycle, a serial implementation performs one of the NS summations over NL stick elements. This leads to a required number of NS processing cycles for the first block and thus one summation value is calculated each clock cycle. In a parallel implementation, all these summations are calcu- lated simultaneously leading to a calculation time of one clock cycle for all summation results. The processing cycles of the second block are mainly af- fected by the summation over NS sticks, leading also to a number of about NS cycles, and hence one calculated value per clock cycle. Here, the parallel implementation calculates all results in one clock cycle. Finally, block III realizes a divider which does not affect the throughput rate of this pipelined architecture. The number of processing cycles required to perform the ASF algorithm realized by the three-block approach consid- ered here is equal to the maximum number of processing cy- cles of block I and II. Thus, a serial implementation requires about NS processing cycles if the pipeline has been filled up. In contrast to this, a fully parallel implementation requires three clock cycles to determine a speckle reduced pixel value. Based on the A-Scan rate and the number of pixels per A-Scan, the upper limit of processing cycles to calculate a new pixel value can be derived from cyc clk A-Scan � f f l , where l is the A-Scan length (in pixels), fclk and fA-Scan are the clock frequency and the A-Scan rate in Hz respectively. With an A-Scan rate of 50 Hz and an A-Scan length of 768 pixels, approximately 2600 cycles are acceptable for the calculation of each speckle reduced pixel. As can be seen from the pro- cessing cycle calculation above, serial as well as parallel ASF implementations easily meet this requirement for reasonable window sizes. 4.2 FPGA implementation Fig. 8 shows the number of required ALUTs for a serial im- plementation of the ASF algorithm. The increasing number of ALUTs is mainly caused by the first block. As can be seen in Fig. 8, the bit widths of the second and third block’s input 94 © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ Acta Polytechnica Vol. 47 No. 4–5/2007 Fig. 6: Comparison of speckle reduction algorithms (left Asym- metric Stick Filter, right ARGF-based filter) NL NL NS NS Fig. 7: Signal processing architecture of the speckle reduction algorithm parameters are just slightly increasing by ld( )NL and ld( )NS respectively, leading to nearly constant hardware resources for these blocks. The number of required ALUTs of the first block is approximately a quadratic function of N. Furthermore, the maximum clock frequency for an imple- mentation on the Stratix II FPGA (EP2S60F1020C4) is given in Fig. 8. As can be seen for reasonable values of N, the maxi- mum clock frequency is greater than 50 MHz. 4.3 Performance evaluation To determine the maximum window size for real time cal- culation, the equation for the upper limit of available cycles per pixel in section 4.1 can be rewritten. For a stick length of N NL � �( )1 2 , using the minimum number of required cy- cles N NS � �4 4 for the serial implementation yields: N f l w f � �clk img4 1 . This form allows a calculation of the window size depend- ing on A-Scan length (l), image rate (fimg) and width (w) as well as clock frequency (fclk). For a second available OCT system based on Fourier-do- main image reconstruction, an A-Scan length of 1000 pixels, an image rate of 5 Hz and a width of 250 lines is required. In this case, a window size of N � 11 can be processed in real-time. 5 Conclusion A generic design approach allows for a flexible implemen- tation of the ASF algorithm. Thereby, algorithm parameters like window size, number of sticks, and weighting function can be configured at compile time. The performance of the serial implementation meets the requirements of the broadband OCT system used here. Even for FFT-based fast OCT systems, the presented hardware implementation can apply real-time speckle re- duction to the full image size. For even higher demands, parallelization of the algorithm is straightforward using the proposed generic design approach. Thus, increased through- put rate can be traded for hardware effort regarding the number of required ALUTs. The introduction of a slight modification of the algorithm shows that a single run provides sufficient smoothing of the image. Acknowledgments Parts of this work were funded by the Deutsche For- schungsgesellschaft (DFG). The work was supported by Felix Spöler and the Institute of Semiconductor Electronics at RWTH Aachen University. The authors would like to thank Prof. Tobias G. Noll for supervising this work at the Chair of Electrical Engineering and Computer Systems, RWTH Aachen University. References [1] Schmitt, J. M.: Optical Coherence Tomography (OCT): A Review, In IEEE Journal of Selected Topics in Quantum Electronics, Vol. 5 (1999), No. 4, July/August 1999. [2] Schmitt, J. M., Xiang, S. H., Yung, K. M.: Speckle in Op- tical Coherence Tomography, In Journal of Biomedical Optics, Vol. 4 (1999), No. 1, January 1999. [3] Xiao, C. Y., Zhang, S., Chen, Y. Z.: A Diffusion Stick Method for Speckle Suppression in Ultrasonic Images, In Pattern Recognition Letters, 2004, No. 25. [4] Mazumdar, B., Mediratta, A., Bhattacharyya, J., Baner- jee, S.: A Real Time Speckle Noise Cleaning Filter for Ultrasound Images, In Proceedings of the 19th IEEE Sympo- sium on Computer-Based Medical Systems, 2006. [5] Schwann, R., Kappen, G.: Cordic Based Postproces- sing of Ultrasound Beamformer Data, In Proceedings 9th International Student Conference on Electrical Engineering “POSTER 2005”, 2005. [6] Loupas, T., McDicken, W. N., Allan, P. L.: Adaptive Weighted Median Filter for Speckle Suppression in Medical Ultrasonic Images, In IEEE Transactions on Cir- cuits and Systems, Vol. 36 (1989), p. 129–135. [7] Koo, J. I., Park, S. B.: Speckle Reduction with Edge Preservation in Medical Ultrasonic Images Using a Ho- mogeneous Region Growing Mean Filter (HRGMF), In Ultrasonic Imaging, Vol. 13 (1991), p. 211–237. [8] Chen, Y., Yin, R., Flynn, P., Broschat, S.: Aggressive Region Growing for Speckle Reduction in Ultrasound Images. In Pattern Recogn. Lett. Vol. 24 (Feb. 2003), No. 4–5, p. 677–691. [9] Bresenham, J.: Algorithm for Computer Control of a Digital Plotter, In IBM Systems Journal, Vol. 4 (1965), No. 1, p. 25–30. Heinrich Luecken Gerhard Tech Dipl.-Ing. Robert Schwann e-mail: schwann@eecs.rwth-aachen.de Dipl.-Ing. Götz Kappen e-mail: kappen@eecs.rwth-aachen.de Chair of Electrical Engineering and Computer Systems RWTH Aachen University Schinkelstrasse 2 Aachen, Germany © Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 95 Acta Polytechnica Vol. 47 No. 4–5/2007 Fig. 8: Resource usage and maximum clock frequency of ASF as a function of window size