177Filtering and Wavelet Transform.....(Ridha Sefina Samosir) FILTERING AND WAVELET TRANSFORM ALGORITHM FOR OLD DOCUMENT IMAGE RESTORATION Ridha Sefina Samosir Information System Study Program, Creative Industry, Institute Technology and Business Kalbis Jln. Pulomas Selatan Kav. 22, Jakarta Timur 13210, Indonesia ridha.samosir@kalbis.ac.id Received: 11th September 2017/ Revised: 18th September 2017/ Accepted: 27th September 2017 Abstract - The aim of this research was to develop image restoration system using filtering and wavelet transform algorithm. Data collection was through observation and system was developed using prototyping model. Result of this research is a computer based on system to restore image containing noise. Based on the research process, filtering and wavelet transform algorithm can be used to restore old document image from interferences (noise). Keywords: old document, image restoration, filtering, wavelet transform algorithm I. INTRODUCTION Old document is one of the historical heritage in a nation or state. There is much valuable information that can be extracted from an old document. Some document image comes from different source or place but documents may have a relation or link with each other. It means that many information and the relation among information can be mined to historical heritage. As a relic of history, the main problem of the old document is the appearance of some interference (noise) strokes. It is difficult to read and understand all the content. Interference can be caused by many things such as the storage time, storage media, storage methods, materials paper and ink used, and image apturing technique. One type of interferences that often occurs in the old document image is ink bleed through removal. Ink bleed through removal is the appearance of various marks or signs that affect the quality of the document such as printed paper from the back side of the document appearing on the front side of the document. Moreover, ink widening is a lack of the ability of the paper to absorb the ink. A sign appears as a result of the digitalization process. In particular, the issue of the appearance of printed paper on the back side of the document is more common in italics documents type. If the gradient is approximately 45, usually printed paper from the other side has a slope of approximately 135. Because of these interferences, much information in the document cannot be recognized. Indonesia is a country with many cultural heritages from 17.000 islands. Each island consists of many tribes and various languages. One of the cultural heritages is history. Historical stories are poured on various media such as temples or paper. Moreover, writing is a technique that can be used to tell about history. A collection of Indonesian historical heritage is stored in Arsip Nasional Republik Indonesia (ANRI - National Archives of Indonesia). This research uses a collection of document obtained from ANRI and all of documents represent italic hand writing type. From the document obtained by researcher from ANRI, it shows some significant damage. The existing damage includes the appearance of writing from the back side on the front side, ink stains on the document, and unclean document background. This is very unfortunate because many historical stories can be extracted from the document. In addition, there is possible relevance among the documents. Thus, many information and historical events are lost. Figure 1 is an example of the input image. According to Huiyu, Jiahua, and Jianguo (2010), the technique to minimize or eliminate interference in the old document image is one of the processes in image processing, namely restoration. Restoration is one of the operations in the image processing system aiming to improve the image quality degraded or degradation. Old document image restoration aims to minimize the interference. The beginning stage of the restoration process is digitalization of the old document to the digital image (Rafael & Richard, 2008). Figure 1 Example of Input Image 178 ComTech, Vol. 8 No. 2 September 2017, 177-181 There are many methods to perform image restoration. There are classification, serialization, filtering, and wavelet transform. Konidaris, Kesidis, and Gatos (2016) said that digital image classification techniques divided the old documents into three parts. Those were background, original text, and interfering stroke. Through the classification techniques, the front side of the document was extracted to produce original text. Moreover, Hinami et al. (2016) used serialization techniques. This algorithm applied the principle of sequentiality like sliding windows for the entire image. Another previous research uses a thresholding technique with Otsu threshold methods. The principle of this technique is the extraction of text characters that contain background noise based on their gray level distribution (2013). This algorithm is suitable for the degraded document image. If the image size is large, it will cause the gray level to overlap between foreground and background document. Meanwhile, Another algorithm that has been done is the K-Means Singular Value Decomposition (K-SVD) algorithm or by Ren, Lu, and Zeng (2015). The principle of this algorithm is to train a dictionary that represents the semantic structure of the image based on the library of the original image. The main idea of K-SVD algorithm optimization is by updating and adjusting elements in the dictionary continuously until it matches with the image signal people want. Different from the K-SVD algorithm, ant colony and genetic algorithm are an algorithm which is a combination of two bionic evolutionary algorithms (Gülcü et al., 2016). Ant Colony Optimization (ACO) is adopted from the behavior of ant colonies or the ant systems. Ants can find the shortest route from the nest to the location of food based on footprints on the trajectory that has been passed. The path passed by large ant will be followed by the other ants. It will increase the density of the ants that pass through it, or all the ants will pass that path. This is because of the nature of ants that produce pheromone substances. Such substances can only be identified by similar living things. If many ants cross a path, the substance will be more and more. However, if a track is rarely passed, the substance will be lost. In image restoration, the ant colony algorithm can easily generate the behavior of the image signal. However, ant colony algorithm requires a long search time that also increases the rate of convergence. On the other hand, genetic algorithms conduct searches with random existing techniques to speed up multidimensional nonlinear data computation (Feng, Lu, and Zeng, 2015). The methods in previous research suggest that the result will be less optimal if the writing on the front side of the document contains more than one color and the image of an old document has many noises (degraded document image). Aside from these two problems, the main problem of using classification techniques and K-SVD is that the system requires supervision or involvement from the users to determine the amount of area (cluster) to be formed and a color sample from each class (cluster) that cannot be done automatically. Meanwhile, the combination of ant colony algorithm and genetic algorithm is more suitable for the restoration process that clarifies the edges and textures of the image. With the various weaknesses and drawbacks from the application of the algorithm for the document images restoration, this research proposes to combine both of wavelet transform and filtering approach. Therefore, the combination of these two algorithms is expected to provide a solution to improve the quality of damaged old document image. People can read the information contained in it. II. METHODS This research starts from the problems that it is difficult to recognize the contents or text written on many old documents. From 39 documents image obtained from ANRI, the researchers classify the document based on the type of noise subjectively. There are four types of disorders. Those are damage caused by noise in the background document; noise in the form of ink widening and ink splashes from unwanted scratches; damage in the form of printed paper from the back side appearing on the front side of the document; and Damage caused by fail digitalization process. The instrument research used is two algorithms which are wavelet transformation and filtering. This research proposes to use multi directional wavelet transform algorithm and mean shift filtering algorithm. Then, the software for the system development is Matlab R2008. The principle of the mean shift filtering algorithm is to apply mean shift algorithm to filter the data of the image. The mean shift filter algorithm works iteratively and generates a set of neighborhood pixels (M) based on the spatial radius (hs/spatial kernel bandwidth) and color distance (hr/range or color kernel bandwidth) values. In each set of neighborhood pixels, mean of its spatial and color distance is calculated. The obtained mean value is the new starting position for the next iteration. This iteration procedure will stop when the mean values of spatial and color distance do not change from the previous iteration. Next, wavelet transformation proposed in this research is multi-directional wavelet transform. With multi-directional wavelet transform algorithm, all posts from various directions can be well identified. Thus, it is easier to process the restoration. The steps of the research are shown in Figure 2. First, data are collected. Data obtained from ANRI are in the form of an old document image. The researcher also uses literature study from various sources to find out the most appropriate solutions to the problems. The problem statement of this research is how to implement both filtering algorithm and wavelet transformation to perform old document image restoration. Second, it is the design of the application to be built. The design phase is designing navigational structure and graphic interface of the application. Third, there is the development of the applications. Based on the results of the first step, it indicates the algorithm that can be used is a combination of filtering algorithm and wavelet transformation. Filtering is a process of taking the partial signal of a certain frequency and discards the signal on the other frequencies. Filtering the image also uses the same principle which takes the function of the image at a certain frequency and discards the image function at certain frequencies as well. The frequency of the image is affected by the existing color gradation on the image. Image with gradation (threshold level) tends to have a lower frequency and vice versa (Trieu & Maruyama, 2015). The working principle of a filtering algorithm is divided. First, if people want to maintain the gradation (the number of color levels in an image), the pixels have to be maintained at a low frequency and eliminate pixels at high frequencies (Low Pass Filtering). Second, people will get the image of a certain threshold value or the binary image pixels at high frequency and low frequency maintained is discarded (High Pass Filtering). Third, if people want to maintain gradation, they should reduce the frequency field (bandwidth) and discard unnecessary signal the low frequency, and maintain the high frequency. Then, the mid- 179Filtering and Wavelet Transform.....(Ridha Sefina Samosir) frequency is discarded (Band Stop Filtering). Furthermore, wavelet transformation algorithm works through signal analysis using wavelet function to produce wavelet coefficients. Transformation means a change action that is usually done to help simplify the problem. Meanwhile, the image is a two-dimensional image on a field that contains much information. Therefore, the image transformation means the process of changing the form of the image to explore or get the information contained in the image and the information is used to solve the occurring problems with the image. Wavelet is a mathematical function fulfilling the requirements that can be used to represent a signal. Wavelet comes from the word ‘wave’ and ‘let’ which means little. Briefly, wavelet can be interpreted as a small wave. Small waves are translated as scale. Therefore, a wavelet is used to analyze the data or functions based on the scale. To ease the process of decomposition of directional wavelet transform algorithm, input images are arranged in a dyadic matrix with a pixel size 2n. If the result of the image is not symmetrical, it can use the zero extension or expansion of the matrix by adding a value of 0 in the row or column. Wavelet function will be divided into the signal components of different frequencies. Then, the frequency components are analyzed using a scale of resolution (scale function) by Hou et al. (2013). From the explanation, a wavelet transform is a tool that can be used to analyze non-stationary signals (frequency content of the signal which varies with time). Wavelet analysis can show the temporal behavior of the signal, filter (filtering) data, and signals, and eliminate unwanted behaviors of signals for image compression. The most important properties of wavelet are the localization of time and frequency so that the analysis of the signals is done locally and detail according to the scales. In other words, wavelet-based analysis splits the signal into several different frequencies, the approximation (A/lowpass) and a detailed section (D/highpass). The approximation is the components of the low-frequency signal while the detail is high-frequency signal components. The detail part consists of horizontal, vertical, and diagonal detail. Low-frequency signal components (approximation) indicates the identity of a high-frequency signal and the nuances/details of the signal (Wang, 2010). Figure 3 is an illustration of decomposition process signals with wavelet transform. Fourth, it tests the application. Once the application is built, the application is tested using 39 ANRI input images of the institution. To analyze test results easier, 39 input images are divided into four categories based on the type of interference. Last, the researcher can conclude the test from the result. start Problem and Requirement Analysis Application Design System Implementation System Testing Declare Conclusion End Figure 2 System Development Flowchart III. RESULTS AND DISCUSSIONS Several stages are done in the system development. First, the researcher uploads the input image. The input image is converted from RGB color space to L.U.V. Then, L.U.V color space is represented in the form of data points. This is the pseudocode to generate an initial image and convert an image from RGB color space to LUV color space. Second, pixel input image is subjected without downsampling the advanced wavelet transform to separate the image based on a different frequency in the wavelet domain. Advanced wavelet transformation generates four coefficients (Sub- band). The four coefficients are the approximation coefficient, the detailed horizontal coefficient, vertical detail coefficients, and diagonal detail coefficient. Third, it is followed by a convolution process on the component horizontally, diagonally and vertically. Convolution process filters the signal between the input signals with the impulse response of the filter. The coefficient that acts as the input signal is a horizontal coefficient, vertical coefficient, and diagonal coefficient. Meanwhile, the coefficient of the response impulse is a matrix that represents the direction (direction) specific. фj,m,n (x,y)= Øj,m (x1) Øj,n (y1) (1) Ѱ1j,m,n (x,y)= Ѱj,m (x1) Øj,n (y1) (2) Ѱ2j,m,n (x,y)=Øj,m (x1) Ѱj,n (y1) (3) Ѱ3j.m,n (x,y) = Ѱj,m (x1) Ѱj,n (y1) (4) фj,m (x)= x/√(2 j) ф (x/2j - m) (5) Ѱj,m (x)= 1/√(2 j ) Ѱ (x/2j - m) (6) Multi-directional wavelet transforms, horizontal, vertical, and diagonal coefficients are convolved with a matrix representing a certain direction (orientation). The matrix equation is as follows. (7) (8) Value of C= √2 and Ø represent the direction (orientation/directional) of the posts on the document image. This research proposes the value of Ø as a combination from 0o to 3600. In the process of this convolution, a LL2 HL2 HL1 LH2 HH2 LH1 HH1 HH1 Figure 3 Image Decomposition Using Wavelet 180 ComTech, Vol. 8 No. 2 September 2017, 177-181 filtering algorithm is done by applying standard procedures to obtain the mean shift convergent circumstances. The filtering process begins with the initialization of each pixel of the input image. Then, it proceeds to mean shift standard process until the convergent state is obtained. The calculation of the mean shift vector kernel involves spatial bandwidth and range/color kernel bandwidth. (9) The convergent state is achieved if the mean value of the current shift vector is not equal to the previous mean shift vector. The next step is the storing pixels (Yis, Yir) when the convergent state had reached a certain point. Pixel Zi = (Xis, Yir) is stored as a filter output value. Filter output value is filtered pixels from the surrounding pixels. Filter output value represents pixels output. (10) Next process is thresholding the output value filter. Thresholding an additional operation is performed to improve the image of the output result. This process means determining thresholding parameter values that suit the image quality output the best. After the reconstruction image is obtained through the filtering process, the data points are reconstructed back into the RGB color space. The result of this research is an application for image restoration using both mean shift filtering and multi directional wavelet transform algorithm. Figure 4 is a graphical user interface of the developed system. The system is developed with Matlab version 2008. The system consists of one menu. The user can upload an input image, set the parameter value of each algorithm, and display the output image. Moreover, Figure 5 is input and output of old document images after filtering and wavelet transformation implementation. Figure 4 Graphical User Interface of Restoration Image System For Old Document From the output image, it shows that the result of image restoration is very good. Some noise can be minimized, so it gives some impact such as the cleaner background of the paper and printed text from the back side of the paper. According to the wavelet transformation algorithm, it conducts analysis technique with the image signal. It aims to explore the properties of the image signal. Properties of the image are used to filter and eliminate unnecessary signals. By adding multi-directional approach in wavelet transformation, it can analyze image signal from any direction. Filtering algorithms can eliminate unwanted image frequency and maintain the desired image frequency. In addition, the results show that the parameter values in both algorithms determine the quality of the image restoration. This becomes the advantage of the built system that the users can set the parameter values that match the conditions of the input image the best. Figure 5 Filtering Output Image Figure 6 Wavelet Transform Output Image Then, the input parameters of wavelet transformation algorithm are a combination of the value of θ and the threshold value of the third detailed coefficient factors. In Figure 6, the results of tests performed by the filtering algorithm and wavelet transformation algorithm are as follows. First, it is for filtering algorithm. There are some characteristics of the output image based on the parameter value. It has the larger value of hs, and the occurrences level of noise in the background is reduced. It also has a larger 181Filtering and Wavelet Transform.....(Ridha Sefina Samosir) value of hr, so the writing becomes unclear. The writing is difficult to be identified. The smaller value of hs causes the appearance level of noise increases. Then, the smaller value of hr causes noise cluster to increase, but the large cluster is smaller. M value influences the quality of image restoration results less significantly. Based on experiment result, the value of the input parameter (hs, hr, M) that provides the best results in the output image is 10, 7, and 20 respectively. After experiment for all of input image, this parameter value give better result for document image with noise like interference strokes in the document background, paper printed in the front side that appear from backside and noice that appear because digitation process. But its showed not sigificant result for document image with widening ink. Next, there is some characteristic of the output image based on the wavelet transformation parameter value. If the value of θ at detail coefficients is 15°, 60°, 90°, 135°, 180°, and 360°, the quality of output image is not good. This is because all six grades of θ represent the direction from the back side of a document which is interfering strokes. However, if the value of θ at detail coefficient is 0°, 30°, 45°, 120°, 225°, 270°, and 315°, the quality of output image will be better. The threshold factor is inversely proportional to the quality of the output image. The greater the threshold factor is, the worse the resulting output is. It is because the distribution of wavelet coefficients is centralized at wavelet coefficient value 0 and vice versa. Value of wavelet input parameter is tried for all of input image. From the experiment show that multi directional wavelet transform give good result for document image because interference strokes in document background, paper printed from back side that appear in front side and fail digitalization process. IV. CONCLUSIONS From experiment, it shows that quality of the output image is influenced by the accuracy of input parameter values in the algorithm. Kernel bandwidth (hs), range/ color kernel bandwidth (hr) and M (Minimum Region) are for filtering algorithm. Meanwhile, θ and threshold are for wavelet transformation. Both filtering and wavelet transformation show optimal performance for interference strokes like printed paper from document backside that appears in the document front side and noise because of digitalization process faulty. Then, the less optimal result is for document image which is ink widening or splash or blobs. REFERENCES Feng, Y., Lu, H., & Zeng, X. (2015). Image restoration based on hybrid ant colony algorithm. TELKOMNIKA (Telecommunication Computing Electronics and Control), 13(4), 1298-1304. Gülcü, Ş., Mahi, M., Baykan, Ö. K., & Kodaz, H. (2016). A parallel cooperative hybrid method based on ant colony optimization and 3-Opt algorithm for solving traveling salesman problem. Soft Computing, 20(11), 1-17. Hetal, J.V., & Astha, B. (2013). A review on Otsu image segmentation algorithm. IJARCET (International Journal of Advanced Research in Computer Engineering & Technology), 2(2), 387-389 Hinami, R., Liu, X., Chiba, N., & Satoh, S. I. (2016). Bidirectional extraction and recognition of scene text with layout consistency. International Journal on Document Analysis and Recognition (IJDAR), 19(2), 83-98. Hou, X., Yang, J., Jiang, G., & Qian, X. (2013). Complex SAR image compression based on directional lifting wavelet transform with high clustering capability. IEEE Transactions on Geoscience and Remote Sensing, 51(1), 527-538. Huiyu, Z., Jiahua, W., & Jianguo. Z (2010). Digital image processing: Part 1. Retrieved from www.bookboon. com Konidaris, T., Kesidis, A. L., & Gatos, B. (2016). A segmentation-free word spotting method for historical printed documents. Pattern Analysis and Applications, 19(4), 963-976. Rafael, C. G., & Richard, E. W. (2008). Digital image processing. USA: Addison-Wesley. Ren, J., Lu, H., & Zeng, X. (2015). Image denoising based on k-means singular value decomposition. TELKOMNIKA (Telecommunication Computing Electronics and Control), 13(4), 1312-1318. Trieu, D. B. K., & Maruyama, T. (2015). Real-time color image segmentation based on mean shift algorithm using an FPGA. Journal of Real-Time Image Processing, 10(2), 345-356. Wang, X. (2010). Recovery of blurring scanned manuscript image based on wavelets transform algorithm. In 3rd International Congress on Image and Signal Processing (CISP) (pp. 844-847). IEEE.