IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Image Compression Using Proposed Enhanced Run Length Encoding Algorithm A. H. Husseen, S. Sh. Mahmud, R. J. Mohamme d Departme nt of Computer Science ,College of Education Ibn Al- Haitham , Unive rsity of Baghdad Recei ved in Feb. 7, 2011 Accepted in March 23, 2011 Abstract In this p ap er, we will present p rop osed enhance p rocess of image comp ression by usin g RLE algorithm. This p rop osed y ield to decrease the size of comp ressing image, but the original method used p rimarily for comp ressing a binary images [1].Which will y ield increasin g the size of an original ima ge mostly when used for color images. The test of an enhanced algorithm is p erformed on sample consists of ten BMP 24-bit true color images, building an app lication by using visual basic 6.0 to show the size after and before comp ression p rocess and computing the compression ratio for RLE and for the enhanced RLE algorithm. Keywords: Compression, RLE, Run len gth encoding, GIF, TIFF, PNG, JPEG, BMP, BMP header, BM P file, Compression ratio, Lossless, Lossy, True color. Introduction The size of the comp ressed stream depends on the comp lexity of the image[2]. Image comp ression is minimizing the size in bytes of a gr aphics file without degr adin g the quality of the image to an unacceptable level. The redu ction in file size allows mor e images to be stored in a given amount of disk or m emory sp ace. It also reduces the time requ ired for images to be sent over the Internet or downloaded from Web p ages[3]. Computer graphics applications, p articularly those gener ating d igital p hotographs and other comp lex color ima ges, can generate very large file sizes. Issues of storage sp ace, and the requirement to rap idly transmit image data across networks and over the Internet, have therefore led to the development of a range of image compression techniques, to reduce the p hy sical size of files. M ost comp ression techniques are indep endent of sp ecific file for mats – indeed, many formats supp ort a number of different comp ression types. Graphi cs compressi on al gorithms Grap hics compression algorithms fall into t wo categories: a- Lossy comp ression achiev es its effect at the cost of a loss in image quality , by removing some image information. b-Lossless comp ression techniques reduce size whilst p reserving all of the original image information, and therefore without degrading the quality of the image. Although lossy techniques may be very useful for cr eating versions of ima ges for d ay-to-day use or on the Internet, they should be avoided for archival master versions of image[4] Di fferent formats of images Digital images can b e stored in differ ent formats. JPEG is a form developed by the Joint Phot ographic Exp ert Group . Images in this form comp ressed to a high d egree. JPEG’s p urp ose is to achieve high comp ression ratio with images containing large numb er of colors. IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Grap hics Int erchange Format (GIF) is an 8-bit p er pixel format. It supp orts animas well. GIF images are compressed using lossless LZW data comp ression technique. Port able Network Grap hics (PNG) is another lossless data comp ression which is close to GIF but supp orts 24-bit color p alett e. It can be seen that JPEG can be used whenever the size is a criterion [5]. JPEG: Is a lossy comp ression scheme for color and gr ay-scale images. It works on full 24- bit color (True Color) which means it can st ore up to 16 million colors [6]. JPEG is favored because its comp ression p roduces a r easonable image with a small f ile size. JPEG’s comp ression reduces image files t o about 5% t heir normal size with loss due to destructive or “lossy ” compression [7]. TIFF: The TIFF (Tagged Image File Format) format, it is a flexible format that normally saves 8 bits or 16 bits p er color (red, green, blue) for 24-bit and 48-bit totals, using the LZW comp ression algor ithm for lossless st orage(lossless, but is less effective for 24 bit color images). M ost graphics p rograms that use TIFF do not comp ression. Consequently , file sizes are quite big [8]. The TIFF file format is ideal for p roducing and st oring high-quality images that are used to p roduce p rofessional p hotographic p rints, exhibit back grounds, p ost ers, magazines, etc[7]. GIF: Works best for ima ges with only a few dist inct colors, such as line drawings and simple cartoons. GIF is useful for cartoon images that have less than 256-(2^8) colors, gr ayscale images, and black and white text. The p rimary limitation of a GIF is that it only works on images with 8 bits p er p ixel or less, which means 256 or fewer colors. GIF compresses images using LZW compression [9]. GIF achieves comp ression in two way s. First , it reduces the number of colors of color-rich images, thereby reducing the number of bits needed p er pixel, as just described. Second, it replaces commonly occurring p att erns (esp ecially lar ge areas of uniform color) with a short abbreviation: instead of st oring "white, white, white, white, white," it stores "5 white." Thus, GIF is "lossless" only for images with 256 colors or less. For a rich, true color image, GIF may "lose" 99.998% of the colors [10]. BMP: Windows BM P is the native image for mat in the M icrosoft Windows op erating systems. It supp orts images with 1, 4, 8, 16, 24, and 32 bits p er p ixel, althou gh BM P files using 16 and 32 bits p er pixel ar e rar e. BM P also supp orts simple run-len gth comp ression for 4 and 8 bits p er p ixel. However, BM P comp ression is of use only with large blocks with identical colors, makin g it of very limited value. It is rare for Windows BMP to be in a comp ressed format. M ulti-byte integers in the Windows BM P format are stored with t he least significant by tes first. Data stored in the BMP format consists entirely of comp lete by tes so bit string orderin g is not an issue. The BMP file st ructure is very simp le [12]. It starts with a file header that contains the two bytes BM and the file size. This is followed by an image header with the width, height, and number of bitp lanes[2]. Windows BMP files begin with a 54-byte header [13]: IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Offset Size De scription 0 2 signature, must be 4D42 hex 2 4 size of BM P file in bytes (unreliable) 6 2 reserved, must be zero 8 2 reserved, must be zero 10 4 offset to st art of image data in bytes 14 4 size of BITMAPINFOHEADER st ructure, must be 40 18 4 image width in p ixels 22 4 image height in pixels 26 2 number of p lanes in the image, must be 1 28 2 number of bits p er p ixel (1, 4, 8, or 24) 30 4 comp ression type (0=none, 1=RLE-8, 2=RLE-4) 34 4 size of image data in by tes (includin g p addin g) 38 4 horizont al resolution in pixels p er meter (unreliable) 42 4 vertical resolution in p ixels p er meter (unreliable) 46 4 number of colors in image, or zero 50 4 number of imp ortant colors, or zero PNG: Provide a non-p rop rietary alternative to the LZW comp ression emp loy ed by GIF and other file formats. PNG comp ression uses the deflate comp ression method. It is lossless algorithm and is effective with color depths from 1-bit (monochrome) to 48-bit (True color). It allows y ou to make a trade-off between file size and image quality when the image is comp ressed. Typically, an image in a PNG file can be 10 to 30% more comp ressed than in a GIF format[4] . PNG uses ZIP compression which is lossless, and slightly more effective than LZW (slightly smaller f iles)[11]. IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Run length encoding (RLE) Run length encod ing (RLE) is p erhap s t he simplest comp ression technique in common use. RLE algor ithms are lossless, and work by searching for runs of bits, by tes, or p ixels of the same value, and encoding the len gth and value of the run. As such, R LE achieves best results with images containing large areas of conti guous color, and esp ecially monochrome images. Complex color images, such as p hotographs, do not comp ress well – in some cases, RLE can actually increase the file size. There is a nu mber of RLE variants in common use, which are encountered in the TIFF, PCX and BMP graphics formats [4]. Run-len gth encod ing represents a string by rep lacing each subsequence of consecutive identical characters with(char; length). The string 11112222333111 would have representation (1; 4)(2; 4)(3; 3)(1 ; 3). Then comp ress each (char; length) as a unit using, say , Human coding. Clearly, this technique works best when the characters repeat often. One such situation is in f ax transmission, which contains alternating long sequ ences of 1's and 0's. The distribution of code words is taken over many documents to comp ute the op timal Human code.[14] Propose d enhanced run length encoding al gorithm Each color images consists of the basic three colors (R,G,B), the R LE algorith m represented the image consists on N p ixels as follows : Red R Green G Blue B Number of pixels C r1 g1 b1 c1 r2 g2 b2 c2 ….. ….. ….. …… ….. ….. ….. …… rn gn bn cn But in the new algorithm we comp ute the differences between the adjacent p ixels for each color, if the differ ence between r1 and r2 less than or equal t o a threshold value (th<=10) and if the difference between g1 and g2 less than or equal to a threshold valu e (th<=10) and if the difference between b1 and b2 less than or equal to a threshold value (th<=10) we add 1 to C1, and if the difference is greater than 10 we do this p rocess between next adjacent p ixels until we reach the last p ixel in the ima ge, and the followin g examp le exp lains the RLE algorithm and the enhanced R LE algorithm : Let we have the image of 4 x4 p ixels 100 101 102 100 200 200 205 209 300 300 305 301 210 205 300 300 The number of records required to save this image are 16. When compress this image by using RLE algor ithm we will generate the followin g values : IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 100(1) , 101(1), 102(1), 100(1), 200(2), 205(1), 209(1), 300(2), 305(1), 301(1), 210(1), 205(1), 300(2). The number of records required to save this image are 13. But when comp ress t his image by using an enhan ced RLE algor ithm we will generate the followin g valu es 100(4) , 200(4), 300(4), 210(2), 300(2). The number of records requir ed to save this image ar e 5 . The propose d enhanced RLE algorithm 1- Read the BM P image. 2- Get the height N and the width M for the image 3- Create an array , let it M ain(N,M ) ,each element of this array consists of three fields for R,G,B colors. 4- Convert t he image to t he main array ; M ain(N,M ). 5- Let X=Main(0,0) ; M ain(0,0) is the first element in an arr ay. Let TH=10, TH : the threshold . 6- For I= 0 to N-1 7- For J=0 to M -1 8- If X-M ain(I,J) <= TH then Let C=C+1 Else Let X=Main(I,J) and C=0 9- End. Ex perime ntal results we use ten images to test the above algor ithm and the threshold value equal or less t han (10), then we get t he following results : Image Name (BMP) Original Size (KB) New Size (KB) Using RLE Compression Ratio Original Size/ Size Using RLE New Size (KB) Using p rop osed enhanced RLE Compression Ratio Original Size/ Size Using p rop osed enhanced Birds 475 586 0.8:1 133 3.5:1 Heart 919 1137 0.86:1 471 1.95:1 Light 226 298 0.003:1 211 1.07:1 M adina 80 89 0.9:1 38 2.1:1 M ansour 88 116 0.98:1 23 3.8:1 M icrosoft 523 526 0.99:1 145 3.6:1 Ramadan 533 669 0.79:1 240 2.2:1 Rawda 998 775 1.28:1 343 2.9:1 Rose 127 160 0.79:1 62 2.08:1 Winnie 170 64 2.65:1 31 5.48:1 IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 The above table shows t he size of true color BM P images and the size after the comp ression p rocess using RLE and prop osed enhanced RLE algorithms. The compression ratio is changing fro m image to another, but t his related to the variety between the values of adjacent p ixels. Decreasing variety between the values of adjacent p ixels will y ield to increase comp ression ratio, and vice v ersa. As shown in the figures (4, 5, 6, 7, 8, 9, 10, 11, 12, 13). The table below ap p ears t he size of all samp le images when converted to JPEG, GIF, PNG, TIFF types using M icrosoft Paint: Image Name (BMP) Original Size (KB) New size (KB) p rop osed enhanced RLE JPEG format GIF format TIFF format PNG format Birds 475 133 20 43 361 247 Heart 919 471 54 123 958 686 Light 226 211 25 31 239 183 M adina 80 38 6 6 23 19 M ansour 88 23 5 10 84 68 M icrosoft 523 145 18 52 434 285 Ramadan 533 240 37 54 485 350 Rawda 998 343 38 90 564 349 Rose 127 62 8 17 131 99 Winnie 170 31 8 9 52 40 Conclusions 1- Run length encoding algorithm is a method of comp ressing images depend on the number of adjacent p ixels value in the image. 2- RLE algorithm is failing mostly when using it for comp ressing color images, because we need new field to st ore the number of repeated adjacent p ixels. 3- Run length encoding algorithm is an efficient comp ression method with images have less various between values of adjacent p ixels, but fail with images have high difference between adjacent p ixels value. 4- When increasing the values of threshold in the p rop osed enhanced RLE algorithm this will y ield increasing the compression ratio, and vice versa. 5.Controlling the co mpression ratio depends on the value of the threshold, which dep ends on the typ e of domain that image used it . IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Re ferences 1. Umbaugh, S.E. (1998), Computer Vision and Image Processin g: A Practical App roach Using CVIPt ools, First Edition, Prentice Hall, New York,253. 2. Salo mon, D. (2007), Data Comp ression", Fourth Edition, Sp ringer-Verla g,London,28-36. 3. http://searchsoa.techtarget.com/definition/imageco mpression 4. Adrian, B. (9 July 2003), Image comp ression, Digital Preservation Guidance note 5, DPGN-05, 1-9,www.nationalarchiv es.gov.uk /documents/image_comp ression.p df 5. Sriniv asan,V., I mage Compression Using Embedded Zero-tree Wavelet Coding (EZW),2, www.umiacs.umd.edu 6. Blello ch,E. (25 September 2010) , Introduction to Data Comp ression, Comp uter Science Dep artment ,Carnegie M ellon University ,www.cs.cmu.edu/afs/cs/p roject/p scico- guy b/realworld /www/compression.p df 7. Carey ,L. (August 2008), Underst anding Digital Image, Conserve O Gram, Sp onsored by the Park M useum M anagement Program, 22/2, 1-2, www.np s.gov/hist ory /museum/p ublications/conserveogr am/22-02.pdf 8. http://en.wikipedia.org/wiki/I mage_file_formats 9. www.contentdm.com/USC/tutorials/image-file-ty p es.p df 10. www.wfu.edu/~matt hews/misc/graphics/formats/formats.html 11. www.scantip s.com/basics09.ht ml 12. M iano, J. (1999), Comp ressed image file formats: JPEG, PNG, GIF, XBM , BMP, ACM Press/Addison-Wesley, New York. 13. www.fastgraph.com/help/bmp _os2_header_format.html 14. Jessica, F. (12 February 2002), ”Algorithms for M assive Data Sets Context-based Compression”, CS 493,1:1-3,www.docstoc.com/docs/54164044/Run-length-encoding 2011) 1( 24مجلة ابن الھیثم للعلوم الصرفة والتطبیقیة المجلد المطورة RLEالـعملیة ضغط الصور باستعمال خوارزمیة علي ھادي حسین، شیماء شاكر محمود، روال جاسم محمد ابن الھیثم ، جامعة بغداد -قسم علوم الحاسبات ،كلیة التربیة 2011باط ش 7استلم البحث في 2011 اذار 23قبل البحث في خالصةال RLEة في عملیة ضغط الصور، خوارزمیة عملالمست RLEفي هذه الدراسة سنعرض عملیة تحسین على خوارزمیة مباشرة لضغط الصور RLEخوارزمیة عمالعند است، Binary Images [1]ة لضغط الصور الثنائیة یبكفا عمل تست اختبرت. وطة عن الصورة األصلیةاألغلب إلى زیادة حجم الصورة المضغ فيفإن ذلك یؤدي True colorالملونة Visualوبناء تطبیق بلغة BMP 24-bit True Colorالخوارزمیة الجدیدة على عینة تتألف من عشر صور ملونة نوع Basic 6.0 نسبة الضغط وبیان الحجم األصلي للصور والحجم الجدید بعد عملیة الضغط وحسابCompression Ratio خوارزمیة عمالباستRLE االعتیادیة والمحسنة . ، هیكل RLE، Run length encoding ، GIF، TIFF، PNG JPEG، BMPعملیة الضغط ، : الكلمات المفتاحیة BMP ملف ،BMP ، نسبة الضغط ، فقدان ، بدون فقدان ،True color. Appendix for Application Form and all Images used as a s ample Fig. (1): Application form built using visual basi c 6.0 IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Fig. (2): Histogram for the image Birds.BMP using RLE algorithm Fig. (3): Histogram for the image Birds.BMP using propose d enhanced RLE algorithm IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Fig. (4): Bi rds.BMP 456x355 pixels Fig. (5): Heart.BMP 560x560 pixels IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Fig .(6): Microsoft.BMP 432x413 pixels Fig .(7): Light.BMP 320x240 pixels Fig. (8): Madina.BMP 200x135 pixels IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Fig. (9): Ramadan.BMP 436x417 pixels Fig .(10): Mnsour.BMP 200x150 pixels Fig. (11): Winnie.BMP 232x250 pixels IBN AL- HAITHAM J. FOR PURE & APPL. S CI. VOL.24 (1) 2011 Fig. (12): Rawda.BMP 692x492 pixels Fig .(13): Rose.BMP 240x180 pixels