IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 A Proposed Algorithm for Steganography A.O. Ibadi, O. Z. Akif Departme nt of Software, College for Economic Sciences Engineering , Unive rsity of Baghdad Departme nt of Computer Science, College of Education I bn Al-Haitham, Unive rsity of Baghdad Abstract Stegano graphy is an imp ortant class of security which is widely used in computer and network security nowadays. In this research, a new p rop osed algorithm was introduced with a new concept of dealin g with st eganography as an algorithmic secret key technique similar to st ream cip her cryptographic sy stem. The prop osed algorithm is a secret key syst em suggested to be used in communications for messages t ransmission steganography . Introduction The rise of the Internet and multimedia techniques in the mid-1990s has p rompted increasin g interest in hiding d ata in digital media. Ear ly research concentrated on watermarkin g to p rotect copyrighted multimedia p roducts (such as images, audio, v ideo, and text). Data embeddin g has also been found to be useful in covert communication, or st eganogr aphy . The goal was and st ill is to convey messages under cover, concealing the very existence of information exchange[1]. The p roblem in the methods used in the st egano graphy is the ability to detect the information which embedded in the message by using the roles of detection. In this research, we use a new app roach to solve this p roblem, the p ower of this app roach is the ability to hide the information in random p ositions in the image that means any roles that’s using in the detection can not succeed to detect any information that was embedded in the image thus a new approach p revent any ability for detection. All p revious methods do not use the random generator to hide the information, but for cryptography it is used. In a new app roach we use the random gener ator to hide the information in the image by generate random positions from the image to hide the information on it. IHJPAS IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 The main att ribute to this app roach is to keep the same p rop rieties of the image before and after the embedding of the information such so that one cannot detect this image to have any information. Steganography We all know what is t he purp ose of st eganogr aphy : to hide data into ot her data. The p oint of cryptography is to transform structured and intelligible data (like a text file) into a st ream of random-lookin g by tes. The p oint of steganography is somehow the opp osite, to mix random- lookin g data with decoy information so t hat it will look like structured format. In the field of st eganography , some terminology has develop ed. The term “cover” is used to describe the original, innocent message, data, audio, st ill v ideo and so on. When referr ing to audio si gnal st eganography , the cover signal is sometimes called the “host ” signal. The information to be hidden in the cover data is known as the “embedded” data. The “stego” data is the data containin g both the cover si gnal and the “embedded ” information. Logically, the p rocessing of p utting the hidden or embedded data, into the cover data, is sometimes known as embeddin g. Occasionally , esp ecially when referring to image st eganogr aphy , the cover ima ge is known as t he container[2]. There are p lenty of softwares around that can hide data on BM P images. Unfort unately, BMP p ictures are not widely used or exchanged, unlike JPG. So why p rogrammers don't p erform st egano programs for JPG. The reason is more con cep tual. The concep t of lossy comp ression like the one used in JPG (or MP3 for audio) is to remove most of the unimp ortant or redundant information. The concept of most st eganogr aphy algorithms is to hide bits by rep lacing this very same unimp ortant or redundant information (like the Least Significant Bits). So both techniques are go ing in opp osite directions. T he more y ou comp ress, t he more difficult it is to find room to hide data. Steganography classifications We would classify steganography softwares in several categories of increasing quality . It's a little bit artificial as a scale, not absolutely defined with golden rules, but I would say it's a p ragmatic way of quickly estimating soft ware. Not ice that I am not taking in account any up stream encryp tion, before the steganography st ep . That's another matter, although gener ally, p rograms t hat offer some solid, known and p ublished encry ption algorithm, should be trusted IHJPAS IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 more than any "in-house" obfuscation method. So the techniques are, one of the followin g classification: 1. Adding data at t he end of the carrier file. 2. Inserting data in some junk or comment field in the header of the file st ructure. 3. Embeddin g d ata in the carrier byte stream, in a linear, sequ ential and fixed way. 4. Embedding d ata in the carr ier by te st ream, in a p seudo-random way depending on a p assword. 5. Embedding d ata in the carr ier by te st ream, in a p seudo-random way depending on a p assword, and changing other bits of the carrier file to comp ensate for the modifications induced by the hidden data, to avoid modifying st atist ical p rop erties of t he carrier file. In our p oint of view, we don't consider p oints 1 and 2 as real steganography . Of course, what "real st eganography " may depend on the definition y ou use. If we choose to use "hidin g data from my y oung brother" as a defin ition, y es, this is st eganography . The concep t of breaking a st eganography software does not mean that y ou can recover the hidden data. To break a st eganography algorithm means that y ou can decide with a st atistically high level of confid ence that an image contains emb edded information, and estimate the size of this information. Least Significant Bit (LSB) insertion method Least sign ificant bit insertion is a common, simple app roach to embed information in a cover file. The LSB is the lowest order bit in a binary value. This is an imp ortant concept in comp uter data st orage and p rogr amming that app lies to the order in whi ch d ata are or ganized, st ored or transmitted. The last bit of the byte is selected as the least significant bit (as illustrated in Figure 1) because of the impact of the bit to the minimum degr adation of images. The last bit is also known as right-most bit, due to the convention in p ositional notation of writing less significant digit further to the right.[3] In this method, we can take the binary representation of the hidden data and overwrite the LSB of each by te within the cover image. If we are using 24-bit co lor, the amount of chan ge will be minimal and indiscernible to t he human ey e. As an examp le, supp ose that we have three adjacent p ixels (n ine by tes) with the following RGB encodin g: IHJPAS IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 10010101 00001101 11001001 10010110 00001111 11001010 10011111 00010000 11001011 Now supp ose we want to "hide" the followin g 9 bits of data (the hidden d ata is usually comp ressed p rior to being hidden) : 101101101. If we overlay these 9 bits over the LSB of the 9 bytes above, we get t he following (where bits in bold have been changed): 10010101 00001100 11001001 10010111 00001110 11001011 10011111 00010000 11001011 Not e that we have successfully hidden 9 bits but at a cost of only changing 4, or roughly 50%, of the LSBs. This description is meant only as a high-level overview. Similar methods can be applied to 8-bit color but the changes are more dramatic. Gray -scale ima ges, too, are very useful for steganographic p urp oses.[4] De tection of Steganography Steganaly sis is the art and science behind the detection of the use of steganography by a third p arty . The basic function of st eganalysis is to detect first or estimate the p robability that hidden information is p resent in any given file. The detection and estimation are b ased only on the data p resented in its observable form (i.e. nothing is known about the file p rior to invest igation). Because simply detecting the p resence of hidden data may not be sufficient, st eganalysis also covers the functions of extracting the message, disablin g and/or destroy ing the hidden message so that it cannot be extracted, and finally , altering the hidden message such that misinformation can be sent t o the intended recipient instead of the original message [5]. Dep ending on how much information is known about the embedded image, st eganaly sis techniques and methods closely mirror traditional cryp tanalysis methods. T he steganalysis attack methods can be broken into si x typ es:  Stegano graphy -only attack: Only the file with the embedded data is available for analysis.  Known-carrier att ack: Both the or iginal carrier file and the final (h idden message embedded) f iles are available for analy sis.  Known-message att ack: The original message p rior to embedding in the carr ier is known.  Chosen-st eganography attack: Both t he algorithm used to embed the data and the final (hidden message embedded) file are known and available for analysis. IHJPAS IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010  Chosen-message att ack: The original message and the algorithm used to embed the message are available, but neither the carrier nor the final (h idden message embedded) file ar e. This attack is used by the analy st for comp arison t o future files.  Known-steganography attack: All comp onents of the sy stem (the original message, the carrier message, and the algor ithm) are available for analysis. It follows that the success of any steganaly sis t echnique is tied to the amount of information known about the file prior to invest igation. As more information about the file is known p rior to invest igation, the invest igator can move from simply detecting to modify ing or alterin g the hidden message before sending it on to the intended recipient. In the first category (st eganography -only att ack), the purp ose of analysis is t o detect simp ly the existence of a hidden message. Without p rior knowledge of the encoding mechanism, key, or data contained within the message, recov ery of the contents using this method while p ossible, can take an excessive amount of time. With access to the original carrier and the final file with the embedded content (known-carrier), the p urp ose of analysis can move toward recovering the emb edded message by comp aring the differ ences between the two files. If the algorithm is known and the file with the hidden message embedded is also available (chosen-st eganography att ack), the analy st may have the ability to reverse the embedding to recover the hidd en messa ge and can easily alter or dest roy the hidden contents [5]. Finally, if the analyst has the algorithm and a message p rior to embedding (chosen-message att ack), they can move towards identify ing p ossible (hidden message embedded) files to attempt to recover the original carr ier. If the carrier can b e recovered or closely reproduced, the ab ility to insert alternate messages in lieu of the original message is p ossible. The steganography -only attack can b e accomp lished through the use of statist ical analy sis p erformed on the final med ium. In the following examp le, the color contents of JPEG images are examin ed. A modification to each coeff icient LSB p roduces variations in the data that results in deviations to the histogram for the given f ile. If the deviations are lar ge enough to p roduce noticeable aberr ations, the embedded file histogram can identify the existence of the hidden message. Likewise, LSB modifications to p alett e-based images (GIF, etc.) cause dup lications of the colors in the p alett e with identical or near ly identical colors app earing. This dup lication of colors can also serve as an ind icator p ointing to the existence of hidden data. IHJPAS IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 Ne w proposed stego-system The main concept for suggesting the p rop osed sy stem is to keep st ego-data out of detection, in other words, the detection test must fail. In the following se ctions a new p rop osed st ego-sy stem will be introduced st arting with the general st ructure of the whole sy stem and then the comp lete algor ithm followed by sections to exp lain some which is not well-defined p arts (new concepts) of the algor ithm. 1.The Gene ral S tructure Figure 2, shows t he general st ructure of the stego-algor ithm. A messa ge will be f ed to t he st ego-algorithm to be embedded in we ll suited cover (ima ge) of len gth enough to hold the embedded text. For LSB techniques t he length of the cover (Lcov) file must accomp lished the following: Lcov ≥ (N+55)*8 bits Where N is the length the embedded text in bits. This inequality is comp uted for a cover image of bitmap typ e (BM P). T he number 55 is l ength of the image header in bitmap image typ e The stego-algorithm must have a stego-k ey which is a seed key fed to t he stego-algorithm to hide the text in the cover ima ge and p roduce the st ego-data. These op erations are p erformed in the sender side while the receiver will retrieve the st ego-data and feed it with the same st ego-key to the stego-algorithm to obtain the embedded te xt. It is very imp ortant t o know t hat t he stego-data is an ima ge and must be the same cover image which is used in the send er side, for this reason, the receiver doesn’t need to obtain the cover ima ge by extracting the embedded data from stego-data file. 2.S tego-Algorithm Concept There are two app roaches to keep the stego-data out of detection, the first is to hide a serial data in random p ositions of the cover image, and the second approach is to hide a random data in serial p ositions of the cover image. We can p rove this consideration by app ly ing the detection tests to the stego-data in later se ctions. The second app roach can be app lied by p erforming the transp osition encry ption algorithms to the embedded d ata before hiding them in the cover image, which means to apply two op erations cryp tography and steganography and this can be p roved in other new research. In this p ap er, we p rop osed st eganographic algorithm using the second approach. IHJPAS IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 To hide a serial bits in a random p ositions we require a p seudo-random numbers gener ator to work in a similar fashion as st ream cip her encry p tion algorithms (see figure 3). The differ ence between the prop osed syst em and stream cip her is that stream cip her uses t he generated random bits to be Xored with t he p lain text bits while the prop osed sy stem uses t hese random numbers as a referen ce to a position to hide the embedded bit in the covered image. The random numbers gener ator must be non-repeatable random numbers gener ator for a p redefined range len gth or for the whole cover file len gth. As we mentioned in section 5, the hidin g p rocess will change rou gh ly 50% of the LSBs, additionally, we mean to use p art of the cover file to reduce the op p ortunity to detect the hiding information or break the st ego- algorithm. 3.Pseudo Random Numbers Gene rator The random number generator is a function fed with a seed v alue to gener ate a randomly selected numbers from a ran ge defined by 2 n where n is length of the sequence b its that rep resents the seed value (will be discussed later). The generation function is non- linear and generated 2 n unrepeated random numbers. The output number (R)10 = (R0R1...Rn-1)2 can be resulted from the inp ut number (I)10 = (I0I1...In-1)2 at each st ep, where R, I are decimals and R0, R1, ..., Rn-1, I0, I1, ..., In-1 are b inary digits as followin g: Ri=T 1 Ii for i=0,1,…,n-1 , Where if i=0 then T=0, if i=1 then T=I0 else T=I0*I1*…*Ii-1. Exam ple: assuming I=178 what is t he outp ut of the function R. The binary representation of I=10110010 so the outp ut bits will be as followin g: R0= 0 1 1 = 0, R1= 1 1 0 = 0, R2= 0 1 1 = 0, R3= 0 1 1 = 0, R4= 0 1 0 = 1, R5= 0 1 0 = 1, R6=0 1 1 = 0, R7= 0 1 0 = 1, R=(00001101)2=(13)10 IHJPAS IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 So, if the input I=178 the output will be R=13 to p roduce another output we must feed the last output (13) as inp ut to the function and so on. The first five outp ut will be: R0=13, R1=242 ,R2=107 ,R3=138 ,R4=53 ,R5=202 ,… To show that this function p roduce unrepeatable sequence of random numbers in the range of 2 n where n is the number of bits t o represent the block or the cover file len gth, assume that n=4 and the seed number is 11 the outp ut will be 1011, 0000, 1111, 0111, 1000, 0011, 1100, 0101, 1010, 0001, 1110, 0110, 1001, 0010, 1101, 0100,1011, … The seed 1011 will be repeated after (2 4 =16) step and there are no rep eated number in this range as been observ ed. 4.Seed Value Length Seed valu e length (N) is the numb er of bits needed to convert t he seed value into bin ary. This number (N) is very imp ortant and must be selected carefully. The imp ortance of it obtained from bein g N represents t he range of the random number generator which must be large enough to increase the comp lexity of the setgo- algorithm (N!) and in the same time must not exceed the len gth of the cover file. Basically , the first imp ortant p rocess which must be performed is to calculate the value of (N) as following: where Len is the length of the cover file and means the nearest upp er integer value. N in our prop osed algor ithm limits the text len gth of the hidin g information which must not exceed the value of N. In other words, we must select t he cover f ile as big enough to hide all the secret t ext. Practically, N ≥10 is fair because 10! is lar ge enou gh for not break ing the sy st em. This means that the minimum cover file size must be greater than (2 10 =1024) and other cases depending on the secret t ext len gth. 5.The Complete S tructure Figure 4, shows the comp lete st ructure of the p rop osed stego-sy stem. Starting from reading the text file (which is wanted to be hidden) and the cover file and then p erforming some imp ortant p rep rocessing st eps like calculating the value of N and the length of the hiding text (Len) and delimitin g the secr et t ext by p utting the end delimiter at the end of the te xt (app ending the word "END" or "STOP" at the end) to know where to stop retrieving the hiding text later, IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 IHJPAS then checkin g that Len is not greater than (2 N ) which means that t he cover file length is lar ge enou gh to hide the secr et information. Randomness Tests Randomness t ests (or tests of randomness) are used to analy ze the distribution p attern of a set of data. In stochast ic modeling, as in some comp uter simulations, the exp ected random input data can be ver ified to show t hat tests were performed using randomized data. In some cases, data reveals an obvious non-r andom p att ern, as with so-called "runs in the data" (such as exp ecting random 0-9 but finding "4 3 2 1 0 4 3 2 1..." and rarely going above 4). If a selected set of data fails the tests, then p arameters can be changed or other randomized data can be used which does p ass the tests for randomness[6]. There are many p ractical measures of randomn ess for a binary sequence. These include measures based on st atist ical tests, transforms, and complexity or a mixture of these. The best for testing the semi-random generators are based on st atist ics which are as follows [7]: the frequency test, the serial test, the poker test, the run test and the auto correlation test , which all depend on X 2 distribution Conclusions To conclude t hat t he key generator is a random generator we need to p rove that it can pass the randomness tests. Figure 5, shows the test results of a sev eral gener ated outp ut sequences of a differ ent len gth, All of them passed the randomness tests. Re ferences 1. HUAIQING WANG AND SHUOZHONG WANG, (2004),Cy ber Warfare: Steganography vs. Steganaly sis”, October / 47(10), COMMUNICAT IONS OF THE ACM . 2. www.wikip idia.com, The free ency clop edia, steganogr aphy . 3. Por, L.Y.;. Alireza,W. K. Lai2 Z. ; Ang,F. T.and Su,M .T. B. “StegCure: A Comprehensive Stegano graphic Tool using Enhan ced LSB Scheme” Delin a, Faculty of Comp uter Science and Information Technolo gy University of M alaya 50603, Kuala Lumpur M ALAYSIA. IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 IHJPAS 4. M urshed ,M d. M . “Steganography using LSB hiding” Up p er Iowa University Fay ette, Iowa, USA. 5. Dickman,S. D. and M adison, J. (2007).An Overview of Steganography ” University Infosec Techreport Dep artment of Computer Science JM U-INFOSEC-July 6. www.wikip edia.com/randomness t ests. 7. Robshaw,M .J.B. (1995),Stream Ciphers, RSA Laboratories Technical Report TR-701, Version 2.0|July 25,. IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 Fig.( 1): Least S igni ficant Bit Fig.( 2): the general structure Stego-Algorithm Stego-Algorithm text Stego-Ke y Cove r-image Stego-data Stego-Ke y text Sender Receiver Start IHJPAS IBN AL- HAITHAM J. FO R PURE & APPL. SC I. VO L.23 (3 ) 2010 Fig.( 4):The Complete S tructure of the S tego-system IHJPAS Se quence length Freque ncy Test (≤3.84) Se rial test (≤ 7.81) Poker Test (≤ 11.1) Run Test T0 gaps t est T1 blocks t est Autocorrelation test (d=8) (P=pass, F=fail) 1024 0.34 0.88 8.01 T0=p ass T1=p ass PPPPPPPP 2048 2.11 1.4 6.41 T0=p ass T1=p ass PPPPPPPF 4096 3.01 1.44 8.44 T0=p ass T1=p ass PPPPPPPF 8192 3.00 3.45 3.6 T0=p ass T1=p ass PPPPPPPF 16384 0.78 2.23 2.19 T0=p ass T1=p ass PPPPPPPF Fig.( 5): Randomness tests results IHJPAS 2010) 3( 23المجلد مجلة ابن الھیثم للعلوم الصرفة والتطبیقیة خوارزمیة مقترحة ألخفاء البیانات عمر زیاد عاكف،عبدالكریم عكلة كلیة بغداد للعلوم االقتصادیة ، قسم هندسة البرامجیات جامعة بغداد، أبن الهیثم -كلیة التربیة ،قسم علوم الحاسبات الخالصة علم أخفاء البیانات أحد الفروع المهمة لعلم التشفیر الذي یستخدم بصـورة واسـعة فـي أمنیـة الحاسـبات والشـبكات فـي هـذه دیع خوارزمیـة جدیـدة تـم تقـدیمها بمفهـوم جدیـد للتعامـل مـع علـم اخفـاء البیانـات وفـي تقنیــــة حســــــاب تفـي هـذا البحـث اقتراحـ. االیـام لكــــــي یستخـــدم فـي االتصـاالت حترحـة هـي مفتـاح سـري تـم أقتـر الخوارزمیة المق. هة لنظام التشفیر االنسیابيالمفتاح السري مشاب . لغرض أخفاء معلومات الرسائل المرسلة IHJPAS