Microsoft Word - BKDAS 90 ON RECOGNITION OF CIPHER BIT STREAM FROM DIFFERENT SOURCES USING MAJORITY VOTING FUSION RULE SHRI KANT*, VEENA SHARMA*, B. K. DASS** Abstract In the present paper, majority-voting rule has been investigated for its possible application in cryptological sciences. A novel approach is proposed to address the complex identification problem of overlapping classes. The method for representing patterns using different measurements has been discussed and the majority voting rule is used to fuse the results obtained in different measurement spaces. The proposed approach is quite natural and simple to implement in comparison with usual fusion strategies. The scheme has been implemented for three-class problem and results were tabulated and presented graphically. Keywords Decision fusion, Representation space, Pattern space, Expert classifiers, Majority logic, Stream ciphers and Cryptology. --------------------------------------------------------------------------------------- Address for correspondence: Scientific Analysis Group, Defence R & D Organization, Metcalfe House complex, Delhi-ll0054 Tel No.: (011) 23813862 Email: shrikant@scientist.com, shrikant.ojha@gmail.com *e**Deptt. of Mathematics Faculty of mathematical Sciences University of Delhi Delhi-110054 *Scientific Analysis Group Defence R & D Organization Metcalfe House complex Delhi-ll0054 91 1. Introduction Identification of cipher bit streams generated from different sources is the primary step for a cryptanalyst. It requires cipher bit stream to be represented in the form of a pattern vector. In the measurement space, the analyst can take various measurements for patterns. Based on specific perception and scale, patterns are represented as points in some multidimensional feature space. The feature space is partitioned using the discriminant functions made on the basis of patterns of known classes, referred as training/learning patterns. The performance of the discriminant function is measured by categorization of independent patterns, known as test patterns, to their own partitions. A higher percentage of correct classification of the patterns in the test set indicates a better discriminator. The fundamental goal of an analyst is to arrive at the highest probable correct classification of a given set of patterns. This objective leads to the design and development of different type of classifiers to solve a particular pattern recognition problem. Here, the accuracy in classification attained by different classifiers may be different. Also, the set of patterns correctly classified by one classifier may differ with the set of patterns correctly classified by another classifier. Thus, instead of searching for the best among the set of classifiers, it is found better to combine the decisions of individual classifiers. By applying a combination strategy to the set of classifiers such that the participating classifiers work complementary to each other, we are likely to get a classification rate better than that of a single best classifier. Various combination strategies or decision fusion techniques have been proposed and studied by many researchers. Lam and Suen ([1]:1997), Kittler, et. al. ([2]:1998), Alkoot, et. al. ([3]:1999), Kuncheva, et. al. ([4]:2001), Chen and Cheng ([5]:2001) and Alexandre, et. al. ([6]:2001) etc. made a detailed study of different aspects of these combination strategies. 92 We first give a brief description of these fusion schemes. Let X be a pattern which is to be assigned to one of m possible classes w1, w2, …, wm with the help of anyone of M individual classifiers. Each classifier approximates the a posteriori probability P(wi/X), i= 1, 2, … , m, that is the probability that pattern X belongs to class wi, given that X was observed. A classifier assigns X to class wk if )1( )/(max)/( ,...,1 −−−= − XwPXwP i mi k For convenience, let we denote the a posteriori probabilities computed by classifier Cj by Pj(wi/X), where j = 1, 2, …, M and i = 1, 2, …, m. It is assumed that these estimates of a posteriori probabilities given by individual classifiers are independent and identically distributed according to some pre-assumed distribution function. Here, aim is to get improved estimates P(wi/X) by applying some combination rule ‘f’ to the individual estimates Pj(wi/X) given by each of the M classifiers i.e. ( ) )2( m...,,2 ,1,)/(...,,)/(..., ,)/()/( 1 −−−== iXwPXwPXwPXwP iMijii f Pattern X is finally allocated to class wk according to the rule (1). Thus the rule for decision fusion becomes ( ){ })/(...,,)/(max )/( 1 1 XwPXwPXwPifwX iMi m i kk f = =∈ Some prevalent decision fusion rules are the average rule, geometric mean rule, maximum rule, minimum rule, median rule, and majority vote rule. The theoretical and experimental comparative studies about the performance of decision fusion approaches have been carried out by Kittler, et. al. ([2]:1998), Alkoot and Kittler ([3]:1999), Chen & Cheng ([5]:2001) and Kuncheva ([7]:2002) etc., using different data sets. Sensitivity to estimation errors of these schemes under different 93 assumptions and different approximations has been analyzed, Kittler, et. al. ([2]:1998), Alkoot and Kittler ([3]:1999). It has been found that relative performance of various combination schemes changes under different conditions. The main emphasis has been given to comparison of the two basic schemes i.e. sum rule and product rule, Kittler, et. al. ([2]:1998), Alkoot and Kittler ([3]:1999), Alexandre, et. al. ([6]:2001). The sum rule is found easy to implement and less sensitive to errors than product rule, in most of the scenarios, Kittler, et. al. ([2]:1998). The product rule and strategies devised from it perform better when all the experts produce small errors. Further, the number of classifiers employed in fusion and number of classes in the problem also has an effect on the relative performance of different experts, Alkoot and Kittler ([3]:1999). In general as stated earlier, these fusion rules, with an exception of majority voting rule, use the probabilities obtained by different classifiers to take the final fused decision about the class-memberships of the patterns. These probabilities given by different classifiers are called soft decisions. On the other hand, majority-voting rule works on hard decisions. That is, in majority voting rule, different classifiers first give their respective decisions about the class-memberships called the hard decisions, and then the decision taken by maximum number of classifiers is taken as the final decision. Instead of handling the probabilities, it simply works on the decisions given by different classifiers and therefore, is easiest to implement, Lee and Srihari ([8]:1993) and Lam and Suen ([1]:1997). And yet, experiments show that majority-voting rule is just as effective as other combination schemes, which are more complex in nature. Also, majority-voting scheme is found to be one of the schemes, which are relatively stable. Keeping all these facts into mind, we have chosen majority-voting scheme for experimentation to support our approach of fusion, which is slightly different from the usual approach. Let us first formulate majority-voting scheme mathematically. 94 In majority voting rule, the individual a posteriori probabilities Pj(wi/X) are used to produce hard decisions δij where ⎪⎩ ⎪ ⎨ ⎧ = =δ = otherwise 0 )X/w(Pmax)X/w(P if 1 kj m 1k ij ij Then we assign the pattern X to class wk if ⎭ ⎬ ⎫ ⎩ ⎨ ⎧ δ= ⎭ ⎬ ⎫ ⎩ ⎨ ⎧ δ ∑∑ = = = M 1j ij m 1i M 1j kj max In the literature, mostly, two-class problem have been addressed with the help of decision fusion rules, although the rules can be implemented for any number of classes and any number of features representing a pattern. However, when the number of classes increases, the computational complexity also increases and the final decision may be costly for overlapping classes. We address this difficulty by proposing in Section 2, a simple and easy to implement approach, working on the basis of consensus of decisions taken in different representation spaces. Section 3 presents the problem definition and a description about various representation spaces. Section 4 contains the algorithm and Section 5 contains details of experimentation and results. Finally, in Section 6 we present our observations and conclusions. 2. Proposed Approach For Classification Before discussing our approach, let us put the usual fusion approach in a form, which can be compared with proposed one. Let, there are m classes and M classifiers. As discussed before, the a posteriori probabilities pij, where i = 1, 2, …, M and j = 1, 2, …, m, are computed for a given pattern X to be classified in one of the pre-specified class. 95 Classes Pattern X w1 w2 … wm Classifier C1 p11 p12 … p1m Classifier C2 p21 p22 … p2m : : : : : : … : : Classifier CM pM1 pM2 … pMm Table 2.1 The pattern X gets its class membership in class wi if a predefined function f as described in Section 1, gives optimum value for class wi i.e. i j ),p,,p,(p )p,,p,(p Mj2j1jMi2i1i ≠∀−−−>−−− f f Now, instead of considering different types of classifiers, we propose to consider different representations of same set of patterns and allow a single classifier to take decision about class memberships. Going this way, in spite of having only one classifier, one can have different probable decisions and can apply any of the traditional fusion schemes. Further, if one have only two or very few classifiers available, then there will be more chances of having a tie instead of having a decision due to lack of majority of a single decision, specially when we are going to deal a multi- class problem. In that case, our approach presents a way to use fusion to have more authenticated decisions by considering many representations of set of patterns, according to the underlying problem. As stated before, we have chosen majority voting rule for fusion i.e. we accept the decision obtained in majority of the representation/feature spaces using a single classifier. Let we have ‘r’ representation spaces to observe a pattern X in ‘r’ different ways. With the help of a classifier C, we wish to classify X in one of the pre-specified m classes. Let pij, i = 1, 2, …, r and j = 1, 2, …, m be the probability for X of membership in jth class, while the ith representation is used to present the pattern. First we convert these soft decisions into hard decisions ∆ij, by allocating one class wj to the pattern X in ith representation space i.e. 96 ⎪⎩ ⎪ ⎨ ⎧ = =∆ = otherwise 0 pmaxp if 1 ik m 1k ij ij Table 2.2(a) Soft Decisions: Table2.2(a) Hard Decision: Table2.2(b) From the table 2.2(b), it is clear that a pattern will get its class membership in class wk if ⎭ ⎬ ⎫ ⎩ ⎨ ⎧ ∆= ⎭ ⎬ ⎫ ⎩ ⎨ ⎧ ∆ ∑∑ = = = r 1i ij m 1j r 1i ik max Classes Classifier C w1 w2 --- wm Representation X1 p11 p12 --- p1m Representation X2 p21 p22 --- p2m | | | | | | --- | | Representation Xr pr1 pr2 --- prm Classes Classifier C w1 w2 --- wm Representation X1 ∆11 ∆12 --- ∆1m Representation X2 ∆21 ∆22 --- ∆2m | | | | | | --- | | Representation Xr ∆r1 ∆r2 --- ∆rm 97 3. Problem Definition and Feature Computation In the present day communication scenario, any type of information viz. visual scenes, voice and text, is stored and communicated digitally. The authorized recipient at the other end recovers the same with precise accuracy and correctness. The adversary may intercept, record and retrieve all the plain transmission with some trial and error, using available means and technology. But, he will not be able to make any sense of it if the information is transmitted after encipherment by applying some cryptographic techniques. To experiment with the said problem, enciphered bit streams of scenes, voice and text have been generated from three independent stream ciphers respectively. The stream ciphers used are clock-controlled shift registers, Geffe generator and cascade of linear shift registers with nonlinear combiner. The details are described in Geffe ([9]:1973), Rueppel ([10]:1986), Schneier ([11]:1996), Kumar ([12]:1997) and Menezes et. al. ([13]:1997). We consider each fixed length sample (now onwards referred as a message) of enciphered bit stream as a pattern. These patterns require their representation in pattern space as multidimensional feature vectors so that these can become suitable for further analysis. The process of feature extraction from each message to form a suitable mathematical pattern is like an art and this is improved by experimentation and practice. Next, we will describe the procedure followed by us to extract significant feature vectors from these bit streams. 3.1 Mathematical Representation Let us denote the samples of enciphered binary streams by Mlk, where l = 1, 2, 3 and k = 1, 2, …, N. In this representation, l =1 stands for encrypted scene, l =2 stands for encrypted voice and l =3 stands for encrypted text. The number of messages taken from each respective source is ‘N’. All messages are assumed to be of a sufficiently long length of ‘c’ bits, where 1000 ≤ c ≤ 5000 bits usually. From each message Mlk, binary pattern word (i.e. small blocks of bits) of a suitable fixed length ‘b’ are read, 98 where b = 5 or 7 bits etc. Now, these binary words can be read from a message in two (overlapping and non-overlapping) ways. In overlapped reading, we proceed bit by bit i.e. first pattern word starts from the first bit of the message and second pattern word starts from the second bit of the message and so on. And in non-overlapped reading, we move block by block i.e. we divide the whole message into blocks of given pattern word length and then these blocks are taken as pattern words. One can take a pattern word of any length depending upon the prior knowledge of assignable character for a fixed group of bits. For a binary pattern word of length ‘b’, we have possibility of 2b different words. If we do a certain computation on given message, for each of these 2b possible words, then we will have 2b computed quantities. These 2b quantities or measures together will constitute a 2b-dimensional feature vector. So, by varying pattern word length ‘b’, we will get feature vectors of different dimensions from a particular message. For example, for b=5 and b=7, 32- dimensional and 128-dimensional feature vector will be obtained respectively. In each case, we get different feature space with different components and different dimensions. Following this method, we can have different representations of a particular raw pattern. In a message Mlk, number of total occurrences of pattern words,‘t’ is given by ⎪ ⎪ ⎪ ⎩ ⎪⎪ ⎪ ⎨ ⎧ =⎥⎦ ⎤ ⎢⎣ ⎡ =−− = )case goverlappin-onN ( t b c (3)--- )case verlappingO ( t1)(bc t d c In the subsequent sub-sections, we present further, the two different types of computations done to compute the feature vectors. In these subsections, we refer 'ith pattern word’ for binary equivalent of decimal number 'i', where 0 ≤ i ≤ 2b –1. For example, if b = 5 then dimension of the vector = 99 25 =32 and the indices of the vector will vary from 0 to 31. It can be better understood with the help of table given below. Binary Word Equivalent Decimal Feature component 00000 0 F[0] 00001 1 F[1] | | | 00111 7 F[7] 01000 8 F[8] | | | 11111 31 F[31] 3.1.1 Percentage Frequency Vector (PFV): First, we compute the frequency vector F. The ith component of the vector F, 'fi' is the frequency of ith pattern word in a particular message, where 0 ≤ i ≤ 2b -1. So, ith component of the percentage frequency vector P, 'pi' is the percentage of the ith component of the frequency vector F. Each component 'pi' where 0 ≤ i ≤ 2b –1, can be computed as ⎪ ⎪ ⎪ ⎩ ⎪⎪ ⎪ ⎨ ⎧ = × = × = )case goverlappin-onN ( p t 100f (4)--- )case verlappingO ( p t 100f p d i d i c i c i i 3.1.2 Average Distance Vector (ADV): In any message, any pattern word can occur more than once. Based on these different occurrences of the same pattern word and distances between them, we compute average distance vector. Each occurrence in 100 the message is marked by the first bit of a pattern word. Let Pi,j be the position of the first bit of ith pattern word, in jth occurrence in the message then the ith component of the average distance vector a is defined as ( ) )5( 2 1,, −−− − = ∑ = − i f j jiji i f PP a i As defined above, fi is the number of times the ith pattern word occur in the message sequence, where 0 ≤ i ≤ 2b -1. Now, from each given message bit-stream for a fixed pattern word length, four types of feature vectors can be generated with the help of (4) and (5) depending upon different choices, as shown below. One Message (or raw pattern) Pattern word length: ‘b’ bits Reading ways: Overlapping Non-overlapping Vectors: PFV. ADV PFV ADV In further discussion, we will use the following notations described in table 3 to refer the four possible cases. The notations have been taken as first letter of each words near the arrow in small letters. 101 Notation Pattern Word Length: b= 5 or7 Way of Reading Type of Vector bop ‘b’ bits overlapping percentage frequency boa ,, ,, Average distance bnp ,, non- overlapping percentage frequency bna ,, ,, Average distance Table 3 4. Algorithm: Suppose there are m classes w1, w2, ---, wm and N is the total number of raw patterns taken from each class. Thus, in total we have mN patterns. Let we denote the number of patterns to be taken for learning from each class by L. The remaining (N-L) patterns will be used for testing. Let we present all the patterns in ‘r’ different representations taking different combinations of choices i.e. varying the pattern word length, way of reading and type of vector. The dimension of each pattern in any representation will depend upon the pattern word length chosen for that representation. Let we denote the dimension in the pth representation by np, where p = 1, 2, ---, r. Step 1: Make a set of raw patterns (or message bit streams), keeping the patterns of all classes together. From this set, further compute ‘r’ sets by converting these patterns into vectors in ‘r’ different representations. Step 2: Select one of the classification technique such as minimum distance classifier, Bayes classifier or perceptron algorithm etc. as discussed in Tou and Gonzalez ([14]:1974), Bow ([15]:1984), Kant and Sharma ([16]:2000) etc. Step 3: Pass each representation of patterns to the classifier one by one i.e. for p = 1, 2, ---, r, apply classification algorithm to pth 102 representation which is a set containing np-dimensional vectors. Store class allotted to each pattern in each representation. Step 4: Set j = 1. Step 5:For jth pattern, initialize count [ i ] = 0, where i = 1,2,---, m. Step 6: Set p = 1. Step 7: If jth pattern in pth representation goes to class wk, increment the count [ k ] by 1. Step 8: Repeat Step 7 for p = 2, ---, r. Step 9: Finally, assign jth pattern to class wk if { }]i[countmax]k[count m 1i= = If there are more than one class such that the quantity count [k] of these classes are equal to the maximum value computed in the equation, then there arise uncertainty about the final class- membership of the pattern under consideration. In that case, the pattern is kept into the category of rejection. Step 10: Repeat Step 5 to Step 9 for j = 2, ---, mN. 5. Experimentation and Results: As discussed earlier, we have experimented with the problem of identification among the encrypted bit streams of scenes, speech and the text respectively. To deal with this three-class problem, we have first computed different suitable representations from these bit streams. Each representation is a set of vectors computed from the bit streams. Various techniques have been applied to classify the patterns for each 103 representation. Here, we are showing the classification results by two classifiers namely, maximum likelihood classifier and minimum distance classifier for each individual representation of patterns. And finally we have shown the results obtained by proposed fusion approach. . Maximum likelihood classifier: In the Tables 5.1(a) to 5.1(d), we have shown the percentage self-classification given by the maximum likelihood classifier for the four different representations of the same set of patterns. Notation used for each representation can be understood with the help of table 3. We have taken 150 patterns for learning of the classifier from each of the class. Table 5.2 shows the results obtained by fusion of classification results in individual representations. In Table 5.2, we have included the percentage of patterns, which cannot be allocated to any class due to uncertainty in deciding the final class membership. Representation: ‘5na’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 82.67 10 7.33 Encrypted Speech 8 84.67 7.33 Encrypted Text 8.67 13.33 78 Table 5.1(a) Representation: ‘5oa’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 82.67 9.33 10 Encrypted Speech 16.67 80 3.33 Encrypted Text 13.33 8.67 78 Table 5.1(b) Representation: ‘7na’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 98 0 2 Encrypted Speech 1 97 2 Encrypted Text 0.67 0 99.33 Table 5.1(c) 104 Representation: ‘7oa’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 97.33 1.33 1.33 Encrypted Speech 1 97 2 Encrypted Text 0 4.67 95.33 Table 5.1(d) Proposed Approach % Classification Encrypted Scene Encrypted Speech Encrypted Text Rejecte d Encrypted Scene 96 0 0 4 Encrypted Speech 0 98.67 0 1.33 Encrypted Text 0 0 96 4 Table 5.2 In Tables 5.1(a) to 5.1(d), we observe some wrong classifications i.e. the percentage of patterns, which are misclassified to other classes to whom they do not belong actually. But in Table 5.2, we can see that there are no wrong classifications among classes, though we have some rejections here. This means that misclassification occurred in case of individual representations is somewhat corrected by our approach of fusion. And the patterns, which cannot be still correctly classified due to lack of consensus, are shifted to rejection category. Knowing that a misclassification is costly than a rejection, we found our classification approach to be advantageous. This phenomenon is illustrated in the Graph(1) displayed next. In the graph, three series are plotted to show the percentage of number of misclassified patterns in each of the three classes. In each series, the classification results obtained in individual representations and by proposed fusion approach are compared. It is clear from the graph that using the proposed approach of fusion, we get a decrease to zero in percentage of misclassification in each of the class. 105 G r a p h ( 1 ) : C o m p a r i s i o n o f % M i s c l a s s i f i c a t i o n b y M a x i m u m L i k e l i h o d C l a s s i f i e r 0 5 1 0 1 5 2 0 2 5 5 n a 5 o a 6 n a 6 o a P r o p o s e d s c h e m e R e p r e s e n t a t i o n s % M is cl as si fic at io n E n c r y p t e d S c e n e E n c r y p t e d S p e e c h E n c r y p t e d T e x t Minimum Distance Classifier: The percentage self-classification for different representation of patterns with minimum distance classifier has been summarized in Table 5.3(a) to 5.3(f). After fusing the classification results in these six representations, we get improved results as shown in Table 5.4. Here also, we observe that by using fusion there is a great decrement in number of misclassified patterns, in each of the class. The patterns, which cannot be allocated to any class due to a tie of votes, are kept in rejection category Representation: ‘5oa’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 34 31.33 34.67 Encrypted Speech 27.33 48 24.67 Encrypted Text 25.33 30.67 44 Table 5.3(a) 106 Representation: ‘5na’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 55.33 18 26.67 Encrypted Speech 28.67 50 21.33 Encrypted Text 28.66 22 49.33 Table 5.3(b) Representation: ‘7na’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 52.67 24 23.33 Encrypted Speech 20 60 20 Encrypted Text 26 22 52 Table 5.3(c) Representation: ‘7na’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 66.67 20.67 12.67 Encrypted Speech 21.33 55.33 23.33 Encrypted Text 20.67 20 59.33 Table 5.3(d) Representation: ‘5np’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 51.33 25.33 23.33 Encrypted Speech 24.67 47.33 28 Encrypted Text 28 26.67 45.33 Table 5.3(e) Representation: ‘7np’ % Classification Encrypted Scene Encrypted Speech Encrypted Text Encrypted Scene 66.67 16.67 16.67 Encrypted Speech 16.67 64.67 18.67 Encrypted Text 20 16 64 Table 5.3(f) 107 Proposed Approach % Classification Encrypted Scene Encrypted Speech Encrypted Text Rejected Encrypted Scene 62.67 12.67 12 12.67 Encrypted Speech 8 60 11.33 20.67 Encrypted Text 14.67 10.67 59.33 15.33 Table 5.4 With minimum distance classifier, we are able to get more than 55% classification consistently for each class. These results of classification may not be very high for the practical application, but this consistency is extremely useful from a cryptanalysis point of view. The Graph(2) plotted to compare the results by minimum distance classifier, in different representations and those obtained by fusion, presents a similar trend as shown by maximum likelihood classifier. As compared to individual representations, the proposed fusion approach gives the least number of wrongly classified patterns. 108 Graaph(2): Comparison of % Misclassification by Minimum Distance Classifier 0 10 20 30 40 50 60 70 5oa 5na 6na 7na 5np 7np Proposed Approach Representations % M is cl as si fic at io n Encrypted Scene Encrypted Speech Encrypted Text Using both the classifiers, we have tested several sets of patterns from each class, and the test-classification is also found to be quite encouraging. 6. Observations and Conclusion: The experimentation done for the present work has given us enough idea about handling the problem of discrimination among various random sources. The proposed idea is quite general in nature and can be applied to other kind of classification problem as well, if it is possible to compute different measurements for the same set of patterns. Also, experimentation can be done for any number of classes as described in Algorithm, instead of restricting to a three class problem. According to the nature of underlying problem and knowledge of significant features, different measurements may be computed to get different representations of 109 patterns. For our problem, we adhered to the most suitable representations of patterns where the classification is more transparent. Dealing with the said problem, the following observations and constraints are found to be important: 1. While assigning class membership to a pattern in each of the representation space, the classifier has three possibilities. Either the pattern will be correctly classified, wrongly classified or the classifier will remain uncertain about the class membership of the pattern. The third possibility of uncertainty of decision arises due to tie between values of discriminating function for the possible classes. This situation of neutral position of the classifier leads to no decision or rejection, i.e. classification is neither correct, nor wrong. Here, for convenience, we have considered only those representation spaces in which classifier had only two alternatives, of being correct or wrong and no rejections. Again, while taking the final decision by fusion as proposed, there may be cases of no consensus. This situation of uncertainty in deciding final class membership of a pattern leads to a rejection. We have kept these patterns in a separate category. 2. After applying proposed fusion, it has been observed that there are no wrong classifications with maximum likelihood classifier, though there are few patterns, which cannot be allocated to any class and have been kept in no decision category. Minimum distance classifier shows the similar trend with less wrong classifications by fusion as compared to those obtained in individual representations. Here also, the patterns about which the classifier is not certain are kept in rejected category. For both the classifiers, graphs have also been plotted to compare the percentage misclassification of patterns, in individual representations and after fusion by proposed approach. It is clear from the graphs by using the proposed fusion approach that we are getting reduced percentage of misclassification, which is the merit of our approach. 110 3. Final results, obtained by fusion by proposed approach, are better than the results obtained by using single representation spaces. 4. As we have discussed earlier, each of the representation space has dimension as np, p = 1, 2, ---, r. The general observation is that we obtain consistently better performance when the size of learning set is more than 5x np. Acknowledgement We would like to express our sincere gratitude and deep veneration to DR. P K Saxena, Director SAG and Dr. Laxmi Narain Sc. ‘F’ for giving us this opportunity to carry out the present work. We are also thankful to Ms. Neelam Verma, Sc. ‘E’ for her constructive suggestion made during the preparation this paper. References: [1]. Lam, L., Suen, C. Y., 1997. Application of majority voting to pattern recognition: an analysis of its behaviour and performance. IEEE transactions on Systems, Man, and Cybernetics 27(5), 553-568. [2]. Kittler, J., Hatef, M., Duin, R., Matas, J., 1998. On combining classifiers. IEEE Trans. PAMI 20(3), 226-239. [3]. Alkoot, F. M., Kittler, J., 1999. Experimental evaluation of expert fusion strategies. Pattern Recognition Letters 20, 1361-1369. [4]. Kuncheva, L., Bazdek, J., Duin, R., 2001. Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognition Letters 34(2), 299-314. [5]. Chen, D., Cheng, X., 2001. An asymptotic analysis of some expert fusion methods. Pattern Recognition Letters 22, 901-904. 111 [6]. Alexandre, Luis A., Campilho, Aurelio C., Kamel, M., 2001. On combining classifiers using sum and product rules. Pattern Recognition Letters 22, 1283-1289. [7]. Kuncheva, Ludmilla I., 2002. A theoretical study on six classifier fusion strategies. IEEE Trans. PAMI 24(2), 281-286. [8]. Lee, D. S., Srihari, S. N., 1993. Handprinted digit recognition: A comparison of algorithms. In Proc. 3rd Int. Workshop Frontiers Handwriting Recognition. Buffalo, NY, pp. 153-162. [9]. Geffe, P. R., 1973. How to protect data with ciphers that are really hard to break. Electronics, 46(1), 99-101. [10]. Rueppel, R. A., 1986. Analysis & design of stream ciphers. Springer- Verlag. [11]. Schneier, B., 1996. Applied cryptography, Second Edition John Wiley & Sons, Inc. [12]. Kumar, I. J., 1997. Cryptology: System identification and key clustering. Agean Park Press, CA, USA. [13]. Menezes, A. J., Van Oorschot, P. C., Vanstone, S. A., 1997. Handbook applied cryptography. CRC Press, Boca Raton. [14]. Tou, J. T., Gonzalez, R. C., 1974. Pattern recognition principles. Addison-Wesley Publishing Company. [15]. Bow, Sing-Tze, 1984. Pattern recognition: Application to large data- set problems. Marcel Dekker, Inc., New York & Basel. [16]. Kant, S., Sharma, V., 2000. Discrimination among various type of encrypted bitstream. International Conference on Quality Reliability and Information Technology, 21-23 Dec, New Delhi.