International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol 16 No 21 (2022)

Improving the Security of Split Data When Using Multidimensional Parity Algorithms

https://doi.org/10.3991/ijim.v16i21.36075

Tlek Akhmetgalym(*), Gulzira Mukasheva, Karipzhanova Ardak Zhumagazievna, Aukenov Bolat Mayzhanuly, Urazbaeva Kumys Toleubekovna
Alikhan Bokeikhan University, Semey, Kazakhstan
tlek.akhmetgalym@mail.ru

Abstract: A model of distributed storage of split data based on multidimensional parity algorithms, resistant to the partial loss of storage sites, is considered as an alternative security approach that can replace conventional multiple replication and the growing cost of physical storage volume that comes with it. In this model, the redundant data generated during splitting makes it possible to restore partially lost parts. Recovery relies on the parity files formed during the splitting procedure, in which the main operation is the calculation of bitwise parity by summation modulo two. Applying the same addition operation to the undamaged files restores a corrupted file completely and without distortion. A comparison with several models that use checksum codes for recovery shows that the estimated probability of loss due to storage failure is significantly lower for multidimensional parity codes than for the other split data storage options. According to the calculated data, even when theoretically non-recoverable losses are reached, there is still a chance of finding a chained recovery combination. The complexity of alternative methods, including iterative decoding, makes a distributed split data storage system based on multidimensional parity algorithms a more convenient error correction technology.
Keywords: security, distributed storage, data splitting, multidimensional parity, loss recovery, parity files

1 Introduction

The search for new methods of ensuring IT security is ongoing, but so far replication (multiple redundancy) remains the most widely used way to preserve data [1] [2]. The constant growth in the volume of information means that replication translates into growing hardware, energy and, as a result, financial costs [3]. Expensive backup storage can be avoided by performing operational information recovery (error correction) instead. In particular, the authors of this article tested the possibility of storing files in distributed databases in a split state, where files are split and restored using proprietary multidimensional parity algorithms that are resistant to a partial loss of storage locations [4-6] (patents: GB2467989 «Distributed Storage», UK; GB2463078 «Distributed Storage», UK; GB2463085 «Communication System», UK; GB2463087 «Data Storage», UK; GB2492981 «Data Storage», UK; HO1175001 «Data Storage», Hong Kong; US9026844 «Distributed Data Storage and Communication», USA).

The frontend of the system is implemented with web technologies and uses the HTTP protocol in conjunction with interactive AJAX/PHP technologies (Web 2.0). The basic content management system of the web part is WordPress. The site is hosted on a dedicated server of PS Internet Company LLP running the Linux Ubuntu Server operating system and the Apache 2.25 web server with PHP 7.0. MySQL 5.0 is used as the database [7].

The proprietary algorithms generate redundant data that makes it possible to recover partial losses of split parts [8][9]. The split data does not carry meaningful content, and therefore it can be stored in any available location without fear of unauthorized access.
Even after collecting all the parts of the split data, it is impossible to recover the information without knowing the splitting method. The splitting algorithm can use an unlimited number of splitting methods, and therefore only the owner/creator of the information has access to the data [10][11]. At the same time, in the case of partial data loss, recovery relies on the parity files formed during the splitting procedure, in which the main operation is the calculation of bitwise parity by summation modulo two. Applying this addition operation to the remaining files restores a damaged (lost) file without distortion.

2 Methods

Parity algorithms operate on binary information, and the summation that we use when splitting the data reduces to determining parity [12]. The summation is carried out strictly position by position ("bitwise") and modulo two, that is, with the "exclusive OR" operation. The summed bits are stored in a third file. If, for example, the first file has a zero and the second file has a zero, then the third file will have a zero. If the first is zero and the second is one, then the third will be one. If the first is one and the second is zero, then the third will also be one. If both are one, then the third file gets a zero. This third file is the parity file, and the whole operation is a parity operation: two files make three. If any one of these three files is deleted, performing the addition operation on the remaining two always restores it without distortion. This idea, based on the properties of parity, is the basis for solving problems of distributed information storage with data splitting using multidimensional parity algorithms [13].
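The "two files make three" operation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' proprietary splitting algorithm: the function names `xor_bytes`, `split_with_parity` and `recover` are hypothetical, and the zero-byte padding of odd-length input is an arbitrary choice made for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise sum modulo 2 (XOR) of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes) -> list:
    """Split data into two equal halves and append their parity block."""
    if len(data) % 2:
        data += b"\x00"  # illustrative padding so both halves have equal size
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return [first, second, xor_bytes(first, second)]


def recover(blocks: list) -> list:
    """Rebuild the single missing block (marked None) from the other two."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("exactly one block must be missing")
    present = [block for block in blocks if block is not None]
    restored = list(blocks)
    restored[missing[0]] = xor_bytes(present[0], present[1])
    return restored


blocks = split_with_parity(b"split data")
for lost in range(3):  # lose each of the three files in turn
    damaged = list(blocks)
    damaged[lost] = None
    assert recover(damaged) == blocks
```

Whichever of the three blocks is dropped, the XOR of the other two reproduces it exactly, which is the property the rest of the method builds on.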
So far, the parity property has been used only to detect file corruption and to determine what information is lost (RAID). In such systems, the information is divided into identical blocks, then the parity is calculated and written to a separate file containing the parity bits. When, for example, a bit is lost during data transmission, the stored parity bit no longer matches the sum of the received data. However, in RAID this result is only used to delete the file once it is found to be corrupted. In case of data loss during transmission, a second request is made. The lost file is then restored from the backup locations (replication method).

In the method of multidimensional parity that we tested, by contrast, the lost information is restored with a calculated probability. As mentioned above, if you divide a file (data carrier) into two identical parts, calculate the bitwise parity, and write the result to a third file, you can always recover the loss (or damage, if you know which file is damaged) of any of the three files. Consider, for example, the case of three files when one of them is lost. We add the remaining two files bitwise modulo two. The resulting file is exactly equal to the lost file. This is the case of one-dimensional parity.

If we now divide the file into four identical parts, arrange them in the form of a square, and calculate the bitwise parity in rows and columns, we get 4+4+1=9 files: four data files, four row and column parity files, and one parity-of-parity file. This is the case of two-dimensional parity. In this case we can lose any three files out of nine, which is equivalent to losing all the files along one of the coordinates, and still restore all the original information. To show this, consider for simplicity the case of writing one bit at a time (the information is stored on 9 disks). The result of the summation is shown in Table 1.

Table 1. The result of the parity summation for 9 disks

1st disk: 1                              2nd disk: 0                              7th disk, parity of the 1st row: 1
3rd disk: 0                              4th disk: 1                              8th disk, parity of the 2nd row: 1
5th disk, parity of the 1st column: 1    6th disk, parity of the 2nd column: 1    9th disk, parity of parity: 0

In such a system with two-dimensional parity, 3 disks out of 9 can fail at the same time, and the data can always be restored, since bits can be restored along two coordinates. This is equivalent to a 3-fold copy during replication. Moreover, it is possible to lose even 4 disks out of 9 and still recover the information with a 92% probability. You can lose up to 5 disks and recover with a 67% probability [14].

A real system with two-dimensional parity does need more disks and more stored data, because parity data has to be kept in addition to the original data. However, the redundancy of two-dimensional parity is 2.25 (data is written to 4 disks and parity to 5 more disks: 9/4=2.25). Compared with replication, this is almost the same as double replication, where two copies are written and the redundancy is 2.0. At the same time, a system with two-dimensional parity can lose from 3 to 5 disks, i.e., its reliability is comparable to 3- to 5-fold replication.

3 Related work

Today, the standard way to ensure security is the method of multiple replications. In addition, methods based on the Reed-Solomon algorithm [15], fountain technologies [16], checksums in RAID5/6 systems, and codes with local parity (LRC) [17] have been tested. Within the framework of the model of correcting codes, the theory of channels with erasures was developed, which was proposed by V. Peterson [18]. The model of channels with erasures was subsequently developed in the work of M. Luby [19].
3.1 Method of replication

Among correction algorithms, simple and fast parity algorithms have found the widest use. In particular, they are used with varying efficiency in common RAID systems of fault-tolerant disk arrays for servers [20]. On the other hand, high-speed and simple parity algorithms in conventional applications do not allow multiple data losses to be recovered. For example, when a single disk fails, the common RAID5 variant goes into the rebuild state, in which intensive reading begins from all the disks in the array to restore the data of the failed disk. With disk sizes on the order of several gigabytes this does not cause problems, but as disk volumes grow to terabytes the process can take a significant time, during which any failure of another disk leads to a complete loss of all data. Problems with the reliability of standard RAID arrays force the use of more complex RAID configurations that combine different schemes, such as RAID6, or apply additional mirroring [21]. Commercial proprietary RAID variants are known, such as HP EVA with its vRAID technology and RAID m+n using Reed-Solomon erasure codes. But all this leads to an even greater increase in the cost of the technology.

Therefore, RAID arrays often allocate a single disk specifically for writing parity, which can be used to restore data from a broken disk. To explain the meaning of parity recovery, consider, for example, the case of a RAID of 5 disks. For simplicity, let us write four bytes to four disks: 1, 2, 3, and 4. The result of the parity summation is written to the 5th disk (Table 2).

Table 2. The result of the parity summation in the RAID system

Disk 1             1 1 1 0 1 0 1 0
Disk 2             0 0 1 0 1 1 1 0
Disk 3             0 1 0 1 1 1 0 1
Disk 4             1 1 1 0 0 1 0 1
Disk 5 (parity)    0 1 1 1 1 1 0 0

The parity is calculated as the "exclusive OR" of the corresponding bits on the different drives. It is also called "addition modulo 2".
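The arithmetic of Table 2 can be checked with a short Python sketch. The four byte values are copied from the table; the helper name `parity` and the choice of which disk to "lose" are illustrative assumptions.

```python
from functools import reduce

# Bytes from Table 2, one per data disk (disks 1 to 4).
disks = [0b11101010, 0b00101110, 0b01011101, 0b11100101]


def parity(values):
    """XOR all values together (bitwise addition modulo 2)."""
    return reduce(lambda a, b: a ^ b, values, 0)


parity_disk = parity(disks)
print(f"{parity_disk:08b}")  # 01111100, the parity row of Table 2

# Simulate losing disk 3 and rebuilding it from the survivors plus parity.
lost_index = 2
survivors = [d for i, d in enumerate(disks) if i != lost_index]
rebuilt = parity(survivors + [parity_disk])
assert rebuilt == disks[lost_index]
```

The rebuild step reads every surviving disk, which is why the rebuild time grows with disk capacity.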
Here 0^0=0, 0^1=1, 1^0=1, 1^1=0. The method is called parity summation because the number of ones, counted together with the calculated bit, must always be even.

The table shows that if an even number of ones is added, the calculated parity bit must be zero: the parity does not change. For example, in the first column of bits: disk 1 = 1; disk 2 = 0; disk 3 = 0; disk 4 = 1. There are two ones, so 0 is written to the 5th disk. If the number of ones is odd, then the parity bit will be 1, turning the total number of ones into an even number. For example, in the second column: disk 1 = 1; disk 2 = 0; disk 3 = 1; disk 4 = 1. There are three ones, so 1 is written to the 5th disk. We then have 4 ones, i.e., an even number. If any of the five disks now breaks, the missing data can always be restored by writing "1" to the new disk if the number of ones on the remaining 4 disks is odd, and "0" if it is even.

Nevertheless, modern RAID arrays prefer not to allocate a disk for writing parity [20]. Why? In RAID systems that use a parity disk, data can be recovered only if a single disk in the array fails. The disk is simply replaced with a new one, and data recovery begins. This process is called rebuild, and data is read simultaneously from all the remaining disks to calculate the parity. If another disk fails during the rebuild, then all the data can be lost completely, since it is no longer possible to restore anything [22]. The longer the rebuild process takes, the higher the risk of a double failure. At current volumes of terabytes or more, a rebuild can last for several days. There may also be a so-called "rotten" bit (bad bit), whose rate is one of the factory parameters of hard drives.
In this case, which manifests itself precisely with large amounts of storage, the risk of data loss in RAID becomes unacceptably high. Therefore, in modern multi-disk Big Data storage systems, RAID arrays are abandoned, and the only de facto method of data protection is multiple copying (replication). Moreover, replication occurs not only between nodes but also within nodes. Even inside a data center, replication from one server to another is required [22].

3.2 Correction codes with a checksum

The most common and simplest method of recovering losses is to use checksum codes [12]. Let there be a data set $D = \{d_0, d_1, \dots, d_n\}$, divided into blocks of the same size. If we know the sum $C_D = d_0 + d_1 + \dots + d_i + \dots + d_n$, then the loss of any block $d_i$ can be repaired by recalculating the sum $C'_D$ of the remaining data; the difference between the stored sum and the recalculated one is the missing block, $d_i = C_D - C'_D$. The summation is performed with modular arithmetic, since for control only the differences between the data blocks matter; as a result, the amount of data required for the checksum does not exceed the block size. In binary logic, the calculation of checksums reduces to summation modulo 2, which is implemented by the single operation exclusive OR (XOR) and turns into parity control (we use the term "parity control" since we describe technologies related to the processing of binary data).

In multi-disk systems, this technology was first implemented in RAID [22]. In this technology (in particular, in RAID 3/4/5/6), data is divided into blocks, parity is calculated, and the data blocks together with a parity block are scattered across different disks in the array. Such an array can recover single disk failures. When storing large amounts of data, RAID cannot provide an acceptable level of reliability.
Recovery after a failure (recovering data from a failed disk and writing it to a healthy new disk) begins to take an unacceptably long time as volumes grow, since data has to be read from all disks for the calculations [13]. During the recovery process (rebuild), the storage system is not protected from repeated failures, and if another disk fails, all data is lost irretrievably.

On the other hand, RAID arrays are very expensive devices, because their main area of application, for which everything is optimized, is high-speed online storage and data exchange [23]. The device's RAID controller can read and write across all disks simultaneously, increasing the overall speed of data exchange. For high-capacity storage, specialized systems are being developed that use relatively slow but high-capacity disks, combined with distributed storage technology that is resistant to disk failures. Such storage systems are also called RAID, although they do not use "proprietary" parity technologies. There are various designations of multi-disk storage technologies, both open and proprietary, for example RAID 7.3 or RAID m+n, which have nothing to do with traditional RAID systems and refer us to the original meaning of the abbreviation RAID: Redundant Array of Inexpensive Disks [24].

3.3 Application of Reed-Solomon codes

The high redundancy of replication and the high cost and low reliability of RAID systems make it necessary to develop storage technologies that use more complex methods. Various systems are known, usually referred to as RAID n+m, that use the Reed-Solomon (RS) noise-tolerant data encoding algorithm. The best known are RAID 7.3 from RAIDIX of St. Petersburg and the IBM storage system based on the developments of Cleversafe, which IBM acquired. The RS algorithm allows the required fault tolerance to be set as the number of blocks whose loss can be restored.
RS can detect damaged or lost blocks and calculate the location of the blocks that need to be restored. In theory, the algorithm can detect t errors and recover the information using redundant data. The total number of blocks will then be n+2t, where n is the number of data blocks. The RS method increases the total amount of stored data most economically; the redundancy is close to the theoretical limit [25][26].

However, the algorithm is resource-intensive: it uses complex calculations and intensive information exchange with the storage system, and it can greatly increase the latency of the system. The algorithm also does not scale well: any increase in the number of storage locations requires a complete recalculation of the data of the entire array, since RS has a rigid algebraic structure. In modern multi-drive storage with a high failure rate, there is a high risk of data loss as soon as the failure threshold is accidentally exceeded, because RS fault tolerance has a threshold character: the slightest excess leads to complete failure of the entire repository. Another feature of the algorithm that causes serious problems in distributed storage is that RS requires data from all the disks in the array for its calculations, so spikes in storage network traffic greatly affect the availability of data.

There are various methods of noise-tolerant coding known as multidimensional error correction codes [27]. We have developed a simple, non-resource-intensive algorithm with reliability comparable to that of RS, but free from its disadvantages. The redundancy in this case, although higher than that of RS, is much less than with replication [14][28].
4 Results

4.1 One-dimensional parity

We divide the data into two blocks of equal size, $d_0$ and $d_1$, and calculate the parity:

$$d_2 = d_0 \oplus d_1, \tag{3.1}$$

where $\oplus$ denotes the XOR operation, which implements bitwise addition modulo 2. Note that, owing to the properties of the XOR operation, the resulting data blocks allow any missing part to be restored. We show that, for example, $d_0 = d_1 \oplus d_2$. Apply the operation $\oplus\, d_1$ to both sides of this equality. Since XOR is an involution, $d_1 \oplus d_1$ cancels, and we get the correct equality (3.1):

$$d_0 \oplus d_1 = d_1 \oplus d_2 \oplus d_1 = d_2. \tag{3.2}$$

Similarly, we prove the equality $d_1 = d_0 \oplus d_2$ using the operation $\oplus\, d_0$:

$$d_1 \oplus d_0 = d_0 \oplus d_2 \oplus d_0 = d_2. \tag{3.3}$$

From here we have a triple of relations, from which we can recover the loss of any data block of the triple:

$$d_0 = d_1 \oplus d_2, \qquad d_1 = d_0 \oplus d_2, \qquad d_2 = d_0 \oplus d_1. \tag{3.4}$$

The resulting data triple can be represented as a one-dimensional vector along the x coordinate, whose elements are related by the parity relations

$$d_x = d_{x-1} \oplus d_{x+1}, \tag{3.5}$$

where the indices are calculated modulo 3. This means that in the parity formula (3.5), 0−1=2 and 2+1=0.

4.2 Two-dimensional parity

Let us split the original data into four blocks of the same size and write the blocks in a two-dimensional 2x2 matrix. We convert the matrix to 3x3 by adding an empty column to the right and an empty row to the bottom of the original matrix. We then calculate the checksums for the rows and columns and write the results in the empty places:

$$\begin{pmatrix} d_{00} & d_{01} \\ d_{10} & d_{11} \end{pmatrix} \rightarrow \begin{pmatrix} d_{00} & d_{01} & d_{02} \\ d_{10} & d_{11} & d_{12} \\ d_{20} & d_{21} & \end{pmatrix}, \qquad d_{x2} = d_{x0} \oplus d_{x1}, \quad d_{2y} = d_{0y} \oplus d_{1y}. \tag{3.6}$$

The resulting data matrix now allows any data loss to be recovered in two ways, both by columns and by rows. It demonstrates the two-dimensional case of the multidimensional parity-check code [24]. Let us complete the matrix by calculating the parity of the parity values. We get two equalities, $d_{22} = d_{20} \oplus d_{21}$ and $d_{22} = d_{02} \oplus d_{12}$, where, recall, $\oplus$ denotes the XOR operation, which implements bitwise addition modulo 2. We prove that the two values coincide, $d_{20} \oplus d_{21} = d_{02} \oplus d_{12}$. Expanding the parity values and taking into account the commutativity of the XOR operation,

$$d_{20} \oplus d_{21} = (d_{00} \oplus d_{10}) \oplus (d_{01} \oplus d_{11}) = (d_{00} \oplus d_{01}) \oplus (d_{10} \oplus d_{11}) = d_{02} \oplus d_{12}, \tag{3.7}$$

we get the required equality.
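The row and column construction of section 4.2 can be tried out with a minimal Python sketch. It assumes equal-size byte blocks and a row-major layout of the four data blocks; the names `encode_2d` and `xor_bytes` are hypothetical and not taken from the authors' implementation.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise addition modulo 2 of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_2d(data: bytes, block_size: int) -> list:
    """Return a 3x3 grid: 2x2 data blocks plus row, column and corner parity."""
    if len(data) != 4 * block_size:
        raise ValueError("data must contain exactly four blocks")
    d = [data[i * block_size:(i + 1) * block_size] for i in range(4)]
    grid = [[d[0], d[1], None],
            [d[2], d[3], None],
            [None, None, None]]
    for r in range(2):
        grid[r][2] = xor_bytes(grid[r][0], grid[r][1])  # parity of row r
    for c in range(2):
        grid[2][c] = xor_bytes(grid[0][c], grid[1][c])  # parity of column c
    # The parity of parities can be computed along the bottom row or along
    # the right column; both routes must give the same block.
    corner_from_row = xor_bytes(grid[2][0], grid[2][1])
    corner_from_col = xor_bytes(grid[0][2], grid[1][2])
    assert corner_from_row == corner_from_col
    grid[2][2] = corner_from_row
    return grid


grid = encode_2d(b"ABCDEFGH", block_size=2)
assert len([block for row in grid for block in row]) == 9
```

The internal assertion is the numerical counterpart of the statement that both ways of computing the parity of parities coincide.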
In the resulting 3x3 two-dimensional matrix, all the data in the columns and rows is connected by an XOR operation between adjacent data pairs. Each data block can therefore be obtained in two ways by the XOR operation, both by rows and by columns:

$$d_{x,y} = d_{x-1,y} \oplus d_{x+1,y} = d_{x,y-1} \oplus d_{x,y+1}, \tag{3.8}$$

where the index values are calculated modulo 3, i.e., they belong to the residue ring modulo 3. These relations can be interpreted as two-dimensional parity in the x, y coordinates corresponding to the one-dimensional vectors of rows and columns.

4.3 Three-dimensional parity

According to the described algorithm, a three-dimensional matrix can be obtained by splitting the data into 8 blocks of the same size and generating parity data along three coordinates:

$$D = \{\, d_{x,y,z} \,\}, \qquad x, y, z \in \{0, 1, 2\}, \tag{3.9}$$

with data blocks that are linked by three relationships:

$$d_{x,y,z} = d_{x-1,y,z} \oplus d_{x+1,y,z} = d_{x,y-1,z} \oplus d_{x,y+1,z} = d_{x,y,z-1} \oplus d_{x,y,z+1}. \tag{3.10}$$

If the two-dimensional matrix described above consists of one-dimensional vectors (3.5), then the three-dimensional matrix, respectively, consists of three planes of two-dimensional matrices (3.9) arranged sequentially one under another. The third plane, with the coordinate z=2, is formed from the calculated parity of the upper two planes. In general, a data block with any index equal to 2 is a parity block.

Thus, we get a 2x2x2 cube of original data, whose block indexes take the values 0 or 1, and three boundary planes holding the parity data, where at least one of the index values is 2.

4.4 N-dimensional parity

The described model of splitting data into blocks and the method of generating parity blocks can be extended to any n-dimensional parity. In this case, the original data is split into $2^n$ blocks, and after processing by the parity generation algorithm we get $3^n$ blocks, where n is the parity dimension.
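The growth from 2^n data blocks to 3^n stored blocks can be illustrated with a small Python sketch that applies the same parity rule along each axis in turn. It is an assumption-laden illustration rather than the authors' algorithm: blocks are modelled as plain integers, they are keyed by index tuples over {0, 1}, and the name `encode_nd` is hypothetical.

```python
from itertools import product


def encode_nd(blocks: dict, n: int) -> dict:
    """Extend 2**n integer blocks, keyed by index tuples over {0, 1},
    to 3**n blocks; index value 2 along an axis marks parity on that axis."""
    code = dict(blocks)
    for axis in range(n):
        for idx in list(code):  # snapshot, because new keys are added below
            if idx[axis] == 0:
                partner = idx[:axis] + (1,) + idx[axis + 1:]
                parity_idx = idx[:axis] + (2,) + idx[axis + 1:]
                code[parity_idx] = code[idx] ^ code[partner]
    return code


n = 3
data = {idx: i + 1 for i, idx in enumerate(product((0, 1), repeat=n))}
code = encode_nd(data, n)
assert len(data) == 2 ** n and len(code) == 3 ** n

# Every line of three blocks along any axis XORs to zero, so any one block
# on a line can be rebuilt from the other two.
for axis in range(n):
    for idx in code:
        if idx[axis] == 0:
            line = [idx[:axis] + (v,) + idx[axis + 1:] for v in (0, 1, 2)]
            assert code[line[0]] ^ code[line[1]] ^ code[line[2]] == 0

print(len(code) / len(data))  # 27 / 8 = 3.375 stored blocks per data block
```

Processing the axes one after another also fills in the "parity of parity" blocks, since the blocks created for earlier axes are included when later axes are processed.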
The resulting blocks will be linked by parity relations along each of the n coordinates:

$$d_{x_1,\dots,x_i,\dots,x_n} = d_{x_1,\dots,x_i-1,\dots,x_n} \oplus d_{x_1,\dots,x_i+1,\dots,x_n}, \qquad i = 1, \dots, n, \tag{3.11}$$

where the index values are calculated modulo 3.

Basic properties of multidimensional parity:

1. As the parity dimension increases, so does the number of ways to recover lost data blocks. In n-dimensional space, the main property of a one-dimensional parity vector, restoring the loss of one of three blocks with the help of the remaining two, turns into cross-parity control in each of the n directions.
2. With increasing dimensionality, not only the options for cross-parity control but also additional options for chain recovery grow. The "missing" data along any coordinate (in the case of a conditionally fatal loss for a given file) can, in turn, be restored using other files, after which the required file is recovered at the next step.

4.5 Example of chained recovery

Suppose that 5 blocks of data are lost in a two-dimensional matrix. For one of the files this is a conditionally fatal case, since it cannot be restored directly along either of the coordinates (3.12). But there are two options for chained recovery, in four steps (3.13).

The presence of chained recovery options means that multidimensional parity codes do not have a threshold for the probability of data recovery. That is, even when theoretically non-recoverable losses are reached, there is always a chance to find combinations of chain recovery.

In a two-dimensional parity matrix of 9 files, each file is linked by parity relations to two neighboring files along each coordinate and can be restored using them. This means that:

a) losses of 1 to 3 file storage locations, in any combination, are always guaranteed to be recovered;
b) losses of 4 to 5 locations have chain recovery options, so there is a non-zero probability of recovery;
c) the loss of 6 files out of 9 is a fatal scenario.
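Cases a) to c) can be checked by brute force with a short Python sketch. It models chained recovery as a simple iterative rule, "restore a file whenever it is the only loss in its row or column", which is an assumption about how the chain is found rather than the authors' stated procedure; the names `unrecoverable` and `count_fatal` are hypothetical.

```python
from itertools import combinations, product

# (row, column) positions of the nine files in the 3x3 parity matrix.
CELLS = list(product(range(3), repeat=2))


def unrecoverable(lost):
    """Chained recovery: repeatedly restore any lost file that is the only
    loss in its row or in its column. Returns the files that stay lost."""
    lost = set(lost)
    progress = True
    while progress and lost:
        progress = False
        for r, c in sorted(lost):
            row_losses = sum((r, j) in lost for j in range(3))
            col_losses = sum((i, c) in lost for i in range(3))
            if row_losses == 1 or col_losses == 1:
                lost.discard((r, c))  # restored by XOR of its two neighbours
                progress = True
    return lost


def count_fatal(k):
    """Number of k-file loss patterns that chained recovery cannot repair."""
    return sum(bool(unrecoverable(combo)) for combo in combinations(CELLS, k))


# Five losses where file (0, 0) has lost a neighbour in both its row and its
# column, yet everything is restored after a chain of intermediate steps.
assert unrecoverable({(0, 0), (0, 1), (1, 0), (1, 2), (2, 1)}) == set()

for k in range(1, 7):
    print(k, count_fatal(k))
# 1 0
# 2 0
# 3 0
# 4 9
# 5 45
# 6 84
```

Under this model no pattern of up to three losses is fatal, some patterns of four or five losses are, and all 84 patterns of six losses are, which matches the three cases listed above.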
4.6 Combinations of storage location failures resulting in data loss

Denote by $DL_{k,n}$ the number of fatal combinations of losses, in which the original file cannot be restored when k storage locations out of n fail. For example, for a two-dimensional matrix, if you lose 4 storage locations out of 9, the list of fatal combinations will look like this: (3.14)

The total number of possible combinations of k places out of n is equal to the number of combinations of n by k [27]:

$$C_n^k = \frac{n!}{k!\,(n-k)!}. \tag{3.15}$$

The loss of four blocks has no chain recovery options if the failed storage locations lie at the vertices of a rectangle in the two-dimensional matrix, so that paired losses are formed along the intersecting coordinates of the files. For each of the 3 combinations of 2 rows out of 3, there are 3 combinations of 2 columns out of 3. Total combinations: $DL_{4,9} = 3 \cdot 3 = 9$.

The loss of 5 blocks has no chain recovery options if the failed storage locations are arranged in such a way that some four of them form the combination described above. Given that there are five free positions for the fifth loss, the total number of combinations is $DL_{5,9} = DL_{4,9} \cdot 5 = 45$.

We now calculate the combinatorial probability of losses in the event of storage failure. Suppose that 4 disks out of 9 fail at random. What is the probability that we get a combination from $DL_{4,9}$, leading to data loss? The probability $Q_{4,9}$ of data loss when four storage locations fail is:

$$Q_{4,9} = \frac{DL_{4,9}}{C_9^4} = \frac{9}{126} \approx 0.0714. \tag{3.18}$$

Now let us assume that 5 disks fail at random. What is the probability that we get a combination from $DL_{5,9}$, leading to data loss? The probability $Q_{5,9}$ of data loss when five storage locations fail is:

$$Q_{5,9} = \frac{DL_{5,9}}{C_9^5} = \frac{45}{126} \approx 0.357. \tag{3.19}$$

The probabilities $Q_{6,9}$, $Q_{7,9}$, $Q_{8,9}$, $Q_{9,9}$ of data loss when 6 or more storage locations are lost are equal to 1.0. That is, such storage losses cannot be restored, and they are fatal.

The reliability of multi-disk systems decreases as the number of disks in the array increases. Using a statistical model of the probability of disk failures, it can be stated that the failure rate λ in an array of n disks increases almost proportionally, by n times [14]. The probability $Q_{dev}$ of a disk failure during time t will be:

$$Q_{dev} = 1 - e^{-\lambda t}. \tag{3.20}$$

For an array of n disks, the probability $Q^{n}_{arr}$ of a disk failure will be:

$$Q^{n}_{arr} = 1 - e^{-n \lambda t}. \tag{3.21}$$

The parameter λ is determined by the factory characteristic of disk quality, MTBF (mean time between failures), the nominal number of hours of disk operation before the first failure. In this case, we are interested in the probability of simultaneous disk failures. Such failures will occur in an array of n disks, but only a fatal scenario $DL_{k,n}$ leads to the failure of the storage system. This means that we have a joint probability of the failure of k disks out of n and of the occurrence, with probability $Q_{k,n}$, of a fatal scenario. Considering the failures of the k disks as random and independent events, we get the probability of failure of the storage system:

$$Q_{fail} = \left(Q^{n}_{arr}\right)^{k} \cdot Q_{k,n}. \tag{3.22}$$

4.7 Probability of failure of a two-dimensional parity storage system

Here is an approximate calculation of the probability of failure of a storage system with two-dimensional parity that uses typical hard drives with an MTBF of ≈ 800 thousand hours. The probability of a disk failure within a year with this MTBF will be:

$$Q_{dev} = 1 - e^{-\lambda t} = 1 - e^{-8760/800000} \approx 0.0109, \tag{3.23}$$

where λ = 1/MTBF and t = 1 year (8760 hours). Disks with this MTBF have a 1.09% chance of failure within a year.
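The chain of quantities used in this calculation can be reproduced with a few lines of Python. The sketch assumes the exponential failure model with λ = 1/MTBF, the fatal-combination counts 9 and 45 from section 4.6, and a system-failure formula of the form "probability of k independent failures times the share of fatal patterns"; that last form is an assumption about equation (3.22), and all variable names are illustrative.

```python
import math

MTBF_HOURS = 800_000      # typical drive MTBF quoted in section 4.7
HOURS_PER_YEAR = 8760
N_DISKS = 9

lam_t = HOURS_PER_YEAR / MTBF_HOURS       # lambda * t with lambda = 1 / MTBF
q_dev = 1 - math.exp(-lam_t)              # one disk fails within a year
q_arr = 1 - math.exp(-N_DISKS * lam_t)    # some disk in the array fails

# Fatal-combination counts DL_{4,9} = 9 and DL_{5,9} = 45 from section 4.6.
fatal_counts = {4: 9, 5: 45}

q_system = {}
for k, dl in fatal_counts.items():
    q_fatal = dl / math.comb(N_DISKS, k)  # fatal share of all C(9, k) patterns
    # Assumed form of (3.22): k independent failures times the fatal share.
    q_system[k] = q_arr ** k * q_fatal

print(f"q_dev = {q_dev:.4f}, q_arr = {q_arr:.4f}")
for k, q in q_system.items():
    print(f"k = {k}: {100 * q:.6f} %")
```

With these inputs the single-disk and array probabilities come out at about 1.09% and 9.38%, and the two system-level values are of the order of a few ten-thousandths of a percent; small differences in the last digits depend on how intermediate values are rounded.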
The probability of failure of any disk within a year in an array of 9 disks, respectively, will already be 9.38%:

$$Q^{9}_{arr} = 1 - e^{-9 \lambda t} \approx 0.0938. \tag{3.24}$$

The probability of storage failure when 4 disks fail, i.e., of getting a fatal scenario, will be 0.00055273%:

$$Q_{fail} = \left(Q^{9}_{arr}\right)^{4} \cdot Q_{4,9} \approx 0.0938^{4} \cdot 0.0714 \approx 5.5273 \cdot 10^{-6}. \tag{3.25}$$

The probability of a storage failure when 5 disks fail will be 0.0002593%:

$$Q_{fail} = \left(Q^{9}_{arr}\right)^{5} \cdot Q_{5,9} \approx 0.0938^{5} \cdot 0.3571 \approx 2.593 \cdot 10^{-6}. \tag{3.26}$$

5 Discussion

The redundancy of a storage system that uses multidimensional parity technology to increase the reliability of storage in multi-disk arrays is determined by the ratio of the total number of files, including the redundant ones, to the number of original files [14]. As follows from the properties of multidimensional parity, it is equal to $R = (3/2)^n$, where n is the parity dimension. For storage systems with two-dimensional parity, R = 2.25, which is almost equivalent to double replication or mirroring.

The probability of failure of a storage system with mirroring to two disks will be equal to the probability of failure of two disks at the same time. Given the formula (3.21) for the probability of disk failure in the array and considering the failure events to be independent and joint, we get the probability of failure 2.166%:

$$Q^{2}_{arr} = 1 - e^{-2 \lambda t} \approx 0.02166. \tag{3.28}$$

In storage systems with two-dimensional parity technology, with comparable redundancy, we get failure probabilities from 0.00055273% to 0.0002593%. Thus, the estimated probability of losses in the event of storage failure when using multidimensional parity codes is significantly less than when using other storage options [29].

In principle, multidimensional parity codes do not have a threshold value for the probability of data recovery. That is, even when theoretically non-recoverable losses are reached, there is always a chance to find combinations of chain recovery. Of course, the reliability of multi-disk systems decreases with the increase in the number of disks in the array.
But the redundancy of storage systems that use multidimensional parity technology to increase the reliability of storage in multi-disk arrays is theoretically equivalent to multiple replications [30].

6 Conclusion

The method of storing and transferring files without encryption, only by splitting files according to multidimensional parity algorithms, ensures confidentiality and guarantees the security of the content from unauthorized access. The fundamental difference between this technology and existing security methods is the ability to implement an internally consistent and up-to-date model of stored and processed data with a high degree of protection against external intrusion.

First, the system does not allow the information to be used in case of unauthorized access. Split data itself does not carry meaningful information. The second aspect is that the recovery procedure can be performed using only a part of the split data, that is, the storage locations are regenerated. At the same time, unlike the backup method (multiple copies), the system is designed so that the more storage locations are used, the more the information is split. Fragments of information are stored in different places. Anyone who wants to get the information will have to collect fragments from all storage locations. But even after collecting what is split, they must know how to connect the parts.

The method of splitting/restoring is written in a metafile that provides access to the information. Where encryption has a key, our method has a metafile. This is the file that describes the splitting method. Only with its help can the split parts be reassembled.

The ultimate goal of the technology is the operation of a database in which private users, companies and organizations can store their files, being sure of their security.
7 References

[1] Keser, H., & Semerci, A. (2019). Technology trends, Education 4.0 and beyond. Contemporary Educational Research Journal, 9(3), 39–49. https://doi.org/10.18844/cerj.v9i3.4269
[2] Bagel, S., Kok, A., Zubeida, A., Suleimenova, Z., Riskulbekova, A., & Uaidullakyzy, E. (2019). Teaching Primary School Pupils Through Audio-Visual Means. International Journal of Emerging Technologies in Learning (iJET), 14(22), 122-140. https://doi.org/10.3991/ijet.v14i22.11760
[3] Elsayed, M., & Salama, R. (2020). Educational games for miss-concentration students (ADHD students). International Journal of Innovative Research in Education, 7(1). https://doi.org/10.18844/ijire.v7i1.4762
[4] Syrgabekov I., Sagauli E., Kurmanbaev E. Protection of databases by the method of distributed storage // Reports of the National Academy of Sciences of the Republic of Kazakhstan. – No. 5. – 2014. – pp. 141-153.
[5] Zadauly E., Kurmanbayev E., Syrgabekov I. Innovative security system based on distributed information storage with data splitting // Patriot Engineering. – №2 (7). – 2015. – pp. 111-119.
[6] Karipzhanova A. Zh., Gudov A.M. Organization of distributed databases of information systems by the method of data splitting // Materials of the XIV (XLV) International Scientific Conference of Students and Young Scientists "Education, Science, Innovation: the contribution of young researchers", Kemerovo, Russia, April 25, 2019. – Kemerovo, 2019.
[7] Kurmanbaev E.A., Syrgabekov I. N., Zadauly E., Karipzhanova A.Zh., Urazbaeva K.T. Information Security System based on the Distributed Storage with Splitting of Data // International Journal of Applied Engineering Research. – 2017. – Vol. 12. – № 8. – pp. 1703-1711.
[8] Ülker, E. D. (2020). The effect of applying 4-stages on learning analysis and design of algorithms. Cypriot Journal of Educational Sciences, 15(5), 1238–1248. https://doi.org/10.18844/cjes.v15i5.4621
[9] Taygan, U., & Ozsoy, A. (2020). Performance analysis and GPU parallelisation of ECO object tracking algorithm. New Trends and Issues Proceedings on Advances in Pure and Applied Sciences, (12), 109–118. https://doi.org/10.18844/gjpaas.v0i12.4991
[10] Tekkanat, E., & Topaloglu, M. (2018). Developing java design patterns modeller with object-oriented programming. Global Journal of Computer Sciences: Theory and Research, 8(3), 132–135. https://doi.org/10.18844/gjcs.v8i3.4024
[11] Bhuyan, M. H., & Tamir, A. (2020). Evaluating COs of computer programming course for OBE-based BSc in EEE program. International Journal of Learning and Teaching, 12(2), 86–99. https://doi.org/10.18844/ijlt.v12i2.4576
[12] Kurmanbaev E.A., Syrgabekov I. N., Zadauly E., Karipzhanova A.Zh., Urazbaeva K.T. Information Security System based on the Distributed Storage with Splitting of Data // International Journal of Applied Engineering Research. – 2017. – Vol. 12. – № 8. – pp. 1703-1711.
[13] Kaseb, M. R., Khafagy, M. H., Ali, I. A., & Saad, E. M. (2018, March). Redundant independent files (RIF): a technique for reducing storage and resources in big data replication. In World Conference on Information Systems and Technologies (pp. 182-193). Springer, Cham. https://doi.org/10.1007/978-3-319-77703-0_18
[14] Karipzhanova A., Sagindykov K., Dimitrov K. Justification of the method and algorithm of multidimensional parity control in distributed databases of information systems // Proc. X National Conference with International Participation «Electronica 2019», May 16-17, 2019, Sofia, Bulgaria. https://doi.org/10.1109/ELECTRONICA.2019.8825600
[15] Plank J.S. Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications. – Tennessee, 2005.
[16] Shokrollahi A. Raptor Codes // IEEE Transactions on Information Theory. – 2006. – Vol. 52. – P. 2551-2567. https://doi.org/10.1109/TIT.2006.874390
[17] Lee J.H. WEAVER Codes: Highly Fault-Tolerant Erasure Codes for Storage Systems, in FAST-2005: 4th Usenix Conference on File and Storage Technologies, 2005.
[18] Peterson W.W., Weldon E.J. Error-Correcting Codes / 2nd edition. – Cambridge, Massachusetts: MIT Press, 1972.
[19] Luby M. LT Codes // Proc. of the 43rd Annual IEEE Symp. on Foundations of Computer Science (FOCS), 2002.
[20] Farman, A., & Hasnain, M. (2019). Raid Storage Technology: A Survey. i-Manager's Journal on Computer Science, 7(4), 24. https://doi.org/10.26634/jcom.7.4.17269
[21] Huang, W. (2017). Coding for security and reliability in distributed systems (Doctoral dissertation, California Institute of Technology).
[22] Rahman, P. A. (2017, February). Analysis of the mean time to data loss of nested disk arrays RAID-01 on basis of a specialized mathematical model. In IOP Conference Series: Materials Science and Engineering (Vol. 177, No. 1, p. 012088). IOP Publishing. https://doi.org/10.1088/1757-899X/177/1/012088
[23] Chan, H. H., Li, Y., Lee, P. P., & Xu, Y. (2018). Elastic Parity Logging for SSD RAID Arrays: Design, Analysis, and Implementation. IEEE Transactions on Parallel and Distributed Systems, 29(10), 2241-2253. https://doi.org/10.1109/TPDS.2018.2818171
[24] Patterson D.A., Gibson G., Katz R.H. A Case for Redundant Arrays of Inexpensive Disks (RAID) // Proceed. of the 1988 ACM SIGMOD conf. on Management of Data. – Chicago IL, 1988. – P. 109-116. https://doi.org/10.1145/971701.50214
[25] Karagozlu, D. (2020). Determination of cyber security ensuring behaviours of pre-service teachers. Cypriot Journal of Educational Sciences, 15(6), 1698–1706. https://doi.org/10.18844/cjes.v15i6.5327
[26] Benmammar, S. (2020). Teaching English for specific purposes to computer science students with reading difficulties. Global Journal of Foreign Language Teaching, 10(3), 159–166. https://doi.org/10.18844/gjflt.v10i3.5072
[27] Wong J., Shea M., Tan F. Multidimensional Codes. – The Wiley Encyclopedia of Telecommunications, 2016.
[28] Belaidi, R., Bendib, B., Ghribi, D., Bouzidi, B., & Larafi, M. M. (2019). A comparative study on conventional and modern maximum power point tracking algorithms applied to photovoltaic systems. World Journal of Environmental Research, 9(2), 29–35. https://doi.org/10.18844/wjer.v9i2.4625
[29] Karipzhanova A. Zh., Sagyndykov K. M. Ways to improve the reliability of information stored in databases. Vestnik KazGUIU. – 2018. – №3(39). – pp. 265-270.
[30] Le, Q. M. (2019). Shingled Magnetic Recording Disks for Mass Storage Systems (Doctoral dissertation, Santa Clara University).

8 Authors

Tlek Akhmetgalym is a 3rd year doctoral student, majoring in computer science (e-mail: tlek.akhmetgalym@mail.ru; ORCID: https://orcid.org/0000-0002-4361-9810; address: Abay street 107, 071405, Semey city, Kazakhstan).

Gulzira Mukasheva is a 3rd year doctoral student, majoring in computer science (e-mail: gulzira_7777@mail.ru; ORCID: https://orcid.org/0000-0001-9766-1371; address: Abay street 107, 071405, Semey city, Kazakhstan).

Karipzhanova Ardak Zhumagazievna, PhD, is Dean of the Faculty of Information Technology and Economics, Senior Lecturer of the Department of Information and Technical Sciences (e-mail: kamilakz2001@mail.ru; ORCID: https://orcid.org/0000-0002-0113-6132; address: Abay street 107, 071405, Semey city, Kazakhstan).

Aukenov Bolat Mayzhanuly is a PhD Candidate of Physical and Mathematical Sciences, Associate Professor in mathematics, Department of Information and Technical Sciences (e-mail: abm58@mail.ru; ORCID: https://orcid.org/0000-0002-1350-5660; address: Abay street 107, 071405, Semey city, Kazakhstan).

Urazbaeva Kumys Toleubekovna is a Candidate of Physical and Mathematical Sciences, Associate Professor (e-mail: urazbaeva57@mail.ru; ORCID: https://orcid.org/0000-0002-8296-8394; address: Abay street 107, 071405, Semey city, Kazakhstan).

Article submitted 2022-09-08. Resubmitted 2022-10-11. Final acceptance 2022-10-11. Final version published as submitted by the authors.