ACTA IMEKO
ISSN: 2221-870X
December 2017, Volume 6, Number 4, 61-67
ACTA IMEKO | www.imeko.org | December 2017 | Volume 6 | Number 4 | 61

Frequency Domain Identification of Data Loss Models

László Sujbert, György Orosz
Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary

ABSTRACT
Measurement data loss has recently attracted growing interest due to the spread of sensor networks and the idea of the Internet of Things. A procedure is proposed that is able to identify the most frequently employed data loss models. It is assumed that the communication protocol provides information about data loss, i.e., the so-called data availability indicator function is known. The power spectral density (PSD) of the indicator function is characteristic of the model and can be used for identification. Spectral estimation is carried out by Fast Fourier Transform (FFT) based techniques. The paper introduces the identification procedure for random independent, random block-based and general Markov model-based data loss patterns. The efficiency of the proposed method is demonstrated by simulation and measurement results.

Section: RESEARCH PAPER
Keywords: data loss; measurement; FFT; PSD; system identification
Citation: László Sujbert, György Orosz, Frequency domain identification of data loss models, Acta IMEKO, vol. 6, no. 4, article 9, December 2017, identifier: IMEKO-ACTA-06 (2017)-04-09
Editor: Alexandru Salceanu, Technical University of Iasi, Romania
Received May 5, 2017; In final form July 11, 2017; Published December 2017
Copyright: © 2017 IMEKO. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Corresponding author: L. Sujbert, e-mail: sujbert@mit.bme.hu

1. INTRODUCTION

Nowadays measurement data transfer is frequently carried out in sensor networks or over the Internet. In such systems data can be corrupted, or the transmission medium can be partially damaged [1], [2]. In the era of the Internet of Things (IoT) the problem has become especially significant.

Data loss modeling has long been of great interest in the communications field. Detailed models have been developed for video transmission [3], [4] and for Voice over IP (VoIP) applications [5]. Data loss is usually modeled by a random process, where a single sample or a block of samples (a packet) is randomly available or lost. Data availability or data loss is described by the so-called data availability indicator function $K_n$ [3], [4], [6], [7]. This function is a series of zeros and ones, and the data loss pattern depends on the data transmission system.

In the simplest model the samples of $K_n$ are independent of each other [10], but it is often supposed that successive samples depend on each other. Comprehensive studies utilize different classes of Markov models, see, e.g., [3], [4], [11]. The identification of the data loss model is based on statistical methods. First the structure and the order of the model are chosen, then the parameters of the model are determined from measurement data [3], [4]. Model verification is an important task, as the process is not necessarily stationary, and even the model order may have to be changed adaptively. To this end, a correlation-based method [3] or off-line processing [4] can be used.

The presence of data loss in measurement systems motivated the investigation of the phenomenon from a signal processing point of view. Our recently published paper [8] discussed one such problem: the handling of data loss in spectrum estimation. Unlike existing methods, which concentrate on the spectrum estimation itself, that paper dealt with the characterization of the distortion caused by missing data.

Spectrum estimation based on time records with irregular sampling has been used for a long time. Records with missing data can be treated as a special case of irregular sampling, in which the signal samples are substituted with zeros where the data are lost. Theoretically, such records can be synthesized by multiplying the original signal (without data loss) by the data availability indicator function $K_n$. The spectral estimator of the damaged record is the discrete convolution of the original spectrum and the spectrum of $K_n$. Thus the description of the distorted spectra required the calculation of the spectra of the data availability indicator functions. Three data loss models have been investigated: random independent, random independent block-based, and two-state Markov model-based data loss. All their spectra have been determined, and the quantitative connections between the data loss model parameters and the spectral parameters have been calculated.

In this paper we propose the inverse procedure: the data loss model can be identified by the Fourier transform of the data availability indicator function. First the PSD of the indicator function is calculated, then a parametric system identification method is used to obtain the spectral parameters. As the spectral shape is simple in most cases, this step is not critical. The last step is the calculation of the data loss parameters from the already known relations. Model order selection and verification can also easily be done by comparing the measured spectra to the model spectra. The spectra are calculated by the Fast Fourier Transform (FFT), which makes the method computationally efficient. In summary, our method offers not only the spectral estimation of the lossy signal [8], but also provides an estimate of the data loss model along with its parameters.
In Section 2 the data loss models and their spectra are introduced in detail. The analysis of the Markov model-based data loss has been expanded to the general case. Section 3 deals with the identification procedure itself, while Section 4 presents simulation and measurement results that confirm the procedure in practice as well.

2. SPECTRUM ESTIMATION IN THE CASE OF DATA LOSS

2.1. Power spectrum estimation

The Fourier transform of a sampled signal $x(t_n)$ can be estimated from a finite set of samples [9]. The signal $x(t)$ is usually equidistantly sampled, and the spectrum is calculated by the Discrete Fourier Transform (DFT):

$$ X(f_k) = \sum_{n=0}^{N-1} x_n e^{-j \frac{2\pi}{N} nk}, \quad n, k = 0 \dots N-1, \tag{1} $$

where $f_k = k/N$ and $x_n = x(t_n)$. The DFT of a signal is usually calculated by the FFT. The transformed vector $X(f_k)$ is generally complex valued, and the spectral content of the signal is expressed by the real-valued Power Spectral Density (PSD) function:

$$ S(f_k) = \frac{1}{N} |X(f_k)|^2. \tag{2} $$

In order to reduce the variance of the PSD, a long series of samples is recorded, many consecutive blocks of $N$ samples are transformed, and the estimator is obtained by averaging the individual PSDs. The mean of the individual estimates can be calculated either by linear or by exponential averaging.

2.2. Formulation of data loss

In order to model the data loss, a so-called data availability indicator function, $K_n$, is introduced [3], [4], [6], [7]:

$$ K_n = \begin{cases} 1, & \text{if the sample is processed at } n \\ 0, & \text{if the sample is lost at } n \end{cases} \tag{3} $$

Samples which are not lost will be termed processed or available samples. The data loss rate can be defined with $K_n$ as:

$$ \gamma = \mathbb{P}\{K_n = 0\}, \tag{4} $$

where $\mathbb{P}\{\cdot\}$ stands for the probability operator. The probability that a sample is available is $\mu = 1 - \gamma$.

2.3. Spectrum estimation with missing data

Using the indicator function $K_n$, (1) can be rewritten for the case of data loss:

$$ \tilde{X}(f_k) = \mathrm{DFT}(x_n K_n) = \sum_{n=0}^{N-1} x_n K_n e^{-j \frac{2\pi}{N} nk}, \quad n, k = 0 \dots N-1. \tag{5} $$

This formula means that by incorporating $K_n$ into the usual form of the DFT, missing samples are practically substituted with zeros. Equation (5) can also be evaluated via the FFT. The spectrum of the signal containing missing samples is obtained as the convolution of the spectrum of the lossless signal and the spectrum of the data loss indicator function. Here only the latter is of interest. Let $X_K(f_k)$ denote the Fourier transform of the data loss indicator function:

$$ X_K(f_k) = \mathrm{DFT}(K_n). \tag{6} $$

Thus the PSD of the data loss indicator function is:

$$ S_K(f_k) = \frac{1}{N} |X_K(f_k)|^2. \tag{7} $$

The variance of $S_K(f_k)$ can also be reduced by averaging.

2.4. Data loss models and their spectra

In [8], three data loss models have been investigated:
1. random independent data loss,
2. random block-based data loss,
3. two-state Markov model-based data loss.

Random data loss is one of the most essential data loss models; it is often used because of its simplicity [10]. Block-based data loss models are often used, e.g., when several measurement results are transmitted over packet-based communication systems. The Markov model has proven to be useful, e.g., in the description of data loss patterns in real-time data transmission over the Internet [11]. In [8] the two-state model has been investigated exhaustively, but real situations may require more states. In the following the main features of the data loss models are recalled.

2.4.1. Random independent data loss

Random independent data loss can be defined as follows:

$$ K_n = \begin{cases} 1, & \text{with probability } \mu = 1 - \gamma \\ 0, & \text{with probability } \gamma \end{cases} \quad \text{for } \forall n. \tag{8} $$

The definition means that each sample is lost with probability $\gamma$, and data losses at different time instants are independent of each other. The PSD of the data loss pattern is [8]:

$$ S_K(f_k) = \frac{\mu(1-\mu)}{N} + \mu^2 \delta(f_k), \tag{9} $$

where $\delta(f_k)$ stands for the Dirac delta function. The PSD is white, which is represented by the first term, while the second term represents the power of the mean value $\mu$ of the data loss pattern.
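The whiteness of the random independent pattern can be checked numerically. The following is a minimal sketch, not the authors' implementation: it generates $K_n$ according to (8), estimates $S_K(f_k)$ by linear averaging of block periodograms as in (7), and compares the average level of the lower and upper halves of the one-sided spectrum. The function name `averaged_psd`, the seed, the record length and the value of `mu` are illustrative assumptions.

```python
import numpy as np

def averaged_psd(k, n_fft):
    """Average |DFT(K_n)|^2 / N over consecutive blocks of n_fft samples, cf. (7)."""
    n_blocks = len(k) // n_fft
    blocks = np.asarray(k[: n_blocks * n_fft], dtype=float).reshape(n_blocks, n_fft)
    spectra = np.abs(np.fft.fft(blocks, axis=1)) ** 2 / n_fft
    return spectra.mean(axis=0)          # linear averaging of the block PSDs

rng = np.random.default_rng(1)           # fixed seed, illustrative only
mu = 0.9                                 # assumed data availability rate
k = (rng.random(200 * 1024) < mu).astype(float)   # K_n = 1 with probability mu

# The mean is removed first so that the DC line does not mask the white part.
psd = averaged_psd(k - k.mean(), 1024)

low_band = psd[1:256].mean()             # relative frequencies below 0.25
high_band = psd[256:512].mean()          # relative frequencies 0.25 ... 0.5
flatness = low_band / high_band          # close to 1 for a white spectrum
print(f"low/high band ratio: {flatness:.3f}")
```

A ratio close to one indicates that no frequency band dominates, which is the signature of the first term in (9).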
2.4.2. Random block-based data loss

To define the random block-based data loss, the indicator function is given as:

$$ \{K_{kM}, \dots, K_{(k+1)M-1}\} = \begin{cases} 1, & \text{with probability } \mu \\ 0, & \text{with probability } \gamma \end{cases} \quad \text{for } \forall k. \tag{10} $$

The definition means that each block of length $M$ is lost with probability $\gamma$, and the data losses in different blocks are independent of each other. The power spectral density of the data loss pattern is [8]:

$$ S_K(f_k) = \frac{\mu(1-\mu)}{MN} \left( \frac{\sin(f_k \pi M)}{\sin(f_k \pi)} \right)^2 + \mu^2 \delta(f_k). \tag{11} $$

2.4.3. Two-state Markov model-based data loss

The first Markov model-based data loss is described by the two-state Markov chain shown in Figure 1. This is the simplest stochastic model that describes the dependency of consecutive samples of $K_n$. The states of the Markov chain represent the value of the indicator function $K_n$. If a sample is available at time instant $n$, the next sample will be available with probability $p$ and will be lost with probability $1 - p$. If a sample is missing at time instant $n$, the next sample will be available with probability $1 - q$ and will be lost with probability $q$. The data availability rate $\mu$ is the following [12]:

$$ \mu = \frac{q - 1}{p + q - 2}. \tag{12} $$

Note that the parameters $p$, $q$, and $\mu$ cannot be prescribed simultaneously. The spectral properties of a data loss sequence generated by the Markov chain shown in Figure 1 can be determined according to [12]. The PSD of $K_n$ is a first-order, low-pass type spectrum defined as [8]:

$$ S_K(f_k) = \frac{1 - a^2}{N(1 - a^{2N})} \cdot \frac{1}{|1 - a z^{-1}|^2} + \mu^2 \delta(f_k), \tag{13} $$

where $z^{-1} = e^{-j 2\pi f_k}$ and

$$ a = p + q - 1. \tag{14} $$

2.4.4. General Markov model-based data loss

The two-state Markov model relates the actual sample of $K_n$ only to the previous one. Generally, the actual state could be determined by the last $r$ samples. As each sample can have two distinct values (zero or one), a general model can have $2^r$ states. The parameter $r$ is the order of the model.
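Relations (12) and (14) of the two-state model can be checked by simulation. The sketch below is an illustration under stated assumptions: the function name `simulate_two_state`, the seed, the record length and the values of $p$ and $q$ are chosen freely, and the pole $a$ is compared with the lag-1 autocorrelation coefficient of the mean-free pattern, which is the quantity a first-order all-pole spectrum such as (13) implies.

```python
import numpy as np

def simulate_two_state(p, q, n, seed=0):
    """Generate K_n from the two-state chain: p = P(1 -> 1), q = P(0 -> 0)."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    k = np.empty(n, dtype=int)
    k[0] = 1                                   # start in the 'available' state
    for i in range(1, n):
        if k[i - 1] == 1:
            k[i] = 1 if u[i] < p else 0        # stays available with probability p
        else:
            k[i] = 0 if u[i] < q else 1        # stays lost with probability q
    return k

p, q = 0.99, 0.90                              # illustrative transition parameters
k = simulate_two_state(p, q, 200_000)

mu_theory = (q - 1) / (p + q - 2)              # equation (12)
a_theory = p + q - 1                           # equation (14)

mu_hat = k.mean()                              # empirical availability rate
x = k - mu_hat
a_hat = np.dot(x[1:], x[:-1]) / np.dot(x, x)   # lag-1 autocorrelation coefficient

print(f"mu: theory {mu_theory:.4f}, simulated {mu_hat:.4f}")
print(f"a:  theory {a_theory:.4f}, simulated {a_hat:.4f}")
```

The simulated values vary with the seed and the record length; only agreement within statistical scatter should be expected.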
The states and the transitions can be seen in Figure 2 for the case $r = 2$. Note that there are impossible state transitions that formally have zero probability in the model. The relation between the transition probabilities and the PSD of the model can also be determined [12]. Let the PSD be as follows:

$$ S_K(f_k) = \frac{A}{\left| 1 - \sum_{m=1}^{r} a_m z^{-m} \right|^2} + \mu^2 \delta(f_k), \tag{15} $$

where $z^{-1} = e^{-j 2\pi f_k}$. The constant $A$ is a function of the coefficients $a_m$, $m = 1, \dots, r$, and of the number of FFT points $N$. The state transition probabilities are:

$$ \mathbf{P} = [P_{ij}] = \begin{cases} \mu + \sum_{m=1}^{r} a_m (K_{n-m} - \mu), & \text{if } K_n = 1 \\ 1 - \left( \mu + \sum_{m=1}^{r} a_m (K_{n-m} - \mu) \right), & \text{if } K_n = 0 \end{cases} \tag{16} $$

where

$$ [P_{ij}] = \mathbb{P}\{ s_i \mid s_j \}, \tag{17} $$

i.e., the probability of the state $s_i$ conditioned on the previous state $s_j$. The states $s_i$ and $s_j$ are the following:

$$ \begin{aligned} s_i &= [K_n, K_{n-1}, \dots, K_{n-r+1}], & i &= 1 \dots 2^r \\ s_j &= [K_{n-1}, K_{n-2}, \dots, K_{n-r}], & j &= 1 \dots 2^r \end{aligned} \tag{18} $$

Table 1 summarizes the PSDs of the data loss indicator functions for the most important data loss models. The small figures in the table illustrate the typical shapes of the PSD functions. General Markov models can have diverse spectra, thus they are not represented in the table.

3. IDENTIFICATION OF DATA LOSS MODELS

Data loss model identification consists of the model selection and the determination of the model parameters [3], [4]. The procedure utilizes some direct parameters of $K_n$ and is completed by the evaluation of $S_K(f_k)$. The theoretical basis of this method is the exact relation between spectral leakage and the parameters of the different data loss models presented in the previous section. The identification process is summarized in Figure 3.

An essential requirement is that the communication protocol provides information about each sample, namely whether its transfer was successful or not. Without such information only a qualitative assessment of the data loss can be made. In such a case the shapes depicted in Table 1 can be used for identification. If $K_n$ is available, it is also known whether the protocol is block-based.
In the latter case, one sample of $K_n$ per block is enough to represent the data loss, which amounts to a kind of decimation. The block length $M$ is obviously available.

The data availability rate $\mu$ can easily be estimated as the mean value or DC level of $K_n$. This DC level should be subtracted from $K_n$ in order to remove $\delta(f)$ from the PSD, as its presence can impair the transfer function fitting.

The next step is the calculation of $S_K(f_k)$. It can be done by averaging the PSDs obtained from the FFTs of consecutive (possibly overlapping) blocks of $N$ samples of $K_n$. As the DC component is removed from the PSD, windowing is not necessary.

Figure 1. A two-state Markov model of data loss. State “1”: actual sample is available ($K_n = 1$). State “0”: actual sample is lost ($K_n = 0$).

Figure 2. A four-state Markov model of data loss.

The inverse Fourier transform (IFFT) of $S_K(f_k)$ provides the autocorrelation function of $K_n$. The FFT block size should be greater than the length of the autocorrelation function. This is not a hard requirement, as the usual FFT block size is much greater than required by $K_n$.

The main part of the identification is the approximation of $S_K(f_k)$. As the previous investigations have shown, the PSD can be well approximated by the transfer function of an all-pole or autoregressive (AR) system. Theoretically it has no pole if $K_n$ is random independent, and has only one pole if $K_n$ is Markov model-based with two states. Generally, a Markov model with $2^r$ states results in an AR system with $r$ poles. The PSD $S_K(f_k)$ of a block-based data loss has zeros, but after decimation $K_n$ is either random independent or Markov model-based.

As the system is quite simple, there are no special requirements for the identification. We have used a linear prediction (LPC) filter, which determines the coefficients of a forward linear predictor by minimizing the prediction error in the least squares sense [14].
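The least-squares forward predictor can be sketched in a few lines. The code below is an illustrative stand-in and not the routine used in the paper: it applies the autocorrelation method, solving the Toeplitz normal equations directly, and assumes the sign convention in which the returned vector starts with 1 and the prediction-error polynomial is $1 + a_1 z^{-1} + \dots + a_r z^{-r}$. The function name `lpc_autocorr` and the synthetic test signal are assumptions made for the example.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Return [1, a_1, ..., a_r] of the forward predictor (autocorrelation method).

    Assumed convention: A(z) = 1 + a_1 z^-1 + ... + a_r z^-r.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation estimates r[0], ..., r[order]
    r = np.array([np.dot(x[: n - m], x[m:]) / n for m in range(order + 1)])
    # Toeplitz normal equations R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

# Self-check on a synthetic first-order AR signal x[n] = 0.8 x[n-1] + e[n]
rng = np.random.default_rng(2)            # fixed seed, illustrative only
e = rng.standard_normal(50_000)
x = np.zeros_like(e)
for i in range(1, len(e)):
    x[i] = 0.8 * x[i - 1] + e[i]

coeffs = lpc_autocorr(x, order=3)
print(np.round(coeffs, 3))                # a_1 close to -0.8, the rest close to 0
```

Fitting a higher order than necessary and inspecting which coefficients are close to zero is the same check that is applied to the data loss patterns in Section 4.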
The lpc function of Matlab [15] has been applied for this purpose. There are $r + 1$ LPC coefficients, where $a_0 = 1$.

If the LPC coefficients $a_1, a_2, \dots \approx 0$, the data loss can be handled as random independent. Its only parameter $\mu$ has already been calculated. However, if $a_1, a_2, \dots \neq 0$, Markov model-based data loss has occurred. In the general case of an $r$th-order Markov model, the transition probabilities are to be determined by formula (16). In the special case of the two-state Markov model-based data loss the parameters $p$ and $q$ can be estimated using the relations (12) and (14):

$$ \hat{p} = \hat{\mu}(1 - \hat{a}) + \hat{a}, \qquad \hat{q} = \hat{\mu}(\hat{a} - 1) + 1, \tag{19} $$

where

$$ \hat{a} = -a_1. \tag{20} $$

In (19) and (20) the hat operator indicates estimation.

Finally, the information on whether the data loss is block-based has to be incorporated. If so, the estimators $\hat{\mu}$, $\hat{p}$, and $\hat{q}$ do not change, but the parameter set has to be completed by the block size $M$.

Figure 3. Data loss model identification.

Table 1. Summary of PSDs belonging to different data loss models. (The shape plots of the original table are not reproduced.)

Model                                  $S_K(f_k)$
Random independent data loss           $\frac{\mu(1-\mu)}{N} + \mu^2 \delta(f_k)$
Block-based data loss                  $\frac{\mu(1-\mu)}{MN} \left( \frac{\sin(f_k \pi M)}{\sin(f_k \pi)} \right)^2 + \mu^2 \delta(f_k)$
Two-state Markov model-based data loss $\frac{1 - a^2}{N(1 - a^{2N})} \cdot \frac{1}{|1 - a z^{-1}|^2} + \mu^2 \delta(f_k)$

4. RESULTS

The procedure presented above has been intensively tested by simulations and measurements. In this section the results of both kinds of tests are presented. The data processing has followed the procedure given in Figure 3.

4.1. Simulation results

First a random independent data loss pattern has been generated, then it has been identified by the proposed method. The parameters of the simulation are summarized in Table 2, where $L$ is the total length of the record in samples and $N$ is the FFT size. The spectra have been exponentially averaged with a smoothing factor $\alpha$. The constant $\mu$ is the parameter of the data loss model.
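The identification chain of Section 3 can be sketched for the first-order case as follows. This is a simplified illustration under stated assumptions, not the authors' code: the PSD is exponentially averaged over non-overlapping blocks starting from the first block, the autocorrelation is taken as the IFFT of the averaged PSD, and the single pole is estimated as the ratio of the lag-1 and lag-0 autocorrelation values instead of a 10th-order LPC fit. The function name `identify_two_state`, the seed and the shortened record length are assumptions.

```python
import numpy as np

def identify_two_state(k, n_fft=1024, alpha=0.01):
    """Estimate mu, a, p, q from an indicator sequence K_n (first-order sketch)."""
    k = np.asarray(k, dtype=float)
    mu_hat = k.mean()                        # DC level = data availability rate
    x = k - mu_hat                           # remove the DC line from the PSD
    psd = None
    for start in range(0, len(x) - n_fft + 1, n_fft):
        block_psd = np.abs(np.fft.fft(x[start:start + n_fft])) ** 2 / n_fft
        # Exponential averaging with smoothing factor alpha
        psd = block_psd if psd is None else (1 - alpha) * psd + alpha * block_psd
    acf = np.fft.ifft(psd).real              # circular autocorrelation estimate
    a_hat = acf[1] / acf[0]                  # first-order Yule-Walker solution
    p_hat = mu_hat * (1 - a_hat) + a_hat     # relation (19)
    q_hat = mu_hat * (a_hat - 1) + 1         # relation (19)
    return mu_hat, a_hat, p_hat, q_hat

rng = np.random.default_rng(3)               # fixed seed, illustrative only
k = (rng.random(200_000) < 0.99).astype(float)   # random independent loss, mu = 0.99
mu_hat, a_hat, p_hat, q_hat = identify_two_state(k)
print(f"mu = {mu_hat:.4f}, a = {a_hat:.4f}, p = {p_hat:.4f}, q = {q_hat:.4f}")
```

For a random independent pattern the estimated pole should be close to zero, in which case $\hat{p}$ and $\hat{q}$ carry no more information than $\hat{\mu}$.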
Finally, a 10th-order LPC model has been fitted in order to check the dynamic behavior of the data loss. The identified model parameters are the following:

$$ \hat{\mu} = 0.9900, \qquad \hat{a} = 0.0013. \tag{21} $$

Note that $\hat{a} = -a_1$ is the spectral parameter of the two-state Markov model, according to (20). Since $a_1 \approx 0$, the data loss is confirmed to be random independent. The PSD of the model has also been calculated. The upper plot of Figure 4 shows the PSD of the data loss pattern (green line) and the PSD of the model (blue line). The coefficients of the 10th-order AR system are depicted in the lower plot. It can be seen that the fitted PSD is in good accordance with the generated one. All the AR coefficients are approximately zero except the first one.

Table 2. Main data of the first simulation.

Record length $L$    FFT length $N$    Smoothing factor $\alpha$    $\mu$
$10^6$               1024              0.01                         0.9900

Figure 4. Identification of a random independent data loss: PSD of the data loss pattern, and the PSD of the model (upper plot). Coefficients of the identified AR system (lower plot).

The second simulation example deals with two-state Markov-based data loss. The parameters of the simulation are summarized in Table 3, where $L$ is the total length of the record in samples, $N$ is the FFT size, and $\alpha$ is the smoothing factor, as before. The constants $p$ and $q$ are the parameters of the Markov model. A 10th-order LPC model has been fitted as before. The identified model parameters are the following:

$$ \hat{\mu} = 0.9118, \qquad \hat{a} = 0.8872, \qquad \hat{p} = 0.9900, \qquad \hat{q} = 0.8971. \tag{22} $$

The original parameters of the model are $p$ and $q$, therefore the data availability rate $\mu$ is a derived quantity. The second LPC coefficient $a_1$ is non-zero, but the rest of the coefficients are close to zero. The main results of the identification, $\hat{p}$ and $\hat{q}$, are very close to the initial parameters given in Table 3. The PSD of the model has also been calculated. The upper plot of Figure 5 shows the PSD of the data loss pattern (green line) and the PSD of the model (blue line). The coefficients of the 10th-order AR system are depicted in the lower plot. Both the spectra and the LPC coefficients verify that the calculated first-order model is appropriate for the two-state Markov-based data loss.

Table 3. Main data of the second simulation.

Record length $L$    FFT length $N$    Smoothing factor $\alpha$    $p$       $q$
$10^6$               1024              0.01                         0.9900    0.9000

Figure 5. Identification of a Markov-model based data loss: PSD of the data loss pattern, and the PSD of the model (upper plot). Coefficients of the identified AR system (lower plot).

4.2. Measurement results

Measurements were carried out with the test system introduced in [13]. In this testbed, wireless sensors perform real-time data collection and transmit the data to a PC through a gateway node. In this measurement we used only one sensor. The data sent by the sensor are recorded and processed on the PC. Since data transmission and collection are performed in a hard real-time manner, there is no possibility of applying any acknowledgment mechanism for the indication and retransmission of lost packets, hence data loss is inevitable. The data loss is recognized by a time-out mechanism. The sampling frequency is $f_s = 1800$ Hz, and the sensor transmits data packets of $M = 25$ samples. If data loss occurs, it can be described by the block-based model.

In the first measurement setup, the sensor and the gateway were placed 4 m away from each other in a room, and the sensor was placed near a big metal wardrobe in order to degrade the radio transmission properties. The extensive metal surfaces of the wardrobe, without any special arrangement (e.g., connection to earth), produced data loss in the communication. Depending on the physical circumstances, data loss rates in the range of 0.1% to 30% could be detected. Here the analysis results for the 3.75% case are presented. The parameters of the measurement are summarized in Table 4, where $L_B$ is the length of the record in blocks, while $t_m$ is its duration. The parameter $N$ denotes the FFT size, and $\alpha$ is the smoothing factor.

Table 4. Main data of the first experiment.

Record length in blocks, $L_B$    Duration of the record, $t_m$    FFT length, $N$    Smoothing factor, $\alpha$
4638                              64.4 s                           1024               0.01

The result of the identification can be seen in Figure 6. The estimated data loss parameters are the following:

$$ \hat{\mu} = 0.9625, \qquad \hat{a} = 0.0201, \qquad \hat{p} = 0.9632, \qquad \hat{q} = 0.0569. \tag{23} $$

In order to check our assumptions, the estimators $\hat{p}$ and $\hat{q}$ of a two-state Markov model have also been calculated. The system is usually in the '1' state, and if it moves to '0', there is only a small probability that it also stays in '0' at the next time instant. The upper plot of Figure 6 shows the PSD of the measured data loss pattern (green line) and the PSD of the model (blue line). The coefficients of the 10th-order AR system are depicted in the lower plot. Both the graphical result and the estimated parameters imply that this radio communication suffers from random independent block-based data loss.

The second measurement aimed at investigating a different data loss mode. A mobile phone was placed next to the wireless sensor, and the WiFi function of the phone was activated by playing an on-line media stream. As both devices use the same 2.4 GHz radio band, the communication of the phone disturbs the wireless sensor.
The parameters of the measurement are summarized in Table 5, where $L_B$ is the length of the record in blocks, while $t_m$ is its duration. The parameter $N$ denotes the FFT size, and $\alpha$ is the smoothing factor. The result of the identification can be seen in Figure 7. The estimated data loss parameters are the following:

$$ \hat{\mu} = 0.9533, \qquad \hat{a} = 0.2529, \qquad \hat{p} = 0.9651, \qquad \hat{q} = 0.2878. \tag{24} $$

As $a_1 \neq 0$, the data loss introduced by the WiFi function of the mobile phone cannot be random independent. Nevertheless, a two-state Markov-based model can be well fitted to this data loss pattern, as the rest of the coefficients are sufficiently small ($a_2, \dots \approx 0$). The PSD of the model has also been calculated. The upper plot of Figure 7 shows the PSD of the measured data loss pattern (green line) and the PSD of the model (blue line). The coefficients of the 10th-order AR system are depicted in the lower plot.

Table 5. Main data of the second experiment.

Record length in blocks, $L_B$    Duration of the record, $t_m$    FFT length, $N$    Smoothing factor, $\alpha$
25988                             361 s                            1024               0.01

Figure 6. Identification results of the first experiment: PSD of the measured indicator function, and the PSD of the model (upper plot). Coefficients of the identified AR system (lower plot).

Figure 7. Identification results of the second experiment: PSD of the measured indicator function, and the PSD of the model (upper plot). Coefficients of the identified AR system (lower plot).

The last example demonstrates yet another lossy measurement. The wireless sensor was carried by a person walking in the building, resulting in an occasionally very large distance from the base station. The parameters of the measurement are summarized in Table 6, where the parameters $L_B$, $t_m$, $N$, and $\alpha$ have the same meaning as before. The result of the identification can be seen in Figure 8. Based on the PSD of the data loss pattern, we have tried to fit a two-state Markov model. The estimated data loss parameters are the following:

$$ \hat{\mu} = 0.8829, \qquad \hat{a} = 0.5915, \qquad \hat{p} = 0.9522, \qquad \hat{q} = 0.6393. \tag{25} $$

Table 6. Main data of the third experiment.

Record length in blocks, $L_B$    Duration of the record, $t_m$    FFT length, $N$    Smoothing factor, $\alpha$
10200                             142 s                            1024               0.01

Figure 8. Identification results of the third experiment: PSD of the measured indicator function, and the PSD of the model (upper plot). Coefficients of the identified AR system (lower plot).

The PSD of the model has also been calculated. The upper plot of Figure 8 shows the PSD of the measured data loss pattern (green line) and the PSD of the model (blue line). The coefficients of the 10th-order AR system are depicted in the lower plot. In this case the PSD of the model does not fit the measured PSD well. It can also be observed that the higher-order LPC parameters are non-zero as well. Finally, we have found that a third-order Markov model describes the data loss well. The estimated LPC parameters are the following:

$$ \mathbf{a} = [1, -0.6031, -0.2235, -0.1483]. \tag{26} $$

The PSD of this model has also been calculated. Figure 9 shows the PSD of the measured data loss pattern (green line) and the PSD of the model (blue line). Here the graphical representation of the spectral coefficients has been omitted. This model fits the measured spectrum, so the identification is successful.
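Given the LPC vector in (26) and the availability rate in (25), formula (16) can be evaluated for every combination of the last three samples. The following is a minimal sketch under the assumption that the sign relation (20) carries over to all coefficients, i.e., that the $a_m$ of (15) are the negated LPC coefficients; the names `prob_available` and `table` are illustrative.

```python
from itertools import product

mu_hat = 0.8829                              # availability rate reported in (25)
lpc = [1.0, -0.6031, -0.2235, -0.1483]       # LPC vector reported in (26)
a = [-c for c in lpc[1:]]                    # assumed: a_m = -(LPC coefficient), cf. (20)

def prob_available(past):
    """P{K_n = 1 | K_{n-1}, ..., K_{n-r}} according to (16)."""
    return mu_hat + sum(a_m * (k_prev - mu_hat) for a_m, k_prev in zip(a, past))

# One entry per history (K_{n-1}, K_{n-2}, K_{n-3})
table = {past: prob_available(past) for past in product((0, 1), repeat=len(a))}

for past, prob in table.items():
    print(past, f"P(K_n=1) = {prob:.3f}", f"P(K_n=0) = {1 - prob:.3f}")
```

Each history yields two complementary probabilities, which are the two non-zero entries of the corresponding column of the transition matrix; all other entries of that column belong to impossible transitions.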
Now the state transition probabilities can be calculated using formula (16):

$$ \mathbf{P} = \begin{bmatrix}
0.978 & 0 & 0.83 & 0 & 0 & 0 & 0 & 0 \\
0.022 & 0 & 0.17 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.75 & 0 & 0.61 & 0 \\
0 & 0 & 0 & 0 & 0.25 & 0 & 0.39 & 0 \\
0 & 0.37 & 0 & 0.23 & 0 & 0 & 0 & 0 \\
0 & 0.63 & 0 & 0.23 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.15 & 0 & 0 & 0.003 \\
0 & 0 & 0 & 0 & 0.85 & 0 & 0 & 0.997
\end{bmatrix}. \tag{27} $$

The probability matrix $\mathbf{P}$ has a size of 8 by 8, as the model has $2^3 = 8$ states. Note that the zero probabilities belong to impossible state transitions. The probabilities $P_{11} = 0.978$ and $P_{88} = 0.997$ stand out. They can be interpreted as the probabilities that the system remains in the 'data loss' and the 'data available' state, respectively. During the walk the distance between the sensor node and the PC continuously increased and decreased, resulting in continuously increasing and decreasing data loss. This means that the data loss is not stationary, which explains the necessity of the higher-order Markov model.

5. CONCLUSIONS

With the spread of sensor networks and Internet-based technology, the analysis of measurement data loss has recently gained importance. The investigation of FFT-based PSD estimation in the case of data loss has revealed the exact relation between spectral leakage and some data loss models. This paper introduced the inverse procedure: the data loss model can be identified by the Fourier transform of the so-called data availability indicator function. The identification procedure has been elaborated for random independent, random block-based, and Markov model-based data loss. Frequency domain identification also supports the verification process, as model fitting and checking can be accomplished simultaneously. The use of the Fourier transform is the main novelty of our approach. With it, not only can the parameters of the data loss model be calculated, but the model order can be determined as well. The method has been intensively tested by simulations and measurements.
Based on this experience, the proposed procedure is a promising solution for data loss model identification. Further research is required if the data availability function is not stationary or is not directly available.

REFERENCES

[1] L. Kong et al., “Data Loss and Reconstruction in Wireless Sensor Networks”, Proc. IEEE Infocom 2013, Turin, Apr. 14-19, 2013, pp. 1654-1662.
[2] M. Mathiesen, G. Thonet, N. Aakwaag, “Wireless ad-hoc networks for industrial automation: current trends and future prospects”, Proc. of the IFAC World Congr., Prague, Czech Republic, July 4-8, 2005, pp. 89-100.
[3] M. Yajnik et al., “Measurement and modelling of the temporal dependence in packet loss”, Proc. IEEE Infocom 1999, pp. 345-352.
[4] M. Ellis et al., “A two-level Markov model for packet loss in UDP/IP-based real-time video applications targeting residential users”, Computer Networks 70 (2014), pp. 384-399.
[5] J. Rachwalski, “Analysis of packet loss pattern for concatenated transmission channels using burst ratio parameter”, PhD Dissertation, AGH University of Science and Technology, 2016.
[6] H. Sanneck, G. Carle, R. Koodli, “Framework model for packet loss metrics based on loss runlengths”, Proc. SPIE Int. Soc. Opt. Eng. 3969 (2000).
[7] G. Sinopoli et al., “Kalman Filtering with Intermittent Observations”, IEEE Trans. Autom. Control 49 (2004), pp. 1453-1464.
[8] L. Sujbert, Gy. Orosz, “FFT-based Spectrum Analysis in the Case of Data Loss”, IEEE Trans. Instrum. Meas. 65 (2016), pp. 968-976.
[9] J. S. Bendat, A. G. Piersol, Random Data: Analysis and Measurement Procedures, John Wiley and Sons, Inc., New York, London, Sydney, Toronto, 1971.
[10] T. Nagayama, B. F. Spencer, G. Agha, K. Mechitov, “Model-based Data Aggregation for Structural Monitoring Employing Smart Sensors”, 3rd Int. Conf. on Networked Sensing Systems (INSS).
[11] O. Hohlfeld, R. Geib, G. Hasslinger, “Packet Loss in Real-Time Services: Markovian Models Generating QoE Impairments”, 16th Int. Workshop on Quality of Service, Enschede, June 2008, pp. 239-248.
[12] P. Boufounos, “Generating Binary Processes with All-pole Spectra”, IEEE Int. Conf. Acoustics, Speech and Signal Processing, Honolulu, HI, vol. 3, Apr. 15-20, 2007, pp. 981-984.
[13] Gy. Orosz, L. Sujbert, G. Péceli, “Testbed for wireless adaptive signal processing systems”, Proc. IEEE Instrum. and Meas. Conf., Warsaw, Poland, May 1-3, 2007, pp. 123-128.
[14] L. B. Jackson, Digital Filters and Signal Processing, 2nd ed., Kluwer Academic Publishers, 1989.
[15] MATLAB (2010), www.mathworks.com

Figure 9. Identification of a third order Markov-model in the third experiment: PSD of the measured indicator function, and the PSD of the model.