Papers in Physics, vol. 13, art. 130001 (2021)
Received: 23 September 2020, Accepted: 11 January 2021
Edited by: D. H. Zanette
Licence: Creative Commons Attribution 4.0
DOI: https://doi.org/10.4279/PIP.130001
www.papersinphysics.org
ISSN 1852-4249

A method for continuous-range sequence analysis with Jensen-Shannon divergence

M. A. Ré1, 2*, G. G. Aguirre Varela2, 3†

* re@famaf.unc.edu.ar
† guiava@gmail.com
1 Centro de Investigación en Informática para la Ingeniería, Universidad Tecnológica Nacional, Facultad Regional Córdoba, Maestro López esq. Cruz Roja Argentina, (5016) Córdoba, Argentina.
2 GFA - Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, Av. Medina Allende s/n, Ciudad Universitaria, (5000) Córdoba, Argentina.
3 Instituto de Física Enrique Gaviola (IFEG), Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, Ciudad Universitaria, (5000) Córdoba, Argentina.

Mutual Information (MI) is a useful Information Theory tool for the recognition of mutual dependence between data sets. Several methods have been developed for the estimation of MI when both data sets are of the discrete type or when both are of the continuous type. However, MI estimation between a discrete range data set and a continuous range data set has not received as much attention. We therefore present here a method for the estimation of MI for this case, based on the kernel density approximation. This calculation may be of interest in diverse contexts. Since MI is closely related to the Jensen-Shannon divergence, the method developed here is of particular interest in the problems of sequence segmentation and set comparisons.

I Introduction

Mutual Information (MI) is a quantity whose theoretical base originates in Information Theory [1]. Since MI between two independent random variables (RV) is zero, a non-null value of MI between these variables gives a measure of mutual dependence. When analyzing two data sets X and Y (assumed to be the realization of two mutually dependent RVs), MI can give us a measure of the mutual dependence of these sets. Although MI may be straightforwardly calculated when the underlying probability distributions are known, this is not usually the case when only the data sets are available. Therefore, MI must be estimated from the data sets themselves. When X and Y are of the discrete type, MI may be estimated by substituting the joint probability of these variables by the relative frequency of appearance of each pair (x,y) in the data sequence [2, 3]. For real value data sets (or the discrete type with a wide range), estimation of MI by frequency of appearance is not applicable. The binning method [4] in turn requires large bins or large sequences in order to produce reasonable results. Alternative proposals have been made for cases when both data sets are of the continuous type [5].

Estimation of MI between a discrete RV and a continuous one has not been so extensively considered, in spite of being a problem of interest in diverse situations. For instance, we could compare the day of the week (weekday-weekend, discrete) with traffic flow (continuous), quantifying this effect. In a different context we might wish to quantify the effect of a drug (administered or not, discrete) in medical treatment evaluation (electroencephalograms in epilepsy, continuous data). Ross [6] has proposed a scheme for estimating MI based on the nearest neighbour method [4]. Assuming a sequence of (x,y) pairs, with X being discrete and Y continuous, the nearest neighbour method requires the pairs to be ordered by the Y values. This requirement makes the proposal impractical, in sequence analysis for instance. The nearest neighbour method was also considered by Kraskov et al. [7].
In their paper they suggest two ways of evaluating MI with this method. An alternative definition for MI is presented by Gao et al. [8], also based on the distance between the elements of the sequence.

In this paper we propose a more direct method for estimating MI between a discrete and a continuous data set, based on the kernel density approximation (KDA) [4] for estimating the probability density function (PDF) of the continuous variable. For the discrete variable we make use of the usual frequency approximation [2, 3]. Finally, MI is computed by Monte Carlo integration.

As shown by Grosse et al. [2], MI can be identified with the Jensen-Shannon divergence (JSD), a measure of dissimilarity between two probability distributions. JSD is a non-negative functional that equals zero when the distributions being compared are the same. This property makes JSD a useful tool for sequence segmentation [2, 3]. Furthermore, in diverse contexts it is of interest to evaluate whether a given sequence matches a particular probability distribution. The most usual case is that of a normal distribution. Nevertheless, this is a more general problem. For instance, in satellite synthetic aperture radar (SAR) images the backscatter presents a multiplicative noise assumed to have an exponential distribution [9]. Also, models for cloud droplet spectra assume a Weibull distribution [10, 11].

Several indirect methods have been proposed for the analysis of continuous range sequences. Pereyra et al. [12] outlined a method based on the wavelet transform to analyze electroencephalograms. Recently, Mateos et al. [13] have proposed a mapping from continuous value sequences into discrete state sequences prior to the JSD calculation. Several other mapping methods have been proposed in the literature to associate a discrete probability distribution with a real value series.
Here, by means of the KDA we avoid resorting to any indirect method, approximating the probability densities of continuous range variables by this non-parametric method. In section II we present the calculation of MI and the arrangement for sequence segmentation with JSD. In section III we test the performance of this method through numerical experiments. We also consider the application of the method to edge detection in a satellite synthetic aperture radar (SAR) image. In section IV we discuss the results obtained.

II Method

In this section we present our proposal for estimating MI between discrete and continuous RVs, based on the KDA estimator of a PDF. Let us consider a sequence of pairs (x,y) with x as a variable of discrete range and y of continuous range. To calculate MI we resort only to the sequence itself, making use of no extra information. We start from the sequence of data pairs (x,y), and assume that these data are sampled from a joint probability density $\mu(x,y)$, although unknown at first. From the joint PDF the marginal probabilities

$$ p(x) = \int_{-\infty}^{\infty} dy\, \mu(x,y) \tag{1a} $$

$$ \phi(y) = \sum_{x} \mu(x,y) \tag{1b} $$

are defined. The MI between the RVs X and Y is expressed in terms of these PDFs as [1]

$$ I(X,Y) = \sum_{x} \int_{-\infty}^{\infty} dy\, \mu(x,y) \ln\left[\frac{\mu(x,y)}{p(x)\,\phi(y)}\right]. \tag{2} $$

Note that if the variables X and Y are statistically independent then $\mu(x,y) = p(x)\,\phi(y)$, and in this case $I(X,Y) = 0$. In this way a value of $I(X,Y) \neq 0$ gives a measure of the mutual dependence of these variables. We may rewrite $I(X,Y)$ in terms of the conditional PDFs

$$ \mu(y \mid x) = \frac{\mu(x,y)}{p(x)} \tag{3} $$

as

$$ I(X,Y) = \sum_{x} p(x) \int_{-\infty}^{\infty} dy\, \mu(y \mid x) \ln\left[\frac{\mu(y \mid x)}{\phi(y)}\right]. \tag{4} $$

Figure 1: Kernel Density Approximation (KDA) for the Probability Density in (10) calculated from 1000 pairs generated by the Monte Carlo method. For plot A $y_m = 1$, while for plot B $y_m = 5$. In both cases $\sigma_g = 1$. Solid lines correspond to the analytic function and dashed lines to the KDA.

i Kernel density approximation

To carry out the calculation in (4), knowledge of the conditional PDFs is necessary. As mentioned, these densities are assumed to be unknown, and have to be estimated from the data themselves. Here we make use of the KDA [4], as summarized in the following.

The conditional PDFs in Eq. (3) are estimated by considering separately the set of data pairs with a given value of x. We define the set

$$ C_\kappa = \{(x,y) \,/\, x = \kappa\} \tag{5} $$

and for each set we approximate the conditional densities using a KDA with a Gaussian kernel

$$ \hat{\mu}(y \mid x=\kappa) = \frac{1}{n_\kappa h_\kappa} \frac{1}{\sqrt{2\pi}} \sum_{y_j \in C_\kappa} \exp\left[-\frac{(y - y_j)^2}{2 h_\kappa^2}\right]. \tag{6} $$

Note that the sum is over the $y_j$ values in the set $C_\kappa$, and $n_\kappa$ is the number of pairs in this set. The bandwidth, $h_\kappa$, is chosen as the optimal value, as reported by Silverman [4] and followed by Steuer et al. [5]

$$ h_\kappa \simeq 1.06\, s_\kappa\, n_\kappa^{-0.2} \tag{7} $$

where $s_\kappa^2$ is the variance of the sample. Sheather [14] considered alternative values to detect bimodality; however, as noted there, there is little visual difference. The marginal probability of X is approximated by the frequency of occurrence of each value

$$ \hat{p}(x=\kappa) = \frac{n_\kappa}{n} \tag{8} $$

and the marginal probability density of Y by

$$ \hat{\phi}(y) = \sum_{x} \hat{p}(x)\, \hat{\mu}(y \mid x). \tag{9} $$

We illustrate the results obtained with the KDA by an example: let us consider the joint probability distribution $\mu(x,y)$

$$ \mu(x=1, y) = \frac{1}{3} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{y^2}{2}\right] \tag{10a} $$

$$ \mu(x=2, y) = \frac{2}{3} \frac{1}{\sqrt{2\pi}\,\sigma_g} \exp\left[-\frac{(y - y_m)^2}{2\sigma_g^2}\right] \tag{10b} $$

and the corresponding marginal PDF

$$ \phi(y) = \frac{1}{3} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}} + \frac{2}{3} \frac{1}{\sqrt{2\pi}\,\sigma_g}\, e^{-\frac{(y - y_m)^2}{2\sigma_g^2}}. \tag{11} $$

We sampled 1000 pairs from this distribution for two different values of $y_m$, and from these pairs we made an estimation of the conditional PDFs using the KDA. In Fig. 1A and 1B we plot the probability functions in (10) and (11) for two values of $y_m$ and the corresponding approximations.

$$ \underbrace{v_1 v_2 \ldots v_{n_1-1} v_{n_1}}_{n_1\ \text{values}}\ \underbrace{v_{n_1+1} \ldots v_{n_1+n_2}}_{n_2\ \text{values}} $$

Figure 2: The segmentation problem. Consider a sequence S made up of two stationary subsequences $S_1$ and $S_2$, with $n_1$ and $n_2$ elements respectively. The problem consists in determining the value of $n_1$; i.e., the point where the statistical properties change.

ii Monte Carlo integration

After approximating the PDFs we have to compute the integrals in (4) to estimate MI. We recognize in these integrals the expectation value

$$ \left\langle \ln \frac{\mu(y \mid x)}{\phi(y)} \right\rangle = \int_{-\infty}^{\infty} dy\, \mu(y \mid x) \ln\left[\frac{\mu(y \mid x)}{\phi(y)}\right] \tag{12} $$

that can be estimated by Monte Carlo integration [15]

$$ \left\langle \ln \frac{\mu(y \mid \kappa)}{\phi(y)} \right\rangle \simeq \frac{1}{n_\kappa} \sum_{y_j \in C_\kappa} \ln\left[\frac{\hat{\mu}(y_j \mid x=\kappa)}{\hat{\phi}(y_j)}\right]. \tag{13} $$

Here the sum is again restricted to the $y_j$ values associated with a particular x value. Note that in this sum we make use of the kernel approximation of the conditional PDFs in (6). Substituting the approximations (8) and (13) in (4), the factors $n_\kappa$ cancel and we finally get

$$ \hat{I}(X,Y) \simeq \frac{1}{n} \sum_{\kappa} \sum_{y_j \in C_\kappa} \ln\left[\frac{\hat{\mu}(y_j \mid x=\kappa)}{\hat{\phi}(y_j)}\right]. \tag{14} $$

iii Sequence segmentation

The JSD is a measure of dissimilarity between probability distributions. Originally proposed by Burbea and Rao [16] and Lin [17] as a symmetrized version of the Kullback-Leibler divergence [1, 18], a generalized weighted JSD between two PDFs, $f_1, f_2$, is defined as

$$ D[f_1, f_2] = H(\pi_1 f_1 + \pi_2 f_2) - \pi_1 H(f_1) - \pi_2 H(f_2) \tag{15} $$

with $\pi_i$ arbitrary weights satisfying $\pi_1 + \pi_2 = 1$. Here H is the Gibbs-Shannon entropy, defined for continuous range variables as

$$ H(f_i) = -\int_{-\infty}^{\infty} dy\, f_i(y) \ln\left[f_i(y)\right]. \tag{16} $$

As shown by Grosse et al. [2], JSD may be interpreted as MI between a discrete and a continuous variable by identifying the weights $\pi_i$ with the discrete variable probability in (1a):

$$ \pi_i = p(x = i) \tag{17} $$

and the probability densities $f_i(y)$ with the conditional densities in (3)

$$ f_i(y) = \mu(y \mid x = i). \tag{18} $$

With these identifications, the functionals in (15) and (4) are the same.
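To make the chain of approximations concrete, Eqs. (6)-(9) and (14) could be sketched in Python as follows, using only the standard library. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`bandwidth`, `gaussian_kde`, `mutual_information`, `sample_pairs`) are hypothetical, $s_\kappa$ is taken as the sample standard deviation with the $n-1$ normalisation, and every discrete value is assumed to occur at least twice with distinct $y$ values so that each bandwidth is positive.

```python
import math
import random
import statistics
from collections import defaultdict


def bandwidth(ys):
    """Rule-of-thumb bandwidth of Eq. (7): h = 1.06 * s * n**(-0.2)."""
    # Assumption: s is the sample standard deviation (n - 1 normalisation).
    return 1.06 * statistics.stdev(ys) * len(ys) ** -0.2


def gaussian_kde(y, ys, h):
    """Gaussian-kernel density estimate of Eq. (6), evaluated at y."""
    norm = len(ys) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((y - yj) / h) ** 2) for yj in ys) / norm


def mutual_information(xs, ys):
    """Estimate I(X, Y) in nats following Eqs. (8), (9) and (14)."""
    groups = defaultdict(list)  # C_kappa: y values grouped by their x value
    for x, y in zip(xs, ys):
        groups[x].append(y)
    n = len(ys)
    h = {k: bandwidth(g) for k, g in groups.items()}
    total = 0.0
    for k in groups:
        for yj in groups[k]:
            # Conditional densities of every class at the sample point yj.
            dens = {q: gaussian_kde(yj, g, h[q]) for q, g in groups.items()}
            # Marginal density of Eq. (9), weighted by the frequencies n_q / n.
            marginal = sum(len(groups[q]) / n * d for q, d in dens.items())
            total += math.log(dens[k] / marginal)
    return total / n


def sample_pairs(rng, n, y_m, sigma_g):
    """Draw n (x, y) pairs from the two-Gaussian distribution of Eq. (10)."""
    xs, ys = [], []
    for _ in range(n):
        if rng.random() < 1.0 / 3.0:
            xs.append(1)
            ys.append(rng.gauss(0.0, 1.0))
        else:
            xs.append(2)
            ys.append(rng.gauss(y_m, sigma_g))
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = sample_pairs(rng, 500, y_m=5.0, sigma_g=1.0)
    print("dependent pairs:  ", mutual_information(xs, ys))
    ys_ind = [rng.gauss(0.0, 1.0) for _ in xs]
    print("independent pairs:", mutual_information(xs, ys_ind))
```

With the identifications in (17) and (18), calling `mutual_information` with labels 1 and 2 marking two continuous samples gives an estimate of their JSD with weights $n_i/n$. The double loop requires of the order of $n^2$ kernel evaluations, which remains manageable for sequences of about 1000 pairs.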
The JSD and several generalizations have been successfully applied to the sequence segmentation problem, the partition of a non-stationary sequence into stationary subsequences, for discrete range sequences. We propose here the extension of this method to continuous range sequences without resorting to discrete mapping, wavelet decomposition or any other indirect method of estimation of the probability distribution.

$$ v_1 v_2 \ldots \underbrace{\ldots}_{\nu_1}\underbrace{\ldots}_{\nu_1}\ v_{n_1-1} v_{n_1} v_{n_1+1} \ldots v_{n_1+n_2} $$

Figure 3: The sliding window method. A sliding window is defined for sequence segmentation. The window is divided into two subwindows of equal size. The center of the window is considered as the window position. The window is displaced along the sequence and the JSD between the subwindows is calculated. The segmentation point is identified as the window position at which JSD has its maximum value.

Figure 4: Mutual information estimation for the joint distribution in (10). The dots represent the average MI value for 100 data sets of 1000 (x,y) pairs each, with the bars indicating the standard deviation of each set. The black line is the analytical value of MI: A) as a function of the mean value $y_m$ in (10b) (the inset shows the distribution of MI for a particular value of $y_m$ for a dependent and an independent set), and B) changing $\sigma_g$, the standard deviation in (10b). The inset shows the same plot but in log-log scale to highlight the MI value for independent sets.

The procedure for sequence segmentation may be stated in the following way: let us consider a sequence S with n elements made of two stationary subsequences $S_1$ and $S_2$, with $n_1$ and $n_2$ values respectively ($n_1 + n_2 = n$), schematically illustrated in Fig. 2. The aim is to determine the value of $n_1$; i.e., the position of the last element in $S_1$.
In the algorithm proposed here we define a sliding window of fixed width over the sequence. The window is divided into two segments, each including $\nu_1$ elements (see Fig. 3). We define the window position as that of the last element in the left section of the window. This window is displaced over the sequence and the window position where JSD reaches its maximum value is taken as the segmentation point.

III Assessment results

In this section we present the results of our assessment of the proposed method by considering two applications: the detection of mutual dependence between two RV sequences and the segmentation of a sequence. In the first case we generate sequences of two jointly distributed variables, one of discrete range and one of continuous range, and then we compute MI between these variables. In the second case we consider sequences made of two subsequences generated from different distributions. We detect the segmentation point following the procedure described in the previous section. We also apply the method to detect the edges between homogeneous regions in SAR images.

i Mutual information between a discrete and a continuous variable

We computed the MI between discrete and continuous variables. We generated 100 data sets, sampling 1000 (x,y) pairs from the distribution in (10) with different values of $y_m$ or $\sigma_g$, and from the joint distribution

$$ \mu(x=1, y) = \frac{1}{3}\left[\Theta(y + 0.5) - \Theta(y - 0.5)\right] \tag{19a} $$

$$ \mu(x=2, y) = \frac{2}{3}\,\frac{1}{a}\left[\Theta\left(y - y_m + \frac{a}{2}\right) - \Theta\left(y - y_m - \frac{a}{2}\right)\right] \tag{19b} $$

with $\Theta(y)$ the step function

$$ \Theta(y) = \begin{cases} 0 & \text{for } y < 0 \\ 1 & \text{for } y > 0 \end{cases} \tag{20} $$

with different values of $y_m$ or $a$. We estimated the MI, $I(X,Y)$, from each set by the method described in the previous section. Given that we are sampling the data pairs from known distributions, we are also able to calculate MI from the analytical expressions. In this way we may compare the results obtained from the approximation with the corresponding analytical results.
Figure 5: Mutual information estimation for the joint distribution in (19). The dots represent the average MI value, obtained with the Kernel Density Approximation (KDA), for 100 data sets of 1000 (x,y) pairs each, with the bars indicating the standard deviation. The black line is the analytical value of MI: A) as a function of the mean value $y_m$ in (19b), and B) changing the width parameter $a$ in (19b).

In addition, we calculated the MI for samples of statistically independent variables to establish a significance value for the MI of the dependent variables. The analytical value in this case is zero, as already mentioned.

The results of the calculation are shown in Fig. 4 for the distribution in (10) and in Fig. 5 for the distribution in (19). We include the average value of MI over the 100 data sets for the different values of the parameters, and the bars correspond to the standard deviation in each set. A small underestimation of the MI value can be seen in this last case. This may be attributed to a shortcoming of the KDA at the borders of the interval of the uniform distribution. Nevertheless, it is still possible to detect mutual dependence between the discrete and the continuous value sequences.

To consider the effect of sample size, we repeated the experiment with the distribution in (10) for different values of n, the number of data pairs in each set. We again generated 100 data sets of n data pairs each. The results are shown in Fig. 6 for three sets of parameters. A slightly increasing overestimation of MI can be seen as n decreases.

Figure 6: Mutual information estimation for the distribution in (10). The dots represent the average value for 100 data sets of different numbers of (x,y) pairs. Bars indicate the standard deviation, and dashed lines represent the analytical values of MI for the different sets of parameters.

Finally, we considered the usual situation in which only one sample of data pairs is available. We sampled 1000 pairs from the distribution in (10), the distribution in (19) and from the distribution

$$ \mu(x=1, y) = \frac{1}{3} \exp(-y)\, \Theta(y), \qquad \mu(x=2, y) = \frac{2}{3}\,\frac{1}{2} \exp(-y/2)\, \Theta(y). \tag{21} $$

For each sample we estimated MI by the approximate method in (14). To set up a significance value for each sample we generated 100 data sets of 1000 pairs of independent variables. The discrete values were sampled from the distribution

$$ p(x) = \frac{n_x}{1000} \tag{22} $$

where $n_x$ gives the number of times that the value x appears in the original sequence, and the continuous values were sampled from the Gaussian distribution

$$ \mu(y) = \frac{1}{\sqrt{2\pi}\, s} \exp\left[-\frac{(y - m)^2}{2 s^2}\right] \tag{23} $$

independently of the value of x. Here m is the mean value in the original sequence and $s^2$ the sample variance. We calculated the MI for each data set and then the MI mean value and its variance. The results are included in Table 1. A clear difference can be seen between the MI of the dependent values and those of the independent sequences.

Table 1: Mutual information and significance value. MI of the sampled dependent sequences (see text) and the corresponding significance values computed from the independent sets.

PDF           MI        Significance value
                        mean            st. dev.
Gaussian      0.6359    4.5 × 10^-3     1.8 × 10^-3
Uniform       0.1429    4.5 × 10^-3     1.8 × 10^-3
Exponential   0.0718    4.5 × 10^-3     1.9 × 10^-3

ii Sequence segmentation

To test the sequence segmentation method, we generated sets of 500 sequences of 500 values each, divided into two subsequences with 250 values in each one. The sequences were generated from Rayleigh distributions with a different mean value for each subsequence. The mean value of the first subsequence is denoted by $m_l$, and the mean value of the second subsequence by $m_r$; we define the ratio of the mean values as $r_m = m_l/m_r$.

Figure 7: Segmentation point in artificial sequences. The JSD average computed for 500 sequences generated from Rayleigh distributions. Each sequence has a length of 500 elements divided into two subsequences with 250 elements each. The ratio of the mean values of the subsequences is given by $r_m = m_l/m_r = 5$, where $m_l$ and $m_r$ are the mean values in the left and right subsequences, respectively. The sequences are analyzed with different window widths (ww). In all cases the window position (wp) of the maximum JSD average is at the segmentation point.

Figure 8: Segmentation point in artificial sequences. The JSD average computed for 200 sequences generated from Rayleigh distributions. Each sequence has a length of 500 elements divided into two subsequences with 250 elements each. Different values of the mean quotient $r_m = m_l/m_r$ are considered, where $m_l$ is the mean value of the left subsequence and $m_r$ the mean value of the right subsequence. In all cases a window width of 50 elements was used. Even for the lowest quotient value the window position (wp) of the maximum JSD average coincides with the segmentation point.

Using the sliding window method, we analyzed a set with $r_m = 5$ with several window widths. In Fig. 7 we present the average value across the 500 sequences of JSD at each window position for the different widths considered. The average JSD has a maximum value at position 250, the segmentation point, even for a narrow window with 20 elements (10 elements in each subwindow), although in this case statistical fluctuations are more noticeable. To test the sensitivity of the method we generated sets with $r_m = 1.2, 1.5, 2, 5, 10$. The results of the algorithm, with a window of 50 elements, are included in Fig. 8. Even for the smallest ratio considered, the segmentation point can be detected.
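The sliding window procedure of section II-iii, applied to a single Rayleigh sequence like those described above, might look as follows in Python. This is an illustrative sketch rather than the code used for Figs. 7 and 8: the names `kde`, `jsd`, `segment` and `rayleigh` are hypothetical, the two subwindows have equal size so the weights in (15) are both 1/2, and the Rayleigh samples are drawn by inverse-transform sampling with the scale set from the requested mean.

```python
import math
import random
import statistics


def kde(y, ys, h):
    """Gaussian-kernel density estimate of Eq. (6), evaluated at y."""
    norm = len(ys) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((y - yj) / h) ** 2) for yj in ys) / norm


def jsd(left, right):
    """JSD between two samples, computed as the MI estimate of Eq. (14)."""
    parts = [left, right]
    n = len(left) + len(right)
    # Bandwidths from Eq. (7); assumes each part has a non-zero spread.
    hs = [1.06 * statistics.stdev(p) * len(p) ** -0.2 for p in parts]
    total = 0.0
    for i, part in enumerate(parts):
        for yj in part:
            dens = [kde(yj, p, h) for p, h in zip(parts, hs)]
            mix = sum(len(p) / n * d for p, d in zip(parts, dens))
            total += math.log(dens[i] / mix)
    return total / n


def segment(seq, half_width):
    """Slide a window of 2 * half_width elements; return (position, JSD).

    The position is the 1-based index of the last element of the left
    subwindow at which the JSD between the subwindows is largest.
    """
    best_pos, best_val = None, -math.inf
    for pos in range(half_width, len(seq) - half_width + 1):
        val = jsd(seq[pos - half_width:pos], seq[pos:pos + half_width])
        if val > best_val:
            best_pos, best_val = pos, val
    return best_pos, best_val


def rayleigh(rng, mean, size):
    """Rayleigh samples with the given mean (scale = mean / sqrt(pi / 2))."""
    scale = mean / math.sqrt(math.pi / 2.0)
    return [scale * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
            for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Two stationary subsequences of 250 values each, with r_m = 5.
    seq = rayleigh(rng, 5.0, 250) + rayleigh(rng, 1.0, 250)
    print(segment(seq, half_width=25))
```

For a single sequence the maximum may fall a few positions away from the true segmentation point because of statistical fluctuations; the curves in Figs. 7 and 8 are averages over many sequences.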
Finally, we present an example of application of the segmentation algorithm to detect the edge between homogeneous regions in a SAR image. In SAR images the backscatter is affected by speckle noise (a multiplicative noise). This noise in the backscatter amplitude is modelled by a Rayleigh distribution in homogeneous regions. In Fig. 9 we include a section of the SAR image of an Antarctic region, and the boundary detected between water and ice. On the right, the backscatter amplitude along the highlighted lines of the image is plotted together with the JSD. The detected boundary coincides well with the contour in the image.

IV Discussion and conclusions

In this paper we have presented a method for computing Mutual Information (MI) between discrete and continuous data sets, or alternatively, the JSD between continuous range data sets. The algorithm developed gives a measure of dissimilarity without resorting to an indirect method like those proposed in [12, 13]. Neither is it necessary to have the continuous values ordered as in the nearest neighbour method [4, 6]. In fact, the calculation in (14) is based only on the data as they were recorded.

The measure may be applied to two similar problems. On the one hand we can quantify the mutual dependence between discrete and continuous data sets, and on the other hand we can quantify the dissimilarity between two continuous data sets, as discussed in section II. In section III we applied the method to artificially generated pairs of variables, finding good agreement with the corresponding analytical values as shown in Figs. 4 and 5, although systematic underestimation occurs mainly when the difference is given by the width in uniform distributions (Fig. 5B). We attribute this discrepancy to the abrupt decay of the uniform distribution at the borders of the interval, while the KDA with a Gaussian kernel extends to infinity.
The MI values in these cases of mutually dependent variables are clearly distinguishable from the MI values of independent variables. We also considered the dependence of the results of this method on the length of the sequence. In Fig. 6 a slightly increasing overestimation of MI is seen with decreasing length. Nevertheless, there is good agreement for sequences of more than 400 pairs.

In real situations we frequently have only one sequence of (X,Y) pairs. We have proposed a method for establishing a significance value by generating 100 sequences of independent variables with probability distributions given by the estimated marginal distribution for the discrete variable, and by a Gaussian distribution for the continuous variable with the same mean value and variance as the marginal distribution of the original sequence. We have considered sequences generated from three distributions. In all three cases MI establishes a clear difference between dependent and independent sets, as shown in Table 1.

It has been shown that the Jensen-Shannon divergence (JSD) is equivalent to MI [2]. Therefore, the calculation method developed here is also suitable for computing JSD between two continuous range data sets, and in this form the JSD may be applied to the sequence segmentation problem as proposed in section II-iii. In that section we suggested a method based on a fixed-length sliding window. We considered the segmentation of artificially generated sequences in section III-ii. The JSD average at each position in the sequences exhibits a maximum at the segmentation point, as shown in Fig. 7. As we continue this work we will address the problem of comparing and analyzing electrophysiological signals. The segmentation method may also be of interest in detecting borders in images. Work along these lines will be published elsewhere.
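The significance procedure summarised in this section for a single available sequence (Eqs. (22) and (23)) could be sketched in Python as below. The names `mutual_information` and `significance` are hypothetical and the code is only an illustration under stated assumptions: the sample standard deviation uses the $n-1$ normalisation, and each discrete value is assumed to appear at least twice in every surrogate set so that the bandwidth of Eq. (7) is defined.

```python
import math
import random
import statistics
from collections import Counter, defaultdict


def mutual_information(xs, ys):
    """Estimate I(X, Y) in nats with the KDA estimator of Eq. (14)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    n = len(ys)
    # Bandwidths of Eq. (7), one per discrete value.
    h = {k: 1.06 * statistics.stdev(g) * len(g) ** -0.2
         for k, g in groups.items()}

    def kde(y, k):
        g = groups[k]
        norm = len(g) * h[k] * math.sqrt(2.0 * math.pi)
        return sum(math.exp(-0.5 * ((y - yj) / h[k]) ** 2) for yj in g) / norm

    total = 0.0
    for k, g in groups.items():
        for yj in g:
            dens = {q: kde(yj, q) for q in groups}
            marginal = sum(len(groups[q]) / n * d for q, d in dens.items())
            total += math.log(dens[k] / marginal)
    return total / n


def significance(xs, ys, n_sets=100, seed=0):
    """Mean and st. dev. of MI over surrogate sets of independent pairs.

    Discrete values follow the observed frequencies (Eq. (22)); continuous
    values follow a Gaussian with the sample mean and variance (Eq. (23)).
    """
    rng = random.Random(seed)
    values, counts = zip(*Counter(xs).items())
    m, s = statistics.fmean(ys), statistics.stdev(ys)
    n = len(ys)
    estimates = []
    for _ in range(n_sets):
        sx = rng.choices(values, weights=counts, k=n)
        sy = [rng.gauss(m, s) for _ in range(n)]
        estimates.append(mutual_information(sx, sy))
    return statistics.fmean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    rng = random.Random(2)
    xs = [1 if rng.random() < 1.0 / 3.0 else 2 for _ in range(300)]
    ys = [rng.gauss(0.0, 1.0) if x == 1 else rng.gauss(5.0, 1.0) for x in xs]
    print("MI of the sequence:", mutual_information(xs, ys))
    print("significance (mean, st. dev.):", significance(xs, ys, n_sets=20))
```

An MI estimate well above the surrogate mean, measured in units of the surrogate standard deviation, then indicates dependence, in the spirit of Table 1.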
Acknowledgements - We wish to acknowledge partial support from SCyT - UTN through grant UTI4811 and from SeCyT - UNC through grant 30720150100199CB.

Figure 9: Border detection in SAR images. The segmentation method was applied to the detection of the border between homogeneous regions in a SAR image. The image was analyzed line by line and the segmentation point at each line detected. The segmentation points coincide with the border.

[1] T Cover, J Thomas, Elements of Information Theory, J. Wiley, New York (2006).
[2] I Grosse, P Bernaola-Galván, P Carpena, R Román-Roldán, J Oliver, H E Stanley, Analysis of symbolic sequences using the Jensen-Shannon divergence, Phys. Rev. E 65, 041905 (2002). https://doi.org/10.1103/PhysRevE.65.041905
[3] M A Ré, R K Azad, Generalization of entropy based divergence measures for symbolic sequence analysis, PLoS ONE 9, e93532 (2014). https://doi.org/10.1371/journal.pone.0093532
[4] B W Silverman, Density estimation for statistics and data analysis, Chapman and Hall, London (1986).
[5] R Steuer, J Kurths, C O Daub, J Weise, J Selbig, The mutual information: Detecting and evaluating dependencies between variables, Bioinformatics 18, S231 (2002). https://doi.org/10.1093/bioinformatics/18.suppl_2.S231
[6] B C Ross, Mutual Information between discrete and continuous data sets, PLoS ONE 9, e87357 (2014). https://doi.org/10.1371/journal.pone.0087357
[7] A Kraskov, H Stögbauer, P Grassberger, Estimating mutual information, Phys. Rev. E 69, 066138 (2004). https://doi.org/10.1103/PhysRevE.69.066138
[8] W Gao, S Kannan, S Oh, P Viswanath, Estimating mutual information for discrete-continuous mixtures, 31st Conference on Neural Information Processing Systems (NIPS), 5986 (2017). https://papers.nips.cc/paper/2017/hash/ef72d53990bc4805684c9b61fa64a102-Abstract.html
[9] A Moreira, P Prats-Iraola, M Younis, G Krieger, I Hajnsek, K P Papathanassiou, A tutorial on Synthetic Aperture Radar, IEEE Geosci. Remote S. Magazine 1, 6 (2013). DOI: 10.1109/MGRS.2013.2248301
[10] Y Liu, J Hallett, On size distributions of cloud droplets growing by condensation: a new conceptual model, J. Atmos. Sci. 55, 527 (1998). https://doi.org/10.1175/1520-0469(1998)055<0527:OSDOCD>2.0.CO;2
[11] Y Liu, P H Daum, J Hallett, A generalized systems theory for the effect of varying fluctuations on cloud droplet size distributions, J. Atmos. Sci. 59, 2279 (2002). https://doi.org/10.1175/1520-0469(2002)059<2279:AGSTFT>2.0.CO;2
[12] M E Pereyra, P W Lamberti, O A Rosso, Wavelet Jensen-Shannon divergence as a tool for studying the dynamics of frequency band components in EEG epileptic seizures, Phys. A 379, 122 (2007). https://doi.org/10.1016/j.physa.2006.12.051
[13] D M Mateos, L E Riveaud, P W Lamberti, Detecting dynamical changes in time series by using Jensen Shannon divergence, Chaos 27, 083118 (2017). https://aip.scitation.org/doi/abs/10.1063/1.4999613
[14] S J Sheather, Density estimation, Stat. Sci. 19, 588 (2004). https://www.jstor.org/stable/4144429
[15] A Papoulis, Probability, random variables and stochastic processes, McGraw-Hill, New York (1991).
[16] J Burbea, C R Rao, On the convexity of some divergence measures based on entropy functions, IEEE T. Inform. Theory 28, 489 (1982). DOI: 10.1109/TIT.1982.1056497
[17] J Lin, Divergence measures based on the Shannon entropy, IEEE T. Inform. Theory 37, 145 (1991). DOI: 10.1109/18.61115
[18] S Kullback, R A Leibler, On information and sufficiency, Ann. Math. Stat. 22, 79 (1951). https://www.jstor.org/stable/2236703