Papers in Physics, vol. 13, art. 130001 (2021)

Received: 23 September 2020, Accepted: 11 January 2021
Edited by: D. H. Zanette
Licence: Creative Commons Attribution 4.0
DOI: https://doi.org/10.4279/PIP.130001

www.papersinphysics.org

ISSN 1852-4249

A method for continuous-range sequence analysis with Jensen-Shannon
divergence

M. A. Ré1, 2*, G. G. Aguirre Varela2, 3†

Mutual Information (MI) is a useful Information Theory tool for the recognition of mutual
dependence between data sets. Several methods have been developed for the estimation of
MI when both data sets are of the discrete type or when both are of the continuous
type. However, MI estimation between a discrete range data set and a continuous range
data set has not received so much attention. We therefore present here a method for the
estimation of MI for this case, based on the kernel density approximation. This calculation
may be of interest in diverse contexts. Since MI is closely related to the Jensen Shannon
divergence, the method developed here is of particular interest in the problems of sequence
segmentation and set comparisons.

I Introduction

Mutual Information (MI) is a quantity whose theoretical
base originates in Information Theory [1].
Since MI between two independent random variables
(RV) is zero, a non-null value of MI between
these variables gives a measure of mutual dependence.
When analyzing two data sets X and Y
(assumed to be the realization of two mutually dependent
RVs) MI can give us a measure of the
mutual dependence of these sets. Although MI may
be straightforwardly calculated when the underlying
probability distributions are known, this is not
usually the case when only the data sets are available.
Therefore, MI must be estimated from the
data sets themselves. When X and Y are of the discrete
type, MI may be estimated by substituting
the joint probability of these variables by the relative
frequency of appearance of each pair (x,y) in
the data sequence [2, 3]. For real value data sets
(or the discrete type with a wide range) estimation
of MI by frequency of appearance is not applicable.
The binning method [4] in turn requires large
bins or large sequences in order to produce reasonable
results. Alternative proposals have been made
for cases when both data sets are of the continuous
type [5].

* re@famaf.unc.edu.ar
† guiava@gmail.com

1 Centro de Investigación en Informática para la Ingeniería,
Universidad Tecnológica Nacional, Facultad Regional
Córdoba, Maestro López esq. Cruz Roja Argentina,
(5016) Córdoba, Argentina.

2 GFA - Facultad de Matemática, Astronomía, Física
y Computación, Universidad Nacional de Córdoba,
Av. Medina Allende s/n, Ciudad Universitaria, (5000)
Córdoba, Argentina.

3 Instituto de Física Enrique Gaviola (IFEG), Facultad de
Matemática, Astronomía, Física y Computación, Universidad
Nacional de Córdoba, Ciudad Universitaria, (5000)
Córdoba, Argentina.

Estimation of MI between a discrete RV and a
continuous one has not been so extensively con-
sidered, in spite of being a problem of interest in
diverse situations. For instance, we could compare
the day of the week (weekday-weekend, discrete)
with traffic flow (continuous), quantifying this effect.
In a different context we might wish to quantify
the effect of a drug (administered or not, discrete)
in medical treatment evaluation (electroencephalograms
in epilepsy, continuous data). Ross [6]
has proposed a scheme for estimating MI based on
the nearest neighbour method [4]. Assuming a se-
quence of (x,y) pairs, with X being discrete and Y
continuous, the nearest neighbour method requires
the pairs to be ordered by the Y values. This re-
quirement makes the proposal impractical, in se-
quence analysis for instance. The nearest neigh-
bour method was also considered by Kraskov et al.
[7]. In their paper they suggest two ways of evaluat-
ing MI with this method. An alternative definition
for MI is presented by Gao et al. [8], also based on
the distance between the elements of the sequence.

In this paper we propose a more direct method
for estimating MI between a discrete and a con-
tinuous data set, based on the kernel density ap-
proximation (KDA)[4] for estimating the probabil-
ity density function (PDF) of the continuous vari-
able. For the discrete variable we make use of the
usual frequency approximation [2, 3]. Finally, MI is
computed by Monte Carlo integration.

As shown by Grosse et al. [2] MI can be iden-
tified with the Jensen Shannon Divergence (JSD),
a measure of dissimilarity between two probabil-
ity distributions. JSD is a non-negative functional
that equals zero when the distributions being com-
pared are the same. This property makes JSD a
useful tool for sequence segmentation [2, 3]. Fur-
thermore, in diverse contexts it is of interest to
evaluate whether a given sequence matches a par-
ticular probability distribution. The most usual
case is that of a normal distribution. Neverthe-
less, this is a more general problem. For instance,
in satellite synthetic aperture radar (SAR) im-
ages the backscatter presents a multiplicative noise
assumed to have an exponential distribution [9].
Also, models for cloud droplet spectra assume a
Weibull distribution [10,11]. Several indirect meth-
ods have been proposed for analysis of continuous
range sequences. Pereyra et al. [12] outlined a
method based on wavelet transform to analyze elec-
troencephalograms. Recently, Mateos et al. [13]
have proposed a mapping from continuous value
sequences into discrete state sequences previous to
JSD calculation. Several other mapping methods
have been proposed in the literature to associate a
discrete probability distribution with a real value
series.

Here, by means of the KDA we avoid resorting to
any indirect method, approximating the probabil-
ity densities of continuous range variables by this
non parametric method. In section II we present
the calculation of MI and the arrangement for se-
quence segmentation with JSD. In section III we
test the performance of this method through numerical
experiments. Also considered is application of
the method in edge detection in a satellite synthetic
aperture radar (SAR) image. In section IV we con-
sider the results obtained.

II Method

In this section we present our proposal for estimat-
ing MI between discrete and continuous RVs, based
on the KDA estimator of a PDF. Let us consider
a sequence of pairs (x,y) with x as a variable of
discrete range and y of continuous range. To calcu-
late MI we resort only to the sequence itself, making
use of no extra information. We start from the se-
quence of data pairs (x,y), and assume that these
data are sampled from a joint probability density
µ (x,y), although unknown at first. From the joint
PDF the marginal probabilities

$$p(x) = \int_{-\infty}^{\infty} dy\, \mu(x,y) \tag{1a}$$

$$\phi(y) = \sum_{x} \mu(x,y) \tag{1b}$$

are defined.
The MI between the RVs X and Y is expressed
in terms of these PDFs as [1]

$$I(X,Y) = \sum_{x} \int_{-\infty}^{\infty} dy\, \mu(x,y) \ln\left[\frac{\mu(x,y)}{p(x)\,\phi(y)}\right]. \tag{2}$$

Note that if the variables X and Y are statistically
independent then µ(x,y) = p(x) φ(y), and in this
case I(X,Y) = 0. In this way a value of I(X,Y) ≠ 0
gives a measure of the mutual dependence of these
variables. We may rewrite I(X,Y) in terms of the
conditional PDFs

$$\mu(y \mid x) = \frac{\mu(x,y)}{p(x)} \tag{3}$$

as

$$I(X,Y) = \sum_{x} p(x) \int_{-\infty}^{\infty} dy\, \mu(y \mid x) \ln\left[\frac{\mu(y \mid x)}{\phi(y)}\right]. \tag{4}$$


Figure 1: Kernel Density Approximation (KDA) for the Probability Density in (10) calculated from 1000 pairs
generated by the Monte Carlo method. For plot A ym = 1, while for plot B ym = 5. In both cases σg = 1. Solid
lines correspond to the analytic function and dashed lines to the KDA.

i Kernel density approximation

To carry out the calculation in (4), knowledge of
the conditional PDFs is necessary. As mentioned,
these densities are assumed to be unknown, and
have to be estimated from the data themselves.
Here we make use of the KDA [4], as summarized
in the following. The conditional PDFs in Eq. (3)
are estimated by considering separately the data
pairs that share a given value of x. We define the set

$$C_\kappa = \{(x,y) \mid x = \kappa\} \tag{5}$$

and for each set we approximate the conditional
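densities using a KDA with a Gaussian kernel

$$\hat{\mu}(y \mid x=\kappa) = \frac{1}{n_\kappa h_\kappa} \frac{1}{\sqrt{2\pi}} \sum_{y_j \in C_\kappa} \exp\left[-\frac{(y - y_j)^2}{2 h_\kappa^2}\right]. \tag{6}$$
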
Note that the sum is over the yj values in the set
Cκ, and nκ is the number of pairs in this set. The
bandwidth, hκ, is chosen as the optimal value, as
reported by [4] and followed by Steuer et al. [5]

$$h_\kappa \simeq 1.06\, s_\kappa\, n_\kappa^{-0.2} \tag{7}$$

where $s_\kappa^2$ is the variance of the sample. Sheather
[14] considered alternative values to detect bi-
modality; however, as they mention, there is little
visual difference.
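
As an illustration of Eqs. (6) and (7), the following minimal Python sketch evaluates the Gaussian-kernel estimate for a single class and compares it with the exact standard normal density at y = 0. The function names `bandwidth` and `kde`, the restriction to the standard library, and the use of the unbiased sample standard deviation for the sample spread are assumptions made for this example; they are not taken from the original text.

```python
import math
import random
import statistics


def bandwidth(ys):
    """Rule-of-thumb bandwidth of Eq. (7): h = 1.06 * s * n**(-0.2)."""
    # Assumption: s is the unbiased (n - 1) sample standard deviation.
    return 1.06 * statistics.stdev(ys) * len(ys) ** -0.2


def kde(y, ys, h):
    """Gaussian-kernel density estimate of Eq. (6), evaluated at y."""
    norm = 1.0 / (len(ys) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-((y - yj) ** 2) / (2.0 * h * h)) for yj in ys)


# Illustrative data: 1000 values drawn from a standard normal distribution.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]

h = bandwidth(sample)
estimate_at_zero = kde(0.0, sample, h)
exact_at_zero = 1.0 / math.sqrt(2.0 * math.pi)
print(f"h = {h:.3f}, KDA(0) = {estimate_at_zero:.3f}, exact = {exact_at_zero:.3f}")
```

Each evaluation costs one pass over the sample, so evaluating the estimate at all n sample points, as the method later requires, scales as n squared in this plain implementation.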

The marginal probability of X is approximated
by the frequency of occurrence of each value

$$\hat{p}(x = \kappa) = \frac{n_\kappa}{n} \tag{8}$$

and the marginal probability density of Y by

$$\hat{\phi}(y) = \sum_{x} \hat{p}(x)\, \hat{\mu}(y \mid x). \tag{9}$$

We illustrate the results obtained with the KDA
by an example: let us consider the joint probability
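distribution µ(x,y)

$$\mu(x=1,y) = \frac{1}{3} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{y^2}{2}\right] \tag{10a}$$

$$\mu(x=2,y) = \frac{2}{3} \frac{1}{\sqrt{2\pi}\,\sigma_g} \exp\left[-\frac{(y - y_m)^2}{2\sigma_g^2}\right] \tag{10b}$$

and the corresponding marginal PDF

$$\phi(y) = \frac{1}{3} \frac{1}{\sqrt{2\pi}} e^{-y^2/2} + \frac{2}{3} \frac{1}{\sqrt{2\pi}\,\sigma_g} e^{-(y - y_m)^2/(2\sigma_g^2)}. \tag{11}$$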

We sampled 1000 pairs from this distribution for
two different values of ym, and from these pairs we
made an estimation of the conditional PDFs using
the KDA. In Fig. 1A and 1B we plot the probability
functions in (10) and (11) for two values of ym and
the corresponding approximations.


[ v_1 v_2 . . . v_{n_1-1} v_{n_1} ]   [ v_{n_1+1} . . . v_{n_1+n_2} ]
          (n_1 values)                      (n_2 values)

Figure 2: The segmentation problem. Consider a se-
quence S made up of two stationary subsequences S1
and S2, with n1 and n2 elements respectively. The
problem consists in determining the value of n1; i.e.,
the point when the statistical properties change.

ii Monte Carlo integration

After approximating the PDFs we have to compute
the integrals in (4) to estimate MI. We recognize in
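these integrals the expectation value

$$\left\langle \ln\frac{\mu(y \mid x)}{\phi(y)} \right\rangle = \int_{-\infty}^{\infty} dy\, \mu(y \mid x) \ln\left[\frac{\mu(y \mid x)}{\phi(y)}\right] \tag{12}$$

that can be estimated by Monte Carlo integration [15]

$$\left\langle \ln\frac{\mu(y \mid \kappa)}{\phi(y)} \right\rangle \simeq \frac{1}{n_\kappa} \sum_{y_j \in C_\kappa} \ln\left[\frac{\hat{\mu}(y_j \mid x=\kappa)}{\hat{\phi}(y_j)}\right]. \tag{13}$$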

Here the sum is again restricted to the yj values
associated with a particular x value. Note that in
this sum we make use of the kernel approximation
of the conditional PDFs in (6). Substituting both
approximations we finally get

$$\hat{I}(X,Y) \simeq \frac{1}{n} \sum_{\kappa} \sum_{y_j \in C_\kappa} \ln\left[\frac{\hat{\mu}(y_j \mid x=\kappa)}{\hat{\phi}(y_j)}\right]. \tag{14}$$
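
The estimator in (14) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `make_kde`, `mutual_information` and `sample_eq10` are hypothetical, only the standard library is used, the unbiased sample standard deviation is used in the bandwidth of Eq. (7), and every class is assumed to contain at least two values. The test data are drawn from the distribution in (10), once with well-separated classes and once with identical conditional densities (independent variables).

```python
import math
import random
import statistics
from collections import defaultdict


def make_kde(ys):
    """Gaussian KDA of Eq. (6) for one class, with the bandwidth of Eq. (7)."""
    n = len(ys)
    h = 1.06 * statistics.stdev(ys) * n ** -0.2
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(math.exp(-((y - yj) ** 2) / (2.0 * h * h)) for yj in ys)

    return density


def mutual_information(pairs):
    """Estimate I(X,Y) in nats from a list of (x, y) pairs via Eq. (14)."""
    classes = defaultdict(list)
    for x, y in pairs:
        classes[x].append(y)
    n = len(pairs)
    prob = {k: len(ys) / n for k, ys in classes.items()}    # Eq. (8)
    cond = {k: make_kde(ys) for k, ys in classes.items()}   # Eq. (6)
    total = 0.0
    for k, ys in classes.items():
        for yj in ys:
            dens = {c: cond[c](yj) for c in classes}
            phi = sum(prob[c] * dens[c] for c in classes)    # Eq. (9)
            total += math.log(dens[k] / phi)
    return total / n


def sample_eq10(n, y_m, sigma_g, rng):
    """Draw n pairs from Eq. (10): x=1 with prob. 1/3, x=2 with prob. 2/3."""
    pairs = []
    for _ in range(n):
        if rng.random() < 1.0 / 3.0:
            pairs.append((1, rng.gauss(0.0, 1.0)))
        else:
            pairs.append((2, rng.gauss(y_m, sigma_g)))
    return pairs


rng = random.Random(1)
mi_dependent = mutual_information(sample_eq10(500, 5.0, 1.0, rng))
mi_independent = mutual_information(sample_eq10(500, 0.0, 1.0, rng))
print(f"dependent: {mi_dependent:.3f} nats, independent: {mi_independent:.3f} nats")
```

Because each y_j contributes to its own class density, the logarithm is always well defined; the cost is quadratic in the number of pairs, which is acceptable for sequences of about a thousand elements.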

iii Sequence segmentation

The JSD is a measure of dissimilarity between
probability distributions. Originally proposed by
Burbea and Rao [16] and Lin [17] as a symmetrized
version of Kullback-Leibler divergence [1, 18], a generalized
weighted JSD between two PDFs, f1 and f2, is
defined as

$$D[f_1, f_2] = H(\pi_1 f_1 + \pi_2 f_2) - \pi_1 H(f_1) - \pi_2 H(f_2) \tag{15}$$

with πi arbitrary weights satisfying π1 + π2 = 1.
Here H is the Gibbs-Shannon entropy, defined for continuous
range variables as

$$H(f_i) = -\int_{-\infty}^{\infty} dy\, f_i(y) \ln\left[f_i(y)\right]. \tag{16}$$

As shown by Grosse et al. [2] JSD may be inter-
preted as MI between a discrete and a continuous
variable by identifying the weights πi with the discrete
variable probability in (1a):

$$\pi_i = p(x = i) \tag{17}$$

and the probability densities fi(y) with the conditional
densities in (3)

$$f_i(y) = \mu(y \mid x = i). \tag{18}$$
With these identifications, the functionals in (15)
and (4) are the same.
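
The identification can be verified in a few lines. The following derivation is a sketch added for clarity; it uses only the definitions in (1b), (3), (4) and (15) to (18), with φ(y) = Σ_i π_i f_i(y) the marginal density of Y.

```latex
\begin{aligned}
D[f_1,f_2]
  &= H\Big(\sum_i \pi_i f_i\Big) - \sum_i \pi_i H(f_i) \\
  &= -\int_{-\infty}^{\infty} dy\, \phi(y) \ln \phi(y)
     + \sum_i \pi_i \int_{-\infty}^{\infty} dy\, f_i(y) \ln f_i(y),
     \qquad \phi(y) = \sum_i \pi_i f_i(y) \\
  &= \sum_i \pi_i \int_{-\infty}^{\infty} dy\, f_i(y)
     \left[ \ln f_i(y) - \ln \phi(y) \right] \\
  &= \sum_x p(x) \int_{-\infty}^{\infty} dy\, \mu(y \mid x)
     \ln\left[ \frac{\mu(y \mid x)}{\phi(y)} \right] = I(X,Y).
\end{aligned}
```

The third line follows by writing φ(y) inside the first integral as the weighted sum of the f_i, so that both terms share the same integration measure.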

The JSD and several generalizations have been
successfully applied to the sequence segmentation
problem, the partition of a non-stationary sequence
into stationary subsequences, for discrete range se-
quences. We propose here the extension of this
method to continuous range sequences without re-
sorting to discrete mapping, wavelet decomposition
or any other indirect method of estimation of the
probability distribution.

v_1 v_2 [ . . . ] [ . . . ] v_{n_1-1} v_{n_1} v_{n_1+1} . . . v_{n_1+n_2}
          (ν_1)     (ν_1)

Figure 3: The sliding window method. A sliding window
is defined for sequence segmentation. The window
is divided into two subwindows of equal size. The center
of the window is considered as the window position.
The window is displaced along the sequence and the
JSD between the subwindows is calculated. The segmentation
point is identified as the window position at
which JSD has its maximum value.

Figure 4: Mutual information estimation for the joint distribution in (10). For the distribution in (10) the dots
represent the average MI value for 100 data sets of 1000 (x,y) pairs each, with the bars indicating the standard
deviation of each set. The black line is the analytical value of MI: A) as a function of the mean value ym in
(10b) (the inset shows the distribution of MI for a particular value of ym for a dependent and an independent
set), and B) changing σg, the standard deviation in (10b). The inset shows the same plot but in log-log scale to
highlight the MI value for independent sets.

The procedure for sequence segmentation may
be stated in the following way: let us consider a sequence
S with n elements made of two stationary
subsequences S1 and S2, with n1 and n2 values respectively
(n1 + n2 = n), schematically illustrated
in Fig. 2. The aim is to determine the value of n1;
i.e., the position of the last element in S1. In the
algorithm proposed here we define a sliding window
of fixed width over the sequence. The window is divided
into two segments, each including ν1 elements
(see Fig. 3). We define the window position as that
of the last element in the left section of the window.
This window is displaced over the sequence and the
window position where JSD reaches its maximum
value is taken as the segmentation point.
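
The sliding window procedure can be sketched as follows. This is an illustrative Python example under stated assumptions rather than the authors' code: the names `make_kde`, `jsd`, `segmentation_point` and `rayleigh` are hypothetical, the JSD between the two subwindows is estimated through (14) with weights equal to the relative subwindow sizes (1/2 each here), and the test sequence is an artificial one made of two Rayleigh-distributed halves with mean values 5 and 1.

```python
import math
import random
import statistics


def make_kde(ys):
    """Gaussian KDA of Eq. (6) with the bandwidth of Eq. (7)."""
    n = len(ys)
    h = 1.06 * statistics.stdev(ys) * n ** -0.2
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return lambda y: norm * sum(math.exp(-((y - yj) ** 2) / (2.0 * h * h)) for yj in ys)


def jsd(left, right):
    """JSD of Eq. (15) between two samples, estimated as the MI of Eq. (14)."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    f1, f2 = make_kde(left), make_kde(right)
    total = 0.0
    for label, ys in ((1, left), (2, right)):
        for y in ys:
            d1, d2 = f1(y), f2(y)
            mix = (n1 * d1 + n2 * d2) / n          # weights pi_i = n_i / n
            total += math.log((d1 if label == 1 else d2) / mix)
    return total / n


def segmentation_point(seq, nu):
    """Slide a window of 2*nu elements; return (position, JSD) at the maximum.

    The position is the number of elements to the left of the cut, i.e. the
    1-based index of the last element of the left subwindow.
    """
    best = (None, -math.inf)
    for pos in range(nu, len(seq) - nu + 1):
        value = jsd(seq[pos - nu:pos], seq[pos:pos + nu])
        if value > best[1]:
            best = (pos, value)
    return best


def rayleigh(mean, rng):
    """Inverse-transform sample of a Rayleigh variate with the given mean."""
    sigma = mean / math.sqrt(math.pi / 2.0)
    return sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))


rng = random.Random(2)
sequence = ([rayleigh(5.0, rng) for _ in range(250)]
            + [rayleigh(1.0, rng) for _ in range(250)])
position, value = segmentation_point(sequence, nu=25)
print(f"detected cut after element {position} (true value 250), JSD = {value:.3f}")
```

For a single sequence the detected position fluctuates around the true segmentation point; averaging the JSD profile over many sequences, as done in section III, smooths these fluctuations.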

III Assessment results

In this section we present the results of our assess-
ment of the proposed method by considering two
applications: the detection of mutual dependence
between two RV sequences and the segmentation
of a sequence.

In the first case we generate sequences of two
jointly distributed variables: one of discrete range
and one of continuous range, and then we compute
MI between these variables. In the second case
we consider sequences made of two subsequences
generated from different distributions. We detect
the segmentation point following the procedure de-
scribed in the previous section. We also apply the
method to detect the edges between homogeneous
regions in SAR images.

i Mutual information between a discrete
and a continuous variable

We computed the MI between discrete and contin-
uous variables. We generated 100 data sets, sam-
pling 1000 (x,y) pairs from the distribution in (10)
with different values of ym or σg, and from the joint
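distribution

$$\mu(x=1,y) = \frac{1}{3} \left[\Theta(y + 0.5) - \Theta(y - 0.5)\right] \tag{19a}$$

$$\mu(x=2,y) = \frac{2}{3} \frac{1}{a} \left[\Theta\left(y + y_m + \frac{a}{2}\right) - \Theta\left(y + y_m - \frac{a}{2}\right)\right] \tag{19b}$$

with Θ(y) the step function

$$\Theta(y) = \begin{cases} 0 & \text{for } y < 0 \\ 1 & \text{for } y > 0 \end{cases} \tag{20}$$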

with different values of ym or a. We estimated
the MI, I (X,Y ), from each set by the method de-
scribed in the previous section. Given that we are
sampling the data pairs from known distributions,
we are also able to calculate MI from the analyti-
cal expressions. In this way we may compare the
results obtained from the approximation with the
corresponding analytical results.
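
For the Gaussian case in (10), the analytical reference value can be obtained by evaluating (4) numerically. The sketch below is one possible way to do so, using a simple trapezoidal rule; the function names `normal_pdf` and `mi_eq10`, the integration limits and the number of steps are assumptions chosen for this illustration.

```python
import math


def normal_pdf(y, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return (math.exp(-((y - mean) ** 2) / (2.0 * sigma * sigma))
            / (math.sqrt(2.0 * math.pi) * sigma))


def mi_eq10(y_m, sigma_g, steps=20000):
    """Integrate Eq. (4) for the joint density of Eq. (10); result in nats."""
    weights = (1.0 / 3.0, 2.0 / 3.0)          # p(x=1), p(x=2)
    params = ((0.0, 1.0), (y_m, sigma_g))     # (mean, st. dev.) of mu(y|x)
    # Assumption: ten standard deviations beyond both means is wide enough.
    spread = 10.0 * max(1.0, sigma_g)
    lo, hi = min(0.0, y_m) - spread, max(0.0, y_m) + spread
    dy = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * dy
        trapezoid = 0.5 if i in (0, steps) else 1.0
        dens = [normal_pdf(y, m, s) for m, s in params]
        phi = sum(w * d for w, d in zip(weights, dens))   # Eq. (11)
        for w, d in zip(weights, dens):
            if d > 0.0:                                    # 0 * ln 0 -> 0
                total += trapezoid * w * d * (math.log(d) - math.log(phi)) * dy
    return total


# Entropy of the discrete variable, the upper bound of I(X,Y).
h_x = -(1.0 / 3.0) * math.log(1.0 / 3.0) - (2.0 / 3.0) * math.log(2.0 / 3.0)
for y_m in (0.0, 1.0, 5.0):
    print(f"y_m = {y_m}: I = {mi_eq10(y_m, 1.0):.4f} nats (H(X) = {h_x:.4f})")
```

For y_m = 0 and σ_g = 1 the two conditional densities coincide and the integral vanishes, while for well-separated densities the result approaches the entropy of the discrete variable.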


Figure 5: Mutual information estimation for the joint distribution in (19). For the distribution in (19) the dots
represent the average MI value for 100 data sets of 1000 (x,y) pairs each, with the bars indicating the standard
deviation. The black line is the analytical value of MI while the dots represent the Kernel Density Approximation
(KDA) values; A) as a function of mean value ym in (19b), and B) changing the width parameter a in (19b).

In addition, we calculated the MI for samples
of statistically independent variables to establish a
significance value for the MI of the dependent vari-
ables. The analytical value in this case is zero, as
already mentioned. The results of the calculation
are shown in Fig. 4 for the distribution in (10) and
in Fig. 5 for the distribution in (19), respectively.
We include the average value of MI over the 100
data sets for the different values of the parame-
ters, and the bars correspond to the standard de-
viation in each set. A small underestimation of the
MI value can be seen in this last case. This may
be attributed to a shortcoming of the KDA at the
borders of the interval of the uniform distribution.
Nevertheless, it is still possible to detect mutual de-
pendence between the discrete and the continuous
value sequences.

To consider the effect of sample size, we repeated
the experiment with the distribution in (10) for dif-
ferent values of n, the number of data pairs in each
set. We again generated 100 data sets of n data
pairs each. The results are shown in Fig. 6 for three
sets of parameters. A slightly increasing overestimation
of MI can be appreciated as n decreases. Finally,
we considered a usual situation in which there
is only one sample of data pairs available. We sampled
1000 pairs from the distribution in (10), the
distribution in (19) and from the distribution

$$\begin{aligned}
\mu(x=1,y) &= \frac{1}{3} \exp(-y)\, \Theta(y) \\
\mu(x=2,y) &= \frac{2}{3} \frac{1}{2} \exp(-y/2)\, \Theta(y).
\end{aligned} \tag{21}$$

Figure 6: Mutual information estimation for the distribution
in (10). For the distribution in (10) the dots
represent the average value for 100 data sets of different
numbers of (x,y) pairs. Bars indicate the standard deviation,
and dashed lines represent the analytical values
of MI for the different sets of parameters.

Figure 7: Segmentation point in artificial sequences.
The JSD average computed for 500 sequences generated
from Rayleigh distributions. Each sequence has a
length of 500 elements divided into two subsequences
with 250 elements each. The ratio of the mean values
of the subsequences is given by rm = ml/mr = 5,
where ml and mr are the mean values in the left and
right subsequences, respectively. The sequences are analyzed
with different window widths (ww). In all cases
the window position (wp) of the maximum JSD average
is at the segmentation point.

For each sample we estimated MI by the approximate
method in (14). To set up a significance value
for each sample we generated 100 data sets of 1000
pairs of independent variables. The discrete values
were sampled from the distribution

$$p(x) = \frac{n_x}{1000} \tag{22}$$

where nx gives the number of times that the value
x appears in the original sequence, and the continuous
values were sampled from the Gaussian distribution

$$\mu(y) = \frac{1}{\sqrt{2\pi}\, s} \exp\left[-\frac{(y - m)^2}{2 s^2}\right] \tag{23}$$

independently of the value of x. Here m is the mean
value in the original sequence and $s^2$ the sample
variance. We calculated the MI for each data set
and then the MI mean value and its variance. The
results are included in Table 1. A clear difference
can be seen between the MI of the dependent values
and those of the independent sequences.
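
The significance test just described can be sketched as follows. The example is a reduced illustration: it uses 200 pairs and 20 surrogate sets instead of the 1000 pairs and 100 sets used in the text, only to keep the pure-Python running time short. The names `make_kde`, `mutual_information`, `significance` and `sample_eq10` are hypothetical, and the observed sample is drawn from (10) with y_m = 5 and σ_g = 1, values chosen for this sketch.

```python
import math
import random
import statistics
from collections import Counter, defaultdict


def make_kde(ys):
    """Gaussian KDA of Eq. (6) with the bandwidth of Eq. (7)."""
    n = len(ys)
    h = 1.06 * statistics.stdev(ys) * n ** -0.2
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return lambda y: norm * sum(math.exp(-((y - yj) ** 2) / (2.0 * h * h)) for yj in ys)


def mutual_information(pairs):
    """Estimate I(X,Y) in nats from (x, y) pairs via Eq. (14)."""
    classes = defaultdict(list)
    for x, y in pairs:
        classes[x].append(y)
    n = len(pairs)
    prob = {k: len(ys) / n for k, ys in classes.items()}
    cond = {k: make_kde(ys) for k, ys in classes.items()}
    total = 0.0
    for k, ys in classes.items():
        for yj in ys:
            dens = {c: cond[c](yj) for c in classes}
            total += math.log(dens[k] / sum(prob[c] * dens[c] for c in classes))
    return total / n


def significance(pairs, n_sets, rng):
    """Mean and st. dev. of MI over surrogate sets of independent variables."""
    counts = Counter(x for x, _ in pairs)          # frequencies of Eq. (22)
    values, freqs = list(counts), list(counts.values())
    ys = [y for _, y in pairs]
    m, s = statistics.fmean(ys), statistics.stdev(ys)   # Gaussian of Eq. (23)
    mis = []
    for _ in range(n_sets):
        xs = rng.choices(values, weights=freqs, k=len(pairs))
        mis.append(mutual_information([(x, rng.gauss(m, s)) for x in xs]))
    return statistics.fmean(mis), statistics.stdev(mis)


def sample_eq10(n, y_m, sigma_g, rng):
    """Draw n pairs from Eq. (10): x=1 with prob. 1/3, x=2 with prob. 2/3."""
    return [(1, rng.gauss(0.0, 1.0)) if rng.random() < 1.0 / 3.0
            else (2, rng.gauss(y_m, sigma_g)) for _ in range(n)]


rng = random.Random(3)
observed_pairs = sample_eq10(200, 5.0, 1.0, rng)
observed = mutual_information(observed_pairs)
mean_sig, sd_sig = significance(observed_pairs, n_sets=20, rng=rng)
print(f"MI = {observed:.4f}, significance value = {mean_sig:.4f} +/- {sd_sig:.4f}")
```

An observed MI lying many standard deviations above the surrogate mean indicates mutual dependence, which is the comparison summarized in Table 1.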

ii Sequence segmentation

To test the sequence segmentation method, we generated
sets of 500 sequences of 500 values each, divided
into two subsequences with 250 values in each
one. The sequences were generated from Rayleigh
distributions with a different mean value for each
subsequence. The mean value of the first subsequence
is denoted by ml, and the mean value of
the second segment by mr; we define the ratio
of the mean values as rm = ml/mr. Using the
sliding window method, we analyzed a set with
rm = 5 with several window widths. In Fig. 7
we present the average value across the 500 sequences
of JSD at each window position for the
different widths considered. The average JSD has
a maximum value at position 250, the segmentation
point, even for a narrow window with 20 elements
(10 elements in each subwindow), although in this
case statistical fluctuations are more noticeable. To
test the sensitivity of the method we generated sets
with rm = 1.2, 1.5, 2, 5, 10. The results of the algorithm,
with a window of 50 elements, are included
in Fig. 8. Even for the smallest ratio considered,
the segmentation point can be detected.

Figure 8: Segmentation point in artificial sequences.
The JSD average computed for 200 sequences generated
from Rayleigh distributions. Each sequence has a
length of 500 elements divided into two subsequences
with 250 elements each. Different values of the mean
quotient rm = mr/ml are considered, where mr is the
mean value of the right subsequence and ml the mean
value of the left subsequence. In all cases a window
width of 50 elements was used. Even for the lowest
quotient value the window position (wp) of the maximum
JSD average is coincident with the segmentation
point.

Table 1: Mutual information and significance value.
MI of the sampled dependent sequences (see text) and
the corresponding significance values computed from
the independent sets.

PDF           MI        Significance value
                        mean          st. dev.
Gaussian      0.6359    4.5 × 10^-3   1.8 × 10^-3
Uniform       0.1429    4.5 × 10^-3   1.8 × 10^-3
Exponential   0.0718    4.5 × 10^-3   1.9 × 10^-3

Finally, we present an example of application of
the segmentation algorithm to detect the edge be-
tween homogeneous regions in a SAR image. In
SAR images the backscatter is affected by speckle
noise (a multiplicative noise). This noise in the
backscatter amplitude is modelled by a Rayleigh
distribution in homogeneous regions. In Fig. 9 we
include a section of the SAR image of an Antarctic
region, and the boundary detected between water
and ice. On the right a plot of the values of the
backscatter amplitude of the highlighted lines in
the image and the JSD is included. There is good
coincidence of the detected boundary with the con-
tour in the image.

IV Discussion and conclusions

In this paper we have presented a method for com-
puting Mutual Information (MI) between discrete
and continuous data sets, or alternatively, the JSD
between continuous range data sets. The algorithm
developed gives a measure of dissimilarity without
resorting to an indirect method like those proposed
in [12, 13]. Neither is it necessary to have the con-
tinuous values ordered as in the nearest neighbour
method [4, 6]. In fact, the calculation in (14) is
based only on the registered data as they were
recorded.

The measure may be applied to two similar prob-
lems. On the one hand we can quantify the mutual
dependence between discrete and continuous data
sets, and on the other hand we can quantify the
dissimilarity between two continuous data sets, as
discussed in section II. In section III we applied
the method to artificially-generated pairs of vari-
ables, finding good agreement with the correspond-
ing analytical values as shown in Figs. 4 and 5, al-
though systematic underestimation occurs mainly
when the difference is given by the width in uniform
distributions (Fig. 5B). We attribute this discrepancy
to the abrupt decay of the uniform distribution
at the borders of the interval, while the
KDA with a Gaussian kernel extends to infinity.
The MI values in these cases of mutually dependent
variables are clearly distinguishable from the MI
values of independent variables. We also considered
the dependence of the results of this method on the
length of the sequence. In Fig. 6 a slightly increasing
overestimation of MI is seen with decreasing
length. Nevertheless, there is good agreement for
sequences of more than 400 pairs.

In real situations we frequently have only one
sequence of (X,Y ) pairs. We have proposed a
method for establishing a significance value by gen-
erating 100 sequences of independent variables with
probability distributions given by the estimated
marginal distribution for the discrete variable, and
by a Gaussian distribution for the continuous vari-
able with the same mean value and variance as
the marginal distribution of the original sequence.
We have considered sequences generated from three
distributions. In all three cases MI establishes a
clear difference between dependent and indepen-
dent sets, as shown in Table 1.

It has been shown that the Jensen Shannon Di-
vergence (JSD) is equivalent to MI [2]. Therefore,
the calculation method developed here will also be
suitable for computing JSD between two contin-
uous range data sets, and in this format the JSD
may be applied to the sequence segmentation prob-
lem as proposed in section II-iii. In this section we
suggested a method based on a fixed-length sliding
window. We considered the segmentation of arti-
ficially generated sequences in section III-ii. The
JSD average at each position in the sequences ex-
hibits a maximum at the segmentation point, as
shown in Fig. 7. As we continue this work we
will address the problem of comparing and analyz-
ing electrophysiological signals. The segmentation
method may also be of interest in detecting borders
in images. Work along these lines will be published
elsewhere.

Acknowledgements - We wish to acknowledge
partial support from SCyT - UTN through grant
UTI4811 and from SeCyT - UNC through grant
30720150100199CB.


Figure 9: Border detection in SAR images. The segmentation method was applied to detection of the border
between homogeneous regions in a SAR image. The image was analyzed line by line and the segmentation point
at each line detected. The segmentation points are coincident with the border.

[1] T Cover, J Thomas, Elements of Information
Theory, J. Wiley, New York (2006).

[2] I Grosse, P Bernaola-Galván, P Carpena, R
Román-Roldán, J Oliver, H E Stanley, Analysis
of symbolic sequences using the Jensen-Shannon
divergence, Phys. Rev. E 65, 041905 (2002).
https://doi.org/10.1103/PhysRevE.65.041905

[3] M A Ré, R K Azad, Generalization of entropy
based divergence measures for symbolic sequence
analysis, PLoS ONE 9, e93532 (2014).
https://doi.org/10.1371/journal.pone.0093532

[4] B W Silverman, Density estimation for statistics
and data analysis, Chapman and Hall,
London (1986).

[5] R Steuer, J Kurths, C O Daub, J Weise,
J Selbig, The mutual information: Detecting
and evaluating dependencies between variables,
Bioinformatics 18, S231 (2002).
https://doi.org/10.1093/bioinformatics/18.suppl_2.S231

[6] B C Ross, Mutual Information between discrete
and continuous data sets, PLoS ONE 9,
e87357 (2014).
https://doi.org/10.1371/journal.pone.0087357

[7] A Kraskov, H Stögbauer, P Grassberger, Estimating
mutual information, Phys. Rev. E 69,
066138 (2004).
https://doi.org/10.1103/PhysRevE.69.066138

[8] W Gao, S Kannan, S Oh, P Viswanath,
Estimating mutual information for discrete-continuous
mixtures, 31st Conference on neural
information processing systems (NIPS),
5986 (2017).
https://papers.nips.cc/paper/2017/hash/ef72d53990bc4805684c9b61fa64a102-Abstract.html

[9] A Moreira, P Prats-Iraola, M Younis, G
Krieger, I Hajnsek, K P Papathanassiou, A
tutorial on Synthetic Aperture Radar, IEEE
Geosci. Remote S. Magazine 1, 6 (2013).
https://doi.org/10.1109/MGRS.2013.2248301

[10] Y Liu, J Hallett, On size distributions of cloud
droplets growing by condensation: a new conceptual
model, J. Atmos. Sci. 55, 527 (1998).
https://doi.org/10.1175/1520-0469(1998)055<0527:OSDOCD>2.0.CO;2

[11] Y Liu, P H Daum, J Hallett, A generalized
systems theory for the effect of varying fluctuations
on cloud droplet size distributions, J.
Atmos. Sci. 59, 2279 (2002).
https://doi.org/10.1175/1520-0469(2002)059<2279:AGSTFT>2.0.CO;2

[12] M E Pereyra, P W Lamberti, O A Rosso,
Wavelet Jensen-Shannon divergence as a tool
for studying the dynamics of frequency band
components in EEG epileptic seizures, Phys.
A 379, 122 (2007).
https://doi.org/10.1016/j.physa.2006.12.051

[13] D M Mateos, L E Riveaud, P W Lamberti,
Detecting dynamical changes in time series by
using Jensen Shannon divergence, Chaos 27,
083118 (2017).
https://aip.scitation.org/doi/abs/10.1063/1.4999613

[14] S J Sheather, Density estimation, Stat. Sci. 19,
588 (2004).
https://www.jstor.org/stable/4144429

[15] A Papoulis, Probability, random variables and
stochastic processes, McGraw-Hill, New York
(1991).

[16] J Burbea, C R Rao, On the convexity of some
divergence measures based on entropy functions,
IEEE T. Inform. Theory 28, 489 (1982).
https://doi.org/10.1109/TIT.1982.1056497

[17] J Lin, Divergence measures based on the Shannon
entropy, IEEE T. Inform. Theory 37, 145
(1991).
https://doi.org/10.1109/18.61115

[18] S Kullback, R A Leibler, On information and
sufficiency, Ann. Math. Stat. 22, 79 (1951).
https://www.jstor.org/stable/2236703
