Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844
Vol. IV (2009), No. 2, pp. 104-117

A Novel Fuzzy ARTMAP Architecture with Adaptive Feature Weights based on Onicescu's Informational Energy

Răzvan Andonie, Lucian Mircea Sasu, Angel Caţaron

Răzvan Andonie
Computer Science Department, Central Washington University, Ellensburg, USA
and Department of Electronics and Computers, Transylvania University of Braşov, Romania
E-mail: andonie@cwu.edu

Angel Caţaron
Department of Electronics and Computers, Transylvania University of Braşov, Romania
E-mail: cataron@vega.unitbv.ro

Lucian Mircea Sasu
Applied Informatics Department, Transylvania University of Braşov, Romania
E-mail: lmsasu@unitbv.ro

Abstract: Fuzzy ARTMAP with Relevance factor (FAMR) is a Fuzzy ARTMAP (FAM) neural architecture with the following property: each training pair has a relevance factor assigned to it, proportional to the importance of that pair during the learning phase. Using a relevance factor adds more flexibility to the training phase, allowing sample pairs to be ranked according to the confidence we have in the information source or in the pattern itself. We introduce a novel FAMR architecture: FAMR with Feature Weighting (FAMRFW). In the first stage, the training data features are weighted. In our experiments, we use a feature weighting method based on Onicescu's informational energy (IE). In the second stage, the obtained weights are used to improve FAMRFW training. The effect of this approach is that category dimensions in the direction of relevant features are decreased, whereas category dimensions in the direction of non-relevant features are increased. Experimental results, performed on several benchmarks, show that feature weighting can improve the classification performance of the general FAMR algorithm.

Keywords: Fuzzy ARTMAP, feature weighting, LVQ, Onicescu's informational energy.

1 Introduction

The FAM architecture is based upon the adaptive resonance theory (ART) developed by Carpenter and Grossberg [7]. FAM neural networks can analyze and classify noisy information with fuzzy logic, and can avoid the plasticity-stability dilemma of other neural architectures. The FAM paradigm is prolific and there are many variations of the initial model of Carpenter et al. [7]: ART-EMAP [9], dARTMAP [8], Boosted ARTMAP [27], Fuzzy ARTVar [12], Gaussian ARTMAP [28], PROBART [21], PFAM [20], Ordered FAM [11], and µARTMAP [14]. The FAM model has been incorporated in the MIT Lincoln Lab system for data mining of geospatial images because of its computational capabilities for incremental learning, fast stable learning, and visualization [25].

One way to improve the FAM algorithm is to generalize the distance measure between vectors [10]. Based on this principle, we introduced in previous work [2] a novel FAM architecture with distance measure generalization: FAM with Feature Weighting (FAMFW). Feature weighting is a feature importance ranking algorithm where weights, not only ranks, are obtained. In our approach, training data feature weights were first generated. Next, these weights were used by the FAMFW network, generalizing the distance measure. Potentially, any feature weighting method can be used, and this makes the FAMFW very general. Feature weighting can be achieved, for example, by LVQ-type methods.
Several such techniques have been introduced recently. These methods combine LVQ classification with feature weighting. In one of these approaches, RLVQ (Relevance LVQ), feature weights are determined in order to generalize the LVQ distance function [16]. A modification of the RLVQ model, GRLVQ (Generalized RLVQ), was proposed in [18]. The SRNG (Supervised Relevance Neural Gas) algorithm [17] combines the NG (Neural Gas) algorithm [22] and the GRLVQ. NG [22] is a neural model applied to the task of vector quantization by using a neighborhood cooperation scheme and a soft-max adaptation rule, similar to the Kohonen feature map.

In [1], we introduced the Energy Supervised Relevance Neural Gas (ESRNG) feature weighting algorithm. The ESRNG is based on the SRNG model. It maximizes Onicescu's IE as a criterion for computing the weights of the input features. The ESRNG is the feature weighting algorithm we used in [2], in combination with our FAMFW algorithm.

FAMR is a FAM incremental learning system introduced in our previous work [4]. During the learning phase, each sample pair is assigned a relevance factor proportional to the importance of that pair. The FAMR has been successfully applied to classification, probability estimation, and function approximation. In FAMR, the relevance factor of a training pair may be user-defined or computed, and is proportional to the importance of the respective pair in the learning process.

In the present paper, we focus on the FAMR neural network, the ESRNG feature weighting algorithm, and the distance measure generalization principle. We contribute the following:

1. We introduce a novel FAMR architecture with distance measure generalization: FAMR with Feature Weighting (FAMRFW), adapting the FAMFW model to the FAMR case.

2. Compared to [2], we include new experiments on standard benchmarks.

We first introduce the basic FAM and FAMR notations (Section 2) and the ESRNG feature weighting algorithm (Section 3). In Section 4, we describe the new FAMRFW algorithm, which uses a weighted distance measure. Section 5 contains experimental results obtained with the FAMRFW method. Section 6 contains the final remarks.

2 A brief description of the FAMR

We summarize the standard FAM architecture and the FAMR learning mechanism, which differentiates it from the standard FAM.

2.1 The FAM architecture

A detailed FAM description can be found in the seminal paper of Carpenter et al. [7]; more simplified presentations are given in [26] and [19].

[Figure 1: Fuzzy ARTMAP architecture [7].]

The FAM architecture consists of a pair of fuzzy ART modules, ARTa and ARTb, connected by an inter-ART module called Mapfield (see Fig. 1). ARTa and ARTb are used for coding the input and output patterns, respectively, and Mapfield allows mapping between inputs and outputs.

The ARTa module contains the input layer $F_1^a$ and the competitive layer $F_2^a$. A preprocessing layer $F_0^a$ is also added before $F_1^a$. Analogous layers appear in ARTb. The initial input vectors have the form $a = (a_1, \ldots, a_n) \in [0,1]^n$. A data preprocessing technique called complement coding is performed by the $F_0^a$ layer in order to avoid node proliferation. Each input vector $a$ produces the normalized vector $A = (a, 1-a)$, whose $L_1$ norm is constant: $|A| = n$. Let $M_a$ be the number of nodes in $F_1^a$ and $N_a$ be the number of nodes in $F_2^a$. Due to the preprocessing step, $M_a = 2n$. The weight vector between $F_1^a$ and $F_2^a$ is $w^a$.
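As a small illustration of the complement-coding step (a sketch of ours, not taken from [7]; the function name complement_code is our own), the following Python fragment builds $A = (a, 1-a)$ and checks that its $L_1$ norm equals $n$:

    import numpy as np

    def complement_code(a):
        # Complement coding performed by the F0^a layer: A = (a, 1 - a).
        a = np.asarray(a, dtype=float)
        return np.concatenate([a, 1.0 - a])

    # The L1 norm of A is always n, regardless of the input vector a:
    a = np.array([0.2, 0.9, 0.5])          # n = 3
    A = complement_code(a)
    print(A)                               # [0.2 0.9 0.5 0.8 0.1 0.5]
    print(np.abs(A).sum())                 # 3.0, i.e., |A| = n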
Each $F_2^a$ node represents a class of inputs grouped together, denoted as a category. Each $F_2^a$ category has its own set of adaptive weights, stored in the form of a vector $w_j^a$, $j = 1, \ldots, N_a$, whose geometrical interpretation is a hyper-rectangle inside the unit box. Similar notations are used for the ARTb module. For a classification problem, the class index is the same as the category number in $F_2^b$, thus ARTb can be substituted with a vector.

The Mapfield module allows FAM to perform associations between ARTa and ARTb categories. The number of nodes in Mapfield is equal to the number of nodes in $F_2^b$. Each node $j$ from $F_2^a$ is linked to each node from $F_2^b$ via a weight vector $w_j^{ab}$.

The learning algorithm is sketched below. For each training pattern, the vigilance parameter $\rho_a$ is set equal to its baseline value, and no node is inhibited. For each (preprocessed) input $A$, a fuzzy choice function is used to get the response of each $F_2^a$ category:

$$T_j(A) = \frac{|A \wedge w_j^a|}{\alpha_a + |w_j^a|}, \qquad j = 1, \ldots, N_a \qquad (1)$$

Let $J$ be the node with the highest value computed as in (1). If the resonance condition (2) is not fulfilled,

$$\rho(A, w_J^a) = \frac{|A \wedge w_J^a|}{|A|} \geq \rho_a, \qquad (2)$$

then the $J$th node is inhibited, such that it does not participate in further competitions for this pattern, and a new search for a resonant category is performed. This might lead to the creation of a new category in ARTa.

A similar process occurs in ARTb; let $K$ be the winning node from ARTb. The $F_2^b$ output vector is set to:

$$y_k^b = \begin{cases} 1, & \text{if } k = K \\ 0, & \text{otherwise} \end{cases} \qquad k = 1, \ldots, N_b \qquad (3)$$

An output vector $x^{ab}$ is formed in Mapfield: $x^{ab} = y^b \wedge w_J^{ab}$. A Mapfield vigilance test controls the match between the predicted vector $x^{ab}$ and the target vector $y^b$:

$$\frac{|x^{ab}|}{|y^b|} \geq \rho_{ab} \qquad (4)$$

where $\rho_{ab} \in [0,1]$ is a Mapfield vigilance parameter. If the test (4) is not passed, then a sequence of steps called match tracking is initiated (the vigilance parameter $\rho_a$ is increased and a new resonant category is sought in ARTa); otherwise, learning occurs in ARTa, ARTb, and Mapfield:

$$w_J^{a(new)} = \beta_a \left( A \wedge w_J^{a(old)} \right) + (1 - \beta_a) w_J^{a(old)} \qquad (5)$$

(and the analogous update in ARTb), and $w_{Jk}^{ab} = \delta_{kK}$, where $\delta_{ij}$ is Kronecker's delta. With respect to $\beta_a$, there are two learning modes: i) fast learning, with $\beta_a = 1$ for the entire training process; and ii) fast-commit slow-recode learning, which corresponds to setting $\beta_a = 1$ when creating a new node and $\beta_a < 1$ for subsequent learning.

2.2 The FAMR learning mechanism

The main difference between the FAMR and the original FAM is the updating scheme of the $w_{jk}^{ab}$ weights. The FAMR uses the following iterative updating [4]:

$$w_{jk}^{ab(new)} = \begin{cases} w_{jk}^{ab(old)} & \text{if } j \neq J \\[4pt] w_{JK}^{ab(old)} + \dfrac{q_t}{Q_J^{new}} \left( 1 - w_{JK}^{ab(old)} \right) & \text{if } j = J,\; k = K \\[4pt] w_{Jk}^{ab(old)} \left( 1 - \dfrac{q_t}{Q_J^{new}} \right) & \text{if } j = J,\; k \neq K \end{cases} \qquad (6)$$

where $q_t$ is the relevance assigned to the $t$th input pattern ($t = 1, 2, \ldots$), and $Q_J^{new} = Q_J^{old} + q_t$. The relevance $q_t$ is a real, positive, finite number directly proportional to the importance of the experiment considered at step $t$. This $w_{jk}^{ab}$ approximation is a correct biased estimator of the posterior probability $P(k|j)$, the probability of selecting the $k$th ARTb category after having selected the $j$th ARTa category [4].

Let $Q$ be the vector $[Q_1, \ldots, Q_{N_a}]$; initially, each $Q_j$ ($1 \leq j \leq N_a$) has the same initial value $q_0$. $N_a$ and $N_b$ are the numbers of categories in ARTa and ARTb, respectively; both are initialized at 0.
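To make the update (6) concrete, here is a minimal Python sketch of one Mapfield update step (our own illustration, not code from [4]; all names are ours). It also shows that each row of $w^{ab}$ remains a probability distribution over the ARTb categories:

    import numpy as np

    def famr_mapfield_update(w_ab, Q, J, K, q_t):
        # One FAMR Mapfield update, following Eq. (6).
        # w_ab : (Na, Nb) array; row j estimates P(k | j) and sums to 1.
        # Q    : (Na,) array of accumulated relevances Q_j.
        # J, K : winning ARTa and ARTb categories; q_t : current relevance.
        Q[J] += q_t                  # Q_J^new = Q_J^old + q_t
        r = q_t / Q[J]
        w_ab[J] *= (1.0 - r)         # shrink every component of row J ...
        w_ab[J, K] += r              # ... and move the freed mass toward the winner K
        return w_ab, Q

    w_ab = np.full((2, 3), 1.0 / 3.0)    # two ARTa categories, three ARTb categories
    Q = np.full(2, 0.1)                  # q0 = 0.1 for every ARTa category
    w_ab, Q = famr_mapfield_update(w_ab, Q, J=0, K=2, q_t=1.0)
    print(w_ab[0], w_ab[0].sum())        # row 0 still sums to 1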
For incremental learning of one training pair, the FAMR Mapfield learning scheme is described by Algorithm 1. The vigilance test is:

$$N_b \, w_{JK}^{ab} \geq \rho_{ab} \qquad (7)$$

For a clearer presentation, and in order not to create confusion between vector relevances and feature weights, we assume in all the following experiments that relevances are set to a constant positive value. Since we do not actually use relevances, is this FAMR equivalent to the standard FAM model, as introduced in [7]? The answer is no, because, unlike the standard FAM: i) the FAMR accepts one-to-many relationships; and ii) the FAMR is a conditional probability estimator, with an estimated convergence rate computed in [4].

Algorithm 1: The t-th iteration of the FAMR Mapfield algorithm [4].

Step 1. Accept the $t$th vector pair $(a, b)$ with relevance factor $q_t$.

Step 2. Find a resonant category in ARTb or create a new one.
if $|b \wedge w_k^b|/|b| < \rho_b$ for all $k = 1, \ldots, N_b$ then
    $N_b = N_b + 1$  {add a new category to ARTb}
    $K = N_b$
    if $N_b > 1$ then
        $w_{jK}^{ab} = \dfrac{q_0}{N_b Q_j}$, for $j = 1, \ldots, N_a$  {append a new component to $w_j^{ab}$}
        $w_{jk}^{ab} = w_{jk}^{ab} - \dfrac{w_{jK}^{ab}}{N_b - 1}$, for $k = 1, \ldots, K-1$; $j = 1, \ldots, N_a$  {normalize}
    end if
else
    Let $K$ be the index of the ARTb category passing the resonance condition and with maximum activation function.
end if

Step 3. Find a resonant category in ARTa or create a new one.
if $|a \wedge w_j^a|/|a| < \rho_a$ for all $j = 1, \ldots, N_a$ then
    $N_a = N_a + 1$  {add a new category to ARTa}
    $J = N_a$
    $Q_J = q_0$  {append a new component to $Q$}
    $w_{Jk}^{ab} = 1/N_b$, for $k = 1, \ldots, N_b$  {append a new row to $w^{ab}$}
else
    Let $J$ be the index of the ARTa category passing the resonance condition and with maximum activation function.
end if

Step 4. $J$, $K$ are winners or newly added nodes. Check whether match tracking applies.
if vigilance test (7) is passed then  {learn in Mapfield}
    $Q_J = Q_J + q_t$
    $w_{JK}^{ab} = w_{JK}^{ab} + \dfrac{q_t}{Q_J} (1 - w_{JK}^{ab})$
    $w_{Jk}^{ab} = w_{Jk}^{ab} \left( 1 - \dfrac{q_t}{Q_J} \right)$, for $k = 1, \ldots, N_b$, $k \neq K$
else
    perform match tracking and restart from Step 3
end if

3 The ESRNG feature weighting algorithm

We use the ESRNG feature weighting algorithm to compute the feature weights used in the generalized distance measure of the FAMRFW. Details of the ESRNG algorithm can be found in [1]. It is based on Onicescu's IE and approximates the unilateral dependency of random variables using the Parzen windows approximation. Before outlining the principal steps of the ESRNG method, we review the basic properties of the IE.

3.1 Onicescu's informational energy

For a discrete random variable $X$ with probabilities $p_k$, the IE was introduced in 1966 by Octav Onicescu [24] as

$$E(X) = \sum_{k=1}^{n} p_k^2.$$

For a continuous random variable $Y$, the IE was defined by Silviu Guiaşu [15]:

$$E(Y) = \int_{-\infty}^{+\infty} p^2(y) \, dy,$$

where $p(y)$ is the probability density function. For a continuous random variable $Y$ and a discrete random variable $C$, the conditional IE is defined as:

$$E(Y|C) = \int_y \sum_{m=1}^{M} p(c_m) \, p^2(y|c_m) \, dy.$$

In order to study the interaction between two random variables $X$ and $Y$, the following measure of unilateral dependency was introduced by Andonie et al. [3]:

$$o(Y, X) = E(Y|X) - E(Y),$$

with the following properties:

1. $o$ is not symmetrical with respect to its arguments;

2. $o(Y, X) \geq 0$, and the equality holds iff $Y$ and $X$ are independent;

3. $o(Y, X) \leq 1 - E(Y)$, and the equality holds iff $Y$ is completely dependent on $X$.
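As a small numerical illustration of these definitions (ours, not part of the original paper), the following Python sketch computes the discrete analogue of $E$ and of $o(Y, X)$ from a joint probability table and shows properties 1-2 on two toy distributions; the function names are our own:

    import numpy as np

    def informational_energy(p):
        # Onicescu's IE of a discrete distribution: E = sum_k p_k^2.
        p = np.asarray(p, dtype=float)
        return float(np.sum(p ** 2))

    def unilateral_dependency(joint):
        # Discrete analogue of o(Y, X) = E(Y|X) - E(Y).
        # joint[i, j] = P(Y = y_i, X = x_j).
        joint = np.asarray(joint, dtype=float)
        p_x = joint.sum(axis=0)          # marginal of X
        p_y = joint.sum(axis=1)          # marginal of Y
        e_y_given_x = sum(p_x[j] * informational_energy(joint[:, j] / p_x[j])
                          for j in range(joint.shape[1]) if p_x[j] > 0)
        return e_y_given_x - informational_energy(p_y)

    # Property 2: o = 0 for independent variables, o > 0 otherwise.
    indep = np.outer([0.5, 0.5], [0.3, 0.7])         # product of the marginals
    dep = np.array([[0.45, 0.05],
                    [0.05, 0.45]])
    print(unilateral_dependency(indep))              # ~ 0.0
    print(unilateral_dependency(dep))                # 0.32 > 0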
This measure quantifies the unilateral dependence characterizing $Y$ with respect to $X$, and corresponds to the amount of information held by $X$ about $Y$.

3.2 The feature weighting procedure

ESRNG is an online algorithm which adapts a set of LVQ reference vectors by minimizing the quantization error. At each iteration, it also adapts the feature weights of the input vectors. The core of the method is the maximization of the $o(Y, C)$ measure. To connect an input vector $x_i$ with its class $j$, represented by the vector $w_j$, we use a simple transform. We consider a continuous random variable $Y$ with samples $y_i = \lambda I (x_i - w_j)$, $i = 1, \ldots, N$, where:

• $\lambda$ is the vector of feature weights;

• $x_i$, $i = 1, \ldots, N$, are the training vectors, each of them belonging to one of the classes $c_1, c_2, \ldots, c_M$;

• $w_j$, $j = 1, \ldots, P$, are the LVQ-determined class prototypes.

Assuming that the $M$ class labels are samples of a discrete random variable denoted by $C$, we can use gradient ascent to iteratively update the feature weights so as to maximize $o(Y, C)$:

$$\lambda^{(t+1)} = \lambda^{(t)} + \alpha \sum_{i=1}^{N} \frac{\partial o(Y, C)}{\partial y_i} I (x_i - w_j).$$

From the definition of $o(Y, X)$, we obtain:

$$o(Y, C) = E(Y|C) - E(Y) = \sum_{p=1}^{M} \frac{1}{p(c_p)} \int_y p^2(y, c_p) \, dy - \int_y p^2(y) \, dy. \qquad (8)$$

This expression involves a considerable computational effort. Therefore, we approximate the probability densities in the integrals using the Parzen windows estimation method. The multidimensional Gaussian kernel is [13]:

$$G(y, \sigma^2 I) = \frac{1}{(2\pi)^{d/2} \sigma^d} \, e^{-\frac{y^t y}{2\sigma^2}} \qquad (9)$$

where $d$ is the dimension of the definition space of the kernel, $I$ is the identity matrix, and $\sigma^2 I$ is the covariance matrix. We approximate the probability density $p(y)$ by replacing each data sample $y_i$ with a Gaussian kernel and averaging the obtained values:

$$p(y) = \frac{1}{N} \sum_{i=1}^{N} G(y - y_i, \sigma^2 I).$$

We denote by $M_p$ the number of training samples from class $c_p$. We have:

$$\int_y p^2(y, c_p) \, dy = \frac{1}{N^2} \sum_{k=1}^{M_p} \sum_{l=1}^{M_p} G(y_{pk} - y_{pl}, 2\sigma^2 I)$$

and

$$\int_y p^2(y) \, dy = \frac{1}{N^2} \sum_{k=1}^{N} \sum_{l=1}^{N} G(y_k - y_l, 2\sigma^2 I),$$

where $y_{pk}$, $y_{pl}$ are two training samples from class $c_p$, whereas $y_k$, $y_l$ represent two training samples from any class. Equation (8) can be rewritten accordingly, and we obtain the final ESRNG update formula for the feature weights:

$$\lambda^{(t+1)} = \lambda^{(t)} - \alpha \frac{1}{4\sigma^2} \, G(y_1 - y_2, 2\sigma^2 I) \, (y_2 - y_1) I \, (x_1 - w_{j(1)} - x_2 + w_{j(2)}),$$

where $w_{j(1)}$ and $w_{j(2)}$ are the closest prototypes to $x_1$ and $x_2$, respectively.

The ESRNG algorithm has the following general steps:

1. Update the reference vectors using the SRNG scheme.

2. Update the feature weights.

3. Repeat Steps 1 and 2 for all training set samples.

This algorithm uses a generalized Euclidean distance. The updating formula for the reference vectors can be found in [1]; we will not explicitly use this formula in the present paper. The ESRNG algorithm generates a numeric value for each input feature, quantifying its importance in the classification task: the most relevant feature receives the highest numeric value. We use these values as feature weights in the FAMRFW algorithm.

4 FAMRFW – a novel neural model

The FAMRFW is a FAMR architecture with a generalized distance measure. For an ARTa category $w_j$, we define its size $s(w_j)$:

$$s(w_j) = n - |w_j| \qquad (10)$$

and the distance to a normalized input $A$:

$$dis(A, w_j) = |w_j| - |A \wedge w_j| = \sum_{i=1}^{n} d_{ji}, \qquad (11)$$

where $(d_{j1}, \ldots, d_{jn}) = w_j - A \wedge w_j$.
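The following Python sketch (ours; the function names and example values are our own, for illustration only) computes the category size (10) and the distance (11) for a complement-coded input and an ARTa weight vector:

    import numpy as np

    def category_size(w, n):
        # s(w_j) = n - |w_j|, Eq. (10); |.| is the L1 norm, n the input dimension.
        return n - np.abs(w).sum()

    def dis(A, w):
        # dis(A, w_j) = |w_j| - |A ^ w_j|, Eq. (11), with ^ the component-wise minimum.
        d = w - np.minimum(A, w)         # (d_j1, ..., d_jn) = w_j - A ^ w_j
        return d.sum()

    # One-dimensional input a = 0.4, complement coded; category hyper-rectangle [0.3, 0.5]:
    A = np.array([0.4, 0.6])             # complement coding of a = 0.4
    w = np.array([0.3, 0.5])             # stored as (u, 1 - v) for the rectangle [u, v] = [0.3, 0.5]
    print(category_size(w, n=1))         # approx. 0.2, the side length of [0.3, 0.5]
    print(dis(A, w))                     # 0.0: a lies inside the category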
In [10] it is shown that:

$$T_j(A) = \frac{n - s(w_j) - dis(A, w_j)}{n - s(w_j) + \alpha_a} \qquad (12)$$

$$\rho(A, w_j^a) = \frac{n - s(w_j) - dis(A, w_j)}{n} \qquad (13)$$

A generalization of $dis(A, w_j)$ is the weighted distance:

$$dis(A, w_j; \lambda) = \sum_{i=1}^{n} \lambda_i d_{ji}, \qquad (14)$$

where $\lambda = (\lambda_1, \ldots, \lambda_n)$, and $\lambda_i \in [0, n]$ is the weight associated with the $i$th feature. We impose the constraint $|\lambda| = n$. For $\lambda_1 = \cdots = \lambda_n = 1$ we obtain, in particular, the FAMR.

Charalampidis et al. [10] used the following weighted distance:

$$dis(x, w_j | \lambda, ref) = \sum_{i=1}^{n} \frac{(1 - \lambda) l_j^{ref} + \lambda}{(1 - \lambda) l_{ji} + \lambda} \, d_{ji}, \qquad (15)$$

where $l_j^{ref}$ is a function of the side lengths of category $j$'s hyper-rectangle, and $\lambda$ is a scalar in $[0, 1]$. In our case, the function $dis(A, w_j; \lambda)$ does not depend on the sides of the category created during learning, but on the computed feature weights. This makes our approach very different from the one in [10].

The effect of using the distance $dis(A, w_j; \lambda)$ for a bidimensional category is depicted in Fig. 2(a). The hexagonal shapes represent the points situated at constant distance from the category. These shapes are flattened in the direction of the feature with a larger weight and elongated in the direction of the feature with a smaller weight. This is in accordance with the following intuition: the category dimension in the direction of a relevant feature should be smaller than the category dimension in the direction of a non-relevant feature. Hence, we may expect more categories to cover the relevant directions than the non-relevant ones.

[Figure 2: Geometric interpretation of constant distance when using dis(A, w_j; λ) for bidimensional patterns. (a) Bounds for constant weighted distance dis(A, w_j; λ) for various values of λ; the rectangle in the middle represents a category. (b) Bounds for constant distance dis(A, w_j; λ) for a null feature weight; the rectangle in the middle represents the category.]

For a null feature weight (Fig. 2(b)), the bounds are reduced to parallel lines on both sides of the rectangle representing the category. In this extreme case, the discriminative distance is the one along the remaining feature dimension. This is another major difference between our approach and the one in [10], where, when using the function $dis(x, w_j | \lambda, ref)$, the contours of constant weighted distance lie inside some limiting hexagons. In our method, the contour is insensitive to the actual value of the feature with null weight.

5 Experimental results

We test the FAMRFW on several standard classification tasks, all from the UCI Machine Learning Repository [5]. The experiments are performed on the FAMR and the FAMRFW architectures. The two FAMRFW stages are: i) the λ feature weights are obtained by the ESRNG algorithm; ii) these weights are used both for training and testing the FAMR.

A nice feature of the FAM architectures and of the ESRNG algorithm is the on-line (incremental) learning capability, i.e., the training set is processed only once. This type of learning is especially useful when dealing with very large datasets, since it can significantly reduce the computational overhead. For FAMR training and for both FAMRFW stages we use on-line learning.

5.1 Methodology

For each experiment, we use three-way data splits (i.e., the available dataset is divided into training, validation, and test sets) and random subsampling. Random subsampling is a faster, simplified version of k-fold cross-validation:

1. The dataset is randomized.
2. The first 60% of the dataset is used for training and the next 20% for validation (i.e., for tuning the model parameters). The parameters ρa, ρab ∈ {0, 0.1, . . . , 0.9} and βa ∈ {0, 0.1, . . . , 1} are optimized using a simple grid search. The goal is to allow both fast learning and fast-commit slow-recode learning. The optimal parameter values are the ones producing the highest PCC and the lowest number of ARTa categories.

3. The network with the optimal parameters is trained on the joint training + validation data.

4. The last 20% of the dataset is used for testing. As a result, the percentage of correct classification (PCC) and the number of generated ARTa categories are computed.

5. This procedure is repeated six times.

The ρa value, optimized during training/validation, controls the number of generated ARTa categories. After training/validation, this number does not change. For ρa > 0, some test vectors may be rejected (i.e., not classified). In all our experiments, after the ARTa categories were generated, we set ρa = 0 for testing. This has the following positive effects:

• All test vectors are necessarily classified.

• We obtain experimentally better classification results, both for the FAMR and the FAMRFW, compared to the ones obtained with the optimized ρa values. This is shown in Table 1, for all considered classification tasks. The feature weight values used in the FAMRFW are the ones mentioned in the following sections.

Table 1: Average PCC test set results using the optimized ρa (computed in the validation phase) vs. using ρa = 0.

                      FAMR                        FAMRFW
                      optimized ρa   ρa = 0       optimized ρa   ρa = 0
  Breast cancer       86.54%         91.22%       91.22%         91.22%
  Balance scale       75.86%         76.53%       75.92%         78.13%
  Wine recognition    83.33%         84.72%       83.79%         89.35%
  Ionosphere          85.44%         88.96%       85.91%         89.43%

5.2 Breast cancer classification

This dataset (formally called Wisconsin Diagnostic Breast Cancer) includes 569 instances. The instances are described by 30 real attributes. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.

The feature weights generated for the FAMRFW are: [0.784, 0.816, 0.795, 2.847, 0.784, 0.784, 0.784, 0.784, 0.784, 0.784, 0.784, 0.784, 0.785, 0.808, 0.784, 0.784, 0.784, 0.784, 0.784, 0.784, 0.784, 0.829, 0.828, 5.047, 0.784, 0.784, 0.784, 0.784, 0.784, 0.784].

In Table 2, we observe that the average PCC for the FAMR and the FAMRFW is the same, but the FAMRFW uses far fewer ARTa categories than the FAMR.

Table 2: Classification performance for the Breast Cancer Problem.

  Test      FAMR                               FAMRFW
  no.       No. of ARTa categories   PCC       No. of ARTa categories   PCC
  1         61                       93.85%    24                       87.71%
  2         7                        90.35%    7                        93.85%
  3         10                       95.61%    8                        91.22%
  4         39                       85.08%    6                        88.59%
  5         6                        92.98%    6                        94.73%
  6         6                        89.47%    5                        91.22%
  Average   21.5                     91.22%    9.33                     91.22%

5.3 Balance scale classification

This dataset was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are the left weight, the left distance, the right weight, and the right distance. The correct way to find the class is to compare (left-distance * left-weight) and (right-distance * right-weight): the greater product gives the class, and if they are equal, the scale is balanced. The set contains 625 patterns, with an uneven distribution of the three classes; each input pattern has 4 features. The ESRNG generated feature weights are λ = [1.002, 1.113, 0.827, 1.058].
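For illustration, the labeling rule described above can be written as the following short Python sketch (ours, not part of the original paper); the attribute values in this dataset are integers from 1 to 5, which gives the 625 = 5^4 patterns:

    def balance_scale_class(left_weight, left_dist, right_weight, right_dist):
        # Ground-truth rule of the Balance Scale data: compare the two torques.
        left = left_weight * left_dist
        right = right_weight * right_dist
        if left > right:
            return "L"       # tips to the left
        if right > left:
            return "R"       # tips to the right
        return "B"           # balanced

    print(balance_scale_class(3, 4, 2, 5))   # "L": 12 > 10
    print(balance_scale_class(2, 2, 4, 1))   # "B": 4 == 4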
The FAMRFW has better classification accuracy and fewer ARTa categories than the FAMR (Table 3).

Table 3: Classification performance for the Balance Scale Problem.

  Test      FAMR                               FAMRFW
  no.       No. of ARTa categories   PCC       No. of ARTa categories   PCC
  1         95                       74.4%     53                       75.2%
  2         70                       80.0%     39                       80.0%
  3         22                       78.4%     54                       81.6%
  4         75                       75.2%     44                       85.6%
  5         125                      71.2%     69                       72.0%
  6         62                       80.0%     107                      74.4%
  Average   74.83                    76.53%    61                       78.13%

5.4 Wine recognition

The Wine recognition data are the results of a chemical analysis of wines grown in the same region in Italy, but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the 3 types of wines. The dataset contains 178 instances.

The ESRNG algorithm produced the weights λ = [0.900, 0.757, 0.659, 1.668, 2.349, 0.702, 1.028, 0.668, 0.774, 0.874, 0.666, 0.701, 1.253]. The FAMRFW classification results are better, with fewer generated ARTa categories (Table 4).

Table 4: Classification performance for the Wine Recognition Problem.

  Test      FAMR                               FAMRFW
  no.       No. of ARTa categories   PCC       No. of ARTa categories   PCC
  1         10                       88.88%    6                        86.11%
  2         15                       97.22%    10                       97.22%
  3         32                       69.44%    11                       86.11%
  4         17                       83.33%    11                       86.11%
  5         55                       80.55%    39                       94.44%
  6         12                       88.88%    8                        86.11%
  Average   23.5                     84.71%    14.16                    89.35%

5.5 Ionosphere

This binary classification problem starts from collected radar data. The data come from 16 high-frequency antennas targeting the free electrons in the ionosphere. "Good" radar returns are those showing evidence of some type of structure in the ionosphere. "Bad" returns are those passing through the ionosphere. There are 351 instances, and each input pattern has 34 features.

The ESRNG generated λ vector is: [0.551, 0.520, 1.179, 1.168, 1.301, 1.180, 0.940, 1.272, 1.024, 0.903, 0.843, 0.976, 0.870, 0.844, 0.807, 0.877, 0.893, 1.012, 0.994, 1.012, 0.964, 1.061, 1.029, 1.227, 0.978, 1.020, 0.943, 1.027, 1.087, 1.032, 0.978, 1.117, 0.999, 1.374]. On average, the FAMRFW produced far fewer ARTa categories than the FAMR. This time, the difference in PCC between the two models is small, with the FAMRFW slightly ahead on average (Table 5).

Table 5: Classification performance for the Ionosphere Problem.

  Test      FAMR                               FAMRFW
  no.       No. of ARTa categories   PCC       No. of ARTa categories   PCC
  1         28                       81.69%    8                        90.14%
  2         20                       81.69%    8                        85.91%
  3         17                       91.54%    7                        83.09%
  4         9                        94.36%    8                        88.73%
  5         5                        90.14%    5                        94.36%
  6         9                        94.36%    5                        94.36%
  Average   14.66                    88.96%    6.83                     89.43%

6 Conclusions

According to our experiments, using the feature relevances and the generalized distance measure may improve the classification accuracy of the FAMR algorithm. In addition, the FAMRFW uses fewer ARTa categories, which is an important factor. The number of categories controls the generalization capability and the computational complexity of a FAM architecture. Generalization is a trade-off between overfitting and underfitting the training data. It is good to minimize the number of categories, as long as this does not decrease the classification accuracy too much.

The ESRNG feature weighting algorithm can be replaced by other weighting methods. We have not tested the function approximation capability of the FAMRFW neural network because the ESRNG weighting algorithm is presently restricted to classification tasks. LVQ methods can be extended to function approximation [23], and we plan to adapt the ESRNG algorithm in this sense. This would enable us to test the FAMRFW + ESRNG procedure on standard function approximation and prediction benchmarks.
Our approach is at the intersection of two major computational paradigms:

1. Carpenter and Grossberg's adaptive resonance theory, an advanced distributed model where parallelism is intrinsic to the problem, not just a means of speeding up computation [6].

2. Onicescu's informational energy and the unilateral dependency measure. To the best of our knowledge, we are the only ones using Onicescu's energy in neural processing systems.

Bibliography

[1] R. Andonie and A. Caţaron. Feature ranking using supervised neural gas and informational energy. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2005), Montreal, Canada, July 31 - August 4, 2005.

[2] R. Andonie, A. Caţaron, and L. Sasu. Fuzzy ARTMAP with feature weighting. In Proceedings of the IASTED International Conference on Artificial Intelligence and Applications (AIA 2008), Innsbruck, Austria, February 11-13, 2008, 91–96.

[3] R. Andonie and F. Petrescu. Interacting systems and informational energy. Foundation of Control Engineering, 11, 1986, 53–59.

[4] R. Andonie and L. Sasu. Fuzzy ARTMAP with input relevances. IEEE Transactions on Neural Networks, 17, 2006, 929–941.

[5] A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. University of California, Irvine, School of Information and Computer Sciences, http://www.ics.uci.edu/~mlearn/MLRepository.html

[6] I. Dziţac and B. E. Bărbat. Artificial intelligence + distributed systems = agents. International Journal of Computers, Communications and Control, 4, 2009, 17–26.

[7] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen. Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks, 3, 1992, 698–713.

[8] G. A. Carpenter, B. L. Milenova, and B. W. Noeske. Distributed ARTMAP: A neural network for fast distributed supervised learning. Neural Networks, 11, 1998, 793–813.

[9] G. A. Carpenter and W. Ross. ART-EMAP: A neural network architecture for learning and prediction by evidence accumulation. IEEE Transactions on Neural Networks, 6, 1995, 805–818.

[10] D. Charalampidis, G. Anagnostopoulos, M. Georgiopoulos, and T. Kasparis. Fuzzy ART and Fuzzy ARTMAP with adaptively weighted distances. In Proceedings of the SPIE, Applications and Science of Computational Intelligence, Aerosense, 2002.

[11] I. Dagher, M. Georgiopoulos, G. L. Heileman, and G. Bebis. An ordering algorithm for pattern presentation in Fuzzy ARTMAP that tends to improve generalization performance. IEEE Transactions on Neural Networks, 10, 1999, 768–778.

[12] I. Dagher, M. Georgiopoulos, G. L. Heileman, and G. Bebis. Fuzzy ARTVar: An improved fuzzy ARTMAP algorithm. In Proceedings of the IEEE World Congress on Computational Intelligence (WCCI'98), Anchorage, 1998, 1688–1693.

[13] J. C. Principe et al. Information-theoretic learning. In S. Haykin, editor, Unsupervised Adaptive Filtering. Wiley, New York, 2000.

[14] E. Gomez-Sanchez, Y. A. Dimitriadis, J. M. Cano-Izquierdo, and J. Lopez-Coronado. µARTMAP: Use of mutual information for category reduction in fuzzy ARTMAP. IEEE Transactions on Neural Networks, 13, 2002, 58–69.

[15] S. Guiaşu. Information Theory with Applications. McGraw Hill, New York, 1977.

[16] B. Hammer, D. Schunk, T. Bojer, and T. K. von Toschanowitz. Relevance determination in learning vector quantization. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2001), Bruges, Belgium, 2001, 271–276.
[17] B. Hammer, M. Strickert, and T. Villmann. Supervised neural gas with general similarity measure. Neural Processing Letters, 21, 2005, 21–44.

[18] B. Hammer and T. Villmann. Generalized relevance learning vector quantization. Neural Networks, 15, 2002, 1059–1068.

[19] C. P. Lim and R. Harrison. ART-Based Autonomous Learning Systems: Part I - Architectures and Algorithms. In L. C. Jain, B. Lazzerini, and U. Halici, editors, Innovations in ART Neural Networks. Springer, 2000.

[20] C. P. Lim and R. F. Harrison. An incremental adaptive network for on-line supervised learning and probability estimation. Neural Networks, 10, 1997, 925–939.

[21] S. Marriott and R. F. Harrison. A modified fuzzy ARTMAP architecture for the approximation of noisy mappings. Neural Networks, 8, 1995, 619–641.

[22] T. M. Martinetz, S. G. Berkovich, and K. J. Schulten. Neural-gas network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4, 1993, 558–569.

[23] S. Min-Kyu, J. Murata, and K. Hirasawa. Function approximation using LVQ and fuzzy sets. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Tucson, AZ, 2001, 1442–1447.

[24] O. Onicescu. Théorie de l'information. Énergie informationnelle. C. R. Acad. Sci. Paris, Ser. A-B, 263, 1966, 841–842.

[25] O. Parsons and G. A. Carpenter. ARTMAP neural networks for information fusion and data mining: map production and target recognition methodologies. Neural Networks, 16, 2003, 1075–1089.

[26] M.-T. Vakil-Baghmisheh and N. Pavešić. A Fast Simplified Fuzzy ARTMAP Network. Neural Processing Letters, 17, 2003, 273–316.

[27] S. J. Verzi, G. L. Heileman, M. Georgiopoulos, and M. J. Healy. Boosted ARTMAP. In Proceedings of the IEEE World Congress on Computational Intelligence (WCCI'98), 1998, 396–400.

[28] J. Williamson. Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps. Neural Networks, 9, 1996, 881–897.