
BIOMEDICAL SCIENCES

IJMMR 2015 Vol. 1 No. 1

INFORMATION SUPPORT SYSTEM OF MEDICAL SYSTEM RESEARCH

V. P. Martsenyuk1, I. Ye. Andrushchak2
1TERNOPIL STATE MEDICAL UNIVERSITY, TERNOPIL, UKRAINE

2LUTSK NATIONAL TECHNICAL UNIVERSITY, LUTSK, UKRAINE

Background. Medical system research requires an information support system that implements data mining algorithms producing decision trees or IF-THEN rules. In addition, such a system should be object-oriented and web-integrated.

Objective. The aim of this study was to develop an information support system based on data mining algorithms applied to the system analysis method for medical system research.

Methods. System analysis methods are used for qualitative analysis of mathematical models of diseases. Algorithms such as decision tree induction and the sequential covering algorithm are applied for data mining from a learning data set.

Results. Given the complexity of the mathematical equations (nonlinear systems with delays), new powerful methods of exact parameter identification and qualitative analysis are needed. From the point of view of theoretical medicine, uncertainties arising in models of diseases require the development of treatment schemes that are effective, take toxicity constraints into account, enable better quality of life, and are cost-beneficial. A multivariate method of qualitative analysis of mathematical models can be used to classify the forms of a pathologic process.

Conclusions. Complex qualitative behavior of disease models, depending on parameters and controllers, was observed in our investigation even without considering the probabilistic nature of the majority of quantities and parameters of the information models.

KEY WORDS: data mining, system analysis, medical research, decision making

Introduction
Here we present our results on the application of system analysis methods to problems of clinical medicine. We emphasize that the effects of uncertainty should be taken into account in such complex systems. It will be shown that, even for deterministic models of such nonlinear systems, the qualitative behavior differs depending on the parameter values. Let us start from the origin of this problem. A large number of models describing physiological indices of the human body under different diseases and treatment schemes have been obtained to date. They are based primarily on regression analysis; more complex ones use neural networks and evolutionary programming. The most significant attempts to construct mathematical models at different levels of the hierarchy of the human organism were made by John Murray [3], Keener and Sneyd [2], G. I. Marchuk [1], and Mackey and Glass (who investigated nonlinear phenomena by applying dynamic systems and introduced the notion of dynamic diseases).

Without considering uncertainty, all these models can be applied only to patients from well-defined groups (primarily of a given age and subject to many other restrictions).

Methods
As for the projects that stimulated this research, we note the following. In recent years the Medical Informatics Department has carried out investigations, initiated by the Healthcare Ministry of Ukraine, to develop and use a general system analysis algorithm for studying different diseases [4–9], namely in oncology (melanoma, leukemia), infectious diseases (flu), and therapy (bone tissue diseases). Naturally, the problem arises of developing a general model of disease. It would be incorrect to state that we have managed to offer a unique universal algorithm for constructing a general disease model. It is more correct to say that this approach can be used for diseases of different nature. We believe the approach can be extended to processes in sociology and demography, as well as in economics and finance, many of which have the same nature as human diseases. Let us introduce some special medical terminology (as little as possible). First of all, the most widely recognized definition states that a disease is a set of pathologic processes weakening the vitality and activity of a human organism. Here, a pathologic process is a set of pathologic (that is, abnormal) and protective reactions within the human organism. The most significant task is modeling the pathologic process.

Address for correspondence: Vasil Martsenyuk, I. Ya. Horbachevsky Ternopil State Medical University, m. Voli, 1, Ternopil, 46001, Ukraine
E-mail: marceniuk@yahoo.com

V. P. Martsenyuk et al.

International Journal of Medicine and Medical Research
2015, Volume 1, Number 1, p. 63-67
copyright © 2015, TSMU, All Rights Reserved

Results
For this reason we proposed a general model of the pathologic process comprising three components:

(i) the cause of the disease (it may be some external factor, such as bacteria or chemicals, or the organism's own modified cells, such as tumor cells);

(ii) the immune system, which protects the organism by means of specific antibodies (a kind of predator) and plasmatic cells (their ancestors);

(iii) normal cells, tissues and organs (they must be considered in order to satisfy toxicity constraints).

We used our own software for this research: the Software Environment for Medical System Researches (SEMSR). A conceptual model of a software environment supporting system medical investigations was developed. A model implementation of the data structure for medical investigations in terms of XML technology was proposed. A Web-integrated, user-oriented and adjustable interface was developed. Mathematical methods of system analysis of pathologic processes were implemented as a hierarchy of Java classes. Software tools were developed for executing system medical investigations, preparing the results for presentation on the Internet, and visualization.

Uncertainties in medical system research
Uncertainties in such models may be parametric. Some of the parameters may be unknown functions. As for uncertainty in control, it is necessary to take all possible scenarios into account. Note that the purpose of this article is not to present methods for identifying these uncertainties; that would require presenting the powerful and deep mathematical apparatus of adjoint systems, sensitivity functions and minimax a posteriori estimation. Here we would like to address two questions:

(i) why is it so important to take uncertainties into account?

(ii) what are the basic uncertainties in models of diseases?

To answer the first question, we note that the mathematical solutions of the equations exhibit different qualitative behavior. In practice we can observe different forms of disease (subclinical, acute, chronic, lethal), and the search for a treatment scheme depends on the form.

In our research we investigated uncertainties in the following: the maturation time of plasmatic cells $\tau$, the influence of antigen on the target-organ damage rate $\sigma$, the relation between the target-organ damage rate and the immune response $\xi(m)$, the therapy scheme (polychemotherapy, radiotherapy), and surgical interventions. Note that the last three are non-parametric: they depend on an unknown function, such as a controller.

Approach of Compartmental Systems
Problems of population dynamics, pharmacokinetics, mathematical epidemiology, and others are described by compartmental systems with time delay. Even in the linear case, solving such equations requires approximate computational procedures, which makes it impossible to solve the following problems in explicit form:

– determining the time instant after which the number of infected persons does not exceed some level i* (mathematical epidemiology);

– estimating the time when no more than d* units of a medical product remain in the organism of a patient (pharmacokinetics), etc.

Explicit solutions of such problems can be obtained on the basis of exponential-type estimates. A number of works are devoted to the construction of exponential estimates for systems with delay. In [1], an estimate for a linear system is obtained on the basis of the Cauchy formula. An approach based on Lyapunov functions with conditions of the Razumikhin type was developed in [2]. In [3], an estimate is found from the solution of a difference inequality for a Lyapunov–Krasovskii functional. In [4], a differential-difference inequality is constructed for a Lyapunov–Krasovskii functional. For compartmental systems, a promising approach is proposed in [5], where the construction of a class of exponential estimates is based on the Hale–Lunel inequality.
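
To illustrate how an exponential-type estimate gives an explicit answer to the epidemiological problem stated above, suppose (as an assumption made here for illustration, not a result taken from the cited works) that the number of infected persons i(t) satisfies a bound with some constants K and γ. Solving the bound for t gives a time instant after which the level i* is guaranteed not to be exceeded:

```latex
i(t) \le K e^{-\gamma t}, \qquad K > i^{*} > 0,\ \gamma > 0
\quad\Longrightarrow\quad
i(t) \le i^{*} \ \text{ for all } \ t \ge t^{*} = \frac{1}{\gamma}\,\ln\frac{K}{i^{*}} .
```

The same reasoning applies to the pharmacokinetic problem with d* in place of i*. The resulting t* is a sufficient bound and is only as tight as the constants K and γ of the estimate.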

Software Development Based on Data Mining
Technology

The objective is to develop and implement algorithms of diagnostic classification applying decision tree induction and sequential covering methods, and to study their computational complexity.

The problem belongs to the wide class of differential diagnosis problems. In medicine, the notion of “differential diagnosis” means a systematic, evidence-based approach to determining the causes of observed symptoms when there are several alternative explanations, and to reducing the list of possible diagnoses.

One approach that expresses the natural process of thinking in differential diagnosis is data mining. We are interested in the computational complexity of the algorithms on real clinical data, for example biochemical data in cases of polytrauma.


Software implementation of decision tree
induction

The methods are implemented in the Java language within the NetBeans development environment. The database of learning tuples is deployed on a MySQL server. Fig. 1 presents the conceptual model of the information system. Class DecisionTree implements the decision tree induction method. Class DataManager processes calls from DecisionTree, running queries against the mysql database to retrieve learning data.

The mysql database consists of two tables: table attribute, which stores information on attributes, and table categorised_data, which stores learning tuples. The structure of the tables in SQL syntax is shown below:

CREATE TABLE mysql.attribute (
    id integer not null unique,
    attribute_name varchar(25),
    attribute_field_name varchar(25),
    primary key (id)
) ENGINE=InnoDB;

CREATE TABLE mysql.categorised_data (
    id integer not null unique,
    A1 varchar(12),
    A2 varchar(8),
    A3 varchar(7),
    …………………..
    A21 varchar(7),
    class varchar(28),
    primary key (id)
) ENGINE=InnoDB;

The classes of this project are included in package decision_tree.model. The bean classes Attribute, Attribute_for_list and CategorisedData process the data of the corresponding tables. SQL queries for retrieving the corresponding data, including calculation of the information indices, are implemented in class AttributeListPeer.
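
As a rough illustration of the kind of information index mentioned above, the following Java sketch computes entropy and information gain for one candidate splitting attribute. It is not the AttributeListPeer code: the system described here computes the indices through SQL queries, whereas this sketch works on in-memory lists, and the names InfoGainSketch, entropy and informationGain are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; class and method names are hypothetical. */
class InfoGainSketch {

    /** Shannon entropy (in bits) of a list of class labels. */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.size();
            h -= p * Math.log(p) / Math.log(2.0);
        }
        return h;
    }

    /**
     * Information gain obtained by splitting the tuples on one attribute.
     * attributeValues.get(i) and labels.get(i) describe the same tuple.
     */
    static double informationGain(List<String> attributeValues, List<String> labels) {
        // Partition the class labels by attribute value.
        Map<String, List<String>> partitions = new HashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            partitions.computeIfAbsent(attributeValues.get(i), key -> new ArrayList<>())
                      .add(labels.get(i));
        }
        // Expected entropy after the split, weighted by partition size.
        double expected = 0.0;
        for (List<String> part : partitions.values()) {
            expected += (double) part.size() / labels.size() * entropy(part);
        }
        return entropy(labels) - expected;
    }
}
```

On a balanced two-class sample, an attribute that separates the classes perfectly yields a gain of 1 bit, while an attribute whose values are distributed identically in both classes yields a gain of 0.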

Problem of computational complexity of decision
tree induction algorithm

As shown in [11], the running time of the decision tree induction algorithm is estimated by

$$O\bigl(n \times |D| \times \log |D|\bigr), \qquad (1)$$

where $n$ is the number of attributes and $|D|$ is the number of learning tuples. Our goal was to check this result experimentally.

Experiments were executed varying the number of attributes $n$. Decision trees were constructed for each value of $n$. Figs. 2 and 3 show the estimates of decision tree induction times according to [4].
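
One way such an experimental check could be organized is sketched below in Java. The sketch assumes that the estimate is proportional to n·|D|·log|D| (n attributes, |D| learning tuples), fits a single scale constant to measured times by least squares, and reports the relative deviation of a measurement from the scaled estimate. The class ComplexityCheckSketch and its methods are hypothetical, and the fitting procedure is an assumption made here, not the procedure used in the experiments.

```java
/** Illustrative sketch only; names and fitting procedure are assumptions. */
class ComplexityCheckSketch {

    /** Value of n * |D| * log2|D| for n attributes and d learning tuples. */
    static double estimate(int n, int d) {
        return n * (double) d * (Math.log(d) / Math.log(2.0));
    }

    /** Least-squares scale c minimising the sum of (times[i] - c * estimates[i])^2. */
    static double fitScale(double[] estimates, double[] times) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < estimates.length; i++) {
            numerator += estimates[i] * times[i];
            denominator += estimates[i] * estimates[i];
        }
        return numerator / denominator;
    }

    /** Relative deviation of a measured time from the scaled estimate. */
    static double relativeDeviation(double scale, double estimate, double time) {
        return Math.abs(time - scale * estimate) / time;
    }
}
```

If the measured times follow the estimate, the relative deviation stays small for all tested numbers of attributes; a deviation that grows with n indicates that the estimate no longer describes the measurements.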

Computational complexity of sequential covering
algorithm

From the analysis of the sequential covering algorithm we conclude that its computational complexity is determined by the product of the number $k$ of possible values of the class attribute (the number of iterations of the outer loop) and the computational complexity of the procedure Mine_one_rule (D, Att_vals, c) executed inside each iteration.

Procedure Mine_one_rule (D, Att_vals, c) includes the execution of $n$ iterations. In each iteration, for a certain attribute $A_i$, we calculate the measure for each of the $v_i$ values of the attribute. That is, the inner body of the loop in procedure Mine_one_rule (D, Att_vals, c) is executed $\sum_{i=1}^{n} v_i$ times. The measure is computed as a result of 4 SQL queries with complexity $O(\log |D|)$ (according to the MySQL 5.0 documentation). That is, procedure Mine_one_rule (D, Att_vals, c) has computational complexity

$$O\Bigl(\log |D| \sum_{i=1}^{n} v_i\Bigr).$$

Summarizing, the sequential covering algorithm has complexity of the order

$$O\Bigl(k \log |D| \sum_{i=1}^{n} v_i\Bigr). \qquad (2)$$
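
The loop structure behind this count can be sketched in Java as follows. The sketch only counts how many times the measure would be evaluated: one outer pass per class attribute value and, inside Mine_one_rule, one evaluation per value of each attribute. It does not mine rules or run SQL queries, and the class and method names are hypothetical.

```java
/** Illustrative sketch only; it counts measure evaluations, nothing more. */
class SequentialCoveringCostSketch {

    /**
     * @param classValueCount    number of possible values of the class attribute
     * @param valuesPerAttribute number of distinct values of each attribute
     * @return total number of measure evaluations
     */
    static long countMeasureEvaluations(int classValueCount, int[] valuesPerAttribute) {
        long evaluations = 0;
        // Outer loop: one rule-mining pass per class attribute value.
        for (int c = 0; c < classValueCount; c++) {
            // Mine_one_rule: one iteration per attribute ...
            for (int valueCount : valuesPerAttribute) {
                // ... and one measure evaluation per value of that attribute.
                for (int v = 0; v < valueCount; v++) {
                    evaluations++;
                }
            }
        }
        return evaluations;
    }
}
```

With 3 class values and attributes having 2, 4 and 5 values, the measure is evaluated 3 × (2 + 4 + 5) = 33 times; multiplying by the cost of the queries behind one evaluation gives the overall order of complexity.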

Conclusions
So, even without considering the probabilistic nature of the majority of quantities and parameters, we observed complex qualitative behavior of disease models depending on parameters and controllers. At different values of these quantities we observed subclinical, acute, chronic or lethal forms of pathologic processes.

Given the complexity of the mathematical equations (nonlinear systems with delays), new powerful methods of exact parameter identification and qualitative analysis are required.

Fig. 1. Conceptual model of informational system of decision tree induction

From the point of view of theoretical medicine, uncertainties arising in models of diseases require the development of treatment schemes that are effective, take toxicity constraints into account, enable better quality of life, and are cost-beneficial.

In future work we intend to compare the behavior of pathologic processes using both deterministic and stochastic models and to extend such models to demographic processes.

In this work we considered the problem of developing and implementing decision tree induction and sequential covering methods based on information indices for the construction of a diagnostic classification algorithm.

While investigating the computational complexity of the decision tree induction algorithm in this example, we observed that:

– decision tree induction time based on information indices is well approximated by estimate (1) for a small number of attributes (in this case up to 15–16);

– as the number of attributes increases (in this example beyond 15–16), the decision tree induction time begins to deviate substantially from estimate (1), regardless of the choice of information measure;

– for a small number of attributes, the decision trees induced using either information gain or information gain ratio are identical; i.e., the information measure determining the splitting attribute does not affect the induced decision tree;

Fig. 2. Estimate of algorithm complexity based on information gain.

Fig. 3. Estimate of complexity of sequential covering algorithm.

Fig. 4. Estimate of complexity based on information gain ratio.


References
1. Mathematical modelling in immunology and medicine. – Proc. of the IFIP TC-7 Working Conf., Moscow, USSR, 5–11 July 1982, Ed. by G.I.Marchuk, L.N.Belykh. Amsterdam, New York, Oxford: North-Holland; 1983: 246.

2. Keener J, Sneyd J. Mathematical physiology. New York: Springer Verlag; 1998: 149.

3. Murray JM. Mathematical biology. New York: Springer-Verlag; 1989: 214.

4. Martsenyuk VP. On the problem of chemotherapy scheme search based on control theory. J Automation Information Sci 2003; 35 (4): 64–69.

5. Martsenyuk VP. On Hopf bifurcation and periodic solutions in G.I. Marchuk model of immune protection. J Automation Information Sci 2003; 35 (8): 154–157.

6. Marzeniuk VP. Taking into account delay in the problem of immune protection of organism. Nonlinear Analysis: Real World Applications 2001; 2 (4): 483–496.

7. Nakonechnyi AG, Martsenyuk VP. Controllability problems for differential gompertzian dynamic equations. Cybernetics Systems Analysis 2004; 40 (2): 252–259.

8. Martsenyuk VP. On stability of immune protection model with regard for damage of target organ: the degenerate Liapunov functionals method. Cybernetics Systems Analysis 2004; 40 (1): 126–136.

9. Marzeniuk VP. Qualitative analysis of human cells dynamics: stability, periodicity, bifurcations, control problems. Adv Math Res 2003; 1 (5): 137–200.

10. Khusainov DYa, Martsenyuk VP. Double-ended estimates for solutions of linear systems with delay. Dop. NAN Ukr 1996; 8: 8–13.

11. Han J, Kamber M. Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 1st edition; 2001: 312.

12. Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning, Springer, New York, 1st edition; 2001: 125.

13. Ordonez C. Comparing association rules and decision trees for disease prediction. In Proc. ACM HIKM Workshop 2006: 17–24.

14. Ordonez C. Integrating K-means clustering with a relational DBMS using SQL, IEEE Transactions on Knowledge and Data Engineering (TKDE) 2006; 18(2): 188–201.

15. Quinlan JR. Induction of decision trees. Machine Learning 1986; 1: 81–106.

16. Quinlan JR. C4.5: Programs for machine learning. Morgan Kaufmann; 1993: 205.

17. Breiman L, Friedman J, Olshen R, Stone C. Classification and Regression Trees. Wadsworth International Group; 1984: 124.

18. Martsenyuk VP, Semenets AV. Medical Informatics. Developer and Expert Systems, Ternopil, Ukrmedknyha, 2004: 222.

Received: 2014.05.07

– the computational complexity of the sequential covering algorithm is well approximated by (2). This estimate was checked by varying the number of attributes as well as the number of learning tuples.

A prospect for further investigation is comparative performance analysis depending on the size of the set of learning tuples.
