Journal of Software Engineering Research and Development, 2021, 9:6, doi: 10.5753/jserd.2021.1094. This work is licensed under a Creative Commons Attribution 4.0 International License.

EVEREST: An Automatic Model-Based Testing Tool for Asynchronous Reactive Systems*

Adilson Luiz Bonifacio, Universidade Estadual de Londrina, bonifacio@uel.br
Camila Sonoda Gomes, Universidade Estadual de Londrina, camilasonoda@uel.br

Abstract. Reactive systems are characterized by their interaction with the environment, where the exchange of input and output stimuli usually occurs asynchronously. Systems of this nature generally require a rigorous testing activity during the development process. Model-based testing has therefore been successfully applied to asynchronous reactive systems using the Input/Output Labeled Transition System (IOLTS) as the underlying formalism. In this work we present a reactive testing tool that checks conformance, generates test suites, and runs test cases over IOLTS models. Our tool can check whether the behavior of an implementation under test (IUT) complies with the behavior of its respective specification. We have implemented two conformance relations in our tool: the classical ioco relation, and a conformance relation based on regular languages. The tool also provides test suite generation in a black-box testing setting for finding faults in IUTs according to a specific fault domain. In addition, we describe case studies that probe the tool's functionalities and give a comparative analysis. Finally, we offer practical experiments that evaluate the performance of our tool in several scenarios.

Keywords: Model-based testing, conformance checking, test generation, reactive systems, automatic tool

1 Introduction

Several real-world systems are ruled by reactive behaviors, constantly interacting with the environment by receiving input stimuli and producing outputs in response. Systems of this nature are, in general, also critical, thus requiring precise and automatic support in the development process. Model-based testing methods and their respective tools have been largely applied in the testing activity when developing such systems. The Input/Output Labeled Transition System (IOLTS) (Tretmans, 2008) has been commonly employed as the formalism for modeling and testing asynchronous reactive systems. An IOLTS can specify the desired behaviors of an implementation candidate, and the testing task can then be applied to find faults in it.

One important issue of model-based testing is conformance checking, where we verify whether a given Implementation Under Test (IUT) complies with its corresponding specification according to a certain fault model. Here we treat the classical notion of Input/Output Conformance Testing (ioco) (Tretmans, 2008) and the more recent conformance relation based on regular languages (Bonifacio and Moura, 2019) to define fault models. Test generation is also an important task of model-based testing, especially when generating test cases for reactive systems in a black-box setting. In this work, we present an automatic tool, named Everest¹ (Gomes and Bonifacio, 2019), that can check conformance between a given IUT and its respective specification. Our tool can also generate test suites based on specifications modeled by IOLTSs and enable black-box testing over IUTs.

*Supported by CAPES.
¹conformancE Verification on tEsting ReactivE SysTems

We show that Everest has a wider range of applications when compared to other testing tools, since it implements not only the classical ioco relation but also the more recent language-based conformance checking. We also describe real-world scenarios that relate both approaches, where the language-based conformance method has been able to find faults that cannot be detected using the ioco relation. Further, experiments are performed to evaluate our tool when generating and running test suites in a black-box scenario, and also to compare Everest against a well-known tool (Belinfante, 2010a) from the literature w.r.t. the conformance checking task.

The remainder of this paper is organized as follows. We comment on related work in Section 2. Section 3 describes the conformance checking approaches and the test suite generation method. In Section 4 we discuss important aspects comparing Everest to another tool from the literature and also present a real-world case study. Practical experiments on conformance checking and test suite generation are given in Section 5. Section 6 offers some concluding remarks and future directions.

2 Related Works

Reactive systems have been properly specified by IOLTS models to describe their syntax and semantics. Hence model-based testing techniques and practical tools have been applied in testing activities to support system development using IOLTSs. Several works have studied aspects related to IOLTS-based testing, such as test generation, conformance relations, and their checking methods. Here we survey some works that are more closely related to our testing tool and its features.

The ioco relation has been proposed by Tretmans (2008) for IOLTS models, where IUTs are treated as black boxes, i.e., the tester, seen as an artificial environment that drives the testing activity, has no access to the internal structure of the IUTs. However, some restrictions must be guaranteed over the specification, IUT, and tester models, such as input-completeness and output-determinism. Further, the algorithms therein are more theoretical and may lead to infinite test suites, making it difficult to devise solutions for practical applications.

An ioco-based testing theory has also been proposed by de Vries (2001) to obtain e-complete test suites. This approach focuses on specific test purposes that share particular properties related to certain testing goals. Only observable behaviors based on specific criteria are considered when testing black-box IUTs, so the test purposes somewhat limit the fault coverage spectrum, e.g., producing inconclusive verdicts. The test generation method also produces large, even infinite, test suites, thus requiring test selection criteria to avoid this problem.

Simão and Petrenko (2014) have described an approach to generate finite ioco-complete test suites for a class of IOLTS models. However, their approach imposes a number of restrictions on the models. Test purposes must be single-input and output-complete, and specifications and IUTs must be input-complete, progressive, and initially connected. So the class of IOLTS models that can be tested is very restricted according to their fault model.

Roehm et al.
(2016) have introduced a conformance test based on safety properties. Despite being a weaker relation than trace-inclusion conformance, it allows tuning a trade-off between accuracy and computational complexity when checking conformance. Their approach searches for counterexamples instead of verifying the whole system. However, this approach and the previous ones have a more theoretical leaning, and we are not aware of practical tools implementing their algorithms.

A more recent work has been proposed by Bonifacio and Moura (2019), where few restrictions are considered and finite sets of test purposes can be generated in practical situations. In some rare cases their algorithm may lead to exponential-sized testers, but the approach allows for a wider class of IOLTS models, and a low-degree polynomial-time algorithm is devised for efficiently testing ioco-conformance in practical applications. In this work, we have implemented this more general and recent approach (Bonifacio and Moura, 2019) with the language-based and ioco conformance relations, as well as test suite generation for white-box and black-box scenarios.

In the literature we have found JTorx (Belinfante, 2014, 2010b), the testing tool most closely related to ours, which implements the ioco relation and the uioco variation for underspecified models. TGV (Mark Utting, 2007; Calamé, 2005; Jard and Jéron, 2005) is also a testing tool designed for checking ioco-conformance, similarly to Testor (Marsso et al., 2018), an on-the-fly test case generation tool. However, the test generation methods of TGV and Testor are only sound, i.e., they are not exhaustive, so we cannot obtain complete test suites. Further, the soundness property of the generated test suites is only guaranteed by both tools over specific test purposes.

Although several other tools have been proposed for model-based testing, many of them move away from the scope of our work. For instance, some tools and approaches implement variations of the ioco theory, e.g. the rioco and sioco relations, such as STG (Symbolic Test Generator) (Clarke et al., 2002), TorXakis (Mostowski et al., 2009), and UPPAAL-TRON (Larsen et al., 2005), which deal with symbolic and timed models.

3 A Model-Based Testing Method

Asynchronous reactive systems are commonly specified by IOLTS models, a variation of Labeled Transition Systems (LTSs) (Tretmans, 1993) that partitions the labels into inputs and outputs.

Definition 1 An Input/Output Labeled Transition System (IOLTS) is a tuple S = (S, s0, LI, LU, T) where:
• S is a finite set of states;
• s0 ∈ S is the initial state;
• LI is a set of input labels;
• LU is a set of output labels;
• L = LI ∪ LU and LI ∩ LU = ∅;
• T ⊆ S × (L ∪ {τ}) × S is a finite set of transitions, where the internal action τ ∉ L; and
• (S, s0, L, T) is the underlying LTS associated with S.

We indicate a transition by (s, l, r) ∈ T, where s ∈ S is the source state, r ∈ S is the target state, and l ∈ (L ∪ {τ}) is the label. A transition (s, τ, r) ∈ T indicates an internal action, which means that an external observer cannot see the movement from s to r in the model.

An IOLTS may also have quiescent states. A state s is quiescent if no output x ∈ LU and no internal action τ is defined on it (Tretmans, 2008). When a state s is quiescent, a transition (s, δ, s) is added to T, where δ ∉ Lτ. Note that L ∪ {τ} is denoted by Lτ to ease the notation.
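To make Definition 1 and the quiescence closure concrete, the following is a minimal sketch in Python; the class, constants, and method names are ours, chosen for illustration only, and do not reflect Everest's internal API.

```python
from dataclasses import dataclass, field

TAU, DELTA = "tau", "delta"   # internal action and the quiescence label

@dataclass
class IOLTS:
    states: set
    s0: object                # initial state
    LI: set                   # input labels
    LU: set                   # output labels
    T: set = field(default_factory=set)   # transitions (src, label, dst)

    def add_quiescence(self):
        """Add a (s, delta, s) self-loop to every quiescent state, i.e.,
        every state with no outgoing output action and no internal action."""
        for s in self.states:
            quiet = not any(src == s and lbl in self.LU | {TAU}
                            for (src, lbl, _) in self.T)
            if quiet:
                self.T.add((s, DELTA, s))
```

The δ-labeled self-loops that appear in the figures later in the paper are produced by exactly this kind of closure over the quiescent states.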
We also note that in a real black-box testing scenario, where an IUT sends messages to the tester and receives back responses, quiescence indicates that the IUT can no longer respond to the tester, has timed out, or is slow (Bonifacio and Moura, 2019). In what follows we define the semantics of IOLTS/LTS models, but first we introduce the notion of paths.

Definition 2 Let S = (S, s0, L, T) be an LTS and p, q ∈ S. Let σ = l1 · · · ln be a word in Lτ⋆. We say that σ is a path from p to q in S if there are states ri ∈ S and labels li ∈ Lτ, 1 ≤ i ≤ n, such that (ri−1, li, ri) ∈ T, with r0 = p and rn = q. We say that α is an observable path from p to q in S if α is obtained by removing all internal actions τ from σ.

A path can also be denoted by s −σ→ s′, where the behavior σ ∈ Lτ⋆ starts in the state s ∈ S and reaches the state s′ ∈ S. An observable path σ from s to s′ is denoted by s =σ⇒ s′. We can also write s −σ→ or s =σ⇒ when the target state is not important. All paths starting at a state s are called paths of s. Now we give the semantics of IOLTS/LTS models.

Definition 3 Let S = (S, s0, L, T) be an LTS and s ∈ S:
1. the set of all paths from s is denoted by tr(s) = {σ | s −σ→} and the set of all observable paths from s is denoted by otr(s) = {σ | s =σ⇒};
2. the semantics of S is given by tr(s0) or tr(S), and the observable semantics of S is denoted by otr(s0) or otr(S).

The semantics of an IOLTS is defined by the semantics of its underlying LTS.

3.1 Checking Conformance on Reactive Systems

Given an IOLTS specification, a conformance checking task can determine whether an IUT complies with the corresponding specification according to a specific fault model. The classical ioco relation (Tretmans, 2008) establishes a notion of conformance where input stimuli are applied to both the IUT and the specification models in order to observe whether the outputs produced by the IUT are also defined in the specification (Bonifacio and Moura, 2019).

Definition 4 Let S = (S, s0, LI, LU, T) be a specification and let I = (Q, q0, LI, LU, R) be an IUT. We say that I ioco S if, and only if, out(q0 after σ) ⊆ out(s0 after σ) for all σ ∈ otr(S), where s after σ = {q | s =σ⇒ q} for every s ∈ S. Otherwise, we say that I ioco S does not hold.

A more recent conformance relation (Bonifacio and Moura, 2019) has also been proposed using regular languages. Given an IUT I, a specification S, and regular languages D and F, we say that I complies with S according to (D, F), i.e., I confD,F S, if, and only if, no undesirable behavior of F that is specified in S is observed in I, and every desirable behavior of D that is observed in I is also specified in S.

Definition 5 Let L = LI ∪ LU be an alphabet and let D, F ⊆ L⋆ be languages over L. Let S and I be IOLTS models over L. We have that I confD,F S if, and only if, (i) if σ ∈ otr(I) ∩ F then σ ∉ otr(S); and (ii) if σ ∈ otr(I) ∩ D then σ ∈ otr(S).

This notion, with a wider fault coverage, is characterized by Proposition 1, where desirable and undesirable behaviors are specified by regular languages.

Proposition 1 (Bonifacio and Moura, 2019) Let S and I be IOLTS models over an alphabet L = LI ∪ LU, and let D, F ⊆ L⋆ be regular languages over L. We have that I confD,F S if, and only if, otr(I) ∩ [(D ∩ otr(S)ᶜ) ∪ (F ∩ otr(S))] = ∅, where otr(S)ᶜ = L⋆ − otr(S) denotes the complement of otr(S).
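The emptiness test of Proposition 1 can be carried out with standard automata operations. The sketch below assumes complete deterministic FSAs encoded as transition dictionaries; the DFA class and the function conf_check are our own minimal representation, not Everest's implementation.

```python
from collections import deque
from itertools import product

class DFA:
    """A complete, deterministic FSA; delta maps (state, symbol) -> state."""
    def __init__(self, states, alphabet, delta, start, finals):
        self.states, self.alphabet = states, alphabet
        self.delta, self.start, self.finals = delta, start, finals

    def complement(self):
        # reverse the set of final states (requires a complete automaton)
        return DFA(self.states, self.alphabet, self.delta,
                   self.start, self.states - self.finals)

    def combine(self, other, accept):
        # classical product construction; `accept` decides the final states
        states = set(product(self.states, other.states))
        delta = {((p, q), a): (self.delta[p, a], other.delta[q, a])
                 for (p, q) in states for a in self.alphabet}
        finals = {(p, q) for (p, q) in states
                  if accept(p in self.finals, q in other.finals)}
        return DFA(states, self.alphabet, delta,
                   (self.start, other.start), finals)

    def intersect(self, other):
        return self.combine(other, lambda a, b: a and b)

    def union(self, other):
        return self.combine(other, lambda a, b: a or b)

    def is_empty(self):
        # BFS reachability: the language is empty iff no final state is reachable
        seen, frontier = {self.start}, deque([self.start])
        while frontier:
            s = frontier.popleft()
            if s in self.finals:
                return False
            for a in self.alphabet:
                t = self.delta[s, a]
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
        return True

def conf_check(A_I, A_S, A_D, A_F):
    """I conf_{D,F} S iff otr(I) ∩ [(D ∩ otr(S)ᶜ) ∪ (F ∩ otr(S))] = ∅."""
    suite = A_D.intersect(A_S.complement()).union(A_F.intersect(A_S))
    return A_I.intersect(suite).is_empty()
```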
Both notions of conformance can be related by the following lemma, where the language-based conformance relation given in Definition 5 captures the classical ioco relation given by Definition 4.

Lemma 1 (Bonifacio and Moura, 2019) Let I = (Q, q0, LI, LU, R) be an IUT and let S = (S, s0, LI, LU, T) be a specification. Then I ioco S if, and only if, I confD,F S when D = otr(S)LU and F = ∅.

Bonifacio and Moura (2019) have proposed the language-based conformance checking using the theory of automata (Sipser, 2006). LTS/IOLTS models are transformed into Finite State Automata (FSA), where the semantics of an FSA is given by the language it accepts. So R ⊆ L⋆ is regular if there exists an FSA M such that L(M) = R, where L is an alphabet. Therefore we can effectively construct automata AD and AF for the regular languages D and F such that D = L(AD) and F = L(AF). Now we define test cases and test suites in terms of languages.

Definition 6 Let L be a set of symbols. A test suite T over L is a language T ⊆ L⋆, where each σ ∈ T is a test case.

Since a test suite is a regular language, there is always an FSA A that accepts it, where the final states are fault states. Thus the set of undesirable behaviors, the so-called fault model of S (Bonifacio and Moura, 2019), is defined by the fault states.

Therefore we can obtain a complete test suite for an IOLTS specification S and a pair of languages (D, F) using Proposition 1. That is, we can detect the absence of desirable behaviors specified by D and the presence of undesirable behaviors specified by F using the test suite T = (D ∩ otr(S)ᶜ) ∪ (F ∩ otr(S)). An IUT I is then declared in compliance with a specification S if no test case of the test suite T is also a behavior of I (Bonifacio and Moura, 2019).

The testing process first obtains an automaton A1 induced by the IOLTS specification S. Since L(A1) = otr(S), we can effectively construct an FSA A2 such that L(A2) = L(AF) ∩ L(A1) = F ∩ otr(S). Also, consider the FSA B1 obtained from A1 by reversing its set of final states, that is, a state s is a final state in B1 if, and only if, s is not a final state in A1. Clearly, L(B1) = L(A1)ᶜ = otr(S)ᶜ. We can now get an FSA B2 such that L(B2) = L(AD) ∩ L(B1) = D ∩ otr(S)ᶜ. Since A2 and B2 are FSAs, we can construct an FSA C such that L(C) = L(A2) ∪ L(B2) = T. We conclude that when D and F are regular languages and S is a deterministic specification, a complete FSA T can be constructed such that L(T) = T.

Next we state that checking the language-based conformance relation has polynomial time complexity.

Proposition 2 (Bonifacio and Moura, 2019) Let S and I be the deterministic specification and implementation IOLTSs over L with nS and nI states, respectively, and let |L| = nL. Let AD and AF be deterministic FSAs over L with nD and nF states, respectively, such that L(AD) = D and L(AF) = F. Then we can effectively construct a complete FSA T with (nS + 1)² nD nF states such that L(T) is a complete test suite for S and (D, F). Moreover, there is an algorithm with polynomial time complexity Θ(n²S nI nD nF nL) that effectively checks whether I confD,F S holds.

Now, using Lemma 1, we establish a relationship between the ioco and language-based relations in Theorem 1.

Theorem 1 (Bonifacio and Moura, 2019) Let S and I be deterministic IOLTSs over L with nS and nI states, respectively. Let L = LI ∪ LU and |L| = nL.
Then, we can effectively construct an algorithm with polynomial time complexity Θ(nS nI nL) that checks whether I ioco S holds.

3.2 Complete Test Suite Generation

In this work we also provide test suite generation in a black-box testing setting using the notion of test purposes (Tretmans, 2008). A Test Purpose (TP) is formally defined by an IOLTS with two special states, pass and fail, and, in practice, it represents an external tester that interacts with an IUT. Thus a fault model is composed of TPs that are derived from a given specification. To ease the notation, from now on we denote by IO(LI, LU) the class of all IOLTSs over L = LI ∪ LU.

Definition 7 Let LI and LU be the input and output alphabets, respectively, with L = LI ∪ LU. A Test Purpose (TP) over L is an IOLTS T ∈ IO(LU, LI) such that, for all σ ∈ L⋆, neither fail =σ⇒ pass nor pass =σ⇒ fail holds. A fault model over L is a finite set of TPs over L.

The test case generation proposed by Tretmans (2008), based on the ioco relation, imposes some restrictions on the formal models. All TPs must be acyclic, with finite runs, and input-enabled, since the tester cannot predict the outputs produced by a black-box IUT; therefore, all output actions that can be produced by the IUT must be enabled in the respective TP. Moreover, TPs must be output-deterministic, i.e., each state can send only one output symbol to the IUT, in order to avoid arbitrary and non-deterministic choices. In the pass and fail states only self-loop transitions are allowed, since verdicts are obtained in these states.

Definition 8 Let S ∈ IO(LI, LU). We say that S is output-deterministic if |out(s)| = 1 for all s ∈ S, and that S is input-enabled if inp(s) = LI for all s ∈ S, where out(s) and inp(s) give the outputs and inputs, respectively, defined at state s.

Hence all restrictions imposed by Tretmans (2008) are satisfied when a TP is input-enabled, output-deterministic, and acyclic except for the pass and fail states. However, a bound on the number of states of the IUTs must be imposed to keep the TPs acyclic. A fault model is then complete for a specification S and a class of IUTs when every IUT of the class that conforms to S passes all its TPs, and every IUT that does not conform fails some TP. Therefore we define a class of implementations, establishing an upper bound on the number of states of the IUTs, to guarantee the ioco completeness property when generating test suites.

Now we are in a position to construct a complete test suite using the notion of TPs. But first we generate a multigraph structure, as proposed by Bonifacio and Moura (2019). Given an IUT I and a specification S, let m be the bound on the number of states considered for the IUTs, and let n be the number of states of S. The multigraph must then have mn + 1 levels and, at each level, whenever a transition of S would give rise to a cycle, we create instead a transition onto the corresponding state at the next level of the multigraph. A fail state is also added, with new transitions from every state of the multigraph to fail labeled by every l ∈ LU that is not defined at that state.

Having an acyclic multigraph at hand, we can extract TPs using a simple breadth-first search from the initial state to fail. We can guarantee the input-enabledness property by adding the pass state to the TP and, for every output of LU and every state where that output is not defined, adding a transition to the pass state.
Self-loops labeled by each l ∈ LU are also added to the pass and fail states. The output-determinism property is guaranteed by adding a transition to pass from every state at which no input of LI is defined. Note that the designations of LI as inputs and LU as outputs always refer to the perspective of the IUT, as is common in the literature (Tretmans, 2008; Bonifacio and Moura, 2019).

The test run is then defined by the synchronous product between a TP T and an IUT I, denoted by T × I. The TP interacts with the IUT, producing outputs that are sent to I as inputs. Likewise, the IUT receives actions from the TP and produces outputs that are sent to T as inputs. So the output alphabet of T corresponds to LI, the input alphabet of the IUT, and the input alphabet of T corresponds to LU, the output alphabet of the IUT.

Definition 9 Let I = (SI, q0, LI, LU, TI) ∈ IO(LI, LU) be an implementation and let T = (ST, t0, LU, LI, TT) ∈ IO(LU, LI) be a TP. We say that I passes T if, for every σ ∈ L⋆ and every state q ∈ SI, we do not have (t0, q0) =σ⇒ (fail, q) in T × I.

Recall that a path can be denoted by q0 =σ⇒ q, where the behavior σ starts in the state q0 and reaches the state q. Let M be a fault model; we say that I passes M if I passes all TPs in M. Then, given an IOLTS S and a set IMP ⊆ IO(LI, LU)[m], we say that M is m-ioco-complete for S concerning IMP if, for every IUT I ∈ IMP, we have I ioco S if, and only if, I passes M.

The verdicts are obtained when TPs reach the special states. The fail verdict indicates a faulty behavior whereas the pass verdict denotes a desirable behavior. Further details can be found in Bonifacio and Moura (2019) and Gomes and Bonifacio (2019).

Finally, the next proposition determines a fault model composed of TPs obtained from a multigraph which, in turn, is constructed from the corresponding specification.

Proposition 3 Let S ∈ IO(LI, LU) be a deterministic IOLTS and m ≥ 1. Then there is a fault model M that is m-ioco-complete for S relative to IO(LI, LU)[m], the IOLTSs with at most m states, whose TPs are deterministic, output-deterministic, input-enabled, and acyclic except for self-loops on the pass and fail states.

4 A Testing Tool for Reactive Systems

Everest (Gomes and Bonifacio, 2019) has been developed to check conformance, generate test suites, and run tests over reactive systems specified by LTS/IOLTS models. We have organized the tool's architecture in four modules: configuration; ioco conformance; language-based conformance; and test generation & run. The configuration module allows us to set up the testing scenario, and the conformance checking modules yield testing verdicts. When an IUT does not conform to the specification, our tool yields the verdict along with the paths induced by the test cases that detected the corresponding faults. The test generation & run module enables the multigraph and test purpose generation, and also allows running test suites over the IUTs.

In this section we look over the conformance checking and test suite generation processes. First we present some general examples to compare the conformance checking processes of Everest and JTorx. Next we show how our test suite generation method using the language-based conformance stands out from the classical approach.
Finally, we describe a real-world case study of an Automatic Teller Machine (ATM) to explore some realistic scenarios, and then give a comparative analysis of the practical tools.

4.1 Conformance checking process

We apply some examples to explore characteristics of both Everest and JTorx when checking conformance. Let S be the specification depicted in Figure 1a and let R and Q be the IUTs depicted in Figures 1b and 1c, respectively, with LI = {a, b} and LU = {x}.

[Figure 1. IOLTS models: (a) specification S; (b) IUT R; (c) IUT Q.]

In the first checking run we have verified whether the IUT R conforms to the specification S. Our tool yielded a verdict of non-conformance and generated the test suite T1 = {b, aa, ba, aaa, ab, ax, abb, axb}. All test cases were induced by different paths that reach a fault and were extracted using a transition cover strategy over the specification. JTorx also yielded the same verdict for this first run, as expected, but it generated the test suite T2 = {b, ax, ab}. We can see that T2 ⊆ T1, i.e., JTorx has generated only one test case per fault, in contrast to Everest, which has produced several test cases using transition coverage. Hence Everest provides a wider coverage, which can be more useful in a fault mitigation process.

In a second scenario, we checked the IUT Q against the specification S. This time no fault was detected by either tool using the classical ioco relation. However, Everest could find a fault using the language-based conformance relation, where the set of desirable behaviors was specified by the regular language D = (a|b)∗ax and no undesirable behavior was defined, so F = ∅. The set D denotes behaviors induced by paths finishing with an input action a followed by an output x produced in response. A verdict of non-conformance was obtained by our tool, revealing a fault detected by the test suite T = {ababax, abaabax}. We remark that JTorx, using the classical ioco relation, was not able to detect this fault. So we note that Everest is more general in this sense and can be applied to a wider range of scenarios when compared to JTorx.

4.2 Everest test suite generation

We have seen that a conformance check is run over an IUT against a given specification to yield test verdicts. When faults are detected, an associated test suite is reported with the test cases that reveal them. In addition, Everest can also generate complete test suites relative to a given specification.

To illustrate the test suite generation process of Everest, we again take S as the specification depicted in Figure 1a. In the first step a directed acyclic multigraph is constructed from the specification, as described in Section 3. Figure 2 partially depicts the multigraph, with four states at each level since the specification S has four states (n = 4).

[Figure 2. A directed acyclic multigraph D for specification S.]

Every transition in the multigraph must go either to the next level or, within the same level, from left to right in the figure, in order to secure the acyclic property.
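A minimal sketch of this unfolding is given below, assuming the specification states are kept in a fixed left-to-right order; the function name and the representation are ours and do not mirror Everest's internals, and quiescence (δ) handling is elided.

```python
def build_multigraph(states, transitions, m, LU):
    """Unfold a specification into the acyclic multigraph of Section 3.2.

    states: specification states in a fixed left-to-right order;
    transitions: set of (src, label, dst); m: bound on IUT states.
    Returns the transitions of the multigraph, including the fail state."""
    n = len(states)
    order = {s: i for i, s in enumerate(states)}
    levels = m * n + 1
    edges = []
    for lvl in range(levels):
        for (s, a, t) in transitions:
            # forward transitions stay on the same level; transitions that
            # would close a cycle are redirected to the next level's copy
            nxt = lvl if order[t] > order[s] else lvl + 1
            if nxt < levels:
                edges.append(((s, lvl), a, (t, nxt)))
        # every output that is undefined at a state leads to fail
        for s in states:
            defined = {a for (p, a, _) in transitions if p == s}
            for x in set(LU) - defined:
                edges.append(((s, lvl), x, "fail"))
    return edges
```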
In this case we have considered IUTs with at most four states, i.e., the same number of states found in the specification (m = n = 4). Therefore the multigraph has mn + 1 = 17 levels. Figure 2 shows the first two levels and also the last two levels of the multigraph. Note that the fail state is replicated in order not to clutter the figure.

With the multigraph at hand, we can apply a breadth-first search algorithm to extract paths from the initial node s0,0 up to the fail state. We can take, for instance, the sequence α1 = aabbx. We see that α1 induces the path s0,0 → s1,0 → s3,0 → s0,1 → s3,1 → fail in the multigraph. From Proposition 3 we can then obtain a deterministic, acyclic, input-enabled, and output-deterministic test purpose T1 over α1, as depicted in Figure 3a.

[Figure 3. TPs from the multigraph of Figure 2: (a) TP T1 induced by aabbx; (b) TP T2 induced by aaax.]

Note that the input-enabledness property is guaranteed by adding a pass state and transitions to pass from states where an output is not defined. The construction is completed by adding self-loops to the pass and fail states labeled by all output actions. Regarding the output-determinism property, for every state at which no input action is defined, we also create a new transition from this state to the pass state labeled by every input action.

For the sake of exemplification we take α2 = aaax as a distinct sequence. In the same way we obtain the induced path over the multigraph and construct the corresponding deterministic, acyclic, input-enabled, and output-deterministic test purpose T2, as depicted in Figure 3b. Everest has indeed automatically constructed fifteen other test purposes based on paths induced by the set {α1, α2, x, aδ, bx, δx, aax, bbx, axδ, abδ, δbx, bδx, aabx, bbbx, aaδx} of sequences. From the TPs of Figure 3, Everest could generate the test suite T = {α1, α2, b, δ, bδ, bx, ab, aδ, aaδ, aaa, aab, aabδ, aaba, aaaa, aaab, aaaxδ, aaaxx, aabba, aabbb, aabbxδ, aabbxx}.

We then applied the test suite T to the IUT R, and a fault could be detected. By a simple inspection we see that all test cases that lead R from state q0 back to the same state q0 can detect this fault. Notice that the output x is produced at state q3 of R whereas x is not defined at state s3 of S. So Everest exhibits a verdict of non-conformance, which means that R does not pass the test suite, declaring that R ioco S does not hold.

4.3 A real-world case study

Now we present a real-world system to be put under test using the automatic tools. We specify functionalities of an Automatic Teller Machine (ATM) (Mark Utting, 2007; Naik and Tripathy, 2018) using an IOLTS model with the input stimuli LI = {ic, pin, acc, tra, sta, wd, amo} and the output responses LU = {cpi, bpi, mon, rec, ins, sho}. The intended meanings of the input actions are: ic denotes the action where the user inserts his/her card into the ATM; pin indicates that the pin code has been provided by the user; tra requires the transfer amount; acc indicates that a target account has been provided; sta requires an account statement; wd indicates that the user has requested a withdrawal; and amo denotes the amount provided for the operation.
Also, we give the meanings of the output alphabet: cpi says the pin code is correct; bpi says the provided pin is wrong; mon indicates the money has been released; rec indicates the receipt has been provided to the user; ins denotes an insufficient balance on the account; and sho indicates the statement has been shown to the user.

We model the withdrawal operation by the IOLTS A of Figure 4. Note that if the requested amount (amo) is greater than the available amount (ins), then the withdrawal cannot be performed and the process reaches state s3, where a new withdrawal operation can be requested again.

[Figure 4. ATM specification A.]

Some additional functionalities are specified by the IOLTS B of Figure 5. In this case we consider not only the withdrawal (wd) operation but also the transfer (tra) and statement (sta) operations.

[Figure 5. ATM specification B.]

Assume the IOLTS Z depicted in Figure 6 as an IUT that implements the withdrawal (wd) and transfer (tra) operations. We observe that if the requested amount (amo) in a withdrawal is greater than the available amount, then the IUT reaches the state s7, where the user can choose a new amount again.

[Figure 6. IUT Z.]

As a first testing scenario we check whether the IUT Z conforms to the specification A. We then run JTorx and Everest over these models to obtain conformance verdicts using the ioco relation. Both tools returned the same verdict, where Z complies with A. In a second round, we ran Everest using the language-based conformance relation and, this time, a fault could be detected in the IUT Z. The set of desirable behaviors was given by D = {ic pin cpi wd amo ins amo}, i.e., a sequence of actions where the account balance is not enough for the requested withdrawal and the user must provide a new value. Everest generated the test case ic → pin → cpi → wd → amo → ins → amo because the behavior specified in D is not an observable behavior of the specification model, but the IUT Z implements it.

In a second scenario we want to verify the reliability of the verdicts obtained by JTorx using the ioco relation. Recall that originally underspecified models (see Section 4.4) require some labor: self-loop transitions must be added to get all states completely specified, thus handing over an input-enabled model. Notice that the IUT Z is underspecified, so JTorx must change the model to guarantee the input-enabledness property of the IUT. After changing the model, we check whether Z ioco-conforms to the specification B using JTorx. It is easy to see that the original behavior of the IUT has been modified and, in this case, a fault is then detected by the test case ic → pin → cpi → sta.

We have also applied this second scenario to Everest using the ioco relation. In the opposite direction to JTorx, no fault was detected by Everest, since the faulty behavior ic → pin → cpi → sta is not specified in the IUT Z. We see that the detection of this fault by JTorx is, in fact, a false positive, due to an extra behavior that was added when changing the IUT Z to make it input-enabled.
We also remark that Everest can detect this same fault when checking ioco conformance over the same modified model. Note that ic is the only action defined at state s0 of the IUT Z. When JTorx turns Z into an input-enabled model, all input actions become enabled at all states, which contradicts the real functionality. For instance, the action amo, i.e., the amount value to be withdrawn, becomes enabled at state s0. However, if a transfer operation (tra) is chosen instead of a withdrawal (wd), the amount value to be withdrawn should not be enabled at this moment. Hence any change performed over the former model modifies the original behavior of the IUT, leading to a conformance checking verdict that is inaccurate relative to the real functionality of the ATM.

In the last scenario we take the IOLTS Y depicted in Figure 7 as a new IUT.

[Figure 7. IUT Y.]

The IUT Y differs from the specification A depicted in Figure 4 only in the transition from s4 to s5: where the specification has (s4, ?amo, s5), the IUT has (s4, !mon, s5). So the IUT allows a withdrawal operation without checking the balance (amo) before releasing the money (mon). By contrast, in the specification model, the balance is checked before releasing the money when the account balance is sufficient. The fault model was bounded at six states for the class of IUTs, and Everest generated eighty TPs based on the corresponding specification A. The generated test suite was then submitted to the IUT Y, and a fault verdict could be obtained by our tool through the path ic → pin → cpi → wd. For the sake of completeness we also applied this last scenario to JTorx, and a fault was likewise detected, by the test case ?ic, δ, ?pin, !bpi, ?pin, !cpi, ?wd, !mon.

4.4 A Comparative Analysis

Here we list some main aspects comparing Everest and JTorx. We have seen that both tools provide mechanisms to generate test suites, run test cases, and check ioco conformance. Everest also provides the more general conformance checking based on regular languages. Further, our tool allows complete test generation not only for the ioco relation but also for this more general conformance relation, with a wider range of possibilities to specify desirable and undesirable behaviors.

JTorx's test generation employs an exhaustive strategy, leading to the state space explosion problem and making the process infeasible in practice. In the opposite direction, Everest is more flexible and allows a complete test suite generation by setting the maximum number of states of the IUTs to be considered in the fault model.

JTorx also implements a random approach that chooses transitions to induce paths over the specification when generating test suites. Everest, however, only applies a random approach for the language-based conformance relation, when desirable and/or undesirable behaviors are not provided by the tester. In this case, the test run is reduced to the problem of checking isomorphism between the IUT and the specification model.

We also note that both tools implement an online testing approach when IUTs are provided together with the specification, but only Everest provides an offline test generation process using the notion of multigraph and test purposes.
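Either way, the verdict of a single test run follows Definition 9. Below is a minimal sketch of that check, assuming deterministic models encoded as dictionaries from (state, label) pairs to successor states; the representation and the function name are ours, for illustration only.

```python
from collections import deque

def passes(iut, tp, q0, t0):
    """Verdict of Definition 9: I passes T iff no state (fail, q) is
    reachable in the synchronous product T x I."""
    labels = {a for (_, a) in list(iut) + list(tp)}
    seen, frontier = {(t0, q0)}, deque([(t0, q0)])
    while frontier:
        t, q = frontier.popleft()
        if t == "fail":
            return False              # a faulty behavior was exercised
        for a in labels:
            if (t, a) in tp and (q, a) in iut:   # synchronous move
                nxt = (tp[t, a], iut[q, a])
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return True
```

An IUT passes a fault model when this check returns True for every TP in it.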
Regarding the conformance checking process, JTorx defines an online strategy where test cases are applied to the IUT right after they are generated. Everest follows an offline process where the whole test suite is generated first and then all test cases are applied to the IUT. However, we remark that Everest also has an alternative online process to check conformance, where each test case obtained from the fault model is applied to the IUT right after it is generated. Table 1 summarizes these aspects.

Table 1. Methods and Features

                                        JTorx   Everest
  Conformance checking
    ioco theory                           √        √
    Language-based                        X        √
  Generation
    Test suite generation                 √        √
    Test strategy online/offline          √        √
    Test purpose                          √        √
    Random approach                       √        √

We also probe some properties of the specification and IUT models, test verdicts, and testing strategies; see Table 2.

Table 2. Properties and Tools

                                        JTorx   Everest
  Properties
    Underspecified models                 √*       √
    Non-input-enabledness                 X        √
    Quiescence                            √        √
  Verdicts
    Test run                              √        √
    Conformance                           √        √
  Test mode
    White/black-box testing               √        √

  * But the internal structure of the models must be changed.

Some restrictions are naturally imposed over the models when checking ioco conformance. Underspecified models, for instance, are not allowed on the IUT side, and their internal structure must be changed to guarantee input-enabledness. The language-based conformance relation does not impose any such restriction, that is, the more general method can deal with underspecified IUTs and specification models. Therefore Everest can handle underspecified models with no change to the models when checking conformance, and also when generating test suites using the language-based relation. JTorx, on the other hand, must completely explore the model's structure to add new transitions that guarantee the input-enabledness property. Both tools can deal with quiescent models, where self-loops with δ actions are added at the quiescent states, and both give conformance verdicts and run test cases in similar ways.

5 Practical Evaluation

In this section we present the results of practical experiments run to evaluate the tools' performance. First, Subsection 5.1 provides experiments to compare the conformance checking methods of Everest and JTorx. Given an IUT and a specification, both tools can check whether the IUT conforms to the specification under the ioco relation. Secondly, Subsection 5.2 assays the additional feature of Everest of generating and running test suites. In this case, given a specification model, we can generate test suites for a certain class of IUTs and then de facto apply them to the IUTs.

The experiments are classified into different groups according to the parameters under evaluation. Each group of experiments represents a different scenario, where the specification and IUT models are varied to capture different situations of conformance checking, test suite generation, or test runs, e.g., models with a certain number of states and transitions. All experiments were performed using randomly generated models, both for specifications and IUTs, satisfying all required properties, if any, to avoid bias in the results. In some groups we have taken submachines of specification models as the basis to generate IUTs with a certain percentage of modification. We have organized all experiments by Research Questions (RQs) to get the desired analyses using the different groups of scenarios.
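Scenarios of this kind can be reproduced along the following lines. This sketch is ours (the paper does not publish the generation scripts), so the function names and details are assumptions about the procedure described above.

```python
import random

def random_iolts(n_states, LI, LU, n_trans, seed=None):
    """Randomly generate an IOLTS as a set of (src, label, dst) transitions."""
    rng = random.Random(seed)
    states = list(range(n_states))
    labels = list(LI | LU)
    return {(rng.choice(states), rng.choice(labels), rng.choice(states))
            for _ in range(n_trans)}

def mutate(spec_transitions, rate, n_states, labels, seed=None):
    """Modify a percentage of transitions (rate, e.g. 0.04 for 4%) to obtain
    non-conforming IUTs, changing the source, the target, or the label."""
    rng = random.Random(seed)
    trans = list(spec_transitions)
    k = max(1, round(rate * len(trans)))
    for i in rng.sample(range(len(trans)), k):
        s, a, t = trans[i]
        field = rng.choice("sat")
        if field == "s":
            s = rng.randrange(n_states)
        elif field == "a":
            a = rng.choice(list(labels))
        else:
            t = rng.randrange(n_states)
        trans[i] = (s, a, t)
    return set(trans)
```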
Our experiments have been performed on an Intel Core i5 1.8 GHz CPU with 8 GB of RAM, running Windows 10.

5.1 Conformance checking of Everest and JTorx Tools

Here we report on experiments that compare Everest and JTorx when checking conformance between an IUT and a given specification using their respective implementations of the ioco relation. A single conformance checking run is defined by a pair of models, an IUT and a specification, where the result can be positive or negative. A verdict is said to be positive (ioco-conformance) when the IUT complies with the specification, and negative (non-ioco-conformance) when the IUT does not comply with the specification, according to the ioco relation.

We evaluate several parameters related to the specifications and IUTs, such as the number of states and the number of input/output actions of the models. In addition, we consider experiments that derive verdicts of conformance and of non-conformance in separate scenarios. We remark that only input-enabled and deterministic models have been generated in our experiments, to comply with the restrictions imposed by JTorx.

Each group of experiments on checking conformance pairs each specification with ten IUT models. Each run is thus settled by a pair of models, one IUT against the corresponding specification, so a group of experiments with ten specifications and ten IUTs per specification amounts to one hundred runs.

Experiments with verdicts of ioco-conformance were run over IUT models obtained as submachines of their respective specifications, while IUT models with verdicts of non-ioco-conformance were randomly constructed by changing transitions of their corresponding specifications with a certain percentage of modification. Regarding IUT models with more states than the corresponding specifications, new states and new transitions were randomly added to the models.

To illustrate, let S be a specification model with 20 states and 120 transitions. In order to get positive verdicts we randomly take IUT models as submachines of S, choosing subsets of states of S and their respective transitions. When we want to guarantee negative verdicts for IUTs with 4% of modification w.r.t. their respective specification, we randomly choose 5 transitions to be modified, i.e., for each of these 5 transitions we change the source state, the target state, or the action symbol. We have decided to generate IUTs using the specification models as the basis, instead of completely randomized IUT models, because in practical situations developers make mistakes but, in general, they minimally implement the specified model; that is, real-world IUTs usually do not differ greatly from the corresponding specification.

Hence, in the first group of experiments the sizes of the alphabets are interchanged while we increase the number of states of the models, and in the second group we vary the number of states (and, consequently, transitions) of the specification and IUT models in a stress test. All processing times shown in the graphics are mean values over the processing times of all experiments in the group.

5.1.1 Reversing the size of input/output alphabets

In this first scenario we investigate the impact on conformance checking runs when the sizes of the input and output alphabets are inversely proportional.
We state the RQ as follows: "How does the size of the input (respectively, output) alphabet impact the processing time of conformance checking?". To answer this question we have run experiments where the sizes of the input and output alphabets are reversed between the models. First, we take a group of IOLTS models with 2 symbols in the input alphabet and 10 symbols in the output alphabet. In a second group of models we reverse the sizes of the alphabets, taking input alphabets with 10 symbols and output alphabets with only 2 symbols. We vary the number of states of the IUTs over 15, 25, and 35, and fix the specifications at 10 states, both for verdicts of conformance and of non-conformance.

From the results, we note only a small variation in the processing time when running experiments with verdicts of conformance, either when the input alphabet is larger than the output alphabet or when their sizes are reversed; see Figure 8. Our tool is only 2.56% faster when running models with 2 inputs and 10 outputs (Figure 8a) than when checking models with the alphabet sizes reversed, i.e., 10 inputs and 2 outputs (Figure 8b). Similarly, JTorx is only 3.51% faster when the alphabet sizes are reversed. However, we notice that Everest is faster than JTorx in both scenarios, both when the input alphabet is larger than the output alphabet and vice-versa.

[Figure 8. Reversing I/O alphabets with ioco verdicts: (a) 2 inputs, 10 outputs; (b) 10 inputs, 2 outputs.]

In contrast, regarding verdicts of non-conformance, we see an expressive impact on the verification time when running experiments with the same scenarios where the input and output alphabets are reversed in size. In this case, both tools have taken less processing time for models with 2 inputs and 10 outputs. Everest is 12.73% to 42.86% faster, according to the number of states of the IUTs, for models with 2 inputs and 10 outputs (Figure 9a) than when running models with 10 inputs and 2 outputs (Figure 9b). JTorx is around 200% to 352% faster, depending on the IUT size, for the same scenarios. We observe that the impact on the processing time is very expressive for JTorx on verdicts of non-conformance when the alphabet sizes are reversed.

[Figure 9. Reversing I/O alphabets with non-ioco verdicts: (a) 2 inputs, 10 outputs; (b) 10 inputs, 2 outputs.]

We remark that in practical applications we usually need more input actions than output actions to specify real-world systems. That is, input alphabets with a large number of actions can weigh down the performance of the JTorx tool. Further, notice that Everest outperforms JTorx in all these scenarios, as depicted in both figures.

5.1.2 Varying the number of states

We also performed experiments varying the number of states (and transitions) to evaluate the tools' scalability.
In this case the RQ is: "How does the number of states of specifications and IUTs impact the processing time of conformance checking?". We answer this question by running three groups of experiments: (i) specifications with 10 states and IUTs ranging from 20 to 200 states; (ii) specifications with 50 states and IUTs ranging from 60 to 200 states; and (iii) specifications with 100 states and IUTs ranging from 110 to 200 states. In all groups the number of states of the IUT models was increased in steps of 10.

In the experiments with verdicts of conformance, specifications with 10 states, and IUTs with up to 120 states, Everest attains a better performance than JTorx; JTorx is just slightly better when the IUT models have more than 120 states (Figure 10a). When running experiments with verdicts of conformance, specifications with 50 and 100 states, and groups of IUTs with up to 200 states, Everest has always outperformed JTorx (Figures 10b and 10c).

[Figure 10. Varying the number of states with ioco verdicts: (a) specification with 10 states; (b) specification with 50 states; (c) specification with 100 states.]

Now we turn to the experiments with verdicts of non-conformance and specifications with 10 and 50 states. We see from Figures 11a and 11b that Everest has outperformed JTorx for every group of IUTs. JTorx achieves a better performance only for the largest IUTs combined with specifications of 100 states (Figure 11c).

[Figure 11. Varying the number of states with non-ioco verdicts: (a) specification with 10 states; (b) specification with 50 states; (c) specification with 100 states.]

5.2 Everest Test Suite Generation

Now we evaluate our tool for test suite generation by running experiments using the more recent approach (Bonifacio and Moura, 2019), where multigraphs are first constructed and test purposes are then generated. We vary the number of states of the specification models and also the bound considered on the number of states of the IUTs. We construct IUTs that differ from their respective specifications by a certain percentage of modification over the transitions, in order to assess different scenarios.

We remark that the experiments on generating test suites were performed using solely Everest, for two main reasons: (i) JTorx implements an online strategy where an IUT is always required to run the test generation mechanism; and (ii) JTorx's test generation process finishes at the very first detected fault, so it cannot generate complete test suites.
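The TP extraction step at the heart of this pipeline is the breadth-first search described in Section 4.2. A minimal sketch is given below, reusing the edge list produced by the build_multigraph sketch above; the function name and the limit parameter are ours, for illustration.

```python
from collections import deque

def extract_sequences(edges, start, limit=1000):
    """Breadth-first extraction of label sequences leading to the fail state
    of the multigraph; each sequence induces one test purpose."""
    adj = {}
    for (s, a, t) in edges:
        adj.setdefault(s, []).append((a, t))
    found, frontier = [], deque([(start, ())])
    while frontier and len(found) < limit:
        s, path = frontier.popleft()
        for (a, t) in adj.get(s, []):
            if t == "fail":
                found.append(path + (a,))     # a fault-revealing sequence
            else:
                frontier.append((t, path + (a,)))
    return found
```

Since the multigraph is acyclic, the search always terminates; the limit parameter caps the number of extracted sequences, mirroring the fixed number of TPs used in the experiments below.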
In the first group of experiments we vary the number of states of the specification models together with the number of states considered for the IUTs; in the second group we generate test purposes over the multigraphs obtained in the first group; and in the third group we run the test suites extracted from the test purposes of the second group over IUTs generated by modifying the corresponding specification models by a certain percentage.

5.2.1 Multigraph generation step

We define the RQ for the multigraph generation step as follows: "What is the impact on the processing time when generating multigraphs?". To answer this question we vary the number of states of the specification models and also the bound m on the maximum number of states considered for the IUT models. We consider specifications with 5 to 35 states and construct the corresponding multigraphs to get fault models for IUTs with 5 to 55 states. The alphabets were fixed at 5 inputs and 5 outputs, and we increase the number of states by 10 for each group of IUTs. Transitions were randomly generated to ensure unbiased results.

Next we briefly describe the scenarios considered in the multigraph generation step: (i) specifications with 5 states and m from 5 to 55; (ii) specifications with 15 states and m from 15 to 55; (iii) specifications with 25 states and m from 25 to 55; and (iv) specifications with 35 states and m from 35 to 55.

Figure 12 shows that the processing time for generating multigraphs grows, in general, as the number of states of the specification and IUT models grows. We first notice that the median values lie near the middle of the boxes, which means that as the size of the models grows we observe a well-behaved growth of the processing time.

We see in Figure 12a that the multigraph construction for specifications with 5 states and m = 35 takes 0.038 seconds, whereas the construction for m = 55 takes 0.047 seconds, so the processing time rose by 23.68%. Similarly, the construction process for specifications with 15 states takes 0.186 seconds with m = 35 and 0.253 seconds with m = 55; in this case the processing time rose by 36.02%. Taking specifications with 25 states and m = 35, the time consumption of the multigraph construction is 0.428 seconds, and for m = 55 it takes 0.676 seconds, as shown in Figure 12b; the processing time rose by 57.94%. In the last group, we take specifications with 35 states and m = 35, resulting in a time consumption of 0.994 seconds, while it takes 1.867 seconds with m = 55; here the processing time rose by 87.82%.

[Figure 12. Multigraph generation: (a) specifications with 5 and 15 states; (b) specifications with 25 and 35 states.]

Notice that, with m = 35, the multigraph generation is about 26 times faster for specification models with 5 states than for specifications with 35 states. Likewise, the construction with
Therefore we can conclude that the performance of the multigraph generation decreases as the number of states on specifications and m increase. But the most important issue is that the processing time is not meaningfully affected, i.e., the processing time does not substantially increase as the number of states rises. 5.2.2 Test purpose generation process Now we turn into the TP generation step based on multi­ graphs that have been generated in the previous experiments. The associated RQ, in this case, is: “How the TP generation is impacted w.r.t. the processing time when we take multi­ graphs that have been generated by varying the number of states from the corresponding specifications and also vary­ ing the number of states on IUT models?”. Here we get multigraphs associated to specifications with 5 to 35 states, and vary m from 5 to 55. We fixed the number of TPs to be generated at 1000. Figure 13 shows that the test generation process takes much more time compared to the multigraph generation step. 20 25 30 35 40 45 5 15 25 35 45 55 max IUT states T im e ( s e c o n d s ) spec states 5 15 1000 TPs generation from multigraph of specification with 5 and 15 states (a) Specifications with 5 and 15 states 30 50 70 90 110 25 35 45 55 max IUT states T im e ( s e c o n d s ) spec states 25 35 1000 TPs generation from multigraph of specification with 25 and 35 states (b) Specifications with 25 and 35 states Figure 13. TP generation From Figure 13a we see that the processing time is more uniform for specifications with 5 states no matter we vary m. When the number of states grows Figure 13b shows that the processing time of the TP generation grows fast as m in­ creases. The processing time for specifications with 35 states and m = 35 takes 66.82 seconds whereas using m = 55 it takes 91.67 seconds. So we see that the rate rose by 37.19%. Considering specifications with 15 states and, respectively, An IOLTS Model­Based Testing Tool Bonifacio and Gomes 2020 m = 35 and m = 55, the rate rose by 33.75%, whereas for specifications with 25 states and, m = 35 and m = 55, respectively, the rate rose by 30.48%. 5.2.3 Running test suites In the last group of experiments we evaluate the processing time on running test suites. Here the RQ is given as follows: “What is the impact on the processing time when running test suites over IUTs with 1%, 2% and 4% of modification w.r.t. the specifications which were used to generate the cor­ responding multigraphs?”. To answer this question we have taken test suites from TPs that were generated for specifications with 15 and 25 states. We fixed m = n, that is, the number of states to be consid­ ered on IUT models is the same to the number of states in the specification models. Figure 14 shows the processing time according to the mod­ ification rate over the IUTs. The time consumption of the test 1% 2% 4% 15 20 25 15 20 25 15 20 25 80 90 100 110 120 max IUT states T im e ( se co n d s) max IUT states 15 25 1000 Tps generation from multigraph Figure 14. Test run run over IUTs with 1% of modification takes 83.03 seconds with m = 15 and 89.78 seconds with m = 25. Regarding IUTs with 2% of modification, the process takes 86.19 sec­ onds with m = 15 and 80.83 seconds with m = 25. Finally, for IUTs with 4% of modification, the test run takes 87.54 seconds with m = 15 and 77.83 seconds with m = 25. We see that the processing time of test runs over IUTs with m = 15 and 1% of modification is 5.15% faster than the test run over IUTs with 4% of modification. 
5.3 Threats to Validity

We list some aspects that may pose a threat to the validity of the experiments. First, we report substantial difficulty in obtaining the JTorx tool. Several libraries were missing and we did not have full access to the source code. We had access only to a binary, to which we could make some small amendments in order to adapt it and run it from the command line. Had we been able to compile and configure both tools, they could have been set up under exactly the same conditions, and the time consumption could have been measured more easily and precisely when running the experiments.

The computational resources on which the experiments were run may also be a threat. We ran all experiments on a general-purpose machine, so the results might be biased in some way. We remark, however, that both tools ran all experiments under the same conditions.

Another threat is related to the random generation of the models. Although we randomly generated all models in order to avoid biases in the process, we had to guarantee some properties for specific classes of experiments. For instance, in some groups of experiments we had to construct IUTs that were in conformance with their corresponding specification, while in other groups we had to guarantee a certain rate of modification over the IUTs to obtain verdicts of non-conformance. So the results might have been somehow biased by these extra checking tasks.

We also list as a threat the properties that must be guaranteed over the models due to restrictions imposed by JTorx. The sizes of the alphabets and the numbers of states and transitions of the specification and IUT models are modified from the original models to secure such properties. So we cannot make any claim about the behavioral similarity between these modified models and the original ones.

6 Conclusion

Conformance checking and test suite generation are important activities for improving the reliability of reactive systems under development. In this work we have presented an automatic testing tool for checking conformance and generating test suites for IOLTS models. We have implemented the classical ioco relation and the more general approach based on regular languages. The latter, and consequently the Everest tool, imposes few, if any, restrictions on the models and allows a wider range of fault models, described by regular languages, when checking conformance. Several works have dealt with the ioco theory and its variations; however, we are not aware of any other tool that implements a different notion of conformance, such as the language-based conformance. Further, our tool implements a complete black-box test suite generation using the notion of test purposes for certain classes of fault models.

We described some case studies to probe both tools and their functionalities in practice. From a comparative analysis we could observe that Everest provides a wider range of testing scenarios, since it was able to detect faults, using the language-based approach, that were not detected by JTorx using the ioco theory. The effectiveness of our test suite generation method was also evaluated in black-box scenarios. We also offered practical experiments of conformance checking to compare the performance of Everest against JTorx.
We can see that Everest outperforms JTorx in most scenarios, except for those where the structure of the IUT models is quite different from the corresponding specifications. Hence we remark that, although Everest implements a more general conformance relation, the time consumption of conformance-checking runs has not been impacted. We also observed from the results that Everest has a more stable behavior w.r.t. processing time, even for IUT models with quite different numbers of states. In addition, we performed experiments of test suite generation and test runs using the Everest tool. As seen in the experiments, our tool was able to handle specifications and implementation candidates with a reasonable number of states.

The main contribution of this work is our practical tool, which can check conformance based on different relations and can generate test suites in a black-box setting. Moreover, we have presented some case studies, a comparative analysis, and practical experiments to evaluate and compare our tool.

An extension of the current version of Everest is underway, with a new module allowing conformance checking, test suite generation, and test runs in batch mode, i.e., it will be able to automatically test several IUT models at once. As future directions, we intend to improve our strategies and algorithms to generate test suites and run test cases more efficiently.

References

Belinfante, A. (2010a). JTorX: A tool for on-line model-driven test derivation and execution. In Esparza, J. and Majumdar, R., editors, Tools and Algorithms for the Construction and Analysis of Systems, 16th International Conference, TACAS 2010, Lecture Notes in Computer Science, pages 266–270. Springer.

Belinfante, A. (2010b). JTorX: A tool for on-line model-driven test derivation and execution. In Esparza, J. and Majumdar, R., editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 266–270, Berlin, Heidelberg. Springer.

Belinfante, A. (2014). JTorX: Exploring Model-Based Testing. Centre for Telematics and Information Technology (CTIT), Netherlands. IPA Dissertation Series no. 2014-09.

Bonifacio, A. L. and Moura, A. V. (2019). Complete test suites for input/output systems. CoRR, abs/1902.10278. Accessed on: 2019-06.

Calamé, J. (2005). Specification-based test generation with TGV. Software Engineering Notes.

Clarke, D., Jéron, T., Rusu, V., and Zinovieva, E. (2002). STG: A symbolic test generation tool. In Katoen, J.-P. and Stevens, P., editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 470–475, Berlin, Heidelberg. Springer.

de Vries, R. (2001). Towards formal test purposes. In Tretmans, G. and Brinksma, H., editors, Formal Approaches to Testing of Software 2001 (FATES’01), volume NS-01-4 of BRICS Notes Series, pages 61–76, Aarhus, Denmark.

Gomes, C. S. and Bonifacio, A. L. (2019). Automatically checking conformance on asynchronous reactive systems. In The Fourteenth International Conference on Software Engineering Advances, pages 17–23.

Jard, C. and Jéron, T. (2005). TGV: theory, principles and algorithms. International Journal on Software Tools for Technology Transfer, 7(4):297–315. Accessed on: 2019-08.

Larsen, K. G., Mikucionis, M., Nielsen, B., and Skou, A. (2005). Testing real-time embedded software using UPPAAL-TRON: An industrial case study. In Proceedings of the 5th ACM International Conference on Embedded Software, EMSOFT ’05, pages 299–306. ACM.
Marsso, L., Mateescu, R., and Serwe, W. (2018). TESTOR: A modular tool for on-the-fly conformance test case generation. In Beyer, D. and Huisman, M., editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 211–228, Cham. Springer.

Mostowski, W., Poll, E., Schmaltz, J., Tretmans, J., and Wichers Schreur, R. (2009). Model-based testing of electronic passports. In Alpuente, M., Cook, B., and Joubert, C., editors, Formal Methods for Industrial Critical Systems, pages 207–209, Berlin, Heidelberg. Springer.

Naik, K. and Tripathy, P. (2018). Software Testing and Quality Assurance: Theory and Practice. Wiley Publishing, 2nd edition.

Roehm, H., Oehlerking, J., Woehrle, M., and Althoff, M. (2016). Reachset conformance testing of hybrid automata. In Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control, HSCC ’16, pages 277–286, New York, NY, USA. ACM.

Simão, A. d. S. and Petrenko, A. (2014). Generating complete and finite test suite for ioco: Is it possible? In Proceedings of the Ninth Workshop on Model-Based Testing, MBT 2014, Grenoble, France, 6 April 2014, pages 56–70. Accessed on: 2019-07.

Sipser, M. (2006). Introduction to the Theory of Computation. Course Technology, 2nd edition.

Tretmans, J. (1993). A formal approach to conformance testing. In Rafiq, O., editor, Protocol Test Systems, VI, Proceedings of the IFIP TC6/WG6.1 Sixth International Workshop on Protocol Test Systems, Pau, France, 28-30 September 1993, volume C-19 of IFIP Transactions, pages 257–276. North-Holland.

Tretmans, J. (2008). Model based testing with labelled transition systems. In Hierons, R. M., Bowen, J. P., and Harman, M., editors, Formal Methods and Testing, An Outcome of the FORTEST Network, Revised Selected Papers, volume 4949 of Lecture Notes in Computer Science, pages 1–38. Springer.

Utting, M. and Legeard, B. (2007). Practical Model-Based Testing: A Tools Approach. Elsevier, 1st edition.