Scalable Software Testing and Verification for Industrial-Scale Systems: The Challenges

Electronic Communications of the EASST, Volume 77 (2019)
Interactive Workshop on the Industrial Application of Verification and Testing, ETAPS 2019 Workshop (InterAVT 2019)
Guest Editors: Anila Mjeda, Stylianos Basagiannis, Goetz Botterweck
ECEASST Home Page: http://www.easst.org/eceasst/
ISSN 1863-2122

Anila Mjeda∗ and Goetz Botterweck†
Lero–The Irish Software Research Centre, University of Limerick, Limerick, Ireland
name.surname@lero.ie

∗ Supported, in part, by Science Foundation Ireland grant 13/RC/2094.
† Supported, in part, by Science Foundation Ireland grant 13/RC/2094.

Abstract: In this position paper, we argue that more collaborative research is needed to increase the use of research-led verification and testing techniques in industrial-scale projects. We focus on a) the practical applicability and scalability of verification and testing techniques in industrial projects, and b) their application to autonomous systems. We identify the challenges involved and bring forward some initial suggestions.

Keywords: testing; verification; industrial-scale systems

1 Introduction

Modern verification and testing techniques are highly relevant for industrial software-intensive systems. While recent technological trends increase the need for robust industrial-scale approaches, issues such as the education of practitioners, insufficient tools, and the specifics of the industrial environment [6] create barriers to the application of modern verification and testing techniques in industrial practice.

We direct our focus on two areas: the practical applicability and scalability of verification and testing techniques in realistic industrial projects, and the application of verification and testing to autonomous systems. In this position paper, we (1) argue that more collaborative research is needed to overcome such barriers and (2) identify research areas that show potential in this regard. We conclude the paper with an overview of the current challenges, aiming to sketch potential areas for collaborative future work.

2 Scalability and Practical Applicability

In this section, we focus on the scalability of verification and testing techniques and their applicability to industrial projects. In particular, we are looking at (1) model-based testing, (2) the integration of model checking and testing, and (3) metaheuristic approaches.

Model-based Testing (MBT). MBT approaches typically rely on a specification-based model of the system under test. The model is used to generate tests according to previously specified test generation and selection criteria. MBT is reportedly cost-effective; it provides a platform for repeatable and traceable results which link tests to the requirements (one of the key needs for certification); and it can be used to automatically generate tests that achieve coverage criteria (such as MC/DC). MBT approaches can be distinguished by the kind of model they employ (e.g., deterministic/non-deterministic, discrete/continuous/hybrid); the test generation technology (e.g., graph search algorithms) and test selection criteria (e.g., structural model coverage); the test execution options (e.g., MiL, SiL, HiL); and the test evaluation approach (e.g., signal-feature based) [23]. For a review of MBT see [29].
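To make the generation step concrete, the following minimal sketch (in Python; the toy state-machine model and all identifiers are illustrative assumptions, not taken from any cited tool) derives a transition-coverage test suite from a deterministic model via breadth-first graph search:

```python
from collections import deque

# Illustrative state-machine model of the system under test:
# (state, input) -> next_state. A real MBT model would be richer
# (guards, outputs, timing), but the generation idea is the same.
MODEL = {
    ("Idle", "start"): "Running",
    ("Running", "pause"): "Paused",
    ("Paused", "resume"): "Running",
    ("Running", "stop"): "Idle",
}

def shortest_input_sequence(target, initial="Idle"):
    """BFS over the model graph: shortest input sequence driving
    the machine from the initial state to `target`."""
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, inputs = frontier.popleft()
        if state == target:
            return inputs
        for (s, inp), nxt in MODEL.items():
            if s == state and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, inputs + [inp]))
    return None

def generate_transition_coverage_tests():
    """One test per transition (a simple structural selection
    criterion): reach the source state, then fire the input."""
    tests = []
    for (src, inp), nxt in MODEL.items():
        prefix = shortest_input_sequence(src)
        tests.append({"inputs": prefix + [inp], "expected_final_state": nxt})
    return tests

for test in generate_transition_coverage_tests():
    print(test)
```

Each generated test pairs an input sequence with the expected final state, which gives the per-test traceability back to a model element that certification-oriented processes require.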
Integration of Model Checking and Testing. Model checking (MC) 'reasons' in terms of the states a system can be in, with state-space explosion as a fundamental challenge. Mitigation strategies typically aim to (i) consider only a subset of all possible states (e.g., probabilistic MC) or (ii) raise the level of abstraction of the state representation (e.g., symbolic MC). Callahan et al. [4] and Engels et al. [8] were the first to suggest deriving tests from counterexamples found by a model checker, e.g., generating a test that drives the system towards a known property violation (see the survey in [13]). Such approaches can be broadly categorised into (i) ones where the test criteria/test objectives are mutated into their logical complement (also called trap properties [13]) and (ii) ones where the model under analysis is mutated [13] (e.g., by reversing the logic of transition guards in a statechart). There are a number of industrial evaluations of test generation via MC (e.g., [10]); nevertheless, it remains difficult to scale these approaches up to testing complex systems. Bounded model checking [3] and directed model checking [7] have been proposed as promising approaches in the field of test generation [12].
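As a concrete illustration of the trap-property idea, the sketch below (Python; a toy explicit-state reachability checker stands in for a real model checker, and all names are invented for this example) negates the coverage objective "reach state Paused" into the property "Paused is never reached"; the counterexample trace returned by the check is precisely the desired test:

```python
from collections import deque

MODEL = {  # same toy state machine as in the MBT sketch above
    ("Idle", "start"): "Running",
    ("Running", "pause"): "Paused",
    ("Paused", "resume"): "Running",
    ("Running", "stop"): "Idle",
}

def check_never_reaches(trap_state, initial="Idle"):
    """Explicit-state check of the trap property 'trap_state is never
    reached'. Returns (holds, counterexample_trace)."""
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, trace = frontier.popleft()
        if state == trap_state:
            return False, trace       # property violated: trace is the test
        for (s, inp), nxt in MODEL.items():
            if s == state and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, trace + [inp]))
    return True, None                 # property holds: objective unreachable

holds, test = check_never_reaches("Paused")
if not holds:
    print("Test covering state 'Paused':", test)   # e.g. ['start', 'pause']
```

A real model checker such as SPIN or NuSMV would take the trap property expressed in temporal logic and return the counterexample trace automatically; the sketch only makes the negation trick visible.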
Metaheuristic Search and Optimization. Metaheuristic approaches mitigate the scalability problem by translating testing or verification problems into (automated) search and optimization problems [5, 22, 17, 1]. Quite often the resulting optimization problems are too complex (they can be NP-complete or NP-hard) for optimal solutions to be computed in practice. Metaheuristics are a family of techniques (such as evolutionary algorithms and particle swarm optimization) that solve optimization problems via non-exhaustive search algorithms; while they provide no guarantee of finding the best solution, they return a good, nearly-optimal solution at a reasonable computational cost. For an overview of the field see [17].
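A minimal sketch of this translation (Python; the unit under test, the fitness function, and all parameters are invented for illustration) uses hill climbing with the classic branch-distance fitness from search-based testing to find inputs that cover a hard-to-reach branch:

```python
import random

def under_test(x, y):
    if x * x - y == 4242:          # the branch we want to cover
        return "rare branch"
    return "common branch"

def branch_distance(x, y):
    """Search-based-testing fitness: how 'far' the inputs are from
    satisfying the branch predicate (0 means the branch is taken)."""
    return abs(x * x - y - 4242)

def hill_climb(steps=50_000, seed=1):
    """Non-exhaustive local search: try a random neighbour and keep
    it whenever the fitness improves."""
    rng = random.Random(seed)
    best = (rng.randint(-100, 100), rng.randint(-100, 100))
    best_fit = branch_distance(*best)
    for _ in range(steps):
        if best_fit == 0:
            break
        cand = (best[0] + rng.randint(-10, 10), best[1] + rng.randint(-10, 10))
        fit = branch_distance(*cand)
        if fit < best_fit:
            best, best_fit = cand, fit
    return best, best_fit

inputs, fitness = hill_climb()
print(inputs, fitness, under_test(*inputs))
```

Evolutionary algorithms and particle swarm optimization replace this single-neighbour search with population-based exploration, but the fitness-driven principle is the same.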
3 Testing and Verification of Autonomous Systems

The correct behaviour of autonomous systems requires correct automated decisions taken in real time. The decision-making is typically driven by a machine learning technique such as neural networks, where the 'learning' phase mimics empirical learning based on a set of training data. These systems tend to be opaque to humans; e.g., a human typically cannot read or deduce the decision rules that have been learned by a fully trained neural network [19]. Classic approaches to testing such systems typically revolve around randomly selected test inputs and scenario simulations and can leave corner-case behaviours completely untested (e.g., autonomous driving in poor visibility).

Metamorphic techniques and new coverage metrics – Metamorphic testing of autonomous behaviour is pursued by a number of researchers, such as [24]. Pei et al. propose using neuron coverage (defined as the number of unique neurons that get activated for given inputs over the total number of neurons [25]) as a new testing metric applicable to neural networks. Tian et al. [28] use neuron coverage for guiding test generation and metamorphic relations to identify erroneous behaviours.
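The sketch below illustrates the neuron-coverage metric on a toy network (Python with NumPy; the two-layer network, random weights, and the 0.5 activation threshold are assumptions made for illustration, not the experimental setup of [25]):

```python
import numpy as np

# Neuron coverage in the sense of Pei et al. [25]: the fraction of
# neurons whose activation exceeds a threshold for at least one input
# in the test suite. Weights here are random; in practice they would
# come from a trained network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # layer 1: 4 -> 8
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # layer 2: 8 -> 3

def activations(x):
    """Forward pass returning the post-ReLU activations of each layer."""
    h1 = np.maximum(0.0, x @ W1 + b1)
    h2 = np.maximum(0.0, h1 @ W2 + b2)
    return [h1, h2]

def neuron_coverage(test_inputs, threshold=0.5):
    covered = None
    for x in test_inputs:
        fired = np.concatenate(activations(x)) > threshold
        covered = fired if covered is None else covered | fired
    return covered.sum() / covered.size

suite = [rng.normal(size=4) for _ in range(20)]
print(f"neuron coverage: {neuron_coverage(suite):.2%}")
```

Raising coverage then amounts to searching for inputs that fire the still-dormant neurons, which is the sense in which neuron coverage guides test generation in [28].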
Adversarial techniques – A number of techniques use adversarial examples, i.e., small distortions or noise introduced into the inputs, to detect possible erroneous behaviour in corner cases (e.g., image distortion) [11, 20].
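As a flavour of such techniques, the sketch below (Python with NumPy; a toy linear classifier and a fast-gradient-style perturbation, not a reimplementation of the methods in [11, 20]) nudges an input against the gradient of its predicted-class score and checks whether the prediction flips:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 3))     # toy linear classifier: 16 features -> 3 classes

def predict(x):
    return int(np.argmax(x @ W))

def perturb(x, true_class, eps):
    """Shift each feature by eps in the direction that lowers the score
    of the predicted class (for a linear model the gradient of that
    score is simply the corresponding column of W)."""
    grad = W[:, true_class]
    return x - eps * np.sign(grad)

x = rng.normal(size=16)
c = predict(x)
for eps in (0.05, 0.1, 0.2, 0.4, 0.8):   # grow the distortion until it bites
    if predict(perturb(x, c, eps)) != c:
        print(f"prediction flips at eps={eps}")
        break
else:
    print("no flip found up to eps=0.8")
```

Any flip found this way marks a corner-case failure candidate that randomly selected test inputs would be unlikely to surface.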
Verification – Katz et al. [18] propose using an SMT solver to verify safety properties of deep neural networks and are working to improve the scalability of their technique towards real-world systems.

4 High-Potential Industry-Academia Collaborations

A large and growing body of literature has reported on the need for more collaboration between industry and academia and has analysed the solutions needed to address it.

[Figure 1: Conflicting objectives. A plot of the scientific rigour/complexity of challenges against their relevance in practice, with regions for 'innovation with little impact', 'feasible, but trivial projects', 'easy, but not needed', and a hard-to-reach 'nirvana'; the tendency of academic research (1), the tendency of industry (3), and the target for high-potential collaborations (2).]

For example, Bertolino [2] analysed the discordance between the state of practice and the state of the art in testing and argued for the need for more empirical research in industrial software testing. Garousi et al. [15] conducted a systematic literature review on the challenges of industry-academia collaborations and concluded that different focus areas are the key impediment to better collaboration. Engström et al. [9] proposed a taxonomy for supporting industry-academia collaboration, and Gorschek et al. [16] proposed a model for making technology transfer happen in practice.

[Figure 2: Models of collaboration: (1) comparative evaluation of techniques T1 and T2; (2) integration of techniques T1, T2, T3; (3) application of a technique T1 to a use case, yielding results.]

Yet, this divide between the industry and academia viewpoints remains, and its current extent damages both parties. One could discuss whether there is an inherent conflict between addressing challenges that are attractive for academic research and those attractive for industry. Typically, academic researchers tend to be attracted by scientifically challenging, complex problems [14] (illustrated by region 1 in Figure 1); they can often use artificial examples when validating their (albeit complex) research and can often focus on generic problems without addressing domain constraints [26]. Industry, in contrast, tends to be interested in challenges that may be scientifically trivial but are tangibly relevant and feasible (e.g., improving the effectiveness and efficiency of testing), and is not particularly interested in scientifically challenging techniques that are too complex to implement in practice [14] (region 3 in Figure 1). Here, collaborative projects [21] could serve as a motivation to leave one's comfort zone and tackle problems that are both scientifically novel and relevant in practice (region 2 in Figure 1), while industry could benefit from access to novel techniques and innovation potential. The collaboration can follow a number of different models, such as (1) comparative evaluations of techniques, (2) integration of techniques, or (3) the application of novel techniques in a realistic setting (Figure 2). This calls for joint efforts throughout the stages of scientific collaboration (e.g., foundation, formulation, sustainment, conclusion [27]). In collaborations, those involved can learn from each other, boost innovation, and enrich their own knowledge by relating different world views.

5 Challenges and Conclusions

Scalable Testing and Verification. A close analysis of the state of the art identifies a gap in addressing the formal testing and verification of industrial-scale software systems. The majority of the proposed solutions rest on a number of assumptions about the system under test, which can prove too restrictive for real-world systems. Moreover, the sometimes different world views of researchers and industry, and practitioners' unfamiliarity with prototype tools proposed by researchers, create a real or perceived barrier to industrial application.

Even though there are some industrial evaluations of model checking for test generation (e.g., [10]), many issues remain when attempting to scale these approaches up to testing complex systems. At the core of these issues lie the fact that model checkers were not originally designed for test generation [12] and the state-explosion problem that typically accompanies model checking.

Scalability remains one of the biggest challenges, and the main conclusions to take on board are: (1) scalability analysis should be considered in the early stages of any project's design; (2) currently, the most direct route is to aim for combinations of testing and verification approaches that complement each other, i.e., mitigate each other's shortcomings; and (3) collaborate to understand each other's world view, to compare or combine techniques, and to apply and evaluate techniques in practice.

Testing and Verification of Autonomous Systems. The complex nature of autonomous systems translates into developing methods for testing the unknown and the unpredictable. Some promising emerging techniques include neuron coverage used as a classic black-box testing technique (testing the unknown) [25]; grey-box test generation guided by a neuron-coverage metric, with oracle generation via metamorphic testing techniques [28]; leveraging adversarial techniques to detect erroneous behaviour [11, 20]; and using solvers to verify (albeit not very rich) safety properties [18].

Collaborative Research. In this position paper, we argue that, to overcome barriers to the application of verification and testing techniques in industry, we need more collaborative research which is both scientifically novel and relevant in practice (Figure 1). This calls for joint efforts throughout the stages of scientific collaboration, which could follow cooperation models such as (1) comparative evaluations of techniques, (2) integration of techniques, or (3) the application of novel techniques in a realistic setting (Figure 2). Researchers could use this to avoid an over-emphasis on internal validity (e.g., experiments that are easy to run but irrelevant) and to mitigate the bias towards selecting evaluations that artificially confirm their techniques; all those involved can learn from each other, boost innovation, and enrich their own knowledge by relating different world views. Last but not least, many current challenges require interdisciplinary approaches and, hence, collaborations.

Bibliography

[1] S. Ali, L. C. Briand, H. Hemmati & R. K. Panesar-Walawege (2010): A systematic review of the application and empirical investigation of search-based test case generation. TSE 36(6), pp. 742–762.

[2] Antonia Bertolino (2004): The (im)maturity level of software testing. ACM SIGSOFT Software Engineering Notes 29(5), pp. 1–4.

[3] Armin Biere, Alessandro Cimatti, Edmund M. Clarke, Ofer Strichman & Yunshan Zhu (2003): Bounded Model Checking. Advances in Computers 58, pp. 117–148.

[4] John Callahan, Francis Schneider, Steve Easterbrook et al. (1996): Automated software testing using model-checking. In: Proc. of the 1996 SPIN workshop, 353.

[5] J. Clarke, J. J. Dolado, M. Harman, R. Hierons, B. Jones, M. Lumkin, B. Mitchell, Mancoridis et al. (2003): Reformulating software engineering as a search problem. IEE Proceedings - Software 150(3), pp. 161–175.

[6] J. A. Davis, M. Clark, D. Cofer, A. Fifarek, J. Hinchman, J. Hoffman, B. Hulbert, S. P. Miller & L. Wagner (2013): Study on the Barriers to the Industrial Adoption of Formal Methods, pp. 63–77. Springer, Berlin.

[7] Stefan Edelkamp, Alberto Lluch Lafuente & Stefan Leue (2001): Directed explicit model checking with HSF-SPIN. In: Proc. of the 8th SPIN workshop on Model Checking of Software, Springer, pp. 57–79.

[8] André Engels, Loe Feijs & Sjouke Mauw (1997): Test generation for intelligent networks using model checking. Tools and Algorithms for the Construction and Analysis of Systems, pp. 384–398.

[9] Emelie Engström, Kai Petersen, Nauman bin Ali & Elizabeth Bjarnason (2017): SERP-test: a taxonomy for supporting industry–academia communication. Software Quality Journal 25(4), pp. 1269–1305.

[10] Eduard P. Enoiu, A. Čaušević, T. J. Ostrand, E. J. Weyuker, D. Sundmark & P. Pettersson (2016): Automated test generation using model checking: an industrial evaluation. STTT 18(3), pp. 335–353.

[11] Ivan Evtimov, K. Eykholt, E. Fernandes, T. Kohno, Bo Li, A. Prakash, A. Rahmati & D. Song (2017): Robust physical-world attacks on machine learning models. arXiv preprint arXiv:1707.08945.

[12] Gordon Fraser, Franz Wotawa & Paul Ammann (2009): Issues in using model checkers for test case generation. Journal of Systems and Software 82(9), pp. 1403–1418.

[13] Gordon Fraser, Franz Wotawa & Paul E. Ammann (2009): Testing with model checkers: a survey. Software Testing, Verification and Reliability 19(3), pp. 215–261.

[14] Vahid Garousi, Michael Felderer, Marco Kuhrmann & Kadir Herkiloğlu (2017): What industry wants from academia in software testing? Hearing practitioners' opinions. In: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, ACM, pp. 65–69.

[15] Vahid Garousi, Kai Petersen & Baris Ozkan (2016): Challenges and best practices in industry-academia collaborations in software engineering: A systematic literature review. Information and Software Technology 79, pp. 106–127.

[16] Tony Gorschek, Per Garre, Stig Larsson & Claes Wohlin (2006): A model for technology transfer in practice. IEEE Software 23(6), pp. 88–95.

[17] Mark Harman (2007): The current state and future of search based software engineering. In: 2007 Future of Software Engineering, IEEE Computer Society, pp. 342–357.

[18] Guy Katz, Clark Barrett, David L. Dill, Kyle Julian & Mykel J. Kochenderfer (2017): Reluplex: An efficient SMT solver for verifying deep neural networks. In: CAV 2017, Springer, pp. 97–117.

[19] Alex Krizhevsky, Ilya Sutskever & Geoffrey E. Hinton (2012): ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105.

[20] Alexey Kurakin, Ian Goodfellow & Samy Bengio (2016): Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.

[21] Satoshi Masuda (2017): Software testing in industry and academia: A view of both sides in Japan. In: 2017 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), IEEE, pp. 40–41.

[22] Phil McMinn (2004): Search-based software test data generation: a survey. Software Testing, Verification and Reliability 14(2), pp. 105–156.

[23] Anila Mjeda (2013): Standard-compliant testing for safety-related automotive software. Ph.D. thesis.

[24] Anh Nguyen, Jason Yosinski & Jeff Clune (2015): Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proc. of CVPR 2015, pp. 427–436.

[25] Kexin Pei, Yinzhi Cao, Junfeng Yang & Suman Jana (2017): DeepXplore: Automated whitebox testing of deep learning systems. In: Proc. of the 26th Symposium on Operating Systems Principles, ACM, pp. 1–18.

[26] Rick Rabiser, Klaus Schmid, Martin Becker, Goetz Botterweck, Matthias Galster, Iris Groher & Danny Weyns (2018): A study and comparison of industrial vs. academic software product line research published at SPLC. In: Proceedings of the 22nd International Conference on Systems and Software Product Line - Volume 1, ACM, pp. 14–24.

[27] Diane H. Sonnenwald (2007): Scientific collaboration. Annual Review of Information Science and Technology 41(1), pp. 643–681.

[28] Yuchi Tian, Kexin Pei, Suman Jana & Baishakhi Ray (2018): DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In: Proc. of ICSE'18, ACM, pp. 303–314.

[29] Mark Utting, Alexander Pretschner & Bruno Legeard (2012): A taxonomy of model-based testing approaches. Software Testing, Verification and Reliability 22(5), pp. 297–312.