Journal of Software Engineering Research and Development, 2020, 8:5, doi: 10.5753/jserd.2020.546
This work is licensed under a Creative Commons Attribution 4.0 International License.

Software Operational Profile vs. Test Profile: Towards a better software testing strategy

Luiz Cavamura Júnior [Federal University of São Carlos | luiz_cavamura@ufscar.br]
Ricardo Morimoto [Federal University of São Carlos | rmmorimoto@gmail.com]
Sandra Fabbri [Federal University of São Carlos | sfabbri@ufscar.br]
Ana C. R. Paiva [School of Engineering, University of Porto & INESC TEC | apaiva@fe.up.pt]
Auri Marcelo Rizzo Vincenzi [Federal University of São Carlos | auri@ufscar.br]

Abstract
The Software Operational Profile (SOP) is a software specification based on how users use the software. This specification corresponds to a quantitative representation of the software that identifies its most used parts. As software reliability depends on the context in which users operate the software, the SOP is used in software reliability engineering. However, there is evidence of a misalignment between the tested parts of the software and the SOP. Therefore, this paper investigates a potential misalignment between the SOP and the tested software parts in order to obtain more evidence of this misalignment based on experimental data. We performed a set of Experimental Studies (EXS) to verify: a) whether there are significant variations in how users operate the software; b) whether there is a misalignment between the SOP and the tested software parts; c) whether failures occur in untested SOP parts in case of misalignment; and d) whether a test strategy based on the amplification of the existing test set with additional, automatically generated test data can contribute to reducing the misalignment between the SOP and untested software parts. We collected data from four software systems while users were operating them and analyzed these data to reach the goals of this work. The results show that there is significant variation in how users operate software and that, for the four systems studied, there is a misalignment between the SOP and the tested software parts. There is also indication of failures in the untested SOP parts. Although the aforementioned test strategy reduced the potential misalignment, it is not enough to avoid it, thus indicating the need for specific test strategies that use the SOP as a test criterion. These results indicate that the SOP is relevant not only to software reliability engineering but also to testing activities, regardless of the adopted testing strategy.

Keywords: Software Quality, Software Testing, Operational Profile, Test Profile

1 Introduction

Software users provide relevant data related to the many possible ways they explore a given software feature. We create software based on the expression of the creative nature of our intellect (Assesc, 2012). Using their previous professional experience, this same creative aspect allows software users to adapt to different ways of using the software due to changes in the process initially supported by the program (Sommerville, 1995). This characteristic makes software functionalities parameterizable to meet specific and particular needs, even if they are designed to meet business rules that are common to many organizations.

The Software Operational Profile (SOP) corresponds to the manner in which a given user operates the software.
The SOP may be quantitatively characterized by assigning a probabilistic distribution to the software operations, showing what users use the most in the software (Musa, 1993; Gittens et al., 2004; Sommerville, 1995). A given user may not reproduce the same failure identified by another one. The reason for this is that software can have many different operational profiles and experienced users can adapt how they operate the software. As such, software quality is dependent on its operational use (Cukic and Bastani, 1996).

A survey by Cukic and Bastani (1996) states that information about the SOP is considered either essential or relevant to questions related to activities inherent to software development. Examples of these questions are: "Which are the most used parts of the software?"; "How do users use the application?"; "What are the software usage patterns?"; and "How does test coverage correspond to the code that was indeed executed by users?". Additionally, Rincon (2011) analyzed a set of ten open-source software systems and, in only one of them, the available functional test set reached a code coverage close to 70%. Even if this level of code coverage is considered acceptable, there is a significant percentage of untested code, which may be related to features that are critical for the majority of software users. This fact highlights the possibility of a misalignment between the tested parts and the parts that users effectively use. Thus, there are indications of the relevance of the SOP in ensuring software quality and also of a possible misalignment between the SOP and the tested software parts (Rincon, 2011; Begel and Zimmermann, 2014). This misalignment can often lead to failures when operating the software.

The term misalignment refers to the potential dissonance between the tested parts of the software and the SOP, which corresponds to the software parts most used by users. Thus, it represents situations in which the SOP, or parts of it, may not have been previously executed by the software test suite, indicating that the adopted test strategy may not be aligned with the users' interests in terms of software functionality.

Therefore, this study investigates a potential misalignment between the tested software parts and the SOP. The research results, based on a set of Experimental Studies (EXS), provide the following contributions:

1. Evidence that there are significant variations in how users operate software, even when they perform the same operations, i.e., there are different software usage patterns;
2. Evidence of a possible misalignment between the SOP and software testing;
3. Evidence that there are faults concentrated on untested parts of the software;
4. Definition and introduction of the term "test profile";
5. Evidence that even when an automated test generator is used to extend an existing test set, the misalignment between the SOP and the tested parts of the software improves very little.
In addition, the related work section briefly presents the results obtained by a Systematic Literature Review (SLR), which we carried out before the execution of the experimental studies. These results show that, to the best of our knowledge, there is no previous study with the same purpose as this one (Cavamura Júnior et al., 2020). We adapted the methodology proposed by Mafra et al. (2006) to plan and perform the activities described in this paper.

The remainder of this paper is organized as follows: Section 2 presents concepts related to the definition of the SOP. Section 3 describes the methodology adopted for this study. Section 4 presents the related studies identified and selected by the SLR (Cavamura Júnior et al., 2020). Section 5 describes the results of the experimental studies. Section 6 presents some lessons learned from the results. Section 7 presents threats to validity. Lastly, Section 8 describes the conclusions and future work.

2 Software Operational Profile (SOP)

The SOP is a way to obtain a specification of how users operate software (Musa and Ehrlich, 1996; Sommerville, 1995). Musa (1993) proposed one of the most relevant approaches for SOP registration, defining the SOP as a quantitative characterization based on the way the software is operated. This characterization corresponds to the software operations, to which an occurrence probability is assigned. An operation corresponds to a task performed by the software, delimited by external factors related to the software implementation. Software operations can present different behavior and, consequently, provide different results. In this way, there are different possible execution paths, depending on the given input data. These different ways of execution are named execution types. Figure 1 presents an example of software operations and their respective execution types.

Figure 1. Concepts involved in the definition of the operational profile.

The input data that characterize an execution type form a data set named input state ("IS" in Figure 1). The input states, associated with execution types, form the software input space. As input states characterize the execution types of an operation, the input space can be partitioned by operations, associating an input state set with each operation, named operation domain. Thus, it is possible to assign an input domain to each software operation ("ID" in Figure 1) that determines how the software executes the operation; i.e., the input domain elements (input states) determine the execution type of an operation. Figure 1 shows: i) the input states, identified by "IS1, IS2, IS3, ..., ISn"; ii) the software input space; and iii) the input domain of each operation, identified by "IDop1, IDop2, ..., IDopn".

Although the operation set available in software is finite, the execution types correspond to a set with infinite elements, given that the input domain can be infinite. Assigning an occurrence probability to execution types is nevertheless possible, since we can partition the input domain into sub-domains. Each generated sub-domain corresponds to an execution category. These categories group the execution types whose different input states produce the same behavior in an operation. Figure 1 presents the execution categories, identified by "EC1, EC2, ..., ECn", which divide the input domain of each operation and group the execution types with the same behavior.
Figure 1 thus summarizes the relation between the concepts of operation, execution type, input state, input space, input domain, and execution category.

In Musa (1993, 1994), the author assigns an occurrence probability to the execution categories in order to obtain a quantitative characterization of the software corresponding to the operational profile. The data used to obtain the occurrence probabilities of an operation can come from log files generated by a previous version of the software or from similar software (Musa, 1993; Takagi et al., 2007). Developer expectations can also determine these probabilities (Takagi et al., 2007).

In the context of this study, the term granularity corresponds to the level of fragmentation (be it conceptual or structural) we use to assign an occurrence probability or execution frequency to the generated software fragments. With such frequencies, it is possible to identify the most used software parts when users are operating the software, i.e., the SOP.

In the object-oriented programming paradigm, subprograms correspond to the methods implemented in data structures called classes. Thus, the methods in this paradigm represent the actions assigned to the operations performed by the software. As the SOP is a software specification based on how users operate software (Musa and Ehrlich, 1996; Sommerville, 1995), showing the software parts most used by users, the SOP in the context of this paper corresponds to the execution frequency of the methods processed while the software is operated by users, thus indicating the most used software parts.

2.1 The SOP and the Software Quality

Pressman (2010) defines software quality as an effective process of creating a valuable product for those who produce it and for those who will use it. Thus, software quality can be subjective, in that it depends on the point of view of whoever analyzes the software's characteristics. From the user's point of view, for example, quality software is software that meets the user's needs and is easily operated (Falbo, 2005). From a developer's point of view, however, quality software is, e.g., software that demands less maintenance effort.

Software reliability corresponds to the probability of a software operation occurring without any failure in a specified period and in a specific environment (Musa, 1979; Cukic and Bastani, 1996). Reliability, like maintainability and efficiency (among others), is one of the attributes related to software quality, and it represents the user's point of view on software quality (Musa, 1979; Bittanti et al., 1988); moreover, it depends on the context in which the software is used.

Since the SOP represents the way software will be used by its users, and considering that software reliability depends on the context in which users operate the software, the SOP can support activities related to software reliability engineering. In this setting, the purpose of the SOP is to generate test data that reproduce the way the software is executed in its production environment, ensuring the validity of reliability indicators (Musa and Ehrlich, 1996).
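As a minimal formal sketch of the characterization used throughout this paper (our notation, not taken from Musa (1993)), the SOP at method granularity can be seen as a normalized frequency distribution over the n implemented methods:

```latex
% SOP as a probability distribution over methods (illustrative notation only).
% f(m_i) is the observed execution count of method m_i while users operate the software.
p(m_i) = \frac{f(m_i)}{\sum_{j=1}^{n} f(m_j)}, \qquad \sum_{i=1}^{n} p(m_i) = 1
```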
In the software reliability process, a usage model representing the SOP is created to design test cases and perform the test activity. The elements constituting the usage model correspond to the granularity adopted to determine the SOP, whose execution frequencies or occurrence probabilities identify the most used software parts.

In the literature, studies using models representing the SOP in their testing techniques have classified these techniques as statistical testing, statistical use testing, reliability testing, model-based testing, use-based testing and SOP-based testing (Poore et al., 2000; Kashyap, 2013; Sommerville, 2011; Pressman, 2010; Musa and Ehrlich, 1996).

It is worth noting that a fault which becomes apparent frequently during software operation is more significant for users than the remaining faults (Takagi et al., 2007), and a defect affecting reliability for one user may never be revealed to another who has a different work routine (Sommerville, 2011). The use of the SOP does not guarantee the detection of all faults, but it ensures that the most used software operations are tested (Ali-Shahid and Sulaiman, 2015).

2.2 Problems related to the use of SOP

Although the SOP can be obtained from log files recording events that occur in the operating software, from previous versions of the software, from similar software, and even from the developers' experience (Musa, 1993; Takagi et al., 2007), several problems related to the identification of the SOP are reported in the literature.

In this study, we observed that using an instrumented version of the software to collect, during its operation by users, the data needed to identify the SOP affects the performance of the software and generates a large volume of data. According to Namba et al. (2015), the effort to identify the SOP depends on the complexity of the software. Other kinds of problems are also reported in the literature. Thus, the difficulties and issues related to the SOP identified in the literature are relevant and will be addressed in possible test approaches defined according to the results presented in this paper. Table 1 summarizes the main challenges and problems identified.

3 Research Methodology

The results presented in this paper are part of a PhD project (Cavamura Júnior, 2017) that follows the methodology proposed by Mafra et al. (2006), whose steps were instantiated in the context of the research presented in this article. This methodology is an extension of the methodology proposed by Shull et al. (2001) for introducing software processes. The methodology proposed by Mafra et al. (2006) is shown in Figure 2.

We defined five research questions to guide the investigation in this paper:

• RQ1: Are there other studies with the same or similar goals whose results provide the contributions proposed in this paper?
• RQ2: Are there any relevant variations in how users operate software?
• RQ3: Is there a misalignment between the SOP and the tested software parts?
• RQ4: Given the misalignment between the SOP and the tested software parts, do failures occur in the untested SOP parts?
• RQ5: Given the misalignment between the SOP and the tested software parts, can a test strategy including an automated test data¹ generator contribute to reducing the misalignment?

¹ In the remainder of the paper, we use the term test data to refer to automatically generated inputs.
To answer RQ1 and considering the methodology presented in Figure 2, the "Secondary Study" step included a Systematic Mapping Study (SMS) and a Systematic Literature Review (SLR) to identify studies whose contributions were similar or equivalent to the research contributions reported in this article and, thus, to evaluate its originality. The results obtained from the SMS are available at http://lcvm.com.br/artigos/anexos/jserd2020/cap-3-rs-ms.pdf, and a detailed description of the SLR can be found in (Cavamura Júnior et al., 2020). We present a brief description of the main results of both the SMS and the SLR in Section 4.

Table 1. Problems related to SOP.
Reference | Year of publication | Reported problem
(Cukic and Bastani, 1996) | 1996 | Identifying the SOP is difficult because it requires predicting software usage.
(Leung, 1997) | 1997 | Estimation errors and SOP changes are inevitable when software is operated in a production environment.
(Shukla, 2009) | 2009 | Studies related to SOP focus on exploring software operations; the parameters of these operations are little explored.
(Sommerville, 2011) | 2011 | Software reliability depends on the context in which software will be used. Experienced users can constantly adapt their behavior regarding software usage.
(Namba et al., 2015) | 2015 | SOP identification requires a lot of effort, making this activity difficult depending on the complexity of the software.
(Fukutake et al., 2015) | 2015 | The probability of use decreases when the software usage model has multiple states.
(Bertolino et al., 2017) | 2017 | SOP-based testing can become saturated and lose effectiveness because it focuses only on the failures most likely to occur.

Figure 2. Adopted Research Methodology (extracted from Travassos et al. (2008)).

The "First Draft" stage comprised the planning of the experimental studies presented in this paper. We adopted the model proposed by the GQM technique (Basili et al., 2002) to guide the planning of this research; the instantiated model for the planning phase is presented in Table 2. The "Feasibility Study", "Observational Study" and "Case Study: Lifecycle" stages comprised the accomplishment of a set of EXS subdivided into four activities (AT) associated with the research questions, called EXS-AT1, EXS-AT2, EXS-AT3, and EXS-AT4. The purpose of each activity and the research questions associated with each one of them are summarized in Table 3.

To perform the EXS activities we instrumented four software systems, S1, S2, S3 and S4, to collect data that allowed us to identify the SOP for each individual user. Table 4 shows the characterization of the software used and associates each system with the EXS activities.
Table 2. Exploratory Study Planning.
Stage | Analyze | For the purpose of | Focus | Perspective | Context
1 (RQ1) | studies that addressed the use of the SOP | checking whether there is research with the same or similar purposes | answered based on a previous work | software test researchers | software application users
2 (RQ2, RQ3, RQ4, RQ5) | the SOP | (a) checking whether there are significant variations; (b) checking whether there is a misalignment; (c) showing the occurrence of failures; (d) checking whether the insertion of additional test data, generated automatically by EvoSuite, can contribute to reducing the misalignment | (a) the way software is operated by its users; (b) the SOP and the tested software parts; (c) the SOP parts not tested; (d) the SOP and the tested software parts | software test researchers | software application users

Table 3. Research Activities.
Activity | Purpose | Question
SMS/SLR | Evaluate research originality (Cavamura Júnior et al., 2020) | RQ1
EXS-AT1 | Check for relevant variations in how users operate the software | RQ2
EXS-AT2 | Find out, through the SOP and the software's test suite, whether there is a misalignment between the SOP and the tested parts of the software | RQ3
EXS-AT3 | Once the misalignment between the SOP and the tested parts of the software is confirmed, check whether there is any failure in the untested SOP parts | RQ4
EXS-AT4 | Check whether a test strategy, based on the amplification of the existing test set with additional automatically generated test data, can contribute to reducing the misalignment between the SOP and the tested parts of the software | RQ5

The "Feasibility Study" stage comprised the accomplishment of EXS-AT1. The "Observational Study" stage comprised the accomplishment of EXS-AT2, EXS-AT3, and EXS-AT4 based on operational profiles collected from S1 and S2. The "Case Study: Lifecycle" stage comprised the accomplishment of EXS-AT2, EXS-AT3, and EXS-AT4 again, but based on operational profiles collected from S3 and S4. The "Case Study: Industry" stage is in progress and its results will be published in future work.

Once the methodology was defined, this study was planned in two stages to provide answers to the research questions. The research questions associated with these stages are shown in the "Stage" column of Table 2.

• Stage 1: performing an SMS and an SLR;
• Stage 2: performing the EXS, composed of four activities: EXS-AT1, EXS-AT2, EXS-AT3, and EXS-AT4.

The focus of this paper is on Stage 2 of Table 2, i.e., the set of EXS we performed to obtain evidence of the possible misalignment between the SOP and the tested software parts. The other kinds of experiments were also carried out as part of the ongoing work (Cavamura Júnior, 2017). In Section 4, we present a brief description of the main findings of the SLR; the interested reader can find more information elsewhere (Cavamura Júnior et al., 2020). In Section 5, the EXS and their respective results are described.

4 Related Work

We conducted the SMS and the SLR (Stage 1 of Table 2) to provide the theoretical basis and evidence of the originality of this study. Both the SMS and the SLR processes consist of planning, conducting and results publishing phases (Nakagawa et al., 2017). A detailed description of the SMS and its results can be found at http://lcvm.com.br/artigos/anexos/jserd2020/cap-3-rs-ms.pdf, and a detailed description of the SLR can be found in (Cavamura Júnior et al., 2020).
We conducted the SMS to: i) verify how primary studies related to the SOP are distributed across software engineering areas; ii) acquire knowledge of the contributions provided by the use of the SOP in software engineering, focusing on the software quality field; and iii) check whether the use of the SOP in quality assurance activities has been a topic of interest to researchers.

The SMS found 4726 studies, of which we selected 182 for data extraction. The distribution of the primary studies across software engineering areas is shown in Figure 3.

Figure 3. Distribution of the studies in software engineering areas.

After analyzing the extracted data, we concluded that software quality is the area most explored by studies that used the SOP as a resource in their strategies, and most of these strategies are associated with software reliability. Although software quality is the most approached area, we found only a few studies related to software testing. This scenario evidences a gap in the software quality field, mainly in its subareas that are not associated with software reliability. Therefore, the results of the SMS motivated us to conduct the SLR, whose purpose was to identify, analyze and understand the studies whose contributions are similar or equivalent to the contributions of the research reported in this paper, i.e., the studies that used the SOP as an evaluation criterion to check whether there is a possible misalignment between the SOP and the tested software parts (Cavamura Júnior et al., 2020).

At the end of the SLR (Cavamura Júnior et al., 2020), as highlighted in Figure 4, we observed only three studies close to ours: Bertolino et al. (2017), Chen et al. (2001), and Amrita and Yadav (2015), briefly described next. Figure 4 shows the number of studies processed by the SLR. The interested reader may find additional information about the complete SLR protocol elsewhere (Cavamura Júnior et al., 2020).

Bertolino et al. (2017) mention that testing based on the operational profile can suffer saturation and loss of effectiveness, since it focuses on the most likely failures. Thus, to improve software reliability, testing should also focus on faults with a low probability of occurrence. In this context, Bertolino et al. (2017) present an adaptive and iterative software testing technique based on the SOP. In the first iteration, the authors select test cases following traditional operational-profile-based testing, i.e., test cases are randomly selected according to the occurrence probability of each partition of the input domain of the software under test. In each subsequent iteration, the technique: a) calculates the ideal number of test cases to be selected for each partition; and b) selects, prioritizes and executes these test cases.

Bertolino et al. (2017) compute a probability that represents how much testing a partition will contribute to program reliability and, based on this information, determine the optimal number of test cases for testing each partition. This calculation considers the failure rate and the occurrence probability of each partition, where the failure rate of a partition is the ratio between the number of failed test cases and the number of test cases assigned to the partition.
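In our notation (a sketch of the quantity described in the previous sentence, not a formula reproduced from Bertolino et al. (2017)), the failure rate of a partition D_i is:

```latex
% Failure rate of partition D_i, as described in the text (our notation).
\theta_i = \frac{\text{number of failed test cases in } D_i}{\text{number of test cases assigned to } D_i}
```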
The occurrence probability of each partition, in turn, is obtained from the SOP. To select and prioritize test cases, the frequency with which the program parts are exercised when running the tests is obtained from the previous iterations. As the focus of Bertolino et al. (2017)'s approach is to select test cases covering portions of the program that are poorly exercised, test cases associated with the uncovered parts of the software have high priority.

Software reliability can be determined by the time elapsed between detected failures. Chen et al. (2001)'s technique considers the context in which a test suite can overestimate software reliability when it is not able to detect new faults due to the use of an obsolete SOP. The more redundant the test cases are with respect to the covered code, the more overestimated the reliability of the software will be. Thus, the technique adjusts the time interval between failures when redundant test cases are run. Chen et al. (2001) identified the redundant test cases through coverage analysis during the execution of the tests.

According to Amrita and Yadav (2015), researchers have approached the selection of test cases based on the SOP, but the authors did not find much discussion about the infrequently used software parts. Amrita and Yadav (2015) propose a model that provides the flexibility to allocate test cases according to the priority defined by the SOP and by the experience of the testing team. Based on this information, Amrita and Yadav (2015)'s model selects test cases using fuzzy logic.

Figure 4. Studies processed by the SLR.

In summary, Bertolino et al. (2017) and Amrita and Yadav (2015) addressed the use of the SOP in the selection and prioritization of test cases, focusing on those software parts whose operation is infrequent. Chen et al. (2001) addressed the selection of test cases, using the SOP to identify redundant test cases and treat them in the software reliability process, thus obtaining more accurate reliability estimates. Nevertheless, the studies identified and processed by the SLR did not directly investigate whether there is a misalignment between the existing test suite and the SOP, which answers research question RQ1. We believe that selection and prioritization activities will not be productive if test cases are not aligned with the SOP.

5 Experimental Studies (EXS)

The studies by Begel and Zimmermann (2014) and Rincon (2011), briefly described in Section 1, provided initial evidence about the possible misalignment between the tested software parts and the SOP. We performed the EXS to obtain empirical data that, once analyzed, could provide answers to the research questions RQ2, RQ3, RQ4, and RQ5, thus resulting in more evidence, based on experimental data, about the possible misalignment between the tested software parts and the SOP. As described in Section 3, we defined four activities for the EXS, named EXS-AT1, EXS-AT2, EXS-AT3, and EXS-AT4. In order to perform these activities, we instrumented four software systems, S1, S2, S3 and S4, to collect data that allowed us to identify the SOP of each system during its operation by users. S1, S2, S3 and S4 were implemented under the object-oriented programming paradigm.
A characterization of the software systems used and their association with the EXS activities is presented in Table 4. During these activities, users had to perform tasks within a given period while operating S1, S2, S3, and S4, and we automatically collected data in order to obtain the operational profile of each system. In the following subsections, we describe the strategy adopted for data collection, the activities of the EXS, and their results.

5.1 Strategy for data collection

In each activity, we instrumented S1, S2, S3, and S4 to collect data during their operation by the users participating in the activity. For S1, S2, and S3 we adopted aspect-oriented programming (Ferrari et al., 2013; Laddad, 2009; Rocha, 2005), which allowed us to obtain information and manipulate specific software parts without modifying their implementation. For S4, we developed a monitoring tool using the Javassist framework, which allows the manipulation of Java bytecode; this feature allowed us to monitor the execution of S4 and collect information while participants were operating it. Although the aspect-oriented paradigm makes it possible to perform the instrumentation without modifying the source code of the software, it requires the created aspects to be compiled together with the software to be instrumented. Javassist was adopted for S4 to perform the instrumentation without having to recompile the software under instrumentation.

We defined the data collection strategy at the subprogram level: the developed tool and the instrumentation collect information about the execution of the methods of S1, S2, S3, and S4. From that information, we obtained the execution frequency of the methods processed during the execution of S1, S2, S3, and S4 in the activities.
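As an illustration of this collection strategy (a minimal sketch only; the aspects actually used for S1, S2 and S3 are not published here, the package name is a placeholder, and the weaving configuration is omitted), an AspectJ aspect can count method executions at the subprogram level and dump the frequencies when the application finishes:

```java
package br.example.monitor;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

// Hypothetical sketch: counts how many times each method of the monitored
// application runs, which is the raw data from which the SOP (method
// execution frequencies) is derived. Requires AspectJ compile-time or
// load-time weaving; "br.example.app" is a placeholder package.
@Aspect
public class MethodFrequencyAspect {

    private static final Map<String, LongAdder> FREQUENCIES = new ConcurrentHashMap<>();

    // Intercept every method execution of the application under monitoring,
    // excluding the aspect itself.
    @Before("execution(* br.example.app..*.*(..)) && !within(br.example.monitor.MethodFrequencyAspect)")
    public void countExecution(JoinPoint joinPoint) {
        String method = joinPoint.getSignature().toLongString();
        FREQUENCIES.computeIfAbsent(method, k -> new LongAdder()).increment();
    }

    static {
        // Write the collected frequencies (one "method;count" line per method)
        // when the JVM shuts down, so they can be analyzed offline.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                FREQUENCIES.forEach((m, c) -> System.out.println(m + ";" + c.longValue()))));
    }
}
```

For S4, a Javassist-based tool can achieve the same effect by inserting an equivalent counter call at the beginning of each method body at class-loading time, without recompiling the monitored application.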
Table 4. Characterization of the software systems used in the EXS.
Software | Purpose | Source | Methods | Test cases | Origin of test cases | Usage
S1 | Software inspection support (Crista) | closed source | 2749 | 716 | computational tool | EXS-AT1, EXS-AT2
S2 | Bibliographic reference management (JabRef) | open source | 7100 | 514 | community | EXS-AT2, EXS-AT3, EXS-AT4
S3 | Process automation (developed on demand) | closed source | 869 | 351 | test team | EXS-AT2
S4 | CASE tool (ArgoUML) | open source | 18099 | 2272 | community | EXS-AT2, EXS-AT3, EXS-AT4

5.2 EXS-AT1: Evaluating the variation in how software is operated by users

We performed the EXS-AT1 activity to gather evidence on whether there are relevant variations in how users operate the software to carry out the same task. To measure this variation, we obtained the operational profile of each user from the data collected by the instrumented S1 software.

In order to reduce the risks associated with the threats to validity of the activity, 30 undergraduate students of the Computer Science and Computer Engineering courses participated in it. These participants had equivalent experience and knowledge, and we trained them to make them familiar with S1 and the concepts involved in its use. Additionally, we assigned the same task to all participants: inspecting, with S1, the Java source code of a project, referred to as the Software Under Inspection (SUI), considering the object-oriented paradigm. We set a time limit for participants to complete the task. The tasks performed within the defined time period were considered successfully completed; thus, the data obtained from all participants were used in the activity.

We stored the data collected by the instrumentation of S1 and subsequently analyzed it. Through these data, we identified the operational profile of each participant. It is worth noting that all participants had the same goal and the same artifacts to conclude the task. In the following subsections, we describe the analysis of the collected data and the results obtained in activity EXS-AT1.

5.2.1 EXS-AT1: Data Analysis

We grouped the data collected in the EXS-AT1 activity according to the participant who originated them; that is, for each participant, we obtained and recorded information about the execution of the S1 methods, allowing us to compute the execution frequency of each method.

To identify the variations in how users operate S1, we created a representation of the operational profile of S1 for each participant. Each representation corresponds to a homogeneous one-dimensional data structure that records the execution frequency of each method of S1 for that participant during the execution of the task. The structure elements represent the methods implemented in S1, regardless of whether they were executed during the activity or not. Thus, each structure was composed of 2749 elements, corresponding to the 2749 methods implemented in S1 (Table 4). To each of these elements we assigned the execution frequency of the corresponding method when performing the activity; for non-executed methods, we assigned the value 0. Figure 5 presents a graphical representation of a part of this data structure for the S1 profile, showing some elements (M1, M2, M3, ..., M2749), each corresponding to a method implemented in S1. The number in each cell represents the execution frequency of a given method for a given participant after concluding the activity. According to Figure 5, four methods (M1, M3, M2748, and M2749) were not executed during the activity, while the remaining ones (M2, M100, M101, and M102) were executed 500, 10000, 15725 and 87000 times, respectively.

Figure 5. Graphical representation of the data structure.

As the variation in how users operate S1 depends on the processed volume, the processed volume for each participant was measured. The S1 software is a computational tool that supports the inspection of source code based on the stepwise abstraction reading technique, whose purpose is to determine the program's functionality according to the functional abstractions generated from the source code (Linger et al., 1979). S1 analyzes the SUI and, for each class, generates a treemap visual metaphor providing a simple way to visualize the source code. The code blocks are represented by rectangles arranged hierarchically; these rectangles are named declarations in the tool's context. When a declaration is selected, the respective source code is shown so that the inspection can be performed and the functional abstraction for that declaration can be registered. A functional abstraction is an annotation inserted by the S1 user that represents the pseudo-code of the selected declaration.
During the operation of S1, for each inspected class the user assigns a functional abstraction to each declaration identified by the tool in that class, indicating that the declaration was inspected. The discrepancies found during the inspection process are recorded in a similar manner in the tool, i.e., by assigning the discrepancy to the declaration. Figure 6 shows the S1 user interface during a class inspection.

Figure 6. S1 user interface.

S1 provides metrics that allowed us to measure the processing volume generated by each participant. In this activity, the processing volume corresponds to the number of functional abstractions attributed to each class that structurally composes the SUI, as well as to the number of discrepancies found in each class. Thus, it was possible to determine which classes, and how much of each class, were inspected by each participant. It should be noted that the same tool configuration parameters were applied to all participants.

In an attempt to obtain homogeneity in the processing volume generated by each participant, we grouped the participants according to their processing volume. An indicator was calculated to represent the processing volume generated by each participant: the ratio between the sum of abstractions and discrepancies over all classes for that participant and the sum of declarations of all classes. For instance, the total number of inspected software declarations was 1526 and, among the participants, the largest number of functional abstractions and discrepancies registered by a single participant was 284; for this participant the indicator value was 0.186 (284/1526).

The calculated indicator was used to classify the participants. This classification allowed us to identify 3 groups of participants with similar indicator values; in other words, participants who demanded a similar processing volume were assigned to the same group. Table 5 shows the created groups.

Table 5. Groups of participants in activity EXS-AT1.
Group | Participants
A | P10, P11, P12, P13, P30
B | P4, P5, P6, P7, P8, P9, P24, P25, P26, P27, P28, P29
C | P1, P2, P3, P15, P16, P17, P18, P19, P20, P21, P22, P23, P14

According to Table 5, 30 individuals participated in the experiment: group A comprises the data obtained from 5 participants, group B from 12 participants, and group C from 13 participants.

We compared the representations of the operational profile of S1 to highlight the variations in how the users operate the software. This comparison is possible through the data structures corresponding to these representations, and it was always performed between participants of the same group. As previously described, homogeneous one-dimensional data structures were used to generate the operational profile representations of S1. The elements that constitute these data structures represent the methods implemented in S1, and their stored values correspond to the execution frequencies. As the number of elements and their association with the methods of S1 are common to these structures, we compared the data stored in them, that is, the execution frequency of each method of S1, comparing each element of one data structure to the corresponding element of another data structure.
Thus, each representation in a group was compared with all other representations in the same group. For example, the representation of the S1 operational profile generated from the data collected for participant P10 was compared to the ones generated for participants P11, P12, P13 and P30 (Table 5).

We defined an indicator to measure the variation in the execution frequency of each method between two representations. The value of this indicator ranges from 0 to 1 and represents the difference between the execution frequency of a method stored in an element of one representation and the execution frequency stored in the corresponding element of the other representation. The indicator is calculated for each comparison made between an element of one representation and the corresponding element of the other representation, and its value corresponds to the ratio between the difference of the compared frequencies and the highest compared frequency. Figure 7 illustrates the systematic approach used to compare the representations of the operational profile of S1.

Figure 7. Systematic approach to measure the variations.

Figure 7 evidences that: a) the closer the indicator value is to 1, the higher the difference between the execution frequencies of the evaluated method; b) the closer the indicator value is to 0, the lower the difference between the execution frequencies of the evaluated method. Indicator values equal to 0 denote that the participants did not execute that particular method during the activity, and indicator values equal to 1 denote methods executed by only one of the two participants. Table 6 shows the results of the comparison between the operational profiles of S1 for the participants of Group A.

Table 6. Comparison among participants in group A.
ID | P-1 | P-2 | DMF | IM
01 | P10 | P11 | 59 | 0.37
02 | P10 | P12 | 42 | 0.47
03 | P10 | P13 | 39 | 0.62
04 | P10 | P30 | 77 | 0.53
05 | P11 | P12 | 45 | 0.59
06 | P11 | P13 | 80 | 0.56
07 | P11 | P30 | 68 | 0.65
08 | P12 | P13 | 73 | 0.51
09 | P12 | P30 | 57 | 0.38
10 | P13 | P30 | 92 | 0.43

In Table 6, the "ID" column identifies a comparison made between two representations of the operational profile of S1. The "P-1" and "P-2" columns identify the participants whose collected data gave rise to the compared representations. The "DMF" column gives the number of methods whose indicator value was equal to 1, and the "IM" column gives the average value of the indicators obtained from the differences between the execution frequencies recorded in the two representations (Figure 7). As an example, the comparison between the representations obtained from participants P12 and P13 (line 08 of Table 6) indicates that 73 methods were executed by only one of the two participants. The results also indicate that, on average, the execution frequency of the methods differs by 0.51 for the compared participants, i.e., the frequency of these methods is approximately 50% higher for one of the participants.
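The pairwise comparison just described can be sketched as follows (hypothetical code with hypothetical frequency vectors, not our actual implementation; the averaging convention used for IM is an assumption, since methods absent from both profiles may or may not be included in the mean):

```java
// Sketch of the comparison between two operational profile representations
// (frequency vectors indexed by method).
public final class ProfileComparison {

    /** Indicator in [0, 1]: |f1 - f2| / max(f1, f2); 0 when neither participant executed the method. */
    static double indicator(long f1, long f2) {
        long max = Math.max(f1, f2);
        return max == 0 ? 0.0 : Math.abs(f1 - f2) / (double) max;
    }

    /** Computes DMF (methods executed by only one participant) and IM (mean indicator). */
    static double[] compare(long[] profileA, long[] profileB) {
        int dmf = 0;
        double sum = 0.0;
        int considered = 0;
        for (int m = 0; m < profileA.length; m++) {
            double ind = indicator(profileA[m], profileB[m]);
            if (ind == 1.0) {
                dmf++;                      // executed by exactly one participant
            }
            if (profileA[m] > 0 || profileB[m] > 0) {
                sum += ind;                 // assumption: IM averages over methods seen in at least one profile
                considered++;
            }
        }
        double im = considered == 0 ? 0.0 : sum / considered;
        return new double[] { dmf, im };
    }

    public static void main(String[] args) {
        long[] p12 = { 0, 500, 10000, 0 };  // hypothetical frequency vectors
        long[] p13 = { 3, 250, 0, 0 };
        double[] result = compare(p12, p13);
        System.out.printf("DMF=%.0f IM=%.2f%n", result[0], result[1]);
    }
}
```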
We created a graphical representation to facilitate the visualization of the differences between the operational profiles of two participants. As an example, Figure 8 illustrates the results of the comparison between the operational profile representations obtained from P12 and P13. In this graphical representation, each array element represents a method, and the information displayed in each element is the value of the indicator that quantifies the variation between the execution frequencies of that method. Methods whose indicator value is one (1) were registered in only one of the two operational profile representations of S1 and are painted black in Figure 8; methods whose indicator value is between 0.5 (inclusive) and 1 (exclusive) are painted gray; and methods whose indicator value is below 0.5 are painted white.

Figure 8. Differences between the P12 and P13 representations.

5.2.2 EXS-AT1: Results

We verified significant differences in the execution frequency of the methods of S1 when the participants were operating it. The methods not executed during the activity also differed significantly between participants. The average value of the indicator used to measure the variations in the execution frequencies of each method was 0.51 for the participants of Group A; for this same group, the average number of methods whose execution was registered in only one of the compared representations was 63.2. These averages for the participants of Group B and Group C were, respectively, 0.5/66.19 and 0.57/43.75. Given the EXS-AT1 results, significant variations were verified among the representations of the operational profiles, thus providing an answer to research question RQ2.

5.3 EXS-AT2: SOP vs. Test Profile

We performed the EXS-AT2 activity to obtain evidence of the possible misalignment between the SOP and the tested software parts. To verify such a misalignment, we evaluated the operational profiles of S1, S2, S3, and S4, along with their test suites. We obtained the operational profile of S1 during EXS-AT1, and we applied the same procedure to identify the operational profiles of S2, S3, and S4. As stated in Section 5.1, we instrumented S2 and S3 to collect data while users operated the software, and these data allowed us to identify the SOP of S2 and S3; the operational profile of S4 was identified with the tool developed to monitor its execution.

Undergraduate students of the Technology in Analysis and Development of Systems course participated in the activity as S2 users; we trained these participants, who had equivalent experience and knowledge, to use S2. We repeated the same process with postgraduate students of the Web Software Development course, who also had equivalent experience and knowledge, participating in the activity as S4 users. In addition, public servants participated in the activity as S3 users, performing their daily tasks using the software features. The task assigned to S2 users was to operate S2 to record 10 bibliographic references; the task assigned to S4 users was to operate S4 to create a class diagram from a given software requirements specification. We set a time limit for the S2, S3, and S4 users to perform their tasks.
The tasks performed within the defined period were considered successfully completed; thus, the data obtained from all participants were used in the activities. The S2, S3, and S4 users obtained similar performance and results in their respective tasks.

In addition to the data that identified the SOP of S1, S2, S3, and S4, we collected data about the execution of the test suites of these systems to obtain evidence of the mismatch between the SOP and the tested software parts. The same procedure used to collect the data that provided the SOP was used to collect data during the execution of the test suites. These data allowed us to obtain the test profile of S1, S2, S3, and S4. In this paper, we define the term "test profile" as the software parts executed when the test suite is run. Note that the test cases of the systems used had different origins (as shown in Table 4); we established this characteristic to allow the analysis of the SOP against test cases defined and created based on different strategies. We compared the test profile of S1, S2, S3, and S4 to the operational profile of the respective system to verify the mismatch between the SOP and the tested software parts. In the following section, we describe the data analysis and the results obtained from these comparisons.

5.3.1 EXS-AT2: Data Analysis

We compared the test profiles of S1, S2, S3, and S4 to the operational profiles of the respective systems in an attempt to find the possible mismatch between the SOP and the test profile. As already described, in the context of this paper the SOP is determined by the execution frequency of the methods. We classified the methods implemented in S1, S2, S3, and S4 based on their processing in the SOP and in the test profile. Thus, four classification categories are possible:

• Category 0: method not executed in the SOP and not executed in the test profile;
• Category 1: method executed in the SOP (by at least 1 participant) but not executed in the test profile;
• Category 2: method not executed in the SOP but executed in the test profile;
• Category 3: method executed in both the operational and the test profiles.

As an example, Figure 9 shows a fraction of the classification table of the methods implemented in S1; in this example, the test profile (0) is compared to the operational profiles of participants 0, 1 and 2. We also classified the methods implemented in S2, S3 and S4, generating a classification table for each system. The complete tables are available at http://lcvm.com.br/artigos/anexos/jserd2020/tabelas/.

Figure 9. Classification of software S1 methods.

In the classification table shown in Figure 9, each method is assigned a classification category resulting from the comparison between the SOP and the test profile of S1. The columns "Participant OP Id./Test Profile", "CL" and "FREQ" refer, respectively, to: a) the operational profile of a participant compared to the test profile (the line below the column title identifies the compared participant and the test profile); b) the classification category assigned to the method; and c) the difference between the execution frequencies obtained in the operational profile of the participant and in the test profile.
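The classification itself can be sketched as follows (hypothetical code; the method names and the per-method counts `sopFrequency` and `testFrequency` are placeholders for the data obtained from the instrumentation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the method classification used in EXS-AT2:
// each method is assigned to one of the four categories defined above,
// based on whether it appears in the SOP (executed by at least one
// participant) and/or in the test profile.
public final class MethodClassifier {

    static int classify(long sopFrequency, long testFrequency) {
        boolean inSop = sopFrequency > 0;
        boolean inTestProfile = testFrequency > 0;
        if (!inSop && !inTestProfile) return 0;  // Category 0
        if (inSop && !inTestProfile)  return 1;  // Category 1: used but untested
        if (!inSop)                   return 2;  // Category 2: tested but unused
        return 3;                                // Category 3: used and tested
    }

    public static void main(String[] args) {
        // Hypothetical frequencies: method -> {SOP count, test profile count}.
        Map<String, long[]> methods = new LinkedHashMap<>();
        methods.put("Parser.parse(String)", new long[] { 15725, 0 });
        methods.put("Exporter.toBibtex()",  new long[] { 0, 42 });
        methods.put("Entry.getTitle()",     new long[] { 500, 118 });

        methods.forEach((name, freq) ->
                System.out.println(name + " -> Category " + classify(freq[0], freq[1])));
    }
}
```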
Figure 10 shows the results of the comparison between the SOP and the test profile for each evaluated system (S1, S2, S3, and S4).

Figure 10. Results.

For each evaluated system (S1, S2, S3 and S4) shown in Figure 10, the following information is provided:

• OP ∩ TP: number of methods processed by at least 1 participant and processed by the test profile.
• OP ⊄ TP: number of methods processed by at least 1 participant and not processed by the test profile.
• TP ⊄ OP: number of methods processed by the test profile and not processed by the participants.

The results show that:

a) 131 out of 280 methods of S1 processed by at least 1 of the participants were not processed by the test profile; 30 methods processed by the test profile were not processed by the participants;
b) 313 out of 1308 methods of S2 processed by at least 1 of the participants were not processed by the test profile; 1340 methods processed by the test profile were not processed by the participants;
c) 203 out of 437 methods of S3 processed by at least 1 of the participants were not processed by the test profile; 134 methods processed by the test profile were not processed by the participants;
d) 4743 out of 8910 methods of S4 processed by at least 1 of the participants were not processed by the test profile; 1319 methods processed by the test profile were not processed by the participants.

5.3.2 EXS-AT2: Results

For S1, S3 and S4, approximately 50% of the methods processed by the SOP were not processed by the test profile; for S2, the methods processed by the SOP and not processed by the test profile correspond to approximately 25%. It is also possible to verify the occurrence of methods processed by the test profile and not processed by the SOP for S1, S2, S3 and S4; for S2, these correspond to approximately 30%. The results therefore show a mismatch between the SOP and the test profile for S1, S2, S3 and S4.

According to Rincon (2011), only one open-source system among the ten he researched obtained a code coverage between 70 and 80%. If we consider this interval acceptable, in the best case we are delivering software with 20% to 30% of the source code never executed during the testing phase. According to Ivanković et al. (2019), the median code coverage for all Google projects with successful coverage computation between 2015 and 2018 varied between 80 and 85%, i.e., between 15 and 20% of the code was uncovered. Thus, even if we consider acceptable a percentage range for the misalignment between the SOP and the test profile equal to the range of uncovered code reported by Rincon (2011) and Ivanković et al. (2019), i.e., between 15 and 30%, the results obtained from EXS-AT2 for S1, S3 and S4 exceed this range with respect to the methods processed by the SOP and not processed by the test profile. For S2, the obtained result matches the considered acceptable range with respect to the methods processed by the test profile and not processed by the SOP. These results show that there may be a misalignment between the SOP and the tested software parts, providing an answer to question RQ3.

5.4 EXS-AT3: Failures in untested SOP parts

Bach et al. (2017) investigated the relationship between the coverage provided by a test suite and its effectiveness. The approach adopted by Bach et al. (2017) can also be used as another strategy to obtain evidence of the possible mismatch between the SOP and the tested software parts, as well as of the relation between this misalignment and software faults.
The approach used by Bach et al. (2017) defines two scenarios for the investigated hypothesis:

1. Coverage does not influence the detection of future bugs;
2. A high coverage rate can reduce the volume of future bugs.

Bach et al. (2017) analyzed the faults identified through the failures reported by software users and related the data obtained from this analysis to the coverage provided by the test suite of the respective software.

In the context of this paper, we assume that the failures reported by software users occurred in software parts that constitute the SOP, since such failures occur during the operation of the software by users. As such, the software parts modified as a result of fault corrections constitute the SOP and denote the occurrence of failures in the software parts that comprise the operational profile. Given these considerations, the EXS-AT3 activity serves to verify which of the following holds:

1. The misalignment between the SOP and the tested software parts is relevant to software quality (faults do occur in SOP parts not processed by the test profile);
2. Although there is a misalignment between the SOP and the tested software parts, this misalignment is irrelevant to software quality (no faults were registered in SOP parts not executed by the test profile).

We verified the fault history of S2 and S4, which are open-source systems whose source code is available on a hosting platform providing resources to manage modifications in the source code.

5.4.1 Analyzing failures in the untested SOP parts of S2

By analyzing pull requests, we verified the changes in the source code of S2 classified as bug fixes. This verification allowed us to identify the methods of S2 modified to fix a bug. We identified 79 methods with corrections of faults revealed by failures reported by users. Under our assumption, these methods compose the SOP identified through data provided by the software community (bug reports), named SOPsup in this section. We compared the methods comprising SOPsup to the methods processed by the test profile of S2, identified in EXS-AT2. We found that the test profile did not execute 49 out of the 79 methods constituting SOPsup, i.e., these are SOPsup parts not covered by the test suite in which faults were identified.

SOPsup is based on the assumption that the methods corrected due to failures reported by the community constitute the SOP, i.e., that these failures were not generated by sporadic actions of users. Based on this assumption, we verified whether the SOPsup methods not processed by the test profile were contained in the SOP obtained from the EXS-AT2 participants. Among these methods, 7 were found in the SOP obtained from the EXS-AT2 participants and were classified as SOP methods not processed by the test profile. This indicates that, if the approach used in this activity were applied to the SOP obtained from real users in a real scenario, these 7 methods contained in SOPsup, i.e., methods presenting defects, would possibly be found, classified as SOP methods, and remain untested. Thus, the approach applied in EXS-AT2 can improve new releases of the test suite, since it identifies untested and faulty parts of the SOP.
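This cross-check amounts to a pair of set operations over method identifiers, as sketched below (hypothetical code; the method names and collections are placeholders, not the actual S2 data):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the EXS-AT3 cross-check for S2:
// SOPsup = methods touched by bug-fix changes; we then check which of them
// were never executed by the test profile, and which of those also appear
// in the SOP observed from the EXS-AT2 participants.
public final class SopSupCrossCheck {

    public static void main(String[] args) {
        Set<String> sopSup = new LinkedHashSet<>(Set.of(          // from bug-fix pull requests
                "Importer.importEntries", "Cleanup.normalizeDoi", "GroupTree.moveNode"));
        Set<String> testProfile = new LinkedHashSet<>(Set.of(     // methods executed by the test suite
                "Importer.importEntries"));
        Set<String> participantsSop = new LinkedHashSet<>(Set.of( // methods executed by participants
                "Cleanup.normalizeDoi", "Entry.getField"));

        Set<String> faultyAndUntested = new LinkedHashSet<>(sopSup);
        faultyAndUntested.removeAll(testProfile);                 // SOPsup parts not covered by the tests

        Set<String> alsoInObservedSop = new LinkedHashSet<>(faultyAndUntested);
        alsoInObservedSop.retainAll(participantsSop);             // faulty, untested and observed in use

        System.out.println("SOPsup not covered by tests: " + faultyAndUntested);
        System.out.println("Also present in participants' SOP: " + alsoInObservedSop);
    }
}
```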
5.4.2 Analyzing failures in the untested SOP parts of S4

Unlike the procedure adopted to identify the SOPsup of S2, we obtained the SOPsup methods of S4 from a bug report available on its official website. An error log was associated with each reported bug. By analyzing these error logs, we identified 15 methods that revealed failures during their execution. These methods comprise the SOPsup of S4.

As with S2, we compared the SOPsup of S4 to its test profile identified in EXS−AT2. We found that the test profile did not execute 5 out of the 15 methods constituting SOPsup, i.e., there are SOPsup parts not covered by the test suite in which faults were identified.

5.4.3 EXS–AT3: Results

Table 7 summarizes the data obtained from S2 and S4 about failures occurring in untested SOP parts.

Table 7. S2 and S4 SOPsup parts in which faults were identified.

  Software | Methods with identified faults | Methods not covered by the test profile
  S2       | 79                             | 49
  S4       | 15                             | 5

The investigation performed in EXS−AT2 provided evidence of a mismatch between the SOP and the tested software parts, and the present activity adds evidence that failures occur in SOP parts left untested. For S2, 62.02% of the SOPsup parts in which faults were identified were not covered by the test profile; for S4, the respective value was 33.33%. This evidence answers research question RQ4, showing that failures may occur in SOP parts not covered by the test profile.

5.5 EXS–AT4: Attempting to decrease the misalignment between the SOP and the Test Profile

We performed the EXS−AT4 activity to assess whether a test strategy based on the use of an automated test data generator can contribute to reducing the possible misalignment between the SOP and untested software parts.

To perform the EXS−AT4 activity, we selected the S2 and S4 software. We selected them because we had already used them in EXS−AT2 and EXS−AT3 and because they are more representative regarding the number of implemented methods.

For each selected software, we generated a test set using an automated tool; these sets are named S2TCtool and S4TCtool for S2 and S4, respectively. The sets of existing test cases for S2 and S4 are named S2TCexis and S4TCexis (Table 4). We used EvoSuite, a tool that automatically generates JUnit tests for Java software (Fraser and Arcuri, 2011). For the generation of S2TCtool and S4TCtool, among the coverage criteria made available by the test generation tool, we adopted the method coverage criterion, given that in this paper the SOP is represented by the execution frequency of the implemented methods. For S2 and S4, 4322 and 2803 test cases were generated, respectively. We did not use SOP data in the planning and execution of the EXS−AT4 test strategy, considering that the SOP was unknown when S2TCtool and S4TCtool were generated; thus, we generated automated test cases for all parts of S2 and S4.

We incorporated the S2TCtool and S4TCtool test cases into S2TCexis and S4TCexis, respectively, thus obtaining an extended test set for S2 and S4 from the union of these sets. We named the extended test sets of S2 and S4 S2TCext and S4TCext, respectively. Table 8 shows, in percentage, the coverage of S2 and S4 provided by each set of test cases.

Table 8. S2 and S4 software coverage provided by test cases.
  Software | TCexis | TCtool | TCext
  S2       | 15%    | 27%    | 30%
  S4       | 32%    | 42%    | 60%

Table 8 shows that the S2TCext and S4TCext test sets increased the coverage of S2 and S4 provided by S2TCexis and S4TCexis, respectively, showing that new parts of S2 and S4 were tested and, consequently, that the S2 and S4 test profiles were extended. In this section, we name the initial test profiles obtained from S2TCexis and S4TCexis S2TPini and S4TPini, and the extended test profiles of S2 and S4 S2TPext and S4TPext, respectively. We adopted the same procedure used to identify S2TPini and S4TPini, described in Section 5.3, to obtain S2TPext and S4TPext. Likewise, the same procedure used to compare S2TPini and S4TPini to the SOP of S2 and S4 was used to compare S2TPext and S4TPext to the respective SOP.

5.5.1 EXS–AT4: Data Analysis

Figures 11 and 12 show, for S2 and S4 respectively, the results obtained from the comparison between the SOP and the extended test profile. The results obtained by comparing the SOP of these software to the initial test profiles (S2TPini and S4TPini) are shown again in Figures 11 and 12 so that they can be compared with the results obtained from S2TPext and S4TPext.

Figure 11. S2TPini and S2TPext results.
Figure 12. S4TPini and S4TPext results.

The categories OP ∩ TP, OP ⊄ TP and TP ⊄ OP shown in Figures 11 and 12 are defined in Section 5.3.1. In Figures 11 and 12, we can see that:

1. 143 out of 1328 methods from S2 processed by at least 1 of the participants were not processed by the TPext; 2524 methods processed by the test profile were not processed by the participants;
2. 4189 out of 8910 methods from S4 processed by at least 1 of the participants were not processed by the TPext; 2977 methods processed by the test profile were not processed by the participants.

5.5.2 EXS–AT4: Results

Table 9 shows the differences between the results obtained with TPini and TPext. Compared with the results obtained with S2TPini and S4TPini, the test strategy adopted in activity EXS−AT4 reduced the number of methods processed by the SOP and not processed by the test profile (S2TPext and S4TPext), being more effective for the S2 software. However, it is noteworthy that, regarding the number of implemented methods, S2 is less representative than S4, for which the adopted strategy reduced the number of methods processed by the SOP and not processed by the test profile (S4TPext) by approximately 10% compared to the initial test profile (S4TPini).

The adopted test strategy also reduced the number of methods that constitute the SOPsup of S2 and S4 and were not covered by the respective initial test profile, S2TPini and S4TPini. For S2, 2 out of the 49 SOPsup methods not processed by S2TPini were processed by S2TPext; for S4, 1 out of the 5 SOPsup methods not processed by S4TPini was processed by S4TPext.

The adopted test strategy aimed to reduce the misalignment between the SOP and the test profile by extending the set of existing test cases of S2 and S4 using an automated tool.
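In outline, this amplification corresponds to generating additional JUnit tests for each class (in the study, with EvoSuite configured for its method coverage criterion) and then running the union of the existing and the generated tests as a single extended suite. The sketch below only illustrates that union with JUnit 4's Suite runner; it is not the actual suites used in the study, and all class and test names are hypothetical placeholders.

    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.junit.runners.Suite;

    import static org.junit.Assert.assertEquals;

    // Extended test set TCext = TCexis ∪ TCtool: the existing, hand-written tests and
    // the automatically generated tests are executed together as one JUnit suite.
    @RunWith(Suite.class)
    @Suite.SuiteClasses({
            ExtendedTestSuite.ExistingTests.class,   // stands in for TCexis
            ExtendedTestSuite.GeneratedTests.class   // stands in for TCtool (e.g., EvoSuite output)
    })
    public class ExtendedTestSuite {

        // Placeholder for the project's original test cases.
        public static class ExistingTests {
            @Test
            public void addsTwoNumbers() {
                assertEquals(4, 2 + 2);
            }
        }

        // Placeholder for the automatically generated test cases.
        public static class GeneratedTests {
            @Test
            public void concatenatesStrings() {
                assertEquals("ab", "a" + "b");
            }
        }
    }

In practice, the generated member classes would be the tool's output for each class under test; the extended profile (TPext) is simply the set of methods executed when this combined suite runs.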
We did not use SOP data in the planning and execution of the test strategy, considering that the SOP was unknown for the automatic generation of test cases. This implied generating test cases for all parts of S2 and S4, which demands time and processing power, since the generation depends on the applied criteria and parameters as well as on the size of the software for which the test cases are generated.

In response to question RQ5, we observed that, although we generated test cases for all parts of S2 and S4 and incorporated these cases into the sets of existing test cases of these software, the test strategy reduced the misalignment between the SOP and the test profiles of S2 and S4 but did not eliminate it. In addition, the automated test generator produces only the test data and assumes the produced output is correct. As such, even if we improve the coverage of the SOP, we still need to verify whether the resulting output corresponds to the expected output according to the software specification. Thus, the data obtained from the SOP is relevant and can be used in existing testing strategies, or in the definition of new strategies, to contribute to their effectiveness and efficiency.

6 Lessons Learned

First of all, we would like to make it clear that the results obtained so far are not conclusive; they are part of an ongoing work (Cavamura Júnior, 2017), and more experimental studies are under way. However, based on the data presented in Section 5, we can provide some directions (albeit not exhaustive) on how to use knowledge about the SOP in favor of software quality.

• We verified during the experimental studies that identifying the SOP through instrumentation may affect software performance and produce a huge volume of data, depending on the level of fragmentation adopted. Nevertheless, the information obtained about the SOP can contribute to software testing activities.

• High levels of coverage do not necessarily indicate that a test set is effective in detecting faults, and it is unlikely that the use of a fixed coverage value as a quality target will produce an effective test set (Inozemtseva and Holmes, 2014). Our data indicate that a good test set is one with good coverage of the software parts related to the SOP. When there is a misalignment between the SOP and the tested software parts, the SOP can also be used as a criterion for generating test cases that improve the test suite and minimize the misalignment.

• Another possible use of the SOP is related to what de Andrade Freitas et al. (2016) call "Market Vulnerability", whereby each fault in the software affects users differently. We should avoid, as much as possible, bothering most of our users with constant failures in the features that are most important from their point of view; the SOP reflects these software areas. The SOP can thus be used to assess the impact caused by each fault on software operability, and a ranking of known faults can be built based on their impact on the majority of users, providing information that can assist in pricing these faults with respect to the software market.

• Since the SOP represents the most used parts of the software, information about the SOP can be used as a criterion to prioritize any other activities inherent to the software development process.

7 Threats to Validity

Regarding the EXS activities, we considered the participants' level of knowledge in the EXS a threat to validity.
We selected undergraduate and postgraduate students with equivalent experience and the knowledge required to perform the activity to operate the S1, S2 and S4 software in order to minimize this risk. We conducted training on S1, S2 and S4, as well as a review of the theoretical concepts inherent to these software. As S3 was developed on demand, participants already knew the processes automated by it.

In EXS−AT2, the execution of some test cases belonging to the test sets of S1, S2 and S4 finished with errors: 0.69% of the automatically generated test cases for S1, 1.36% for S2, and 17.10% for S4 finished their execution with errors. Since the configuration and execution environment were in conformity, we chose not to modify the implementation of the existing test cases in order to eliminate these execution errors. We consider this a threat to validity because some methods may have been executed only as a result of these errors and thus should not be part of the test profile.

Table 9. Comparison of the results obtained by the test profiles.

  TCcomm   | S2: SOP vs S2TPini | S2: SOP vs S2TPext | (%)       | S4: SOP vs S4TPini | S4: SOP vs S4TPext | (%)
  OP ∩ TP  | 995                | 1185               | 19.09 (+) | 4167               | 4721               | 13.29 (+)
  OP ⊄ TP  | 333                | 143                | 57.05 (-) | 4743               | 4189               | 11.68 (-)
  TP ⊄ OP  | 1340               | 2524               | 88.35 (+) | 1319               | 2977               | 125.7 (+)

In the EXS−AT3 activity, we assumed that the failures reported by users were revealed by the software parts composing the SOP, i.e., that these failures did not occur in operations only sporadically processed by users. We are performing a more comprehensive EXS using data obtained from free software repositories.

In EXS−AT4, the execution of some of the test cases automatically generated for S2 (S2TCtool) and S4 (S4TCtool) rendered errors: 4.2% of the automatically generated test cases for S2 and 0.53% for S4 produced errors during their execution. Although these errors have low representativeness, they are considered a threat to validity, since some methods may have been executed only as a result of these errors and thus should not be part of the extended test profiles S2TPext and S4TPext, respectively. In further experiments, we intend to investigate the cause of such errors and compute their impact on the test profile.

8 Conclusions

This paper investigates the possible mismatch between the SOP and the tested software parts, introducing the term "test profile". The results provided answers to the defined research questions, indicating: a) the originality of this study; b) that there are significant variations in the way software is used by users; c) that there may be a misalignment between the SOP and the test profile; d) that the existing misalignment is relevant, given the evidence that failures occur in the untested SOP parts; and e) that, although the adopted test strategy reduced the misalignment between the SOP and the test profile, it was not enough to avoid it.

The answers to the research questions provide the expected contributions of this work. These contributions may motivate new research or contribute to existing research in Software Engineering, more specifically in the field of Software Quality.
The contributions also show that information about the software operational profile can contribute to the software quality activities applied in industry, since the quality of software also depends on its operational use (Cukic and Bastani, 1996).

Thus, the contributions provide evidence that the SOP is relevant not only to activities that determine software reliability, but also to the planning and execution of the testing activity, regardless of the adopted test strategy. In future research we intend to improve software quality from the users' point of view by considering the SOP (Cavamura Júnior, 2019).

We expect the proposed strategy to allow: (i) dynamically adapting an existing test suite to the SOP; and (ii) using the SOP as a prioritization criterion which, given a set of faults, identifies the ones that cause the most significant impact on users' experience when operating the software, so that this impact can be considered, alongside other criteria, when pricing the faults for correction. We are investigating the use of machine learning and genetic algorithms to enable the proposed strategy. Lastly, we are working on the implementation of a tool to automate the proposed strategy and to provide support for technology transfer and experimentation.

References

Ali-Shahid, M. M. and Sulaiman, S. (2015). Improving reliability using software operational profile and testing profile. In 2015 International Conference on Computer, Communications, and Control Technology (I4CT), pages 384–388. IEEE Press.

Amrita and Yadav, D. K. (2015). A novel method for allocating software test cases. In 3rd International Conference on Recent Trends in Computing 2015 (ICRTC-2015), volume 57, pages 131–138, Delhi, India. Elsevier.

Assesc, F. (2012). Propriedade intelectual e software - cursos de mídia eletrônica e sistema de informação.

Bach, T., Andrzejak, A., Pannemans, R., and Lo, D. (2017). The impact of coverage on bug density in a large industrial software project. In 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 307–313.

Basili, V. R., Caldiera, G., and Rombach, D. H. (2002). Encyclopedia of Software Engineering, volume 1, chapter The Goal Question Metric Approach, pages 528–532. John Wiley & Sons.

Begel, A. and Zimmermann, T. (2014). Analyze this! 145 questions for data scientists in software engineering. In ICSE 2014, pages 12–23.

Bertolino, A., Miranda, B., Pietrantuono, R., and Russo, S. (2017). Adaptive coverage and operational profile-based testing for reliability improvement. In Proceedings of the 39th International Conference on Software Engineering, ICSE '17, pages 541–551, Piscataway, NJ, USA. IEEE Press.

Bittanti, S., Bolzern, P., and Scattolini, R. (1988). An introduction to software reliability modelling, chapter 12, pages 43–67. Springer Berlin Heidelberg, Berlin, Heidelberg.

Cavamura Júnior, L. (2017). Impact of the Use of Operational Profile on Software Engineering Activities. PhD thesis, Computing Department, Federal University of São Carlos, São Carlos, SP, Brazil. Ongoing PhD project (in Portuguese).

Cavamura Júnior, L. (2019). Operational profile and software testing: Aligning user interest and test strategy. In 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST), pages 492–494.

Cavamura Júnior, L., Fabbri, S. C. P. F., and Vincenzi, A. M. R. (2020).
Software operational profile: investigating specific applicabilities. In Proceedings of the XXIII Iberoamerican Conference on Software Engineering, CIbSE 2020, Curitiba, PR, Brazil. Curran Associates. Accepted for publication. (in Portuguese).

Chen, M.-H., Lyu, M. R., and Wong, W. E. (2001). Effect of code coverage on software reliability measurement. IEEE Transactions on Reliability, 50(2):165–170.

Cukic, B. and Bastani, F. B. (1996). On reducing the sensitivity of software reliability to variations in the operational profile. In Proceedings of the International Symposium on Software Reliability Engineering, ISSRE, pages 45–54, White Plains, NY, USA. IEEE, Los Alamitos, CA, United States.

de Andrade Freitas, E. N., Camilo-Junior, C. G., and Vincenzi, A. M. R. (2016). SCOUT: A multi-objective method to select components in designing unit testing. In XXVII IEEE International Symposium on Software Reliability Engineering (ISSRE 2016), pages 36–46, Ottawa, Canada. IEEE Press.

Falbo, R. A. (2005). Engenharia de software.

Ferrari, F. C., Rashid, A., and Maldonado, J. C. (2013). Towards the practical mutation testing of AspectJ programs. Science of Computer Programming, 78(9):1639–1662.

Fraser, G. and Arcuri, A. (2011). EvoSuite: Automatic test suite generation for object-oriented software. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE '11, pages 416–419, New York, NY, USA. ACM.

Fukutake, H., Xu, L., Takagi, T., Watanabe, R., and Yaegashi, R. (2015). The method to create test suite based on operational profiles for combination test of status. In 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2015), pages 1–4, White Plains, NY, USA. Institute of Electrical and Electronics Engineers Inc.

Gittens, M., Lutfiyya, H., and Bauer, M. (2004). An extended operational profile model. In Proceedings of the International Symposium on Software Reliability Engineering, ISSRE, pages 314–325, Saint-Malo, France.

Inozemtseva, L. and Holmes, R. (2014). Coverage is not strongly correlated with test suite effectiveness. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 435–445, New York, NY, USA. Association for Computing Machinery.

Ivanković, M., Petrović, G., Just, R., and Fraser, G. (2019). Code coverage at Google. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019, pages 955–963, New York, NY, USA. Association for Computing Machinery.

Kashyap, A. (2013). A Markov Chain and Likelihood-Based Model Approach for Automated Test Case Generation, Validation and Prioritization: Theory and Application. ProQuest Dissertations and Theses, The George Washington University.

Laddad, R. (2009). AspectJ in Action: Enterprise AOP with Spring Applications. Manning Publications Co., Greenwich, CT, USA, 2nd edition.

Leung, Y.-W. (1997). Software reliability allocation under an uncertain operational profile. Journal of the Operational Research Society, 48(4):401–411.

Linger, R. C., Mills, H. D., and Witt, B. I. (1979). Structured Programming: Theory and Practice. The Systems Programming Series.

Mafra, S. N., Barcelos, R. F., and Travassos, G. (2006).
Applying an evidence-based methodology to define new software technologies. In XX Brazilian Symposium on Software Engineering (SBES 2006), pages 239–254, Florianópolis, SC, Brazil. Available at: http://www.ic.uff.br/~esteban/files/sbes-prova.pdf. Accessed on: 05/04/2020. (in Portuguese).

Musa, J. (1993). Operational profiles in software-reliability engineering. IEEE Software, 10(2):14–32.

Musa, J. and Ehrlich, W. (1996). Advances in software reliability engineering. Advances in Computers, 42(C):77–117.

Musa, J. D. (1979). Software reliability measures applied to systems engineering. In Managing Requirements Knowledge, International Workshop on (AFIPS), volume 00, page 941. IEEE.

Musa, J. D. (1994). Adjusting measured field failure intensity for operational profile variation. In Proceedings of the International Symposium on Software Reliability Engineering, ISSRE, pages 330–333, Monterey, CA, USA. IEEE, Los Alamitos, CA, United States.

Nakagawa, E. Y., Scannavino, K. R. F., Fabbri, S. C. P. F., and Ferrari, F. C. (2017). Revisão Sistemática da Literatura em Engenharia de Software: Teoria e Prática. Elsevier Brasil.

Namba, Y., Akimoto, S., and Takagi, T. (2015). Overview of graphical operational profiles for generating test cases of GUI software. In 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2015), pages 1–3, White Plains, NY, USA. Institute of Electrical and Electronics Engineers Inc.

Poore, J., Walton, G., and Whittaker, J. (2000). A constraint-based approach to the representation of software usage models. Information and Software Technology, 42(12):825–833.

Pressman, R. S. (2010). Software Engineering: A Practitioner's Approach. McGraw-Hill, New York, NY, 7th edition.

Rincon, A. M. (2011). Qualidade de conjuntos de teste de software de código aberto: uma análise baseada em critérios estruturais.

Rocha, A. D. (2005). Uma ferramenta baseada em aspectos para apoio ao teste funcional de programas Java.

Shukla, R. (2009). Deriving parameter characteristics. In Proceedings of the 2nd India Software Engineering Conference, ISEC 2009, pages 57–63, New York, NY, USA. ACM.

Shull, F., Carver, J., and Travassos, G. H. (2001). An empirical methodology for introducing software processes. In Proceedings of the 8th European Software Engineering Conference Held Jointly with 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-9, pages 288–296, New York, NY, USA. Association for Computing Machinery.

Sommerville, I. (1995). Software Engineering. Addison-Wesley, Wokingham, England, 5th edition.

Sommerville, I. (2011). Software Engineering. Addison-Wesley, Harlow, England, 9th edition.

Takagi, T., Furukawa, Z., and Yamasaki, T. (2007). An overview and case study of a statistical regression testing method for software maintenance. Electronics and Communications in Japan, Part II: Electronics, 90(12):23–34.

Travassos, G. H., dos Santos, P. S. M., Mian, P. G., Neto, P. G. M., and Biolchini, J. (2008). An environment to support large scale experimentation in software engineering.
In 13th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2008), pages 193–202.