ACTA IMEKO December 2013, Volume 2, Number 2, 34 – 40 www.imeko.org

GUM conformity of software products - a discussion from a software tester’s perspective

Norbert Greif¹, Heike Schrepf¹
¹ Physikalisch-Technische Bundesanstalt, Institute Berlin, Abbestraße 2-12, 10587 Berlin, Germany

Section: RESEARCH PAPER
Keywords: Measurement uncertainty; GUM conformity; measurement software quality; software validation
Citation: Norbert Greif, Heike Schrepf, GUM conformity of software products - a discussion from a software tester’s perspective, Acta IMEKO, vol. 2, no. 2, article 7, December 2013, identifier: IMEKO-ACTA-02 (2013)-02-07
Editor: Paolo Carbone, University of Perugia
Received February 12th, 2013; In final form November 5th, 2013; Published December 2013
Copyright: © 2013 IMEKO. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Information not available
Corresponding author: Norbert Greif, e-mail: norbert.greif@ptb.de

ABSTRACT
This paper describes how to assess the GUM conformity of software products which claim to offer a GUM-compliant calculation of measurement uncertainties. To bridge the gap between the GUM guideline and the required test specification, an analysis of the GUM from the perspective of software testing is presented. Problems of testability and ambiguity of GUM statements are analysed in detail. The benefits and the limits of the developed validation procedure and test environment are outlined.

1. INTRODUCTION
An increasing number of software products that claim to offer a GUM-compliant calculation of measurement uncertainties are available on the market. To ensure that these products perform the calculations in accordance with the GUM [1], the software products must be specifically validated with respect to the GUM. Additionally, to guarantee comparability of the measurement uncertainties calculated by different software products, the software products themselves must be comparable in a defined way.

Consequently, a reusable, automated test environment has been developed which supports both a GUM-oriented validation and GUM-related comparisons of different software products by tracing the product features back to the rules and requirements of the GUM (see figure 1). The paper presents the benefits of the test environment, but also the limitations of validation and product comparisons.

To bridge the gap between the GUM guideline and the required test specification, an analysis of the GUM from the perspective of software testing is presented. This detailed analysis has uncovered some issues and inconsistencies within the GUM, for example non-testable GUM statements, alternative options for implementing GUM statements, and missing definitions. To ensure unambiguous implementations of the GUM and correspondingly explicit test specifications, these ambiguities have to be overcome or minimised (see figure 1).

To be rigorous, the GUM guideline first had to be transformed into a formal specification. For the core clauses of the GUM, such a specification has already been presented in [6]. This paper does not focus on the underlying specification itself, but on the problems of deriving it from the GUM guideline. For example, as outlined in section 3.3, the GUM often allows several computations that lead to differing results. The paper adopts the principle that each of these possible solutions is considered GUM-compliant as long as the solution itself is computed correctly. Thus, several differing nominal results are possible. The tester’s procedures for dealing with such differing results are described in the clauses called accepted solutions.

2. MOTIVATION AND AIM
For several years, the authors have been involved in the evaluation of software products which implement the GUM [3, 4, 5]. Recently, three further software products were comparatively evaluated with respect to GUM conformity. During this work, specific experience was gathered and an implementation-oriented view of the GUM emerged. One aim of the paper is to outline this experience. For example, the following fundamental questions have come up regarding an implementation of the GUM:
- Completeness
  What does it mean when a software product claims to implement the GUM? Is the GUM completely implementable? Is it possible to reformulate each of the GUM statements so that it can be represented as a computational step? Is a pocket calculator already GUM-compliant when it correctly implements only one formula, such as the average of repeated observations (as described in GUM 4.2.1)?
- Correctness
  What does it mean when a software product claims to be correct? Is the product able to calculate the correct values? Or is it able to calculate the correct values, round them with a correct rounding procedure, and display them with a correct number of digits?
- Compliance
  What does it mean when a software product claims to be compliant (conforming) with the GUM? The GUM guideline does not contain a conformity clause. Is it therefore permissible to claim conformity on the basis of completeness and correctness?
Even these few questions lead to one of the core problems of both testing GUM software and comparing GUM test results: the need to trace each computational step and each test result back to a well-defined, well-understood and uniformly interpreted GUM statement. Consequently, traceability should be the precondition for the validation of a specific software product as well as for the comparison of different products. To obtain repeatable, comparable, and traceable validation results, the questions mentioned above and some further ones have to be answered. The answers have an important impact on the set-up of the software test environment. For the intermediate software specification step, specific introduction and guidance is given in [6].

In summary, the objectives of the work are
- to prove a GUM-compliant calculation of measurement uncertainties (to prove “conformity” of GUM-supporting software products with the GUM guideline);
- to provide comparability of measurement uncertainties calculated using different software products (useful for key comparisons);
- to provide comparability of test results and of the software products themselves.

To achieve these objectives, the main tasks are
- to perform a detailed analysis of the GUM from the software testing and software implementation point of view;
- to trace each computational step and each test result back to a well-defined GUM statement;
- to develop a GUM-oriented validation procedure;
- to develop a reusable, automated test environment;
- to support a GUM-related comparison of different software products by tracing the product features back to the requirements of the GUM.

This paper describes the analysis of the GUM (see section 3) and gives a short overview of the validation procedure and the test environment developed (see section 4).

Figure 1. Basic task: bridging the gap between the ambiguous GUM guideline, with its inconsistencies and missing definitions, and an explicit test specification.

3. ANALYSIS OF THE GUM FROM THE PERSPECTIVE OF SOFTWARE TESTING
In this main section of the paper, some problems of testability and ambiguity of GUM statements are analysed in detail. These issues directly influence the traceability and comparability of the validation results of different software products. In the following, the GUM issues under discussion are classified into four main categories, which are not mutually exclusive:
- Testability of GUM statements
  Non-testable statements are analysed with respect to additional context information.
- Strictness of GUM statements
  Statements that leave room for alternative options are analysed.
- Ambiguity of GUM statements
  Ambiguous statements caused by informal wording are discussed.
- Specific problems
  Missing GUM statements, GUM inconsistencies and the handling of calculation results not covered by the GUM are analysed.

For each problem, the decisions necessary to guarantee testability and an unambiguous definition of the validation procedure are derived, together with their direct consequences for the development of the test environment (cf. the examples with accepted solutions). Some of these solutions cannot be realised within an automated test environment.

3.1. Testability of GUM statements
The GUM includes a series of statements which are not testable, even in the core clauses 4 through 8. These statements require additional decisions or context information to guarantee an unambiguous definition of the test process. Software packages should therefore be able to ask for the necessary context information. This has not been the case for all software packages validated so far.
Example 1 (GUM 4.2.1, GUM 4.2.3): The GUM describes the computation of an arithmetic mean for an observation series and allows the computed mean to be used as an estimate of the quantity’s value as long as certain preconditions are met. One of these preconditions is the repeatability of the observations. Given only a list of observation values, no software package is able to decide whether the required repeatability conditions have been met. Testing the repeatability condition requires context information.
Accepted solution 1: The package asks the user to check the conditions.
Accepted solution 2: The package always assumes that the repeatability conditions are met. The user manual points out the responsibility of the user.

Example 2 (GUM 3.2.3, GUM 3.2.4, GUM 8.1): The GUM states that systematic deviations have to be incorporated into the model equation in the form of correction terms. Given only a model equation, no software package is able to decide whether the model equation is complete in this sense.
Accepted solution: The package always assumes that the model equations are complete. The user manual points out the responsibility of the user.

Example 3 (GUM G.2.1): The GUM states that the output quantity is approximately normally distributed if its variance is “much larger than …”. A software package cannot compare two values in this informal way.
Accepted solution: The package offers the user all information necessary to decide on the distribution of the output quantity (distributions of all input quantities, their uncertainty contributions, the linearity of the model equation). The user should be able to decide whether the distribution of the output quantity may be regarded as normal or is “unknown”. In the case “unknown”, the package should not calculate and report the expanded uncertainty of that output quantity.
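The computational part of Example 1 (GUM 4.2.1 to 4.2.3) and the kind of information a package might offer for the decision in Example 3 can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function names, the hypothetical `repeatability_confirmed` flag (standing for the user’s answer in accepted solution 1 of Example 1), the restriction of the budget to uncorrelated inputs and the example data are illustrative choices, not taken from any of the validated packages.

```python
import math
from statistics import mean, stdev


def type_a(observations, repeatability_confirmed):
    """Type A evaluation of repeated observations (GUM 4.2.1 to 4.2.3).

    repeatability_confirmed is a hypothetical flag: the package cannot check
    the repeatability conditions itself, so it asks the user (Example 1,
    accepted solution 1) and refuses to proceed without confirmation.
    """
    if not repeatability_confirmed:
        raise ValueError("repeatability conditions not confirmed by the user")
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are required")
    q_bar = mean(observations)   # arithmetic mean, GUM 4.2.1
    s_q = stdev(observations)    # experimental standard deviation (divisor n - 1), GUM 4.2.2
    u = s_q / math.sqrt(n)       # experimental standard deviation of the mean, GUM 4.2.3
    return q_bar, u, n - 1       # degrees of freedom n - 1


def budget(sensitivities, uncertainties, distributions):
    """Contribution |c_i| u(x_i) of each uncorrelated input to uc(y) and its
    share of uc^2(y), offered to the user for the decision in Example 3."""
    contributions = [abs(c) * u for c, u in zip(sensitivities, uncertainties)]
    uc2 = sum(ui * ui for ui in contributions)
    rows = [(dist, ui, ui * ui / uc2)
            for dist, ui in zip(distributions, contributions)]
    return math.sqrt(uc2), rows


if __name__ == "__main__":
    q_bar, u_a, nu = type_a([10.1, 10.3, 9.9, 10.0, 10.2], repeatability_confirmed=True)
    u_b = 0.1 / math.sqrt(3)  # assumed Type B input: rectangular, half-width 0.1
    uc, rows = budget([1.0, 1.0], [u_a, u_b], ["normal (Type A)", "rectangular"])
    print(f"mean = {q_bar:.2f}, u = {u_a:.4f}, nu = {nu}")
    for dist, ui, share in rows:
        print(f"{dist:16s} u_i(y) = {ui:.4f}  share of uc^2 = {share:.0%}")
    print(f"uc(y) = {uc:.4f}")
```

For the five assumed observations, the sketch gives u = 0.0707 with ν = 4; combined with the assumed rectangular input, the two inputs contribute 60 % and 40 % of uc²(y). Whether such a budget justifies treating the output as normally distributed remains the user’s decision, as required by the accepted solution of Example 3.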
Example 4 (GUM 5.1.2, GUM 5.1.5): The GUM states that “higher terms (of the Taylor series of the model function) must be negligible”. Should a software package neglect a term when its value is 1/20, 1/100, or 1/1000 of the sum of the low-order terms?
Accepted solution 1: The software package calculates results for both the standard GUM case (first-order Taylor series) and the extended case (including higher-order terms). If the results for uc(y) are equal after rounding and shortening, the higher-order terms are negligible.
Accepted solution 2: As an alternative to solution 1, the package may check the linearity of the model equation, provided that the results are the same as for solution 1.

Example 5 (GUM F.1.2.1 a) and c)): The GUM explains that the covariance of two input quantities may be treated as insignificant if certain conditions are met. The software package is not able to decide whether the required conditions have been met.
Accepted solution 1: The software package calculates the result both with and without correlation. If the results for uc(y) are equal after rounding and shortening, the correlation is negligible.
Accepted solution 2: The software package asks the user to check the correlation values.

3.2. Strictness of GUM statements
Where the GUM permits alternative options, additional decisions or assumptions are required to ensure testability. Such options arise, for example, in the formulation of model equations or in the evaluation of sensitivity coefficients.

Example 6 (GUM 3.1.7, GUM 4.1.1, GUM 4.1.2): GUM 3.1.7 mentions that the presented concept, although discussed only for scalars, is applicable to vector results, too. However, vector results are not treated any further. GUM 4.1.1 presents the model relationship as an equation solved for the scalar output quantity.
GUM 4.1.2 states that the relationship between input quantities and output quantities need not be an explicit functional relationship. Instead, an algorithm or a computer program which is able to produce result values y for given input values xi may be used.
Accepted solution: The model equation may or may not be represented by an explicit functional relationship. It may be formulated for scalar or for vector results. Each variant is considered to be compliant.

Example 7 (GUM 5.1.3, GUM 5.1.4): GUM 5.1.3 describes the sensitivity coefficients as partial derivatives of the output quantity with respect to the input quantities, evaluated at the estimates of the input quantities. Note 2 of the same section states that the partial derivatives may be calculated using common numerical methods. GUM 5.1.4 allows these sensitivity coefficients to be determined experimentally.
Accepted solution: Sensitivity coefficients may be determined analytically, numerically, or experimentally. Each variant is considered to be GUM-compliant.

Example 8 (GUM 4.1.4): GUM 4.1.4 states that the estimate of the output quantity is calculated from the estimates of the input quantities using the model equation. The subsequent note says that the estimate of the output quantity may also be calculated as the average of several output values, each of them calculated from a set of input values using the model equation.
Accepted solution: The value y may be calculated as a function value or as an average of function values. Each variant is considered to be GUM-compliant. For linear models, the results do not differ.

Example 9 (GUM G.4.1, Note 1): GUM G.4.1 describes the treatment of a degrees-of-freedom value calculated by the Welch-Satterthwaite formula. To derive the coverage factor from a non-integer value, two different methods are allowed: interpolation or truncation (cf. figure 2).
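A minimal sketch of the two options in Example 9, assuming a level of confidence of 95 %: the effective degrees of freedom are computed with the Welch-Satterthwaite formula (G.2b), and the coverage factor is then taken from a table of Student-t quantiles either by truncating ν_eff to the next lower integer or by interpolation. The table is limited to ν = 1 to 10 and rounded to two decimals, linear interpolation in ν is an assumption of this sketch, and the input values are invented for illustration.

```python
import math

# Two-sided 95 % quantiles of the Student t-distribution, t_p(nu), for
# nu = 1..10, rounded to two decimals. Larger nu are omitted in this sketch.
T95 = {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
       6: 2.45, 7: 2.36, 8: 2.31, 9: 2.26, 10: 2.23}


def welch_satterthwaite(contributions, dofs):
    """Effective degrees of freedom, formula (G.2b): uc^4 / sum(u_i^4 / nu_i)."""
    uc2 = sum(u * u for u in contributions)
    return uc2 ** 2 / sum(u ** 4 / nu for u, nu in zip(contributions, dofs))


def _check_range(nu_eff):
    if not 1 <= nu_eff <= max(T95):
        raise ValueError("nu_eff outside the range covered by the table")


def k_truncated(nu_eff):
    """Truncate nu_eff to the next lower integer, then look up t_p."""
    _check_range(nu_eff)
    return T95[math.floor(nu_eff)]


def k_interpolated(nu_eff):
    """Interpolate linearly between the neighbouring table entries (assumed scheme)."""
    _check_range(nu_eff)
    lower = math.floor(nu_eff)
    if lower == nu_eff:
        return T95[lower]
    frac = nu_eff - lower
    return T95[lower] + frac * (T95[lower + 1] - T95[lower])


# Two contributions of equal size with 2 and 5 degrees of freedom (assumed values).
nu_eff = welch_satterthwaite([1.0, 1.0], [2, 5])
print(f"nu_eff            = {nu_eff:.2f}")                  # 5.71
print(f"k (truncation)    = {k_truncated(nu_eff):.3f}")     # 2.570
print(f"k (interpolation) = {k_interpolated(nu_eff):.3f}")  # 2.484
```

With these assumed inputs, the two coverage factors differ by about 0.09, i.e. roughly 3 %, so a tester has to provide a separate nominal value for each variant.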
Both computations result in significantly differing values.
Accepted solution: Each variant is considered to be GUM-compliant.

3.3. Ambiguity of GUM statements
Ambiguity is caused by informal wording in the GUM, which is usually intended to improve readability and comprehensibility.

Example 10 (GUM 7.2.6, GUM H): GUM 7.2.6 explains that the uncertainty should be given with "at most" two significant digits. More digits may be retained to avoid rounding errors in subsequent calculations. GUM annex H mostly uses two, in some cases only one digit for uncertainty values (H.3, H.5, H.6). How many digits should the programmer of a GUM package use? How should the tester of a GUM package formulate the nominal output for a test case?
Accepted solution: The software package should use two digits by default and allow manual adjustment if necessary.

Example 11 (GUM 7.2.6): GUM 7.2.6 states that "it may sometimes be appropriate" to round uncertainties up rather than to the nearest digit. Two examples are given: a value like 10.47 should preferably be rounded up to 11 instead of to the nearest digit, i.e. 10. In another case, a value like 28.05 should be rounded to the nearest digit, i.e. 28, instead of up to 29. The GUM obviously uses a rounding principle that can be described as "rounding up or down with a fraction limit somewhere between 0.1 and 0.4, instead of the usual 0.5". Since this is not formulated explicitly, each programmer is free to use rounding up or rounding to the nearest digit (with halves rounded up).
Accepted solution: For testing, the decision was made to expect rounding to the nearest digit (with halves rounded up).

Example 12 (GUM G.6.6): GUM G.6.6 explains that in certain cases one may use the coverage factor values 2 (to get a level of confidence of nearly 95%) or 3 (to get nearly 99%).
Afterwards, the GUM discusses that in these cases significant over- and underestimations of the confidence interval may occur and that a better estimation may be necessary. Users are advised to choose a better estimation if the approximation is not sufficient for their purposes. The question arises whether a GUM package should use the (GUM-compliant) approximation or the (GUM-compliant) better estimation.
Accepted solution(s): The user decides on the kind of distribution of the result quantity using the information delivered by the software package (see Example 3).
If the result quantity may be considered normally distributed:
Accepted solution 1: The user delivers the level of confidence p and the package calculates the coverage factor t(ν) based on a t-distribution, or
Accepted solution 2: The user delivers the level of confidence p, the package calculates the coverage factor k based on a normal distribution and reports the deviation between k and t(ν).
In all other cases:
Accepted solution 1: The distribution is unknown; the package does not calculate the coverage factor.
Accepted solution 2: The distribution is known to the user; the user delivers the coverage factor.

Example 13 (GUM F.2.3.3): GUM F.2.3.3 discusses the case in which only a minimum and a maximum value (and therefore the half-width a) of an input quantity are available. The suggestions for the uncertainty of this input quantity range from a/√3 (assuming a uniform distribution) through a/√6 (assuming a triangular distribution) to a/√9 (assuming a normal distribution). GUM S1 6.4.2.1 suggests assuming a uniform distribution, based on the principle of maximum entropy [2].
Accepted solution: The package has to ask the user, who has to decide which assumption holds.

Figure 2. Illustration for example 9 (alternative implementation options).

3.4. Specific problems
Finally, some problems regarding missing GUM statements, GUM inconsistencies, and the handling of calculation results not covered by the GUM are considered.

Example 14 (GUM G.4): The problem of the effective degrees of freedom of the output quantity is discussed in relation to the problem of the output quantity’s distribution and other aspects (central limit theorem). The given formula (G.2b) and the reference to GUM 5.1.3 suggest that the formula is valid for uncorrelated input quantities only, but this is neither stated explicitly nor discussed in detail. In particular, there is no explicit prescription not to use formula (G.2b) in the case of correlated input quantities.
Accepted solution: The calculation of a degrees-of-freedom value for the output quantity in the case of correlated input quantities is not considered to be compliant. A value may be given, but its calculation has to be documented, and it has to be marked as outside the GUM scope.

Example 15 (GUM 4.3.8, GUM G, GUM F): A topic which is discussed only cursorily is the use of input quantities with asymmetric distributions. The relevant GUM statements consist of a single section in the main text (GUM 4.3.8), a short discussion in annex G (GUM G.5.3), and the discussion of a particular case in annex F (GUM F.2.4.4). The question arises: how should the user deal with asymmetrically distributed input quantities? They cannot be omitted, since the GUM does not prohibit their use.
Accepted solution: The distributions of the input quantities do not influence the computation of the value of the output quantity y and of the standard measurement uncertainty uc(y). Displaying y and uc(y) while omitting U(y) is considered GUM-compliant.

Example 16 (GUM 6, GUM G): The problem of how to evaluate the expanded uncertainty of an output quantity (which in practice is of greater interest than the standard uncertainty) is discussed only briefly.
GUM 6 suggests using a coverage factor between 2 and 3, and mentions that the selection of a proper value depends on experience or, alternatively, on knowledge about the output quantity’s distribution. The details of this discussion are given in annex G. For testers, the question arises whether a software product is GUM-compliant if it uses an arbitrary coverage factor between 2 and 3, ignoring the statements of annex G.
Accepted solution: The statements of annex G are considered relevant for achieving GUM compliance.

Example 17 (overall GUM): It is common sense that a correlation matrix should be checked for symmetry and non-negative definiteness. Most of the GUM packages allow the user to perform these checks, but definiteness is not discussed in the GUM.
Accepted solution 1: The software package checks the non-negative definiteness of the correlation matrix.
Accepted solution 2: The software package does not check the non-negative definiteness of the correlation matrix. Instead, before outputting the standard measurement uncertainty uc(y) of the output quantity, the package checks that the expression for uc²(y) is non-negative.

Example 18 (overall GUM): Experience with the GUM packages validated so far shows that most of these packages compute
- confidence intervals for output quantities with rectangular distribution,
- effective degrees of freedom in case of correlated inputs, and
- confidence intervals for correlated output quantities,
although the GUM does not prescribe anything for these cases.
Accepted solution: Because these calculation results are not covered by the GUM, they do not belong to a validation of a package with respect to GUM conformity. On the other hand, these results are important in practice. In the test process, these calculations are therefore tested, but the corresponding test cases are marked as “outside GUM conformity testing”.

4. OVERVIEW OF THE TEST ENVIRONMENT
In this section, the test environment developed for the validation and GUM-related comparison of software products is described briefly. A schematic overview of the test environment is shown in figure 3. A detailed presentation is given in [5]. The following description is restricted to the overall understanding of the test concept and to some aspects which are important for analysing the benefits and problems mentioned above. Implementation details are omitted.

The objective of validating software products that implement the GUM is best achieved by establishing a well-defined, GUM-oriented test process supported by a reliable technical test environment. The environment itself has to satisfy certain quality requirements, for example correctness and completeness. In particular, the test cases must be designed so that they fit any GUM-supporting software product under test. Consequently, the comparability of individual validation results and, ultimately, of the whole validation process has to be ensured.

To meet these requirements, the test environment consists of the following components:
- Data model defining the structure of the information necessary for uncertainty calculations and the corresponding tests. The main components of the model are the test case identification, the test purpose with classification (cf. figure 3) and GUM reference, the inputs for the software under test, and the nominal outputs, which serve as criteria for the package’s results.
- Set of universal test cases which do not contain any product-specific or technical information. The test case repository is implemented based on the data model. Each test case is represented by a separate file with a unique identifier.
- Test case converter which translates universal test cases into product-oriented specific ones.
  The converter needs information about, for example, the package to be tested, the underlying operating system, and the test tool to be used. Depending on the software under test and the validation task, the converter has to filter the test cases.
- Several sets of product-oriented specific test cases, each of which belongs to a specific software package to be tested. These test cases contain, for example, package-specific commands, input values, buttons to push and menu items to select.
- Capture-replay test tools to run the test cases and to repeat the overall test process automatically.

The universal and package-specific test cases are arranged according to a well-defined classification scheme. This classification hierarchy is based on the software quality characteristics defined in the international software standard ISO/IEC 25010 [7]. The position of a test case in the hierarchy corresponds to the purpose of the test. In this way, the classification scheme allows a certain control of the completeness and traceability of the validation process. In accordance with figure 4, the main levels of the classification hierarchy are:
- Assignment of the test cases to the software quality characteristics according to ISO/IEC 25010 [7], for example functionality, usability, and reliability.
- Subdivision of the test cases into positive cases (proving that the GUM is correctly implemented) and negative cases (proving that no calculation is carried out where the GUM is not applicable).
- Specific subdivisions depending on the value of the first level.

An example of the third classification level is closely connected with the software quality characteristic functionality (see figure 4). In this case, the classification hierarchy represents the detailed calculation steps needed to prove the conformity of the software packages with the core sections and formulas of the GUM.
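The interplay of data model, universal test cases, converter and classification-based completeness control described above can be illustrated with a minimal Python sketch. All class, field and function names are hypothetical, the product-specific part is reduced to generic input actions, and the filtering rule (keep only cases whose third classification level is supported by the package) is an assumption made for illustration; it is not the PTB implementation presented in [5].

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UniversalTestCase:
    """Product-independent test case (hypothetical field names)."""
    identifier: str            # unique identifier, one file per test case
    classification: tuple      # hierarchy path: (characteristic, polarity, branch, ...)
    gum_reference: str         # traceability to a GUM clause
    inputs: dict               # inputs for the software under test
    nominal_outputs: dict      # criteria for the package's results
    in_gum_scope: bool = True  # False: "outside GUM conformity testing"


@dataclass
class ProductProfile:
    """What the converter needs to know about the package under test."""
    package: str
    operating_system: str
    test_tool: str
    supported_branches: set = field(default_factory=set)


def convert(cases, profile):
    """Filter universal test cases and add product-specific information."""
    specific = []
    for case in cases:
        branch = case.classification[2] if len(case.classification) > 2 else None
        if branch not in profile.supported_branches:
            continue  # filtered out for this package / validation task
        specific.append({
            "id": f"{profile.package}/{case.identifier}",
            "os": profile.operating_system,
            "tool": profile.test_tool,
            # A real converter would emit package-specific commands, buttons
            # and menu items; this sketch only lists generic input actions.
            "steps": [("enter", name, value) for name, value in case.inputs.items()],
            "expected": case.nominal_outputs,
            "gum_reference": case.gum_reference,
        })
    return specific


def cases_per_level(cases, depth):
    """Count test cases per classification prefix (completeness control)."""
    return Counter(case.classification[:depth] for case in cases)


cases = [
    UniversalTestCase("TA-001", ("functionality", "positive", "Type A"), "GUM 4.2.3",
                      {"q": [10.1, 10.3, 9.9, 10.0, 10.2]}, {"u": 0.071}),
    UniversalTestCase("TB-001", ("functionality", "positive", "Type B"), "GUM 4.3.7",
                      {"a": 0.1}, {"u": 0.058}),
    UniversalTestCase("TA-N01", ("functionality", "negative", "Type A"), "GUM 4.2.3",
                      {"q": [10.1]}, {"calculation": "refused"}),
]
profile = ProductProfile("PackageX", "Windows", "capture-replay tool", {"Type A"})
print([c["id"] for c in convert(cases, profile)])  # ['PackageX/TA-001', 'PackageX/TA-N01']
print(cases_per_level(cases, 2))
```

Keeping the universal cases free of product details means that only the hypothetical `ProductProfile` changes from one package to the next, which is what makes the validation results of different packages comparable.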
The calculations are split into the following steps (branches of the classification hierarchy, see figure 4):
- calculation of Type A uncertainties (without correlation of inputs);
- calculation of Type B uncertainties (without correlation of inputs);
- interpretation of model equations and calculation of sensitivity coefficients (SCs in figure 4);
- calculation of values, standard measurement uncertainties, and coverage intervals of output quantities without and with correlation of the input quantities;
- calculation of the correlations between output quantities (vector results);
- calculation of the examples from GUM annex H.

For each of these calculation steps, further classification levels can be defined depending on the complexity of the test cases. Normally, we use between five and nine classification levels. In addition to the quality characteristic functionality, the characteristics usability and reliability were used to design and implement test cases. In future, the characteristic efficiency may become relevant for including response-time evaluations of Monte Carlo simulation engines.

Figure 3. Schematic overview of the test environment.

5. CONCLUSIONS
A number of software packages which claim to implement the GUM are on the market. However, they differ in functionality and they have deficiencies which are not obvious. Thus, a validation of these packages with respect to the GUM is necessary. The PTB test environment has been used successfully to validate and compare three different GUM-supporting software packages. To bridge the gap between the GUM guideline and the explicit test specification, a detailed analysis of the GUM from a tester’s perspective and certain decisions regarding the test process (cf. the accepted solutions of the examples in section 3) were necessary. Based on the results of this analysis, unambiguous and detailed test cases could be developed.
The benefits of the test environment and the validation procedure are:
- a general procedure usable for any GUM-supporting software product;
- an automated and reusable process;
- comparability of the validation procedure and, especially, of the validation results;
- an automated documentation process.

However, there are also limitations in the validation procedure and in the process of product comparison. The current procedure covers sections 5.5 to 5.8 and 6 of [2], but does not consider sections 5.9, 5.10, and 7 (Monte Carlo simulations), nor the handling of complex numbers. The general limitation is that several characteristics of GUM statements that obstruct software testing, such as ambiguities and missing or inexact specifications/definitions, restrict the applicability and the objectivity of the test environment. Thus, some of the accepted solutions cannot be realised within an automated test environment. Concerning the software quality characteristics, the validation procedure does not yet include efficiency testing (e.g. the duration of Monte Carlo simulations). In principle, the test environment is prepared for the extensions mentioned above. Some extensions concerning Monte Carlo simulations and vector results are already under construction.

The work reported reveals some problems regarding the objectives of testing GUM-supporting software products and the corresponding GUM statements. These problems, for example GUM inconsistencies or ambiguities, have to be minimised. They directly concern the implementation of GUM-supporting software products and the corresponding product validations. Further discussion of these problems would enhance the traceability of implementation and validation results to the GUM and the comparability of uncertainty calculations performed by different software products.
Figure 4. Classification hierarchy of test cases (extract).

REFERENCES
[1] ISO/IEC Guide 98-3:2008, Uncertainty of measurement - Part 3: Guide to the expression of uncertainty in measurement, 2008.
[2] ISO/IEC Guide 98-3:2008/Suppl 1:2008, Propagation of distributions using a Monte Carlo method, 2008.
[3] N. Greif, H. Schrepf, D. Richter, Software validation in metrology: A case study for a GUM-supporting software, Measurement, Volume 39, 2006, pp. 849-855.
[4] N. Greif, H. Schrepf, Validierung von Software zur Bestimmung von Messunsicherheiten, VDI-Berichte 1947, Messunsicherheit praxisgerecht bestimmen, VDI, 2006, pp. 409-418.
[5] N. Greif, H. Schrepf, V. Hartmann, G. Kilz, A test environment for GUM conformity tests, Physikalisch-Technische Bundesanstalt (PTB), Braunschweig und Berlin, PTB Report, to appear, 2013.
[6] M. G. Cox, P. M. Harris, I. M. Smith, Software specification for uncertainty evaluation, NPL Report MS 7, March 2010.
[7] ISO/IEC 25010:2011, Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - System and software quality models, 2011.