An informed type A evaluation of standard uncertainty valid for any sample size greater than or equal to 1

ACTA IMEKO ISSN: 2221-870X June 2022, Volume 11, Number 2, 1 - 5
ACTA IMEKO | www.imeko.org June 2022 | Volume 11 | Number 2 | 1

Carlo Carobbi¹
¹ Department of Information Engineering, Università degli Studi di Firenze, Via Santa Marta 3, 50139 Firenze, Italy

Section: RESEARCH PAPER
Keywords: Measurement uncertainty; type A evaluation; pooled variance; Bayesian inference; informative prior
Citation: Carlo Carobbi, An informed type A evaluation of standard uncertainty valid for any sample size greater than or equal to 1, Acta IMEKO, vol. 11, no. 2, article 29, June 2022, identifier: IMEKO-ACTA-11 (2022)-02-29
Section Editor: Francesco Lamonaca, University of Calabria, Italy
Received October 1, 2021; In final form February 23, 2022; Published June 2022
Copyright: This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Corresponding author: Carlo Carobbi, e-mail: carlo.carobbi@unifi.it

ABSTRACT
An informed type A evaluation of standard uncertainty is derived here based on Bayesian analysis. The result is mathematically simple, easily interpretable, applicable both in the theoretical framework of the Guide to the Expression of Uncertainty in Measurement (propagation of standard uncertainties) and in that of Supplement 1 of the Guide (propagation of distributions), and valid for any size greater than or equal to 1 of the sample of present observations. The evaluation consistently addresses prior information in the form of the sample variance of a series of recorded experimental observations and in the form of an educated guess based on an expert's experience. It turns out that the distinction between type A and type B evaluation is, in this context, contrived.

1. INTRODUCTION
The quantification of the type A uncertainty contribution in the case of a small sample ($n = 1, 2, 3$) is a subject of research and passionate debate in Working Group 1 of the Joint Committee for Guides in Metrology (JCGM WG1), the standards working group responsible for the maintenance and development of the Guide to the Expression of Uncertainty in Measurement (GUM, [1]) and its supplements. The topic is felt so strongly that, at the end of 2019, the "JCGM WG1 Workshop on Type A evaluation of measurement uncertainty for a small set of observations" was held at the Bureau International des Poids et Mesures (BIPM, Sèvres, Paris).

The problem arose following the negative reaction to the Committee Draft (CD) of the revision of the GUM, circulated at the end of 2014 [2]. One of the most criticized issues of the draft of the "new GUM" is the type A evaluation of uncertainty based on a Student's t probability density function having $n-1$ degrees of freedom, shifted by the mean $\bar y$ of the $n$ observations $y_i$, $i = 1, 2, \ldots, n$, and scaled by the standard deviation of the mean $s/\sqrt{n}$, where

$$\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{1}$$

and

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar y\right)^2 . \tag{2}$$

By following this approach, the type A evaluation of standard uncertainty is

$$u(\bar y) = \sqrt{\frac{n-1}{n-3}}\,\frac{s}{\sqrt{n}} , \tag{3}$$

which is not valid for a sample size smaller than $n = 4$. This solution originates from a Bayesian approach to inference, where improper priors (Jeffreys priors) are adopted for the mean $\mu$ and the variance $\sigma^2$ of the parent normal probability density function (PDF), i.e.

$$\mu \sim \text{const.} \tag{4}$$

and

$$\sigma^2 \sim p_0(\sigma^2) , \tag{5}$$

where $p_0(\sigma^2)$ represents the improper prior adopted for $\sigma^2$, namely

$$p_0(\sigma^2) \propto \frac{1}{\sigma^2} . \tag{6}$$

Note that the information conveyed by these priors is only that strictly relevant to the character of the two parameters: $\mu$ is a location parameter and $\sigma^2$ is a scale parameter. In contrast, practitioners in testing and calibration have much richer information about the variability of the measurement process than that represented by (6).

The Bayesian approach is the one followed by Supplement 1 of the GUM (GUMS1, [3]), and the intent of JCGM WG1 was precisely to align the GUM with GUMS1 by attributing the same Student's t probability density to a sample of repeated observations. The problem is that, by doing so, it is possible to propagate distributions (as foreseen by GUMS1), but it is not possible to propagate standard uncertainties (as foreseen by the GUM) if the sample size is less than $n = 4$. This is generally not acceptable (e.g., in destructive testing), particularly if implemented as a standard (mandatory) method. The GUM and GUMS1 approaches are therefore inconsistent: they produce substantially different results when random variability is a significant contribution to measurement uncertainty and the number of measurements used for its estimate is low [4]. JCGM WG1 does not yet seem to have identified a way out of this inconsistency between the GUM and GUMS1.

Both frequentists and Bayesians can agree that the estimate of the average value obtainable from such a small sample is not very reliable. In favor of the Bayesian approach to inference, one can observe that the only way to enrich the estimate is to use prior information on the variability of the measurement process to complement the meagre experimental observation. In this sense, a Bayesian approach is useful because, unlike the frequentist approach, it provides a method for combining prior information with experimental observation.
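To make the limitation of (3) concrete, the following minimal Python sketch evaluates (1)–(3) for a set of hypothetical readings. The function name `gum_cd_type_a` and the data are illustrative assumptions, not taken from [2]. For $n \le 3$ the Student's t distribution with $n-1 \le 2$ degrees of freedom has an infinite or undefined variance, so the function refuses to return a value.

```python
import math
from statistics import mean, stdev


def gum_cd_type_a(y: list[float]) -> float:
    """Type A standard uncertainty of the mean following eq. (3):
    u = sqrt((n - 1)/(n - 3)) * s / sqrt(n), i.e. the standard deviation of a
    Student's t with n - 1 degrees of freedom scaled by s/sqrt(n)."""
    n = len(y)
    if n < 4:
        # A t distribution with n - 1 <= 2 degrees of freedom has no finite
        # variance, so eq. (3) cannot provide a standard uncertainty.
        raise ValueError(f"eq. (3) needs n >= 4, got n = {n}")
    s = stdev(y)  # sample standard deviation, n - 1 in the denominator, eq. (2)
    return math.sqrt((n - 1) / (n - 3)) * s / math.sqrt(n)


readings = [10.0, 10.4, 9.8, 10.2]  # hypothetical observations
print(f"mean = {mean(readings):.2f}, u = {gum_cd_type_a(readings):.4f}")
for k in (1, 2, 3):
    try:
        gum_cd_type_a(readings[:k])
    except ValueError as err:
        print(err)
```

Even for $n = 4$ the factor $\sqrt{(n-1)/(n-3)} = \sqrt{3}$ inflates $s/\sqrt{n}$ by about 73 %, which illustrates why the small-sample case attracted so much criticism.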
From the application point of view, these concepts are relevant to the evaluation of measurement repeatability. Measurement repeatability quantifies the variability of measurement results obtained under specified repeatability conditions, and it is an essential contribution to measurement uncertainty in every field of experimental activity. In the context of testing and calibration, if a stable item is re-tested or re-calibrated, the new measurement results are expected to be compatible with the old ones. Two distinct operators should provide compatible measurement results when testing or calibrating the same item; measurement repeatability is therefore a reference for the qualification of personnel. Monitoring measurement repeatability contributes to assuring the validity of test and calibration results. In an accreditation regime [5], measurement repeatability must be kept under statistical control. Periodic assessments are carried out by the accreditation body to verify, through an appropriate experimental check, the robustness of the estimate of measurement repeatability; see [6], equation (6), p. 5 (in Italian), and [7], clause 6.6.3. The GUM provides type A evaluation of standard uncertainty as the tool to quantify measurement repeatability. Type A evaluation is based on a frequentist approach, which implies that information on the quality of the estimate of measurement uncertainty must be conveyed to the user; this is done in terms of effective degrees of freedom. GUMS1 adopts a knowledge-based (as opposed to frequentist) approach to modelling measurement repeatability. The quality of the estimate of measurement uncertainty is accounted for by the available prior knowledge, which eventually determines the width of the coverage interval. The use of numerical methods for professional (accredited) evaluation of measurement uncertainty is expected to increase in the future.
Indeed, the GUMS1 numerical method, which is based on the propagation of probability distributions, accounts for possible non-linearity of the measurement model, is simple, is less prone to mistakes (partial derivatives are not required) and provides all the available information about the measurand in terms of its probability distribution. Further, the use of numerical methods is practically unavoidable when the measurement model is complex and/or the measurand is an ensemble of scalar quantities (a vector). On the other hand, the analytical method (based on the law of propagation of uncertainty) is well established and is the one predominantly adopted today. A further strength of the analytical method is its great pedagogical value. Achieving consistency between the analytical and numerical approaches to measurement uncertainty quantification is therefore desirable, since both have strengths and are expected to coexist in the future.

What is proposed here is a knowledge-based approach to the type A evaluation of measurement uncertainty and, specifically, of measurement repeatability. An estimate of the repeatability of a measurement system, representative of its performance in testing, may be available. This knowledge may be derived from:
• systematic recording of periodic verifications of the measurement system;
• analysis and quantification of the individual sources of variability in the measurement chain;
• normative references (for standard measurement systems used in testing);
• information from manufacturers of measuring instruments;
• experience with the specific measurement chain or similar ones.

As in GUMS1, Bayesian inference is used here, since it provides a straightforward method to incorporate prior knowledge. Unlike the GUMS1 Bayesian approach, here an informative prior PDF is assigned to $\sigma^2$.
To obtain analytical results, useful in the framework of the law of propagation of uncertainty, a normal probability model is assumed with a non-informative prior PDF for the mean and a conjugate prior PDF for the variance. In section 2 the theoretical approach is described, and in subsection 2.1 it is compared with another approach [8], previously presented in the scientific literature and proposed by a member of JCGM WG1. In section 3 the theoretical results are applied to a practical case, drawn from the author's experience as an assessor of accredited testing laboratories. Conclusions follow in section 4. Finally, an appendix is devoted to the mathematical derivations supporting the results presented in section 2.

2. TYPE A EVALUATION WHEN PRIOR INFORMATION IS AVAILABLE
By prior information we mean here information on the variability of the measurement process obtained before a given test (or calibration) is carried out. Let us consider the case in which the prior information consists of a relatively long series of experimental observations. The important hypothesis that must be verified is that the previous experimental observations were obtained under repeatability conditions representative of those that occur during the test, as regards both the measurement system and the measurand. If this is not verified, the prior information is not valid to represent the variability observed during the test. This hypothesis can only be realized by following an experimental procedure based on physical modelling, aimed at identifying the causes of variability and at limiting their effects. It is the experimenter's task to ensure that the hypothesis is verified in practice.
In mathematical terms, Bayesian inference is made on the mean value $\mu$ and the variance $\sigma^2$ of a Gaussian PDF, assuming an improper uniform PDF for $\mu$ and a scaled inverse $\chi^2$ PDF ([9], Table A.1, p. 576) for $\sigma^2$. The choice of the improper uniform PDF for $\mu$ is justified by the desire to avoid introducing an a priori bias on the best estimate of the measurand value, which in this way depends solely on the experimental observations obtained during the test. The choice of the scaled inverse $\chi^2$ PDF for $\sigma^2$ is justified by the desire to incorporate prior information while retaining the well-known Student's t as the posterior PDF of $\mu$ ([9], section 3.3, p. 67). The parameters of the scaled inverse $\chi^2$ PDF are the prior variance $\sigma_0^2$ and the associated degrees of freedom $\nu_0$. Another advantage of the scaled inverse $\chi^2$ PDF is the immediate physical interpretation of the degrees of freedom $\nu_0$ as the number of measurements used to derive the prior estimate $\sigma_0^2$, minus 1. At the same time, $\nu_0$ can be linked to the degree of credibility attributed to $\sigma_0^2$ as an estimate of $\sigma^2$, as shown below through (11). With this choice of prior PDFs (see the appendix for the derivation), the posterior marginal PDF of $\mu$ is a Student's t PDF with degrees of freedom

$$\nu_n = \nu_0 + (n-1) , \tag{7}$$

shifted to

$$\mu_n = \bar y \tag{8}$$

and with scaling factor $\sqrt{\sigma_n^2/n}$, where

$$\sigma_n^2 = \frac{\nu_0\,\sigma_0^2 + (n-1)\,s^2}{\nu_0 + (n-1)} . \tag{9}$$

According to this approach, the type A evaluation of standard uncertainty is

$$u(\bar y) = \sqrt{\frac{\nu_n}{\nu_n - 2}}\,\sqrt{\frac{\sigma_n^2}{n}} . \tag{10}$$

We observe from (7) that the degrees of freedom $\nu_0$ of the prior evaluation of variability $\sigma_0$ add up to the $n-1$ degrees of freedom with which the variability $s$ is evaluated during testing. The result is valid if the assumption that repeatability conditions are the same in both the prior investigation and testing is verified.
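Equations (7)–(10) translate directly into a few lines of code. The Python sketch below is a minimal illustration; the names `informed_type_a` and `InformedTypeA` are hypothetical. It writes $(n-1)s^2$ as a plain sum of squared deviations, so that the term simply vanishes when $n = 1$, and it checks the condition $\nu_n > 2$ needed for (10) to be finite.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class InformedTypeA:
    nu_n: float     # posterior degrees of freedom, eq. (7)
    mu_n: float     # best estimate, eq. (8)
    sigma_n: float  # pooled standard deviation, square root of eq. (9)
    u: float        # type A standard uncertainty, eq. (10)


def informed_type_a(y: list[float], sigma0: float, nu0: float) -> InformedTypeA:
    """Pool the prior (sigma0, nu0) with the n >= 1 present observations y."""
    n = len(y)
    if n < 1:
        raise ValueError("at least one observation is required")
    ybar = mean(y)
    # (n - 1) s^2 as a sum of squares, so that it is exactly 0 when n == 1
    ss = sum((yi - ybar) ** 2 for yi in y)
    nu_n = nu0 + (n - 1)
    if nu_n <= 2:
        raise ValueError("eq. (10) needs nu_n > 2, e.g. nu0 >= 3 when n == 1")
    # weighted average of prior and sample variances, weights = degrees of freedom
    sigma_n2 = (nu0 * sigma0**2 + ss) / nu_n
    u = math.sqrt(nu_n / (nu_n - 2)) * math.sqrt(sigma_n2 / n)
    return InformedTypeA(nu_n, ybar, math.sqrt(sigma_n2), u)


# Two readings 1.5 dB apart (absolute levels are hypothetical; only the spread matters)
print(informed_type_a([40.0, 41.5], sigma0=0.8, nu0=9))
print(informed_type_a([40.0, 41.5], sigma0=1.0, nu0=4))
print(informed_type_a([40.0], sigma0=0.8, nu0=9))  # n = 1 is allowed
```

With the inputs of the first example in section 3 ($\sigma_0 = 0.8$ dB, $\nu_0 = 9$, two readings 1.5 dB apart), direct evaluation of (9) and (10) gives $\nu_n = 10$, $\sigma_n \approx 0.83$ dB and $u \approx 0.66$ dB. The last call shows the $n = 1$ case, where $\sigma_n = \sigma_0$ and $u = \sqrt{9/7}\,\sigma_0$.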
The estimate (8) is determined solely by the observations obtained during the testing phase because a constant, improper prior PDF has been chosen for $\mu$. The result (9) is particularly simple and convincing: the variance $\sigma_n^2$, which quantifies the variability of the measurement process, is obtained by pooling the prior variance $\sigma_0^2$ and the sample variance $s^2$ observed in testing through a weighted average, the weights being the corresponding degrees of freedom. The type A evaluation of standard uncertainty thus passes from (3), in the absence of prior information, to (10), which is valid also for $n = 1$ (the term $(n-1)s^2$ then vanishes and $\sigma_n = \sigma_0$) provided that $\nu_0 \ge 3$.

The following consideration is also of interest. The prior information about the variability of the measurement process may be derived, for example, from the assessment of an expert. A simple form of this prior information is a best estimate $\sigma_0$ and a quantile $\sigma_\alpha$ that the expert judges to be exceeded with a small probability $\alpha$. A link can be established among $\sigma_\alpha$, $\alpha$ and $\nu_0$ for a given $\sigma_0$ through the cumulative distribution function of the scaled inverse $\chi^2$ prior of $\sigma^2$ evaluated at $\sigma_\alpha^2$, namely

$$\alpha = 1 - \frac{\Gamma\!\left(\frac{\nu_0}{2}, \frac{\nu_0\,\sigma_0^2}{2\,\sigma_\alpha^2}\right)}{\Gamma\!\left(\frac{\nu_0}{2}\right)} , \tag{11}$$

where $\Gamma(a, z) = \int_z^{\infty} t^{a-1}\exp(-t)\,\mathrm{d}t$ is the upper incomplete gamma function with parameters $a$ and $z$, and $\Gamma(a) = \Gamma(a, 0)$ is the gamma function. If $\sigma_0$ and $\sigma_\alpha$ are known, then (11), for any given $\alpha$, implicitly provides a value for $\nu_0$. This relationship can be represented through a plot such as the one in Figure 1. Note from Figure 1 that the larger $\sigma_\alpha/\sigma_0$, the smaller $\nu_0$ for a given $\alpha$, and the smaller $\alpha$ for a given $\sigma_\alpha/\sigma_0$, the larger $\nu_0$.

The idea of pooling prior variability is not new in the context of the GUM: it is briefly mentioned in clause 6.4.9.6 of GUMS1 and in 9.2.6 of the CD of the GUM revision [2].

2.1. Comparison with the type A evaluation obtained by truncating the improper prior for $\sigma^2$
In a recent paper [8], Cox and Shirono propose a solution to the problem of the type A evaluation for a small sample, where $\sigma_T$ is an upper bound (truncation) value for the improper prior of $\sigma^2$, i.e.

$$p(\sigma^2) \propto \begin{cases} 1/\sigma^2, & 0 < \sigma^2 \le \sigma_T^2 \\ 0, & \sigma^2 > \sigma_T^2 \end{cases} . \tag{12}$$

The prior PDF of $\mu$ in [8] is, as in this work, a constant improper prior. Following [8], the type A evaluation of standard uncertainty can be expressed as $\kappa\, s/\sqrt{n}$, where

$$\kappa = \sqrt{\frac{n-1}{2}\;\frac{\Gamma\!\left(\frac{n-3}{2}, \frac{(n-1)\,s^2}{2\,\sigma_T^2}\right)}{\Gamma\!\left(\frac{n-1}{2}, \frac{(n-1)\,s^2}{2\,\sigma_T^2}\right)}} . \tag{13}$$

For $n \ge 2$, $\kappa > 0$ is a function of $s$, $n$ and $\sigma_T$. As shown in Figure 2, $\kappa s < \sigma_T$ even when $s > \sigma_T$ and $n$ is arbitrarily large. This is problematic because, when the observed variability is more credible (larger number of degrees of freedom) than prior knowledge of variability, the observed variability, not its prior estimate, should dominate the type A evaluation. In other words, setting an upper bound on $\sigma^2$ is acceptable only if irrefutable evidence of an upper truncation value is available. Otherwise, setting a large value with an associated small probability of being exceeded is a more cautious approach. Another limitation of the approach in [8] is that necessarily $n \ge 2$ (see (13): $\kappa = 0$ if $n = 1$), while, according to the solution proposed here, the case $n = 1$ is also tractable.

3. APPLICATION IN THE CONTEXT OF ACCREDITATION TO ISO/IEC 17025
National accreditation bodies require evaluation of the measurement repeatability of the test methods in the scope of accreditation. Such evaluation is carried out by testing laboratories through periodic recording of measurement results obtained under conditions representative of actual testing. An estimate $\sigma_0$ with corresponding degrees of freedom $\nu_0$ is thus obtained.
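When $\sigma_0$ comes from an expert's judgement together with a quantile $\sigma_\alpha$, rather than from recorded data, $\nu_0$ must be obtained from (11), which has no closed-form solution; Figure 1 provides graphical solutions. The Python sketch below solves (11) numerically with the standard library only. The helper names, the power-series evaluation of the incomplete gamma function and the bisection bracket are assumptions of this illustration, not the procedure used to draw Figure 1. It uses the fact that (11) can be rewritten as $\alpha = P\!\left(\nu_0/2,\ \nu_0\sigma_0^2/(2\sigma_\alpha^2)\right)$, where $P(a, z) = 1 - \Gamma(a, z)/\Gamma(a)$ is the regularized lower incomplete gamma function.

```python
import math


def reg_lower_gamma(a: float, z: float, tol: float = 1e-15, max_terms: int = 100_000) -> float:
    """Regularized lower incomplete gamma P(a, z) = 1 - Gamma(a, z)/Gamma(a), from the
    series z^a e^-z / Gamma(a + 1) * sum_k z^k / ((a + 1)...(a + k)).
    Convergence is fast here because z < a whenever sigma_alpha > sigma0."""
    if z <= 0.0:
        return 0.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a + 1.0)
    term = total = 1.0
    for k in range(1, max_terms):
        term *= z / (a + k)
        total += term
        if term < tol * total:
            break
    return math.exp(log_prefactor) * total


def exceedance_probability(nu0: float, ratio: float) -> float:
    """Right-hand side of eq. (11), Pr(sigma > sigma_alpha), with ratio = sigma_alpha/sigma0."""
    return reg_lower_gamma(nu0 / 2.0, nu0 / (2.0 * ratio**2))


def solve_nu0(ratio: float, alpha: float, lo: float = 1e-3, hi: float = 1e3) -> float:
    """Solve eq. (11) for nu0 by bisection (the probability decreases as nu0 grows)."""
    if ratio <= 1.0 or not 0.0 < alpha < 1.0:
        raise ValueError("need sigma_alpha > sigma0 and 0 < alpha < 1")

    def excess(nu: float) -> float:
        return exceedance_probability(nu, ratio) - alpha

    if excess(lo) < 0.0 or excess(hi) > 0.0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    while hi - lo > 1e-10:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid  # probability still above alpha: nu0 must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Expert-based prior of the second example: sigma_alpha/sigma0 = 2.5, alpha = 5 %
print(f"nu0 = {solve_nu0(ratio=2.5, alpha=0.05):.2f}")
```

For $\sigma_\alpha/\sigma_0 = 2.5$ and $\alpha = 5\,\%$ the root lies between 3.5 and 4, consistent with the approximate reading $\nu_0 \approx 4$ from Figure 1 used in the second example. Lowering $\alpha$ at a fixed ratio raises $\nu_0$, as Figure 1 shows.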
How can this prior knowledge be incorporated into the test outcome? We provide here a numerical example in the context of electromagnetic compatibility (EMC) testing. Suppose that the estimate of the non-repeatability of the radiated emission measurement chain is $\sigma_0 = 0.8$ dB with $\nu_0 = 9$. Testing twice ($n = 2$), an absolute deviation of 1.5 dB between the measured values is obtained; then $s = 1.5\ \text{dB}/\sqrt{2} = 1.06$ dB. By pooling the standard deviations $\sigma_0$ and $s$ we have $\nu_n = \nu_0 + (n-1) = 9 + 1 = 10$,

$$\sigma_n = \sqrt{\frac{\nu_0\,\sigma_0^2 + (n-1)\,s^2}{\nu_0 + (n-1)}} = \sqrt{\frac{9 \cdot 0.8^2 + 1 \cdot 1.06^2}{9 + 1}}\ \text{dB} = 0.83\ \text{dB}$$

and

$$u(\bar y) = \sqrt{\frac{\nu_n}{\nu_n - 2}}\,\frac{\sigma_n}{\sqrt{n}} = \sqrt{\frac{10}{8}}\,\frac{0.83}{\sqrt{2}}\ \text{dB} = 0.66\ \text{dB} .$$

As a second example, consider the case where an expert in the specific test method provides a guess $\sigma_0 = 1$ dB, based on experience with similar test systems. The expert is also confident that $\sigma$ exceeds $\sigma_\alpha = 2.5$ dB only with a low probability $\alpha = 5\,\%$. This state of knowledge corresponds approximately (see Figure 1) to $\nu_0 = 4$, from which $\nu_n = 5$ (instead of 10, as in the previous example), $\sigma_n = 1.01$ dB (instead of 0.83 dB), and $u(\bar y) = 0.92$ dB (instead of 0.66 dB).

4. CONCLUSIONS
Reliable statistical techniques to incorporate prior knowledge into the so-called "type A" evaluation of standard uncertainty should be identified to make the evaluation more robust in the case of small samples. The use of these statistical techniques should be promoted and confidently accepted in accredited testing if competence requirements are fulfilled. GUMS1 already provides such a tool by pooling prior variance and sample variance. A Bayesian derivation of the GUMS1 pooled variance is illustrated here, along with a more flexible interpretation aimed at treating an expert's knowledge as a useful source of reliable information.
Figure 1: plots of the degrees of freedom $\nu_0$ as a function of the ratio $\sigma_\alpha/\sigma_0$, obtained by solving the implicit equation (11) for three values of the probability $\alpha$ (see the legend).

Figure 2: plots of $\kappa s/\sigma_T$ as a function of $s/\sigma_T$ for selected values of $n$ (see the legend). Note that $\kappa s < \sigma_T$ for any value of $s/\sigma_T$ and for any value of $n$.

According to the results described in this work, there is no need to distinguish between type A and type B evaluations, since a homogeneous mathematical treatment is used to address prior information about variability (whether it originates from experimental evidence or from an expert's experience) and its pooling with present observations.

The main ideas and results in this work were presented by the author during the 2019 JCGM WG1 workshop mentioned in the introduction. I would like to acknowledge that, during the same workshop, Anthony O'Hagan (Emeritus Professor, University of Sheffield) also proposed the use of the scaled inverse $\chi^2$ PDF to solve the problem of the type A evaluation for a small sample size. His formulation of the solution (still unpublished) was different from mine, but it is remarkable that two researchers with completely different backgrounds arrived at similar proposals.
REFERENCES
[1] GUM: BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP and OIML, Guide to the Expression of Uncertainty in Measurement, JCGM 100:2008 (GUM 1995 with minor corrections), 2008.
[2] JCGM 100 201X CD (Committee Draft), Evaluation of measurement data – Guide to uncertainty in measurement, circulated in December 2014.
[3] GUMS1: BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP and OIML, Supplement 1 to the 'Guide to the Expression of Uncertainty in Measurement' – Propagation of distributions using a Monte Carlo method, JCGM 101:2008, 2008.
[4] W. Bich, M. Cox, R. Dybkaer, C. Elster, Revision of the 'Guide to the Expression of Uncertainty in Measurement', Metrologia 49 (2012), pp. 702–705. DOI: 10.1088/0026-1394/49/6/702
[5] ISO/IEC 17025, Conformity Assessment – General Requirements for the Competence of Testing and Calibration Laboratories, International Organization for Standardization, Geneva, Switzerland, 2017.
[6] SINAL DT-0002/6, Guida al calcolo della ripetibilità di un metodo di prova ed alla sua verifica nel tempo (Guide to the calculation of the repeatability of a test method and to its verification over time), Rev. 0, December 2007. [In Italian]
[7] B. Magnusson, U. Örnemark (eds.), Eurachem Guide: The Fitness for Purpose of Analytical Methods – A Laboratory Guide to Method Validation and Related Topics, 2nd ed., 2014. ISBN 978-91-87461-59-0. Online [Accessed 22 April 2022] https://www.eurachem.org/index.php/publications/guides/mv
[8] M. Cox, T. Shirono, Informative Bayesian Type A uncertainty evaluation, especially applicable to a small number of observations, Metrologia 54 (2017), pp. 642–652. DOI: 10.1088/1681-7575/aa787f
[9] A. Gelman, A. Vehtari, J. B. Carlin, H. Stern, D. B. Dunson, D. B. Rubin, Bayesian Data Analysis, 3rd ed., CRC Press, 2014. ISBN 9781439840955.

APPENDIX
We derive here the marginal posterior PDF of $\mu$, given prior information in terms of the prior PDFs of $\mu$ and $\sigma^2$ and the set of observations $y_i$, $i = 1, 2, \ldots, n$.
A uniform prior PDF is assigned to $\mu$ as

$$\mu \sim \text{const.} , \tag{14}$$

while the prior of $\sigma^2$ is a scaled inverse $\chi^2$ PDF with prior variance $\sigma_0^2$ and associated degrees of freedom $\nu_0$:

$$\sigma^2 \sim \text{Inv-}\chi^2\!\left(\nu_0, \sigma_0^2\right) . \tag{15}$$

Since $\mu$ and $\sigma^2$ are a priori independent, the joint prior PDF of $\mu$ and $\sigma^2$ is, from (14) and (15),

$$p(\mu, \sigma^2) \propto \left(\sigma^2\right)^{-\left(\frac{\nu_0}{2}+1\right)} \exp\!\left(-\frac{\nu_0\,\sigma_0^2}{2\,\sigma^2}\right) . \tag{16}$$

The likelihood of the observations is easily obtained as [9]

$$l(\mu, \sigma^2; \mathbf{y}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(y_i - \mu)^2}{2\sigma^2}\right] = \left(2\pi\sigma^2\right)^{-\frac{n}{2}} \exp\!\left[-\frac{(n-1)\,s^2 + n\,(\bar y - \mu)^2}{2\sigma^2}\right] , \tag{17}$$

where $\mathbf{y}$ is a vector representing the set of observations $y_i$, $i = 1, 2, \ldots, n$. By Bayes' theorem, the joint posterior PDF of $\mu$ and $\sigma^2$ is given by

$$p(\mu, \sigma^2 \mid \mathbf{y}) \propto l(\mu, \sigma^2; \mathbf{y})\; p(\mu, \sigma^2) . \tag{18}$$

Substituting (16) and (17) into (18) and marginalizing with respect to $\sigma^2$, one readily obtains

$$p(\mu \mid \mathbf{y}) \propto \left[1 + \frac{n\,(\mu - \bar y)^2}{\nu_0\,\sigma_0^2 + (n-1)\,s^2}\right]^{-\frac{\nu_0 + n}{2}} , \tag{19}$$

where $p(\mu \mid \mathbf{y})$ represents the marginal posterior PDF of $\mu$. It is evident from (19) that $p(\mu \mid \mathbf{y})$ is a Student's t PDF with $\nu_n = \nu_0 + n - 1$ degrees of freedom, shifted to $\bar y$ and scaled by $\sqrt{\sigma_n^2/n}$, where $\sigma_n^2$ is given by (9).
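Because the aim is consistency between propagation of standard uncertainties (GUM) and propagation of distributions (GUMS1), a quick Monte Carlo check of (19) against (10) can be reassuring. The Python sketch below is an illustration only: the helper name `sample_mu_posterior`, the seed and the number of draws are arbitrary choices. It samples the joint posterior (18) by composition, first $\sigma^2 \sim \text{Inv-}\chi^2(\nu_n, \sigma_n^2)$ (the marginal posterior of $\sigma^2$ that follows from (16)–(18) after integrating out $\mu$), then $\mu \sim N(\bar y, \sigma^2/n)$. The standard deviation of the $\mu$ draws should match (10) within Monte Carlo noise.

```python
import math
import random


def sample_mu_posterior(ybar: float, n: int, sigma_n: float, nu_n: float,
                        draws: int = 100_000, seed: int = 12345) -> list[float]:
    """Draw mu from p(mu | y) by composition:
    sigma^2 | y ~ Inv-chi^2(nu_n, sigma_n^2), then mu | sigma^2, y ~ N(ybar, sigma^2 / n)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        chi2 = rng.gammavariate(nu_n / 2.0, 2.0)  # chi-squared variate with nu_n dof
        sigma2 = nu_n * sigma_n**2 / chi2         # scaled inverse chi-squared variate
        samples.append(rng.gauss(ybar, math.sqrt(sigma2 / n)))
    return samples


# Inputs of the first example in section 3: nu0 = 9, sigma0 = 0.8 dB, n = 2, s^2 = 1.125 dB^2
# (ybar = 0 is a hypothetical location; it does not affect the spread)
nu0, sigma0, n, s2, ybar = 9, 0.8, 2, 1.125, 0.0
nu_n = nu0 + (n - 1)                                           # eq. (7)
sigma_n = math.sqrt((nu0 * sigma0**2 + (n - 1) * s2) / nu_n)   # eq. (9)
u_analytic = math.sqrt(nu_n / (nu_n - 2)) * sigma_n / math.sqrt(n)  # eq. (10)

mc = sample_mu_posterior(ybar, n, sigma_n, nu_n)
mc_mean = math.fsum(mc) / len(mc)
mc_std = math.sqrt(math.fsum((x - mc_mean) ** 2 for x in mc) / (len(mc) - 1))
print(f"Monte Carlo u = {mc_std:.3f} dB, eq. (10) u = {u_analytic:.3f} dB")
```

The agreement illustrates the point made in the introduction: with an informative prior on $\sigma^2$, the GUMS1-style propagation of distributions and the GUM-style standard uncertainty (10) give the same answer, here even with only two observations.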