ACTA IMEKO
ISSN: 2221-870X
April 2017, Volume 6, Number 1, 20-26

Numerical experimental investigation of comparison data evaluation method using preference aggregation

Sergey V. Muravyov, Irina A. Marinushkina, Diana D. Garif

National Research Tomsk Polytechnic University, Pr. Lenina 30, 634050 Tomsk, Russia

Section: RESEARCH PAPER
Keywords: interlaboratory comparisons; reference value; largest consistent subset; preference aggregation; robust method
Citation: Sergey Muravyov, Irina Marinushkina, Diana Garif, Numerical experimental investigation of comparison data evaluation method using preference aggregation, Acta IMEKO, vol. 6, no. 1, article 4, April 2017, identifier: IMEKO-ACTA-06 (2017)-01-04
Editor: Paolo Carbone, University of Perugia, Italy
Received July 4, 2016; In final form October 13, 2016; Published April 2017
Copyright: © 2017 IMEKO. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the Ministry of Education and Science of the Russian Federation
Corresponding author: Sergey Muravyov, e-mail: muravyov@tpu.ru

ABSTRACT
Integrated software for the experimental testing of a preference aggregation method for interlaboratory comparison data processing is presented. The data can be obtained by Monte Carlo simulation and/or taken from real comparisons. Numerical experiments with the software have shown that, in contrast to traditional techniques of interlaboratory comparison data processing, the preference aggregation method provides a robust comparison reference value that is closer to the nominal value.

1. INTRODUCTION

Interlaboratory comparisons (IC) are now a common and important metrological procedure, used in key comparisons [1], proficiency testing of measurement laboratories [2], etc. The procedure consists in arranging and carrying out an assessment of the quality with which a given object characteristic is measured by several different laboratories in accordance with definite prescribed rules.

The main task of any kind of interlaboratory comparison is to establish a reference value xref of the measured quantity that characterizes the largest subset of consistent (reliable) measurement results, the so-called largest consistent subset (LCS) [3]. For this aim, the participating laboratories estimate the same nominal value xnom of the measured quantity. Laboratories with unreliable measurement results do not take part in establishing the final reference value. It should be noticed that, in contrast to proficiency testing, the official procedure for key comparisons (KCs) under the MRA [1] does not allow any participant result to be discarded, even if it appears unreliable or outlying. In this paper we adhere to the working assumption that the two types of IC can be treated as similar activities that tolerate the exclusion of outliers, with the understanding that the resulting reference value may be biased in the sense that some participants are excluded from its computation.

There are different approaches to checking the consistency of laboratory measurement results and to finding the reference value xref, see, for example, [3]-[7]. The choice of a particular consistency test method depends on the kind of travelling standard, the measurement conditions and the number of participating laboratories. Widely used methods are statistical ones that characterize the competence of IC participants on the basis of, for example, the difference between a laboratory's result and a value assigned by the comparison provider, percent differences, percentiles, or ranks [8]. However, these methods usually impose limitations on the feasible number of participating laboratories.
Moreover, statistical methods may show low discriminating ability, that is, a limited capacity to distinguish truly unreliable laboratories from laboratories whose results can be trusted.

In [4], [5], the rather widely known so-called Procedure A was presented. The procedure uses the weighted mean value y:

$$y = \sum_{i=1}^{m} x_i u^{-2}(x_i) \Big/ \sum_{i=1}^{m} u^{-2}(x_i), \qquad (1)$$

where xi is the estimate of the nominal value provided by the i-th laboratory, u(xi) are the corresponding standard uncertainties, and m is the number of laboratories participating in the IC. The standard uncertainty of the value y is given by

$$u^2(y) = 1 \Big/ \sum_{i=1}^{m} u^{-2}(x_i). \qquad (2)$$

In this procedure the weighted mean y is accepted as the reference value xref only if its consistency with the data of the participating laboratories is confirmed according to the χ² criterion. If the consistency test is not satisfied, it is proposed in [3] to use a strategy of successive exclusion of outliers, that is, of results that are not consistent with the others within the limits of the claimed uncertainties. A result is deemed inconsistent if |En| > 2, where

$$E_n = \frac{x_i - y}{\sqrt{u^2(x_i) \pm u^2(y)}}, \quad i = 1, \ldots, m, \qquad (3)$$

the minus sign applying when the i-th result is included in y. The exclusion of one inconsistent result at a time is repeated until consistency of the remaining results according to the χ² criterion is achieved. For the LCS obtained in this way, the reference value is determined by (1), where the number m' of reliable laboratories is used instead of m.

Procedure A can reasonably be applied if the measurement results provided by the participating laboratories follow a normal probability distribution. That is why there is a need to develop robust methods for interlaboratory comparison data processing that are well-behaved in cases where the distribution law of the laboratory measurement results differs from normal or is unknown. For example, in [9] Nielsen proposed such a method, whose successful application is described in [10]. The method considers the uncertainty interval u(xi) as a rectangular distribution and deems that each participant gives one vote to each value within its uncertainty interval and no votes to values outside this interval. This produces a robust algorithm for determining the reference value xref that is insensitive to outliers, e.g. to results claiming an uncertainty considerably lower than those of the other participants.
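The exclusion loop of Procedure A, eqs. (1)-(3), is straightforward to prototype. The sketch below is a minimal illustration rather than the normative procedure: it assumes the χ² acceptance test of [4] at a 5 % significance level, takes the minus sign in (3) (the result being included in y), and drops the single result with the largest |En| at each pass; the function name and parameters are ours.

```python
import numpy as np
from scipy.stats import chi2

def procedure_a(x, u, alpha=0.05):
    """Weighted-mean reference value with successive exclusion of the
    result having the largest |En|; a sketch of Procedure A [4], [5]."""
    x, u = np.asarray(x, float), np.asarray(u, float)
    idx = list(range(len(x)))
    while True:
        w = u[idx] ** -2
        y = np.sum(w * x[idx]) / np.sum(w)        # weighted mean, eq. (1)
        uy = np.sqrt(1.0 / np.sum(w))             # its standard uncertainty, eq. (2)
        chi2_obs = np.sum((x[idx] - y) ** 2 / u[idx] ** 2)
        if len(idx) == 2 or chi2.sf(chi2_obs, df=len(idx) - 1) >= alpha:
            return y, uy, idx                     # consistent set (or nothing left to drop)
        en = np.abs(x[idx] - y) / np.sqrt(u[idx] ** 2 - uy ** 2)   # |En|, eq. (3)
        idx.pop(int(np.argmax(en)))               # exclude one inconsistent result
```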
This paper is devoted to a software implementation of the comparison reference value determination method formulated in terms of preference aggregation [11]-[13]. In Section 2 a way is considered to transform the uncertainty intervals provided by participating laboratories into rankings of measured quantity values. The obtained rankings, constituting an initial preference profile, can then serve as input data for the determination of a consensus ranking by the Kemeny rule, which allows the reference value of the measurand to be found and the ability of the participating laboratories to provide reliable measurement results to be assessed. In Section 3 specially developed software is discussed that serves to carry out numerical experimental investigations of IC methods, including Procedure A, the Nielsen algorithm and the proposed preference aggregation method. In Section 4 the processing of real comparison data by the preference aggregation method is presented.

2. IC DATA PROCESSING ON THE BASIS OF PREFERENCE AGGREGATION

Let us define the procedure that transforms the uncertainty intervals provided by the laboratories into rankings. For this aim, denote the uncertainty interval obtained by the i-th laboratory by $u(x_i) = [u_l(x_i), u_u(x_i)]$. Define A, a range of actual values (RAV) of the measurand, for converting the uncertainty intervals of the m laboratories into rankings. The initial value $a_1$ of A is chosen equal to the least lower bound of the uncertainty intervals provided by the laboratories, $a_1 = \min\{u_l(x_i) \mid i = 1, \ldots, m\}$. The finite value $a_n$ of A is chosen equal to the largest upper bound of the laboratories' uncertainty intervals, $a_n = \max\{u_u(x_i) \mid i = 1, \ldots, m\}$. Divide A into n − 1 equal intervals (divisions) in such a way that their number guarantees a necessary and sufficient accuracy of representation of the measurand values. There are then n values of the measurand, A = {a1, a2, …, an}, corresponding to the boundaries of the division intervals (marks), see Figure 1. Details on the proper selection of a particular value of n can be found in [14].

Figure 1. An example of shaping a range of actual values A.

Compose a preference profile Λ of m rankings representing the uncertainty intervals of the laboratories. Each i-th ranking, i = 1, …, m, is a union of binary relations of strict order ≻ and equivalence ~ possessing the following properties for k = 1, …, m and i, j = 1, …, n:
a) $a_i \succ a_j$ if $a_i \in u(x_k) \wedge a_j \notin u(x_k)$;
b) $a_i \sim a_j$ if $a_i, a_j \in u(x_k) \vee a_i, a_j \notin u(x_k)$;
c) $a_i \prec a_j$ if $a_i \notin u(x_k) \wedge a_j \in u(x_k)$.

The measurement result reported by a laboratory is thus represented by a ranking of the measurand values in which one or more equivalent values, namely those belonging to the uncertainty interval of the laboratory, are more preferable, and all other values of A are less preferable and equivalent to each other. Hence each ranking includes a single symbol of strict order ≻ and n − 2 symbols of equivalence ~.

To aggregate the m rankings means to determine a single preference relation β ensuring the best compromise between them. Such a ranking β is called a consensus ranking. In the authors' works [12], [15], [16] it was shown that the Kemeny median can be used in the capacity of consensus ranking. One of the possible algorithms for finding it is based on the branch and bound technique and is described in [12].

As soon as a consensus ranking β is found, the value ranked first in it can be selected as the reference value xref of the measurand. The LCS consists of the laboratories whose uncertainty intervals include the revealed reference value xref; laboratories whose intervals do not contain the reference value are ignored when forming the largest consistent subset. The standard uncertainty of the obtained reference value for the LCS is defined as the smaller of the two distances from xref to the maximum lower bound $u_l(x_i) \le x_{\rm ref}$ and to the minimum upper bound $u_u(x_i) \ge x_{\rm ref}$ of the uncertainty intervals of the laboratories.
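The interval-to-ranking conversion described above is easy to express in code. In the sketch below (our own, illustrative representation) a ranking is stored as a vector of tier ranks over the RAV marks, 0 for marks inside a laboratory's uncertainty interval and 1 for marks outside it, which encodes exactly the two-class rankings a)-c); boundary marks are counted as inside, an assumption of the sketch.

```python
def build_profile(intervals, n):
    """Convert m uncertainty intervals [u_l(x_i), u_u(x_i)] into a
    preference profile over n equally spaced RAV marks (Section 2)."""
    lo = min(l for l, _ in intervals)               # least lower bound, a_1
    hi = max(h for _, h in intervals)               # largest upper bound, a_n
    marks = [lo + k * (hi - lo) / (n - 1) for k in range(n)]
    profile = [[0 if l <= a <= h else 1 for a in marks]   # 0: preferred tier
               for l, h in intervals]
    return marks, profile
```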
3. EXPERIMENTAL INVESTIGATIONS OF IC DATA PROCESSING METHODS

To investigate the proposed preference aggregation method for IC data processing experimentally, special software called INTERLABCOM was developed in the Microsoft Visual C# environment. The software has a user-friendly interface and, in its current version, implements the following three IC data processing methods: the proposed preference aggregation method (PAM), Procedure A and the Nielsen algorithm.

Measurement results provided by laboratories can be real and/or simulated by means of a pseudo-random number generator, which provides an opportunity to realize various modifications of the Monte Carlo method when conducting numerical computing experiments. Either a uniform or a normal distribution of the generated measurement results can be chosen. Uniformly distributed comparison data xi and u(xi) are generated for a given value xnom using the standard library function rand(). Normally distributed comparison data are obtained from uniformly distributed data using the well-known Box–Muller transform [17].

When preparing an experiment, one can preset, in a special window, a nominal measurand value xnom, the number of participating laboratories m, and the number of measurand values n. On pushing the button "Generation", the generated measurement results xi and their uncertainties u(xi) are displayed on the screen; each uncertainty u(xi) is represented as the couple of its lower and upper bounds. A graph of the initially generated IC data is shown in a special window (Figure 2); the uncertainty intervals are plotted in a two-dimensional graph with the dimensions "Measurand" (vertical axis) and "Laboratories" (horizontal axis).

Figure 2. One of the software user interface windows.

The software displays the IC data processing of each method in a separate window that includes a table with the initial comparison data (measurand values and corresponding uncertainty intervals), a graph of the processed comparison data and a conclusion on the consistency of each participating laboratory's results. All the IC data processing results of the different methods are collected in a summary table and graph. An inconsistent result is labelled by a special mark and the corresponding data are removed from the processed set. The graph and the final comparison data can be saved in Microsoft Excel format for further processing.

To demonstrate the operation of the developed software tool, IC measurement data for 7 participating laboratories are shown in Figure 3. In this case the RAV, with lower and upper bounds 11.43 and 12.73, is divided into 5 equal divisions, the bounds of which define 6 values a of the measurand.

Figure 3. Example of IC measurement results.

The appropriate preference profile Λ, constructed as described in Section 2, has the following form:
λ1: a2 ~ a3 ≻ a1 ~ a4 ~ a5 ~ a6
λ2: a2 ~ a3 ≻ a1 ~ a4 ~ a5 ~ a6
λ3: a3 ~ a4 ~ a5 ~ a6 ≻ a1 ~ a2
λ4: a2 ~ a3 ≻ a1 ~ a4 ~ a5 ~ a6
λ5: a3 ~ a4 ~ a5 ≻ a1 ~ a2 ~ a6
λ6: a2 ~ a3 ~ a4 ≻ a1 ~ a5 ~ a6
λ7: a1 ~ a2 ≻ a3 ~ a4 ~ a5 ~ a6

For this profile two optimal consensus rankings exist:
a3 ≻ a2 ≻ a4 ≻ a5 ≻ a6 ≻ a1
a3 ≻ a2 ≻ a4 ≻ a5 ≻ a1 ≻ a6,
from which the final consensus ranking is β = {a3 ≻ a2 ≻ a4 ≻ a5 ≻ a6 ~ a1}, where the first position is occupied by the value a3 = 11.95. This value is accepted as the measurand reference value xref.
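For the small n used throughout the paper, the consensus ranking can even be found by exhaustive search. The sketch below (ours, not the paper's algorithm) scores every strict order of the n marks with the usual Kemeny pair metric (penalty 2 for an opposed pair, 1 when one ranking ties a pair the other orders) and returns a minimizer; unlike the branch-and-bound algorithm of [12], it does not search among rankings with ties, so it recovers a strict refinement of the consensus, such as the two optimal rankings listed above.

```python
from itertools import permutations

def kemeny_consensus(profile):
    """Exhaustive Kemeny search over strict orders of n items;
    profile holds rank vectors as produced by build_profile."""
    n = len(profile[0])

    def pair_penalty(r, lam):
        d = 0
        for i in range(n):
            for j in range(i + 1, n):
                si = (r[i] > r[j]) - (r[i] < r[j])     # -1, 0 or +1
                sj = (lam[i] > lam[j]) - (lam[i] < lam[j])
                d += abs(si - sj)                      # Kemeny pair metric
        return d

    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):                # feasible for small n
        r = [0] * n
        for pos, item in enumerate(perm):
            r[item] = pos                              # rank vector of this order
        cost = sum(pair_penalty(r, lam) for lam in profile)
        if cost < best_cost:
            best, best_cost = perm, cost
    return best                                        # most preferred item first

# Illustrative use with the hypothetical helper of Section 2:
# marks, profile = build_profile(intervals, n)
# x_ref = marks[kemeny_consensus(profile)[0]]
```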
Our hypothesis is that, since the PAM operates on ordinal data, a reference value obtained by means of this method should not significantly depend on the particular probability distribution law of the measurement results.

For the experimental investigation of this hypothesis, normally distributed data for 100 individual problems were generated, distinguished from each other by random uncertainty intervals, with the number of laboratories m = 15 and xnom = 3. These data were processed by the PAM, Procedure A and the Nielsen algorithm. The same steps were undertaken under similar conditions for uniformly distributed generated data. Table 1 and Table 2 summarize the results of the numerical experimental investigation of the PAM as compared with Procedure A and the Nielsen algorithm. Since the program model allows a nominal value to be assigned, and hence known, beforehand, the quality of a method M intended for IC data processing can be assessed by means of the deviation

$$\xi = |x_{\rm ref}(M) - x_{\rm nom}|. \qquad (4)$$

Thus, Table 1 includes xref and ξ for each individual problem solved by each of the three methods for the normal distribution, and Table 2 includes the values acquired for the uniform distribution.
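A sketch of how such individual problems might be generated follows; the spread of the simulated standard uncertainties is our illustrative assumption, and NumPy's generator stands in for the rand()/Box–Muller machinery mentioned in Section 3.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_problem(m=15, x_nom=3.0, law="normal"):
    """One simulated comparison: m results x_i and claimed u(x_i)."""
    u = rng.uniform(0.05, 0.5, size=m)          # claimed uncertainties (assumed spread)
    if law == "normal":
        x = rng.normal(loc=x_nom, scale=u)      # library equivalent of Box-Muller [17]
    else:
        x = rng.uniform(x_nom - u, x_nom + u)   # uniform within the claimed interval
    return x, u
```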
The experimental data were used to plot curves illustrating how the values ξ change from problem to problem for each comparison method. The values ξ were taken for all 100 individual problems and arranged in ascending order. Figure 4 shows the graph of deviations ξ obtained by the proposed PAM compared to Procedure A for uniform (U) and normal (N) distributions of the comparison data. It should be noticed that Procedure A is not intended to be applied to data distributed by laws other than normal; the experimental results obtained under the uniform law are given here in order to demonstrate the behaviour of the non-robust method compared to the robust ones over the same data. One can see in Figure 4 that the particular kind of probability distribution of the measurement results has practically no influence on the performance of the PAM (curves 3 and 4); that is, the PAM is a robust procedure. Over the same data, Procedure A (curves 1 and 2) showed a considerable increase of ξ when passing from normally to uniformly distributed measurements.

Figure 5 represents the graph of deviations ξ obtained by the proposed PAM compared to the Nielsen algorithm for uniform (U) and normal (N) distributions of the comparison data. It can be seen from Figure 5 that the PAM provides estimates of xref closer to the nominal value xnom than the Nielsen algorithm. At the same time, the latter method (curves 1 and 2) shows a discrepancy between normally and uniformly distributed data of about 0.18, which is more than twice that of the PAM, whose discrepancy is 0.08.

Table 1. A fragment of the processing results for generated comparison data; xnom = 3.0 arbitrary units (a.u.); normal distribution.

Problem    PAM             Procedure A     Nielsen algorithm
number     xref    ξ       xref    ξ       xref    ξ
1          2.97    0.03    2.92    0.08    2.95    0.05
2          2.91    0.09    2.90    0.10    2.93    0.07
3          2.95    0.05    2.91    0.09    2.91    0.09
4          2.98    0.02    2.98    0.02    2.90    0.12
5          3.05    0.05    2.90    0.10    2.96    0.04
6          2.89    0.11    2.89    0.11    2.86    0.14
7          2.98    0.02    3.00    0.00    2.79    0.21
8          2.93    0.07    2.98    0.02    3.10    0.10
9          2.98    0.02    2.86    0.14    2.91    0.09
10         2.97    0.03    2.97    0.03    2.68    0.32
11         2.98    0.02    2.95    0.05    3.02    0.02
12         2.92    0.08    2.99    0.01    2.85    0.15
13         2.99    0.01    2.97    0.03    2.92    0.08
14         2.96    0.04    2.99    0.01    2.92    0.08
15         2.93    0.07    2.99    0.01    2.99    0.01
…
86         3.03    0.03    2.90    0.11    2.94    0.06
87         2.99    0.01    2.97    0.03    2.85    0.15
88         2.94    0.06    2.97    0.03    2.83    0.17
89         2.98    0.02    2.94    0.06    2.74    0.26
90         2.91    0.09    2.94    0.06    2.88    0.12
91         2.93    0.07    2.97    0.03    2.92    0.08
92         2.98    0.02    2.90    0.10    2.93    0.07
93         2.97    0.03    2.81    0.19    2.70    0.30
94         2.99    0.01    2.99    0.01    2.94    0.06
95         2.99    0.01    2.78    0.22    2.87    0.13
96         2.96    0.04    2.99    0.01    2.83    0.17
97         2.98    0.02    2.97    0.03    3.05    0.05
98         2.97    0.03    2.84    0.16    2.93    0.07
99         2.99    0.01    2.91    0.09    3.11    0.11
100        3.01    0.01    3.01    0.01    2.85    0.15

Table 2. A fragment of the processing results for generated comparison data; xnom = 3.0 a.u.; uniform distribution.

Problem    PAM             Procedure A     Nielsen algorithm
number     xref    ξ       xref    ξ       xref    ξ
1          3.01    0.01    2.92    0.08    2.95    0.05
2          2.97    0.03    2.92    0.08    2.95    0.05
3          3.12    0.03    2.43    0.57    3.25    0.25
4          3.04    0.04    2.69    0.31    2.67    0.33
5          2.98    0.02    2.65    0.35    2.46    0.54
6          2.98    0.02    2.16    0.84    2.86    0.14
7          2.89    0.11    2.54    0.46    2.86    0.14
8          2.81    0.19    2.57    0.43    2.54    0.46
9          2.91    0.09    2.49    0.51    2.74    0.26
10         3.10    0.10    3.00    0.00    2.71    0.29
11         2.96    0.04    2.62    0.38    3.20    0.20
12         3.04    0.04    2.97    0.03    3.37    0.37
13         3.14    0.14    2.69    0.31    2.73    0.27
14         2.98    0.02    2.90    0.10    3.00    0.00
15         2.90    0.10    2.54    0.46    3.06    0.06
…
86         2.99    0.01    3.01    0.01    2.95    0.05
87         2.84    0.17    2.66    0.34    2.90    0.10
88         3.03    0.03    2.88    0.12    2.85    0.15
89         2.94    0.06    2.77    0.23    2.85    0.15
90         2.86    0.14    2.40    0.60    3.09    0.09
91         2.98    0.02    2.90    0.10    2.79    0.21
92         3.11    0.11    2.38    0.62    3.27    0.27
93         2.97    0.03    2.88    0.12    2.75    0.25
94         2.73    0.27    1.90    1.10    2.69    0.31
95         3.00    0.00    2.49    0.51    3.11    0.11
96         2.96    0.04    2.95    0.05    3.11    0.11
97         2.97    0.03    2.90    0.10    3.12    0.12
98         2.95    0.05    2.62    0.38    3.12    0.12
99         2.98    0.02    2.85    0.15    2.89    0.11
100        3.08    0.08    3.01    0.01    2.93    0.07

Figure 4. Deviations ξ obtained by the PAM and Procedure A for uniform (U) and normal (N) distributions of comparison data. Curves: 1 – Procedure A (U); 2 – Procedure A (N); 3 – PAM (U); 4 – PAM (N).

Figure 5. Deviations ξ obtained by the PAM and the Nielsen algorithm for uniform (U) and normal (N) distributions of comparison data. Curves: 1 – Nielsen algorithm (U); 2 – Nielsen algorithm (N); 3 – PAM (U); 4 – PAM (N).
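The experiment behind Figures 4 and 5 then amounts to a short loop; the sketch below reuses the hypothetical helpers introduced earlier and computes the deviations of eq. (4) in ascending order, as plotted.

```python
def deviation_curve(method, n_problems=100, x_nom=3.0, law="normal"):
    """Deviations xi = |x_ref(M) - x_nom| for n_problems simulated
    comparisons, sorted in ascending order as in Figures 4 and 5."""
    devs = []
    for _ in range(n_problems):
        x, u = generate_problem(m=15, x_nom=x_nom, law=law)
        devs.append(abs(method(x, u) - x_nom))   # eq. (4)
    return sorted(devs)

# e.g. deviation_curve(lambda x, u: procedure_a(x, u)[0], law="uniform")
```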
4. REAL COMPARISON DATA PROCESSING BY THE PREFERENCE AGGREGATION METHOD

Let us demonstrate the applicability of the PAM to real-world examples of comparison data taken from open sources [10], [18].

4.1. The key comparison on high frequency power

The national metrology institutes (NMIs) participating in the CIPM key comparison (KC) CCEM.RF-K25.W [18] determined the effective efficiency and the calibration factor of two waveguide thermistor power sensors used as travelling standards in the frequency range from 33 to 50 GHz. The effective efficiency of the travelling standard was determined by the formula

$$\eta_{\rm eff} = \frac{P_{\rm DC,sub}}{P_{\rm RF,abs}}, \qquad (5)$$

where PDC,sub is the substituted DC power and PRF,abs is the total absorbed RF power. The comparison participants also calculated the calibration factor ηcal according to the equation

$$\eta_{\rm cal} = (1 - |\Gamma|^2)\,\eta_{\rm eff}, \qquad (6)$$

where Γ is the input reflection coefficient of the travelling standard, measured as a complex quantity stated as magnitude and phase at the measuring frequencies.

The median absolute deviation was used to identify outliers:

$$\sigma \approx S({\rm MAD}) \equiv k_1 \cdot {\rm median}\{|\eta_i - \eta_{\rm med}|\}, \qquad (7)$$

where k1 is a multiplier determined by simulation and ηmed is the median value of the measurement results {ηi}. A value ηi that differed from the median by more than 2.5·S(MAD) was regarded as an outlier and excluded from the calculation of the reference value. Each measurement result was checked against this criterion:

$$|\eta_i - \eta_{\rm med}| > 2.5 \cdot S({\rm MAD}). \qquad (8)$$

The reference value of the KC was determined in accordance with Section 8 of the technical report [18] on the basis of the unweighted mean of the m' retained results:

$$\eta_{\rm eff,ref} = \frac{1}{m'} \sum_{i=1}^{m'} \eta_{{\rm eff},i}, \qquad (9)$$

and the standard uncertainty was calculated as

$$u(\eta_{\rm eff,ref}) = \frac{1}{m'} \Big( \sum_{i=1}^{m'} u^2(\eta_{{\rm eff},i}) \Big)^{1/2}. \qquad (10)$$
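The outlier screen of eqs. (7) and (8) can be sketched as follows; the report [18] determines k1 by simulation, so the normal-consistency value 1.4826 used here is only an assumption of the sketch.

```python
import numpy as np

def mad_outliers(eta, k1=1.4826, cutoff=2.5):
    """Flag results whose distance from the median exceeds
    2.5 * S(MAD), eqs. (7)-(8)."""
    eta = np.asarray(eta, float)
    eta_med = np.median(eta)
    s_mad = k1 * np.median(np.abs(eta - eta_med))     # S(MAD), eq. (7)
    return np.abs(eta - eta_med) > cutoff * s_mad     # True marks an outlier, eq. (8)
```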
KC data treatment in accordance with CCEM.RF-K25.W. The results of the comparison on effective efficiency at 36 GHz are summarized in Table 3. The comparison reference value ηeff,ref = 0.9161 was determined for the effective efficiency ηeff with the uncertainty u(ηeff,ref) = 0.0027. NIM, NMIA and NRC did not participate in the reference value determination: the NIM and NRC measurement results were considered outliers in accordance with criterion (8), while the result of NMIA was proved to be traceable to the results of the other participants. A graphic illustration of the comparison results and the reference value is given in Figure 6.

Table 3. Key comparison data on effective efficiency.

m   NMI        ηeff,i   u(ηeff,i)
1   PTB        0.9153   0.0031
2   NPL        0.9167   0.0060
3   NIST       0.9184   0.0064
4   LNE        0.9157   0.0018
5   KRISS      0.9143   0.0104
6   VNIIFTRI   0.9160   0.0079
7   NIM        0.8360   0.0072
8   NMIA       0.9174   0.0071
9   NRC        0.9375   0.0130

Figure 6. Uncertainty intervals of the effective efficiency values provided by the NMIs.

The results of the comparison on the calibration factor are summarized in Table 4. The reference value for the calibration factor ηcal,ref = 0.7942 was determined with the uncertainty u(ηcal,ref) = 0.0024 (Figure 7). The results of VNIIFTRI and NRC were recognized as outliers; the result of NMIA turned out to be traceable to the results of the other participants.

Table 4. Key comparison data on the calibration factor.

m   NMI        ηcal,i   u(ηcal,i)
1   PTB        0.7954   0.0036
2   NPL        0.7937   0.0067
3   NIST       0.7976   0.0070
4   LNE        0.7914   0.0046
5   KRISS      0.7935   0.0079
6   NIM        0.7936   0.0031
7   VNIIFTRI   0.7820   0.0105
8   NMIA       0.7972   0.0073
9   NRC        0.8140   0.0130

Figure 7. Uncertainty intervals of the calibration factor values provided by the NMIs.

KC data treatment by the PAM. The data of Table 3 were processed using the PAM at n = 8, so the RAV was divided into n − 1 = 7 equal divisions. The bounds of the divisions corresponded to eight values a of the measurand: a1 = 0.8288, a2 = 0.8462, a3 = 0.8636, a4 = 0.8809, a5 = 0.8983, a6 = 0.9157, a7 = 0.9331, and a8 = 0.9505. The preference profile consisted of nine rankings describing the uncertainty intervals of the corresponding NMIs:
λ1: a6 ≻ a1 ~ a2 ~ a3 ~ a4 ~ a5 ~ a7 ~ a8
λ2: a6 ≻ a1 ~ a2 ~ a3 ~ a4 ~ a5 ~ a7 ~ a8
λ3: a6 ≻ a1 ~ a2 ~ a3 ~ a4 ~ a5 ~ a7 ~ a8
λ4: a6 ≻ a1 ~ a2 ~ a3 ~ a4 ~ a5 ~ a7 ~ a8
λ5: a6 ≻ a1 ~ a2 ~ a3 ~ a4 ~ a5 ~ a7 ~ a8
λ6: a6 ≻ a1 ~ a2 ~ a3 ~ a4 ~ a5 ~ a7 ~ a8
λ7: a1 ≻ a2 ~ a3 ~ a4 ~ a5 ~ a6 ~ a7 ~ a8
λ8: a6 ≻ a1 ~ a2 ~ a3 ~ a4 ~ a5 ~ a7 ~ a8
λ9: a7 ~ a8 ≻ a1 ~ a2 ~ a3 ~ a4 ~ a5 ~ a6.

The final consensus ranking was determined as βfin = {a6 ≻ a1 ~ a7 ~ a8 ≻ a2 ~ a3 ~ a4 ~ a5}. The comparison reference value a6 = ηeff,ref = 0.9157 was obtained with the uncertainty u(ηeff,ref) = 0.0018. The cardinality of the LCS was m' = 7, as the measurement results of NIM and NRC were recognized as outliers because their intervals do not contain the obtained reference value (Figure 8).

Figure 8. Uncertainty intervals of the effective efficiency values provided by the NMIs and the reference value obtained by the PAM.

The data of Table 4 were also processed using the PAM, at n = 6. The RAV was divided into five equal divisions whose bounds corresponded to six values of the measurand: a1 = 0.7715, a2 = 0.7826, a3 = 0.7937, a4 = 0.8048, a5 = 0.8159, a6 = 0.8270 (Figure 9). The preference profile was shaped of nine rankings:
λ1: a3 ≻ a1 ~ a2 ~ a4 ~ a5 ~ a6
λ2: a3 ≻ a1 ~ a2 ~ a4 ~ a5 ~ a6
λ3: a3 ≻ a1 ~ a2 ~ a4 ~ a5 ~ a6
λ4: a3 ≻ a1 ~ a2 ~ a4 ~ a5 ~ a6
λ5: a3 ≻ a1 ~ a2 ~ a4 ~ a5 ~ a6
λ6: a3 ≻ a1 ~ a2 ~ a4 ~ a5 ~ a6
λ7: a1 ~ a2 ≻ a3 ~ a4 ~ a5 ~ a6
λ8: a3 ≻ a1 ~ a2 ~ a4 ~ a5 ~ a6
λ9: a4 ~ a5 ≻ a1 ~ a2 ~ a3 ~ a6.

The final consensus ranking was determined as βfin = {a3 ≻ a2 ~ a4 ~ a5 ~ a6 ≻ a1}. The comparison reference value ηcal,ref = 0.7937 was obtained with the uncertainty u(ηcal,ref) = 0.0019. The cardinality of the LCS was m' = 7 (Figure 9); the measurement results of VNIIFTRI and NRC were recognized as outliers.

Figure 9. Uncertainty intervals of the calibration factor values provided by the NMIs and the reference value obtained by the PAM.

4.2. Interlaboratory power comparison in the microwave region

In [5], the results of interlaboratory power comparisons in the microwave region (50 MHz–26.5 GHz) within the project SIT.AF-01 were reviewed. They were organized by INRiM (Istituto Nazionale di Ricerca Metrologica, Italy) in Turin. A Hewlett-Packard power meter, model 438A, was sent to 12 laboratories as a travelling standard. The aim of the comparison was to confirm the claimed uncertainties of laboratories accredited in the national accreditation system in the field of microwave measurements. Table 5 and Figure 10 show one of the series of comparison data, measurements of the power sensor calibration factor K at a frequency of 1 GHz.

Table 5. Comparison data on the power sensor calibration factor K at 1 GHz.

Laboratory   xi      u(xi)
1            0.985   0.013
2            0.989   0.008
3            0.982   0.013
4            0.982   0.035
5            0.984   0.014
6            0.980   0.028
7            0.981   0.017
8            0.990   0.021
9            0.982   0.011
10           0.989   0.017
11           1.017   0.014
12           0.987   0.019

To process the comparison data, the Nielsen algorithm (see Section 1) was used.
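A simplified sketch of the voting idea behind the Nielsen algorithm [9] is given below; the actual algorithm's tie handling and uncertainty evaluation are omitted, and the grid density is our choice.

```python
def nielsen_reference(intervals, n_marks=1001):
    """Each laboratory votes for every candidate value inside its
    uncertainty interval; the most-voted candidate approximates x_ref."""
    lo = min(l for l, _ in intervals)
    hi = max(h for _, h in intervals)
    marks = [lo + k * (hi - lo) / (n_marks - 1) for k in range(n_marks)]
    votes = [sum(l <= a <= h for l, h in intervals) for a in marks]
    return marks[votes.index(max(votes))]              # value with most votes
```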
According to the data analysis outcomes, the LCS formed as a result of the Nielsen algorithm processing included eleven laboratories. Laboratory 11 was excluded because its result, in accordance with the algorithm conditions, was deemed unreliable. The reference value xref = 0.985 was obtained as the value receiving the greatest number of laboratory "votes".

Figure 10. Uncertainty intervals of the calibration factor K values provided by the participating laboratories and the corresponding reference value.

The data of Table 5 were processed using the PAM at n = 5. The RAV was divided into n − 1 = 4 equal divisions. The bounds of the divisions corresponded to five values a of the measurand: a1 = 0.947, a2 = 0.968, a3 = 0.989, a4 = 1.009, a5 = 1.030 (Figure 11). The corresponding preference profile was as follows:
λ1: a3 ≻ a1 ~ a2 ~ a4 ~ a5
λ2: a3 ≻ a1 ~ a2 ~ a4 ~ a5
λ3: a3 ≻ a1 ~ a2 ~ a4 ~ a5
λ4: a1 ~ a2 ~ a3 ~ a4 ≻ a5
λ5: a3 ≻ a1 ~ a2 ~ a4 ~ a5
λ6: a2 ~ a3 ≻ a1 ~ a4 ~ a5
λ7: a2 ~ a3 ≻ a1 ~ a4 ~ a5
λ8: a3 ~ a4 ≻ a1 ~ a2 ~ a5
λ9: a3 ≻ a1 ~ a2 ~ a4 ~ a5
λ10: a3 ≻ a1 ~ a2 ~ a4 ~ a5
λ11: a4 ~ a5 ≻ a1 ~ a2 ~ a3
λ12: a2 ~ a3 ≻ a1 ~ a4 ~ a5.

The final consensus ranking was βfin = {a3 ≻ a2 ≻ a4 ≻ a1 ~ a5}. The value a3 was chosen as the reference value xref = 0.989, with the corresponding uncertainty u(xref) = 0.004. The LCS formed by the PAM included 11 laboratories, just as in the project SIT.AF-01 (Figure 11).

Figure 11. Uncertainty intervals provided by the participating laboratories and the corresponding reference value obtained by the PAM.

5. CONCLUSION

A method aimed at processing IC data, called the preference aggregation method (PAM), has been described. The PAM is based on the transformation of the uncertainty intervals provided by participating laboratories into rankings of measured quantity values. For a preference profile composed in this way, a consensus ranking is determined by the Kemeny rule, which allows the reference value of a measurand to be found. The operation of this method has been demonstrated.

A software tool has been considered that is intended for experimental investigation of the proposed method and other methods of processing generated normally and uniformly distributed IC data. Numerical experiments carried out with its help have shown that the PAM is indeed a robust procedure whose outcome does not depend on the probability distribution of the measurement results. It also follows from the numerical experiments that the PAM provides an estimate of the reference value that is closer to the nominal value than the other robust method considered (the Nielsen algorithm), with half the discrepancy between normally and uniformly distributed comparison data.

The PAM performance was also experimentally verified on real comparison results. In all cases, the reference value and the associated uncertainty determined by the proposed method were very close to the outcomes obtained by the comparison coordinators.
ACKNOWLEDGEMENT

This work was supported in part by the Ministry of Education and Science of the Russian Federation, basic part of the state task, project 2078 (2014-2016) and project 4.1763.GZB.2017 (2017-2019). The authors would like to thank the anonymous referee for helpful comments.

REFERENCES
[1] CIPM MRA-D-05, Measurement comparisons in the CIPM MRA, Version 1.5, 28 pp.
[2] ISO/IEC 17043:2010, Conformity assessment - General requirements for proficiency testing, International Organization for Standardization, Geneva, Switzerland.
[3] M.G. Cox, The evaluation of key comparison data: determining the largest consistent subset, Metrologia 44 (2007) pp. 187-200.
[4] M.G. Cox, The evaluation of key comparison data, Metrologia 39 (2002) pp. 589-595.
[5] N.Yu. Efremova, A.G. Chunovkina, Experience in evaluating the data of interlaboratory comparisons for calibration and verification laboratories, Meas. Tech. 50(6) (2007) pp. 584-592.
[6] C. Elster, B. Toman, Analysis of key comparisons data: critical assessment of elements of current practice with suggested improvements, Metrologia 50 (2013) pp. 549-555.
[7] I. Lira, A.G. Chunovkina, C. Elster, W. Woeger, Analysis of key comparisons incorporating knowledge about bias, IEEE Trans. Instrum. Meas. 61(8) (2012) pp. 2079-2084.
[8] ISO 13528:2005, Statistical methods for use in proficiency testing by interlaboratory comparisons, International Organization for Standardization, Geneva, Switzerland.
[9] H.S. Nielsen, Determining consensus values in interlaboratory comparisons and proficiency testing, NCSLI Newsletter 44(2) (2004) pp. 12-15.
[10] L. Brunetti, L. Oberto, M. Sellone, P. Terzi, Establishing reference value in high frequency power comparisons, Measurement 42 (2009) pp. 1318-1323.
[11] S.V. Muravyov, I.A. Marinushkina, "Largest consistent subsets in interlaboratory comparisons: preference aggregation approach", Proc. of the 14th Joint International IMEKO TC1, TC7, TC13 Symposium, Aug. 31 - Sept. 2, 2011, Jena, Germany, pp. 69-73.
[12] S.V. Muravyov, Ordinal measurement, preference aggregation and interlaboratory comparisons, Measurement 46(8) (2013) pp. 2927-2935.
[13] S.V. Murav'ev, Aggregation of preferences as a method of solving problems in metrology and measurement technique, Meas. Tech. 57(2) (2014) pp. 132-138.
[14] S.V. Muravyov, I.A. Marinushkina, Processing of interlaboratory comparison data by preference aggregation method, Meas. Tech. 58(12) (2016) pp. 1285-1291.
[15] S.V. Muravyov, I.A. Marinushkina, Intransitivity in multiple solutions of Kemeny Ranking Problem, J. Phys. Conf. Ser. 459(1) (2013) 012006.
[16] S.V. Muravyov, Dealing with chaotic results of Kemeny ranking determination, Measurement 51 (2014) pp. 328-334.
[17] G.E.P. Box, M.E. Muller, A note on the generation of random normal deviates, Ann. Math. Stat. 29(2) (1958) pp. 610-611.
[18] R. Judaschke, Final report of the pilot laboratory: CCEM Key Comparison CCEM.RF-K25.W, RF power from 33 GHz to 50 GHz in waveguide, Physikalisch-Technische Bundesanstalt, Germany, 2014.