Upsala J Med Sci 95: 233-244, 1990

Medical Need for Quality Specifications within Laboratory Medicine

Ronald H. Laessig
State Laboratory of Hygiene, University of Wisconsin Center for Health Sciences, Madison, Wisconsin, USA

MOTTO: MANAGEMENT IS DOING THE THING RIGHT; LEADERSHIP IS DOING THE RIGHT THING.

INTRODUCTION

The continuous and consistent improvement in the quality of laboratory results is well documented. Within the past decade, the quality of laboratory results, i.e. reduced intralaboratory coefficients of variation (CV) and bias values of essentially zero, has for many tests exceeded the requirements of clinicians who use the results to treat patients. Heretofore, laboratorians have concentrated on improving the quality of test results; in the future we must concentrate on focusing limited resources where they will be most cost effective. As numerous studies have documented, clinicians and laboratorians have considerable difficulty when they attempt to translate laboratory attributes of quality (CV, bias, total error, etc.) into clinically meaningful criteria, i.e. changes in test parameters which will evoke a change in treatment regimens. The various studies of "medical usefulness" to date have resulted in a steadily improved mutual understanding of the problems but no truly definitive answers.

Laboratorians have recently resorted to speaking of clinicians as their "customers". This popular concept is based in the industrial quality assurance doctrines of Deming, Juran and others. The concept of the physician as a customer implies not only the requirement on the part of the supplier (laboratory) to satisfy the perceived needs of the shopper, but also the idea that the clinician is free to seek the most appropriate service from the most appropriate (or convenient) source.
It is most timely for the Board of NORDKEM to sponsor a seminar and study to assess the need for Quality Specifications which relate to Medical Usefulness (i.e. clinical needs). We as laboratorians have, for over forty years, and with good reason, focused our efforts on reducing the total error in the analysis processes. In view of the obvious benefits to be obtained by reducing error (bias) and improving precision (reducing CV), this was indeed "doing the thing right". The very existence of this Nordic Seminar on Quality Specifications indicates that it is time for laboratorians to assume the leadership role, that is, "do the right thing". In today's health care environment, particularly with respect to diminished resources in all areas, including laboratories, doing the right thing will require us to allocate resources, including quality improvement and test development resources, to those areas of greatest potential benefit to patient and clinician. As today's seminar clearly demonstrates, this is not a singular activity of laboratorians or of clinicians; it must be a joint effort of both. The effort must be initially channeled into finding the right thing to be done. This may include developing new tests; increasing accuracy and precision; improving test quality as characterized by turnaround time, sampling techniques or interpretive reporting; or any other attributes of "total quality". This is leadership; but more basically, this is communication. Once new test parameters are defined, and once error specifications based in medical needs are established, laboratorians can manage the process. We can be counted upon to "do the thing right". This has always been our forte; it will continue to be.

SETTING QUALITY SPECIFICATIONS FOR LABORATORY TESTS

I would like to share some of our current research (with Sharon S. Ehrmeyer) with the group. We have, in the United States, inadvertently taken a new and interesting approach to setting quality specifications.
This has come about because our federal government has determined that proficiency testing (PT) will become the major criterion by which laboratories are licensed by the federal government. PT is the process whereby an agency of government sends unknown specimens to laboratories, which analyze them and report back their results. If the laboratory gets the right answers, it passes; if not, it fails. A laboratory must pass proficiency testing if it is to be reimbursed for the tests it performs. The governmental regulations have been proposed in final form as of March 14, 1990. These regulations are very relevant to the theme of this NORDKEM conference in that they are, in effect, minimum intralaboratory performance standards for US laboratories. We suspect that these standards are based not on medical need or laboratory capabilities but rather on an intuitive approach to achievable levels of quality. However, under the regulatory approach which is utilized, it is a logical assumption that a laboratory will seek to improve its performance to a level commensurate with consistently passing proficiency testing. These criteria then, in effect, have become US national minimum performance standards for clinical laboratories.

MATERIAL AND METHODS

We have demonstrated previously that it is possible to determine the minimum intralaboratory performance levels necessary to meet the requirements specified in the Federal rules for PT programs. Under the plausible and tractable assumption of Gaussian imprecision, these relationships can be established by computer modeling using a Monte Carlo simulation approach, or by direct calculation based on statistical analysis.
In both approaches a laboratory's internal performance characteristics, that is, its unique imprecision (expressed as the standard deviation [SD] or coefficient of variation [CV]) and its bias, i.e., its offset from the target value, must be taken into account when determining the probability of "passing" one or a series of PT challenges. Other factors such as clerical errors, shipping problems, matrix effects, grading errors, etc., which can amount to 50% of the apparent causes of PT failures, should not be neglected, but a laboratory's analytical prowess is fundamental.

We computed the relationship between a laboratory's internal CV and/or bias for a given analyte, and the recently published Federal interlaboratory PT criteria. Each analyte has its own PT criterion; for the routine chemistry subspecialty, the criteria are identified in Table 1. By standardizing bias and SD as a percentage of these criteria, in effect obtaining "Z scores", we reduce each analyte to one generic case for analysis. From given (standardized) CV and bias values we derive the probability of producing a laboratory result outside the (standardized) PT criterion. This determines the probability of passing one PT event under the new five sample per shipment format, with each sample modeled as a Bernoulli trial.

TABLE 1. Criteria for acceptable performance (target value ±), routine chemistry

Analyte or test        Old units                          SI units

Enzymes
  ALT/SGPT             20%
  ALP                  3 SD
  Amylase              3 SD
  AST/SGOT             20%
  CK                   3 SD
  CK isoenzymes        3 SD or MB elevated (+ or -)
  LDH                  20%
  LDH isoenzymes       3 SD or LDH1/LDH2 (+ or -)

Blood gas
  PO2                  3 SD
  PCO2                 5 mm Hg or 8%
  pH                   0.04

General
  Albumin              10%
  Bilirubin, total     0.3 mg/dL or 20%                   5 µmol/L
  Calcium, total       1.0 mg/dL                          0.25 mmol/L
  Cholesterol, total   15%
  Cholesterol, HDL     3 SD
  Creatinine           0.3 mg/dL or 15%                   23 µmol/L
  Glucose              6 mg/dL or 10%                     0.33 mmol/L
  Iron, total          20%
  Total protein        10%
  Triglycerides        3 SD
  BUN                  2 mg/dL or 9%                      0.71 mmol/L
  Urea                 43 mg/L or 9%
  Uric acid            17%

Electrolytes
  Chloride             5%
  Magnesium            25%
  Potassium                                               0.5 mmol/L
  Sodium                                                  4 mmol/L

The new regulations specify that "passing performance" in a PT event (one quarterly shipment) requires that four out of the five results for each analyte (80% of the results) fall within the defined acceptable range (target value ± PT limit). The target value can be the mean of the results from a group of participants using the same method or instrument once outliers have been removed, or can be established by definitive or reference methods. Alternatively, the target value can be the mean of results from 80% of 10 or more referee laboratories. From Table 1, when "grading" PT results, the acceptable range for glucose is the target value ± the performance criterion [6 mg/dL (0.33 mmol/L) or 10%], whichever yields the greater range. In the case of a PT specimen with a target value of 100 mg/dL (5.56 mmol/L), acceptable performance would be between 90 (5.0) and 110 mg/dL (6.11 mmol/L). If the target value were 50 mg/dL (2.78 mmol/L), the acceptable range would be 44 (2.44) to 56 mg/dL (3.11 mmol/L).

FAILURE TO PASS PT - SINGLE ANALYTE

If a laboratory does not achieve at least 80% acceptable performance on any given analyte (4 or 5 correct results) for a PT event, the laboratory is, in effect, put on probation for the entire subspecialty in which that particular analyte is listed. To actually "fail" PT and be subject to "adverse action" (the term used in the regulations), the laboratory must again fail to achieve acceptable performance for the same analyte on one of the next two PT events. Hence, by failing to achieve acceptable performance for the same analyte for any two of three consecutive events, the laboratory may fail the entire subspecialty and thereby have to suspend testing in the subspecialty.
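The glucose grading rule described above can be sketched in a few lines. This is only an illustration of the arithmetic in the text; the function names and default arguments are hypothetical and are not taken from the regulations.

```python
def glucose_acceptable_range(target_mg_dl, abs_limit=6.0, pct_limit=10.0):
    """Return (low, high) using whichever criterion gives the wider range."""
    # Table 1 lists glucose as 6 mg/dL or 10%; the greater range applies.
    half_width = max(abs_limit, target_mg_dl * pct_limit / 100.0)
    return target_mg_dl - half_width, target_mg_dl + half_width


def passes_event(results, targets, range_fn=glucose_acceptable_range):
    """True when at least 80% of results fall inside their acceptable ranges."""
    within = 0
    for result, target in zip(results, targets):
        low, high = range_fn(target)
        if low <= result <= high:
            within += 1
    return within >= 0.8 * len(results)


print(glucose_acceptable_range(100.0))  # 10% is wider than 6 mg/dL here
print(glucose_acceptable_range(50.0))   # 6 mg/dL is wider than 10% here
```

With five samples, `passes_event` returns True for four or five acceptable results, which is the 80% requirement stated in the text.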
A curious anomaly is that a laboratory could "fail" different analytes, e.g., glucose for one PT event, uric acid for the next, and CK for the third, etc., indefinitely without being suspended in the subspecialty of routine chemistry.

FAILURE TO PASS PT - MULTIPLE ANALYTES

In addition to passing the individual analytes, the regulations require a laboratory to achieve an 80% correct response rate over all analytes in a particular subspecialty. To be subject to adverse action, a laboratory must have less than 80% of all results correct for any two of three consecutive PT events. It is obvious that to fail this requirement, a laboratory must also fail at least one analyte, i.e., have 2 or more incorrect out of 5 results.

Intralaboratory performance required to pass one PT event for one analyte, zero bias.

The rightmost curve in Figure 1 shows the probability of failure, i.e., achieving less than 80% correct, for a laboratory analyzing only one analyte in any one PT event. The x axis is in units of internal CV or SD as a percent of the PT limit. For example, if the PT limit is 10%, as for glucose (Table 1), 100 on the x axis denotes a laboratory with an internal CV of 10%. Under these circumstances, this laboratory has a 51% probability (y axis) of "failing" the analyte glucose in any one PT event in which it analyzes 5 PT samples. The rightmost curve in Figure 1 is based on the assumption that the laboratory has zero bias, i.e., any deviation from the target value is due only to the laboratory's internal imprecision. Similarly, for analytes whose performance criteria (Table 1) are defined as multiples of the group SD, e.g., 3 group SD for alkaline phosphatase, the 100% point on the graph is equivalent to a laboratory whose internal SD is equal to the entire performance criterion, or 3 group SD. Further, if the laboratory's internal CV is 50% of the stated performance limit, the probability of "failing" a single five sample PT event for one analyte drops just below 2%.
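The single-analyte, zero-bias figures quoted above can be checked by direct calculation. The sketch below assumes Gaussian imprecision and independent samples, as the paper does; the function names are hypothetical, and CV and bias are expressed as fractions of the PT limit (1.0 means 100% of the limit).

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def p_outside(cv, bias=0.0):
    """Chance that one result misses the PT limit (cv, bias as fractions of it)."""
    inside = phi((1.0 - bias) / cv) - phi((-1.0 - bias) / cv)
    return 1.0 - inside


def p_fail_event(cv, bias=0.0):
    """Chance that fewer than 4 of 5 results are acceptable (2 or more misses)."""
    p = p_outside(cv, bias)
    q = 1.0 - p
    # Passing means 0 misses (q**5) or exactly 1 miss (5 * p * q**4).
    return 1.0 - q**5 - 5.0 * p * q**4


print(round(100 * p_fail_event(1.0), 1))  # CV equal to the PT limit
print(round(100 * p_fail_event(0.5), 1))  # CV at half the PT limit
```

Under these assumptions the two printed values come out at about 51% and just under 2%, in line with the figures read from the curve.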
With an internal CV of 33%, or 1/3 of the PT limit, the laboratory will, in essence, always pass PT. The presence of co-existing bias reduces the "tolerable" CV, as will be shown in subsequent figures.

[Figure 1. % chance of some PT failures, related to CV; one 2 of 5 event, bias = 0% of PT limit. x axis: internal SD or CV as % of PT limit.]

Intralaboratory performance required to pass one PT event for multiple analytes, zero bias.

The remaining curves in Figure 1 show the effect of analyzing multiple analytes (e.g., glucose, BUN, cholesterol, etc.) on the laboratory's ability to pass PT. In general terms, for any given internal CV, the more analytes tested, the greater the probability that a laboratory will fail one or more analytes. For example, if a laboratory tests two analytes, and its internal CV is 100% of the PT performance criterion for both, the chance of failing at least one analyte increases from 51% to 76%. A laboratory doing 20 analytes, i.e., operating a large, multichannel instrument (DuPont aca(TM), Hitachi(TM), SMAC(TM), etc.) with all the analytes' CVs equal to 100% of the acceptable performance criteria, would virtually be assured of failing at least one analyte on every PT event. By reducing all of these internal CVs to 50%, the probability of a failure for the same 20 test laboratory is reduced to 32%. With all CVs below the 33% level, the chances of failure are nearly zero. Obviously, a laboratory's CVs are not consistent across all parameters; some may be considerably less than 33%, and these tests would cause no problems in PT. However, even 2 analytes near 50% of the PT criteria would cause a 4% probability of a failure, and 2 analytes at the 100% level would portend a 76% chance of a failure.
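The multiple-analyte figures are consistent with treating each analyte as an independent pass/fail trial, so that the chance of at least one failure among n analytes is 1 - (1 - p)^n. The sketch below makes that independence assumption explicit; it is an illustration with hypothetical function names, not the authors' program.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def p_fail_event(cv, bias=0.0):
    """Chance of 2 or more misses in 5 samples (cv, bias as fractions of PT limit)."""
    p = 1.0 - (phi((1.0 - bias) / cv) - phi((-1.0 - bias) / cv))
    q = 1.0 - p
    return 1.0 - q**5 - 5.0 * p * q**4


def p_fail_any(cv, n_analytes, bias=0.0):
    """Chance of failing at least one of n analytes, assumed independent."""
    return 1.0 - (1.0 - p_fail_event(cv, bias)) ** n_analytes


for cv, n in [(1.0, 1), (1.0, 2), (1.0, 20), (0.5, 2), (0.5, 20)]:
    print(cv, n, round(100 * p_fail_any(cv, n), 1))
```

In practice analytes run on one instrument may not be independent (a shared calibration or reconstitution problem affects several at once), so the formula is best read as a rough guide.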
The obvious conclusion is that a laboratory should strive to reduce all its tests' internal CVs to less than 33% of the Table 1 performance criteria, but in particular should concentrate on reducing the imprecision of any tests whose intralaboratory CVs approach 100% of the PT criteria.

[Figure 2. % chance of some PT failures, related to CV; one 2 of 5 event, bias = 20% of PT limit. x axis: internal SD or CV as % of PT limit.]

Intralaboratory performance required to pass one PT event for non-zero bias.

Figure 2 shows the effect of coexisting bias (20% of the PT criterion) for relevant levels of imprecision on the likelihood of a laboratory failing a single PT event. Like Figure 1, the family of curves represents respectively (from right to left) the probabilities of failure for 1, 2, 5, 10, 20 and 27 analytes.

The presence of bias increases the likelihood that a laboratory will fail a PT event. In the case of glucose, where the PT criterion is ±10%, a 20% bias is equivalent to a consistent 2 mg/dL (0.11 mmol/L) error at a target value of 100 mg/dL. Comparing Figures 1 through 3 indicates the effect of increasing bias on the likelihood of failing a PT event. For one analyte, with biases of 0%, 20% and 50% and a consistent coexisting internal CV of 50% of the PT criterion, the probability of failure increases from 2% to 4% to 18%. Further, for 20 analytes, the probability of a failure for biases of 0%, 20% and 50% increases from 32% to 51% to over 98%. While a laboratory does not have the same, or for that matter any, bias on every test, it is obvious that the presence of any significant bias seriously impairs the laboratory's ability to pass that analyte in a PT event. A common reason for analytical failures is the introduction of a large bias rather than large imprecision. Due to a pipetting error or a reconstitution problem, bias may extend to all analytes and impose a large chance of failing the overall 80% requirement.
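The effect of bias can be reproduced with the same Gaussian model by shifting the distribution of results relative to the acceptance window. The sketch below is a minimal illustration under the paper's stated assumptions (Gaussian imprecision, independent samples and analytes); the function names are hypothetical.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def p_fail_event(cv, bias):
    """Chance of 2 or more misses in 5 samples (cv, bias as fractions of PT limit)."""
    # A result is acceptable when bias + cv * Z lies between -1 and +1.
    p = 1.0 - (phi((1.0 - bias) / cv) - phi((-1.0 - bias) / cv))
    q = 1.0 - p
    return 1.0 - q**5 - 5.0 * p * q**4


def p_fail_any(cv, bias, n_analytes):
    """Chance of failing at least one of n analytes, assumed independent."""
    return 1.0 - (1.0 - p_fail_event(cv, bias)) ** n_analytes


for bias in (0.0, 0.2, 0.5):
    one = 100 * p_fail_event(0.5, bias)
    twenty = 100 * p_fail_any(0.5, bias, 20)
    print(f"bias {bias:.0%}: 1 analyte {one:.1f}%, 20 analytes {twenty:.1f}%")
```

With the CV held at 50% of the PT limit, the model gives values close to the 2%, 4% and 18% (one analyte) and 32%, 51% and over 98% (20 analytes) quoted in the text.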
Consequently, to pass PT, a laboratory first needs to minimize the amount of bias, and then reduce its internal CV if possible.

[Figure 3. % chance of some PT failures, related to CV; one 2 of 5 event, bias = 50% of PT limit. x axis: internal SD or CV as % of PT limit.]

Predicted failure rates as a function of CV and bias combinations.

Figure 4 shows the percent probability of failing a PT event for one analyte as a function of both internal CV and bias. The x and y axes depict intralaboratory CV and bias respectively as a percentage of the PT limit. The curves have a negative slope, since the presence of bias reduces the "tolerable" internal CV consistent with a given percent probability of a laboratory failing a PT event. The "1" denotes a 1% probability of failing one PT event. As indicated by the continuous line, all combinations of CV and bias falling on the line will yield a 1% chance of failure. Those below (to the left of) the line have a lesser chance of failure. Likewise, further to the right, subsequent curves denote probabilities of failure of 3, 5, etc., percent for increasing values of the CV and bias. Typically, laboratories using reasonable care in calibration have small biases. Those laboratories whose bias exceeds 20% of the performance limit and whose CVs are in the range of 30 to 100% of the performance limit need to reduce both.

[Figure 4. % chance of PT failure for one analyte as a function of internal CV (x axis) and bias (y axis), each as % of PT limit. Contours labeled 1e-05, 0.005, 0.1, 1, 3, 5, 7, 10, 15, 20, 30 and 40%.]

Minimizing the bias contribution to the probability of failure in proficiency testing.

Bias does not affect the likelihood of failure to nearly the same extent as CVs of equivalent size. For example, if a laboratory reduces its CV to 33% of the value in Table 1 for any given analyte, a coexisting bias of up to 40 percent is tolerable.
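A contour such as the 1% line in Figure 4 can be traced numerically: for each bias, search for the largest CV whose failure probability stays at or below the chosen level. The sketch below uses simple bisection under the same Gaussian model; the function names, search bounds and the 1% default are assumptions made for illustration, not the authors' method.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def p_fail_event(cv, bias):
    """Chance of 2 or more misses in 5 samples (cv, bias as fractions of PT limit)."""
    p = 1.0 - (phi((1.0 - bias) / cv) - phi((-1.0 - bias) / cv))
    q = 1.0 - p
    return 1.0 - q**5 - 5.0 * p * q**4


def tolerable_cv(bias, target_fail=0.01, lo=0.01, hi=3.0, steps=60):
    """Largest cv (fraction of PT limit) with failure chance <= target_fail.

    Assumes bias < 1.0, so the failure chance rises steadily with cv.
    """
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if p_fail_event(mid, bias) > target_fail:
            hi = mid  # too imprecise: tighten the upper bound
        else:
            lo = mid  # still acceptable: raise the lower bound
    return lo


for bias in (0.0, 0.2, 0.4):
    print(f"bias {bias:.0%}: tolerable CV about {100 * tolerable_cv(bias):.0f}% of limit")
```

Under these assumptions a bias of 40% of the PT limit still leaves a tolerable CV near one-third of the limit at the 1% failure level, which agrees with the statement above.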
If the bias can be reduced to any value below 20%, its contribution to the probability of failure to pass PT is almost negligible. Most authors do not deal with the concepts of coexisting bias and imprecision, but rather set bias equal to zero. This is a justifiable assumption, at least when dealing with PT data. In point of fact, the traditional function of PT programs has been to reduce bias, since even with five samples PT is ineffective in measuring intralaboratory imprecision.

CONCLUSION

A laboratory can readily predict the probability of passing PT based on a knowledge of its internal imprecision (CV) and bias. Our overwhelming conclusion for the proposed 2 of 5 (or 80%) PT rules is that a laboratory with small (<20%) bias can reasonably assure its likelihood of passing PT if it reduces the imprecision of each analyte to one-third of the Federally mandated limit. This "rule of 1/3", when applied to the federal criteria in Table 1, represents performance limits for US laboratories. Since the ability to receive revenues for tests performed depends on passing proficiency testing, these limits, without respect to medical usefulness or significance, will become the goal of laboratories' quality assurance efforts.

While one might dispute the logic of setting performance limits in this manner, there is a valuable lesson to be learned at the same time. External (or interlaboratory) quality assurance programs which closely mimic PT programs in concept and format can translate analytical goals into performance limits. For example, if a decision could be reached that the total error requirement for glucose tests should be 3.3 mg/dL (0.18 mmol/L), a scheme of interlaboratory quality assurance testing could be computer modeled to determine the criteria (in the case of our example, ±10% as in Table 1) to be used to assure successful participants that they met the minimum internal performance requirement.
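The kind of computer modeling referred to here can be sketched as a small Monte Carlo simulation. The example below is only an illustration under the paper's Gaussian assumption: the function name, the seed and the number of simulated events are arbitrary choices, and the glucose case assumes a target of 100 mg/dL, where a ±10% criterion is 10 mg/dL and an internal SD of 3.3 mg/dL is 33% of that limit.

```python
import random


def simulate_fail_rate(cv, bias=0.0, events=20000, seed=1):
    """Estimate the chance of failing one 5-sample PT event by simulation.

    cv and bias are fractions of the PT limit; a result is a miss when its
    deviation from the target exceeds the limit in either direction.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(events):
        misses = 0
        for _ in range(5):
            deviation = bias + cv * rng.gauss(0.0, 1.0)
            if abs(deviation) > 1.0:
                misses += 1
        if misses >= 2:  # fewer than 4 of 5 acceptable
            failures += 1
    return failures / events


sd_mg_dl, limit_mg_dl = 3.3, 10.0  # assumed glucose example at 100 mg/dL
print(simulate_fail_rate(sd_mg_dl / limit_mg_dl))  # rule of 1/3: near zero
print(simulate_fail_rate(1.0))                     # SD equal to the limit
```

A participant operating at the rule of 1/3 almost never fails in this simulation, while one whose SD equals the full limit fails roughly half the time, which is the sense in which a PT-style criterion can stand in for an internal performance requirement.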
The regulatory approach represents only one use of proficiency testing. Properly managed within laboratories or by concerned professional groups, it can be a powerful quality assurance tool. As the US standards outlined in this report are implemented beginning January 1, 1991, we will have an excellent opportunity as scientists to evaluate one of the tools which can be used for managing (or leading) the process of searching for meaningful performance goals for our clinical laboratories.

Correspondence: Professor Ronald H. Laessig, State Laboratory of Hygiene, University of Wisconsin Center for Health Sciences, Madison, Wisconsin, USA.