key: cord-352111-frk319q1 authors: woodruff, amelita title: covid-19 follow up testing date: 2020-05-11 journal: j infect doi: 10.1016/j.jinf.2020.05.012 doc_id: 352111 cord_uid: frk319q1 • positive cases of sars-cov-2 were seen in the mayo clinic fl covid virtual clinic. • 70% of patients met cdc guidelines for release from quarantine & still tested (+). • the average time from onset of symptoms to negative testing was 19 days. dear editor, there is some uncertainty regarding the incubation period of the sars-cov-2 virus. there is also some uncertainty about the proportion of infected individuals who are asymptomatic carriers, and about the timeframe from when a patient is infectious until becoming non-infectious. 1 we provide care for covid-19 patients in the outpatient setting through a virtual clinic. our patients have tested positive via nasopharyngeal swabs and rna detection with rt-pcr. they are followed throughout their illness with visits at intervals based on the severity of their symptoms, using telemedicine technology. the cdc has two strategies to determine when a patient with covid-19 can discontinue self-isolation: a "test-based" strategy and a "non-test-based" strategy. the non-test-based strategy recommends that covid-19 patients can discontinue self-isolation when they have been afebrile for 72 hours without anti-pyretic medications, have improvement in respiratory symptoms, and have had at least 10 days elapse since symptoms started (recently increased from 7 days). the test-based strategy requires resolution of fever without the use of anti-pyretics, improvement of respiratory symptoms, and two consecutive negative covid-19 nasopharyngeal swabs collected ≥24 hours apart.
2 we decided as part of our covid-19 virtual clinic to use the test-based strategy for all of our patients to better ensure that they were not contributing to the spread of disease. our organization manufactures the test, so we had ample testing supplies and laboratory capacity. as this disease is a reportable condition, these patients were also followed by the respective county health departments. the county health departments were using the test-based strategy only for healthcare workers, or those with essential public service jobs. as of april 17, 2020, we have enrolled 97 patients in our covid virtual clinic. of these, 72 have been tested after being afebrile for at least 72 hours, and had 7 days pass since symptoms started, along with symptom improvement. that is, 72 patients met criteria for the original release from self-isolation with the non-test-based strategy, but were tested using the test-based strategy. of these, twenty-two (30.1%) tested negative upon the first two tests, while the vast majority of patients (69.9%) tested positive at this interval. of the 69.9% who failed, thirty-six (72%) were positive on the first test, while fourteen (28%) had a negative first test but were positive on the second test. in our patient population, the average time from the onset of symptoms to negative testing is 19 days. this data shows that the cdc non-test-based strategy may cause early release from isolation for covid-19 patients and result in additional community transmission. given this, it may be beneficial to prolong the self-isolation time to greater than 14 days after symptom onset. the incubation period of coronavirus disease from publicly reported confirmed cases: estimation and application discontinuation of isolation for persons with covid-19 not in healthcare settings mh&view=epic footnote: 1. 
the authors do not have a commercial or other association that might pose a conflict of interest (e.g., pharmaceutical stock ownership, consultancy, advisory board membership). key: cord-350473-f47i7y5h authors: sen-crowe, brendon; mckenney, mark; elkbuli, adel title: covid-19 laboratory testing issues and capacities as we transition to surveillance testing and contact tracing date: 2020-05-27 journal: am j emerg med doi: 10.1016/j.ajem.2020.05.071 doc_id: 350473 cord_uid: f47i7y5h as of may 19th, 2020, 11,834,508 covid-19 tests have been performed in the us, resulting in 1,523,534 (12.9%) confirmed cases 1 . the actual number of infected americans is much larger. antibody seroprevalence testing in santa clara county, california, estimates those infected at between 2.49% and 4.16%, implying actual infections 50-85-fold larger than confirmed cases 2 . another study concluded that undiagnosed covid cases represent the infection source of 79% of documented cases 3 . accurate testing will be crucial to controlling and understanding this pandemic. estimation relies on testing kit accuracy (sensitivity/specificity): low sensitivity will underestimate disease prevalence, while low specificity will overestimate it 2 . testing comes in two broad types: testing for nasopharyngeal viral rna, and serologic testing for antibodies, which arise in response to the disease. rna testing, done with polymerase chain reaction (pcr), is cost-effective, easy to perform, and now widely available 4 . however, the pcr test has accuracy issues. sensitivity of fda-approved viral rna tests ranges from 63%-95% (table 1) [5] [6] [7] [8] . sensitivity of rna tests is dependent on the site of specimen collection: sensitivity was highest in bronchoalveolar lavage (93%), then sputum (73%), nasal swab (63%), feces (29%) and blood (1%) 5 . another study found that patients with pneumonia often have negative nasopharyngeal samples, but positive lower airway samples 9 .
the sensitivity of pcr tests has been estimated at 71%, resulting in ~30% of infected patients having a negative finding. another drawback is that the presence of viral rna does not mean the virus is live; therefore, detection does not necessarily mean the virus can be transmitted 9 . rna-based tests are limited to the setting of acute illness. saliva-based tests offer promising results as a non-invasive and non-aerosol-generating method of specimen collection 10 . compared to nasopharyngeal tests, saliva specimens have high sensitivity (84.2% 10 ) and can be self-administered 10 . one study reported greater sensitivity in saliva samples than in nasopharyngeal swabs, with less variability 11 . reduced variability in samples taken from self-administered tests is helpful for mass testing because it preserves collection reliability and allows patients to send in their own samples from the comfort of their home. the second type of test is serologic, which detects immunoglobulins (igg and igm) specific for sars-cov-2 and provides an estimation of population virus exposure 4 . one drawback of serologic testing is the lag period between symptoms and antibody formation: one analysis found patients do not begin to seroconvert until 11-12 days post-symptom onset 12 . the sensitivity and specificity of fda-approved serologic tests range from 61.1%-98% and 90%-100%, respectively 13 . many fda-approved serologic tests have high sensitivity and specificity. for example, cellex inc. developed a rapid diagnostic test with 93.8% sensitivity and 95.6% specificity. bio-rad manufactured an elisa test with sensitivity and specificity of 98% and 99%, respectively (table 1) 13 . there are also clinical associations with confirmed covid-19 patients. an analysis of 119 patients with covid-19 from wuhan university revealed an association with low urine specific gravity and increased ph 14 . in addition, urine glucose and proteinuria correlated with severe/critical cases compared to mild/moderate cases 4 .
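the point that low sensitivity underestimates prevalence while low specificity overestimates it can be made concrete with the standard rogan-gladen correction. this sketch uses the cellex figures quoted above (93.8% sensitivity, 95.6% specificity); the 2% true prevalence is chosen purely for illustration:

```python
def apparent_positivity(true_prevalence, sensitivity, specificity):
    """fraction of the population that tests positive:
    true positives plus false positives."""
    return (sensitivity * true_prevalence
            + (1 - specificity) * (1 - true_prevalence))

def rogan_gladen(apparent_positivity_rate, sensitivity, specificity):
    """correct raw test positivity for imperfect accuracy:
    p = (apparent + specificity - 1) / (sensitivity + specificity - 1).
    assumes sensitivity + specificity > 1."""
    return ((apparent_positivity_rate + specificity - 1)
            / (sensitivity + specificity - 1))
```

with a true prevalence of 2%, the raw positivity comes out near 6.2% (false positives dominate), and the correction recovers the 2%; this is why uncorrected seroprevalence surveys overestimate when specificity is imperfect.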
the results imply that certain urinalysis profiles could be used to predict the severity of disease and possibly to flag asymptomatic patients who could be quarantined until a definitive test can be completed 14 . to address the development of a reliable test, the department of health & human services (hhs) provided funding for the development of the simplexa covid-19 direct assay and to qiagen to accelerate development of their rps2 test 15 . additionally, hhs is purchasing the id now covid-19 rapid point-of-care test (abbott diagnostics scarborough inc.) for public health labs (table 1) 16 . the fda is issuing emergency use authorizations to expedite distribution 17 . states have differing numbers of laboratories authorized for testing (figure 1). the targeted distribution of tests to areas of high density (figure 1, black diamonds) is paramount to ensure that resources are not undersupplied. the road back to normalcy is contingent on accurate tests, allowing suppression of spread. when a localized outbreak occurs, it will be important to have reliable testing methods to promptly contain it. random serologic testing can be used to surveil populations at high risk for an outbreak. pcr tests can be used to assess those with active infection who may be asymptomatic. targeted distribution of tests needs to go to areas where covid is more prevalent and where people are at higher risk. in addition to distribution, the quality of the tests requires improvement. many prospective tests in development report promising results in under 60 minutes, such as mammoth bioscience's crispr-based lateral flow assay (sensitivity: 90%, specificity: 100%) and united biomedical's kit (sensitivity: 100%, specificity: 100%) (table 1) 13, 18 . in the present era, technology allows diagnostics to be readily available. understanding the current disease state in communities plays a role in the acceptance of control measures that require individual actions.
now is the time to ensure systematic and coordinated efforts between the clinical, commercial and public sectors to leverage the power of testing to address the pandemic at our door.

references:
1. covid-19 map. johns hopkins coronavirus resource center
2. covid-19 antibody seroprevalence
3. diagnostic testing for severe acute respiratory syndrome-related coronavirus-2: a narrative review
4. from mitigation to containment of the covid-19 pandemic: putting the sars-cov-2 genie back in the bottle
5. detection of sars-cov-2 in different types of clinical specimens
6. smart detect sars-cov-2 rrt-pcr kit. inbios
7. covid-19 rt-digital pcr detection kit
8. respiratory sars-cov-2 panel instructions for use (handbook)
9. report from the american society for microbiology covid-19 international summit
10. saliva sample as a non-invasive specimen for the diagnosis of coronavirus disease-2019 (covid-19): a cross-sectional study
11. saliva is more sensitive for sars-cov-2 detection in covid-19 patients than nasopharyngeal swabs
12. the promise and peril of antibody testing for covid-19
13. serology-based tests for covid-19. johns hopkins center for health security
14. the value of urine biochemical parameters in the prediction of the severity of coronavirus disease. 2020;/j/cclm.ahead-of-print
15. hhs funds development of covid-19 diagnostic tests. u.s. department of health & human services
16. territorial and tribal public health labs with covid-19 rapid point-of-care test
17. covid-19 - laboratory capacity
18. crispr-cas12-based detection of sars-cov-2

key: cord-337462-9mvk86q6 title: humanity tested date: 2020-04-08 journal: nat biomed eng doi: 10.1038/s41551-020-0553-6 doc_id: 337462 cord_uid: 9mvk86q6 the world needs mass at-home serological testing for antibodies elicited by sars-cov-2, and rapid and frequent point-of-care testing for the presence of the virus' rna in selected populations. how did we end up here? two ways. gradually, then suddenly.
ernest hemingway's passage is a fitting description for humanity's perception of the exponential growth of covid-19 cases and deaths (fig. 1) . the worldwide spread of a highly infectious pathogen was only a matter of time, as long warned by many epidemiologists, public health experts, and influential and prominent voices, such as bill gates. yet most of the world was unprepared for such a pandemic; in fact, most western countries (prominently the united states 1 ) fumbled their response for weeks. singapore, hong kong and taiwan have shown the world that, to contain the propagation of severe acute respiratory syndrome coronavirus 2 (sars-cov-2), governments need to quickly implement aggressive testing (by detecting the viral rna through polymerase chain reaction (pcr)), the isolation of those infected and the tracing and quarantining of their contacts, while educating their citizens about the need for physical distancing and basic public health measures (in particular, frequent hand-washing and staying at home if feeling unwell). when outbreaks are not detected and acted upon sufficiently early, drastic physical distancing -of the sort implemented by china at the end of january and maintained for months -can eventually suppress the outbreak (fig. 1 ). it is however unclear whether western countries that have implemented strict physical-distancing measures later in their infection curve will be able to gradually release such lockdowns, let alone see their outbreaks controlled. such non-pharmacological interventions aim to 'flatten' the infection curve by reducing the number of transmission chains and thus the virus' basic reproduction number -that is, the average number of new cases generated by a case in an immunologically naive population. 
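the arithmetic behind 'flattening the curve' is the passage above in miniature: interventions scale the basic reproduction number down to an effective value, and new cases then grow or shrink geometrically by that factor each generation. the parameter values below are illustrative, not estimates from the article:

```python
def effective_r(r0, contact_reduction, susceptible_fraction=1.0):
    """effective reproduction number after interventions:
    r_eff = r0 * (1 - contact_reduction) * susceptible_fraction.
    an outbreak declines when r_eff falls below 1."""
    return r0 * (1 - contact_reduction) * susceptible_fraction

def cases_per_generation(initial_cases, r_eff, generations):
    """each serial interval multiplies new case counts by r_eff."""
    return initial_cases * r_eff ** generations
```

with an illustrative r0 of 2.5, cutting transmission-relevant contact by 60% brings r_eff to 1.0 (a stalled outbreak); anything beyond that turns exponential growth into decline.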
in the absence of a safe and effective vaccine -which, if current efforts end up being successful, is unlikely to become widely available within the next two years -non-pharmacological interventions will need to remain in place to reduce the threat of secondary outbreaks by maintaining the basic reproduction number below 1. however, the type and degree of the interventions could be better tailored if governments knew who are currently infected and who have been infected and recovered. for this, the world needs to see the mass deployment of serological testing for sars-cov-2 antibodies (which appear to be highly specific 2 ), and frequent testing for sars-cov-2 rna in those likely to be exposed to the virus (especially healthcare workers) or at a higher risk for severe respiratory disease (such as the elderly and younger individuals with relevant comorbidities). medical-device companies and government and research laboratories around the world have rushed to adapt and scale up nucleic acid tests (mostly employing pcr, but also crispr-based detection and loop-mediated isothermal amplification) to detect the virus' rna, and government agencies are scrambling to assess them via emergency routes (such as the emergency use authorization program 3 of the united states food and drug administration (fda)). point-of-care pcr kits -based on lateral-flow technology or cartridge-based instruments for sample preparation, nucleic acid amplification and detection -also require rna extraction from nasal or throat swabs (or both) but can speed up the time-to-result from a few hours to roughly 30 minutes 4 (and in one test, positive results can be obtained in five minutes 5 ), with near-perfect sensitivity and specificity if sample acquisition, preparation and device operation are carried out appropriately by trained personnel. this limits the usefulness of these kits for at-home use, which would significantly raise the fraction of false negatives.
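the dependence of a test's usefulness on prevalence, which recurs throughout this piece, is just bayes' rule. the sketch below uses the figures from the serology example discussed later in the article (99% sensitivity, 95% specificity, 5% pre-test probability); the function name is mine:

```python
def predictive_value(sensitivity, specificity, prevalence):
    """per-person probabilities of a true and a false positive, plus the
    positive predictive value (share of test-positives truly infected)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos, false_pos, true_pos / (true_pos + false_pos)
```

at 5% prevalence, 4.95% of those tested are true positives and 4.75% are false positives, so a positive result is only about 51% likely to be real: exactly the near-equality of true and false positives the article warns about.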
immunoassays incorporating monoclonal antibodies specific for sars-cov-2 antigens (for instance, a domain of the virus' spike protein) should be amenable to home use, yet they are more difficult to develop (the antibodies are typically obtained via the immunization of transgenic animals) and are less accurate than nucleic acid testing. lateral flow immunoassays (akin to the pregnancy test) and enzyme-linked immunosorbent assays to detect antibodies elicited by the virus are also being rapidly developed (mostly by chinese companies thus far). tens of at-home lateral-flow devices 6 are already being commercialized, having obtained the european union's ce mark or been authorized for emergency use by the fda or the chinese fda. in many of these kits, the recombinant viral antigens bind to sars-cov-2-specific immunoglobulin m (igm) and immunoglobulin g (igg) within 15 min; hence, these tests can also detect early-stage infection (of which igm levels are a marker), but at the expense of sensitivity and accuracy (which can exceed 90% and 99% for igg 7 ). the real-world performance of such serology tests, which is currently unknown, will depend on the actual prevalence of covid-19 in the population. for example, at a 5% pre-test probability of having the disease, a test with 99% sensitivity and 95% specificity would lead to as many true positives as false positives. hence, before wide deployment, governments need to ensure that these finger-prick antibody tests are clinically validated 8 . the world should roll out both antibody and nucleic acid tests on a wide scale. widely available and inexpensive serological testing would help governments to tailor non-pharmacological interventions to specific locations and populations, to decide when to relax them, and to permit citizens immune to the virus to help those who remain susceptible to it. mass testing would also provide valuable data on pressing unknowns: what are the infection rates across locations and populations?
what fraction of the population is immune? how long does immunity last, and how does it depend on age and on the severity of infection? wider deployment of nucleic acid tests would also provide clues about the prevalence of a wider range of covid-19 symptoms, the role of children in spreading the disease, and the epidemiological characteristics of superspreaders 9 and of those who were infected and asymptomatic. testing should be complemented by privacy-minded digital surveillance, via phone apps, aiding contact tracing and permitting lighter levels of physical distancing -as done in singapore, south korea and taiwan. the downside is that any invasion of privacy via the tracking of people can last longer than necessary. de-identified and aggregated health data, such as heart rate and activity levels collected via commercial wearables, might also predict (https://detectstudy.org) the emergence and location of outbreaks. in our globalized world, the risk of further waves of covid-19 outbreaks, and thus of prolonged drastic economic consequences, will remain substantial as long as any outbreak anywhere remains. it is in the world's best interest that richer countries provide test kits, technical and public-health knowledge, personnel, personal protective equipment and, eventually, the necessary vaccine doses to poorer countries to assist them in their efforts to reduce and contain the spread of sars-cov-2. this is humanity's next test. ❐

references:
- the lost month: how a failure to test blinded the u.s. to covid-19
- food & drug administration
- accula test: sars-cov-2 test. u.s. food & drug administration
- covid-19 coronavirus rapid test cassette. surescreen diagnostics
- virus test results in minutes? scientists question accuracy
- today's data on the geographic distribution of covid-19 cases worldwide (european centre for disease prevention and control)
- coronavirus disease (covid-19) - statistics and research (our world in data)
- tracking covid-19 cases and deaths

key: cord-330721-hmnrnem6 authors: chambliss, allison b; tolan, nicole v title: contingency planning in the clinical laboratory: lessons learned amidst covid-19 date: 2020-04-21 journal: j appl lab med doi: 10.1093/jalm/jfaa068 doc_id: 330721 cord_uid: hmnrnem6 global transmission of the novel severe acute respiratory syndrome coronavirus 2 (sars-cov-2) has confronted clinical laboratories with many challenges in continuing to offer critical services. round-the-clock laboratory testing remains essential to support patient care, both for those with and without 2019 coronavirus disease (covid-19). this pandemic is leading to an influx of hospitalized patients, while simultaneously yielding virus exposures and self-quarantines for the laboratory workforce. thus, laboratories should prepare to operate with limited staff and may need to prioritize laboratory tests according to clinical necessity. all laboratories will recognize the need to pay particular attention to those sections involved in sars-cov-2 viral testing, upstaffing areas that receive, test or send out samples, and report/call back results. however, the laboratory should consider various staffing models to maintain healthy workers, such as altering shift hours or even alternating staffing groups (1). preemptive scaling back of laboratory staff and enabling them to work from home will allow for creation of a reserve labor pool that can be engaged as staff are required to quarantine after exposure. this is only possible when laboratory testing volumes for tests not relevant to covid-19 precipitously decrease as hospitals cancel all non-emergent and elective procedures that would otherwise require maintaining higher volumes of comprehensive testing.
the laboratory should begin contingency planning by assessing baseline operational status: which benches can be offered less frequently (batched as sample stability allows), which can be closed altogether, and the resultant minimum number of staff required to support emergent testing. tests that will need to be maintained include complete blood counts, metabolic panels, routine coagulation, troponin, liver function tests, blood gases, and inflammatory markers such as c-reactive protein, lactate dehydrogenase, and procalcitonin (4, 5). with laboratory automation, it may be best to prioritize ftes by assay bench or analyzer, as prioritization of individual tests would require the additional work of scrutinizing and separating orders, and sorting, storing, and re-running a large number of samples. it may be most efficient to simply allow an automation line to run the complete battery of tests ordered unless analyte-specific technical issues arise. in times of particularly critical shortages of staff and/or reagents, with proper agreement of hospital leadership and use of mass notification mechanisms, non-emergent tests could be temporarily masked from providers in the test ordering system, eliminating them before the laboratory receives them in the first place. the laboratory should also evaluate reagent and supply inventory and consider increasing supplies on hand in preparation for higher test volumes and/or possible lapses in vendor supplies or delivery mechanisms. this will need to be considered in relation to the number of tests anticipated in both critical care and general care patient populations (https://covidprotocols.org) and the likelihood of filling covid-19 expansion beds as part of surge planning (table 2). the lab should prepare for an increased number of mechanically ventilated patients. hospital leadership can provide details about the plans to expand patient care areas for covid-19 patients and the expected testing volumes.
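the bench-level planning described above can be prototyped as a small lookup-and-triage script. everything here is invented for illustration (bench names, fte numbers, the greedy rule); it is a sketch of the idea behind the authors' table 1, not their actual tool:

```python
# hypothetical bench table: (baseline fte, minimum fte for emergent-only service);
# a minimum of 0 marks a bench that can be batched or closed.
BENCHES = {
    "chemistry automation": (4, 2),
    "blood gas": (2, 1),
    "coagulation": (2, 1),
    "special chemistry": (2, 0),
}

def emergent_minimum(benches):
    """total staff needed to keep only emergent testing running."""
    return sum(minimum for _baseline, minimum in benches.values())

def plan_day(benches, available_staff):
    """compare today's available staff against the emergent minimum and
    report either the shortfall or the surplus that can be released to
    the cross-coverage / reserve labor pool."""
    needed = emergent_minimum(benches)
    if available_staff < needed:
        return {"status": "short", "shortfall": needed - available_staff}
    return {"status": "ok", "reserve": available_staff - needed}
```

run daily, a tool like this makes the down-staffing decision explicit: with 6 technologists in, 2 can be released to the reserve pool; with 3, the lab knows it is 1 fte short of even emergent-only coverage.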
it may also be valuable to preemptively evaluate the potential benefit of increased point-of-care testing to ease the burden of samples sent to the laboratory. however, it is essential to consider the entire workflow, including interface work that may be required for new tests. as elective surgical procedures are postponed, staff across the department may be available to provide support and back-up to the essential functions of the lab, particularly on offshifts. cross-training amongst the various core laboratory areas, ideally in advance of significant absenteeism, will yield flexibility of assignments. as universities are increasingly scaling back research operations, other able-bodied personnel such as research scientists, medical students, or pathology residents may help the clinical laboratory as long as institutional policies and regulatory requirements are met. non-certified personnel may assist the laboratory with, for example, internal specimen courier services, specimen accessioning, inventory, or the assembly of covid-19 test collection kits. finally, open and continuous communication, both among the laboratory department and healthcare providers, should be maintained with regards to the status of laboratory services. electronic 'daily huddles' can help with assessing the number of staff available, the benches that will operate each day, and where additional staff can be relocated to support intradepartmental needs. daily assessment and communication can be automated via e-mail templates to inform the hospital of real-time lab staffing capacity and tests that will be unavailable or delayed. in summary, there are a number of steps the laboratory can preemptively take as part of disaster planning that involve cross-specialty collaboration within laboratory medicine and with the support of hospital leadership ( table 3) . table 1 . example contingency planning fte assignment tool. 
using the chemistry section as an example, a similar contingency planning tool can be used across core clinical lab specialties to assess benches/testing that can be performed depending upon available staffing. its design allows managers to use the tool daily to assign benches, considering the priority of assays and specimen stability for assays that are batched. notably, increased risks of staffing shortfalls are seen on off-shifts (weekend days, evenings, and nights) and may be addressed by identifying staff who would volunteer to be on-call to cover these shifts as needed. a similar tool can be used to automate communication within the department and help reallocate staffing where it is needed, while also providing updates to clinical care teams. data for the chemistry section are offered as an example of the information to collect, which depends on testing volumes, breadth of testing offered, and other lab-specific needs. lab control/receiving, hematology, and lab management sections are provided as placeholders, with blank, shaded cells indicating additional data to be entered. a downloadable excel file of this table is available as supplemental table 1. abbreviations: fte: baseline full-time equivalent (fte) staff number; ds: preemptive down-staffing to create alternating labor pools; min: minimum number of fte required to support only emergent testing; %min: minimum percentage of full staffing capacity to perform testing; float: no dedicated staff, staff from other benches cover as able; d/c: discard and cancel; 1+: requires supervisor review and sign-off.

table 3. strategies for contingency planning in the clinical laboratory amidst the covid-19 pandemic: vary staffing models; alter shift hours; [remainder of table truncated].

references:
1. the critical role of laboratory medicine during coronavirus disease 2019 (covid-19) and other viral outbreaks
2. world health organization. second who model list of essential in vitro diagnostics. geneva: world health organization
3. planning for laboratory operations during a disaster
4. clinical characteristics of coronavirus disease 2019 in china
5. clinical management of severe acute respiratory infection (sari) when covid-19 disease is suspected. interim guidance

key: cord-323476-rb9n5wc0 authors: poole, stephen; townsend, jennifer; wertheim, heiman; kidd, stephen p.; welte, tobias; schuetz, philipp; luyt, charles-edouard; beishuizen, albertus; jensen, jens-ulrik stæhr; del castillo, juan gonzález; plebani, mario; saeed, kordo title: how are rapid diagnostic tests for infectious diseases used in clinical practice: a global survey by the international society of antimicrobial chemotherapy (isac) date: 2020-09-09 journal: eur j clin microbiol infect dis doi: 10.1007/s10096-020-04031-2 doc_id: 323476 cord_uid: rb9n5wc0 novel rapid diagnostic tests (rdts) offer huge potential to optimise clinical care and improve patient outcomes. in this study, we aim to assess the current patterns of use around the world, identify issues for successful implementation and suggest best practice advice on how to introduce new tests. an electronic survey was devised by the international society of antimicrobial chemotherapy (isac) rapid diagnostics and biomarkers working group, focussing on the availability, structure and impact of rdts around the world. it was circulated to isac members in december 2019. results were collated according to the un human development index (hdi). 81 responses were gathered from 31 different countries. 84% of institutions reported the availability of any test 24/7. in more developed countries, this was more often for respiratory viruses, whereas in high and medium/low developed countries, it was for hiv and viral hepatitis. only 37% of those carrying out rapid tests measured their impact.
there is no ‘one-size fits all’ solution to rdts: the requirements must be tailored to the healthcare setting in which they are deployed and there are many factors that should be considered prior to this. electronic supplementary material: the online version of this article (10.1007/s10096-020-04031-2) contains supplementary material, which is available to authorized users. rapid diagnostic tests (rdts) are increasingly used in clinical practice to provide actionable information for patient care in a timely manner, ideally at the time and location of the patient's interaction with health care systems. rdts (often referred to as point-of-care tests (poct) when deployed near-patient) are often simple to use and therefore can offer diagnostic support in resource-limited settings or away from more sophisticated diagnostic laboratory support, for example in primary care. the treatment of many infectious diseases is time-critical. a test that facilitates early-directed therapy increases the chance of good patient outcomes and promotes good antimicrobial stewardship. furthermore, the early identification of highly transmissible illnesses allows healthcare services in high-income countries to rapidly isolate patients and limit the spread of disease: a benefit which has been particularly highlighted with the emergence of sars-cov-2. the last decade has seen a boom in rapid diagnostic products, with many developed and approved by healthcare authorities around the world [1] for a variety of different infections including gastroenteritis [2] , bloodstream infections [3] , pneumonia [4] and respiratory viruses [5] . formats of these tests include lateral flow assays and polymerase chain reaction (pcr). there are potential pitfalls around the implementation of rdts. many are expensive, and robust evidence for tangible clinical benefit to justify this outlay can be lacking. 
for some, sensitivity and specificity may be lower than for established laboratory tests, and therefore they may only be applicable in specific situations (e.g. when the pre-test probability is high). governance, quality control and assurance can be challenging, particularly when rdts are not sited within a traditional laboratory setup. these challenges differ around the world depending on local health diagnostic regulations, availability of resources, local epidemiology and patient expectations. the international society of antimicrobial chemotherapy (isac)'s rapid diagnostics and biomarkers working group conducted this international survey aiming to identify and highlight some key issues related to rdts and their impacts in clinical practice, and to provide a number of key points to consider while adopting an rdt. a questionnaire (survey monkey®) was devised and approved by the working group (supplementary materials). the survey included 9 questions about the experience of rdts. the questionnaire was circulated to 400 isac members via isac secretaries, and respondents were given 4 weeks to respond during december 2019 and january 2020, with at least two reminders. everyone surveyed was in a position either to request or to deliver tests. among the questions, there was an optional question asking responders to provide their specific role and institution. the location of institutions was linked to a united nations (un) human development index (hdi) ranking (very high, high, medium or low) [6]. this is a widely used, blunt representation of a nation's development which considers life expectancy, income per capita and education. responses were received from 81 isac members representing 31 countries (fig. 1). this represented 20% of those initially surveyed. six respondents did not disclose their nationality.
81% of those who did disclose their nationality came from countries classified as very highly developed on the un hdi, 11% were from highly developed nations and the remaining 8% from countries classified as having medium or low levels of development. 13/81 (16%) respondents reported no available rdts. the proportion of those who have rdts available is reported in fig. 2. the main barrier reported for not adopting rdts was financial (64%); other reasons were a lack of expertise (6%) or a lack of applicability to the respondent's clinical setting (6%). 4% cited a lack of interest in the tests. only 37% of those with rdts reported measuring the impacts of their tests in any way (fig. 3). 91% of those with rdts reported that the laboratory carries out the test. 28% reported the emergency department performing them. other clinical settings rarely carried out the tests (5% in clinics and 7% in wards). the governance structure for rdts is presented in fig. 4. the most common way of reporting was in the electronic patient file (51%); fewer institutions generate the report in real time (36%). 47% of institutions directly phone the result to the requesting clinician. one respondent reported still generating paper reports, one reporting by email, one by sms and one not generating any specific laboratory report, as the test is done in the assessment area. the survey has given us an insight into what is happening globally with rdts. many respondents reported 24/7 availability of tests. very high-income countries had a higher proportional availability of rapid influenza and respiratory virus tests. in lower-income countries, however, a lower proportion of respondents reported the availability of these tests, but hiv and hepatitis testing was available in greater proportions. the explanation for this pattern is likely multifactorial.
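as a quick arithmetic check, the reported hdi shares can be reproduced from raw counts among the 75 respondents who disclosed nationality. the counts 61/8/6 are my reconstruction chosen to match the rounded percentages, not published figures.

```python
from collections import Counter

# Hypothetical counts chosen to match the reported rounded shares
# (81% / 11% / 8%) among the 75 respondents who disclosed nationality.
hdi_categories = ["very high"] * 61 + ["high"] * 8 + ["medium or low"] * 6

counts = Counter(hdi_categories)
shares = {category: round(100 * n / len(hdi_categories))
          for category, n in counts.items()}
# shares == {"very high": 81, "high": 11, "medium or low": 8}
```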
in general, the epidemiology of chronic viral hepatitis and hiv is such that they are more prevalent in developing countries, where public health interventions are less likely to identify and treat patients early in the course of illness [7]. (abbreviations used in fig. 2: other resp multiplex, other respiratory multiplex; hep, hepatitis; gi, gastrointestinal.) the priorities for treatment are also different: influenza management in secondary care is a less pressing need in resource-restricted settings where patient isolation facilities are less readily available. furthermore, the clinical impact relative to the cost of identifying a case of influenza is less than for hiv or viral hepatitis, where early identification and treatment make a greater difference [8, 9]. the relative cost of each test is also likely to be a factor in the difference in availability, with multiplexed assays generally being considerably more expensive and requiring more complex logistical support. methods for reducing the costs of many rdts are lacking, which limits their availability in low-income settings. there are still major gaps in capturing the impact of rdts on decision making in a systematic manner. only 37% of users measure impact. 64% of those surveyed reported that lack of money was the major barrier to bringing in rdts in their institution. developing robust impact recording systems, such as regular audit cycles, coupled with cost-effectiveness analyses, is crucial to support business cases for new rdts. the current setup of rdts appears to be more laboratory-centred: governance and quality control are the responsibility of laboratories in the vast majority of those surveyed. 90% of those who responded to the survey said tests were carried out in their institution by laboratory staff. simpler tests lend themselves more towards near-patient deployment, and a clia waiver is often a good indicator of this.
while there are a number of existing international regulatory processes for drugs and medications, providing safeguards for their safety and efficacy, such processes are often lacking for rdts [10, 11]. as a result, diagnostic tests are often sold and used in the developing world without any evidence of effectiveness. for example, mak et al. [12] reported a sensitivity for an rdt for sars-cov-2 of 11.1-45.7% when the manufacturer had claimed it was 98%. the benefit of rdts can be lost if not coupled with rapid pre- and post-analytical phases. the survey identified that less than half of results are communicated to the requester directly, and only 35% of reports are generated in real time on computers. this means delays are introduced as clinicians look up results. interestingly, in some institutions results are sent out by sms or email to requesting clinicians, which would optimise the reporting process. identification of certain infectious organisms may have wider public health implications, for example legionella; we therefore advocate real-time connection of these results to systems that allow rapid reporting to responsible public health authorities. a limitation of the method to consider is selection bias towards isac members who would be motivated to respond to the survey: potentially those who have the greatest interest in rdts or who are highly critical of them. there is also a bias towards respondents with greater resources, suggested by the fact that at least 90% of tests had laboratory involvement. furthermore, the survey size is relatively small and certain world regions (especially southeast asian nations and sub-saharan african nations) are poorly represented. the main aim of rdts is to improve patient care most efficiently within well-managed healthcare systems. we therefore suggest a number of best practices for implementation of rdts (table 1).
for rdts there is no 'one-size-fits-all' model; the modelling of tests and costs is wildly different for different healthcare systems. our survey highlights the availability of these tests in different resource settings, as well as the current models for governance, quality control and reporting.
references:
[1] point-of-care testing for infectious diseases: past, present, and future
[2] multiplex molecular panels for diagnosis of gastrointestinal infection: performance, result interpretation, and cost-effectiveness
[3] a review of novel technologies and techniques associated with identification of bloodstream infection etiologies and rapid antimicrobial genotypic and quantitative phenotypic determination
[4] rapid syndromic molecular testing in pneumonia: the current landscape and future potential
[5] routine molecular point-of-care testing for respiratory viruses in adults presenting to hospital with acute respiratory illness (respoc): a pragmatic, open-label, randomised controlled trial
[6] human development index (hdi) (n.d.) human development reports
[7] hepatitis b virus burden in developing countries
[8] cost-benefit analysis of real-time influenza testing for patients in german emergency rooms
[9] the economic burden of late entry into medical care for patients with hiv infection
[10] a guide for diagnostic evaluations
[11] point-of-care tests for diagnosing infections in the developing world
[12] evaluation of rapid antigen test for detection of sars-cov-2 virus
acknowledgments: the authors would like to thank the isac secretariat.
key: cord-295126-lz2jbmcn
authors: toresdahl, brett g.; asif, irfan m.
title: coronavirus disease 2019 (covid-19): considerations for the competitive athlete
date: 2020-04-06
journal: sports health
doi: 10.1177/1941738120918876
cord_uid: lz2jbmcn
all major sports leagues and tournaments have been suspended or canceled due to covid-19 since early march 2020. initially, some sporting events were to be held without spectators to reduce transmission through close contact among fans. 12, 13 in the case of the national basketball association, the season was suspended soon after a player tested positive for covid-19. 16 other sporting events were forced to cancel when local and state governments restricted the sizes of gatherings. 3 on march 24, 2020, the international olympic committee announced that the olympic and paralympic games tokyo 2020 would be postponed to summer 2021. 14 while the typical athlete may only experience mild symptoms as a result of covid-19, prevention strategies are necessary for multiple reasons. first and foremost, preventing the transmission of covid-19 is needed to reduce the risk of spread to individuals within a community who are most at risk of severe infection or death, which includes older individuals and the immunocompromised. 27 prevention of covid-19 is also important for the competitive athlete to minimize interruptions in training and the adverse effects that infection could have on his or her respiratory tract and aerobic capacity in both the short and long term. while the first cases of covid-19 were associated with a seafood market in wuhan, the virus has since spread person-to-person primarily via respiratory droplets. 15, 26 this mode of transmission occurs when the virus, in the form of respiratory secretions from coughing or sneezing, contacts another person's mucous membranes. according to chinese data, the rate of secondary covid-19 infections ranges from 1% to 5%.
26 transmission can also occur if a person touches his or her eyes, nose, or mouth after touching a surface containing respiratory droplets with the virus, which can remain viable for hours to days. 7 presymptomatic/asymptomatic carriers, which comprised 48% of the 531 cases on the diamond princess cruise ship, are also capable of transmitting covid-19. 2, 17, 28 currently, there is no evidence that the virus is spread through the shipment of food or other products from overseas. sports medicine providers can support athletes and teams during the covid-19 pandemic by advocating the following preventative measures:
• hand hygiene: general guidelines include washing hands often with soap and water for at least 20 seconds or using hand sanitizer (at least 60% alcohol) if soap and water are not available. as the virus can survive for days on surfaces, frequently touched objects and surfaces should be regularly cleaned and disinfected. 22
• social distancing: the centers for disease control and prevention (cdc) describes social distancing as remaining out of congregate settings, avoiding mass gatherings, and maintaining distance (approximately 6 feet) from others when possible. 9 this practice is being advocated by governments and promoted by professional athletes as well. 4, 19
• travel: to slow transmission, many countries have imposed travel restrictions. measures have ranged from suspending flights, to banning travelers from affected countries, to in-home isolation for 14 days after returning from specific destinations. countries are also performing entry screening, including measuring body temperature and assessing for signs and symptoms of covid-19. domestic travel has become challenging as busy airports can be a common site of person-to-person spread. however, as a result of the sweeping suspensions and cancelations of sports leagues and tournaments, many athletes are not needing to travel beyond returning home from where they were training or competing.
• face mask: asymptomatic athletes should not be advised to wear a mask to prevent becoming infected with covid-19 in the community setting or while traveling, since it does not significantly reduce the risk of infection. 8 inappropriate use of masks can affect supply and demand to the point where health care workers will have inadequate protection, as we are currently seeing.
prolonged and strenuous training has been suggested to be associated with temporary immune system depression lasting hours to days. 21 a conservative approach would be to advise athletes to limit training sessions to <60 minutes and to <80% of maximum ability during this time to prevent covid-19. however, this "open window" theory of infection susceptibility following a bout of vigorous exercise has been challenged. 5 vaccines are in the early stages of development but are unlikely to be available until early to mid-2021. the incubation period is typically within 14 days of exposure, with a median of about 5 days. the most common symptoms include fever (99%), fatigue (70%), dry cough (59%), and myalgias (35%). 23 some may also experience anosmia (loss of smell), dysgeusia (altered taste), a sore throat, rhinorrhea, or gastrointestinal manifestations. pneumonia is the most common serious manifestation, with bilateral infiltrates seen on chest imaging. of nearly 50,000 cases in china, 81% were mild (did not require hospitalization), 14% were severe (dyspnea, hypoxia, or >50% lung involvement on imaging within 24-48 hours), and 5% were critical (respiratory failure, shock, or organ failure). influenza and bacterial pneumonia should be considered when evaluating an athlete with fever, cough, and/or shortness of breath. testing for influenza can be done either prior to testing for covid-19 or simultaneously. a complete blood count to look for leukocytosis can help determine whether the symptoms are caused by a bacterial pneumonia.
conversely, lymphopenia and leukopenia have been seen in covid-19 infections, which may assist in diagnosis. 2 during the early course of the spread of covid-19, availability of outpatient testing for the virus has lagged behind clinical needs. with these limitations, testing algorithms offered preference to patients with symptoms (fever, cough, or shortness of breath), an immunocompromised state, or close contact with someone with covid-19. as more tests are developed and approved in the united states, including those with faster turnaround times, testing criteria are expected to expand and may include testing asymptomatic individuals, as was done in south korea. 11 testing is done with a nasopharyngeal swab using an rna detection polymerase chain reaction (pcr) test. retesting may be needed in those with a negative initial test and a high probability of disease. a chest computed tomography scan can also be used to evaluate for signs of viral pneumonia, as reverse transcription pcr may not detect covid-19 early in the course of the infection. 1 the management of covid-19 infection depends on the severity of symptoms. in new york city, 10% of individuals ages 18-45 who tested positive for covid-19 required hospitalization. 18 however, given the limited access to testing and variable symptomatology, the total number of individuals with covid-19 may be much higher, so the true risk of hospitalization in this age group is likely lower. therefore, an otherwise healthy athlete under age 45 who becomes infected with covid-19 would likely experience a self-limited flu-like illness. managing symptoms in an athlete primarily involves supportive care with rest and over-the-counter antipyretics. in-home isolation is recommended for athletes with confirmed or suspected covid-19 who do not show severe symptoms. other members of the household should minimize time in the same room as the affected individual, who should wear a mask when others are present.
the health minister of france recently advocated the use of acetaminophen to treat fever associated with covid-19 and suggested that ibuprofen could worsen the infection. 24 this appeared to be based on a theoretical concern that the anti-inflammatory effects of nonsteroidal anti-inflammatory drugs (nsaids) could adversely affect the immune system. however, the who currently does not recommend against using nsaids when clinically indicated in the treatment of a covid-19 infection. the who recommends that corticosteroids not be used in patients with covid-19 pneumonia unless there are other indications, such as the exacerbation of chronic obstructive pulmonary disease. 25 corticosteroids have been associated with an increased risk for mortality in patients with influenza and delayed viral clearance in patients with mers. there has also been good evidence of short- and long-term harm in sars patients treated with corticosteroids. 20 the following agents are being investigated as potential treatment options. it is important to note that there are currently no controlled data supporting the use of these medications, and their efficacy is unknown.
• remdesivir: randomized clinical trials are under way assessing this investigational antiviral nucleotide analog in hospitalized adults. it has shown promise in vitro as well as in animal studies.
• lopinavir-ritonavir: there have been case reports of treatment with this protease inhibitor combination used in hiv treatment, which has shown in vitro activity against mers and sars. however, 1 trial of nearly 200 patients with severe covid-19 infection showed no difference in time to symptom resolution or mortality when compared with standard supportive treatment. 6
• hydroxychloroquine/chloroquine: studies are ongoing to investigate these 2 agents, which have shown activity against covid-19 in vitro. hydroxychloroquine may have more potent antiviral activity.
published clinical data are limited, and caution should be used given potential side effects, such as qt prolongation. the cdc recommends discontinuing home isolation using either a test-based strategy or a non-test-based strategy, depending on availability of testing resources. 10 if a test-based strategy is used, home isolation can be discontinued when the following criteria are all met:
• no fever is present without the use of fever-reducing medications
• resolution of respiratory symptoms
• two consecutive negative covid-19 tests collected ≥24 hours apart
when a non-test-based strategy is used, the following criteria must be met:
• at least 7 days have passed since the appearance of symptoms
• at least 72 hours (3 days) have passed since recovery of symptoms without the use of fever-reducing medications
suspending seasons and canceling competitions can cause significant grief, stress, anxiety, frustration, and sadness for an athlete. the psychological impact of covid-19 on a competitive athlete is potentiated by the removal of his or her social support network and normal training routine, which for some is a critical component of managing depression or anxiety. sports medicine providers should anticipate the need for additional mental health support for athletes, which could include ensuring regular check-ins with athletes, facilitating telehealth consultation with a sports psychologist, and encouraging maintenance of social interactions with family, friends, and teammates by phone or video chat. if an athlete on a sports team develops symptoms consistent with covid-19, teammates, coaches, and other staff who had close contact with the athlete (within 6 feet) in the preceding 14 days should begin in-home isolation. if the athlete undergoes testing, contacts can discontinue isolation if the test result is negative for covid-19.
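the two cdc strategies quoted above are simple conjunctive decision rules. the following is a minimal sketch of them; the function and argument names are mine, not from any official cdc tool, with thresholds as stated in the article (two negative tests ≥24 hours apart; 7 days since onset; 72 hours since recovery).

```python
# Sketch of the CDC home-isolation discontinuation criteria as quoted
# in the article (names are illustrative, not an official CDC tool).

def can_stop_isolation_test_based(fever_resolved_without_meds,
                                  respiratory_symptoms_resolved,
                                  negative_tests_24h_apart):
    """Test-based strategy: afebrile without medications, respiratory
    symptoms resolved, and two consecutive negative swabs >= 24 h apart."""
    return (fever_resolved_without_meds
            and respiratory_symptoms_resolved
            and negative_tests_24h_apart >= 2)

def can_stop_isolation_non_test_based(days_since_symptom_onset,
                                      hours_since_recovery_without_meds):
    """Non-test-based strategy: >= 7 days since symptom onset and
    >= 72 hours since recovery without fever-reducing medications."""
    return (days_since_symptom_onset >= 7
            and hours_since_recovery_without_meds >= 72)
```

both functions require every criterion to hold; failing any single criterion keeps the athlete in isolation.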
however, if the test result is positive for covid-19 (or if testing is not pursued and the athlete is treated presumptively), close contacts will need to continue their in-home isolation for 14 days from the last contact with the athlete. there will likely be requests for testing from asymptomatic teammates, coaches, and other staff. testing availability will likely dictate whether these individuals can be tested. during this time, any symptoms experienced by other athletes or staff should be reported to the team physician to determine whether they are legitimate signs of covid-19. team physicians may also consider implementing daily temperature checks. for athletes with confirmed or presumed covid-19, training can begin once symptoms completely resolve and energy levels return to normal. since in-home isolation is necessary for at least 72 hours after resolution of symptoms, low-intensity indoor training may be attempted during that time. after discontinuing in-home isolation, an athlete can gradually return to training as tolerated. for asymptomatic athletes who are isolated due to recent travel or close contact with an individual with covid-19, maintaining cardiovascular fitness may be difficult. exercise that is recommended during the in-home isolation period is dependent on the available equipment, which may include a stationary bike, treadmill, and resistance training. guidance and monitoring by a strength and conditioning coach or exercise physiologist can be provided remotely. as of march 2020, covid-19 has become a global pandemic, halting athletic competition worldwide. current focus is on the prevention of viral spread through social distancing and other common hygiene measures. sports medicine providers should know the most common symptoms of covid-19, work within their environments to learn and develop testing protocols as indicated by local resources, and minimize spread among teams. 
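the 14-day contact isolation window described above amounts to simple date arithmetic. a minimal sketch follows; the helper name and constant are mine, with the 14-day figure taken from the guidance described in the article.

```python
from datetime import date, timedelta

CONTACT_ISOLATION_DAYS = 14  # per the guidance described in the article

def contact_isolation_end(last_contact: date) -> date:
    """Earliest date a close contact may end in-home isolation,
    counted from the last contact with the affected athlete."""
    return last_contact + timedelta(days=CONTACT_ISOLATION_DAYS)
```

note that the window is anchored to the last contact, not to symptom onset in the index case, so repeated contact resets the clock.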
treatment in the outpatient setting is mainly supportive and includes home isolation, although several treatment drugs are under clinical investigation.
references:
• correlation of chest ct and rt-pcr testing in coronavirus disease 2019 (covid-19) in china: a report of 1014 cases
• presumed asymptomatic carrier transmission of covid-19
• washington state bans public gatherings, impacting mlb, mls and xfl games
• nike releases new campaign to promote social distancing amid coronavirus pandemic
• debunking the myth of exercise-induced immune suppression: redefining the impact of exercise on immunological health across the lifespan
• a trial of lopinavir-ritonavir in adults hospitalized with severe covid-19
• centers for disease control and prevention. interim recommendations for us households with suspected/confirmed coronavirus disease 2019
• coronavirus disease 2019 (covid-19): protect yourself
• centers for disease control and prevention. discontinuation of home isolation for persons with covid-19
• south korea took rapid, intrusive measures against covid-19, and they worked. the guardian
• international olympic committee. joint statement from the international olympic committee and the tokyo 2020 organising committee
• genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
• nba to suspend season following wednesday's games
• field briefing: diamond princess covid-19 cases
• coronavirus disease 2019 (covid-19) daily data summary
• coronavirus disease 2019
• clinical evidence does not support corticosteroid treatment for 2019-ncov lung injury
• olympic textbook of medicine in sport
• aerosol and surface stability of sars-cov-2 as compared with sars-cov-1
• clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in wuhan
• anti-inflammatories may aggravate covid-19, france advises. the guardian
• clinical management of severe acute respiratory infection when novel coronavirus (ncov) infection is suspected
• report of the who-china joint mission on coronavirus disease
• characteristics of and important lessons from the coronavirus disease 2019 (covid-19) outbreak in china: summary of a report of 72314 cases from the chinese center for disease control and prevention
• a familial cluster of infection associated with the 2019 novel coronavirus indicating potential person-to-person transmission during the incubation period
key: cord-339122-7vvqtk84
authors: deb, chaarushena; moneer, osman; nicholson price, w
title: covid-19, single-sourced diagnostic tests, and innovation policy
date: 2020-07-07
journal: j law biosci
doi: 10.1093/jlb/lsaa053
doc_id: 339122
cord_uid: 7vvqtk84
the united states' disastrous response to the onset of the covid-19 pandemic has arisen in large part from an utter failure to provide adequate diagnostic tests for the presence of sars-cov-2. the centers for disease control were the sole testing source authorized by the food and drug administration, and when the cdc failed to provide reliable tests in sufficient volume, it took weeks for other providers to be approved and to ramp up testing. revised policies should decrease the likelihood of sole-sourcing tests in pandemic contexts, which results in a fragile system. the pandemic sole-sourcing failure, however, not only accelerated the pandemic, but also provides lessons for innovation policy about diagnostic testing more generally. sole-sourcing hurts clinical practice by limiting confirmatory testing and systemic robustness, whether in a pandemic or in regular practice. we thus argue against relying too heavily on exclusivity-creating patents as an innovation incentive for diagnostic tests, including the proposed coons-tillis patent reform bill, which would increase patentability for many such tests.
instead, we propose the use of reformed reimbursement to create better incentives for diagnostic test innovation. in both pandemics and elsewhere, single-sourcing creates too great a point of failure, but targeted innovation policy can help. at the heart of the united states' disastrous response to the covid-19 pandemic is a failure of diagnostic testing. that something so fundamental to medical care could be so botched evokes incredulity. from primary care offices to critical care settings, every patient encounter begins with a diagnostic workup. diagnostic testing tools are key parts of the physician's toolkit, and confirmatory testing is essential. but in the greatest public health challenge of the 21st century, the failures of diagnostic testing have been laid entirely bare. the story of this failure in diagnostic testing has been told in detail and will be unpacked for years to come. briefly, the u.s. pandemic response lacked high-quality diagnostics, failed to deliver tests in sufficient quantity, and was too slow in deploying developed tests when initially needed. 1 this lack of appropriate testing resulted in inaccurate patient identification and poor epidemiological characterization of viral spread. 2 from there, the errors further multiplied and rendered the rest of the governmental pandemic response similarly sluggish and ineffective. this article seeks to highlight an implication of the failure that applies broadly to diagnostic testing in general: the problems inherent in relying on a sole source for a diagnostic test. though we focus on the question of single-sourcing in this exploration, we do not claim that this factor was the only failure characterizing u.s. covid-19 testing policy. 3 beyond single-sourcing, the fda stumbled through a series of missteps around its application of emergency use authorization (eua) strictures and its issuance of euas.
4 nationwide testing further suffered from the cdc's initial promotion of stringent criteria (e.g., travel history, symptom severity, etc.) to determine whether patients should even be allowed to receive tests. the constellation of these factors resulted in fewer patients being tested and rapid spread of the virus. 5 on the other end of the spectrum, the fda's antibody-testing policy was likely too lax for sars-cov-2 serology tests (designed to determine viral antibody presence in blood), leading to too many poor-quality marketed tests that could lead individuals to falsely believe they were immune to covid-19. 6 the test generated a high proportion of false positives due to initial development with faulty reagents. 10 beyond quality control issues, the diagnostic test further suffered from a slow reporting of results as clinicians had to send samples back to labs beyond the clinical practice setting for analysis. 11 these shortcomings exacerbated the pandemic's toll by preventing quick containment of the virus. crisis may have been averted had there been any alternatives to the cdc's test, but under the fda's restrictions, there were no other approved tests available to perform any confirmatory testing and receive second opinions. labs were left to sit powerless until other tests were at long-last approved. this de facto diagnostic test monopoly proved critical and pushed back pandemic control efforts by several weeks. 12 even after the fda permitted other labs to conduct testing, they were all required to follow the cdc's chemical formulation. 13 non-cdc labs all required the same reagents, effectively rendering the additional approvals useless because the small reagent manufacturer could not keep up with the instantaneous jump in demand. 
the approval earlier in the pandemic response of several competing tests that were not chemically identical, such as the roche sars-cov-2 test, 14 could also have alleviated some of the initial supply chain pressure, and underscores the need to prevent monopoly testing. of course, single-sourcing does not always lead to problems; the sars-cov-2 test adopted by the world health organization has worked well in the countries that used it. 15 but single-sourcing sharply raises the possibility of problems leading to catastrophic failure down the line. the problems with single-sourcing, though, are not limited to pandemics. diagnostic tests are crucial to quotidian clinical practice. and within that practice, the abilities to access high-quality diagnostics rapidly and to perform confirmatory testing are crucial. single-sourcing diagnostic tests jeopardizes the ability to confirm test results, but may also impact innovative efforts to develop new diagnostics based on old tools, supply robustness, and cost-effective development. 16 an appropriate model for diagnostic testing development, including innovation incentives, should go beyond simply discouraging test monopolies and promoting confirmatory testing, by also allowing for low r&d costs for test development and yielding fast delivery of high-quality tests, preventing the errors highlighted in the covid-19 testing response from occurring in both emergency and non-emergency situations. unfortunately, policy efforts ongoing since before the pandemic point in exactly the wrong direction. after an eight-year stretch of unpatentability for certain diagnostics, efforts are afoot to change the law: recent legislative action would allow such tests to be patented again, 17 raising concerns about limitations on initial use and confirmatory testing. 18 in this paper, we first consider how to address the problems with single-sourcing diagnostic tests in emergency medical contexts like the ongoing covid-19 pandemic.
we then turn to applying those lessons for standard diagnostic test development, in particular considering how we can create adequate incentives for development without relying on a problematic single-source or quasimonopoly model. the key diagnostic testing roadblock during the covid-19 pandemic response in the u.s. appears to have been the fda's decision to permit only the cdc to offer diagnostic testing for sars-cov-2. without approval to test, none of the many other potential testing providers could offer confirmatory testing when the cdc's test offered ambiguous results, provide testing to make up for the cdc's breathtaking shortfall in testing volume, or offer innovative advances on the basic test to improve the substantial time required to get results back from the cdc. 19 though the fda ultimately did relax its regulatory structure in approving diagnostic tests, 20 the policy-driven delay had profound consequences. accordingly, the most straightforward intervention would involve updates to the fda's emergency use authorization process. the 2013 pandemic and all-hazards preparedness reauthorization act (pahpra) passed under the obama administration contemplated a pandemic, but focused primarily on streamlining the administrative process necessary for the fda to start issuing "emergency use authorizations" (eua) of unapproved medical countermeasures (mcm) and expanding the emergency uses of already fda-approved mcms. 21 we suggest updating the eua model so that safety and efficacy of novel diagnostic tests can be established as quickly as possible. 
thus, in order to facilitate early deployment of multiple tests in a pandemic scenario, we suggest considering whether the eua model should be updated to:
(1) streamline the eua application process, perhaps through an alternative, temporary route administered through cms, a regulatory body that clia-approved labs are already familiar with;
(2) make the regulatory regime more flexible, such that a more lenient structure can be quickly put in place to combat the early stages of an epidemic; and
(3) initially institute a limited-liability model that would incentivize labs, like those at stanford and the university of washington, to begin using their tests if they have good scientific data backing their safety and efficacy, especially if eua processing delays are expected.
an absence of any regulatory oversight brings its own problems (witness antibody testing), but overly tight entry rules can also be disastrous. fda eventually adopted a more lenient policy incorporating some of these points, but the delay was costly. perhaps most significantly, therefore, we think it important for fda to consider the potential danger of single-sourcing when shaping its early pandemic responses, to avoid a situation like that faced in early 2020. we recognize these changes are not a panacea; some labs lacked clia approval, creating a parallel barrier to testing. but we think they deserve careful consideration. the failure of diagnostic testing during the u.s. response to the covid-19 pandemic provides a warning and a lesson for policy surrounding diagnostic tests more generally. good policy for diagnostic testing in non-emergent times requires both protecting clinician access to diagnostic testing and creating appropriate incentives to support diagnostic test development.
an incentive model for diagnostic tests should thus focus on three main aims: (i) promote the typical use case for tests, including allowing physicians to obtain second opinions; (ii) minimize the cost of research and development; and (iii) ensure rapid deployment of safe and effective tests. the third and especially the first of these aims are hindered by single-sourcing, and thus any incentive model that relies on granting market exclusivity, such as the traditional patent system, fda approval exclusivity, or trade secret protection, raises the same sorts of issues, if on a smaller scale, as those grimly demonstrated by the failure of u.s. sars-cov-2 testing at the onset of the covid-19 pandemic, which ultimately resulted in the loss of many lives. 30 by preventing access to second opinions or incentivizing fast but ultimately ineffective science, we risk putting patients in harm's way. 31 a better incentive system would allow for an increased number of players in the market, all vying to be the best diagnostic for each indication. such a system would promote diagnostic creation and implementation that mirrors actual diagnostic usage. we first (a) summarize a recent history of diagnostic test patentability, since patents have both created incentives and hampered access and clinical practice; (b) describe current efforts to increase diagnostic test patents; and (c) propose improved reimbursement mechanisms as a solution to the balancing act of managing appropriate economic incentives and allowing for optimal clinical practice during non-emergent conditions. diagnostic testing patentability has once again entered public debate, this time in congress rather than the courts. in the summer of 2019, senators thom tillis (r-n.c.) and chris coons (d-del.) proposed draft legislation to modify existing patent law, rendering many diagnostic tests patentable. 
35 the proposal no longer considers "abstract ideas," "laws of nature," or "natural phenomena" to be exceptions to patent eligibility, explicitly overturning the three supreme court decisions. 36 under the proposal, courts would no longer be able to invalidate patents because they cover underlying biological relationships, precisely the stuff of diagnostic tests. 37 the desire to incentivize diagnostic test development through exclusivity rights granted by patents drives the biotech industry's support of the bill. 38 while single-gene patents are unlikely to reemerge due to the critical patent requirement of novelty (the human genome has been sequenced many times over, so genes are unlikely to be patentably "new"), patenting in other areas of molecular diagnostics (e.g., polygenic risk scoring or autoantibody detection) remains a concern. 39 although the pandemic has sidelined unrelated legislative efforts, there remains interest in revising patent law. if the tillis-coons bill passes, we may see a return to the pre-mayo era in which certain diagnostic tests existed as patent-protected monopolies, making it hard to verify quality or to obtain confirmatory tests. 40 outside the pandemic's recent horrifying exemplar, confirmatory tests have been stymied by patent-based single-sourcing in the past. for instance, until 2009 pgxhealth was the sole provider of genetic testing services for long qt syndrome (lqts). 41 independent assessment by clinicians suggested discrepancies in test accuracy and quality. 42 a competing diagnostic test could have allowed clinicians to discover these shortcomings and further validate false negatives. 43 good clinical practice also requires the availability of multiple diagnostic test options, ideally from alternate test providers, as physicians frequently rely on second opinions to choose the best care for their patients. 
previous studies exploring the practice of second opinions in diagnostics have suggested that the ability to verify the results of an initial diagnosis may lead to a change in diagnosis or treatment. 44 before the supreme court intervened, myriad's exclusivity for brca1/2 genetic variant testing prevented precisely this ability. 45 the tillis-coons bill relies on patents to provide an incentive in the form of market exclusivity, but exclusivity is especially problematic for diagnostic tests. (there are other problems with relying on patents to drive biomedical innovation, especially in relation to equity, but we do not focus on those challenges here). 46 though patent holders may not always exercise their monopoly rights, the danger lies in their potential to do so. instead, we should create incentives for diagnostic test development by leveraging non-patent policy levers. a more effective incentive system would allow an increased number of players in the market, all vying to be the most accurate and cost-effective test for each indication. such a system would promote diagnostic creation and implementation that mirrors actual physician usage. increasing reimbursement rates to reflect the expected value of diagnostic tests could help. public and private insurance providers use the current procedural terminology (cpt) system to determine the level of reimbursement warranted by new diagnostics. 47 the cpt system, in which diagnostics are often treated as commodities, is based on cost and procedure instead of the diagnostic's value. 48 to incentivize diagnostic development without creating monopolies, health insurance reimbursement strategies should be updated to provide a monetary reward for the potentially substantial cost-savings and health improvements of diagnostics. 
for administrative simplicity, the cpt shoehorns new tests into established reimbursement categories ("codes") that have established prices, generally evaluating new diagnostics only in comparison to existing diagnostics. 49 new tests found analogous to old tests may be "cross-walked" to the old test's price, even if the new test is far more effective and provides much more value. 50 in december of 2019, 70% of new diagnostics were cross-walked. 51 completely novel tests with no adequate comparison may require new pricing determinations that can take years to implement. 52 thus, new diagnostics are assigned relatively low prices that may not offset research and development costs, decreasing incentives to create new tests. in an attempt to promote value-based pricing, some diagnostic test manufacturers have opted for a "miscellaneous" cpt code. 53 such coding requires an assignment of value from each payer, which effectively bypasses traditional cost-based pricing but represents a logistical nightmare that dissuades most manufacturers. effective diagnostic tests, however, can shift healthcare spending from therapeutic to preventative care, promote precision medicines responding specifically to a patient's disease state, reduce physician trial and error, and shorten hospital stays 54, all of which reduce health-care system costs and improve patient care. shifting to a reimbursement model that recognizes these substantial savings could properly reward diagnostic tests' true value. cms has recently acquired a new set of tools that may facilitate a leadership role in reimbursement model changes. the clinical laboratory fee schedule underwent a major overhaul as part of the protecting access to medicare act of 2014 (pama). the system had not been updated in three decades, and many cumbersome but necessary changes, such as a national fee schedule and private market data collection, were implemented in order to save medicare nearly $4 billion over 10 years. 
55 while there are likely to be bureaucratic challenges to implementing value-based reimbursement for diagnostics via cms, some of the provisions of the new pama fee schedule, such as the shift to national pricing, could allow simpler regulatory updates rather than a complete overhaul of the fee schedule, and thus make change easier. to be sure, some diagnostics will recommend more expensive care; getting the incentives right across the board will be complicated, but reimbursement is a better and more nuanced tool than bringing back broad patent exclusivity. 56 we recognize reimbursement will not create a complete incentive regime, and may need supplementing with grants or prizes. 57 and in some instances, patents will still be present, especially for more technically complex diagnostics. 58 but at a general level, reimbursement policy reform shows substantial potential for creating incentives without relying on single-sourcing. sars-cov-2 did not create new problems in biomedical innovation; instead, it dramatically exposed underlying issues, including in the development and deployment of diagnostic tests. in particular, single-sourcing, whether through regulatory restrictions or patent quasi-monopolies, may seem like an attractive way to centralize control and to create incentives. but for diagnostic tests, which rely on confirmatory testing, innovative improvements, and robust access to supply, single-sourcing creates a host of serious problems. creating the right incentives demands careful attention to these problems, and ideally structuring a regime that avoids them. reimbursement, especially structured through public involvement, provides a promising option for such a regime. both to avoid future pandemic-related disasters and to improve the normal practice of medicine, it is worth rethinking how we drive and shape the development of diagnostic tests. 
2019 icd-10-pcs conversion table (zip)" hyperlink; then select "icd10pcs_conversion_table.xlsx" file) (last visited apr). office of the inspector general, health and human services, hhs oig data brief: medicare payments for clinical diagnostic laboratory tests in 2015: year 2 of baseline data, rev. 1115 (2015). 58 we acknowledge the existence of the rare cases in which early patent protection may be necessary to secure enough resources for development, most famously theorized by edmund w. kitch in the nature and function of the patent system, 20 j.l. & econ. 265 (1977) (discussing the "prospect theory" of patents). the judicious use of grants and prizes may minimize the number of such cases.

key: cord-325956-1kxxg0s9 authors: potluri, rahul; lavu, deepthi title: making sense of the global coronavirus data: the role of testing rates in understanding the pandemic and our exit strategy date: 2020-04-11 journal: nan doi: 10.1101/2020.04.06.20054239 sha: doc_id: 325956 cord_uid: 1kxxg0s9 the coronavirus disease 2019 (covid-19) outbreak has caused havoc across the world. subsequently, research on covid-19 has focused on numbers of cases and deaths, and predicted projections have focused on these parameters. we propose that the number of tests performed is a very important denominator in understanding the covid-19 data. we analysed the number of diagnostic tests performed in proportion to the number of cases and subsequently deaths across different countries, and projected pandemic outcomes. we obtained real-time covid-19 data from the reference website worldometer at 0900 bst on saturday 4th april, 2020 and collated the information obtained on the top 50 countries with the highest number of covid-19 cases. we analysed this data according to the number of tests performed as the main denominator. 
country-wise population level pandemic projections were extrapolated utilising three models: 1) inherent case per test and death per test rates at the time of obtaining the data (4/4/2020 0900 bst) for each country; 2) rates adjusted according to the countries which conducted at least 100,000 tests; and 3) rates adjusted according to south korea. we showed that testing rates impact the number of cases and deaths and ultimately the future projections for the pandemic across different countries. we found that countries with the highest testing rates per population have the lowest death rates, giving us an early indication of an eventual covid-19 mortality rate. only continued testing on a large scale will enable us to know whether the increasing number of patients who are seriously unwell in hospitals across the world are the tip of the iceberg or not. accordingly, obtaining this information through a rapid increase in testing globally is the only way that will enable us to exit the covid-19 pandemic and reduce economic and social instability. the coronavirus disease 2019 (covid-19) outbreak has caused havoc across the world after it was first reported in wuhan, china 1,2 . subsequently, research on covid-19 has exploded to understand the new disease and its impact on mankind 3-16 . however, the number of baseless articles resulting in fake news has also gone up exponentially 17-19 . a number of models have been adopted by policymakers to predict the course of covid-19 across the world 4,6,13,20 . the reason for such models is to ensure that healthcare systems can plan services to help them cope with the demands of this new disease, which is resulting in serious cases leading to hospitalisation 3,8 . core elements of the prediction models have been the numbers of cases and deaths reported, and these studies extrapolated the numbers forward to the population over time 4,6,13,20 . 
given the pandemic course of covid-19, it has become common practice to compare its spread in different countries using case fatality rates 3,4,7,13 . however, such methods only tell us part of the story. vast differences amongst countries in their testing policies, for varied reasons including availability of testing equipment, infrastructure, resources and local governing policies, affect case fatality rates. in addition, comparing case fatality rates between countries which are at different stages of the epidemic in their region would be erroneous, as rates at the beginning and end would be lower compared to rates at the peak, when healthcare services are stretched to their limits. therefore, the search for a common yardstick or denominator is necessary to compare different countries so that the data can be extrapolated for global comparison. over the past four weeks, as covid-19 spread further around the world, testing rates have picked up in most countries. we propose that analysis of the number of diagnostic tests performed in proportion to the number of cases and subsequently deaths in the underlying populations of different countries is the best way to predict what might happen next. we analysed this data from the acalm big data research unit. this preprint, which was not peer-reviewed, is made available under a cc-by 4.0 international license; the author/funder has granted medrxiv a license to display the preprint in perpetuity (https://doi.org/10.1101/2020.04.06.20054239). we obtained real-time covid-19 data from the reference website worldometer at 0900 bst on saturday 4th april, 2020 and collated the information obtained on the top 50 countries with the highest number of covid-19 cases 21 . 
from this source, we obtained many parameters including the number of country-wise covid-19 cases, deaths, tests performed, cases per million population, deaths per million population and tests per million population. china and saudi arabia were excluded due to lack of data on the number of diagnostic tests performed; therefore, countries 51 and 52 were included in the compiled top 50 list. we obtained case fatality rates by dividing the number of deaths by the number of cases, represented as a percentage. next, tests per positive case were calculated by dividing the number of tests by the number of cases. we then calculated the number of cases per test and the number of deaths per test by dividing the number of cases and deaths, respectively, by the number of tests, represented as a percentage (a case per test rate and a death per test rate). subsequently, we obtained the population of these countries (in millions) from the number of cases divided by the number of cases per million. we could obviously obtain more accurate country population statistics from other sources, but to maintain consistency of the data source and methodology (for all countries), we derived this information from this data only. we then analysed the above in three steps. firstly, we extrapolated the population level pandemic data for each country in terms of cases and number of deaths according to each country's case per test rate and death per test rate, as calculated as a snapshot at the time of obtaining the data. there are a number of limitations to the methodology used when taking a snapshot of these countries at a point in time, as done above, especially because each country is likely to be on a different part of the pandemic curve, and extrapolating to the population level data is not likely to be 
accurate. therefore, we further undertook a consistent adjustment according to countries which performed the most tests; in favour of larger countries with bigger populations, we chose an arbitrary cut-off of 100,000 tests per country. 15 countries had undertaken more than 100,000 tests, and as all these countries showed differences in their cases/test and deaths/test, we took the 15-country group as a whole to obtain an adjustment factor according to the cases/test and deaths/test. using this, we derived a case per test rate of 13.53% and a death per test rate of 0.77% for the 15-country group. based on the above numbers, we extrapolated figures at the population level for all 50 countries to calculate the predicted number of cases and deaths. we felt it necessary to undertake a further, third analysis to adjust the data to a country which is progressing towards the latter half of the pandemic curve: south korea [22] [23] [24] . ideally, undertaking this adjustment with data from china would be most appropriate, but data for the number of diagnostic tests performed in china was not available. the adjustment factors for south korea were a case per test rate of 2.23% and a death per test rate of 0.04%. hence our country-wise population level pandemic projections were based on 1) inherent case per test and death per test rates at the time of obtaining the data (4/4/2020 0900 bst) for each country; 2) rates adjusted according to the countries which conducted at least 100,000 tests; and 3) rates adjusted according to south korea. our analyses are shown in the tables and figures. no additional analyses were performed. 
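the derived metrics and the population-level extrapolation described above can be sketched as follows. this is an illustrative reading of the method, not the study's code: the per-test rates are applied as if the entire population were tested once, and the input figures are hypothetical rather than the worldometer snapshot. only the two adjustment rates (2.23% and 0.04% for south korea) come from the text.

```python
def derived_metrics(cases, deaths, tests, cases_per_million):
    """Per-country metrics as described in the methods section."""
    return {
        "case_fatality_rate_pct": 100.0 * deaths / cases,   # deaths per case, %
        "tests_per_positive": tests / cases,                # tests per positive case
        "case_per_test_rate_pct": 100.0 * cases / tests,    # cases per test, %
        "death_per_test_rate_pct": 100.0 * deaths / tests,  # deaths per test, %
        "population_millions": cases / cases_per_million,   # derived population
    }

def project(population, case_per_test_pct, death_per_test_pct):
    """Extrapolate to the population level by treating the per-test rates
    as if the whole population were tested once (one plausible reading
    of the extrapolation step)."""
    return (population * case_per_test_pct / 100.0,
            population * death_per_test_pct / 100.0)

# hypothetical country: 10,000 cases, 200 deaths, 100,000 tests, 500 cases/million
m = derived_metrics(10_000, 200, 100_000, 500)
# -> cfr 2.0%, 10 tests per positive, 10.0% cases/test, 0.2% deaths/test,
#    derived population 20 million

# projection under the south korea adjustment from the text (2.23%, 0.04%),
# for an assumed population of 50 million: roughly 20,000 projected deaths
cases_proj, deaths_proj = project(50_000_000, 2.23, 0.04)
```

under this reading, the roughly 20,000 projected deaths for a 50-million population match the order of magnitude the discussion quotes for south korea, which is why the authors note such snapshot projections are unlikely to hold in reality.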
full data obtained on 4/4/2020 are shown in table 1 for the top 50 countries with the highest number of covid-19 cases in the world. table 2 shows the countries according to the number of tests performed per positive diagnosed covid-19 case. table 3 shows population level pandemic projections for cases and deaths according to each individual country's case per test and death per test rate on 4/4/2020 0900 bst. table 4 shows population level pandemic projections adjusted for the combined case per test and death per test rate of the 15-country group that performed at least 100,000 tests. table 5 shows the population level pandemic projections adjusted for the case per test and death per test rate of south korea. figure 1 shows a scatter plot of the relationship between the case fatality rate and the testing rate as a percentage of the total population, for the countries which have tested at least 1% of their total population. italy was excluded from this scatter plot as it was an outlier with a case fatality rate of 12.25%.

discussion

covid-19 statistics are complex, and comparing different countries based on the number of total cases, deaths and/or case fatality rate does not show the complete picture (table 1) . a common denominator is required to make sense of these numbers, and we propose that this denominator is the number of diagnostic tests performed. in our analyses we showed the deaths and cases in relation to the number of tests performed and presented population level pandemic projections based on these. 
this is particularly relevant in the current environment, where testing parameters vary across different countries, leading to non-uniformity in projections. it is important to discuss each of our different analyses in turn: the rationale, drawbacks and what it means for different countries. as table 2 shows, the number of tests per positive case is an important parameter because it is an indication of how widely the testing policy of the respective country has followed the advice from the world health organisation (who) 1 high as the raw data obtained from this source, but for consistency in dealing with all the raw data in the same way, we analysed according to the data obtained from worldometer. of course, similar bias could be inherent in the testing data for all countries, but in our defence we have treated all the raw data obtained in the same way for consistency and have opened up the data for scrutiny. another important factor to consider here is the testing policy followed in these countries. are the countries at the top of this table testing a cohort of people who have a low possibility of carrying this infection? if only those with symptoms are tested, then individuals are more likely to test positive for covid-19, leading to a low test per positive number. if these countries test the sickest of patients, as you would expect in countries with the largest populations and limited testing kits, then high testing rates per positive case are even more remarkable, as this may suggest lower virus rates compared to other countries, but this cannot be concluded from this study. 
furthermore, the reliability of local testing kits is an important factor, as there are a number of reports of covid-19 patients testing negative numerous times before a positive test 11 . south korea can be considered an exception to this because, following an initial explosion of cases, it embarked on an extensive testing policy along with isolation policies, combined with the use of mobile technology and applications to inform the public about real-time locations of positive cases. as such, it is widely accepted that south korea is further along the pandemic curve, and its rates of new cases and deaths have significantly reduced [22] [23] [24] . projections for the pandemic at an individual population level are very important for governments to plan and organise healthcare systems in response. covid-19 presents a unique problem because there is no immunity to it in the community, nor a vaccination or targeted medical treatment. given that this is a highly contagious disease that spreads very quickly, if a large part of the population suffers from the disease in a short space of time, even if the majority of cases are mild, a small minority of severe/critical cases will still lead to significant pressure on healthcare systems, as now seen in italy, spain and the usa (particularly new york). globally, lockdowns have been instated to reduce the spread of infection, allow healthcare systems to cope with the condition and "flatten the curve" of the pandemic. these were not enforced all at once, and the projections in table 3 are based on the data available on 4/4/2020, a snapshot depending on the actions and policies of individual countries. much more complex models have been undertaken by different groups which included time as a variable 20 . however, we propose that the testing rate is a very important parameter in projecting the outcomes of the pandemic. 
therefore, whilst it seems far-fetched to suggest that indonesia may end up with over 7 million deaths from fewer than 2000 cases reported so far, we have to note that only just over 7000 tests have been performed for a country of 283 million. there are a number of factors behind low testing rates, such as local policies and lack of resources and equipment, and it is impossible to discuss them all; we point out that testing rates are extremely important in projecting the outcomes of the pandemic, particularly in countries with large populations. in this context, if we look at countries such as the uk and india, both of which have performed over 100,000 tests, given their large populations and their case per test and death per test rates on 4/4/2020, both countries have projections of over 1,000,000 deaths. as mentioned, the position of any given country on the pandemic curve is important in determining population level projections, and since we proposed that testing rates have an impact on projections, we adjusted all projections to the combined case per test and death per test rates of all the countries that have performed over 100,000 tests. this analysis is shown in table 4. we also felt that projections should be done on the case per test and death per test rate of south korea, given that country's position on the pandemic curve; these are shown in table 5 [22] [23] [24] . both these analyses are biased in terms of predictions of total deaths for countries with larger populations. 
for example, in spite of the case per test and death per test rates being low in india, as the population of the country is large, the projections are still over 10 million deaths as per table 4 (adjustment according to countries which have performed more than 100,000 tests) and 500,000 deaths as per table 5 (south korea adjustment). it is also no coincidence that none of the top 10 countries in table 4 or table 5 have tested at least 1% of their total population. we looked at the countries that have tested at least 1% of their populations and examined their case fatality rates. we excluded the 10th country on the list, italy, because of its high case fatality rate of 12.25%. all other countries had a case fatality rate of 3% or lower. we then correlated the case fatality rate with the percentage of the population tested, as shown in figure 1. this approach favoured countries with lower populations, which have tested a higher proportion of their total population, but not in all cases. both germany (population 83 million) and, to a lesser extent, australia (population 25 million) have tested more than 1% of their population and showed low case fatality rates (1.4% for germany and 0.54% for australia). provided that their testing criteria are reliable, these figures may serve as early indicators of the actual mortality rate for covid-19, and these low figures are encouraging. none of the methods used for projections are likely to hold true in reality. if we go back to the analyses for south korea and its projection of approximately 20,000 deaths, there have been only 177 deaths in south korea so far. 
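the screen described above (restrict attention to countries that have tested at least 1% of their population, then inspect their case fatality rates) can be sketched as follows. the country rows are hypothetical stand-ins, not the study's table 1 data; only the 1% threshold comes from the text.

```python
def pct_population_tested(tests, population):
    """Share of the total population tested, as a percentage."""
    return 100.0 * tests / population

def well_tested_cfrs(countries, threshold_pct=1.0):
    """Case fatality rates for countries testing at least `threshold_pct`
    of their population (the screen applied before the figure 1 scatter)."""
    out = {}
    for name, d in countries.items():
        if pct_population_tested(d["tests"], d["population"]) >= threshold_pct:
            out[name] = 100.0 * d["deaths"] / d["cases"]
    return out

# hypothetical rows: one country above the 1% threshold, one below
countries = {
    "country_a": {"population": 25_000_000, "tests": 300_000,
                  "cases": 5_500, "deaths": 30},     # 1.2% of population tested
    "country_b": {"population": 200_000_000, "tests": 100_000,
                  "cases": 8_000, "deaths": 600},    # 0.05% of population tested
}
cfrs = well_tested_cfrs(countries)  # only country_a passes the screen
```

the point of the screen is that a case fatality rate computed from a widely tested population is less distorted by undetected mild cases, which is why the low rates observed for the well-tested countries are read as early mortality indicators.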
it seems highly improbable that a country where the numbers of cases and deaths have significantly tailed off would end up with 19,952 deaths. furthermore, the herd immunity concept has been a strategy to contain disease spread not only for covid-19 but across a number of pandemics, such as swine flu 26 . although it is widely debated what percentage of the population would need to be affected by the disease to confer herd immunity, a figure of 60% has been widely used [27] [28] [29] . even if we adjust the south korea figures (table 5) to 60%, we will probably still overestimate the number of deaths. where does all of this leave us, and what is the point of all these statistics and analyses? clearly, from the example of south korea, we can contain covid-19, and in spite of differences in the specific lockdown policies between countries, social distancing and limiting spread are the broad themes to take forward. the analyses in this study highlight the importance of testing as the relevant denominator to which all the covid-19 data should be related. the testing policy is advocated strongly by the who in their covid-19 statements 1 . given the suggested early indication of a low mortality rate from our analyses, coupled with the fact that covid-19 is a new disease affecting the globe in a short time, it is highly plausible that the serious cases and deaths we are seeing in some countries may be the tip of the iceberg of a disease that has spread widely. if we look at influenza data, there are millions of cases and up to half a million deaths worldwide every year due to flu, and these tend to be seasonal in spite of vaccination programmes and herd immunity to some extent [27] [28] [29] [30] . in the case of covid-19, we might be experiencing the full brunt of a disease without 
immunity, globally, all at once, resulting in deaths. the magnitude of these deaths in perspective to other diseases, such as influenza, may not be high 30 . our analyses in this study do not prove this theory; the only thing that can is continued extensive and rapid testing across the globe. this may be the only exit strategy to prevent covid-19-related economic and social breakdown.

who. coronavirus disease 2019 (covid-19) situation report-43
clinical features of patients infected with 2019 novel coronavirus in wuhan
european centre for disease prevention and control, ecdc public health emergency team.
rapidly increasing cumulative incidence of coronavirus disease (covid-19) in the european union/european economic area and the united kingdom
retrospective analysis of the possibility of predicting the covid-19 outbreak from internet searches and social media data, china
correlation of chest ct and rt-pcr testing in coronavirus disease
estimating the unreported number of novel coronavirus (2019-ncov) cases in china in the first half of january 2020: a data-driven modelling analysis of the early outbreak
nowcasting and forecasting the potential domestic and international spread of the 2019-ncov outbreak originating in wuhan, china: a modelling study
early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
novel coronavirus infection (covid-19) in humans: a scoping review and meta-analysis
opportunities and threats (swot) analysis of china's prevention and control strategy for the covid-19 epidemic
coronavirus disease 2019: the harms of exaggerated information and non-evidence-based measures
effects of media reporting on mitigating spread of covid-19 in the early phase of the outbreak
(which was not peer-reviewed) the copyright holder for this preprint estimates of the severity of coronavirus disease 2019: a model-based analysis korean society of infectious diseases; korean society of pediatric infectious diseases korean society for antimicrobial therapy; korean society for healthcare-associated infection control and prevention covid-19) outbreak in the republic of korea from estimating the reproductive number and the outbreak size of novel coronavirus disease (covid-19) using mathematical model in republic of korea. epidemiol health how is covid-19 affecting south korea? what is our current strategy? disaster med public health prep covid-19) testing: status update history and epidemiology of swine influenza in europe the vaccination coverage required to establish herd immunity against influenza viruses back to the future for influenza preimmunity-looking back at influenza virus history to infer the outcome of future infections pandemic dynamics and the breakdown of herd immunity y the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. (which was not peer-reviewed) the copyright holder for this preprint . cc-by 4.0 international license it is made available under a is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint . https://doi.org/10.1101/2020.04.06.20054239 doi: medrxiv preprint key: cord-303539-gimz41yb authors: goudouris, ekaterini s. title: laboratory diagnosis of covid-19() date: 2020-08-31 journal: j pediatr (rio j) doi: 10.1016/j.jped.2020.08.001 sha: doc_id: 303539 cord_uid: gimz41yb objectives: this was a non-systematic review of the literature on the laboratory diagnosis of covid-19. data sources: searches in pubmed and google scholar for articles made available in 2020, using the terms "diagnosis" or "diagnostic” or "diagnostic tests" or "tests" and "covid-19" or "sars-cov-2" in the title. 
summary of findings: tests for the etiological agent identify genetic material of sars-cov-2 or humoral responses to it. the gold standard for diagnosis is the identification of viral genome targets by real-time polymerase chain reaction (rt-pcr) in respiratory tract materials during the first week of symptoms. serological tests should be indicated from the second week of symptoms onwards. a wide range of different tests is available, with variable sensitivity and specificity, most of which require validation. laboratory tests such as complete blood count, c-reactive protein (crp), d-dimer, clotting tests, lactic dehydrogenase (ldh), ferritin, and procalcitonin identify risk of disease with greater severity, thromboembolic complications, myocardial damage, and/or worse prognosis. imaging tests may be useful for diagnosis, especially when there is a compatible clinical picture, and other tests presented negative results or were unavailable. conclusions: the identification of genetic material of the virus by rt-pcr is the gold standard test, but its sensitivity is not satisfactory. the diagnosis of covid-19 should be based on clinical data, epidemiological history, tests for etiological diagnosis, and tests to support the diagnosis of the disease and/or its complications. new diagnostic methods with higher sensitivity and specificity, as well as faster results, are necessary. since december 2019, humanity has once again been facing a pandemic, this time caused by a betacoronavirus, sars-cov-2. the disease caused by this infection was named coronavirus disease 2019. 1 sars-cov-2 is a virus transmitted by the respiratory route that causes a flu-like condition and, in some cases, severe acute respiratory syndrome (sars).
1 however, the follow-up of covid-19 patients has shown that the virus is capable of causing symptoms outside the respiratory tract, in addition to complications of an inflammatory nature in several organs, expanding the spectrum of associated clinical manifestations. 2 early and accurate diagnosis of sars-cov-2 infection is essential for prevention and pandemic containment. the heterogeneity of the clinical presentation, from asymptomatic individuals to severe cases, and the wide diversity of non-specific clinical manifestations of covid-19, reinforce the need for complementary tests with good sensitivity and specificity. 3 the results of diagnostic tests have serious implications: return to work of a health professional, transfer to a covid-19 area of an inpatient unit, or the reverse, possible contamination of family members, among other delicate situations. as with any other infection, the gold standard for diagnosis is the identification of the infectious agent. in the case of viral infections, this identification can be made by visualizing viral particles by electron microscopy or identifying intracellular viral inclusions by light microscopy. tissue cultures are necessary for the study of in vitro virus replication. these methods require technology that is usually available only in research centers. in commercial laboratories, immunoenzymatic assays or agglutination tests are available for detection of viral antigens, and nucleic acid amplification tests for detection of virus genetic material. 4, 5 an indirect way to diagnose viral infections is the identification of a specific immune system response. the humoral response, or antibody production, is the simplest way to diagnose infectious conditions. there are different techniques for identifying antibodies that are directed against different parts of viruses.
4, 5 however, it is important to note that the immune response to viral microorganisms occurs primarily by innate immunity, particularly by nk cells, and cellular immunity, especially cytotoxic t cells (tcd8+). 6 to date, pubmed features over 35,000 articles on covid-19. many of them are presented as preprints, without peer review; some of these studies were conducted with poor methodology, providing unreliable results. moreover, during the pandemic, knowledge has advanced greatly, and initially established concepts were modified, demonstrating that certain specificities of sars-cov-2 infection are not comparable with previously known viral infections. this was a non-systematic review of the literature on the laboratory diagnosis of covid-19, drawing attention to the knowledge already established, as well as the doubts that still need to be clarified. a non-systematic review of the literature was carried out in pubmed, searching for articles submitted in 2020, with the terms "diagnosis" or "diagnostic" or "tests" or "diagnostic tests" and "covid-19" or "sars-cov-2" in the title. since many manuscripts have been made available in preprint version, without peer review, google scholar searches have also been performed, using the same terms. this study included articles in english, portuguese, french, or spanish, using the checklists proposed by the user's guide to medical literature (jama evidence) as inclusion criteria. 7 the complementary tests used in the diagnosis of covid-19 can be divided into tests for etiological diagnosis and support tests, which help in the diagnosis or indicate the risk or presence of complications. tests for etiological diagnosis may be direct, identifying genetic material of sars-cov-2, or indirect, determining the humoral immune response to sars-cov-2. the most commonly used method for identifying genetic material from sars-cov-2 is real-time polymerase chain reaction (rt-pcr).
this method involves reverse transcription of the genetic material of the virus (rna) to complementary dna (cdna), followed by amplification of some regions of the cdna. probes (labelled dna/rna sequences that identify the genetic target in the material) and primers (dna/rna sequences that promote replication of the genetic material found in the sample) were created after the sars-cov-2 genome was sequenced. several serial amplification cycles are performed to identify these targets: the more cycles are needed, the lower the viral load of the material under study. 8 four regions of the sars-cov-2 genome have been targeted: the rdrp gene (rna-dependent rna polymerase), the genes for structural proteins e (virus envelope) and n (virus nucleocapsid), and the orf1ab gene (open reading frames 1a and 1b). 3, 8 kits using different regions of the genome are commercially available. the sequential use of different probes and primers for the rdrp, e, and n genes, known as the charité-berlin institute protocol, presents good sensitivity and specificity. 9 there are other proposed protocols that follow the same logic of sequential use of probes and primers for different genetic targets. 10 regardless of the method used, the sensitivity and specificity of the different rt-pcr kits are not 100%. this is considered the gold standard for diagnosis of sars-cov-2 infection, but its sensitivity is estimated to be approximately 70% and its specificity, 95%. 11, 12 many factors can interfere with the results, whether related to the virus, to the method itself (the collection procedure and handling of the material), or to the viral load of the sample (type of material collected, duration of symptoms, and disease severity). 13 mutations in the virus genome can render the probes and primers obsolete, producing false negative results. to date, sars-cov-2 has undergone mutations, but without implications for rt-pcr detection.
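the inverse relationship between threshold cycle (ct) and viral load described above follows directly from the exponential amplification: assuming near-perfect doubling per cycle, a difference of n cycles corresponds to roughly a 2^n-fold difference in starting template. a minimal sketch of that arithmetic (the function name and the efficiency parameter are illustrative, not part of the cited protocols):

```python
def fold_difference(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """Relative starting viral load of sample A versus sample B from their Ct values.

    Assumes a (1 + efficiency)-fold amplification per cycle; a lower Ct means
    more starting template. With efficiency = 1.0 (perfect doubling), a delta-Ct
    of about 3.32 cycles corresponds to roughly a 10-fold difference.
    """
    base = 1.0 + efficiency
    return base ** (ct_b - ct_a)

# sample A crosses the threshold at cycle 20, sample B at cycle 30:
# A started with about 2**10 = 1024 times more template than B
print(fold_difference(20, 30))  # 1024.0
```

this is why, as noted later in the text, threshold cycles are used as an indirect readout of viral load when assessing likely infectivity.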
mismatch between primers and probes can also lead to false negative results, and ideally more than one region of the virus genome should be simultaneously or sequentially amplified. 13 factors related to the collection procedure and handling of the material are often responsible for false negative results. dacron or polyester swabs should be used and immersed immediately after collection in an appropriate, refrigerated storage medium. the material should be kept under refrigeration and quickly sent to the laboratory. 10, 13 a low viral load, usually found in asymptomatic individuals or in those with mild clinical conditions, may also be responsible for a false negative result. 14, 15 individuals with more severe clinical conditions have greater viral shedding. 16 although it has been described that viral shedding may occur from two to three days before to up to six weeks after the onset of symptoms, 8 very early (before three days of symptoms) or late material collection (after the seventh day) may produce false negative results, due to lower viral load. 1, 13, 15 the type of material and the collection technique also interfere with the result. in several studies, bronchoalveolar lavage was the material with the highest positivity, followed by sputum, nasopharyngeal swabs, and nasal swabs. oropharyngeal swabs did not present good positivity. 15, 17 the identification of genetic material of the virus in feces is less common and has uncertain significance, since infectious virus was not detected in this material. 18 viral particles were not isolated in urine or blood. 18 saliva tests have also been implemented, but have lower sensitivity than the nasopharyngeal swab and require validation. 19 false positive results are most commonly related to errors in sample handling during or after swab collection, leading to inadvertent contamination.
13 tests to identify genetic material of the virus using simpler techniques, which do not require specialized personnel or sophisticated devices and which produce faster results, have been developed. 3 one example is the qualitative detection of the e and n protein genes through the genexpert (cepheid company) platform, in which the amplification process takes place within a cartridge and provides results in 45 min. 20 point-of-care tests for sars-cov-2 proteins, most commonly using lateral flow assays, are useful for diagnosis in regions where there are no specialized laboratories. 3, 21 the presence of genetic material in respiratory tract secretions has no direct relationship with virus viability or infectivity, since inactive or dead virus particles can be identified. 8 therefore, a patient with a positive rt-pcr test is not always able to infect other people. the viability of sars-cov-2 and its consequent infectivity can be assessed directly, in vitro, by its ability to infect cells and, indirectly, through the threshold cycles (the lower the ct, the higher the viral load) or identification of sub-genomic rna (which is transcribed only by viable viruses). 18 serological tests identify the presence of a humoral response to sars-cov-2. antibodies of iga, igm, and igg isotypes specific to different virus proteins are detected by enzyme-linked immunosorbent assay (elisa) or chemiluminescence immunoassays (clia), and the latter has been shown to be more sensitive. 21 it is known that the priority immune response to the virus is related to the cytotoxic activity of nk cells and cd8+ t lymphocytes. there is evidence of robust cellular response to sars-cov-2, regardless of the results of serological tests; 22 however, tests to evaluate the specific cellular immune response to sars-cov-2 are not yet commercially available.
antibodies against s protein, where the receptor-binding domain (rbd) is located, are very specific for sars-cov-2; 10 their levels presented a good correlation with the virus's neutralization capacity. 23 however, the role of antibodies directed to other proteins in the pathogenesis of covid-19, even promoting a greater penetration of the virus into cells, still needs to be elucidated. 24 sensitivity and specificity of serological tests vary according to the testing technique, the specificity of the antibody studied, the duration of symptoms at the time of collection, and the immunocompetence of the individual. 4 however, actual sensitivity and specificity values for these tests are difficult to define considering that a gold standard for diagnosis with high sensitivity is not yet available. 11 most of the tests in use were not evaluated in scientific publications. 21 the assessment of specific antibodies to n protein is more sensitive and less specific, since this protein is more abundant in coronaviruses. antibodies directed to s protein are more specific to sars-cov-2, because the rbd is located in this protein. 8 in addition, other factors that interfere with the results are the duration of symptoms when the blood is collected and the severity of the clinical picture. igm is identified from the fifth day of symptomatology, and more significantly, from the eighth day onwards. the specific iga assay appears to be more sensitive and its values seem to increase earlier than those of igm. 21 specific igg values begin to be detectable from the tenth day of symptom onset, and more significantly, from the 14th day onwards. 21 these tests are therefore not appropriate for the early diagnosis of covid-19. they are, however, relevant when rt-pcr is not available or is negative in the face of a suggestive clinical picture, when the patient has been symptomatic for over 14 days, 8,21 or to assist in the diagnosis of covid-19-related multisystemic inflammatory syndrome.
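the detection windows quoted above (rt-pcr yield highest in the first week; igm from around day 5, more reliably from day 8; igg from around day 10, more reliably from day 14) can be condensed into a toy lookup. the cut-offs are the ones in the text, but the helper itself is purely illustrative, not clinical guidance:

```python
def informative_tests(days_since_onset: int) -> list:
    """Which assays the text suggests are likely to be informative at a given
    day of illness. Illustrative only; thresholds are the ones quoted above."""
    tests = []
    if days_since_onset <= 7:
        # rt-pcr yield is highest during the first week of symptoms
        tests.append("rt-pcr")
    if days_since_onset >= 5:
        # igm detectable from day 5, more significantly from day 8
        tests.append("iga/igm serology")
    if days_since_onset >= 10:
        # igg detectable from day 10, more significantly from day 14
        tests.append("igg serology")
    return tests

print(informative_tests(3))   # ['rt-pcr']
print(informative_tests(6))   # ['rt-pcr', 'iga/igm serology']
print(informative_tests(16))  # ['iga/igm serology', 'igg serology']
```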
25 some studies report that patients with mild (or even asymptomatic) covid-19 present lower levels of sars-cov-2-specific antibodies or may not even develop detectable levels, while patients with more severe conditions have higher levels of these antibodies. 26-28 these data raise questions about the protective capacity of antibodies and may suggest the participation of specific antibodies in the pathogenesis of covid-19. 14, 24 one study demonstrated that the positivity of serological tests was not accompanied by a rapid drop in viral shedding, which may indicate that the positivity of these tests does not necessarily imply prompt resolution of the disease or absence of infectivity. 18 it has recently been shown that specific igg levels decline significantly after two to three months. 27 considering that the immune response to the virus is primarily cellular, the implications of this decline for protection against the virus are not yet known. regardless of the test used for diagnosis, whether identification of genetic material of the virus or a serologic test, the interpretation of the results is based on the accuracy of the test itself and also on the estimated risk of the disease before the results. this risk is modified by the prevalence of covid-19 in a given region. 11 this means that tests developed in regions where the prevalence of sars-cov-2 infection is high tend to have lower sensitivity when used in regions where the prevalence is lower. a single negative test in an individual with a characteristic clinical picture should not rule out the possibility of covid-19. 11 in turn, a positive rt-pcr has greater strength to confirm the diagnosis than a negative test has to exclude it, since it presents high specificity with only moderate sensitivity. 11 point-of-care tests for antibodies against sars-cov-2 using lateral flow assays (usually immunochromatography) are quite numerous and many of them have not been adequately validated.
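the dependence of a result's interpretation on pre-test probability can be made concrete with bayes' theorem. using the rt-pcr figures quoted earlier in the review (sensitivity of approximately 70%, specificity of approximately 95%), this sketch computes positive and negative predictive values at different prevalences; the function is standard textbook arithmetic, not a method from the paper:

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Positive and negative predictive value via Bayes' theorem."""
    tp = sensitivity * prevalence               # true positives
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    fn = (1 - sensitivity) * prevalence         # false negatives
    tn = specificity * (1 - prevalence)         # true negatives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# rt-pcr with sensitivity ~70% and specificity ~95%, as quoted in the text
for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.70, 0.95, prev)
    print(f"prevalence {prev:.0%}: ppv {ppv:.2f}, npv {npv:.2f}")
```

at a 50% pre-test probability this gives a ppv of about 0.93 but an npv of only 0.76, which is the numeric content of the statement that a positive rt-pcr confirms the diagnosis more strongly than a negative one excludes it.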
20 moreover, they were tested in the laboratory using plasma or serum, but have been applied to whole blood, which can greatly modify their sensitivity. 21 they are not recommended for the individual diagnosis of covid-19, but may be useful in implementing public policies. 29 these are laboratory or imaging tests that demonstrate characteristic manifestations of covid-19, its complications, and/or risk factors for complications. complete blood count: lymphopenia, eosinopenia, and a neutrophil/lymphocyte ratio ≥ 3.13 are related to greater severity and worse prognosis. thrombocytopenia is related to a higher risk of myocardial damage and a worse prognosis. 2 lymphopenia results from a multifactorial mechanism that includes the cytopathic effect of the virus, induction of apoptosis, il-1-mediated pyroptosis, and bone marrow suppression by inflammatory cytokines. 30 high values of c-reactive protein (crp), ferritin, d-dimer, procalcitonin, lactic dehydrogenase (ldh), prothrombin time, activated partial thromboplastin time, serum amyloid a protein, creatine kinase (ck), glutamic-pyruvic transaminase (sgpt), urea, and creatinine are risk factors for more severe disease, thromboembolic complications, myocardial damage, and/or worse prognosis. 2,30-32 immunological markers that may also represent risk factors for greater severity and/or worse prognosis are: decreased values of cd4+ and cd8+ t lymphocytes and nk cells, and increased values of il-6, il-8, il-10, ifn-γ, il-2r, tnf-α, gm-csf, and il-1β. 2, 32 imaging tests: imaging tests for the diagnosis of covid-19 have gained relevance, given the unavailability of tests for etiological diagnosis. 3 the alterations described in these tests can also be found in influenza or mycoplasma infections, in inflammatory processes of different origins, or in eosinophilic lung diseases.
33 although the findings in these tests are not specific to covid-19, given a compatible clinical picture and/or a confirmed or possible history of contact, they may help in the diagnosis. plain chest x-rays are less sensitive than computed tomography, but may show sparse bilateral consolidations accompanied by ground glass opacities, peripheral/subpleural in location, predominantly in the lower lobes. 33 computed tomography of the chest presents greater sensitivity and reveals multifocal, bilateral, peripheral/subpleural ground glass opacities, generally affecting the posterior portions of the lower lobes, with or without associated consolidations. 33, 34 children have a presentation similar to that found in adults, albeit with milder involvement. 33 the halo sign, described as a consolidation area surrounded by ground glass opacities, was identified in 50% of the children. 33 an inverted halo sign, in which areas of ground glass opacities are surrounded by a halo of consolidation, has also been described. 35 pulmonary ultrasonography has good sensitivity; the typical findings are b-lines, consolidations, and pleural thickening. 36 the advantages of this method are its lower cost, absence of radiation exposure, and the fact that it does not require sedation or transportation of unstable patients. 37 most studies on diagnostic methods presented here refer to adults; however, studies specific to the pediatric age group show very similar data. 38 the data presented suggest that the diagnosis of covid-19 should be based on clinical manifestations, contact history, imaging tests, and laboratory tests, and not only on serological tests and the search for the genetic material of the virus. in addition, strategies to increase the sensitivity, specificity, and speed of diagnosis are fundamental.
11 the gold standard for the diagnosis of sars-cov-2 infection is the identification of viral genetic material by rt-pcr, in different samples, with greater sensitivity in bronchoalveolar lavage and nasopharyngeal swab. many factors related to the individual, the collection procedure, and the test technique interfere with the sensitivity of these tests. therefore, a negative test in a patient with a characteristic clinical picture should not rule out the possibility of covid-19. the available serological tests differ from each other and many factors influence their sensitivity and specificity. not all patients who have sars-cov-2 infection will have detectable levels of antibodies, particularly if they have milder symptoms. the absence of antibodies does not imply the absence of contact with, or protection against, the virus, since there may be an efficient specific cellular immune response. in turn, the presence of antibodies does not rule out the possibility that the individual is still infectious, as no immediate reduction in viral shedding has been identified. the supporting laboratory and imaging tests show alterations that are characteristic of covid-19, but they lack specificity. the diagnosis of covid-19 should be based on clinical and epidemiological history, tests for etiological diagnosis, and tests to support the diagnosis of infection and/or its complications.
references:
1. pathophysiology, transmission, diagnosis, and treatment of coronavirus disease 2019 (covid-19): a review
2. immunology of covid-19: current state of the science
3. diagnosing covid-19: the disease and tools for detection
4. mandell, douglas and bennett's principles and practice of infectious diseases
5. diagnosis of viral infections
6. host defense to viruses
7. user's guide to the medical literature: essentials of evidence-based clinical practice
8. interpreting diagnostic tests for sars-cov-2
9. detection of 2019 novel coronavirus (2019-ncov) by real-time rt-pcr
10. sars-cov-2 and the covid-19 disease: a mini review on diagnostic methods
11. interpreting a covid-19 test result
12. reconstructed diagnostic sensitivity and specificity of the rt-pcr test for covid-19. medrxiv. 2020, preprint
13. real-time rt-pcr in covid-19 detection: issues affecting the results
14. the first, holistic immunological model of covid-19: implications for prevention, diagnosis, and public health measures
15. evaluating the accuracy of different respiratory specimens in the laboratory diagnosis and monitoring the viral shedding of 2019-ncov infections. medrxiv. 2020, preprint
16. viral dynamics in mild and severe cases of covid-19. the lancet infectious diseases
17. clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in wuhan
18. virological assessment of hospitalized patients with covid-2019
19. saliva as a noninvasive specimen for detection of sars-cov-2
20. covid-19: laboratory diagnosis for clinicians. an updating article
21. antibody tests for identification of current and past infection with sars-cov-2
22. robust t cell immunity in convalescent individuals with asymptomatic or mild covid-19. biorxiv. 2020, preprint
23. an affordable anti-sars-cov-2 spike elisa test for early detection of igg seroconversion suited for large-scale surveillance studies in low-income countries
24. dissecting antibody-mediated protection against sars-cov-2
25. multisystem inflammatory syndrome in children (mis-c) related to covid-19: a new york city experience
26. the dynamics of humoral immune responses following sars-cov-2 infection and the potential for reinfection
27. longitudinal evaluation and decline of antibody responses in sars-cov-2 infection. medrxiv. 2020, preprint
28. clinical and immunological assessment of asymptomatic sars-cov-2 infections
29. point-of-care diagnostic tests for detecting sars-cov-2 antibodies: a systematic review and meta-analysis of real-world data
30. immune response to sars-cov-2 and mechanisms of immunopathological changes in covid-19
31. the role of biomarkers in diagnosis of covid-19: a systematic review
32. clinical and immunological features of severe and moderate coronavirus disease 2019
33. international expert consensus statement on chest imaging in pediatric covid-19 patient management: imaging findings, imaging study reporting and imaging study recommendations
34. chest ct features of covid-19 in rome
35. covid-19 pneumonia and the reversed halo sign
36. thoracic ultrasound and sars-covid-19: a pictorial essay
37. lung ultrasound findings in patients with coronavirus disease (covid-19)
38. covid-19 in 7780 pediatric patients: a systematic review

the author declares no conflicts of interest.

key: cord-276577-06boh550 authors: schanzer, dena l.; garner, michael j.; hatchette, todd f.; langley, joanne m.; aziz, samina; tam, theresa w. s.
title: estimating sensitivity of laboratory testing for influenza in canada through modelling date: 2009-08-18 journal: plos one doi: 10.1371/journal.pone.0006681 sha: doc_id: 276577 cord_uid: 06boh550 background: the weekly proportion of laboratory tests that are positive for influenza is used in public health surveillance systems to identify periods of influenza activity. we aimed to estimate the sensitivity of influenza testing in canada based on results of a national respiratory virus surveillance system. methods and findings: the weekly number of influenza-negative tests from 1999 to 2006 was modelled as a function of laboratory-confirmed positive tests for influenza, respiratory syncytial virus (rsv), adenovirus and parainfluenza viruses, seasonality, and trend using poisson regression. sensitivity was calculated as the number of influenza positive tests divided by the number of influenza positive tests plus the model-estimated number of false negative tests. the sensitivity of influenza testing was estimated to be 33% (95%ci 32–34%), varying from 30–40% depending on the season and region. conclusions: the estimated sensitivity of influenza tests reported to this national laboratory surveillance system is considerably less than reported test characteristics for most laboratory tests. a number of factors may explain this difference, including sample quality and specimen procurement issues as well as test characteristics. improved diagnosis would permit better estimation of the burden of influenza. although influenza virus infection is associated with considerable morbidity and mortality [1] [2] [3] , laboratory confirmation of clinical illness is the exception rather than the rule. 
clinicians do not routinely seek laboratory confirmation for several reasons: diagnosis will often not alter patient management; there is a paucity of real-time, accurate, inexpensive testing methods [4]; and influenza is often not recognized as the etiology of the clinical presentation [5]. accurate diagnosis of influenza-like illness, however, could improve clinical care through reduced use of antibiotics and ancillary testing, and more appropriate use of antiviral therapy [6]. although rapid influenza tests such as point-of-care tests are purported to generate results in a timely fashion to influence clinical care, the performance characteristics of the currently available tests are sub-optimal [7]. new technologies with improved sensitivity such as reverse-transcriptase polymerase chain reaction (rt-pcr) [8], as well as the use of more effective collection systems such as the flocked nasopharyngeal swab compared to traditional rayon wound swabs, and the recommendation to collect more ideal specimens, such as nasopharyngeal swabs rather than throat swabs, are likely to improve diagnostic sensitivity [9-12]. the performance characteristics of currently available tests for influenza vary considerably, and the overall sensitivities of these tests when used in routine practice are also dependent on the type of specimen collected, the age of the patient, and the point in their illness at which they are sampled [4, 9, 13-15]. we sought to estimate the sensitivity of influenza testing based on results of a national respiratory virus surveillance system using a model-based method [1, 2, 16-18]. weekly respiratory virus identifications from september 1999 to august 2006 were obtained from the respiratory virus detection surveillance system (rvdss), public health agency of canada [19, 20].
the rvdss collects, collates, and reports weekly data from participating laboratories on the number of tests performed and the number of specimens confirmed positive for influenza, respiratory syncytial virus (rsv), para-influenza virus (piv), and adenovirus. specimens are generally submitted to laboratories by clinicians in the course of clinical care, and by clinicians participating in one of our national influenza surveillance programs, (fluwatch [20] ). indicators of influenza activity are reported year round on a weekly basis to the fluwatch program. the rvdss is supplemented by case reports of influenza positive cases [19, 21] . from the case reports, influenza a was confirmed in all age groups and sporadic cases were confirmed in the off-season months of june through september. infants and children under the age of 5 years accounted for 25% of the influenza a positive tests, and persons over the age 65 years another 35%. unfortunately, fluwatch surveillance data does not provide the total number of tests by age. testing practices are known to be varied [22, 23] . the predominant testing methods used for influenza detection varied considerably by province or laboratory and over time. for the 2005/06 season a survey of laboratory techniques in current use indicated that culture accounted for 44% of the diagnostic tests with rt-pcr, rapid antigen tests and direct fluorescent-antibody assay (dfa) accounting for 21%, 19%, and 16% respectively [23] . the weekly number of tests negative for influenza was modelled, using poisson regression, as a function of viral identifications for influenza, rsv, adenovirus and piv as well as a baseline consisting of seasonality, trend and holiday variables. 
the estimated baseline implicitly accounts for influenza tests on specimens taken from patients with respiratory infections due to respiratory pathogens other than the four viruses captured in the rvdss, as long as both the testing behaviour of clinicians and respiratory illnesses caused by other respiratory pathogens follow a consistent seasonal pattern as prescribed by the model (see below). the poisson regression model with a linear link function was estimated using sas [24] proc genmod, in the form negative_w = baseline_w(seasonality, trend, holidays) + b5·infla_w + b6·inflb_w + b7·rsvp_w + b8·adeno_w + b9·piv_w, where coefficients b5 to b9 are multipliers. the weekly number of influenza-negative tests estimated to be falsely negative is given by b5·infla_w + b6·inflb_w. the weekly number of influenza-negative tests attributed to rsv is given by b7·rsvp_w, and similarly for adenovirus and piv. for each positive influenza a test, an additional b5 tests above baseline were performed and found to be negative. by specifying a linear link, a value of 0.33, say, for coefficient b5 means that for every test for which influenza a was confirmed, 0.33 additional tests, on average, were performed on truly influenza a positive specimens and found to be negative, which corresponds to a sensitivity of 75%. sensitivity was calculated as the number of influenza positive tests divided by the number of influenza positive tests plus the model-estimated number of false negative tests; equivalently, the estimates of sensitivity for influenza a and b are given by 1/(1+b5) and 1/(1+b6) respectively. the false negative rate is 1 minus sensitivity. while the null value for b5 is zero, which indicates no statistical association between the number of influenza positive tests and the number of influenza negative tests, the corresponding null value for sensitivity is 1. for each test confirmed positive for rsv, on average b7 tests were performed for influenza and found to be negative for influenza.
these b7 tests are attributed to an rsv infection; however, the number of influenza-negative tests that actually tested positive for rsv is unknown. if all specimens had been tested for the same viruses (panel tests), 1/b7 would correspond to the sensitivity for rsv testing, with the sensitivity for adenovirus and piv given by 1/b8 and 1/b9 respectively. some laboratories are known to test for viruses sequentially [22], and so 1/b7 to 1/b9 were not interpreted as estimates of the sensitivity for the other viruses. sequential testing may occur if a rapid test for influenza is negative and the laboratory then performs pcr or culture testing. similarly, in young children with a respiratory illness in the winter, rapid tests for rsv infection may be performed first, and only specimens with negative results submitted for subsequent testing for influenza or other respiratory viruses [25]. by contrast, many laboratories conduct panel tests for multiple viruses for ease of handling, decreased patient sampling, and in recognition that coinfection can occur. either form of sequential testing would not bias the estimate of sensitivity applicable to test results reported to the rvdss, though significant use of rapid antigen tests in the laboratories reporting to the rvdss would reduce the overall sensitivity. as a single specimen may undergo multiple tests, the false-negative rate applicable to a specimen that has undergone multiple tests would be expected to be much lower than the system average for individual tests. parameters b1 to b4 account for trends and the seasonality of truly negative specimens (patients presenting with other acute respiratory infections). over 50,000 tests for influenza were reported to the rvdss each year, peaking in 2004/05 at 101,000. overall, 10% of the influenza tests were positive for influenza, ranging from 4% to 13% depending on the season. the proportions positive for rsv, parainfluenza and adenovirus averaged 9%, 3% and 2% respectively.
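the point about specimen-level versus test-level false-negative rates can be made concrete. assuming, hypothetically, that repeat tests on the same specimen err independently, a specimen is missed only if every individual test misses it:

```python
# hypothetical per-test false negative rate, in the range implied by the
# paper's system-level sensitivity estimate (30-40% -> false negatives 60-70%)
p_fn_test = 0.65

# under assumed independence, a specimen tested k times is a false negative
# only if all k individual tests miss it
specimen_fn = {k: p_fn_test ** k for k in (1, 2, 3)}
# one test: 0.65; two tests: ~0.42; three tests: ~0.27
```

even under this simplistic independence assumption, two tests cut the specimen-level false-negative rate by roughly a third, consistent with the expectation stated above; correlated errors (for example, a poor-quality swab) would weaken the effect.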
as seen in figure 1, no virus was identified in 75% of specimens submitted for testing (white area under the curve). even in the winter months of december through april, one of these four viruses was identified on average in no more than 30% of the specimens. the strong and consistent synchronization of negative tests with influenza positive tests, as seen in figure 1, suggests that false negative results contributed to the large number of negative tests during periods of influenza activity. the sensitivity of influenza a testing averaged 33.7% (model-estimated 95% confidence interval 33.3-34.1%) for the 1999/2000-2005/06 period. influenza b testing had a similar estimated sensitivity of 34.7% (95% ci 33.4-36.1%). estimated sensitivities varied somewhat from season to season, generally ranging from 30% to 40% (table 1), and provincial-level estimates were within a similar range. stratifying by province or season produced similar estimates for the sensitivity of influenza a testing: 32% (95% ci 30-34%) and 36% (95% ci 33-41%) respectively. estimates of sensitivity based on test results reported to the rvdss for individual laboratories with sufficient data to fit the model showed significant variation, with estimates ranging from 25% to 65%. as expected, laboratories using primarily rapid antigen tests had lower estimated sensitivities, and laboratories that used pcr methods had higher sensitivity estimates. however, information on testing procedures is limited primarily to the 2005/06 survey. in addition, irregularities were noticed in the laboratory data, and not all laboratories provided sufficient data to fit the model. figure 2 illustrates a good model fit, where the weekly number of influenza negative tests is well explained by the model covariates, with a few exceptions.
firstly, it is evident that additional specimens were tested during the sars period, as indicated by the period in which the number of weekly influenza negative tests exceeded the expected number, or equivalently, a period of successive positive residuals. residuals typically capture random variation, and hence represent tests that cannot be allocated based on the specified model. in addition to the sars period, testing appears to have been elevated for a number of weeks in january 2000, during the peak of the 1999/2000 a/sydney/05/97 (h3n2) season, in which respiratory admissions were unusually elevated [26,27], and in december 2003, when an elevated risk of paediatric deaths associated with the a/fujian/411/02 (h3n2) strain [28] was identified in the us. as these periods corresponded to heightened public awareness due to severe influenza outbreaks, parameter estimation was repeated without these data points. exclusion of these data points did not alter the sensitivity estimate for influenza. the attribution of influenza negative test results to influenza and other viruses is illustrated in figure 3. the baseline curve is the model estimate of the number of tests that were likely truly negative for all four viruses tested. a reduction in specimen collection and testing, primarily for viruses other than influenza, is also evident over the christmas period (figure 3). the weekly proportion of tests confirmed positive for influenza peaked each season at 15% to 30%. accounting for the model-estimated false negative rate suggests that during periods of peak influenza activity, 40-90% of tests were performed on specimens taken from persons recently infected with influenza. influenza was confirmed in only 14% of specimens sent for testing over the winter period, whereas the sensitivity estimate would imply that up to 40% of influenza tests could be attributed to an influenza infection.
the corresponding figures for the whole year indicate that 10% of specimens were confirmed positive for influenza, while 30% of influenza tests could be model-attributed to an influenza infection annually. despite a relatively large number of tests in the off-season, the number of influenza positive tests was almost negligible, suggesting that the false positive rate applicable to rvdss influenza testing is minimal. the model-estimated sensitivity of 30-40%, based on influenza test results reported to the rvdss, is much lower than the standard assay sensitivities documented in the literature. standard sensitivities for diagnostic procedures used by participating laboratories ranged from 64% for rapid antigen tests to 95% for rt-pcr tests, averaging 75% for the study period [23]. as performance characteristics of specific tests are generally based on high quality specimens, the difference of approximately 40 percentage points is likely linked to any one of many operational procedures that affect the quality of the specimen and its procurement. unlike validation studies, our samples were taken from a variety of clinical settings and processed with a variety of procedures across the country. as well, indications for diagnostic testing may vary across the country. as there are many other respiratory pathogens that are not routinely tested for, or reported to the rvdss, including human metapneumovirus (hmpv), coronaviruses, and rhinoviruses, for which patients may seek medical care and present with influenza-like illness [29-32], a large proportion of negative test results was expected.
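the annual and winter figures quoted above are consistent with simply dividing the confirmed-positive proportion by the estimated sensitivity; a quick arithmetic check:

```python
# proportions quoted in the text
confirmed_annual = 0.10   # fraction of tests confirmed positive, whole year
confirmed_winter = 0.14   # fraction confirmed positive, winter period
sensitivity = 0.337       # model-estimated sensitivity for influenza a

# dividing by sensitivity gives the fraction of tests attributable
# to a true influenza infection
attributed_annual = confirmed_annual / sensitivity   # ~0.30
attributed_winter = confirmed_winter / sensitivity   # ~0.42, i.e. "up to 40%"
```

this reproduces the stated 30% annual attribution and the "up to 40%" winter figure directly from the reported proportions.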
the overall model fit, and the general consistency of the sensitivity estimates, suggest that these many respiratory viruses were reasonably accounted for by the seasonal baseline, and that the strong association between the number of influenza positive and influenza negative tests on a weekly basis is indicative of a significant number of false negative results, rather than the activity of another virus or viruses exactly synchronous with influenza. the latter would bias the estimated sensitivity of the system downwards. however, to significantly and consistently bias the estimate, the degree of synchronization would have to be fairly strong, persist over the whole study period, and occur in all provinces. synchronization was not observed among the rvdss viruses (influenza a, influenza b, rsv, adenovirus and piv), and elsewhere other viruses such as rhinovirus, coronavirus and hmpv accounted for only a small proportion of the viral identifications and were not found to be synchronized with influenza [33]. as well, patients may present for care due to a secondary bacterial infection. while any such specimen would likely test negative, as the virus at this point is likely no longer detectable, the model would statistically attribute a negative test in this case to the primary infection: one of the four rvdss viruses, or the seasonal baseline that represents other respiratory infections, depending on the level of viral activity at the time of the test. this is not considered a source of bias. the large variation in false negative rates estimated for individual laboratories reporting to the rvdss suggests that standardization of sample procurement, testing and reporting procedures would likely reduce the overall false negative rate. the accuracy of diagnostic tests is known to be affected by the quality of the specimen [10,11], its handling, the timing of collection after symptom onset, and the age of the patient [14,15].
even with the most sensitive molecular methodologies, yield has been shown to be strongly related to the time since onset of symptoms [9,14], with a 3-fold decline in the proportion positive within 3 to 5 days after onset of symptoms for both rt-pcr and culture procedures. for most laboratory tests, specimen procurement within 72 hours of the onset of symptoms is recommended [6], yet patients often present much later in the course of illness. estimates of the median time since onset of symptoms suggest delays of 3 and 5 days for outpatients and inpatients respectively [15]; however, these estimates are limited to patients with laboratory-confirmed influenza. in addition, there are inherent differences in the performance characteristics of the currently used diagnostic tests [4,6,8,34-38]. lack of standardization between diagnostic tests and algorithms used in different laboratories reporting to the rvdss adds to this complexity. the routine use of rt-pcr testing has only recently become available in canada (only 20% of tests used rt-pcr methods as of 2005/06 [23]), but increased use of this modality is expected to improve accuracy. population- or system-level sensitivity estimates that include the effects of sample quality are limited. grijalva and colleagues [39] estimated the diagnostic sensitivity in a capture-recapture study of children hospitalized for respiratory complications at 69% for an rt-pcr based system and 39% for a clinical-laboratory based system (passive surveillance of tests performed during clinical practice, using a variety of commercially available tests). though the expected proportion of influenza tests that were due to influenza infections is unknown and variable, our model estimate of 30% appears plausible.
cooper and colleagues [33] attributed 22% of telephone health calls for cold/flu to influenza over two relatively mild years; elsewhere, 20% of admissions for acute respiratory infections (including influenza) in adults aged 20-64 years were attributed to influenza, and 42% for seniors [1]. while there are limitations with this approach, there are no other simple alternatives to assist in the interpretation of the rvdss data. it would have been helpful to analyze data based on each specimen sent for testing. with only the number of weekly tests and the number of positive results, we were unable to calculate the number of specimens that were actually found to be negative for all four viruses, or to estimate the extent of co-infection. coinfection, which was not accounted for in our model, could result in an under-estimation of the number of falsely negative tests, as the attribution of an influenza negative test from a specimen actually coinfected with influenza and another respiratory virus would have to be split between the viruses. with auxiliary information associated with each specimen, model estimates of false negative rates based on, for example, test type, time since onset of symptoms, age of the patient, or clinical presentation would have allowed us to explore the reasons for the high false negative rates. as the false negative rate appears to be laboratory dependent (data not shown), this estimated range is applicable only to the rvdss for the study period. a significant reduction in the false negative rate is anticipated as methods become standardized and with the uptake of the new rt-pcr methods. as positive results, particularly for culture, are often obtained a week or more after the specimen was received, some positive results may have been reported in a different week than the test. multiple test results for a single specimen may also have contributed to reporting irregularities.
these irregularities would tend to bias the estimated parameters towards zero, and hence the estimated sensitivity towards 1. considering the overall model fit and the relative severity of influenza [1], we conclude that our estimate of sensitivity may be slightly over-estimated (and the number of false negatives under-estimated). poor test sensitivity contributes to the chronic underestimation of the burden of influenza in the general population. since estimates of the burden of illness drive planning for preventive and therapeutic interventions, it is important to improve all aspects leading to improved diagnostic accuracy. we have illustrated a simple method that uses the surveillance data itself to estimate the system-wide sensitivity associated with the weekly proportion of tests confirmed positive. although our estimate of sensitivity is only applicable to the interpretation of the rvdss data over the study period, similar estimates for specific cohorts or laboratory procedures may help guide further investigation into the reasons for the large number of false negative test results. the capacity for improved diagnostic accuracy will ultimately improve our understanding of the epidemiology of influenza.
references (titles as extracted, one per line):
role of influenza and other respiratory viruses in admissions of adults to canadian hospitals
co-morbidities associated with influenza-attributed mortality
influenza-attributable deaths: canada 1990-1999
sensitivity of diagnostic tests for influenza varies with the circulating strains
accuracy and interpretation of rapid influenza tests in children
role of the laboratory in diagnosis of influenza during seasonal epidemics and potential pandemics
the limitations of point of care testing for pandemic influenza: what clinicians and public health professionals need to know
genescan reverse transcription-pcr assay for detection of six common respiratory viruses in young children hospitalized with acute respiratory illness
enhancing the predictive value of throat swabs in virological influenza surveillance
comparison of flocked and rayon swabs for collection of respiratory epithelial cells from uninfected volunteers and symptomatic patients
use of throat swab or saliva specimens for detection of respiratory viruses in children
nasal swab versus nasopharyngeal aspirate for isolation of respiratory viruses
increased detection of respiratory syncytial virus, influenza viruses, parainfluenza viruses, and adenoviruses with real-time pcr in samples from patients with respiratory symptoms
virological surveillance of influenza-like illness in the community using pcr and serology
effectiveness of reverse transcription-pcr, virus isolation, and enzyme-linked immunosorbent assay for diagnosis of influenza a virus infection in different age groups
hospitalization attributable to influenza and other viral respiratory illnesses in canadian children
influenza-attributed hospitalization rates among pregnant women

figure 3: the modelled attribution of the weekly number of specimens tested for influenza to influenza (a and b), and to adenovirus, parainfluenza virus, and rsv combined, is shown along with the numbers confirmed positive. the total is the number of weekly tests for influenza (most were likely panel tests). the baseline accounts for routine tests in the hypothetical absence of influenza, rsv, adenovirus and parainfluenza activity, and corresponds to the model estimate of the number of tests that were truly negative for all tested viruses. the blue area (light plus dark) corresponds to tests attributed to influenza, with the light blue area corresponding to tests confirmed positive for influenza. the purple area (light plus dark) corresponds to tests attributed to rsv, adenovirus or parainfluenza.

influenza-associated hospitalizations in the united states
influenza in canada: 2005-2006 season
influenza in canada: 2003-2004 season
antiviral therapy and outcomes of influenza requiring hospitalization in ontario
impact of changing laboratory diagnostics on influenza surveillance
sas/stat 9 user's guide
strategy for efficient detection of respiratory viruses in pediatric clinical specimens
prescription for excellence: how innovation is saving canada's health care system
emergency department overcrowding: ambulance diversion and the legal duty to care
influenza-associated deaths among children in the united states
characterization of viral agents causing acute respiratory infection in a san francisco university medical center clinic during the influenza season
human metapneumovirus infections in adults: another piece of the puzzle
human metapneumovirus infection in adults
human metapneumovirus infection in the canadian population
the contribution of respiratory pathogens to the seasonality of nhs direct calls
superiority of reverse-transcription polymerase chain reaction to conventional viral culture in the diagnosis of acute respiratory tract infections in children
real-time pcr in clinical microbiology: applications for routine laboratory testing
evaluation of three immunoassay kits for rapid detection of influenza virus a and b
performance of six influenza rapid tests in detecting human influenza in clinical specimens
comparison of the directigen flu a+b test, the quickvue influenza test, and clinical case definition to viral culture and reverse transcription-pcr for rapid diagnosis of influenza virus infection
estimating the undetected burden of influenza hospitalizations in children

the authors acknowledge the support of the national fluwatch network and all those involved in the collection and compilation of this data. special thanks to the anonymous reviewers for valuable comments.

key: cord-345454-r1ymzk6n authors: levesque, j.; maybury, d. w. title: a note on covid-19 seroprevalence studies: a meta-analysis using hierarchical modelling date: 2020-05-06 journal: nan doi: 10.1101/2020.05.03.20089201 sha: doc_id: 345454 cord_uid: r1ymzk6n

in recent weeks, several seroprevalence studies have appeared which attempt to determine the prevalence of antibodies against sars-cov-2 in the populations of certain european and american locations. many of these studies find an antibody prevalence comparable to the false positive rate of their respective serology tests, and the relatively low statistical power associated with each study has invited criticism. to determine the strength of the signal, we perform a meta-analysis on the publicly available seroprevalence data based on bayesian hierarchical modelling with markov chain monte carlo and generalized linear mixed modelling with prediction sampling. we examine studies with results from santa clara county (ca), los angeles county (ca), san miguel county (co), chelsea (ma), the comté de genève (switzerland), and gangelt (germany). our results are in broad agreement with the conclusions of the studies; we find that there is evidence for non-trivial levels of antibody prevalence across all study locations. however, we also find that a significant probability mass exists for antibody prevalence at levels lower than the reported figures.
the results of our meta-analysis on the recent seroprevalence studies point to an important and strongly suggestive signal. high antibody prevalence would imply a large number of undetected infections with mild symptoms. moreover, high antibody prevalence indicates that a significant fraction of the population has already been exposed, lowering estimates of the infection fatality rate and providing possible clues about herd immunity. given the importance of determining society-wide exposure to sars-cov-2, critically understanding the results from the seroprevalence studies represents a pressing concern. the recent serology study [1] from santa clara county, in which the authors report that the santa clara area has an antibody prevalence of 1.5% (exact binomial 95% ci 1.11-1.97%), has received sharp criticism [2,3]. in particular, the combination of a small sample size, a small effect size, and a competitive false positive rate led to a study with relatively low statistical power. but, given the potential importance of serology analysis to our understanding of covid-19, we must extract all the available information that the data contain. we simply cannot afford to dismiss weak signals. along with the santa clara study, a number of other studies have recently appeared using different types of tests [4,5,6,7,8]. results are disparate. for example, the study in chelsea, massachusetts suggests an antibody prevalence as high as 30%, in marked contrast to the low levels detected by the santa clara study. given the early days of this type of research, and the unevenness of infection rates across regions, a noisy picture does not surprise us. in this note we perform a meta-analysis based on the data from seven different studies. we apply two methods: 1) bayesian hierarchical modelling with markov chain monte carlo, and 2) generalized linear mixed modelling (glmm) with prediction sampling. both methods lead to similar results.
we find that there is evidence for non-trivial levels of antibodies in the populations across all the studies, in broad concordance with the seroprevalence study conclusions, but that smaller than publicly stated levels are also probable. the studies we analyze were conducted in the last few weeks; peer-reviewed scientific publications are not yet available. we construct our datasets from scientific preprints, manufacturers' specifications, and transcripts of interviews with study authors published in the news media. each study differs in its sampling strategy, and data on sample demographics are not yet available for most studies. in each study, the authors aim to collect a representative sample of their population. we did not attempt to post-stratify any of the results, and our analysis is therefore limited by our lack of knowledge of the demographics associated with the samples in each study. in this sub-section we summarize the data we found on each study. the studies are highly varied, with differing levels of publicly available information. gangelt, germany. this study investigates antibodies in the german town of gangelt. on april 10, 2020 the authors published preliminary results [4], stating that "around" 500 individuals had been investigated, with 14% testing positive for igg antibodies. we assume this figure means that 70 individuals tested positive. the preprint does not mention which serology test the authors use, but the principal author is quoted in die zeit on april 10, 2020, stating that the study uses the igg elisa test produced by euroimmun ag [9]. the authors do not include demographic information, but they do refer to the who recommendation of investigating a random sample of 100-300 households. the authors state that they investigated "approximately" 1000 individuals from "approximately" 400 households, and that the preliminary results are for "approximately" 500 individuals. geneva, switzerland. this ongoing study started in early april 2020 in the comté de genève [10,5].
the study releases results on a weekly basis. according to their study protocol, researchers use the same euroimmun elisa test for igg antibodies as the gangelt study. in the week of april 6 to 10, researchers found that 3.5% of the 343 participants tested positive, which we interpret as 12 individuals. in the week of april 14 to 17, they found that 5.5% of the 417 participants tested positive, which we interpret as 23 individuals. san miguel county, colorado. this study aims to test most residents of a county in colorado with a population of approximately 8,000. the study is conducted in partnership with the serological test manufacturer ubi group. in a progress update published on april 21, the county announced that out of 4,757 antibody tests processed, 26 were positive and 70 were "borderline". the county's web page notes that a borderline result on the first test indicates that the test produced a "high-signal flash" that is not enough to produce a positive result. a borderline result means that the individual may have been recently exposed to covid-19 and may be in the early stage of producing antibodies. in our analysis, we include the borderline cases (96 total positive tests out of 4,757). los angeles county, california. this study used a covid-19 antibody test manufactured in china by hangzhou biotest biotech co., ltd., and distributed in the united states by premier biotech, inc. it was conducted between april 10 and april 11, 2020 [7,12], and involved some of the same authors as the santa clara study. a preprint of this study is not yet available, but the raw numbers were announced in communiqués made by la county public health and the university of southern california. out of 863 participants, 4.1% tested positive, which we interpret as 35 individuals. the researchers reported 95% confidence bounds on the prevalence of 2.8% and 5.6%. the sample was generated by a random draw from a marketing firm's database, made to be representative of the county's demographics.
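the raw counts quoted in this section can be collected in one place for later modelling. a minimal sketch; the gangelt and geneva counts are our interpretations of reported percentages, as noted in the text:

```python
# (positives, total tested) per study, as quoted or inferred above
studies = {
    "gangelt":      (70, 500),     # 14% of ~500, interpreted as 70
    "geneva_week1": (12, 343),     # 3.5% of 343
    "geneva_week2": (23, 417),     # 5.5% of 417
    "san_miguel":   (96, 4757),    # includes the 70 "borderline" results
    "los_angeles":  (35, 863),     # 4.1% of 863
}

# raw (uncorrected) positive rates, before any adjustment for test error
raw_rates = {name: pos / n for name, (pos, n) in studies.items()}
```

these raw rates are only the observed test-positive fractions; the modelling sections below correct them for the false positive and false negative rates of the respective serology tests.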
they used the antibody test sold by premier biotech, the same test used in the santa clara study. the following studies are not included due to a lack of data about the case counts and the serology test specifications. given the size of these studies and the potential impact of their stated results, we feel that releasing the underlying data is of paramount importance. • new york. on april 24, 2020 the state of new york announced the results of a covid-19 antibody study conducted over the preceding week. in the sample of 3,000 individuals, the state reported that 13% tested positive. within those, the subset of new york city residents (1,300 individuals) tested positive at 21%. we could not find any specifications about the antibody test used. • netherlands. the netherlands' central blood bank (sanquin) tracks covid-19 antibodies by sampling the blood donations it collects on a weekly basis. however, the test they use is unknown to us (one of the study leads is quoted in a science news piece [2]: hans zaaijer, a virologist at sanquin, the dutch national blood bank, who helped lead the study, says the team used a commercial test, which "consistently shows superior results" in validation studies, but didn't provide more details). their first set of results appears in [8]. • denmark. denmark is conducting a study similar to the netherlands' in which they sample weekly donations at their central blood bank. in an update published on april 7, 2020, the danish health authority [13] indicates that among 1,000 blood donors, 2.7% had tested positive for antibodies. they also reported that the sensitivity of their test is 70% (there was no information on test specificity). this sensitivity figure is the same reported by danish researchers for the euroimmun igg test [14]; however, we could not confirm this association. the seven studies we consider use four different test types. each test has a different false positive rate and a different false negative rate.
we include data on each test to build our models. euroimmun igg elisa. this test is used in the gangelt [4] and geneva [10,5] studies. its performance was assessed independently by a team of danish researchers (lassaunière et al. [14]). in their assessment, 20 out of 30 confirmed positive samples tested positive, and 3 out of 82 confirmed negative samples tested positive. the authors note that "borderline data were considered negative". these results mitigate the manufacturer's claim that their test has >99% specificity. one reason for the lower specificity seems to be the test's cross-reactivity with other coronaviruses, as found by okba et al. [15]. the manufacturer's own assessment [11] found that out of 397 confirmed positive samples, 352 tested positive, and out of 128 confirmed negative samples, 12 tested positive. ubi sars-cov-2 elisa. the manufacturer claims that 100% of the blood samples collected at day 10 or later after infection with sars-cov-2 from patients who tested positive for covid-19 by other methods were also found to be positive using the ubi sars-cov-2 elisa [16]. in the absence of an actual figure for the number of tests, we assumed a conservative n = 30. in addition, ubi indicates that out of "over 900" blood samples collected before the covid-19 outbreak, none tested positive. again, we took a conservative approach and assumed n = 900. (the copyright holder for this preprint, which was not certified by peer review, is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. all rights reserved. no reuse allowed without permission. this version posted may 6, 2020.) out of 75 clinically confirmed covid-19 patients with positive igg tested positive, and 78 out of 85 confirmed igm-positive samples tested positive. the manufacturer also found that 369 out of 371 confirmed negative samples tested negative. the santa clara study authors found that in a set of 37 samples confirmed to be pcr-positive and igg- or igm-positive, 25 tested positive.
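the validation counts above translate directly into point estimates of sensitivity and specificity. a sketch using the independently assessed euroimmun figures and the assumed ubi figures; the attribution of counts to tests follows our reading of the sources quoted in this section:

```python
# (true positives, positives tested, false positives, negatives tested)
validation = {
    "euroimmun (lassauniere et al.)": (20, 30, 3, 82),
    "ubi (assumed counts)":           (30, 30, 0, 900),
}

estimates = {}
for test, (tp, n_pos, fp, n_neg) in validation.items():
    estimates[test] = {
        "sensitivity": tp / n_pos,        # fraction of true positives detected
        "specificity": 1 - fp / n_neg,    # fraction of true negatives detected
    }
# euroimmun: sensitivity ~0.667, specificity ~0.963
```

these point estimates carry substantial binomial uncertainty given the small validation samples, which is exactly why the models below treat the error rates as unknowns rather than fixed constants.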
they also found that out of 30 confirmed negative samples, 30 tested negative. our first approach uses a hierarchical bayesian model, which we solved by markov chain monte carlo using stan [17]. there are three observables in the model: • n_obs(i), the number of positive antibody tests out of N_obs(i) individuals tested at each location i; • n_fpos(j), the number of positive antibody tests out of N_neg(j) confirmed negative samples in the validation of each test brand j (false positives); • n_fneg(j), the number of negative antibody tests out of N_pos(j) confirmed positive samples in the validation of each test brand j (false negatives). from these observables, we infer p_prev(i), the prevalence of antibodies at each location i. the number of positive antibody tests n_obs(i) observed in each study is a random variable that depends on p_prev(i), but also on p_fpos(j(i)) and p_fneg(j(i)), the false positive and false negative rates of each serology test type j, indexed as a function of study i, namely

n_obs(i) ~ binomial(N_obs(i), p_obs(i)),
p_obs(i) = p_prev(i)(1 - p_fneg(j(i))) + (1 - p_prev(i)) p_fpos(j(i)),
n_fpos(j) ~ binomial(N_neg(j), p_fpos(j)),
n_fneg(j) ~ binomial(N_pos(j), p_fneg(j)). (1)

our prior construction places hierarchical priors on the logit-transformed rates,

logit(p_prev(i)) ~ normal(mu_prev, sigma_prev),
logit(p_fpos(j)) ~ normal(mu_fpos, sigma_fpos),
logit(p_fneg(j)) ~ normal(mu_fneg, sigma_fneg), (2)

with hyper-priors of

mu_prev, mu_fpos, mu_fneg ~ normal(0, 10),
sigma_prev, sigma_fpos, sigma_fneg ~ exponential(1). (3)

we run the model in stan using 4 chains, each with 10,000 samples, discarding the first 5,000 as part of the mcmc warm-up. the 20,000 post-warm-up samples result in r-hat values above 0.9999 for all parameters in the model, and effective sample counts of approximately 10,000.
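the observation model in eq. (1) can be explored numerically without stan. below is a deliberately simplified python sketch that fixes the test error rates at point estimates and computes a grid posterior for a single study under a flat prior, rather than the full hierarchical model in which the error rates are themselves unknown; the santa clara counts (50 positives out of 3,330 tested) and the validation counts are assumptions taken from our reading of the cited sources:

```python
import numpy as np
from scipy.stats import binom

def p_obs(p_prev, p_fpos, p_fneg):
    # probability that a sampled individual tests positive, eq. (1)
    return p_prev * (1 - p_fneg) + (1 - p_prev) * p_fpos

# assumed counts for the santa clara study and its test validation data
n_tested, n_positive = 3330, 50
p_fpos = 2 / 371        # 369 of 371 confirmed negatives tested negative
p_fneg = 1 - 25 / 37    # 25 of 37 confirmed positives tested positive

grid = np.linspace(0.0, 0.10, 2001)     # candidate prevalence values
likelihood = binom.pmf(n_positive, n_tested, p_obs(grid, p_fpos, p_fneg))
posterior = likelihood / likelihood.sum()   # flat prior -> normalize
mean_prev = float(np.sum(grid * posterior))
```

this fixed-error-rate posterior concentrates near 1.4%; the full hierarchical model additionally propagates the uncertainty in p_fpos and p_fneg, which widens the posterior and pushes probability mass toward zero, as discussed below.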
we plot the marginal posterior density functions for the antibody prevalence at each location in figure 1. the santa clara study shows a density function consistent with a high probability of a non-zero antibody prevalence, with a mean and a mode slightly greater than 1%, although we note that the posterior distribution does include zero. los angeles stands out: the mode of the distribution sits at 4%, with a 95% credible interval that does not include zero. the german town of gangelt was particularly hard hit by covid-19, and the subsequent serology study suggests an antibody prevalence of 14%. our bayesian analysis is consistent with the study's result, but we see that the posterior distribution has a tail that includes a prevalence below 10%. the contours in figure 2 show a two-dimensional slice through the posterior distribution, revealing the probability density in the prevalence-false positive plane. while the santa clara study has a false positive rate competitive with the implied prevalence, our bayesian result clearly shows the mode located substantially above zero prevalence. figures 1 and 2 show that geneva has an implied prevalence close to zero. the euroimmun test used in the geneva studies has a relatively high false positive rate, creating tension in the resulting marginal posterior distributions. a significant probability mass sits at zero for both weeks of the geneva study. the chelsea (ma) study uses a serology test (biomedomics) with a relatively high false positive rate. however, the strength of the signal, coupled with the bayesian learning across all the studies, strongly suggests a highly non-zero antibody prevalence level. the 95% credible interval around the mode excludes a prevalence level below 10%.
as a check on the bayesian implementation, we use a set of binomial generalized linear mixed models (glmm) with prediction sampling. mixed models provide estimations in situations which have sub-population specific effects by borrowing strength from population averages (see, for example, [18]). the borrowing effect or "shrinkage" tempers those sub-populations which have relatively less data but otherwise allows the data to speak for itself. glmm provides an optimal compromise between complete pooling and no pooling of the sub-populations in a regression analysis. in that sense glmm is a precursor to bayesian approaches, which permit greater flexibility through priors and hyper-priors and afford more control over shrinkage effects. using [19, 20], we build separate glmms for the empirical count observation process across the study locations, for the test type false positive rates, and for the test type false negative rates. in total, we have three glmms. we then build monte carlo prediction samples [21] for the means of each glmm and compute the implied prevalence. the glmm specification is given in eq. (4). we estimate the implied prevalence distribution at each location from eq. (5), where k represents the k-th monte carlo sample from the glmm prediction. the density function of p_loc[k] is the prevalence density function at each location. we estimate the three separate glmms in eq. (4) using [20] and we display the fixed and random effects with their respective 95% confidence intervals in figures 4, 5, and 6. [figure 2: the two-dimensional marginal posterior distribution functions for antibody prevalence with the false positive rate at each study location, from the bayesian hierarchical model.]
to build our monte carlo estimate of the prevalence density function in eq. (5), we sample the predictions from the models over the uncertainty in each parameter, with each realization representing the mean of the model, where i denotes the glmm model for the k-th sample. figure 7 shows the resulting density functions for the antibody prevalence in each location from the glmm prediction sampling. notice the similarity with figure 1; both of our implementations lead to similar density functions and are in broad agreement. the santa clara study suggests a prevalence of 1.5% and we see in the figure that the quoted value is just beyond the mode of the distribution. we also notice that santa clara has a heavy tail towards zero, just as we found in the bayesian analysis. yet, in part by relying on "borrowing strength" from across all the studies, we also see that the santa clara result clearly suggests a non-zero antibody prevalence. again, we also see a strong effect in los angeles county, with a mode of 4%; the 95% confidence interval around the mean does not include zero. in figure 8 we show the 95% confidence intervals with the mean for all locations computed from the glmm sampling. in figure 9 we show a contour density map of the prevalence with the false positive rate. we can clearly see the tension between the false positive rate and prevalence, but we also see a clear signal associated with each region. again, notice the similarity to figure 2. in particular, note that san miguel county (co) again shows a strong prevalence result, but that the glmm shrinks the false positive rate away from zero. the glmm is in some sense "semi-bayesian": we have a hierarchical model with a random effect on the intercept term in logit space which we sample over, but we do not have a formal set of priors with hyper-parameters.
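the extracted text omits the explicit form of eq. (5); a standard way to compute an implied prevalence from prediction samples is to invert the eq. (1) mixture draw by draw. the sketch below takes that approach (the clipping to [0, 1] and the gaussian sampling of the model means are our illustrative choices, not necessarily the paper's):

```python
import random

def implied_prevalence(p_obs_k, p_fpos_k, p_fneg_k):
    """Invert eq. (1) for one Monte Carlo draw:
    p_prev = (p_obs - p_fpos) / (1 - p_fneg - p_fpos), clipped to [0, 1]."""
    p = (p_obs_k - p_fpos_k) / (1.0 - p_fneg_k - p_fpos_k)
    return min(max(p, 0.0), 1.0)

random.seed(1)
draws = [
    implied_prevalence(
        random.gauss(0.023, 0.002),  # sampled observed-positive rate
        random.gauss(0.005, 0.001),  # sampled false-positive rate
        random.gauss(0.100, 0.010),  # sampled false-negative rate
    )
    for _ in range(5000)
]
mean_prev = sum(draws) / len(draws)
```

the empirical density of these draws plays the role of the per-location prevalence density function described above.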
while the glmm provides a basis for learning across the data, our full bayesian specification with markov chain monte carlo provides a more complete result. however, by implementing the glmm we see that our results are robust to different approaches and model specifications. our results are not sensitive to the specifics of the prior and hyper-prior constructions in our bayesian model. both methods demonstrate that there is significant evidence for non-trivial antibody prevalence in the populations associated with these studies. while the individual statistical power of each study is not high, using bayesian techniques and glmm constructions on the data from all the studies reveals a definite signal that we should take seriously. seroprevalence studies are an important tool in combating covid-19, since public policies depend on how far the disease has already penetrated into the general population. not only does serology testing help us understand the overall infection fatality rate of the disease, but testing also helps us design targeted strategies such as contact tracing. furthermore, serology studies can provide insight into the dynamics of disease propagation. while we do not correct for possible population sample bias or other demographic issues, our analysis points to an important signal: the seroprevalence studies to date show that a significant fraction of the populations examined have antibodies against sars-cov-2 in their bloodstream. the exact prevalence levels are highly region dependent. as more serology studies appear, they will sharpen our understanding of antibody prevalence in the general population. the quality of antibody prevalence estimates depends on sample size and on the specificity/sensitivity of the antibody test. high test specificity increases statistical power.
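the point about specificity and statistical power can be made concrete: at low prevalence, the fraction of observed positives that are actually false positives grows quickly as specificity drops. a small illustration with invented rates:

```python
def false_positive_share(p_prev, p_fneg, p_fpos):
    """Fraction of observed positives that are false positives,
    under the eq. (1) mixture."""
    q = p_prev * (1 - p_fneg) + (1 - p_prev) * p_fpos
    return (1 - p_prev) * p_fpos / q

# 1% prevalence, 10% false-negative rate
share_hi_spec = false_positive_share(0.01, 0.10, 0.005)  # 99.5% specificity
share_lo_spec = false_positive_share(0.01, 0.10, 0.020)  # 98.0% specificity
```

with 99.5% specificity roughly a third of observed positives are false; at 98% specificity it is about two thirds, which is the tension visible above in the geneva and chelsea results.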
thus, there is a natural trade-off between investing in tests of high quality, the human resources required to generate large sample sizes, and the speed at which society needs results. the societal and economic consequences of type i and type ii errors are not equal.

references:
• covid-19 antibody seroprevalence in santa clara county, california. medrxiv
• antibody surveys suggesting vast undercount of coronavirus infections may be unreliable
• how (not) to do an antibody survey for sars-cov-2. the new scientist, 2020
• vorläufiges ergebnis und schlussfolgerungen der covid-19 case-cluster-study [preliminary results and conclusions of the covid-19 case-cluster study]
• séroprévalence covid-19 : première estimation de la prévalence d'anticorps anti-sars-cov-2 igg dans la population genevoise [covid-19 seroprevalence: first estimate of the prevalence of anti-sars-cov-2 igg antibodies in the geneva population]
• county covid-19 information
• usc-la county study: early results of antibody testing suggest number of covid-19 infections far exceeds number of confirmed cases in los angeles county
• slides from technical briefing given to the netherlands government on sars-cov-2 elisa test systems from euroimmun
• enquête de séroprévalence répétée des anticorps igg anti-sars-cov-2 dans la population du canton de genève [repeated seroprevalence survey of anti-sars-cov-2 igg antibodies in the population of the canton of geneva]
• covid-19 igm/igg rapid test
• early antibody testing suggests covid-19 infections in l.a. county greatly exceed documented cases
• covid-19 i danmark: status ved indgang til 6. epidemiuge [covid-19 in denmark: status at the start of epidemic week 6] (status-og-strategi/covid19_status-6-uge.ashx?la=da&hash=6819e71bfeaab5aca55bd6161f38b75f1eb05999)
• evaluation of nine commercial sars-cov-2 immunoassays. medrxiv
• united biomedical group's c19 company partners with san miguel county, colorado to be first in nation to test an entire county for covid-19 with new antibody diagnostic test
• stan: a probabilistic programming language
• generalized linear mixed models
• r: a language and environment for statistical computing.
r foundation for statistical computing
• fitting linear mixed-effects models using lme4
• mertools: tools for analyzing mixed effect regression models

key: cord-354006-j1y42oxu authors: ozdemir, vural; williams-jones, bryn; glatt, stephen j; tsuang, ming t; lohr, james b; reist, christopher title: shifting emphasis from pharmacogenomics to theragnostics date: 2006 journal: nat biotechnol doi: 10.1038/nbt0806-942 sha: doc_id: 354006 cord_uid: j1y42oxu

what will be the role of theragnostic patents in upstream and downstream biomarker research? pharmacogenomics aims to identify the genetic basis of variability in drug efficacy and safety, and ultimately develop diagnostics that can individualize pharmacotherapy. theragnostics, a term denoting the fusion of therapeutics and diagnostics, is receiving increasing attention as pharmacogenomics moves to applications at the point of patient care. in contrast to pharmacogenomics, theragnostic tests focus not on a singular marker set, such as genetic polymorphisms, but rather on the integration of information from a diverse set of biomarkers (e.g., genomic, proteomic, metabolomic). although it remains to be seen whether theragnostics reflects a form of hyperbole in biomarker research, it is grounded in both established (e.g., genomics) and exploratory (e.g., metabolomics) technologies that can offer, respectively, mechanistic and heuristic insights for therapeutics (fig. 1). recent social science analyses suggest that in some cases, bio-hype can be an integral component or driver for the establishment of biotechnologies, particularly in the early stages of development of a new concept or idea [1-5]. the synthesis of both types of technologies (established and exploratory) will likely have a differential impact on the regulation and economic promise of pharmaceuticals developed under the overarching theme of theragnostics.
moreover, we suggest that advances in this field may shape, in potentially unexpected ways, the pursuit of theragnostic patents, depending on whether the research is conducted in an upstream drug discovery-oriented context or towards downstream point-of-care applications 6, 7. as biomarker applications move towards point-of-care to individualize drug therapy, a number of qualitatively different concerns arise relating to gene patents and the ethical and therapeutic policy aspects of theragnostic testing 4, 8, 9. for example, a fear is that theragnostic tests could be adopted without regard for the particular research context in which they are being applied, or be understood as a homogeneous category, a ubiquitous set of biotechnologies with similar implications for therapeutic policy 6, 9-11. in effect, this may result in a predicament where the patents on biomarkers are conceptualized in a one-size-fits-all manner (as in drug prescriptions), thereby precluding the equitable implementation of emerging biomarker technologies and informed critique of their societal implications. additionally, the effects of theragnostic patents on 'knowledge commons', that is, a space (usually in universities) where knowledge is shared without undue restriction, have not been adequately evaluated 11-13. in the present analyses, we 'unpack' and contrast the motivations at play that are driving the pursuit of theragnostic patents and its bioethical corollaries in: (1) fundamental upstream basic research oriented to the discovery of genes for human diseases; and (2) downstream clinical applications at point-of-care as theragnostic tests to stratify patient populations for individualization of pharmacotherapy. we emphasize the need for integration, as well as the risk of excessive compartmentalization, of various biomarker technologies, and evaluate the subtle distinctions between dna- and protein-based theragnostic tests.
the search for genetic determinants of common complex human diseases reflects one of the pivotal upstream applications of pharmacogenomics. as dna samples are being increasingly archived in pharmaceutical clinical trials, a sizable proportion of research resources are devoted to identifying the genes causally related to human diseases 4. for many rare congenital monogenic human diseases (e.g., duchenne muscular dystrophy), pre- or postnatal cytogenetic analysis of chromosomes or conventional clinical chemistry tests have existed for several decades. more recently, high-throughput genomic technologies spun off from the human genome project (hgp) and the decreasing cost of genotyping have made possible the adoption of molecular genetic tests that can pinpoint the precise etiology or predisposition for a few adult-onset human diseases such as huntington disease, certain familial forms of breast cancer, and alzheimer disease 14. while genetic tests hold the promise of a more rational clinical forecast and management of disease risk in the future, they are also raising concerns about the provision of public healthcare services (reduced access) and the impact of market forces on the products of research (commercialization of technologies) and academic freedom. some of these concerns have crystallized around the issue of commercial genetic testing for disease risk, as exemplified by the case of testing for hereditary breast cancer (brca testing) 15. in the mid-1990s, two genes (brca1 & brca2) that greatly increase the risk for hereditary breast cancer were identified and sequenced. the brca genes were given broad patent protection in the us (and subsequently internationally in canada, europe, australia, etc.), and granted to the biopharmaceutical company myriad genetics (salt lake city, ut), which had been involved in much of the initial research.
this strong intellectual property protection has allowed myriad to effectively control the brca testing market in the us; the commercial bracanalysis test is available only from myriad or their licensees. most recently, myriad has licensed their test to san francisco-based dnadirect, in order to provide services direct to consumers 16. basic research into the function of the brca genes or resulting proteins would be permissible without infringing on myriad's patent rights, although there is still contention about what precisely constitutes basic research exclusions 17. myriad has, for example, signed agreements with the us national institutes of health and national cancer institute to provide sequencing at cost (us $1,200) for research purposes 18. however, research that results in a commercial or clinical service, defined by myriad as any research in which a fee is charged for testing or in which results are provided to patients, would infringe on their patents. notably, technology assessment research by third parties, for example to evaluate test performance metrics such as sensitivity, specificity, or positive predictive value, is particularly jeopardized. moreover, research aimed at comparing the bracanalysis test against other testing methodologies would prove difficult, as the clinical nature of such a trial would constitute an infringement on myriad's patents. hence, the brca patents give myriad the ability to constrain research-oriented applications of brca patents and particularly head-to-head comparisons of which genotyping methodology or test product is most informative for clinical management of the susceptibility to breast cancer. in part due to myriad's broad patent rights and the attendant concerns to ensure patients' access to affordable genetic testing for breast cancer risk, the european brca patents have been constrained or not enforced in recent years 15, 19.
in some sense the myriad case may be thought of as an extreme scenario, and one from which industry and governments have learned. one specific consequence is that the initial enthusiasm for granting such broadly defined upstream patents on any and all forms of biological material has waned due to concerns for public good and scientific progress. more generally, there is a growing awareness that patents on genes and other biological materials can have an unfavorable effect on downstream genetics research and knowledge commons 13, 20 . thus, without an adequate conceptual framework on patents relating to theragnostic technologies, upstream patents on potential drug target genes or genetic methodologies for molecular definitions of human diseases (as in familial breast cancer) may lead to a monopoly on theragnostic tests. this may also dissuade some research laboratories from investigating otherwise potentially promising lines of inquiry for technology transfer towards downstream theragnostic products in the clinic 20 . the myriad patent case remains relevant since, in part, the current trends at the us patent and trademark office (uspto), and in patent offices in other countries, to grant more narrowly defined patents on biological materials were driven by this case 21 . according to some commentators, "no other event has had as big an impact on the human gene patent debate…and the case [myriad] has thus become a 'harbinger' of the policy challenges created by gene patents" 21 . more recent examples also support the idea that upstream theragnostic patents can limit translational applied clinical research. for instance, the sars-associated coronavirus genome patents were filed by the us centers for disease control (cdc) and the british columbia cancer agency (bcca) [22] [23] [24] , officially to ensure continued public access to the viral genome and pre-empt other entities from exerting restrictive or controlling rights. 
notably, it is suggested that this type of patent is symptomatic of what is wrong with the [gene patent] system, when pre-emptive patents have to be filed to protect science and the public good 22, 23. a more worrying example is illustrated by the australian company genetic technologies' patents on the non-coding regions of the human genome. formerly conceived to be 'junk dna', the biological role of non-genic segments of the genome is receiving increasing attention, and their patenting may significantly constrain the free design of primers for pcr analysis of coding regions of the genome 25, 26. a common thread in these two recent examples, however, is that broad theragnostic patents, particularly those granted on dna and other biological materials, may serve as tollbooths that impede downstream research or discourage competition and innovation due to the broad exclusive rights granted to an individual scientist, institution or company (e.g., as in the case of myriad). this 'anticommons' effect of upstream theragnostic patents is now increasingly being recognized. patents with a broad scope may actually enclose the 'knowledge commons' and inhibit technology transfer and development at a societal or macro level (i.e., as a contrast from the viewpoint of an individual investigator), such that the promises of new diagnostics and therapeutics are not realized 6, 11, 13. pharmacogenomic-guided drug development represents a fundamental conceptual departure from conventional 'one-size-fits-all' clinical trials 27. [figure 1: hierarchy of biomarkers and their integration into theragnostic tests, from gene sequence (upstream or static marker) to downstream (dynamic) markers on gene and protein expression or cellular metabolites. a theragnostic profile is depicted as a synthesis of various biomarker tests that characterize an individual patient and her/his drug treatment outcome. the theragnostic profile may be heuristic in nature when only a singular biomarker is associated with treatment outcomes, while more mechanistic insights can be achieved when biomarkers from different levels of the biological hierarchy corroborate and complement each other.] this is a favorable advance for rational therapeutics and optimal patient care, but it also engenders varying degrees of trepidation among pharmaceutical companies and financial investors about proactive implementation in the clinic: pharmacogenomics may inevitably result in smaller economic markets for drugs introduced with an attendant genetic test predictive of drug efficacy or toxicity 4, 6, 9. the pharma companies fervently respond that only drugs with large-scale markets allow recovery of the r&d costs for new medications 8, 9, which can range from $400-800 million according to different estimates 28. while r&d costs per se are not necessarily prohibitive to pursue targeted therapies, the pharmaceutical industry will still need to find mechanisms to maintain their growth rates in such niche markets defined by theragnostic tests. the ways in which upstream and downstream theragnostic patents are sought may play a decisive role in the development of focused therapeutic interventions in smaller markets that can benefit public good and industry growth equally. as a contrast to arguments of market fragmentation, pharmacogenomics and related theragnostic technologies may enhance therapeutic differentiation and market penetration of new medicines [29-31]. many of the currently marketed drugs, however, fall under the 'me-too' designation, with comparable efficacy and safety profiles differing only in terms of slight changes in their chemical structures or pharmacophore composition 32.
hence, in diseases or therapeutic areas characterized by me-too drugs, the diagnostic companies without a pharmaceutical pipeline may be more inclined to develop theragnostic tests that can impact more than one drug by virtue of being in the same therapeutic or chemical class. conversely, in the case of large drug manufacturers, a theragnostic test for a me-too drug may be equally predictive of treatment outcomes for most if not all drugs within the same me-too category, redistributing the financial gains on the theragnostic test from an individual pharma company holding the theragnostic patent to multiple firms who manufacture similar me-too drugs. consider, for example, a patient with major depression receiving the result of a theragnostic test on the serotonin transporter gene in relation to antidepressant response to paroxetine, a selective serotonin reuptake inhibitor (ssri). in this case, the patient has the freedom afterwards to choose from among a host of comparable ssri drugs without necessarily having to commit to paroxetine co-developed with the hypothetical theragnostic test. therefore, the pursuit of theragnostic patents can also be shaped by the type of industry setting (e.g., diagnostic sector versus large pharma) as well as the type of pharmaceutical (e.g., me-too drugs) associated with theragnostic tests. regardless of the 'true' cost of drug development or the varied perceptions of the impact of pharmacogenomics or theragnostic tests on the economic promise of pharmaceuticals, the fact is that the blockbuster model of drug development with large-scale markets is increasingly less viable 8, 33 . when new technologies such as pharmacogenomics and theragnostics enter the market, they can become 'paradigm-disruptive' forces that significantly undermine the traditional broadly defined market model of drug development and commercialization. 
diverse and divergent diagnostic tests, multiple actors (e.g., biotech diagnostic companies, small and large pharma companies) seeking to create and protect their intellectual property, and changing social and political contexts (global demands for patent reform and licensing of drugs in the developing world) create an unstable environment for drug manufacturers. despite the often very public proclamations about an interest in integrating pharmacogenomic research into drug development strategies, prospective stratification of patients using genetic tests in advanced stages of drug development with a view to proactive incorporation of pharmacogenomic data into drug labels is still rare 33, 34 . this illustrates that there is a great uncertainty about how pharmacogenomic and theragnostic tests ought to be developed as functional commercial products. it is noteworthy that multiplicity of diagnostic patents anticipated by the introduction of theragnostic technologies may also result in an 'anticommons effect' since most pharmacotherapeutic outcomes are polygenic or multifactorial in nature. if each segment of this expanding sphere of patentable biological elements along the biological dogma is held by different individuals, academic researchers or commercial firms, scientific advances in theragnostics can again be stifled, as in the case of broadly defined upstream gene patents. this is further supported by at least two seemingly divergent but complementary lines of evidence. first, the uspto and patent offices in other countries increasingly favor narrowly defined gene patents, in part as a response to the myriad case. secondly, theragnostics is now introducing the need to characterize (and motivations to patent) downstream gene products such as mrna, proteins or cellular metabolites to individualize drug therapy. 
because time-dependent changes in gene expression or encoded proteins cannot always be accurately inferred from the upstream gene sequence, it is conceivable that there will be many more narrowly defined patents granted in the near future along the biological dogma from gene sequence to proteins and metabolites. coupled with trends in patent offices in favor of narrowly defined gene patents, there will likely be a fragmentation of the diagnostic sector, as in the case of the blockbuster drugs and the niche therapies guided by theragnostic tests. many of today's most common diseases (including most forms of cancer, heart disease and psychiatric disorders) are known to arise not exclusively from either genes or environmental factors, but through a combination of the two (along with a significant amount of incalculable stochastic factors). moreover, certain environmental exposures may only evoke illness when experienced during a critical period and in concert with a high-risk genetic background. therefore, theragnostic tests predicated on genetic information alone would have a low a priori likelihood of capturing all of the predictable variance in a particular response outcome. ideally, genetic tests could be fashioned to capture the entire heritable portion of a response variable (distributed across one or many genetic polymorphisms). in this context, genetic polymorphisms impart a constant or 'static' state of responsiveness that can be assayed once in each individual and presumed not to change over the course of the lifespan (barring de novo somatic mutations). the residual non-heritable portion of a given response phenotype must then be assayed by other means. 
to the extent that environmental (i.e., non-heritable) exposure influences response phenotypes by impinging on biological systems, this additional proportion of variance can be assayed through more 'dynamic' biomarker platforms, such as transcriptomics, proteomics, and metabolomics, each of which may be influenced by both genetic and environmental factors (fig. 1) . thus, through a combination of static genetic and other dynamic biological '-omic' technologies (i.e., theragnostics), the potential to identify a more comprehensive set of predictors is maximized. there are certain unique aspects of pharmacogenomic (and theragnostic) tests that differ from genetic testing for disease susceptibility. for all the parallels between genetic tests for disease risk and drug response, the latter are applied in reference to a drug that will be administered to patients in the immediate foreseeable future, while genetic testing for disease susceptibility usually predicts a risk in the distant future, often several years or decades away. this 'temporal dissociation' between the genetic test and the future disease occurrence may permit the estimation of the attendant cumulative disease risk with use of genetic data only; there is also a functional disconnect because the disease susceptibility test provides information which is rarely accompanied by effective treatment options. in contrast to genetic testing for disease risk, pharmacogenomic tests are envisioned as being both temporally and functionally proximal. the purpose of a pharmacogenomic test is not to provide risk information as such, but to aid in the individualized prescription of a particular drug. further, most drug effects are elicited within a matter of minutes, hours or days which may require a more precise estimation of the present or acute state of the pathophysiological pathway whose function is inferred through a genetic test. 
hence, because the only barrier between the patient and drug safety or efficacy may be reliance on the accuracy of a pharmacogenomic test, clinicians need to know both the genetic variants in patients' dna as well as the corresponding proteins encoded by the same genes. this is essential because (1) proteins are responsible for the eventual functional or clinical significance of genes and, (2) there may be marked differences or fluctuations in protein function (beyond what is predicted solely by gene structure) due to environmental factors or physiological feedback mechanisms that may influence posttranscriptional/posttranslational modification of gene products and proteins. further, an accurate prediction of drug effects may require a two-step complementary strategy involving, for example, both genetic and proteomic tests for the same gene and its protein product. this may create unprecedented challenges for patents and their legal defense. for instance, what are the implications of a biotechnology company attempting to develop a metabolomics-based, non-genetic 'dynamic phenotyping biomarker' for a gene patented hitherto for a static genotyping test to predict drug response or toxicity? there are presently no definitive answers to such emerging novel intellectual property issues associated with theragnostic tests. since its first appearance in the research literature in september 1997 (refs. 2, 35), the term 'pharmacogenomics' has been hailed as a revolutionary enabling technology that can deliver highly customized drug therapies in the short term. now, nearly a decade after its introduction, the more realistic expectation is that pharmacogenomics will complement efforts for rational individualization of drug therapy in conjunction with existing therapeutic monitoring tools and other novel biotechnologies (e.g., proteomics and metabolomics).
there is increasing support for the view that the human genome is highly dynamic and that gene expression, as well as the regulation of gene function, is subject to poorly understood plasticity. to achieve the much hoped for provision of personalized medicines, the role of environmental and social factors on both drug response and the human genome (and its expressed products, mrna and proteins) requires detailed consideration. there is also growing recognition that the search for genetic biomarkers of outcomes associated with therapeutic interventions may carry the risk for compartmentalization among biomarkers through excessive reliance on a singular biotechnology. it is against this background that theragnostics is slowly emerging as a new concept to synthesize information from various biotechnologies directed at different levels of the biological dogma ranging from dna (genomics), mrna (transcriptomics), proteins (proteomics) or cellular metabolites (metabolomics). unlike mainstream genetic tests for disease susceptibility, the commercialization of theragnostic tests is inextricably linked with the deployment of patented pharmaceuticals. we suggest that theragnostic patents, particularly in the case of downstream applications at point-of-care, can be at variance with the traditional blockbuster model of drug development which stipulates the development of drugs for the entire population even though this approach yields modest therapeutic response and suboptimal drug safety 8,10,33 . hence, a very different and unprecedented story is evolving for theragnostic patents at point-of-care: the traditional tight 'coupling' between patents and their subsequent commercialization may not always occur in clinical trials designed for the registration of new therapeutic candidates under the blockbuster model 6 . 
experts in biotechnology patent law have thus slowly begun to point to this potential 'uncoupling' between the discovery of biomarkers on treatment outcomes and the necessary technology transfer to develop theragnostic products in the clinic. instead, the downstream theragnostic patents on biomarker discoveries may remain primarily as in-house discoveries within the pharmaceutical industry to benefit future drug discovery efforts, but without the accompanying translational clinical research for their development as a diagnostic kit for prediction of treatment response, failure or drug toxicity 6, 9. this uncoupling of biomarker discovery and the necessary technology transfer towards their clinical application poses a threat to theragnostic product development. it is thus conceivable that academic research initiatives for biomarker discovery that consider both drug efficacy and broader functional treatment outcomes 29, 36 will play a critical role in the development of theragnostic-guided personalized medicine. in contrast to downstream patents on biomarkers associated with drug efficacy and safety, upstream patents in drug discovery and the identification of novel drug targets may be particularly welcomed by the pharmaceutical industry. this raises concerns over such patents becoming tollbooths that can increase costs for theragnostic tests in the clinic and slow or block downstream applied research. these nuanced contextual differences in applications of theragnostic patents in upstream research or at point-of-patient-care can shape the motivations at play, the strategies behind the patenting of genes, and the subsequent commercialization into theragnostic tests that may (or may not) become available to patients and consumers.
the theoretical and practical framework on patents needs to incorporate the implications of theragnostics for both upstream and downstream biomarker research, while ensuring technology transfer by more than one stakeholder, to prevent future market monopoly and excessive premium pricing of theragnostic tests 15. as the pharmaceutical industry transitions from the blockbuster model towards targeted therapies with market shares that resemble orphan drugs, there is a parallel need to offer incentives to stakeholders who pursue theragnostic-guided drug development. the discipline of science and technology studies (sts) is already focused on the complex issues at the intersection of emerging biotechnologies, genetic research, bioethics, market forces and the pharmaceutical industry 2,7,37. unfortunately, the expertise in the sts research community does not always find its way into the mainstream medical research literature 38,39. collaboration among geneticists, ethicists, applied pharmacologists and social scientists is essential not only for the equitable implementation of commercial theragnostic testing in the clinic, but also to prevent the risk of bioethics being used as a 'rubber stamp' that can 'deal with the issues' prior to the development of anticipated theragnostic-guided customized therapies. as witnessed during the initial planning and implementation stages of the hgp, we suggest that adequate attention and research resources should be made available to resolve these and similar policy and patent issues associated with theragnostic testing at the point-of-care. additionally, these efforts should parallel the development of much needed prospective clinical investigations, designed primarily for the purpose of biomarker discovery, which can importantly contribute to the development of targeted therapeutic interventions in the near future.
we thus believe there is reason for guarded optimism that theragnostics may allow the synthesis of different types of biomarker data - dna-, protein- or metabolomic-based - to achieve individualized therapeutics in medicine.

references:
• the politics of personalised medicine: pharmacogenetics in the clinic
• ethics and law of intellectual property: current problems in politics
• world health organization resource on patentability of the sars virus genome
• we have never been modern
• science in action: how to follow scientists and engineers through society

all authors contributed to the ideas, critique and synthesis of the data discussed in the present review, as well as the specific considerations involving the role of gene patents on theragnostic tests and nuanced distinctions among different types of biomarkers. nature biotechnology, volume 24, number 8, august 2006.

key: cord-310195-am3u7z76
title: immunity passports for sars-cov-2: an online experimental study of the impact of antibody test terminology on perceived risk and behaviour
authors: waller, j.; rubin, g. j.; potts, h. w. w.; mottershaw, a.; marteau, t. m.
date: 2020-05-10
journal: nan
doi: 10.1101/2020.05.06.20093401
sha:
doc_id: 310195
cord_uid: am3u7z76

objective: to assess the impact of describing an antibody-positive test result using the terms immunity and passport or certificate, alone or in combination, on perceived risk of becoming infected with sars-cov-2 and intention to continue protective behaviours.
design: 2 × 3 experimental design.
setting: online, with data collected between 28th april and 1st may 2020.
participants: 1,204 adults registered with a uk research panel.
intervention: participants were randomised to receive one of six descriptions of an antibody test and results showing sars-cov-2 antibodies, differing in the terms used to describe the type of test (immunity vs antibody) and the test result (passport vs certificate vs test).
main outcome measures: the primary outcome was the proportion of participants perceiving no risk of becoming infected with sars-cov-2 given an antibody-positive test result. other outcomes included intended changes to frequency of hand washing and physical distancing.
results: when using the term immunity (vs antibody), 19.1% of participants [95% ci: 16.1 to 22.5] (vs 9.8% [95% ci: 7.5 to 12.4]) perceived no risk of catching coronavirus at some point in the future given an antibody-positive test result (aor: 2.91 [95% ci: 1.52 to 5.55]). using the terms passport or certificate, as opposed to test, had no significant effect (aor: 1.24 [95% ci: 0.62 to 2.48] and aor: 0.96 [95% ci: 0.47 to 1.99], respectively). there was no significant interaction between the effects of the test and result terminology. across groups, perceiving no risk of infection was associated with an intention to wash hands less frequently (aor: 2.32 [95% ci: 1.25 to 4.28]) but there was no significant association with intended avoidance of physical contact with others outside of the home (aor: 1.37 [95% ci: 0.93 to 2.03]).
conclusions: using the term immunity (vs antibody) to describe antibody tests for sars-cov-2 increases the proportion of people believing that an antibody-positive result means they have no risk of catching coronavirus in the future, a perception that may be associated with less frequent hand washing. the way antibody testing is described may have implications for the likely impact of testing on transmission rates.

at the height of the first wave of the covid-19 pandemic, about a third of the world's population is estimated to have been in lockdown, with all but essential workers largely confined to home (1). without an effective treatment or vaccine, testing for infection combined with contact tracing and isolation will be central to effective strategies to ease populations out of lockdown while keeping the basic reproduction number (r0) below one (2).
testing for antibodies to sars-cov-2 is a possible complement to testing for active infection, to identify those who have developed antibodies to the virus and so may be able to return to work and other activities without significantly increasing transmission rates (3). these tests have been variously described in the media as immunity passports (4,5), immunity certificates (6,7), immunity cards (8) and release certificates (9). unfortunately, the use of these terms implies a certainty unmatched by current evidence about antibody tests (10). uncertainties inherent in tests for antibodies to sars-cov-2 include the extent and duration of the immunity conferred (11). they also include the uncertainties inherent in any test regarding the proportion of those who would be correctly identified. this depends upon the test's performance - its sensitivity and specificity - as well as the population prevalence of the tested condition (12). given these uncertainties, those who receive a test result indicating the presence of antibodies will have a residual risk of becoming infected by sars-cov-2 in the future. understanding that there is this residual risk - albeit one that is difficult to quantify at present - will be important to minimise transmission that could arise from those receiving 'antibody positive' test results. if people testing positive perceive that they have no risk of becoming infected by the virus, they may ignore any future symptoms of infection and facilitate transmission if they fail to self-isolate appropriately. such a perception may also overgeneralise to a belief that they are unable to transmit infection through contact with contaminated surfaces. regardless of antibody status, all individuals can indirectly transmit the virus between surfaces by touch. hand washing or sanitising therefore needs to remain frequent.
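the dependence of a test's predictive value on sensitivity, specificity and population prevalence noted above can be made concrete with bayes' rule. the figures used below (95% sensitivity, 98% specificity, 5% or 1% prevalence) are illustrative assumptions, not estimates from the paper:

```python
def ppv(sensitivity, specificity, prevalence):
    """positive predictive value via bayes' rule:
    p(truly had the infection | positive antibody result)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# illustrative (assumed) performance: 95% sensitivity, 98% specificity
print(round(ppv(0.95, 0.98, 0.05), 3))  # 5% prevalence -> 0.714
print(round(ppv(0.95, 0.98, 0.01), 3))  # 1% prevalence -> 0.324
```

even with good test performance, over a quarter of 'antibody positive' results would be false positives at 5% prevalence, and most would be at 1%, which is why the residual-risk framing above matters.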
evidence from other testing programmes suggests that interpreting a low-risk result to mean no risk can be reduced by verbal and numerical expressions of residual risk when presenting test results (13,14). but even before testing programmes are in place, the terms commonly used to describe these tests - immunity passports or certificates - may inadvertently be fuelling a misplaced sense of certainty about their results. it is unknown whether describing these tests as being for immunity - as opposed to antibodies - or their results as passports or certificates increases misunderstanding of the residual risk inherent in an antibody-positive test result, and thereby reduces adherence to protective behaviours and increases risk of transmission (15).

[medrxiv preprint, this version posted may 10, 2020; https://doi.org/10.1101/2020.05.06.20093401; not certified by peer review; the author/funder has granted medrxiv a license to display the preprint in perpetuity; made available under a cc-by-nc-nd 4.0 international license.]

this study was designed to test two hypotheses: describing a test indicating the presence of antibodies using the term immunity (vs antibody), and describing test results as passports or certificates (vs test), increases the likelihood that those with this test result erroneously perceive they have no risk of becoming infected in the future with coronavirus. ethical approval for this study was granted by the king's college london research ethics committee (reference: mra-19/20-18685). the protocol was preregistered on the open science framework (https://osf.io/tjwz8/, study 2). the statistical analysis plan was pre-specified and uploaded to the open science framework prior to receipt of the data (https://osf.io/tjwz8/, study 2). an initial study with similar methods was conducted (https://osf.io/tjwz8/, study 1) but, due to an error, the intervention was not correctly programmed.
this study is therefore not reported. the study was an online experiment using a 2 × 3 factorial design, with participants randomised, with an equal allocation ratio, to one of six groups varying in the description of an antibody test and a result showing the presence of antibodies. these descriptions differed only in the term used for what was being tested (immunity vs antibody) and the term used for the test result (passport vs certificate vs test). a quota sample of 1,204 adults was recruited via predictiv, the behavioural insights team's online experimentation platform (https://www.bi.team/bi-ventures/predictiv/), comprising 500,000 adults in the uk. quotas were based on age, gender and uk region to achieve a sample broadly representative of the uk population. 1,373 clicked on the link to enter the study, of whom 1,214 subsequently completed it. ten were excluded for failing to meet quality checks. participants were reimbursed in points (equivalent to £1) which could be redeemed in cash, gift vouchers or charitable donations. participants did not know the topic of the study prior to participation. due to the rapid nature of this research, the public was not involved in the development of the study. the sample size was chosen pragmatically, without reference to a specific power calculation. we fitted a full model with an interaction. conservatively, we then had an 80% chance of detecting, at a 5% significance level, an increase in the primary outcome measure from 50% in a baseline group to 64% in another group. participants were randomised to groups by random number generation. a random number between 1 and 6 was generated for every participant upon entry to the study to determine which description they saw, with each of the six numbers corresponding to one of the six descriptions. as this is based on true randomness, the number of participants within each group can vary due to chance.
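the power claim above (80% chance of detecting a rise from 50% to 64% at the 5% level) can be checked with a standard normal-approximation power calculation for two proportions. this is a sketch of that textbook formula, not the authors' own calculation:

```python
import math

def norm_cdf(x):
    # standard normal cumulative distribution function via erf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_proportions(p1, p2, n_per_group, z_alpha=1.96):
    """approximate power of a two-sided two-proportion z-test
    (normal approximation, equal group sizes, alpha = 0.05)."""
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return norm_cdf(abs(p2 - p1) / se - z_alpha)

# with 1,204 participants allocated across six groups, roughly 200 per group:
print(round(power_two_proportions(0.50, 0.64, 200), 2))  # close to the stated ~80%
```

the result with ~200 per group is consistent with the pragmatic 80% figure reported for a 50% vs 64% comparison.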
the intervention comprised a description of antibody testing and of test results indicating the presence of antibodies [see box 1 for one example and s1 for the wording of all six descriptions]. the descriptions differed across the six groups only in the name given to the test and to a result indicating the presence of antibodies. all descriptions included the information that the result would mean a lower risk of future infection and transmission, and that people with this result could return to work earlier. wording of the items used for each measure is shown in supplementary materials (s2).

box 1. example description (immunity passport group):
scientists are developing tests to see who has already had coronavirus. no test is 100% effective. this means that those who test 'positive' would have:
• lower risk of catching coronavirus in the future, and therefore also
• lower risk of passing it on to others
those who test 'positive' would get an immunity passport. they could return to work early.

primary outcome: proportion of participants perceiving an antibody-positive test result to mean no risk of catching coronavirus in the future, assessed in response to a question with four response options.
secondary outcomes:
• perceived likelihood of catching coronavirus in the future, assessed on a visual analogue scale from 0% to 100%.
• intention to engage in handwashing less or more frequently than now, given an antibody-positive test result: assessed in response to a question with five response options.
• intention to avoid physical contact with others outside the home more or less frequently than now, given an antibody-positive test result: assessed in response to a question with five response options.
interest in undergoing the test if offered today: assessed in response to a question with four response options. demographic characteristics: age, gender, level of education and geographical region of residence. employment status, planned to be included, was omitted due to a technical error. a detailed statistical analysis plan, specified prior to receipt of the data, is available on the open science framework (https://osf.io/tjwz8/, study 2). binary logistic regression was used to assess the impact of test type (immunity/antibody) and result type (passport/certificate/test) on the odds of believing the antibody test result means there is no risk of future infection. an interaction term was included in the model (16). the analysis was repeated adjusting for age (including a quadratic function to model a non-linear relationship), gender, education and region, based on prior results showing these are predictors of risk beliefs. binary logistic regressions were run (as above) for the secondary outcomes: intention to wash hands less, intention to engage less in social distancing and intention to undergo the test. unadjusted and adjusted odds ratios and 95% confidence intervals are reported. logistic regression was run to assess the extent to which intentions to engage in less frequent handwashing or social distancing are predicted by perceiving the test result to mean no risk of being infected in the future by coronavirus. as only a very small proportion of participants gave a 'zero' response on the sliding scale of future risk, we used a linear regression model to examine this outcome, rather than a binary (zero vs. other) logistic regression as pre-specified in the analysis plan.
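the interaction term in a logistic model like the one described above measures how the effect of the test term differs across result terms. with dummy coding its meaning is easiest to see in a saturated model, where each coefficient is a difference of cell log-odds. the sketch below uses a simplified 2 × 2 layout and invented cell proportions; it is not the study's 2 × 3 model or its data:

```python
import math

def logit(p):
    # log-odds of a proportion
    return math.log(p / (1 - p))

# illustrative (assumed) cell proportions perceiving 'no risk':
# rows: test term (antibody=0, immunity=1); cols: result term (test=0, passport=1)
p = {(0, 0): 0.09, (0, 1): 0.11, (1, 0): 0.18, (1, 1): 0.21}

# saturated dummy-coded model: logit(p) = b0 + b1*immunity + b2*passport + b3*immunity*passport
b0 = logit(p[(0, 0)])
b1 = logit(p[(1, 0)]) - logit(p[(0, 0)])  # effect of 'immunity' within the 'test' group
b2 = logit(p[(0, 1)]) - logit(p[(0, 0)])  # effect of 'passport' within the 'antibody' group
b3 = (logit(p[(1, 1)]) - logit(p[(1, 0)])
      - logit(p[(0, 1)]) + logit(p[(0, 0)]))  # interaction: difference of differences

# exp(b1) is the odds ratio for immunity vs antibody within the 'test' result group
print(round(math.exp(b1), 2))  # -> 2.22 for these invented proportions
```

a near-zero b3 would mean the terminology effects are additive on the log-odds scale, which is what the study's non-significant interaction suggests.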
data were collected using an online survey platform, predictiv. upon entry to the study, participants were informed that they were to be asked some questions about coronavirus and that it would take about five minutes to complete. participants were then shown one of six brief descriptions of an antibody test for coronavirus (see s1 for the full text of each of the six descriptions). they were then asked five questions, assessing the primary and secondary outcomes. participants' demographic characteristics were accessed from the survey platform. the sample comprised 606 women and 598 men with a median age of 36 years. around a quarter had some graduate-level education (24.2%) and there was good representation of all uk regions (see table 1). the distribution of sample characteristics by exposure group is shown in table 1. responses to the five outcome questions for the whole sample and by experimental group are shown in table 2. perceived level of future risk (on a scale of 0 to 100%) showed a complex, trimodal distribution. the median was 35% with an interquartile range from 18% to 51%. only 5% of respondents put their risk at 0%. overall, 63% put their risk below 50%. 10% put their risk at 50%, which was the modal response. 24% put their risk at greater than 50%, but below 100%. 3% of respondents put their risk at 100%: that is, they said they were certain to contract the virus. [table 2/figure 1: proportion answering 'no risk' in each sub-group, % (95% ci), and mutually adjusted odds ratios (95% ci), n=1,204.] the term immunity (vs antibody) significantly increased the odds of perceiving no risk [see figure 1]. there was no significant effect of result type and no significant interaction.
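for intuition, the unadjusted odds ratio implied by the headline proportions (19.1% perceiving no risk with the term immunity vs 9.8% with antibody) can be reproduced directly. it differs from the reported adjusted odds ratio of 2.91 because the fitted model also conditions on the interaction term and demographic covariates:

```python
def odds_ratio(p_exposed, p_control):
    """unadjusted odds ratio implied by two proportions."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_control = p_control / (1 - p_control)
    return odds_exposed / odds_control

# proportions perceiving 'no risk' reported in the paper:
# 19.1% with the term 'immunity' vs 9.8% with 'antibody'
print(round(odds_ratio(0.191, 0.098), 2))  # -> 2.17
```

so the crude effect is roughly a doubling of the odds, in line with the paper's description of the proportion doubling.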
we analysed the continuous measure of future perceived risk of infection using a linear model (anova) with two levels for test type, three levels for result type, and an interaction term. overall, there was no significant effect: f(5, 1198) = 1.46, p = 0.20, adjusted r2 < 1%. we repeated the analysis adjusting for demographic factors as covariates. overall, there was a significant effect: f(13, 1165) = 1.88, p = 0.03, adjusted r2 = 1%. this was because of a significant effect of age: as age increased, perceived risk decreased. there remained no significant effect of the experimental variables. logistic regression analyses examining the impact of test type, result type and their interaction on intentions to wash hands and avoid physical contact less frequently, and on willingness to have the test, are shown in supplementary tables 1 and 2. neither test type, result type nor their interaction was significantly associated with these behavioural outcomes. logistic regression analyses were used to examine belief that the result meant 'no risk' as a predictor of intention to wash hands and avoid physical contact less frequently, given a positive result (see figure 2 and supplementary material). using the term immunity - as opposed to antibody - to describe antibody tests for sars-cov-2 doubled the proportion who erroneously perceived they would have no risk of becoming infected with the virus in the future if they were given an antibody-positive test result, from 9.8% for antibody to 19.1% for immunity. perceiving no risk was associated with an intention to wash hands less frequently, but there was no significant association with intended frequency of avoiding physical contact with others outside of the home.
interest in undergoing the test was high - with 85.2% saying they would probably or definitely have it if offered - and was unaffected by the terms used to describe the tests. this study was designed to test two hypotheses. it provides strong support for the first: that describing a test indicating the presence of antibodies using the term immunity (vs antibody) increases the likelihood that those with this test result erroneously perceive they have no risk of becoming infected in the future with coronavirus. this likely reflects a certainty about risk of future infection implicit in lay understandings of the term immunity that is not implied by the term antibody (17). qualitative studies could explore this and other potential mechanisms for the effect observed. the results of this study did not support the second hypothesis, that describing test results as passports or certificates increases the likelihood that those with this test result erroneously perceive they have no risk of becoming infected in the future with coronavirus. this does not mean that these terms are unproblematic, however; only that they did not influence the specific perceptions that we explored. qualitative studies are warranted to understand the broader meanings these terms have in the context of testing for antibodies for sars-cov-2 and in other contexts. responses on the sliding scale of future risk showed high variability and were largely unexplained by the experimental intervention or other variables measured. this may point to considerable uncertainty in the public as to how to interpret test results. it also likely reflects the well-described tendency of people to use a 50% response to indicate uncertainty rather than a true judgement of probability (18). we also saw that about a quarter of respondents on the first question stated their risk was "average" or "higher".
use of the top end of the scale is hard to interpret, but may either reflect a failure to read the information carefully, and therefore a misunderstanding of the meaning of the result, or participants using information beyond the experiment to assess their risk and not adequately considering the hypothetical test result when making their response. while we found no evidence for a direct effect on protective behaviours of the terms used to describe antibody test results, there was indirect evidence that perceiving no risk of future infection might reduce frequency of handwashing. this finding is tentative, given that it is based on behavioural intentions in response to a hypothetical antibody-positive result. nonetheless, the potential for antibody testing to increase viral transmission must be considered alongside the potential benefits the tests might have in allowing the easing of lockdown restrictions. clear communication about the ongoing need for handwashing, in particular, will be essential, and raising public awareness of the main mechanisms through which sars-cov-2 is transmitted (through air and surfaces) might help improve adherence. this, in addition to acknowledgment of the imperfect nature of the tests, will give the public a more accurate representation of the meaning and implications of an antibody test result and a better understanding of how to reduce the risk of transmission. such communications need to emphasise that transmission can occur through contact regardless of antibody status.
such communications also need to be rigorously evaluated to ensure their effectiveness at communicating these points, both to those undergoing antibody tests and to general populations that are now having to learn to live with sars-cov-2. this study provides the first experimental evidence for the potentially adverse impact on risk perceptions and protective behaviours of commonly used terms to describe sars-cov-2 antibody tests and their results. as such, it provides timely evidence to inform policy and research to mitigate these effects and realise the potential benefits of such tests. the study has several limitations. first, participants were responding to a hypothetical test and were asked to imagine that they had received a test result that had detected antibodies. findings from such studies can generalise to clinical settings (19, 20) but some caution is warranted. second, the protective behaviours of handwashing and physical distancing were measured using single items assessing behavioural intentions following a hypothetical test result. third, the sample size was insufficient to detect effect sizes that could be important at a population level. it is possible, for example, that the use of the terms certificate or passport might impact on risk perception, but the current study lacked the power to detect this. fourth, while quotas were used to achieve a sample broadly representative of the uk population, research panels are not representative of the general population (21, 22). we found no evidence that the impact of the interventions in this study was modified by demographic characteristics of the participants, providing some reassurance about the generalisability of results across age groups, gender, educational level and geographical region of the uk.
the results of this study have several implications for research and policy. the effectiveness of antibody tests for sars-cov-2 will depend not only on the extent and duration of any immunity conferred and the performance of a test, but also upon a good understanding of the meaning of test results among those offered them. first, the use of the term immunity should be avoided in phrases to describe antibody tests, whether described as passports, certificates or tests. second, research is needed to evaluate different ways of informing those offered tests and receiving test results, to minimise the proportion erroneously perceiving an antibody-positive test result to mean no risk of becoming infected with the virus. it should also focus on maximising understanding that - regardless of antibody status - anyone can indirectly transmit the virus by touching a contaminated surface and infecting the next surface they touch. hand washing or sanitising therefore needs to remain frequent. research is also needed with those undergoing actual tests, powered to detect effects judged meaningful in the context of a population-based testing programme and involving measures of actual behaviour. interest in sars-cov-2 antibody testing is high - across many countries, employers and populations. while such testing could contribute to wider strategies to ease lockdown restrictions, its use may have an adverse impact on transmission-related behaviour. this appears to vary with the way the tests are described. using the term immunity (vs antibody) to describe antibody tests increases the proportion of people believing that an antibody-positive result means they have no future risk of coronavirus, a perception that may be associated with less frequent handwashing and hence increased risk of transmission.
s1. wording of the six descriptions. each description began with the same preamble:

scientists are developing tests to see who has already had coronavirus. no test is 100% effective. this means that those who test 'positive' would have:
• lower risk of catching coronavirus in the future - and therefore also
• lower risk of passing it on to others

the final sentences varied by group:
1. those who test 'positive' would get an immunity passport. they could return to work early.
2. those who test 'positive' would get an immunity certificate. they could return to work early.
3. those who test 'positive' would get a result showing immunity. they could return to work early.
4. those who test 'positive' would get an antibody passport. they could return to work early.
5. those who test 'positive' would get an antibody certificate. they could return to work early.
6. those who test 'positive' would get a result showing antibodies. they could return to work early.

references:
1. oxford covid-19 government response tracker. blavatnik school of government.
2. world health organization. critical preparedness, readiness and response actions for covid-19.
3. disease control, civil liberties, and mass testing - calibrating restrictions during the covid-19 pandemic.
4. 'immunity passports' could speed up return to work after covid-19. the guardian.
5. delta's ceo said he would support an 'immunity passport' program or other steps to jumpstart travel as the airline reports its first quarterly loss in more than 5 years. business insider.
6. people could be given coronavirus 'immunity certificates' to leave lockdown early. the independent.
7. mass coronavirus antibody tests have serious limits. bloomberg.com [internet].
8. coronavirus immunity cards for americans are 'being discussed'. politico.
9. chile to push ahead with coronavirus 'release certificates' despite who warning. reuters.
10. evaluation of antibody testing for sars-cov-2 using elisa and lateral flow immunoassays. medrxiv preprint.
11. what policy makers need to know about covid-19 protective immunity. the lancet.
12. understanding sensitivity and specificity with the right side of the brain.
13. numbers or words? a randomized controlled trial of presenting screen negative results to pregnant women.
14. women's understanding of a "normal smear test result": experimental questionnaire based study.
15. 'immunity passports' in the context of covid-19.
16. factorial versus multi-arm multi-stage designs for clinical trials with multiple treatments.
17. 'fuzzy' virus: indeterminate influenza biology, diagnosis and surveillance in the risk ontologies of the general public in time of pandemics.
18. verbal and numerical expressions of probability: 'it's a fifty-fifty chance'. organ behav hum decis process.
19. the impact of genetic testing for crohn's disease, risk magnitude and graphical format on motivation to stop smoking: an experimental analogue study.
20. effect of communicating dna based risk assessments for crohn's disease on smoking cessation: randomised controlled trial.
21. office for national statistics.

we thank steve reicher for comments on an earlier draft of the study protocol.
all authors have completed the unified competing interest form (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; and no financial relationships with any organisations that might have an interest in the submitted work in the previous three years. hwwp declares consultancy fees from babylon health; all authors declare no other relationships or activities that could appear to have influenced the submitted work. the authors affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned have been explained. anonymised data will be made available upon reasonable request. the study was conceptualised by tmm, jw & gjr. am completed data collection. jw & hp analysed the data. all authors contributed to, and approved, the final manuscript. key: cord-320864-k9zksbyt authors: remes-troche, j. m.; valdovinos-diaz, m. a.; viebig, r.; defilippi, c.; bustos-fernández, l. m.; sole, l.; hani-amador, a. c. title: recommendations for the reopening and activity resumption of the neurogastroenterology units in the face of the covid-19 pandemic. position of the sociedad latinoamericana de neurogastroenterología date: 2020-11-01 journal: nan doi: 10.1016/j.rgmxen.2020.07.004 sha: doc_id: 320864 cord_uid: k9zksbyt the covid-19 pandemic has forced the establishment of measures to avoid contagion during diagnostic and therapeutic tests in gastroenterology. gastrointestinal motility studies involve a high and intermediate risk of transmission of infection by this virus. given their elective or non-urgent indication in most cases, we recommend deferring the performance of these tests until there is significant control of the infection rate in each country during the pandemic.
when health authorities allow a return to normalcy, and in the absence of effective treatment or a preventive vaccine for covid-19 infection, we recommend a strict protocol to classify patients according to their infectious-contagious status through the appropriate use of tests to detect the virus and its immune response, as well as the use of protective measures to be followed by health personnel to avoid contagion during the performance of a gastrointestinal motility test. sars-cov-2, the virus that causes covid-19, has been highly contagious from the very start of the outbreak. the virus spread rapidly across the globe, and by may 2020 had infected more than 5 million persons in 188 countries. 1 due to its high infection and fatality rates, along with the lack of prior immunity, this new infection has been perceived as a great threat to the life and health of the worldwide human population. the first case of covid-19 in latin america was reported in brazil on february 26, 2020, and the first death on march 7 in argentina. 2 in mexico, the first case was reported on february 25 and the first death on march 18, and in colombia, the first case was reported on march 6 and the first death of a physician on april 11. 3 thus, at the time of this writing (may 2020), latin america is considered to be the epicenter of the pandemic. confronted with that situation, and following the recommendations of the world health organization (who), the governments of the latin american countries have declared health emergencies and implemented actions (on different dates and in accordance with the epidemiologic trend of each country), such as restrictions in the public, private, and social sectors that include voluntary quarantining, shutting down schools, closing borders, suspending international air travel, carrying out physical distancing, and limiting nonessential activities.
as in other parts of the world, medical care has changed dramatically in relation to non-urgent diseases that involve the performance of diagnostic and therapeutic procedures, such as those carried out at neurophysiology and/or digestive motility units. positions have already been established on how to work and/or resume activities at those units (e.g., those issued by the american neurogastroenterology and motility society [anms] 4 and the grupo español de motilidad digestiva [gemd]), 5 but because the epidemiologic behavior, protective equipment availability, serologic diagnostic test performance capacity for corroborating immunity, and socioeconomic context differ throughout latin america, a group of experts that are members of the sociedad latinoamericana de neurogastroenterología (slng) held a virtual meeting to formulate a consensus document with recommendations for the performance of gastrointestinal motility tests. the present document establishes a series of guidelines for prioritizing, selecting, and performing the most frequently indicated neurogastroenterology procedures at digestive units in the face of covid-19, such as manometry and esophageal reflux measurement, anorectal manometry and biofeedback, and breath tests. summoned by the presidency and scientific committee of the slng, a group of experts in the area of neurogastroenterology from several latin american countries held a virtual meeting on april 15, 2020, to formalize the creation of a document that would serve as a guide on how to resume and perform gastrointestinal motility procedures in the scenario of the covid-19 pandemic. at that first meeting, three working groups were organized to revise and establish the recommendations for the areas of: 1) esophageal function tests, 2) anorectal function tests, and 3) breath tests.
the recommendations were based on currently available guidelines, consensuses, and evidence, making the necessary adjustments according to the settings of each latin american country during the different phases of the pandemic. two other virtual meetings were carried out to analyze, revise, and modify the recommendations of each working group. the final virtual meeting took place on may 27, 2020, at which the seven members of the expert group unanimously approved the recommendations that follow below. high risk: due to the nature of the virus and its transmission route, it is clear that all aerosol-generating procedures involve a very high risk of infection for the personnel exposed to them. therefore, esophageal manometry, impedance ph monitoring, wireless capsule ph monitoring, antroduodenal manometry, and breath tests are the gastrointestinal motility tests that should be considered high risk. intermediate risk: the fact that the nucleocapsid protein of the virus has been detected in gastrointestinal epithelial cells and rna of the virus has been found in stool points to the possibility of fecal transmission of sars-cov-2. 6, 7 in a meta-analysis on the subject, tian et al. 6 showed that fecal pcr became positive two to five days later than positive sputum pcr in 36-53% of the infected patients, and the fecal excretion of viral particles persisted up to 11 days after sputum excretion in 23-82% of the patients. in another meta-analysis, cheung et al. 7 estimated that the prevalence of viral rna in stool in patients with covid-19 was 48.1% (95% ci 38.3-57.9), reporting the case of a 78-year-old patient in whom viral rna persisted up to 33 days.
importantly, the fact that viral particles are detected in stool does not necessarily mean there is transmission by that route, but in the face of uncertainty, and following the colonoscopy guidelines, anorectal manometry (arm), biofeedback, balloon expulsion test, electromyography, colonic manometry, and barostat study should be considered procedures of intermediate risk. 8 colonic transit measurements utilizing radiopaque markers, or a wireless motility capsule, are considered noninvasive procedures but involve exposure to subjects that could be asymptomatic virus carriers. thus, we suggest that those tests also be considered intermediate risk. strictly speaking, it must be understood that none of the abovementioned procedures can be considered urgent to the extent that their performance would be required during the epidemic phase of the covid-19 pandemic. according to numerous guidelines and recommendations, postponing all 'elective' motility procedures should be managed in the context of the clinical indication and should be adapted to the reality of each latin american country. dialogue between physicians and patients is encouraged to explain the situation and foster understanding. the reopening of the motility laboratories in each latin american country will depend on several factors, including: a) the epidemiologic situation of the region. that situation is established by the health authority of each country, considering the phases (table 1) determined by the who 9 for the covid-19 pandemic: • phase 1 (case importation): in this first setting, the disease arrives at a country through one person or a small number of people that acquire the virus abroad, thus the number of cases is limited to a few dozen. • phase 2 (community contagion): at this stage of the pandemic, outbreaks of the disease begin to occur in persons that have not been traveling.
the first persons with covid-19 that arrived at the country infected others they came in contact with, and in turn, those persons continue to propagate the disease. confirmed cases begin to surpass the hundreds and containment becomes more complicated. • phase 3 (epidemic contagion): this is the most critical stage in the advancing of an epidemic because it means the disease is now present in the entire country and there is an elevated number of community outbreaks (thousands of persons). the main risk is that the number of patients increases exponentially, overloading the healthcare facilities and medical services. the who also recognizes other possible settings, once the situation begins to stabilize and the contagion curve begins to flatten. • phase 4 (second wave): once local contagion is reduced, imported cases are likely to present again, producing a second wave of infected patients. that can occur three to nine months after phase 3 has ended. • phase 5 (end of the epidemic): the who is in charge of declaring the end of the pandemic once the majority of countries are safe, with contagion under control. in countries with adequate epidemiologic surveillance systems, the declaration of phases is established, based on the number of new infections expected to occur as a result of infection by a single individual (r0), which expresses the speed at which the disease can propagate in a given population. 10 if the r0 is close to a value of 2, the country is experiencing exponential growth. an r0 of 1 indicates that the infection rate is remaining constant, and when it is below 1 (e.g., two infected persons would infect fewer than two individuals), the number of infected persons is decreasing. b) availability of personal protective equipment. obviously, the risk of infection in healthcare professionals is higher if they do not have the adequate personal protective equipment (ppe) (see further ahead). 
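the r0 thresholds described above (exponential growth near 2, a constant infection rate at 1, decline below 1) can be illustrated with a minimal branching-process sketch. this is an illustrative example only, not a calculation from the position statement; the initial case count and number of generations are assumed values:

```python
# Minimal sketch: expected case counts across infection "generations"
# under a simple branching model, for the r0 regimes described above.
def cases_after(r0: float, initial_cases: int, generations: int) -> float:
    """Expected new cases in the n-th generation if each infected
    person infects r0 others on average."""
    cases = float(initial_cases)
    for _ in range(generations):
        cases *= r0  # each generation multiplies the case count by r0
    return cases

# assumed example: 100 initial cases, 5 generations
for r0 in (2.0, 1.0, 0.7):
    print(f"r0={r0}: {cases_after(r0, initial_cases=100, generations=5):.1f} cases")
```

with r0 = 2 the count grows exponentially (100 → 3200 after five generations), at r0 = 1 it holds constant, and at r0 = 0.7 it shrinks toward extinction, which is why the declaration of phases hinges on this number.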
however, it is important to recognize that there is a worldwide shortage of material, resulting in possibly limited availability in some of the latin american countries. in fact, the inter-american society of digestive endoscopy 11 and the asian-pacific society for digestive endoscopy (apsde-covid declarations) 12 issued their recommendations based on ppe availability. we believe that could play a key role in the reopening of gastrointestinal physiologic procedures in our region (table 1) . c) type of institution at which motility studies will be performed: outpatient centers that specialize in those techniques have a lower risk than hospitals or clinics because they have a lower circulation of patients. such centers also aid in decongesting the demand at hospitals. once the return to activities or the reopening of activities is being contemplated, it is important to consider each of the following aspects: according to the gemd, 5 at present there is no conclusive scientific evidence that supports performing microbiologic tests before a motility procedure to diagnose sars-cov-2 for the purpose of modifying the protective measures related to the infection. however, it is important to recognize that increasingly more subjects infected with sars-cov-2 are recovering, but there is also a high number of subjects that can be asymptomatic carriers. rt-pcr testing determines if the patient is infected, whereas serologic testing (igm and igg against sars-cov-2) determines the immune status. depending on the availability of those tests in each country, their performance before the patient undergoes neurophysiologic studies is recommended. in line with that, patients indicated for a digestive motility study should be classified as follows: in cases 1 and 2, the motility test will not be programmed and should be re-scheduled for at least four weeks later (fig. 1) . 
corroboration of active infection resolution (a negative rt-pcr 24-48 h before the motility study) is recommended. in case 3, the motility test can be performed with no problem. importantly, availability of and access to those serologic and molecular biologic tests can vary between the latin american countries. therefore, in treating case 4 patients, a potential risk of infection must be assumed, and the motility test must be performed utilizing all the ppe (that scenario is probably the most common in our countries). it is very important to take into account that even though the rt-pcr test is the 'gold standard', there is variability in its diagnostic accuracy. for example, false negatives have been reported at 30% if the test is performed during the first days of infection or the pre-symptomatic phase. 13 therefore, the recommendations must be adjusted to the sensitivity and specificity of each test, and the laboratories that perform those assays must be encouraged to use the commercial tests that have shown the best diagnostic accuracy. the esophageal physiologic studies, including esophageal manometry (conventional or high resolution), 24-h ph study, or ph-impedance study, are diagnostic tests usually performed on ambulatory patients in the context of gastroesophageal reflux symptoms or nonobstructive dysphagia. they have a broad spectrum of clinical indications that includes the evaluation of antisecretory treatment-refractory esophageal symptoms (dysphagia, regurgitation, or chest pain) not explained by upper gastrointestinal endoscopy assessment, the evaluation of esophageal motor function prior to antireflux surgery, the evaluation of persistent reflux symptoms despite medical therapy, or the development of postoperative dysphagia. 14, 15 in general, they are elective tests that in exceptional cases can have an urgent indication (table 1). certain medical indications can have a degree of urgency, as recently described by the anms.
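the practical weight of that 30% false-negative rate can be made concrete with bayes' rule: even after a negative rt-pcr, a patient may retain a non-trivial probability of infection. in this sketch the specificity and pre-test prevalence are assumed example values, not figures from the document:

```python
# Sketch: residual probability of infection after a single negative rt-pcr.
# sensitivity = 0.70 reflects the ~30% false-negative rate cited above;
# specificity and prevalence are assumed illustrative values.
def prob_infected_given_negative(sensitivity: float,
                                 specificity: float,
                                 prevalence: float) -> float:
    p_neg_and_infected = (1 - sensitivity) * prevalence      # false negatives
    p_neg_and_healthy = specificity * (1 - prevalence)       # true negatives
    return p_neg_and_infected / (p_neg_and_infected + p_neg_and_healthy)

p = prob_infected_given_negative(sensitivity=0.70, specificity=0.95, prevalence=0.20)
print(round(p, 3))  # residual infection probability despite the negative test
```

under these assumed numbers a negative swab still leaves roughly a 7% chance of infection, which is why the document recommends assuming a potential risk of infection and using full ppe when test status cannot be corroborated.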
4 the presence of probable achalasia with severe symptoms (significant dysphagia making oral food intake and hydration impossible), the presence of large hiatal hernias (with risk of aspiration or volvulus), or the impossibility of maintaining oral hydration and nutrition are relatively urgent, or semi-urgent, indications. in the case of giant hiatal hernias, if manometry cannot be carried out due to the covid-19-related limitations, surgery should be performed without previous manometry. all other indications, such as dysphagia with no weight loss, reflux studies prior to antireflux surgery, studies due to refractory reflux symptoms, and suspicion of supragastric belching or rumination syndrome, can be postponed. 16 for as long as there is no return to performing functional tests, the recommendation is to support achalasia diagnosis or dysphagia evaluation through barium swallow tests, which can be useful. regarding patients with reflux symptoms, continuing, changing, or adjusting the medication dose is recommended during the time in which it is not possible to complete their diagnostic evaluation. according to the international anorectal physiology working group (iapwg) and the london classification, the conventional indications for performing anorectal function tests (primarily anorectal manometry [arm]) 17 are: 1) constipation and/or defecation disorder symptom evaluation, 2) fecal incontinence (fi) evaluation, 3) painful anorectal disorder evaluation, 4) preoperative and postoperative evaluation of ileorectal anastomoses, rectopexy, fistulotomies, etc., and 5) evaluation of obstetric trauma. it should also be pointed out that arm, in addition to having a diagnostic purpose, is utilized in many centers for biofeedback therapy in patients with constipation and/or fecal incontinence. 
in that respect, and reviewing the indications, we emphasize the fact that anorectal function tests are not urgent procedures, and thus should be postponed during the exponential phase of the pandemic. when laboratories are reopened, certain semi-urgent situations should be prioritized (table 1). returning first to the performance of arm in patients that had already received biofeedback therapy or had been programmed for it before the pandemic is supported by grade ia and iib evidence in the management of constipation and fi, respectively, according to the anms. 18 in addition, its resumption in patients with fi is supported by the fact that arm provides a pathophysiologic approach in more than 90% of cases. for example, sphincter hypotonia (low resting baseline pressure) is associated with passive fi, 19, 20 whereas hypocontractility (the inability to reach an increase in pressure during voluntary contraction) suggests that fi, specifically urge fi, can be secondary to external anal sphincter (eas) lesions. in a prospective study conducted by rao et al., arm not only confirmed the clinical impression but also contributed new clinically undetected information on patients with fi and influenced the treatment decision in the majority of cases. 21 until the return to normality, recommendations for patients that have not completed their biofeedback sessions include continuing, at home, the pelvic floor exercises learned during training, or using the electronic devices. 22 if patients had learned how to perform those types of exercises before the pandemic, they should continue doing them on a regular basis. with respect to constipation and chronic proctalgia, we recognize that they considerably compromise patient quality of life, but given their chronic nature, we feel that the performance of arm can wait, as long as patients are offered symptomatic medical treatment (laxatives, antispasmodics, etc.) to mitigate symptom intensity.
if defecatory dyssynergia is suspected in relation to inappropriate straining, abdominal breathing exercises can be useful. 23 if defecation alterations related to posture are suspected, their correction, including the use of a device that favors the opening of the anorectal angle (a 6-inch-high footstool to favor knee flexion), can also be helpful. 24 the performance of other anorectal tests involving the placement or manipulation of probes or devices in the anorectum, e.g., surface electromyography, barostat, or pudendal nerve latency, should follow the same recommendations as for arm. none of those tests are considered urgent and they should be programmed once neurogastroenterology units have returned to normality. regarding colonic transit with radiopaque markers or with a wireless capsule, even though they do not involve an aerosol-generating process, they should be postponed, given that patients and physicians frequently go to the health units for study follow-up and surveillance. finally, colonic manometry, which requires colonoscopy-assisted probe placement, is considered a procedure with intermediate risk that should be postponed until the final stage of the pandemic. breath tests are studies that are widely used to evaluate different function alterations, such as bacterial overgrowth and intolerance to different carbohydrates (mainly lactose), and are the standard for corroborating eradication of helicobacter pylori infection. 25, 26 they are programmed studies and are not to be considered a medical emergency under any circumstance during the covid-19 pandemic. even though the filter mouthpieces of some systems incorporate a unidirectional valve and an infection control filter that have been shown to eliminate 99% and 96.5% of the bacteria and viruses in the air, respectively, specific tests have not been carried out to demonstrate whether they are capable of preventing sars-cov-2 infection.
the general steps to follow are detailed below, taking into account that there can be certain variations according to the test that is performed. 4, 5 patient preparation a) a telephone interview with the patient should be carried out 24 h before a motility test to identify symptoms consistent with covid-19 (cough, fever, myalgia, anosmia, ageusia, diarrhea). if any of those symptoms are identified, the recommendation is to cancel the appointment and postpone it until the case is reevaluated. the patient should be sent to a service (clinic, emergency room, infectious diseases unit, etc.) that can perform an rt-pcr test and provide appropriate management. that recommendation will depend on the protocol for covid-19 care established in each country. b) patients should preferably go to their appointment alone, but if that is not possible, be accompanied by only one person, ideally under 65 years of age. the patient and companion should each be given a facemask if they entered the unit without one. the patient and companion should then apply hand sanitizer or wash their hands. c) the companion should not enter the unit, unless the patient requires specific assistance, and should stay in the waiting room. d) before entering the room in which the procedure will be performed, all patients must be asked again about the presence of respiratory symptoms or fever, to stratify their transmission risk, and their body temperature should be measured. if there is any suspicion, the procedure should be canceled and rescheduled. during the procedure a) personnel care • promote the application of basic hygiene measures, among the entire personnel, for the prevention of infection. • given that the large majority of latin american countries are in a community transmission phase, in which spread via asymptomatic individuals is reported, ppe use by all the healthcare personnel involved in the procedures is recommended. • fig. 2 shows the ppe recommended for each procedure.
• the procedure should not be performed if the ppe necessary for guaranteeing the performance safety of the manometric studies is not available. • learn how to adequately put on and take off all ppe. b) equipment care and use recommendations • equipment preparation and probe calibration should be performed before the patient is admitted to the procedure room, to decrease the amount of exposure time. • if informed consent is utilized, promote the use of 'verbal or recorded' consent, when permitted by local committees. otherwise, consider disinfecting the material involved (ballpoint pens or pencils) and insist on handwashing after contact with said material. • the patient will be asked to enter the procedure room with no personal belongings (cellphone, glasses, keys, etc.). • regarding arm, the patient will change clothes in a specific bathroom (whose cleaning should follow the specific protocol for covid-19). • all the material utilized (syringes, trays in case of vomiting) should be disposable. • the test should be performed by an expert, and no more than 2 persons are recommended to be in the room during the procedure. • the manometric system, as well as the computer keyboard, can be covered in plastic film during each examination. • with respect to ph study and ph/impedance, consider whether it is possible to favor the use of disposable or single-use probes. • when using high-resolution solid-state equipment without impedance, consider whether utilizing a disposable case could be useful. • with respect to old model ph study equipment that has a leather carrying case, an alternative could be a plastic bag to cover the case and prevent damage during post-examination disinfection. • utilizing the chicago protocol and avoiding unnecessary maneuvers that prolong study duration are recommended.
• to the degree possible, if there is more than one procedure room, studies could be carried out in alternating rooms to provide sufficient time for sanitization, if more than 2 studies are to be carried out per day. • if fecal material is encountered during rectal examination, and prior to the introduction of the manometry probe, an enema is customarily given, waiting 30 min to perform the test. in the context of the covid-19 pandemic, we do not recommend that measure because it can increase the risk for exposure. if an enema is to be used, it can be applied at home, before the performance of the motility study. • the london protocol is recommended because the duration of the test, which should be performed by an expert, is 15 min. • the use of high-resolution systems is promoted, given that perfusion systems have the disadvantage of the probes needing constant water perfusion, resulting in the continuous outflow of water into the anal canal, increasing the risk for contact with body fluids. • systems that have a disposable case for the probe are recommended because they are safer for the patient. • if a balloon expulsion test is to be performed at the end of the study, a disposable probe is recommended. the post-procedure measures are subject to constant review, depending on the overall situation of each hospital, daily necessities, and material availability, and are adapted to them and to the recommendations of the acting authorities in each country. • carry out the disinfecting and reprocessing of the probes according to the customary protocol. • do not reuse single-use devices. • assign cleaning personnel that exclusively work at the physiology unit. • apply the protocols for the cleaning and disinfecting of materials that come into contact with patients or their secretions, such as the examining table, keyboard, and screens. • disinfecting and cleaning are to be performed with a disinfectant included in the institutional cleaning and disinfection policy. 
the covid-19 virus is inactivated after five minutes of contact with disinfectants such as bleach, 70% alcohol, and a sodium hypochlorite solution containing 1,000 ppm of active chlorine. • manage residuals following the local protocols of each center for category b (un3291) infectious material. • maintain physical distancing of 1-2 meters, basic hygiene measures, and the independent flow of patients in the recovery rooms. • consider the implementation of patient follow-up programs 7-15 days after the procedure, to evaluate the appearance of symptoms consistent with sars-cov-2 infection. • promote the delivery of unprinted reports and recommendations, i.e., deliver them online. the covid-19 pandemic has forced the establishment of measures to prevent contagion during the performance of therapeutic and diagnostic tests in gastroenterology. digestive tract motility tests involve intermediate and high risks for transmission of covid-19 infection. given their elective and nonurgent indication in the majority of cases, we recommend postponing those tests until there is significant control of the infection rate in each latin american country during the pandemic. when the health authorities allow the return to normality, and in the absence of an effective treatment for covid-19 infection, or a preventive vaccine, we recommend a strict protocol for classifying patients according to their infectious-contagious status through the appropriate use of tests for detecting the virus and the immune response to it, as well as the use of protection measures that the healthcare personnel should follow to prevent contagion during the performance of a gastrointestinal motility test. finally, we recognize that the recommendations contained herein can change in the future, as more evidence with respect to safety measures is produced. no financial support was received in relation to the present article.
• covid-19 in latin america: the implications of the first confirmed case in brazil
• american neurogastroenterology and motility society (anms) task force recommendations for resumption of motility laboratory operations during the covid-19 pandemic
• recomendaciones de asenem para la reanudación de la actividad en los laboratorios de motilidad digestiva durante la pandemia por covid-19
• review article: gastrointestinal features in covid-19 and the possibility of faecal transmission
• gastrointestinal manifestations of sars-cov-2 infection and virus load in fecal samples from a hong kong cohort: systematic review and meta-analysis
• recomendaciones generales de la asociación española de gastroenterología (aeg) y la sociedad española de patología digestiva (sepd) sobre el funcionamiento en las unidades de endoscopia digestiva y gastroenterología con motivo de la pandemia por sars-cov-2
• world health organization. coronavirus disease (covid-19) pandemic
• el número reproductivo básico (r0): consideraciones para su aplicación en la salud pública
• recomendaciones para las unidades de endoscopia durante la pandemia de coronavirus (covid-19). version 3.1 español. 2020
• practice of endoscopy during covid-19 pandemic: position statements of the asian pacific society for digestive endoscopy
• variation in false-negative rate of reverse transcriptase polymerase chain reaction-based sars-cov-2 tests by time since exposure
• utility of esophageal high-resolution manometry in clinical practice: first, do hrm
• clinical characteristics and outcomes of patients with postfundoplication dysphagia
• recommendations for essential esophageal physiologic testing during the covid-19
• the international anorectal physiology working group (iapwg) recommendations: standardized testing protocol and the london classification for disorders of anorectal function
• anms-esnm position paper and consensus guidelines on biofeedback therapy for anorectal disorders
• anorectal function investigations in incontinent and continent patients. differences and discriminatory value
• relationship between symptoms and disordered continence mechanisms in women with idiopathic faecal incontinence
• long-term outcome and objective changes of anorectal function after biofeedback therapy for faecal incontinence
• randomized controlled trial of biofeedback for fecal incontinence
• effectiveness of pelvic physiotherapy in children with functional constipation compared with standard medical care
• influence of a defecation posture modification device (squatty potty®) in healthy volunteers and dyssynergic patients
• acg clinical guideline: small intestinal bacterial overgrowth
• hydrogen and methane-based breath testing in gastrointestinal disorders: the north american consensus
jose maría remes troche is a member of the advisory board of takeda, asofarma, and biocodex. he has given talks for takeda, asofarma, medtronic, carnot, and alfasigma. miguel ángel valdovinos díaz is a member of the advisory board of takeda.
he has given talks for takeda, asofarma, medtronic, carnot, and grünenthal. laura sole has given talks for roemmers, casasco, asofarma, temis lostaló, and raffo. albis cecilia hani amador has given talks for medtronic, takeda, abbott, biopas, and astrazeneca. claudia defilippi has given talks for pharma investi, ferrer, and axon pharma. luis maría bustos fernández declares that he has no conflict of interest. ricardo viebig declares that he has no conflict of interest. key: cord-312477-2y88gzji authors: mlcochova, p.; collier, d.; ritchie, a. v.; assennato, s. m.; hosmillo, m.; goel, n.; meng, b.; chatterji, k.; mendoza, v.; temperton, n.; kiss, l.; ciazyns, k. a.; xiong, x.; briggs, j. a.; nathan, j.; mescia, f.; zhang, h.; barmpounakis, p.; demeris, n.; skells, r.; lyons, p.; bradley, j.; baker, s.; lee, h. h.; smith, k. g.; goodfellow, i.; gupta, r. k. title: combined point of care nucleic acid and antibody testing for sars-cov-2: a prospective cohort study in suspected moderate to severe covid-19 disease. date: 2020-06-18 journal: nan doi: 10.1101/2020.06.16.20133157 sha: doc_id: 312477 cord_uid: 2y88gzji abstract background rapid covid-19 diagnosis in hospital is essential for patient management and identification of infectious patients to limit the potential for nosocomial transmission. the diagnosis is complicated by 30-50% of covid-19 hospital admissions with nose/throat swabs negative for sars-cov-2 nucleic acid, frequently after the first week of illness when sars-cov-2 antibody responses become detectable. we assessed the diagnostic accuracy of combined rapid antibody point of care (poc) and nucleic acid assays for suspected covid-19 disease in the emergency department. methods we developed (i) an in vitro neutralization assay using a lentivirus expressing a genome encoding luciferase and pseudotyped with spike protein and (ii) an elisa test to detect igg antibodies to nucleocapsid (n) and spike (s) proteins from sars-cov-2.
we tested two promising candidate lateral flow rapid fingerprick tests with bands for igg and igm. we then prospectively recruited participants with suspected moderate to severe covid-19 and tested for sars-cov-2 nucleic acid in a combined nasal/throat swab using the standard laboratory rt-pcr and a validated rapid nucleic acid test. additionally, serum collected at admission was retrospectively tested by in vitro neutralization, elisa and the candidate poc antibody tests. we determined the sensitivity and specificity of the individual and combined rapid poc diagnostic tests against a composite gold standard of neutralisation and the standard laboratory rt-pcr. results 45 participants had specimens tested for nucleic acid in nose/throat swabs as well as stored sera for antibodies. serum neutralisation assay, sars-cov-2 spike igg elisa and the poc antibody test results were concordant. using the composite gold standard, prevalence of covid-19 disease was 53.3% (24/45). median age was 73.5 (iqr 54.0-86.5) years in those with covid-19 disease by our gold standard and 63.0 (iqr 41.0-72.0) years in those without disease. median duration of symptoms was 7 days (iqr 1-8) in those with infection. the overall sensitivity of rapid naat diagnosis was 79.2% (95ci 57.8-92.9%) overall and 50.0% (95ci 11.8-88.2) at days 8-28. sensitivity and specificity of the combined rapid poc diagnostic tests reached 100% (95ci 85.8-100) and 94.7% (95ci 74.0-99.0) overall. conclusions dual point of care sars-cov-2 testing can significantly improve diagnostic sensitivity, whilst maintaining high specificity. rapid combined tests have the potential to transform our management of covid-19, including inflammatory manifestations where nucleic acid test results are negative. a rapid combined approach will also aid recruitment into clinical trials and in prescribing therapeutics, particularly where potentially harmful immune modulators (including steroids) are used.
rapid covid-19 diagnosis in hospital is essential for patient management and identification of infectious patients to limit the potential for nosocomial transmission. the diagnosis is complicated by 30-50% of covid-19 hospital admissions with negative nose/throat swabs for sars-cov-2 nucleic acid, frequently after the first week of illness when sars-cov-2 antibody responses become detectable. we assessed the diagnostic accuracy of combined rapid antibody point of care (poc) and nucleic acid assays for suspected covid-19 disease in the emergency department. we developed (i) an in vitro neutralization assay using a lentivirus expressing a genome encoding luciferase and pseudotyped with spike protein and (ii) an elisa test to detect igg antibodies to nucleocapsid (n) and spike (s) proteins from sars-cov-2. we tested two promising candidate lateral flow rapid fingerprick tests with bands for igg and igm. we then prospectively recruited participants with suspected moderate to severe covid-19 and tested for sars-cov-2 nucleic acid in a combined nasal/throat swab using the standard laboratory rt-pcr and a validated rapid nucleic acid test. additionally, serum collected at admission was retrospectively tested by in vitro neutralization, elisa and the candidate poc antibody tests. we determined the sensitivity and specificity of the individual and combined rapid poc diagnostic tests against a composite 'gold' standard of neutralisation and the standard laboratory rt-pcr. results 45 participants had specimens tested for nucleic acid in nose/throat swabs as well as stored sera for antibodies. serum neutralisation assay, sars-cov-2 spike igg elisa and the poc antibody test results were concordant. using the composite gold standard, prevalence of covid-19 disease was 53.3% (24/45). median age was 73.5 (iqr 54.0-86.5) years in those with covid-19 disease by our gold standard and 63.0 (iqr 41.0-72.0) years in those without disease.
median duration of symptoms was 7 days (iqr 1-8) in those with infection. the overall sensitivity of rapid naat diagnosis was 79.2% (95ci 57.8-92.9%). sensitivity and specificity of the combined rapid poc diagnostic tests reached 100% (95ci 85.8-100) and 94.7% (95ci 74.0-99.0) overall. dual point of care sars-cov-2 testing can significantly improve diagnostic sensitivity, whilst maintaining high specificity. rapid combined tests have the potential to transform our management of covid-19, including inflammatory manifestations where nucleic acid test results are negative. a rapid combined approach will also aid recruitment into clinical trials and in prescribing therapeutics, particularly where potentially harmful immune modulators (including steroids) are used. as of the 5th of june 2020, 6.7 million people have been infected with sars-cov-2 with over 390 000 deaths [1] . the unprecedented numbers requiring sars-cov-2 testing have strained healthcare systems globally. there is currently no gold standard for diagnosis of covid-19. detection of sars-cov-2 by nucleic acid amplification testing (naat) is largely done by real time rt-pcr on nose/throat swabs in centralised laboratories. rt-pcr specimens need to be handled in a biosafety level 3 category laboratory (bsl3) and then batch analysed. given these bottlenecks, the turnaround time for this test is in the order of 2-4 days [2] . naat tests from a single nose/throat swab are negative in up to 50% of patients who have ct changes consistent with covid-19 and/or positive antibodies to sars-cov-2 [3] [4] [5] . the lack of detectable virus in upper airway samples is not only a serious barrier to making timely and safe decisions in the er, but also leads to multiple swab samples being sent, frequently from the same anatomical site, leading to strain on virology laboratories. additionally, recruitment into clinical trials for covid-19 treatments has moved towards 'clinical diagnosis' for eligibility.
multiple factors contribute to negative results by naat, including sampling technique and timing of the sampling in the disease course. the viral load in the upper respiratory tract frequently wanes by this point [6] and, as seen in a case series from france, was undetectable in nose and throat swabs from 9 days of illness in 4 out of 5 patients [7] . similarly, a case series from germany found the detection rate by rt-pcr was <50% after 5 days since onset of illness [8] . a proportion of patients develop a secondary deterioration in clinical condition requiring hospitalisation and respiratory support, at a time when immune pathology is thought to dominate rather than direct pathology related to viral replication [7, 9] . the antibody response to sars-cov-2 is detectable 6 days from infection [10] . antibody based diagnosis of covid-19 shows increasing sensitivity in the latter part of the disease course when naat testing on nose/throat samples is more likely to be negative [11] [12] [13] [14] . one study reported that combining lab based rt-pcr with lab based antibody testing could increase sensitivity for covid-19 diagnosis from 67.1% to 99.4% in hospitalised patients [15] . however, lab antibody testing also has a turnaround time of a day or more, and rapid diagnosis and triage of patients requiring hospitalisation is needed in order to avoid overwhelming the diagnostic and isolation capacities of hospitals, especially during periods when influenza is co-circulating. we previously evaluated the diagnostic accuracy of the samba ii sars-cov-2 rapid test compared with the standard laboratory rt-pcr and found similar accuracy and a turnaround time of 2-3 hours even in real world settings [2] . several studies have now performed head-to-head comparisons of immuno-chromatographic lateral flow immunoassays (lfias) [12] [13] [14] [16] .
these assays are cheap to manufacture and give a binary positive/negative result, thereby lending themselves well to point of care (poc) testing. however, they have variable performance and in general they are negative in the early phase of illness, but highly sensitive in the later stage of illness [12] [13] [14] [16] . in this study we evaluated the diagnostic performance of a poc combination comprising naat and lfa antibody testing against a composite gold standard of laboratory rt-pcr and a serum neutralisation assay. 45 prospectively recruited participants with suspected moderate to severe covid-19 disease had specimens tested for nucleic acid in nose/throat swabs as well as stored sera for antibodies. samples at hospital admission were collected at a median of 7 (iqr 7-13) days after illness onset. the sera from 42.2% (19/45) participants showed strong neutralising antibody response against sars-cov-2 spike protein pseudotyped virus infection in a neutralization assay ( figure 1a ). 26 participants' sera showed no neutralising response ( figure 1b ). the neutralisation ability of participants' sera was compared with an elisa igg assay (supplementary figure 1) detecting spike antibodies ( figure 1c ). … and procalcitonin were significantly higher in confirmed covid-19 patients and 'classical' chest radiograph appearances were more common in confirmed covid-19 patients (table 1, p<0.001). however, 6/24 (25%) had normal or indeterminate chest radiographs in the confirmed covid-19 group.
the overall positivity rate of the rapid nucleic acid test was 79.2% (95% ci 57.8-92.9), decreasing from 88.9% (95% ci 65.3-98.6) in days 1-7 of illness to 50.0% (95% ci 11.8-88.2) in days 8-28 of illness (table 2 ). when the covidix igg/igm rapid test was combined with naat, the positivity rate increased to 100% (95% ci 81.5-100) in days 1-7 of illness and 100% (95% ci 54.1-100) in days 8-28 of illness. specificity was 90.0% (95% ci …). three participants had stored samples available for testing at multiple time points in their illness ( figure 2 ). two individuals were sampled from early after symptom onset and the third presented three weeks into illness. in the first two ( figure 2a -f), we observed an increase in neutralisation activity over time that was mirrored by band intensities on the rapid poc antibody test. as expected igm bands arose early on with igg following closely. in the individual presenting 21 days into illness ( figure 2g -i), only igg was detected with the rapid poc antibody test and as expected band intensity did not increase with time. given the need for multiple options under current demand for such tests, we next decided to use an alternative rapid lfa antibody test in combination with the samba ii naat. surescreen sars-cov-2 igg/igm test (derby, uk) was recently validated with elisa igg and demonstrated a very good sensitivity and specificity profile compared to five other tests on stored sera from acute infection [17] . we compared elisa igg and serum neutralisation (table 2) .
here we have shown that naat testing with antibody detection can improve diagnosis of covid-19 in moderate to severe suspected cases, but more importantly that accurate diagnosis can be achieved with combined rapid tests. overall positivity in nose/throat swab samples was around 80% with naat testing alone and 100% with a combined approach of rapid naat testing and either of two fingerprick blood/serum rapid antibody tests. specificity of the combined approach was 85-95% overall. as expected, nucleic acid detection in nose/throat samples was highest in the first few days (100% for samba ii sars-cov-2 test in the first 3 days after symptom onset). conversely antibody detection by lfa increased over time. a strength of this study is the use of serum neutralisation, a phenotypic test for functionality of antibodies, as part of a composite gold standard for defining covid-19 disease. this assay was carefully validated against a recently described elisa method for sars-cov-2 igg detection that is now used globally [18] . we also demonstrated that sera from participants did not neutralise sars-cov-1. use of antibody tests for covid-19 diagnosis in hospitals has been limited for a number of reasons. firstly, we know from sars-cov-1 that previous humoral immunity to hcov oc43 and 229e can elicit a cross reactive antibody response to n of sars-cov-1 in up to 14% of people tested in cross-sectional studies [19] , and previous exposure to hcov can rarely elicit an antibody response cross-reacting with the n and s proteins of sars-cov-2 [17, 20] . secondly, antibody tests do not achieve the same detection rates as nucleic acid based tests early in infection, as humoral responses take time to develop following viral antigenic stimulation.
however, later in disease igg reaches 100% sensitivity by day 6 [10] and this is useful in cases with immune mediated inflammatory disease where rt-pcr on respiratory samples is often negative, for example in the recently described kawasaki-like syndrome named pims (paediatric inflammatory multi-system syndrome) [21] . ct scanning has previously been shown to be highly sensitive [5] , though few countries have the resources for large scale ct based screening. in our study chest radiographs were statistically more likely to show changes associated with covid-19, but a quarter of chest radiographs in the confirmed covid-19 group were normal or indeterminate. this study had limited numbers of participants, though patients were distributed well by symptom onset and part of a clinical trial with complete data. we tested stored sera rather than whole finger prick blood, though this was intentional given the caution needed in interpreting antibody tests and potential cross reactivity of antibodies. although sars-cov-2 elisa testing of our pre 2020 sera did reveal occasional n and s reactivity to sars-cov-2 (supplementary table 2), these samples were negative on the rapid antibody testing. in light of our data, prospective evaluation on a finger prick sample is now warranted on a larger scale in patients with moderate to severe disease. at present we cannot speculate on the diagnostic accuracy of the antibody or naat tests in mild disease. we envisage a deployment approach whereby both test samples, finger prick blood and nose/throat swab, are taken at the same time on admission to hospital.
the finger prick antibody test result is available within 15 minutes and is highly specific; therefore in an individual with classical features, a positive antibody test result can be acted upon confidently, for example movement to a covid-19 area, or recruitment into a clinical treatment study. the naat result following shortly after will assist in diagnosis for early infections where antibody testing is negative. naat is also expected to be more valuable than antibody tests in milder cases given severity appears to correlate with magnitude of antibody responses [17, 22] . a combined rapid testing approach may have significant benefits in low resource settings where centralised virology laboratories are scarce and the epidemic is expanding. in addition, it removes the need for repeated nose/throat swabbing which may generate aerosols and lead to transmission. we envisage the combined rapid testing approach being important for safe and quick patient recruitment to clinical trials for covid-19, specifically where potentially harmful treatments such as immune modulators are being tested. rapid combined tests could be transformative in diagnosis and management of moderate to severe covid-19 disease requiring hospitalisation, particularly as diverse manifestations of disease emerge. 293t cells were cultured in dmem complete (dmem supplemented with 100 u/ml penicillin, 0.1 mg/ml streptomycin, and 10% fcs). viral vectors were prepared by transfection of 293t cells by using fugene hd transfection reagent (promega) as follows.
confluent 293t cells were transfected with a mixture of 11 µl of fugene hd, 1 µg of pcaggs_sars-cov-2_spike, 1 µg of p8.91 hiv-1 gag-pol expression vector [23, 24] , and 1.5 µg of pcsflw (expressing the firefly luciferase reporter gene with the hiv-1 packaging signal). viral supernatant was collected at 48 and 72h after transfection, filtered through a 0.45 µm filter and stored at -80°c. the 50% tissue culture infectious dose (tcid50) of sars-cov-2 pseudovirus was determined using the steady-glo luciferase assay system (promega). spike pseudotype assays have been shown to have similar characteristics as neutralization testing using fully infectious wild type sars-cov-2 [25] . virus neutralization assays were performed on 293t cells transiently transfected with ace2 and tmprss2 using sars-cov-2 spike pseudotyped virus expressing luciferase. pseudovirus was incubated with serial dilutions of heat inactivated human serum samples from covid-19 suspected individuals in duplicates for 1h at 37°c. virus and cell only controls were also included. then, freshly trypsinized 293t ace2/tmprss2 expressing cells were added to each well. following 48h incubation in a 5% co2 environment at 37°c, the luminescence was measured using the steady-glo luciferase assay system (promega). the 50% inhibitory dilution (ec50) was defined as the serum dilution at which the relative light units (rlus) were reduced by 50% compared with the virus control wells. we developed an elisa targeting the sars-cov-2 spike and n proteins. trimeric spike protein antigen used in elisa assays consists of the complete s protein ectodomain with a c-terminal extension containing a tev protease cleavage site, a t4 trimerization foldon and
a hexa-histidine tag. the s1/s2 cleavage site with amino acid sequence prrar was replaced with a single arginine residue and stabilizing proline mutants were inserted at positions 986 and 987. spike protein was expressed and purified from expi293 cells (thermo fisher). n protein consisting of residues 45-365 was initially expressed as a his-tev-sumo-fusion. after ni-nta purification, the tag was removed by tev proteolysis and the cleaved tagless protein further purified on heparin and gel filtration columns. the elisas were run in a stepwise process; a positivity screen was followed by endpoint titre as previously described [18] . briefly, 96-well eia/ria plates (corning, sigma) were coated with pbs or 0.1 µg per well of antigen at 4°c overnight. coating solution was removed, and wells were blocked with 3% skimmed milk prepared in pbs with 0.1% tween 20 (pbst) at ambient temperature for 1 hour. previously inactivated serum samples (56°c for 1 hour) were diluted to 1:60 or serially diluted by 3-fold, six times in 1% skimmed milk in pbst. blocking solution was aspirated and the diluted sera were added to the plates and incubated for 2 hours at ambient temperature. diluted sera were removed, and plates were washed three times with pbst. goat anti-human igg secondary antibody-peroxidase (fc-specific, sigma) prepared at 1:3,000 in pbst was added and plates were incubated for 1 hour at ambient temperature. plates were washed three times with pbst. elisas were developed using 3,5,3′,5′-tetramethylbenzidine (tmb, thermoscientific); reactions were stopped after 10 minutes using 0.16m sulfuric acid. the optical density at 450 nm (od450) was measured using a spectramax i3 plate reader. the absorbance values for each sample were determined by subtracting od values from uncoated wells. all data analyses were performed using prism 8 version 8.4.2 (graphpad).
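as a rough illustration of the two serological readouts described above (the 50% inhibitory dilution from the pseudovirus neutralization assay, and the endpoint titre from the stepwise elisa), the python sketch below fits a graphpad-style 'log(inhibitor) vs. normalized response' curve to synthetic dilution data and applies a background-subtracted positivity cutoff to a 3-fold dilution series. all numeric values (the 1:540 "true" titre, the noise level, the od readings, and the 0.2 od cutoff) are hypothetical, and scipy stands in for the prism fitting used in the study.

```python
import numpy as np
from scipy.optimize import curve_fit

# --- neutralization EC50 ---------------------------------------------------
def normalized_response(log_dilution, log_ec50):
    # GraphPad-style "log(inhibitor) vs. normalized response" model (Hill slope = 1)
    return 100.0 / (1.0 + 10.0 ** (log_dilution - log_ec50))

dilutions = 60 * 3.0 ** np.arange(6)   # serial 3-fold dilutions from 1:60, as in the ELISA protocol
log_dil = np.log10(dilutions)

true_log_ec50 = np.log10(540.0)        # assumed "true" titre for this synthetic example
rng = np.random.default_rng(0)
pct_neut = normalized_response(log_dil, true_log_ec50) + rng.normal(0, 2, log_dil.size)

(fit_log_ec50,), _ = curve_fit(normalized_response, log_dil, pct_neut, p0=[2.0])
ec50 = 10 ** fit_log_ec50              # serum dilution giving a 50% reduction in RLU

# --- ELISA endpoint titre --------------------------------------------------
od_coated   = [1.95, 1.60, 1.05, 0.55, 0.28, 0.15]  # hypothetical OD450, antigen-coated wells
od_uncoated = [0.08, 0.07, 0.08, 0.06, 0.07, 0.06]  # matched uncoated (PBS) wells
CUTOFF = 0.2                                        # assumed positivity cutoff, not from the paper

corrected = [c - u for c, u in zip(od_coated, od_uncoated)]   # subtract uncoated-well OD
positive = [d for d, od in zip(dilutions, corrected) if od > CUTOFF]
endpoint_titre = max(positive) if positive else None          # highest dilution still above cutoff
print(round(ec50), endpoint_titre)
```

with clean synthetic data the fit recovers the assumed titre closely; in practice duplicates and background wells would be averaged before fitting, as described in the methods.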
this colloidal-gold lateral flow immunoassay is designed to detect igg and igm to sars-cov-2. it was used according to the manufacturer's instructions. 10 µl of serum was added to the test well followed by 2 drops of the manufacturer's proprietary buffer. results were read as the presence or absence of a colored band in the results window: igm positive-control and igm test bands present, igg positive-control and igg test bands present, or negative-control band only. in order to rule out cross reactivity of this test with seasonal coronavirus antibodies we tested 19 stored specimens from before 2020, some of which had n and s protein sars-cov-2 cross reactivity (supplementary table 2) . for quantification of igg and igm band density in the covidix 2019 ncov igg/igm test, high resolution images of completed poc antibody test cassettes were acquired using a chemidoc mp imaging system (bio-rad) at 20 min post-addition of the human serum. band intensities were analysed using image lab software (bio-rad). surescreen sars-cov-2 igg/igm test (derby, uk). this colloidal-gold lateral flow immunoassay is designed to detect igg and igm to sars-cov-2. it was used according to the manufacturer's instructions. 10 µl of serum was added to the test well followed by 2 drops of the manufacturer's proprietary buffer. results were read as the presence or absence of a colored band in the results window: igm positive-control and igm test bands present, igg positive-control and igg test bands present, or negative-control band only. it has been previously validated against historical controls and in serum from confirmed pcr positive covid-19 cases [17] .
the study participants were part of the covidx trial [2] , a prospective analytical study. this was later expanded to include any adult requiring hospital admission and who was symptomatic of sars-cov-2 infection, demonstrated by clinical or radiological findings [2] . 48 participants who had available stored sera were included in this sub-study and underwent further antibody testing. the laboratory standard rt-pcr test, developed by public health england (phe), targeting the rdrp gene was performed on a combined nose/throat swab in parallel. this test has an estimated limit of detection of 320 copies/ml. samba ii sars-cov-2 testing was performed on a combined nose/throat swab collected by dry sterile swab and inactivated in a proprietary buffer at point of sampling. samba ii sars-cov-2 targets two genes, orf1 and n, and uses nucleic acid sequence based amplification to detect sars-cov-2 rna, with a limit of detection of 250 copies/ml. the sensitivity and specificity of the samba ii sars-cov-2 test and the covidix sars-cov-2 igg/igm test or surescreen sars-cov-2 igg/igm test for diagnosing covid-19 were calculated alone and then in combination along with binomial 95% confidence intervals (ci). a composite gold standard was used: standard lab rt-pcr and a neutralisation assay. descriptive analyses of clinical and demographic data are presented as median and interquartile range (iqr) when continuous and as frequency and proportion (%) when categorical. the differences in continuous and categorical data were tested using wilcoxon rank sum and chi-square tests respectively.
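the binomial 95% confidence intervals mentioned above can be computed with an exact (clopper-pearson) interval, one common choice; the sketch below reproduces the headline rapid-naat sensitivity of 19/24 = 79.2% (95% ci 57.8-92.9%). scipy is used here purely for illustration; the paper's own analyses were run in stata.

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    # exact binomial CI: invert the beta distribution at both tails
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

# 19 of 24 gold-standard-positive participants were positive by rapid NAAT
sens = 19 / 24
lo, hi = clopper_pearson(19, 24)
print(f"sensitivity {sens:.1%} (95% ci {lo:.1%}-{hi:.1%})")
```

the same helper applies directly to the specificity and combined-test proportions reported in the results.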
the correlation between elisa and neutralisation assay was determined using the pearson correlation coefficient. statistical analyses were conducted using stata (version 13), with additional plots generated using graphpad prism. comparison between ec50 dilution titre from neutralizing assay and positive/negative poc antibody test results (covidix sars-cov-2 igm igg test): n=44, p=0.0025. an interactive web-based dashboard to track covid-19 in real time rapid point of care nucleic acid testing for sars-cov-2 in hospitalised patients: a clinical trial and implementation study. medrxiv false-negative results of initial rt-pcr assays for covid-19: a systematic review.
medrxiv detection of sars-cov-2 in different types of clinical specimens sensitivity of chest ct for covid-19: comparison to rt-pcr temporal dynamics in viral shedding and transmissibility of covid-19 clinical and virological data of the first cases of covid-19 in europe: a case series virological assessment of hospitalized patients with covid-2019 covid-19 illness in native and immunosuppressed states: a clinical-therapeutic staging proposal antibody responses to sars-cov-2 in patients with covid-19 a preliminary study on serological assay for severe acute respiratory syndrome coronavirus 2 (sars-cov-2) in 238 admitted hospital patients evaluation of nine commercial sars-cov-2 immunoassays. medrxiv comparative assessment of multiple covid-19 serological technologies supports continued evaluation of point-of-care lateral flow assays in hospital and community healthcare settings test performance evaluation of sars-cov-2 serological assays. medrxiv antibody responses to sars-cov-2 in patients of novel coronavirus disease 2019 antibody testing for covid-19: a report from the national covid scientific advisory panel. 
medrxiv comparative assessment of multiple covid-19 serological technologies supports continued evaluation of point-of-care lateral flow assays in hospital and community healthcare settings a serological assay to detect sars-cov-2 seroconversion in humans false-positive results in a recombinant severe acute respiratory syndrome-associated coronavirus (sars-cov) nucleocapsid enzyme-linked immunosorbent assay due to hcov-oc43 and hcov-229e rectified by western blotting with recombinant sars-cov spike polypeptide evaluation of commercial and automated sars-cov-2 igg and iga elisas using coronavirus disease (covid-19) patient samples an outbreak of severe kawasaki-like disease at the italian epicentre of the sars-cov-2 epidemic: an observational cohort study sars-cov-2 neutralizing antibody responses are more efficient transfer, integration, and sustained long-term expression of the transgene in adult rat brains injected with a lentiviral vector full-length hiv-1 gag determines protease inhibitor susceptibility within in vitro assays measuring sars-cov-2 neutralizing antibody activity using pseudotyped and chimeric viruses spike protein pseudotyped viral particles were incubated with serial dilutions of heat inactivated human serum samples from covid-19 suspected individuals (#15,16,32) in duplicates for 1h at 37°c. 293t ace2/tmprss2 expressing cells were added to each well. following 48h incubation in a 5% co2 environment at 37°c, the luminescence was measured using the steady-glo luciferase assay system (promega). percentage of neutralization was calculated with non-linear regression, log (inhibitor) vs.
normalized response using graphpad prism 8 (graphpad software, inc., san diego, ca, usa). (c) the 50% inhibitory dilution (ec50) was defined as the serum dilution at which the relative light units (rlus) were reduced by 50% compared with the virus control wells (virus + cells) after subtraction of the background rlus in the control groups with cells only. the ec50 values were calculated with non-linear regression, log (inhibitor) vs. normalized response using graphpad prism 8 (graphpad software, inc., san diego, ca, usa). key: cord-349161-4899cq99 authors: whiting, penny f; sterne, jonathan ac; westwood, marie e; bachmann, lucas m; harbord, roger; egger, matthias; deeks, jonathan j title: graphical presentation of diagnostic information date: 2008-04-11 journal: bmc med res methodol doi: 10.1186/1471-2288-8-20 sha: doc_id: 349161 cord_uid: 4899cq99 background: graphical displays of results allow researchers to summarise and communicate the key findings of their study. diagnostic information should be presented in an easily interpretable way, which conveys both test characteristics (diagnostic accuracy) and the potential for use in clinical practice (predictive value). methods: we discuss the types of graphical display commonly encountered in primary diagnostic accuracy studies and systematic reviews of such studies, and systematically review the use of graphical displays in recent diagnostic primary studies and systematic reviews. results: we identified 57 primary studies and 49 systematic reviews. fifty-six percent of primary studies and 53% of systematic reviews used graphical displays to present results.
dot-plot or box-and-whisker plots were the most commonly used graph in primary studies and were included in 22 (39%) studies. roc plots were the most common type of plot included in systematic reviews and were included in 22 (45%) reviews. one primary study and five systematic reviews included a probability-modifying plot. conclusion: graphical displays are currently underused in primary diagnostic accuracy studies and systematic reviews of such studies. diagnostic accuracy studies need to include multiple types of graphic in order to provide both a detailed overview of the results (diagnostic accuracy) and to communicate information that can be used to inform clinical practice (predictive value). work is required to improve graphical displays, to better communicate the utility of a test in clinical practice and the implications of test results for individual patients. readers of a research report evaluating a diagnostic test may wish to assess the test's characteristics (diagnostic accuracy) or evaluate the impact that its use has on diagnostic decisions (predictive value) for individual patients. graphical displays of results of test accuracy studies allow researchers to summarise and communicate the key findings of their study. we discuss the types of graphical display commonly encountered in primary diagnostic accuracy studies and systematic reviews of such studies, and systematically review the use of graphical displays in recent diagnostic systematic reviews and primary studies. table 1 defines the various measures of diagnostic accuracy used. primary studies figure 1 illustrates four types of graphical display commonly used to present data on diagnostic accuracy for primary diagnostic accuracy studies. we used data from a study of the biochemical tumour marker ca-19-9 antigen to diagnose pancreatic cancer to construct these graphs [1] .
dot plots (figure 1a) and box-and-whisker plots (figure 1b)
dot plots are used for test results that take many values, and display the distribution of results in patients with and without the target condition. box-and-whisker plots summarise these distributions: the central box covers the interquartile range, with the median indicated by the line within the box. the whiskers extend either to the minimum and maximum values or to the most extreme values within 1.5 interquartile ranges of the quartiles, in which case more extreme values are plotted individually [2]. sometimes an indication of the threshold used to define a positive test result is included, for example by adding a horizontal line or shading at the relevant point. such plots can clearly summarise a large volume of data, but are only able to display differences in the distribution of test values between patients with and without the target condition; they do not directly display the diagnostic performance of the test. although the ca-19-9 antigen test to diagnose pancreatic cancer (used to construct figure 1) is an example of continuous data, it is also possible to construct similar graphs for categorical test results, provided that the number of categories is reasonably large. alternatively, for smaller numbers of categories, similar information can be conveyed using paired bar charts/histograms. paired histograms show the distribution of test results in patients with the target condition above the x-axis and the distribution in patients without the target condition below the x-axis. these types of graphical display are less commonly used.

the diagnostic odds ratio (dor) is used as an overall (single indicator) measure of the diagnostic accuracy of a diagnostic test. it is calculated as the odds of positivity among diseased persons, divided by the odds of positivity among non-diseased persons. when a test provides no diagnostic evidence, the dor is 1.0.
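the dor defined above can be computed directly from sensitivity and specificity; a minimal python helper (illustrative, not code from the paper) makes the arithmetic concrete:

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR: odds of a positive test among the diseased divided by
    the odds of a positive test among the non-diseased."""
    odds_pos_diseased = sensitivity / (1 - sensitivity)
    odds_pos_nondiseased = (1 - specificity) / specificity
    return odds_pos_diseased / odds_pos_nondiseased

# Swapping a high-sensitivity/low-specificity pair yields the same DOR,
# which is exactly the loss of information discussed in the text.
print(diagnostic_odds_ratio(0.95, 0.50))  # ~19
print(diagnostic_odds_ratio(0.50, 0.95))  # ~19 again
```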
[33] this measure has a number of limitations. by combining sensitivity and specificity into a single indicator, the relative values of the two are lost, i.e. the dor can be the same for a very high sensitivity and low specificity as for a very high specificity and low sensitivity [33]. further, tests that are effective at classifying persons as having or not having the target condition have dors whose magnitude is much greater (e.g. 100) than is usually considered to indicate a strong association in epidemiological studies.

predictive values depend on disease prevalence: the more common a disease is, the more likely it is that a positive test result is right and a negative result is wrong [35]. it is not possible to construct any of these graphs for truly dichotomous test results. however, truly dichotomous tests rarely occur in practice. examples of dichotomous tests include dipstick tests that change colour if the target condition is said to be present (although these are based on an underlying implicit threshold) or the presence/absence of certain clinical symptoms.

figure 1: example graphical displays for primary study data.

receiver operating characteristic (roc) plot (figure 1c)
roc plots show values of sensitivity and specificity at all of the possible thresholds that could be used to define a positive test result [3]. typically, sensitivity (true positive rate) is plotted against 1-specificity (false positive rate): each point represents a different threshold in the same group of patients. stepped lines are used for continuous test results while sloping lines are used for ordered categories. roc curves may be derived directly from the observed sensitivity and specificity corresponding to different test thresholds, or by fitting curves based on parametric [4], semi-parametric [5,6], or non-parametric methods [7]. the area under the roc curve (auc) is a summary of diagnostic performance, and takes values between 0.5 and 1.
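the empirical roc construction just described, with the auc obtained by the trapezoidal rule, can be sketched in a few lines of python (a hedged sketch, not the authors' code; higher scores are assumed to indicate disease):

```python
def roc_points(scores, labels):
    """Empirical ROC: one (FPR, TPR) point per distinct threshold,
    counting a score >= threshold as a positive test result."""
    thresholds = sorted(set(scores), reverse=True)
    pos = sum(labels)                 # labels are 1 (diseased) / 0 (not)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]                # the all-negative threshold
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts

def auc(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

for a perfectly separating marker the curve hugs the top-left corner and the area is 1; for an uninformative one the points fall on the diagonal and the area is 0.5, matching the interpretation given in the text.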
the more accurate the test, the more closely the curve approaches the top left-hand corner of the graph (auc = 1). a test that provides no diagnostic information (auc = 0.5) will produce a straight line from the bottom left to the top right. roc curves may be restricted to a range of sensitivities or specificities of clinical interest. roc plots show how estimated sensitivity and specificity vary according to the threshold chosen, and can be used to identify suitable thresholds for clinical practice if the points on the curve are labelled with the corresponding threshold, as in figure 1c, which shows for example that the sensitivity and specificity corresponding to a threshold of 39.3 are 74% and 90%, respectively. confidence intervals can be added to indicate the uncertainty in estimates of test performance at each point. roc plots also allow comparison of the performance of several tests independently of choice of threshold, by plotting data sets for multiple tests in the same roc space. however, they are thought to be difficult to interpret as they describe the characteristics of the test in a way which does not relate directly to its usefulness in clinical practice; research has shown that roc plots are generally poorly understood by clinicians [8].

flow charts (figure 1d)
these depict the flow of patients through the study: for example, how many patients were eligible, how many entered the study, how many of these had the target condition, and the numbers testing positive and negative. such charts require categorisation of test results, for example as "positive" and "negative". although flow charts do not directly present diagnostic accuracy data, addition of percentages to the test result boxes (as in figure 1d) can be used to report test sensitivity (68/90 = 76%) and specificity (46/51 = 90%).
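the percentages read off the flow chart come from a simple 2x2 tabulation, and the same counts also yield the predictive values discussed next. a minimal sketch (the counts 68/90 and 46/51 are from the paper's figure 1d; the helper itself is illustrative):

```python
def accuracy_measures(tp, fp, fn, tn):
    """Test characteristics and predictive values from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among the diseased
        "specificity": tn / (tn + fp),  # negatives among the non-diseased
        "ppv": tp / (tp + fp),          # probability of disease given a positive
        "npv": tn / (tn + fn),          # probability of no disease given a negative
    }

# Counts implied by figure 1d: 68 of 90 diseased test positive,
# 46 of 51 non-diseased test negative.
m = accuracy_measures(tp=68, fp=5, fn=22, tn=46)
```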
charts that first separate individuals according to test result before classification by disease status may similarly be used to depict positive and negative predictive values. the stard (standards for reporting of diagnostic accuracy) statement, an initiative to improve the reporting of diagnostic test accuracy studies similar to the consort statement for clinical trials, recommends the inclusion of a flow diagram in all reports of primary diagnostic accuracy studies [9]. this should illustrate the design of the study and provide information on the numbers of participants at each stage of the study as well as the results of the study. the example flow chart in figure 1d is not a full stard flow diagram as we do not have data on numbers of withdrawals or uninterpretable results from this study. it does, however, show the design (diagnostic case-control) and results of the study.

systematic reviews
figure 2 illustrates two graphical displays commonly used to present data on diagnostic accuracy in diagnostic systematic reviews. data from a systematic review of dipstick tests for urinary nitrite and leukocyte esterase to diagnose urinary tract infections were used to construct these graphs [10].

forest plots (figure 2a)
forest plots are commonly used to display results of meta-analysis. they display results from the individual studies together with, optionally, a summary (pooled) estimate. point estimates are shown as dots or squares (sometimes sized according to precision or sample size) and confidence intervals as horizontal lines [11]. the pooled estimate is displayed as a diamond whose centre represents the estimate and whose tips represent the confidence interval. for diagnostic accuracy studies, measures of test performance (sensitivity, specificity, predictive values, likelihood ratios or diagnostic odds ratio) are plotted on the horizontal axis. diagnostic test performance is often described by pairs of summary statistics (e.g.
sensitivity and specificity; positive and negative likelihood ratios), and these are depicted side-by-side. between-study heterogeneity can readily be assessed by visual examination. results may be sorted by one of a pair of test performance measures, usually that which is most important to the clinical application of the test. a disadvantage of paired forest plots is that they do not directly display the inverse association between the two measures that commonly results from variations in threshold between studies.

roc plots can be used to present the results of diagnostic systematic reviews, but differ from those used in primary studies as each point typically represents a separate study or data set within a study (individual studies may contribute more than one point). a summary roc (sroc) curve can be estimated using one of several methods [12-15] and quantifies test accuracy and the association between sensitivity and specificity based on differences between studies. as with forest plots, roc plots provide an overview of the results of all included studies. however, unless there are very few studies, it is not feasible to display confidence intervals as the plot would become cluttered. results for several tests can be displayed on the same plot, facilitating test comparisons. it is also possible to display pooled estimates of sensitivity and specificity together with associated confidence intervals or prediction regions. roc plots may also be used to investigate possible explanations for differences in estimates of accuracy between studies, for example those arising from differences in study quality. figure 3 shows results for a recent review that we conducted on the accuracy of magnetic resonance imaging (mri) for the diagnosis of multiple sclerosis (ms) [16].
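among the several sroc estimation methods cited [12-15], the classic moses-littenberg approach fits a straight line to d = logit(tpr) - logit(fpr) against s = logit(tpr) + logit(fpr) across studies. a minimal sketch of that regression, under the usual least-squares formulation (illustrative, not the authors' implementation):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def fit_sroc(study_points):
    """Moses-Littenberg sROC: ordinary least squares of
    D = logit(TPR) - logit(FPR) on S = logit(TPR) + logit(FPR),
    one (FPR, TPR) pair per study."""
    D = [logit(tpr) - logit(fpr) for fpr, tpr in study_points]
    S = [logit(tpr) + logit(fpr) for fpr, tpr in study_points]
    n = len(S)
    s_bar, d_bar = sum(S) / n, sum(D) / n
    denom = sum((s - s_bar) ** 2 for s in S)
    b = (sum((s - s_bar) * (d - d_bar) for s, d in zip(S, D)) / denom
         if denom else 0.0)
    a = d_bar - b * s_bar
    return a, b

def sroc_tpr(a, b, fpr):
    """Expected TPR on the fitted summary curve at a given FPR,
    from logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b)."""
    x = logit(fpr)
    return 1 / (1 + math.exp(-(a + (1 + b) * x) / (1 - b)))
```

with b = 0 the fitted curve is symmetric (a constant diagnostic odds ratio of exp(a) across thresholds); a non-zero b lets the dor vary with threshold.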
by using different symbols to illustrate studies that did (diagnostic cohort studies) and did not (other study designs) include an appropriate patient spectrum, we were able to show that studies that included an inappropriate patient spectrum grossly overestimated both sensitivity and specificity. various other graphical methods have been developed to display the results of systematic reviews and meta-analyses [17,18]. although not generally developed specifically for diagnostic test reviews, these can be adapted to display the results of such reviews. funnel plots [19] and galbraith plots [20] are often used to assess evidence for publication bias or small study effects in systematic reviews of the effects of medical interventions assessed in randomized controlled trials. however, their application to systematic reviews of diagnostic test accuracy studies is problematic [20]. diagnostic odds ratios are typically far from 1, and it has been shown that, for data of this type, sampling variation can lead to artefactual associations between log odds ratios and their standard errors [21]. it is therefore recommended that the effective sample size funnel plot be used in reviews of test accuracy studies [20].

figure 2: example graphs for systematic review data. a. paired forest plots of sensitivity and specificity for the le dipstick. b. roc plot with sroc curves.

a number of graphical displays aim to put results of diagnostic test evaluations into clinical context, based either on primary studies or systematic reviews. two graphical displays commonly used for this purpose are the likelihood ratio nomogram (figure 4a) and the probability-modifying plot (figure 4b). each allows the reader to estimate the post-test probability of the target condition in an individual patient, based on a selected pre-test probability. to use the likelihood ratio nomogram, the reader needs an estimate of the likelihood ratios for the test.
the reader then draws a line through the appropriate likelihood ratio on the central axis, intersecting the selected pre-test probability, to derive the post-test probability of disease. the probability-modifying plot depicts separate curves for positive and negative test results. the reader draws a vertical line from the selected pre-test probability to the appropriate likelihood ratio line and then reads the post-test probability off the vertical scale. both graph types are based on a single estimate of test accuracy (likelihood ratio), although it is possible to plot separate curves on the probability-modifying plot, or lines on the nomogram, to depict confidence intervals around the estimated likelihood ratios. each assumes constant likelihood ratios across the range of pre-test probabilities. however, this assumption may be violated in practice [22], because populations in which the test is used may have different spectrums of disease to those in which estimates of test accuracy were derived.

figure 3: sensitivity plotted against specificity, separately for cohort studies and for studies of other designs, for mri for diagnosis of multiple sclerosis.

figure 4: example graphs for interpreting diagnostic study results. a. likelihood ratio nomogram. b. probability-modifying plot.

we systematically reviewed how graphical displays are currently incorporated in studies of test performance. we included primary diagnostic accuracy studies published in 2004, identified by hand searching 12 journals (table 2), and diagnostic systematic reviews published in 2003, identified from dare (database of abstracts of reviews of effects) [23]. searches were conducted in 2005 and so these years were the most complete available years for searching (there is a delay in adding studies to dare).
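the calculation that the likelihood ratio nomogram and the probability-modifying plot perform graphically is plain odds arithmetic; a minimal python sketch (illustrative, not code from the paper):

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """What the nomogram's straight line computes: convert the pre-test
    probability to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# e.g. a pre-test probability of 0.5 and a positive likelihood ratio of 3
print(post_test_probability(0.5, 3))  # 0.75
```

a likelihood ratio of 1 (a test providing no information) leaves the probability unchanged, which is the same anchoring used on the nomogram's central axis.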
diagnostic accuracy studies were studies that provided data on the sensitivity and specificity of a diagnostic test and that focused on diagnostic (whether the patient had the condition of interest) rather than prognostic (disease severity/risk prediction) questions. journals were selected to provide a mixture of the major general medical and specialty journals. we particularly aimed to select journals that clinicians read. we extracted data on the different graphical displays used to summarise information about test performance, defined as any graphical method of summarising data on diagnostic accuracy or the predictive value of a test (table 1). we located 57 primary studies and 49 systematic reviews (web appendix). fifty-six percent of primary studies and 53% of systematic reviews used graphical displays to present results. in publications using graphics, the number of graphs per publication ranged from 1 to 51 (median 2, iqr 1 to 3 for primary studies; median 4, iqr 2 to 7 for systematic reviews). table 3 summarises the categories of tests evaluated in the primary studies and systematic reviews. none of the tests evaluated in any of the primary studies were truly dichotomous: they all gave continuous or categorical results. three of the eight systematic reviews that assessed clinical examination looked at whether a variety of signs or symptoms were present or absent: these can be considered as truly dichotomous tests. all other reviews evaluated continuous or categorical tests. dot plots or box-and-whisker plots were the most commonly used graphic and were included in 22 (39%) studies. generally the plots showed individual test results separately for patients with and without the target condition, with four including an indication of the threshold used to define a positive test result. three studies included both a dot plot and a box-and-whisker plot in the same figure.
other variations included separate plots for different patient subgroups, different symbols to indicate different stages of disease, or separate plots for different tests. the majority of studies using these types of plots were of laboratory tests. an roc curve was displayed in 15 (26%) studies. all of these plotted full roc curves; only two provided any indication of the thresholds corresponding to one or more of the points. thirteen studies included separate roc curves for different tests, either on the same plot (10 studies) or on separate plots (3 studies). five studies included separate roc plots for different patient subgroups. although all the primary studies were published in 2004, after the publication of the stard guidelines, only one included a stard flow diagram. roc plots were included in 22 (45%) reviews. twenty showed individual study estimates of sensitivity and specificity, 14 fitted sroc curves, and two displayed a summary point. one study, which did not fit an sroc curve, added a box and whisker plot to each axis to show the distributions of sensitivity and specificity. one study plotted only summary estimates of sensitivity and specificity in roc space, with no sroc curves. some reviews included separate plots for different tests, for different patient subgroups, or for different thresholds used to define a positive test result. ten reviews (20%) used forest plots to display individual study results. one study provided a plot of diagnostic odds ratios, while all others displayed paired plots of sensitivity and specificity (8 reviews), positive and negative likelihood ratios (3 reviews), or positive and negative predictive values (1 review). several studies displayed more than one set of forest plots, including plots for more than one summary measure, for different stages of diagnosis, different test thresholds or for different tests. 
one study included a forest plot of summary data only, showing how pooled estimates of positive and negative likelihood ratios varied for different patient subgroups. none of the studies included a likelihood ratio nomogram. one primary study and five systematic reviews included a probability-modifying plot. research in the area of cognitive psychology suggests that sensitivity and specificity are generally poorly understood by doctors [8, 24] and are often confused with predictive values [8, 25, 26] . doctors tend to overestimate the impact of a positive test result on the probability of disease [27, 28] and this overestimation increases with decreasing pre-test probabilities of disease [29] . this research suggests that the most informative measures for doctors may be estimates of the post-test probability of disease (predictive value), which can be presented as a range corresponding to different pre-test probabilities. however, graphical displays that facilitate the derivation of post-test probabilities, such as likelihood ratio nomograms, are usually based on summary estimates of test characteristics (positive and negative likelihood ratios) without allowing for the precision of the estimate, or its applicability to a given population. use of summary estimates in this way is questionable in the context of reviews of diagnostic accuracy studies, which typically find substantial between-study heterogeneity [30] . it is particularly problematic if the summary estimate is the only information conveyed in a graphic and the graphic is taken as the key message of the paper. the inclusion of some form of graphical presentation of test accuracy data has a number of advantages compared to not using such displays. it allows fuller reporting of results, for example (s)roc plots can display results for multiple thresholds whereas reporting test accuracy results in a text or table generally requires the selection of one or more thresholds. 
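the point made above, that post-test probabilities (predictive values) should be presented across a range of pre-test probabilities, can be made concrete with bayes' theorem; a sketch holding sensitivity and specificity fixed (the numeric values are illustrative, not data from the review):

```python
def predictive_values_by_prevalence(sensitivity, specificity, prevalences):
    """PPV and NPV across a range of pre-test probabilities (prevalences),
    via Bayes' theorem, with sensitivity and specificity held fixed."""
    rows = []
    for prev in prevalences:
        ppv = (sensitivity * prev
               / (sensitivity * prev + (1 - specificity) * (1 - prev)))
        npv = (specificity * (1 - prev)
               / (specificity * (1 - prev) + (1 - sensitivity) * prev))
        rows.append((prev, ppv, npv))
    return rows

# With sensitivity = specificity = 0.9, the PPV falls from 0.9 at a
# pre-test probability of 0.5 to roughly 0.08 at 0.01.
table = predictive_values_by_prevalence(0.9, 0.9, [0.5, 0.1, 0.01])
```

this is the same behaviour a probability-modifying plot displays graphically: a single sensitivity/specificity pair implies very different predictive values in different clinical settings.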
in addition, (s)roc plots depict the trade-off between sensitivity and specificity at different thresholds. use of such displays also has the advantage of presenting all of the results of a primary study or systematic review without the need for selected analyses, which may be biased depending on the analyses selected. the inclusion of graphical displays, such as sroc plots or forest plots, in systematic reviews of test accuracy studies allows a visual assessment of heterogeneity between studies by showing the results from each individual study included in the review. there is also a suggestion that graphical displays may be easier to interpret than text or tabular summaries of the same data. diagnostic accuracy studies will usually need to include more than one graphic in order both to provide a detailed description of results (diagnostic accuracy) and to communicate appropriate summary measures that can be used to inform clinical practice (predictive value); the more detailed graphic provides context for the interpretation of summary measures. further work is required to improve on existing graphical displays. the starting point for this should be further evaluation of the types of graphical display most helpful for assessing the utility of a test in clinical practice and the implications of test results for individual patients. we hope that this paper will contribute to an increase in the use and quality of graphical displays in diagnostic accuracy studies and systematic reviews of these studies. stard itself does not comment on how graphical displays should be used to convey results of test accuracy studies other than to recommend the inclusion of a flow diagram and to provide an illustration of a dot plot as a suggestion for how individual study results may be displayed.
guidelines on the type of graphical displays that should be included in reports of test accuracy studies could be considered when stard is next updated, and should be considered by journals in their instructions for authors. our review suggests that graphical displays are currently underused in primary diagnostic accuracy studies and systematic reviews of such studies. graphical displays of diagnostic accuracy data should provide an easily interpretable and accurate representation of study results, conveying both diagnostic accuracy and predictive value. this is not usually possible in a single graphic: the type of information presented in the most commonly used graphs does not directly allow clinicians to assess the implications of test results for an individual patient. the author(s) declare that they have no competing interests. all authors contributed to the design of the study and read and approved the final manuscript. pfw and mew identified relevant studies and extracted data from included studies. pfw carried out the analysis and drafted the manuscript with help from jd and rh. 
venous doppler in the prediction of acid-base status of growth-restricted fetuses with elevated placental blood flow resistance
detection of human polyomaviruses in urine from bone marrow transplant patients: comparison of electron microscopy with pcr
magnetic resonance imaging of the breast prior to biopsy
diagnosis of pancreatic cystic neoplasms: a report of the cooperative pancreatic cyst study
rapid hiv-1 testing during labor: a multicenter study
potential clinical utility of a new irma for parathyroid hormone in postmenopausal patients with primary hyperparathyroidism
immuno-pcr for detection of antigen to angiostrongylus cantonensis circulating fifth-stage worms
computed tomographic colonography (virtual colonoscopy): a multicenter comparison with standard colonoscopy for detection of colorectal neoplasia
comparison of endoscopic ultrasonography and multidetector computed tomography for detecting and staging pancreatic cancer
comparison of clinical criteria for the acute respiratory distress syndrome with autopsy findings
use of the fetal fibronectin test in decisions to admit to hospital for preterm labor
soluble triggering receptor expressed on myeloid cells and the diagnosis of pneumonia
prediction of outcome from the chest radiograph appearance on day 7 of very prematurely born infants
cervicovaginal interleukin-6, tumor necrosis factor-α, and interleukin-2 receptor as markers of preterm delivery
natriuretic peptides as markers of mild forms of left ventricular dysfunction: effects of assays on diagnostic performance of markers
association of coronary heart disease with pre-β-hdl concentrations in japanese men
prognostic value of tubular proteinuria and enzymuria in nonoliguric acute tubular necrosis
reliability of symptoms to determine use of bone scans to identify bone metastases in lung cancer: prospective study
plasma fluorescence scanning and fecal porphyrin analysis for the diagnosis of variegate porphyria: precise determination of sensitivity and specificity with detection of protoporphyrinogen oxidase mutations as a reference standard
quantitative real-time pcr with automated sample preparation for diagnosis and monitoring of cytomegalovirus infection in bone marrow transplant patients
fecal dna versus fecal occult blood for colorectal-cancer screening in an average-risk population (ross me and the colorectal cancer study group)
analysis of subforms of free prostate-specific antigen in serum by two-dimensional gel electrophoresis: potential to improve diagnosis of prostate cancer
identification by proteomic analysis of calreticulin as a marker for bladder cancer and evaluation of the diagnostic accuracy of its detection in urine
oesophageal endoscopic ultrasound with fine needle aspiration improves and simplifies the staging of lung cancer
efficacy of mri and mammography for breast-cancer screening in women with a familial or genetic predisposition
improved specificity of newborn screening for congenital adrenal hyperplasia by second-tier steroid profiling using tandem mass spectrometry
a serum autoantibody marker of neuromyelitis optica: distinction from multiple sclerosis
improved accuracy of detection of nasopharyngeal carcinoma by combined application of circulating epstein-barr virus dna and anti-epstein-barr viral capsid antigen iga antibody
a clinical prediction rule for diagnosing severe acute respiratory syndrome in the emergency department
diagnosis of tuberculosis in south african children with a t-cell-based assay: a prospective cohort study
iga antibodies against tissue transglutaminase in the diagnosis of celiac disease: concordance with intestinal biopsy in children and adults
comparison of new clinical and scintigraphic algorithms for the diagnosis of pulmonary embolism
effect of breast augmentation on the accuracy of mammography and cancer characteristics
proenzyme forms of prostate-specific antigen in serum improve the detection of prostate cancer
predictive value of the balloon expulsion test for excluding the diagnosis of pelvic floor dyssynergia in constipation
invasive trophoblast antigen (hyperglycosylated hcg) as a screening marker for down syndrome during the second trimester
invasive trophoblast antigen (hyperglycosylated human chorionic gonadotropin) in second-trimester maternal urine as a marker for down syndrome: preliminary results of an observational study on fresh samples
a novel and accurate diagnostic test for human african trypanosomiasis
fecal lactoferrin for diagnosis of symptomatic patients with ileal pouch-anal anastomosis
differential time to positivity: a useful method for diagnosing catheter-related bloodstream infections
negative d-dimer result to exclude recurrent deep venous thrombosis: a management trial
predicting bacterial cause in infectious conjunctivitis: cohort study on informativeness of combinations of signs and symptoms
serum markers detect the presence of liver fibrosis: a cohort study
serologic assay based on gliadin-related nonapeptides as a highly sensitive and specific diagnostic aid in celiac disease
diagnostic accuracy of ten second-generation (human) tissue transglutaminase antibody assays in celiac disease
accuracy of computed tomographic angiography and magnetic resonance angiography for diagnosing renal artery stenosis
protein profiling in urine for the diagnosis of bladder cancer
surveillance of brca1 and brca2 mutation carriers with magnetic resonance imaging, ultrasound, mammography, and clinical breast examination
roc analysis comparison of three assays for the detection of antibodies against double-stranded dna in serum for the diagnosis of systemic lupus erythematosus
is endosonography guided fine needle aspiration (eus-fna) for sarcoidosis as good as we think?
mammaglobin as a novel breast cancer biomarker: multigene reverse transcription-pcr assay and sandwich elisa

systematic reviews
meta-analysis of the accuracy of rapid prescreening relative to full screening of pap smears
antenatal screening for postnatal depression: a systematic review
eponyms and the diagnosis of aortic regurgitation: what says the evidence
accuracy of ottawa ankle rules to exclude fractures of the ankle and mid-foot: systematic review
is this woman perimenopausal?
computed tomography and magnetic resonance imaging in staging of uterine cervical carcinoma: a systematic review
screening for dementia
b-type natriuretic peptide: a review of its diagnostic, prognostic, and therapeutic monitoring value in heart failure for primary care physicians
systematic review of the value of positron emission tomography in the diagnosis of alzheimer's disease
does this patient have pulmonary embolism?
the effectiveness of diagnostic tests for the assessment of shoulder pain due to soft tissue disorders: a systematic review
a systematic review of transvaginal ultrasonography, sonohysterography and hysteroscopy for the investigation of abnormal uterine bleeding in premenopausal women
first-trimester prenatal screening for down syndrome and other aneuploidies (agence d'evaluation des technologies et des modes d'intervention en sante (aetmis), montreal, pq, canada)
meta-analysis of eeg test performance shows wide variation among studies
tumor markers in the diagnosis of primary bladder cancer: a systematic review
assessment of clinical utility of f-18-fdg pet in patients with head and neck cancer: a probability analysis
diagnostic value of adenosine deaminase in tuberculous pleural effusion: a meta-analysis
test performance of positron emission tomography and computed tomography for mediastinal staging in patients with non-small-cell lung cancer
results of systematic review of research on diagnosis and treatment of coronary heart disease in women
test characteristics of alpha-fetoprotein for detecting hepatocellular carcinoma in patients with hepatitis c: a meta-analysis
derivation of continuous likelihood ratios for diagnosing pleural fluid exudates
the diagnostic accuracy of computed tomography angiography for traumatic or atherosclerotic lesions of the carotid and vertebral arteries: a systematic review
accuracy of cervical transvaginal sonography in predicting preterm birth: a systematic review
f-18-fdg pet for the diagnosis and grading of soft-tissue sarcoma: a meta-analysis
evaluation of acute knee pain in primary care
assessment of diagnostic tests to inform policy decisions: visual electrodiagnosis
breast cancer diagnosis by scintimammography: a meta-analysis and review of the literature
screening performance of first-trimester nuchal translucency for major cardiac defects: a meta-analysis
imaging in appendicitis: a review with special emphasis on the treatment of women
validity of colposcopy in the diagnosis of early cervical neoplasia: a review
diagnostic accuracy of nucleic acid amplification tests for tuberculous meningitis: a systematic review and meta-analysis
review of the literature on the value of magnetoencephalography in epilepsy
the effectiveness of community-based visual screening and utility of adjunctive diagnostic aids in the early detection of oral cancer
whispered voice test for screening for hearing impairment in adults and children: systematic review
diagnostic impact of signs and symptoms in acute infectious conjunctivitis: systematic literature search
a systematic review and evaluation of tumour markers in paediatric oncology: ewing's sarcoma and neuroblastoma
magnetic resonance cholangiopancreatography: a meta-analysis of test performance in suspected biliary disease
accuracy of computer diagnosis of melanoma: a quantitative meta-analysis
does this child have acute otitis media?
accuracy of physical diagnostic tests for assessing ruptures of the anterior cruciate ligament: a meta-analysis
diagnostic performance of intracardiac echogenic foci for down syndrome: a meta-analysis
evidence assessment of the accuracy of methods of diagnosing middle ear effusion in children with otitis media with effusion
noninvasive staging of non-small cell lung cancer: a review of the current evidence
invasive staging of non-small cell lung cancer: a review of the current evidence
does this patient have acute cholecystitis?
computed tomographic angiography for detecting cerebral aneurysms: implications of aneurysm size distribution for the sensitivity, specificity, and likelihood ratios
screening accuracy for late-life depression in primary care: a systematic review
the accuracy and efficacy of screening tests for chlamydia trachomatis: a systematic review
a family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data
statistical methods in medical research, fourth edition
statistics notes: diagnostic tests 3: receiver operating characteristic plots
receiver-operating characteristic (roc) plots: a fundamental evaluation tool in clinical medicine
semi-parametric roc regression analysis with placement values
smooth semiparametric receiver operating characteristic curves for continuous diagnostic tests
smooth non-parametric receiver operating characteristic (roc) curves for continuous diagnostic tests
academic calculations versus clinical judgments: practicing physicians' use of quantitative measures of test accuracy
towards complete and accurate reporting of studies of diagnostic accuracy: the stard initiative
clinical effectiveness and cost-effectiveness of tests for the diagnosis and investigation of urinary tract infection in children: a systematic review and economic model
forest plots: trying to see the wood and the trees
combining independent studies of a diagnostic test into a summary roc curve: data-analytic approaches and some additional considerations
a hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations
zwinderman ah: bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews
empirical bayes estimates generated in a hierarchical summary roc analysis agreed closely with those of a full bayesian analysis
accuracy of magnetic resonance imaging for the diagnosis of multiple sclerosis: systematic review
a graphical method for exploring heterogeneity in meta-analyses: application to a meta-analysis of 65 trials
a note on graphical presentation of estimated odds ratios from several clinical trials
summing up: the science of reviewing research (cambridge)
the performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed
publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature
sources of variation and bias in studies of diagnostic accuracy: a systematic review
general practitioners' self ratings of skills in evidence based medicine: validation study
interpretation by physicians of clinical laboratory results
probabilistic reasoning in clinical medicine: problems and opportunities
communicating accuracy of tests to general practitioners: a controlled study
overestimation of test effects in clinical judgment
the effect of changing disease risk on clinical reasoning
exploring sources of heterogeneity in systematic reviews of diagnostic tests
statistics notes: diagnostic tests 1: sensitivity and specificity
diagnostic tests 4: likelihood ratios
the diagnostic odds ratio: a single indicator of test performance
limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker
statistics notes: diagnostic tests 2: predictive values

this work was supported by the mrc health services research collaboration. jonathan deeks is funded by a senior research fellowship in evidence synthesis from the department of health.
key: cord-307500-2jwuzfan authors: gray, nicholas; calleja, dominic; wimbush, alex; miralles-dolz, enrique; gray, ander; de-angelis, marco; derrer-merk, elfride; oparaji, bright uchenna; stepanov, vladimir; clearkin, louis; ferson, scott title: "no test is better than a bad test": impact of diagnostic uncertainty in mass testing on the spread of covid-19 date: 2020-04-22 journal: nan doi: 10.1101/2020.04.16.20067884 sha: doc_id: 307500 cord_uid: 2jwuzfan background: the cessation of lock-down measures will require an effective testing strategy. much focus at the beginning of the uk's covid-19 epidemic was directed to deficiencies in the national testing capacity. the quantity of tests may seem an important focus, but other characteristics are likely more germane. false positive tests are more probable than true positive tests when the overall population has a low prevalence of the disease, even with highly accurate tests. methods: we modify an sir model to include quarantine states and test performance, using publicly accessible estimates for the current situation. three scenarios for cessation of lock-down measures are explored: (1) immediate end of lock-down measures, (2) continued lock-down with antibody-testing-based immunity passports, and (3) incremental relaxation of lock-down measures with active viral testing. sensitivity, specificity, prevalence and test capacity are modified for both active viral and antibody testing to determine their population-level effect on the continuing epidemic. findings: diagnostic uncertainty can have a large effect on the epidemic dynamics of covid-19 within the uk. the dynamics of the epidemic are more sensitive to test performance and targeting than to test capacity. the quantity of tests is not a substitute for an effective strategy. poorly targeted testing has the propensity to exacerbate the peak in infections. 
interpretation: the assessment that 'no test is better than a bad test' is broadly supported by the present analysis. antibody testing is unlikely to be a solution to the lock-down, regardless of test quality or capacity. a well-designed active viral testing strategy, combined with incremental relaxation of the lock-down measures, is shown to be a potential strategy to restore some social activity whilst continuing to keep infections low. the uk government's covid-19 epidemic management strategy has been influenced by epidemiological modelling conducted by a number of research groups [1, 2]. the analysis of the relative impact of different mitigation and suppression strategies has influenced the current approach. the "only viable strategy at the current time" is to suppress the epidemic with all available measures, including school closures and social distancing of the entire population [3]. these analyses have highlighted from the beginning that the eventual relaxation of lock-down measures would be problematic [3, 4]. without a considered cessation of the suppression strategies, the risk of a second wave becomes significant, possibly of greater magnitude than the first as the sars-cov-2 virus is now endemic in the population [5, 6]. although much attention has been focused on the number of tests being conducted [7, 8], not enough attention has been given to the issues of imperfect testing. whilst poorly performing tests have not been prominent in public discourse, evidence suggests they are epidemiologically significant. the failure to detect the virus in infected patients can be a significant problem in high-throughput settings operating under severe pressure [9], and evidence suggests that this is indeed the case [10, 11, 12, 13]. everyone seems to agree that testing will be a pillar of whatever approach is employed to relax the current social distancing measures [14]. the public are rapidly becoming aware of the difference between the 'have you got it?' 
tests for detecting active cases, and the 'have you had it?' tests for the presence of antibodies, which imply some immunity to covid-19. what may be less obvious is that these different tests need to maximise different test characteristics. to be useful in ending the current social distancing measures, active viral tests need to maximise sensitivity: how good the test is at telling you that you have the disease. high sensitivity reduces the chance of missing people who have the virus and may go on to infect others. there is an additional risk that an infected person who has been incorrectly told they do not have the disease, when in fact they do, may behave in a more reckless manner than if their disease status were uncertain. the second testing approach, seeking to detect the presence of antibodies to identify those who have had the disease, would be used in a different strategy. this strategy would involve detecting those who have successfully overcome the virus and are likely to have some level of immunity (or at least reduced susceptibility to more serious illness if they are infected again), and so are relatively safe to relax their personal social distancing measures. this strategy would require a high test specificity, aiming to minimise how often the test tells someone they have had the disease when they haven't. a false positive tells people they have immunity when they don't, which is even worse than when people are uncertain about their viral history. introduction to test statistics: what makes a 'good' test? in order to answer this question there are a number of important statistics: sensitivity σ, the fraction of those who actually have the disease that received a positive test result; and specificity τ, the fraction of those who did not have the disease that received a negative test result. the statistics that characterise the performance of the test are computed from a confusion matrix (table 1). 
we test n_infected people who have covid-19, and n_healthy people who do not have covid-19. in the first group, a people correctly test positive and c falsely test negative. among healthy people, b will falsely test positive, and d will correctly test negative. from this confusion matrix the sensitivity is given by (1) and the specificity by (2): σ = a / (a + c) (1), τ = d / (b + d) (2). (this preprint, posted april 22, 2020, was not certified by peer review; the author/funder has granted medrxiv a license to display it in perpetuity under a cc-by-nc-nd 4.0 international license.) sensitivity is the ratio of correct positive tests to the total number of infected people involved in the study characterising the test. the specificity is the ratio of the correct negative tests to the total number of healthy people. importantly, these statistics depend only on the test itself and do not depend on the population the test is intended to be used upon. computing these statistics requires a definitive way to determine the true viral status of a patient: a so-called gold standard. if there is doubt about which column a patient falls in, the confusion matrix cannot be constructed. when novel tests are employed, the confusion matrix can be very challenging to rigorously assess in the midst of a fast-moving epidemic [15, 16, 17]. when the test is used for diagnostic purposes, the characteristics of the population being tested become important for interpreting the test results. to interpret the diagnostic value of a positive or negative test result the following statistics must be used: prevalence p, the proportion of people in the target population that have the disease tested for; positive predictive value (ppv), how likely one is to have the disease given a positive test result; and negative predictive value (npv), how likely one is to not have the disease given a negative test result. 
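the confusion-matrix arithmetic above can be sketched directly; the counts a, b, c, d and the 100-and-100 study sizes below are hypothetical, for illustration only:

```python
def sensitivity(a, c):
    # correct positives / all truly infected people (a + c)
    return a / (a + c)

def specificity(b, d):
    # correct negatives / all truly healthy people (b + d)
    return d / (b + d)

# hypothetical validation study: 100 infected and 100 healthy subjects
a, c = 95, 5    # infected group: 95 true positives, 5 false negatives
b, d = 10, 90   # healthy group: 10 false positives, 90 true negatives
print(sensitivity(a, c))  # 0.95
print(specificity(b, d))  # 0.9
```

note that both numbers are computed within a single column of the confusion matrix, which is why they are properties of the test alone, independent of prevalence.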
the ppv and npv depend on the prevalence, and hence depend on the population of interest. this may be the uk population, a sub-population with covid-19-compatible symptoms, or any other population you may wish to target. the ppv and npv can then be calculated using bayes' rule: ppv = σp / (σp + (1 - τ)(1 - p)), and npv = τ(1 - p) / (τ(1 - p) + (1 - σ)p). to improve the diagnostic performance of tests, they are often repeated to increase the aggregate ppv or npv. to do this, an assumption of independence between the two tests needs to be made. this assumption could be questionable in some circumstances: for instance, if samples are analysed at the same time in the same lab by the same technician, or if the same method for extracting the sample from the patient is employed, the repeat test may be unsuccessful at detecting virus for the same reason. a plethora of other possible errors are imaginable. many of these errors may be truly random and independent, but many may not be, so the independence assumption may be weakly justified. the rapid development and scaling of new diagnostic systems invites error, particularly as labs are converted from other purposes and technicians are placed under pressure, and as variation arises in test collection quality, reagent quality, sample preservation and storage, and sample registration and provenance. assessing the magnitude of these errors on the performance of tests is challenging in real time. point-of-care tests are not immune to these errors and are often seen as less accurate than laboratory-based tests [18, 19]. the prevalence of the disease matters. the ppv can vary drastically for different populations with different prevalence. the idea that prevalence depends on the population may seem counterintuitive to some audiences. for example, if we were to select 100 people from a respiratory ward this week from any hospital in the uk, and 100 people from a street outside the building, what proportion of each population would have covid-19? 
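the bayes' rule calculation above can be written as a minimal sketch (the function names are ours, not the paper's); with σ = τ = 0.95 and p = 0.05 the formula gives a ppv of 0.5, the same near-even split of true and false positives illustrated in figure 1:

```python
def ppv(sens, spec, prev):
    # positive predictive value: P(disease | positive test)
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

def npv(sens, spec, prev):
    # negative predictive value: P(no disease | negative test)
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)

# a 95%-sensitive, 95%-specific test applied at 5% prevalence:
print(round(ppv(0.95, 0.95, 0.05), 2))  # 0.5
print(round(npv(0.95, 0.95, 0.05), 2))  # 1.0
```

the same functions show how quickly the ppv recovers as prevalence rises, which is why targeting the test at a high-prevalence group matters more than marginal gains in sensitivity.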
if one tests both populations with the same test and found positives in each population, which would have the higher ppv? to illustrate the impact of prevalence on ppv: for a test with σ = τ = 0.95, if prevalence p = 0.05 then the ppv ≈ 0.48. figure 1 shows why: for 1000 test subjects there will be similar numbers of true and false positives, even with high sensitivity and specificity of 95%. in contrast, using the same test on a sample with a higher prevalence p = 0.5, we find the ppv = 0.8 (see figure 2). similarly, the npv is lower when the prevalence is higher. the number of active cases exceeding the test capacity may not be the only discrepancy between the true cases and reported cases. the impact of uncertainty in testing may also be contributing to the discrepancy, even in the tested population, and more testing will not reduce this uncertainty. the director of the who suggests that testing is a crucial part of any strategy [20], but even testing the entire country every day would not give an accurate tally of infections. to explore the effect of imperfect testing on the disease dynamics when strategies are employed to relax the current social distancing measures, the sir model described in the supplementary material was modified. three new classes were added to the model: the first is a quarantined susceptible state, q_s; the second is a quarantined infected state, q_i; and the third is a quarantined recovered state, q_r, for people who have recovered but are in quarantine. to model the current lock-down, the model evaluations begin with a majority of the population in the q_s (quarantined but susceptible) state. whilst in this state, the transmission rate of the disease is totally suppressed. 
the model evaluates each day's average population-level state transitions. there are two possible tests that can be performed. the first is an active virus infection test, able to determine whether or not someone is currently infectious; this test is performed on some proportion of the un-quarantined population (s + i + r) and has a sensitivity of σ_a and a specificity of τ_a. the second is an antibody test that determines whether or not someone has had the infection in the past; this is used on the fraction of the population that is currently in quarantine but not infected (q_s + q_r) to test whether they have had the disease, and has a sensitivity of σ_b and a specificity of τ_b. these two tests are used on some of those eligible for testing each day, limited by the test capacities ρ and φ respectively. a person (in any category) who tests positive in an active virus test transitions into the corresponding quarantine state, where they are unable to infect anyone else. a person in q_s or q_r who tests positive in an antibody test transitions to s or r respectively. under this parameterisation, being in the susceptible quarantined state q_s makes an individual insusceptible to being infected; similarly, individuals in the infected quarantined state q_i are unable to infect anyone else. in practice there is always leakage, and no quarantine is entirely effective, but for the sake of exploring the impact of testing uncertainty these effects are neglected from the model. any participation in infection propagation by individuals in either quarantine state is idiosyncratic, and on average is assumed to be negligibly small for the sake of this analysis. if the tests were almost perfect, then we can imagine how the epidemic would die out very quickly under either widespread infection or antibody testing combined with a coherent management strategy. 
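the daily-update dynamics described above can be sketched as follows. this is a minimal illustration, not the paper's calibrated model: the parameter values are illustrative assumptions, and the per-day testable fractions stand in for the capacities ρ and φ. state names follow the text (s, i, r plus the quarantined counterparts q_s, q_i, q_r), and every positive test moves a person between a free state and its quarantined counterpart, so the population total is conserved at each step.

```python
def step(state, beta=0.32, gamma=0.1,
         sens_a=0.9, spec_a=0.9, cap_a=100_000,   # active virus test
         sens_b=0.9, spec_b=0.9, cap_b=100_000):  # antibody test
    S, I, R, Qs, Qi, Qr = state
    n_free = S + I + R                       # un-quarantined population

    # epidemic dynamics among the un-quarantined only
    new_inf = beta * S * I / n_free if n_free else 0.0
    new_rec = gamma * I

    # active virus testing of the free population (capacity-limited)
    f_a = min(1.0, cap_a / n_free) if n_free else 0.0
    s_to_qs = (1 - spec_a) * f_a * S         # false positives quarantined
    i_to_qi = sens_a * f_a * I               # true positives quarantined
    r_to_qr = (1 - spec_a) * f_a * R         # false positives quarantined

    # antibody testing of the uninfected quarantined population
    n_q = Qs + Qr
    f_b = min(1.0, cap_b / n_q) if n_q else 0.0
    qs_to_s = (1 - spec_b) * f_b * Qs        # false positives released
    qr_to_r = sens_b * f_b * Qr              # true positives released

    return (S - new_inf - s_to_qs + qs_to_s,
            I + new_inf - new_rec - i_to_qi,
            R + new_rec - r_to_qr + qr_to_r,
            Qs + s_to_qs - qs_to_s,
            Qi + i_to_qi - gamma * Qi,       # quarantined infected recover
            Qr + r_to_qr + gamma * Qi - qr_to_r)

# start mostly locked down, as in the text
N = 66_000_000
state = (0.034 * N, 0.01 * N, 0.001 * N, 0.95 * N, 0.004 * N, 0.001 * N)
for _ in range(30):
    state = step(state)
```

because every flow appears once as an outflow and once as an inflow, the six states always sum to the initial population, which makes the bookkeeping easy to check when experimenting with test parameters.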
a positive result on the former removes the person from the population; a positive result on the latter means the person, unlikely to contract the disease again, can rejoin the population. more interesting are the effects of incorrect test results on the disease dynamics. if someone falsely tests positive in the antibody test, they enter the susceptible state. similarly, if an infected person receives a false negative for the disease, they remain active in the infected state and hence can continue the disease propagation and infect further people. what part will testing play in relaxing current social distancing measures? in order to explore the possible impact of testing strategies on the relaxation of current social distancing measures, several scenarios have been analysed. these scenarios are illustrative of the type of impact, and the likely efficacy, of a range of different testing configurations. immediate end to social distancing scenario: this baseline scenario is characterised by a sudden relaxation of the current social distancing measures. immunity passports scenario: a policy that has been discussed in the media [21, 22, 17]. analogous to the international certificate of vaccination and prophylaxis, antibody-based testing would be used to identify those who have some level of natural immunity. incremental relaxation scenario: a phased relaxation of the government's social distancing advice is the most likely policy that will be employed. to understand the implications of such an approach, this scenario has explored the effect of testing capacity and test performance on the possible disease dynamics under this type of policy. 
under the model parameterisation, this analysis has applied an incremental transition rate from the q_s state to the s state, and from q_r to r. whilst the authors are sensitive to the sociological and ethical concerns of any of these approaches [23, 24], the analysis presented is purely on the question of efficacy. under the baseline scenario, characterised by the sudden and complete cessation of the current social distancing measures, we explored the impact of infection testing. under this formulation, the initial condition of the model in this scenario is that all of the population in q_s transition to s in the first iteration. as would be expected, the model indicates that a second wave is an inevitability, and as many as 20 million people could become infected within 30 days (figure 4). to illustrate the sensitivity of the model to testing scenarios, an evaluation was conducted with a range of infection test sensitivities, from 50% (i.e. of no diagnostic value) to 98%. the specificity of these tests has a negligible impact on the disease dynamics: a false positive test result would mean people are unnecessarily removed from the susceptible population, but the benefit of this reduction in the susceptible population is negligibly small. it is also very likely that infection testing would be heavily biased toward symptomatic carriers, where the prevalence of the disease is high, so fewer false positives would be expected. two evaluations have been conducted. the first uses the stated government goal of 100,000 tests per day (left graphs in figure 4); it remains unclear whether this aim is feasible, or if this testing capacity would include both forms of tests (antibody and active virus). the second evaluation looks at a very optimistic case in which we could conduct as many as 150,000 tests per day (right graphs in figure 4). the authors draw no conclusions about the feasibility of achieving these levels. 
however, the authors do wish to caution that, with a capacity for testing of the order targeted by the uk government, testing in isolation is not sufficient to allow any rapid cessation of the current social distancing measures without a resurgence of the virus. this caution is irrespective of test performance: even with very good tests of 98% sensitivity and effective isolation of cases that have tested positive, the outcome is broadly invariant. the immunity passport is an idiom describing an approach to the relaxation of the current social distancing measures that focuses heavily on antibody testing. wide-scale screening for antibodies in the general population promises significant scientific value, and targeted antibody testing is likely to have value for reducing risks to nhs and care-sector staff, and other key workers who will need to have close contact with covid-19 sufferers. the authors appreciate these other motivations for the development and roll-out of accurate antibody tests. this analysis, however, focuses on the appropriateness of this approach to relaxing current social distancing measures by mass testing the general population. antibody testing has been described as a 'game-changer' [25], and some commentators believe this could have a significant impact on the relaxation of social distancing measures [22]. much of the discussion around antibody testing in the media has focused on the performance and number of these tests. the efficacy of this strategy, however, is far more dependent on the prevalence of antibodies in the general population. without wide-scale antibody screening it is impossible to know the prevalence of antibodies in the general population, so there is scientific value in such an endeavour. 
however, the prevalence is the dominant factor determining how efficacious antibody screening would be for relaxing social distancing measures. presumably, only people who test positive for antibodies would be allowed to leave quarantine. the more people in the population with antibodies, the more people will get a true positive, so more people would be correctly allowed to leave quarantine (under the paradigm of an immunity passport). the danger of such an approach is the false positives. we demonstrate the impact of people re-entering the susceptible population who have no immunity; we assume their propensity to contract the infection is the same as those who have never had the disease, despite the false sense of security a positive test may engender. on an individual basis, and even at the population level, the behavioural differences between those with false security from a positive antibody test and those who are uncertain about their viral history could be significant; the model parametrisation here does not include this additional confounding effect. to simulate the prevalence of antibodies in the general population, the model is preconditioned with different proportions of the population in the q_s and q_r states. this is analogous to the proportions of people currently in quarantine who have either had the virus and developed some immunity, or who have not contracted the virus and have no immunity. of course, the individuals in these groups do not really know their viral history, and hence would not know which state they were in. each column in figure 5 corresponds to a different antibody test sensitivity, as titled; the specificity for each test in these evaluations was fixed to 90%. 
in figure 6, each column corresponds to a different antibody test specificity, as titled; the sensitivity for each test in these evaluations was fixed to 95%. for all model evaluations in figures 5 and 6, we modelled a continuing and constant ability to conduct targeted active virus testing, which continued to remove individuals from the infected population. the infection testing continued throughout each model run with a fixed capacity of 10,000 tests per day, similar to the number of unique individuals currently being tested. each of the graphs in the two figures shows the effect of different prevalences of antibodies in the population; to be clear, this is the proportion of the population that has contracted the virus and recovered but is in quarantine. sir patrick vallance, the uk government chief scientific adviser, stated in the daily press briefing on 9 april 2020 his belief that this prevalence is likely to be less than 10%, possibly much less. the analysis has explored a range for prevalence from 0.1% to 50%. figure 5 explores the impact of a variation in sensitivity, from a test with 50% sensitivity (i.e. no diagnostic value) to tests with a high sensitivity of 98%. it can be seen, considering the top half of the graphs, that the sensitivity of the test has no discernible impact on the number of infections: the prevalence entirely dominates. this is possibly counterintuitive, but as was discussed in section 2.1, even a highly accurate test produces a very large number of false positives when prevalence is low. in this case, that would mean a large number of people are allowed to re-enter the population, placing them at risk, with a false sense of security that they have immunity. the bottom row of figure 5 shows the proportion of the entire population leaving quarantine over a year of employing this policy. at low prevalence there is no benefit to better-performing tests. this again may seem obscure to many readers. 
if you consider the highest-prevalence simulation, where 50% of the population have immunity, higher-sensitivity tests are of course effective at identifying those who are immune, and get them back into the community much faster. however, this is not the case currently in the uk because, as sir patrick stated, the prevalence of antibodies is likely to be very low, at least during the lock-down. a more concerning story can be seen when considering the graphs in figure 6. now we consider a range of antibody test specificities, going from 50% (no value in ruling people out) to 98%. when the prevalence is low, a lower specificity of 75% not only leads to an initial large increase in the number of infections but also, if employed throughout the year, would lead to repeated peaks. this is because the active virus testing would still be employed alongside the antibody testing. falsely diagnosed susceptible people leaving quarantine lead to a sharp rise in the number of infections. as the prevalence of virus in the non-quarantined population increases, the active virus testing becomes more effective and subdues the rise in infections, because the testing is more targeted on active virus cases. this would be followed by additional waves as further false positive antibody tests are observed. the number of people in quarantine with antibodies declines over the length of the simulation, so naturally the prevalence of immunity in the quarantined population declines; as this prevalence declines, the ppv of the antibody test declines. when we consider the bottom half of figure 6 and look at the impact on the proportion of the population able to leave quarantine, unlike previously, the number of false positives dominates when there is a lower specificity. so there are many more people leaving the quarantine, even when the prevalence is very low (0.1%). 
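the dominance of false positives at low prevalence can be made concrete with a small back-of-the-envelope sketch; the 10,000-person batch and 90% specificity are illustrative assumptions, with sensitivity fixed at 95% as in the figure 6 evaluations:

```python
def quarantine_releases(n_tested, prev, sens, spec):
    # expected composition of antibody-positive results in one batch
    immune = n_tested * prev
    not_immune = n_tested * (1 - prev)
    true_pos = sens * immune             # correctly released
    false_pos = (1 - spec) * not_immune  # released with no immunity
    return true_pos, false_pos

# 0.1% antibody prevalence, 95% sensitivity, 90% specificity
tp, fp = quarantine_releases(10_000, 0.001, 0.95, 0.90)
print(round(tp, 1), round(fp, 1))  # 9.5 999.0
```

so at this prevalence roughly a hundred people with no immunity are released for every person who is genuinely immune, which is why the specificity and the prevalence, not the sensitivity, drive the outcomes in figure 6.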
this may be desirable to some who favour increasing economic and social activity, but it is of course at the cost of further infections. decision makers and the public need to be aware of the trade-off being made. the dangers of neglecting uncertainties in medical diagnostic testing are pertinent to this decision [26], particularly if immunity passports become prominent in the strategy to end the current social distancing measures. at this point, some form of incremental relaxation of the current government social distancing advice seems highly likely. this could take many forms: it could be an incremental restoration of certain activities such as school openings, permission for the reopening of some businesses, the relaxation of the stay-at-home messaging, etc. under the parameterisation chosen for this analysis, the model is not sensitive to any particular policy change. we consider a variety of rates of phased relaxation of the current quarantine. to model these rates, we consider a weekly incremental transition rate from q_s to s, and from q_r to r. in figure 7, three weekly transition rates have been applied: 1%, 5% and 10% of the quarantined population. whilst in practice the rate is unlikely to be uniform, as decision makers would have the ability to update their timetable as the impact of relaxations becomes apparent, it is useful to illustrate the interaction of testing capacity and release rate. the model simulates these rates of transition for a year, with a sensitivity and specificity of 90% for active virus tests. the specifics of all the runs are detailed in table 2 (fixed parameters used for the figure 7 analysis): σ_a = 0.9, τ_a = 0.9, β = 0.32, γ = 0.1; initial population split q_s = 0.95, s = 0.034, q_i = 0.004, i = 0.01, q_r = 0.001, r = 0.001. figure 7 shows five analyses with increasing capacity for the active virus tests; in each, the three incremental transition rates are applied with a range of disease prevalences in the population being tested. 
the ppv, as discussed in section 2.1, has a greater dependence on the prevalence (at lower values) in the tested population than it does on the sensitivity of the tests; the same is true of the specificity and the npv. it is important to notice that higher test capacities cause a higher peak of infections for the 10% quarantine release rate. this has a counterintuitive explanation. when there is the sharpest rise in the susceptible population (i.e., a high rate of transition), the virus rapidly infects a large number of people. when these people recover after two weeks, they become immune and thus cannot continue the spread of the virus. however, when the infection testing is conducted with a higher capacity, up to 120,000 units per day, these tests transition some active viral carriers into quarantine, so the peak is slightly delayed, providing more opportunity for those released from quarantine later to be infected and leading to higher peak infections. this continues until the model reaches effective herd immunity, after which the number of infected in the population decays very quickly. having higher testing capacities delays, but actually worsens, the peak number of infections. at a 10% release rate, up to a testing capacity of 120,000, these outcomes are insensitive to the prevalence of the disease in the tested population. this analysis indicates that the relatively fast cessation of social distancing measures and stay-home advice would lead to a large resurgence of the virus; testing capacity of the magnitude stated as the goal of the uk government would not be sufficient to flatten the curve in this scenario. at the rate of 5% of the population in lock-down released incrementally each week, the infection peak is suppressed compared to the 10% rate. the number of infections would remain around this level for a significantly longer period of time, up to 6 months. there is negligible impact of testing below a capacity of 50,000 tests. 
however, if the test capacity were 80,000 tests, at a quarantine release rate of 5% the duration of the elevated levels of infections would be reduced, shortening the length of necessary wide-scale social distancing. this effect is only observed with the more targeted tests, where the prevalence of the disease in the targeted population is over 30%. any less well targeted testing would have a negligible impact compared to the untested scenario. the 1% release rate scenario indicates that a slow release by itself is sufficient to lower peak infections, but extends the duration of elevated infections. the first graph of the top row in figure 7 shows that the slow release rate causes a plateau at a significantly lower number of infections compared to the other release rates. poorly targeted testing at capacities of less than 100,000 shows similarly consistent levels of infections. however, with a targeted test having a prevalence of 30% or more, the 1% release rate indicates that even with 50,000 tests per day continuous suppression of the infection may be possible. this analysis does support the assertion that a bad test is worse than no tests, but a good test is only effective as part of a carefully designed strategy. more is not necessarily better, and overestimation of the test accuracy could be extremely detrimental. this analysis is not a prediction; the numbers used are estimates, and therefore, when such policies are devised and implemented, this analysis would need to be repeated with more up-to-date numerical values. as such, the authors are not drawing firm conclusions about the absolute necessary capacity of tests. nor do they wish to make specific statements about the necessary sensitivity or specificity of tests, or the recommended rate of release from quarantine.
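the qualitative effect of the release rate can be reproduced with a toy deterministic version of the quarantine model (initial values from table 2). this simplification omits testing and stochasticity, so it is a sketch of the release dynamics, not the authors' binomial model:

```python
def run(release_per_week, days=365, beta=0.32, gamma=0.1):
    """toy deterministic quarantine-release dynamics: each day a fixed
    fraction of quarantined susceptibles (and recovered) is moved back
    into circulation; infection then follows standard sir mixing.
    initial values follow table 2; all quantities are population
    fractions. returns the peak infected fraction."""
    s, i, r = 0.034, 0.010, 0.001          # circulating at t = 0
    q_s, q_i, q_r = 0.95, 0.004, 0.001     # quarantined at t = 0
    rate = release_per_week / 7.0          # daily release fraction
    peak = i
    for _ in range(days):
        new_inf = beta * s * i             # new infections (euler step)
        rec_i, rec_qi = gamma * i, gamma * q_i
        s += rate * q_s - new_inf
        q_s -= rate * q_s
        i += new_inf - rec_i
        r += rec_i + rate * q_r
        q_r += rec_qi - rate * q_r
        q_i -= rec_qi
        peak = max(peak, i)
    return peak

# faster relaxation of quarantine produces a substantially higher peak
print(run(0.10), run(0.05), run(0.01))
```

with these parameters the 10% weekly release produces a far higher infection peak than the 1% release, mirroring the ordering seen in figure 7.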
the authors do, however, propose some conclusions that would broadly apply to the present situation, and therefore believe they should be considered by policy makers when designing strategies to tackle covid-19. (this preprint is made available under a cc-by-nc-nd 4.0 international license; the copyright holder is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. this version was posted april 22, 2020.) diagnostic uncertainty can have a large effect on the epidemic dynamics of covid-19 within the uk, and sensitivity, specificity, and the capacity for testing alone are not sufficient to design effective testing procedures. great caution should be exercised in the use of antibody testing. under the assumption that the proportion of people in the uk who have had the virus is still low, it is unlikely that antibody testing at any scale will significantly support the end of lock-down measures, and the negative consequences of un-targeted antibody screening at the population level could cause more harm than good. antibody testing with a high specificity may be very useful on an individual basis; it certainly has scientific value, and could reduce risk for key workers. but any belief that these tests would be useful to relax lock-down measures for the majority of the population is misguided. at best it is a distraction, at worst it could be dangerous. incremental relaxation of lock-down measures would, with all else equal, significantly dampen the increase in peak infections: by one order of magnitude with a faster relaxation, and two orders of magnitude with a slower relaxation.
the capacity for infection screening needs to be significantly increased if it is to be used to relax quarantine measures, but only if it is well targeted, for example through effective contact tracing. untargeted mass screening would be ineffectual and may prolong the necessary implementation of lock-down measures. one interpretation of these results is that countries that had mass testing regimes early in the pandemic but much lower case fatality rates may have been reporting a large number of false positives. the results of this paper may explain what is being observed in nations such as singapore which, continuing to employ less-targeted mass testing after a rapid cessation of their lock-down measures, are now experiencing a second peak in infections [27]. this work has been partially funded by the epsrc iaa exploration award with grant number ep/r511729/1, the epsrc programme grant "digital twins for improved dynamic design", ep/r006768/1, and the epsrc and esrc centre for doctoral training in quantification and management of risk and uncertainty in complex systems and environments, ep/l015927/1.

a generic sir model

sir models offer one approach to explore infection dynamics and the prevalence of a communicable disease. in the generic sir model, there are s people susceptible to the illness, i people infected, and r people who are recovered with immunity. the infected people are able to infect susceptible people at rate β, and they recover from the disease at rate γ [28]. once infected persons have recovered from the disease they are unable to become infected again or to infect others; this may be because they now have immunity to the disease, or because they have unfortunately died. figure 8 shows a schematic of the generic model formulation and how people move between the states.
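the generic model can be stated in a few lines. this is the standard textbook sir with the paper's β = 0.32 and γ = 0.1 as defaults; the euler time-stepping and function name are our own illustrative choices:

```python
def sir(beta=0.32, gamma=0.1, s=0.99, i=0.01, days=365, dt=0.1):
    """minimal deterministic sir model; s, i, r are population
    fractions. beta is the infection rate and gamma the recovery
    rate, as in the generic formulation described above. returns the
    final state and the peak infected fraction."""
    r, peak = 0.0, i
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt   # s -> i
        rec = gamma * i * dt          # i -> r
        s, i, r = s - new_inf, i + new_inf - rec, r + rec
        peak = max(peak, i)
    return s, i, r, peak

s_end, i_end, r_end, peak = sir()
print(f"peak infected fraction: {peak:.3f}, final susceptible: {s_end:.3f}")
```

the infection peak occurs when s falls to γ/β (about 0.31 here); beyond that point each infection replaces itself less than once and the curve turns over.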
figure 9 demonstrates the typical disease dynamics, the infected corresponding to the now well-known curve that we are trying to flatten. the sir model has two ways in which the number of new infections falls to zero: either the number of susceptible people reduces to a point at which the disease can no longer propagate, perhaps because of a vaccine or natural immunity, or the epidemic stops because the basic reproduction rate of the disease falls below 1 due to social distancing or effective viral suppression. (medrxiv preprint, https://doi.org/10.1101/2020.04.16.20067884, not certified by peer review.)

b binomial sir model

the sir model used in this paper uses discrete-time binomial sampling for calculating movements of individuals between states. for a defined testing strategy, with an active virus test having sensitivity, specificity and capacity of σ_a, τ_a and c_a respectively, an antibody test with sensitivity, specificity and capacity σ_b, τ_b and c_b respectively, and a testing prevalence of p, these rates are defined as follows:

n_a = min(c_a, bin(s + i, ρ)),        (6a)
n_b = min(c_b, bin(q_s + q_r, φ)),    (6b)
t_i = min(n_a p, i p),                (6c)
t_s = min(s, n_a - t_i),              (6d)

at each time step t, the model calculates the number of persons moving between each state in the order defined above. the use of a binomial model was prompted by a desire to incorporate both aleatory and epistemic uncertainty in each movement. the current approach does not make use of epistemic uncertainty, fixing the model parameters σ_a, τ_a, σ_b, τ_b, φ, ρ, c_a, c_b and p. a discrete-time model was selected to allow for comparisons against available published data detailing recorded cases and recoveries on a day-by-day basis.
references

1. covid-19: government announces moving out of contain phase and into delay phase.
2. scaling up our testing programmes. department of health and social care.
3. impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand. imperial college covid-19 response team.
4. pm address to the nation on coronavirus.
5. the effect of control strategies to reduce social mixing on outcomes of the covid-19 epidemic in wuhan, china: a modelling study. the lancet public health.
6. first-wave covid-19 transmissibility and severity in china outside hubei after control measures, and second-wave scenario planning: a modelling impact assessment. the lancet.
7. offline: covid-19 and the nhs - "a national scandal". the lancet.
8. offline: covid-19 - bewilderment and candour. the lancet.
9. coronavirus and the race to distribute reliable diagnostics.
10. why the coronavirus test gives so many false negatives.
11. difficulties in false negative diagnosis of coronavirus disease 2019: a case report.
12. assume you have the illness, even if you test negative.
13. are coronavirus tests accurate? world report.
14. developing antibody tests for sars-cov-2.
15. labs scramble to produce new coronavirus diagnostics.
16. medical product alert no. 3/2020: falsified medical products, including in vitro diagnostics, that claim to prevent, detect, treat or cure covid-19.
17. britain has millions of coronavirus antibody tests, but they don't work; 2020.
18. molecular and antibody point-of-care tests to support the screening, diagnosis and monitoring of covid-19. oxford covid-19 evidence service.
19. point-of-care versus lab-based testing: striking a balance.
20. who director-general's opening remarks at the media briefing on covid-19 -16.
21. why it's too early to start giving out "immunity passports"; 2020.
22. 'immunity passports' could speed up return to work after covid-19.
23. the covid-19 coronavirus is now a pandemic - can we ethically deal with lockdowns?
24. we cannot leave our coronavirus exit strategy to the experts.
25. boris johnson and donald trump talk up potential 'game-changer' scientific advances on coronavirus.
26. calculated risks: how to know when numbers deceive you.
27. singapore coronavirus surge raises fears of post-lockdown breakouts.

key: cord-296306-xcomjvaa authors: rivett, lucy; sridhar, sushmita; sparkes, dominic; routledge, matthew; jones, nick k; forrest, sally; young, jamie; pereira-dias, joana; hamilton, william l; ferris, mark; torok, m estee; meredith, luke; curran, martin d; fuller, stewart; chaudhry, afzal; shaw, ashley; samworth, richard j; bradley, john r; dougan, gordon; smith, kenneth gc; lehner, paul j; matheson, nicholas j; wright, giles; goodfellow, ian g; baker, stephen; weekes, michael p title: screening of healthcare workers for sars-cov-2 highlights the role of asymptomatic carriage in covid-19 transmission date: 2020-05-11 journal: elife doi: 10.7554/elife.58728 sha: doc_id: 296306 cord_uid: xcomjvaa

significant differences exist in the availability of healthcare worker (hcw) sars-cov-2 testing between countries, and existing programmes focus on screening symptomatic rather than asymptomatic staff. over a 3-week period (april 2020), 1032 asymptomatic hcws were screened for sars-cov-2 in a large uk teaching hospital. symptomatic staff and symptomatic household contacts were additionally tested. real-time rt-pcr was used to detect viral rna from a throat+nose self-swab. 3% of hcws in the asymptomatic screening group tested positive for sars-cov-2. 17/30 (57%) were truly asymptomatic/pauci-symptomatic.
12/30 (40%) had experienced symptoms compatible with coronavirus disease 2019 (covid-19) >7 days prior to testing, most having self-isolated and returned once well. clusters of hcw infection were discovered on two independent wards. viral genome sequencing showed that the majority of hcws had the dominant lineage b.1. our data demonstrate the utility of comprehensive screening of hcws with minimal or no symptoms. this approach will be critical for protecting patients and hospital staff. despite the world health organisation (who) advocating widespread testing for sars-cov-2, national capacities for implementation have diverged considerably (who, 2020b; our world in data, 2020). in the uk, the strategy has been to perform sars-cov-2 testing for essential workers who are symptomatic themselves or have symptomatic household contacts. this approach has been exemplified by recent studies of symptomatic hcws (hunter et al., 2020; keeley et al., 2020). the role of nosocomial transmission of sars-cov-2 is becoming increasingly recognised, accounting for 12-29% of cases in some reports. importantly, data suggest that the severity and mortality risk of nosocomial transmission may be greater than for community-acquired covid-19 (mcmichael et al., 2020). protection of hcws and their families from the acquisition of covid-19 in hospitals is paramount, and underscored by rising numbers of hcw deaths nationally and internationally (cook et al., 2020; cdc covid-19 response team, 2020). in previous epidemics, hcw screening programmes have boosted morale, decreased absenteeism and potentially reduced long-term psychological sequelae (mcalonan et al., 2007). screening also allows earlier return to work when individuals or their family members test negative (hunter et al., 2020; keeley et al., 2020).
another major consideration is the protection of vulnerable patients from a potentially infectious workforce (mcmichael et al., 2020), particularly as social distancing is not possible whilst caring for patients. early identification and isolation of infectious hcws may help prevent onward transmission to patients and colleagues, and targeted infection prevention and control measures may reduce the risk of healthcare-associated outbreaks. the clinical presentation of covid-19 can include minimal or no symptoms (who, 2020a). asymptomatic or pre-symptomatic transmission is clearly reported and is estimated to account for around half of all cases of covid-19 (he et al., 2020). screening approaches focussed solely on symptomatic hcws are therefore unlikely to be adequate for suppression of nosocomial spread. preliminary data suggest that mass screening and isolation of asymptomatic individuals can be an effective method for halting transmission in community-based settings (day, 2020). recent modelling has suggested that weekly testing of asymptomatic hcws could reduce onward transmission by 16-23%, on top of isolation based on symptoms, provided results are available within 24 hr (imperial college covid-19 response team, 2020). the need for widespread adoption of an expanded screening programme for asymptomatic as well as symptomatic hcws is apparent (imperial college covid-19 response team, 2020; black et al., 2020; gandhi et al., 2020). challenges to the roll-out of an expanded screening programme include the ability to increase diagnostic testing capacity, logistical issues affecting sampling and turnaround times, and concerns about workforce depletion should substantial numbers of staff test positive. here, we describe how we have dealt with these challenges and present initial findings from a comprehensive staff screening programme at cambridge university hospitals nhs foundation trust (cuhnft).
this has included systematic screening of >1000 asymptomatic hcws in their workplace, in addition to >200 symptomatic staff or household contacts. screening was performed using a validated real-time reverse transcription pcr (rt-pcr) assay detecting sars-cov-2 from combined oropharyngeal (op) and nasopharyngeal (np) swabs (sridhar et al., 2020). rapid viral sequencing of positive samples was used to further assess potential epidemiological linkage where nosocomial transmission was suspected. our experience highlights the value of programmes targeting both symptomatic and asymptomatic staff, and will be informative for the establishment of similar programmes in the uk and globally. between 6th and 24th april 2020, 1,270 hcws in cuhnft and their symptomatic household contacts were swabbed and tested for sars-cov-2 by real-time rt-pcr. the median age of the hcws was 34 years; 71% were female and 29% male. the technical rt-pcr failure rate was 2/1,270 (0.2%; see materials and methods); these were excluded from the 'tested' population for further analysis. ultimately, 5% (n = 61) of swabs were sars-cov-2 positive. 21 individuals underwent repeat testing for a variety of reasons, including evolving symptoms (n = 3) and scoring 'medium' probability on clinical covid-19 criteria (tables 1-2) (n = 11). all remained sars-cov-2 negative. turnaround time from sample collection to result was 12-36 hr; this varied according to the time samples were obtained. table 3 outlines the total number of sars-cov-2 tests performed in each screening group (hcw asymptomatic, hcw symptomatic, and hcw symptomatic household contact), categorised according to the ward with the highest anticipated risk of exposure ('red', high; 'amber', medium; 'green', low). in total, 31/1,032 (3%) of those tested in the hcw asymptomatic screening group tested sars-cov-2 positive. in comparison, 30/221 (14%) tested positive when the hcw symptomatic and hcw symptomatic household contact screening groups were combined.
as expected, symptomatic hcws and their household contacts were significantly more likely to test positive than hcws from the asymptomatic screening group (p<0.0001, fisher's exact test). hcws working in 'red' or 'amber' wards were significantly more likely to test positive than those working in 'green' wards (p=0.0042, fisher's exact test). all users of ffp3 masks underwent routine fit-testing prior to usage. cleaning and re-use of masks, theatre caps, gloves, aprons or gowns was actively discouraged. cleaning and re-use of eye protection was permitted for certain types of goggles and visors, as specified in the hospital's ppe protocol. single-use eye protection was in use in most scenario 1 and 2 areas, and was not cleaned and re-used. all non-invasive ventilation or use of high-flow nasal oxygen on laboratory-confirmed or clinically suspected covid-19 patients was performed in negative-pressure (-5 pascals) side rooms, with 10 air changes per hour and use of scenario 2 ppe. all other aerosol-generating procedures were undertaken with scenario 2 ppe precautions, in negative- or neutral-pressure facilities. general clinical areas underwent a minimum of 6 air changes per hour, but all critical care areas underwent a minimum of 10 air changes per hour as a matter of routine. surgical operating theatres routinely underwent a minimum of 25 air changes per hour.

elife digest

patients admitted to nhs hospitals are now routinely screened for sars-cov-2 (the virus that causes covid-19), and isolated from other patients if necessary. yet healthcare workers, including frontline patient-facing staff such as doctors, nurses and physiotherapists, are only tested and excluded from work if they develop symptoms of the illness. however, there is emerging evidence that many people infected with sars-cov-2 never develop significant symptoms: these people will therefore be missed by 'symptomatic-only' testing. there is also important data showing that around half of all transmissions of sars-cov-2 happen before the infected individual even develops symptoms. this means that much broader testing programmes are required to spot people when they are most infectious. rivett, sridhar, sparkes, routledge et al. set out to determine what proportion of healthcare workers was infected with sars-cov-2 while also feeling generally healthy at the time of testing. over 1,000 staff members at a large uk hospital who felt they were well enough to work, and did not fit the government criteria for covid-19 infection, were tested. amongst these, 3% were positive for sars-cov-2. on closer questioning, around one in five reported no symptoms, two in five very mild symptoms that they had dismissed as inconsequential, and a further two in five reported covid-19 symptoms that had stopped more than a week previously. in parallel, healthcare workers with symptoms of covid-19 (and their household contacts) who were self-isolating were also tested, in order to allow those without the virus to quickly return to work and bolster a stretched workforce. finally, the rates of infection were examined to probe how the virus could have spread through the hospital and among staff, and in particular to understand whether rates of infection were greater among staff working in areas devoted to covid-19 patients. despite wearing appropriate personal protective equipment, healthcare workers in these areas were almost three times more likely to test positive than those working in areas without covid-19 patients. however, it is not clear whether this genuinely reflects greater rates of patients passing the infection to staff. staff may give the virus to each other, or even acquire it at home. overall, this work implies that hospitals need to be vigilant and introduce broad screening programmes across their workforces. it will be vital to establish such approaches before 'lockdown' is fully lifted, so healthcare institutions are prepared for any second peak of infections.
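the reported comparisons are two-sided fisher's exact tests on 2x2 tables, which can be reproduced with stdlib python. the implementation below uses the common "sum of table probabilities no larger than the observed one" definition of the two-sided p-value, as used by most statistics packages:

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """two-sided fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    def p(x):  # probability of a table with x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
    p_obs = p(a)
    lo, hi = max(0, c1 - r2), min(c1, r1)
    return sum(p(x) for x in range(lo, hi + 1) if p(x) <= p_obs * (1 + 1e-9))

# asymptomatic screening group (31/1032 positive) vs the combined
# symptomatic groups (30/221 positive): p < 0.0001, as reported
print(fisher_two_sided(31, 1032 - 31, 30, 221 - 30))

# 'green'-ward hcws (6/310) vs 'red'-ward hcws (19/372) in the
# asymptomatic programme: reported as p = 0.0389
print(fisher_two_sided(6, 310 - 6, 19, 372 - 19))
```

python's arbitrary-precision integers make the binomial coefficients exact, so no special numerical care is needed even for the 1,253-person table.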
viral loads varied between individuals, potentially reflecting the nature of the sampling site. however, for individuals testing positive for sars-cov-2, viral loads were significantly lower for those in the hcw asymptomatic screening group than in those tested due to the presence of symptoms (figure 1). for the hcw symptomatic and hcw symptomatic contact screening groups, viral loads did not correlate with duration of symptoms or with clinical criteria risk score (figure 1, figure supplement 1, and data not shown).

three subgroups of sars-cov-2 positive asymptomatic hcws

each individual in the hcw asymptomatic screening group was contacted by telephone to establish a clinical history, and covid-19 probability criteria (table 1) were retrospectively applied to categorise any symptoms in the month prior to testing (figure 2). one hcw could not be contacted to obtain further history. individuals captured by the hcw asymptomatic screening group were generally asymptomatic at the time of screening, but could be divided into three subgroups: (i) hcws with no symptoms at all, (ii) hcws with (chiefly low-to-medium covid-19 probability) symptoms commencing ≤7 days prior to screening, and (iii) hcws with (typically high covid-19 probability) symptoms commencing >7 days prior to screening (figure 2). 9/12 (75%) individuals with symptom onset >7 days previously had appropriately self-isolated and then returned to work. one individual with no symptoms at the time of swabbing subsequently developed symptoms prior to being contacted with their positive result. overall, 5/1032 (0.5%) individuals in the asymptomatic screening group were identified as truly asymptomatic carriers of sars-cov-2, and 1/1032 (0.1%) was identified as pre-symptomatic. box 1 shows illustrative clinical vignettes. for the hcw asymptomatic screening group, nineteen wards were identified for systematic priority screening as part of hospital-wide surveillance.
two further areas were specifically targeted for screening due to unusually high staff sickness rates (ward f), or concerns about appropriate ppe usage (ward q) (figure 3). interestingly, in line with findings in the total hcw population, a significantly greater proportion of hcws working on 'red' wards than on 'green' wards tested positive as part of the asymptomatic screening programme ('green' 6/310 vs 'red' 19/372; p=0.0389, fisher's exact test). the proportion of hcws with a positive test was significantly higher on ward f than on other wards categorised as 'green' clinical areas (ward f 4/43 vs other 'green' wards 2/267; p=0.0040, fisher's exact test). likewise, amongst wards in the 'red' areas, ward q showed significantly higher rates of positive hcw test results (ward q 7/37 vs other 'red' wards 12/335; p=0.0011, fisher's exact test). ward f is an elderly care ward, designated as a 'green' area with scenario 0 ppe (tables 4-5), with a high proportion of covid-19-vulnerable patients due to age and comorbidity. 4/43 (9%) ward staff tested positive for sars-cov-2. in addition, two staff members on this ward tested positive in the hcw symptomatic/symptomatic contact screening groups. all positive hcws were requested to self-isolate, and the ward was closed to admissions and escalated to scenario 1 ppe (table 5). reactive screening of a further 18 ward f staff identified an additional three positive asymptomatic hcws (figure 4). sequence analysis indicated that 6/9 samples from hcws who worked on ward f belonged to sars-cov-2 lineage b.1 (currently known to be circulating in at least 43 countries [rambaut et al., 2020]), with a further two belonging to b.1.7 and one to b.2.1. this suggests more than two introductions of sars-cov-2 into the hcw population on ward f (figure 4, figure supplements 1-2, table 6).
it was subsequently found that two further staff members from ward f had previously been admitted to hospital with severe covid-19 infection. ward q is a general medical ward designated as a 'red' clinical area for the care of covid-19 positive patients, with a scenario 1 ppe protocol (tables 4-5). here, 7/37 (19%) ward staff tested positive for sars-cov-2. in addition, one staff member tested positive as part of the hcw symptomatic screening group within the same period as ward surveillance. reactive screening of a further five staff working on ward q uncovered one additional infection. 4/4 sequenced viruses were of the b.1 lineage (figure 4, figure supplements 1-2, table 6; other isolates could not be sequenced due to a sample ct value >30). all positive hcws were requested to self-isolate, and infection control and ppe reviews were undertaken to ensure that environmental cleaning and ppe donning/doffing practices were compliant with hospital protocol. staff training and education were provided to address observed instances of incorrect infection control or ppe practice. ward o, a 'red' medical ward, had similar numbers of asymptomatic hcws screened as ward f, and a similar positivity rate (4/44; 9%). this ward was listed for further cluster investigation after the study ended; however, incorrect ppe usage was not noted during the study period. the majority of individuals who tested positive for sars-cov-2 after screening due to the presence of symptoms had high covid-19 probability (table 7). this reflects national guidance regarding self-isolation at the time of our study (uk government, 2020a). through the rapid establishment of an expanded hcw sars-cov-2 screening programme, we discovered that 31/1,032 (3%) of hcws tested positive for sars-cov-2 in the absence of symptoms. of 30 individuals from this asymptomatic screening group studied in more depth, 6/30 (20%) had not experienced any symptoms at the time of their test.
1/6 became symptomatic, suggesting that the true asymptomatic carriage rate was 5/1,032 (0.5%). 11/30 (37%) had experienced mild symptoms prior to testing. whilst temporally associated, it cannot be assumed that these symptoms necessarily resulted from covid-19. these proportions are difficult to contextualise due to a paucity of point-prevalence data from asymptomatic individuals in similar healthcare settings or the wider community. for contrast, 60% of asymptomatic residents in a recent study tested positive in the midst of a care home outbreak (arons et al., 2020). regardless of the proportion, however, many secondary and tertiary hospital-acquired infections were undoubtedly prevented by identifying and isolating these sars-cov-2 positive hcws. 12/30 (40%) individuals from the hcw asymptomatic screening group reported symptoms >7 days prior to testing, and the majority experiencing symptoms consistent with a high probability of covid-19 had appropriately self-isolated during that period.

table 4. the hospital's traffic-light colouring system for categorising wards according to anticipated covid-19 exposure risk. different types of ppe were used in each (table 5).
red (high risk): areas with confirmed sars-cov-2 rt-pcr positive patients, or patients with very high clinical suspicion of covid-19.
amber (medium risk): areas with patients awaiting sars-cov-2 rt-pcr test results, or that have been exposed and may be incubating infection.
green (low risk): areas with no known sars-cov-2 rt-pcr positive patients, and none with clinically suspected covid-19.
[table 5 fragment] 'amber' and 'red' wards include, for example, intensive care units and respiratory units with non-invasive ventilation facilities, plus all operating theatres, including facilities for bronchoscopy and endoscopy.
patients with covid-19 can remain sars-cov-2 pcr positive for a median of 20 days (iqr 17-24) after symptom onset (zhou et al., 2020), and the limited data available suggest viable virus is not shed beyond eight days (wölfel et al., 2020). a pragmatic approach was taken to allowing individuals to remain at work where the hcw had experienced high-probability symptoms starting more than 7 days but less than 1 month prior to their test, and had been well for the preceding 48 hr. this approach was based on the following: the low seasonal incidence of alternative viral causes of high covid-19 probability symptoms in the uk (public health england, 2018), the high potential for sars-cov-2 exposure during the pandemic, and the potential for prolonged, non-infectious shedding of viral rna (zhou et al., 2020; wölfel et al., 2020). for other individuals, we applied standard national guidelines requiring isolation for seven days from the point of testing (uk government, 2020b). however, for hcws developing symptoms after a positive swab, isolation was extended to seven days from symptom onset. our data clearly demonstrate that focusing solely on the testing of individuals fitting a strict clinical case definition for covid-19 will inevitably miss asymptomatic and pauci-symptomatic disease. this is of particular importance in the presence of falling numbers of community covid-19 cases, as hospitals will become potential epicentres of local outbreaks. therefore, we suggest that in the setting of limited testing capacity, a high priority should be given to a reactive asymptomatic screening programme that responds in real time to hcw sickness trends, or (to add precision) to the incidence of positive tests by area. the value of this approach is illustrated by our detection of a cluster of cases on ward f, where the potential for uncontrolled staff-to-staff or staff-to-patient transmission could have led to substantial morbidity and mortality in a particularly vulnerable patient group.
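the pragmatic return-to-work rules described above amount to a small decision procedure. the sketch below encodes our reading of them; the function name, argument names, return strings and the treatment of "within 1 month" as ≤30 days are hypothetical, illustrative choices, not the authors' protocol code:

```python
def isolation_advice(high_prob_symptoms, days_since_onset, well_for_48h,
                     symptoms_after_positive_swab=False):
    """sketch of the return-to-work logic for a hcw testing positive:
    symptoms arising after the positive swab restart the clock; high
    covid-19 probability symptoms that began >7 days but <=1 month ago,
    with 48 hr of wellness, permit remaining at work; everything else
    gets 7 days of isolation from the test."""
    if symptoms_after_positive_swab:
        return "isolate for 7 days from symptom onset"
    if (high_prob_symptoms and days_since_onset is not None
            and 7 < days_since_onset <= 30 and well_for_48h):
        return "may remain at work"
    return "isolate for 7 days from the test"

print(isolation_advice(True, 10, True))
```

encoding such triage logic as a pure function makes each branch auditable against the written policy, which matters when the policy itself is being revised mid-pandemic.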
as sars-cov-2 testing capacity increases, rolling programmes of serial screening for asymptomatic staff in all box 1. clinical vignettes. self-isolation instructions were as described in table 2 . case 1: completely asymptomatic. hcw1 had recently worked on four wards (two 'green', two 'amber'). upon testing positive, she reported no symptoms over the preceding three weeks, and was requested to go home and self-isolate immediately. hcw1 lived with her partner who had no suggestive symptoms. upon follow-up telephone consultation 14 days after the test, hcw1 had not developed any significant symptoms, suggesting true asymptomatic infection. case 2: pre-symptomatic. hcw2 was swabbed whilst asymptomatic, testing positive. when telephoned with the result, she reported a cough, fever and headache starting within the last 24 hr and was advised to self-isolate from the time of onset of symptoms ( table 2) . her partner, also a hcw, was symptomatic and had been confirmed as sars-cov-2 positive 2 days previously, suggesting likely transmission of infection to hcw2. case 3: low clinical probability of covid hcw3 developed mild self-limiting pharyngitis three days prior to screening and continued to work in the absence of cough or fever. she had been working in' green' areas of the hospital, due to a background history of asthma. self-isolation commenced from the time of the positive test. hcw3's only contact outside the hospital, her housemate, was well. on follow-up telephone consultation, hcw3's mild symptoms had fully resolved, with no development of fever or persistent cough, suggesting pauci-symptomatic infection. case 4: medium clinical probability of covid hcw4 experienced anosmia, nausea and headache three days prior to screening, and continued to work in the absence of cough or fever. self-isolation commenced from the time of the positive test. 
one son had experienced a mild cough ~3 weeks prior to hcw4's test; however, her partner and other son were completely asymptomatic. upon follow-up telephone consultation 10 days after the test, hcw4's mild symptoms had not progressed, but had not yet resolved. case 5: high clinical probability of covid-19. hcw5 had previously self-isolated, and did not repeat this in the presence of new high-probability symptoms six days before screening. self-isolation commenced from the date of the new symptoms, with the caveat that she should be completely well for 48 hr prior to return to work. all household contacts were well. however, another close colleague working on the same ward had also tested positive, suggesting potential transmission between hcws on that ward. the utility of this approach in care-homes and other essential institutions should also be explored, as should serial screening of long-term inpatients. the early success of our programme relied upon substantial collaborative efforts between a diverse range of local stakeholders. similar collaborations will likely play a key role in the rapid, de novo development of comprehensive screening programmes elsewhere. the full benefits of enhanced hcw screening are critically dependent upon rapid availability of results. a key success of our programme has been bespoke optimisation of sampling and laboratory workflows enabling same-day reporting of results, whilst minimising disruption to hospital processes by avoiding travel to off-site testing facilities. rapid turnaround for testing and sequencing is vital in enabling timely response to localised infection clusters, as is the maintenance of reserve capacity to allow urgent, reactive investigations. there appeared to be a significantly higher incidence of hcw infections in 'red' compared to 'green' wards.
many explanations for this observation exist, and this study cannot differentiate between them. possible explanations include transmission between patients and hcw, hcw-to-hcw transmission, variability of staff exposure outside the workplace and non-random selection of wards. it is also possible that, even over the three weeks of the study, 'red' wards were sampled earlier during the evolution of the epidemic when transmission was greater. further research into these findings is clearly needed on a larger scale. furthermore, given the clear potential for pre-symptomatic and asymptomatic transmission amongst hcws, and data suggesting that infectivity may peak prior to symptom onset (he et al., 2020), there is a strong argument for basic ppe provision in all clinical areas. the identification of transmission within the hospital through routine data is problematic. hospitals are not closed systems and are subject to numerous external sources of infection. coronaviruses generally have very low mutation rates (~10⁻⁶ per site per cycle) (sanjuán et al., 2010), with the first reported sequence of the current pandemic only published on 12th january 2020 (genbank, 2020). in addition, given sars-cov-2 was only introduced into the human population in late 2019, there is at present a lack of diversity in circulating strains. however, as the pandemic unfolds and detailed epidemiological and genome sequence data from patient and hcw clusters are generated, real-time study of transmission dynamics will become an increasingly important means of informing disease control responses and rapidly confirming (or refuting) hospital acquired infection. importantly, implementation of such a programme would require active screening and rapid sequencing of positive cases in both the hcw and patient populations.
prospective epidemiological data will also inform whether hospital staff are more likely to be infected in the community or at work, and may identify risk factors for the acquisition of infection, such as congregation in communal staff areas or inadequate access to ppe. our study is limited by the relatively short time-frame, a small number of positive tests and a lack of behavioural data. in particular, the absence of detailed workplace and community epidemiological data makes it difficult to draw firm conclusions with regards to hospital transmission dynamics. the low rate of observed positive tests may be partly explained by low rates of infection in the east of england in comparison with other areas of the uk (cumulative incidence 0.17%, thus far) (public health england, 2020). the long-term benefits of hcw screening on healthcare systems will be informed by sustained longitudinal sampling of staff in multiple locations. more comprehensive data will parametrise workforce depletion and covid-19 transmission models. the incorporation of additional information including staffing levels, absenteeism, and changes in proportions of staff self-isolating before and after the introduction of widespread testing will better inform the impact of screening at a national and international level. such models will be critical for optimising the impact on occupationally-acquired covid-19, and reducing the likelihood that hospitals become hubs for sustained covid-19 transmission. in the absence of an efficacious vaccine, additional waves of covid-19 are likely as social distancing rules are relaxed. understanding how to limit hospital transmission will be vital in determining infection control policy, and will retain its relevance when reliable serological testing becomes widely available.

[figure legend: positive results by ward (table 4). hcws working across >1 ward were counted for each area. the left-hand y-axis shows the percentage of positive results from a given ward compared to the total positive results from the hcw asymptomatic screening group (blue bars). the right-hand y-axis shows the total number of sars-cov-2 tests (stars) and the number positive (pink circles). additional asymptomatic screening tests were subsequently performed in an intensified manner on ward f and ward q after identification of clusters of positive cases on these wards (figure 4). asymptomatic screening tests were also performed for a number of individuals from other clinical areas on an opportunistic basis; none of these individuals tested positive. results of these additional tests are included in summary totals in table 1, but not in this figure.]

our data suggest that the roll-out of screening programmes to include asymptomatic as well as symptomatic patient-facing staff should be a national and international priority. our approach may also be of benefit in reducing transmission in other institutions, for example care-homes. taken together, these measures will increase patient confidence and willingness to access healthcare services, benefiting both those with covid-19 and non-covid-19 disease. two parallel streams of entry into the testing programme were established and managed jointly by the occupational health and infectious diseases departments. the first (hcw symptomatic, and hcw symptomatic household contact screening groups) allowed any patient-facing or non-patient-facing hospital employee (hcw) to refer themselves or a household contact, including children, should they develop symptoms suggestive of covid-19. the second (hcw asymptomatic screening group) was a rolling programme of testing for all patient-facing and non-patient-facing staff working in defined clinical areas thought to be at risk of sars-cov-2 transmission.
daily workforce sickness reports and trends in the results of hcw testing were monitored to enable areas of concern to be highlighted and targeted for screening and cluster analysis, in a reactive approach. high throughput clinical areas where staff might be exposed to large numbers of suspected covid-19 patients were also prioritised for staff screening. these included the emergency department, the covid-19 assessment unit, and a number of 'red' inpatient wards. staff caring for the highest priority 'shielding' patients (haematology/oncology, transplant medicine) were also screened, as were a representative sample of staff from 'amber' and 'green' areas. the personal protective equipment (ppe) worn by staff in these areas is summarised in table 5. inclusion into the programme was voluntary, and offered to all individuals working in a given ward during the time of sampling. regardless of the route of entry into the programme, the process for testing and follow-up was identical. wards were closed to external visitors. we devised a scoring system to determine the clinical probability of covid-19 based on symptoms from the existing literature (giacomelli et al., 2020; table 1). self-referring hcw and staff captured by daily workforce sickness reports were triaged by designated occupational health nurses using these criteria (table 2). self-isolating staff in the medium and low probability categories were prioritised for testing, since test results were most likely to change clinical management in these groups. self-isolation and household quarantine advice was determined by estimating the pre-test probability of covid-19 (high, medium or low) in those with symptoms, based on the presence or absence of typical features (tables 1-2). symptom history was obtained for all symptomatic hcws at the time of self-referral, and again for all positive cases via telephone interview when results became available.
all individuals who had no symptoms at the time of testing were followed up by telephone within 14 days of their result. pauci-symptomatic individuals were defined as those with low-probability clinical covid-19 criteria (table 2). testing was primarily undertaken at temporary on-site facilities. two 'pods' (self-contained portable cabins with office, kitchen facilities, generator and toilet) were erected in close proximity both to the laboratory and the main hospital. outside space was designed to enable car and pedestrian access, and to ensure ≥2 m social distancing at all times. individuals attending on foot were given pre-prepared self-swabbing kits containing a swab, electronically labelled specimen tube, gloves and swabbing instructions contained in a zip-locked collection bag. pods were staffed by a team of re-deployed research nurses, who facilitated self-swabbing by providing instruction as required. scenario 1 ppe (table 5) was worn by pod nurses at all times. individuals in cars were handed self-swabbing kits through the window, with samples dropped in collection bags into collection bins outside. any children (household contacts) were brought to the pods in cars and swabbed in situ by a parent or guardian. in addition to pod-based testing, an outreach hcw asymptomatic screening service was developed to enable self-swabbing kits to be delivered to hcws in their area of work, minimising disruption to the working routine of hospital staff and maximising pod availability for symptomatic staff. lists of all staff working in target areas over a 24 hr period were assembled, and kits pre-prepared accordingly. self-swabbing kits were delivered to target areas by research nurses, who trained senior nurses in the area to instruct other colleagues on safe self-swabbing technique. kits were left in target areas for 24 hr to capture a full cycle of shift patterns, and all kits and delivery equipment were thoroughly decontaminated with 70% ethanol prior to collection.
twice daily, specimens were delivered to the laboratory for processing. the swabbing, extraction and amplification methods for this study follow a recently validated procedure (sridhar et al., 2020). individuals performed a self-swab at the back of the throat followed by the nasal cavity as previously described (our world in data, 2020). the single dry sterile swab was immediately placed into transport medium/lysis buffer containing 4 M guanidine thiocyanate (to inactivate virus) and carrier rna. this facilitated bsl2-based manual extraction of viral rna in the presence of an ms2 bacteriophage amplification control. use of these reagents and components avoided the need for nationally employed testing kits. real-time rt-pcr amplification was performed as previously described, and results were validated by confirmation of fam amplification of the appropriate controls with threshold cycle (ct) ≤36. lower ct values correspond to earlier detection of the viral rna in the rt-pcr process, corresponding to a higher copy number of the viral genome. in 2/1,270 cases, rt-pcr failed to amplify the internal control and results were discarded, with the hcw offered a re-test. sequencing of positive samples was attempted on samples with a ct ≤30 using a multiplex pcr based approach (quick et al., 2017) using the modified artic v2 protocol (quick, 2020) and v3 primer set (artic network, 2020). genomes were assembled using reference based assembly and the bioinformatic pipeline as described (quick et al., 2017), using a 20x minimum coverage cut-off for any region of the genome and a 50.1% cut-off for calling of single nucleotide polymorphisms (snps).

[table 7 legend: distribution of positive sars-cov-2 tests amongst symptomatic individuals with a positive test result, categorised according to test group and covid-19 symptom-based probability criteria (as defined in table 2).]
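the reported ct decision points (reading the extraction-garbled "ct 36" and "ct 30" as ct ≤36 for a valid positive call and ct ≤30 for sequencing eligibility, which is an assumption) can be encoded as a small classifier; the function name and result labels are illustrative:

```python
from typing import Optional

def classify_result(ct: Optional[float], control_amplified: bool) -> str:
    """hedged sketch of the decision points in the text: discard runs where
    the internal (ms2) amplification control failed, call positives at
    ct <= 36, and flag samples with ct <= 30 as eligible for sequencing."""
    if not control_amplified:
        return "invalid - offer re-test"
    if ct is None or ct > 36:
        return "negative"
    return "positive - sequence" if ct <= 30 else "positive"
```

a sample at ct 25 with a valid control would be called positive and passed to sequencing; one at ct 33 would be positive but not sequenced.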
samples were sequenced as part of the covid-19 genomics uk consortium (cog-uk), a partnership of nhs organisations, academic institutions, uk public health agencies and the wellcome sanger institute. as soon as they were available, positive results were telephoned to patients by infectious diseases physicians, who took further details of symptomatology, including timing of onset, and gave clinical advice (table 2). negative results were reported by occupational health nurses via telephone, or emailed through a secure internal email system. advice on returning to work was given as described in table 2. individuals advised to self-isolate were instructed to do so in their usual place of residence. particularly vulnerable staff, or those who had more severe illness but did not require hospitalisation, were offered follow-up telephone consultations. individuals without symptoms at the time of testing were similarly followed up, to monitor for de novo symptoms. verbal consent was gained for all results to be reported to the hospital's infection control and health and safety teams, and to public health england, who received all positive and negative results as part of a daily reporting stream. swab result data were extracted directly from the hospital-laboratory interface software, epic (verona, wisconsin, usa). details of symptoms recorded at the time of telephone consultation were extracted manually from review of epic clinical records. data were collated using microsoft excel, and figures produced with graphpad prism (graphpad software, la jolla, california, usa). fisher's exact test was used for comparison of positive rates between the groups defined in the main text. mann-whitney testing was used to compare ct values between different categories of tested individuals. hcw samples that gave sars-cov-2 genomes were assigned global lineages as defined by rambaut et al. (2020), using the pangolin utility (o'toole and mccrone, 2020).
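fisher's exact test for comparing positive rates between two groups can be reproduced with a short standard-library sketch (the mann-whitney comparison of ct values is omitted here). the two-sided "sum of small p-values" convention used below is an assumption about the exact variant the authors ran:

```python
from math import comb

def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """two-sided fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. positives/negatives in two groups of hcws. sums the hypergeometric
    probabilities of all tables no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def p_table(x: int) -> float:
        # hypergeometric probability of a table with top-left cell x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_table(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(p for p in (p_table(x) for x in range(lo, hi + 1))
               if p <= p_obs + 1e-12)
```

a perfectly balanced table gives p = 1.0, while a strongly imbalanced one (e.g. 10/0 positives in one ward versus 0/10 in another) gives a very small p.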
as a study of healthcare-associated infections, this investigation is exempt from requiring ethical approval under section 251 of the nhs act 2006 (see also the nhs health research authority algorithm, available at http://www.hra-decisiontools.org.uk/research/, which concludes that no formal ethical approval is required). written consent was obtained from each hcw described in the anonymised case vignettes. the citiid-nihr covid-19 bioresource collaboration: ravi gupta; harmeet gill; iain kean; mailis maes; nicola reynolds; michelle wantoch; sarah caddy; anita furlong; nathalie kingston; sofia papadia; anne meadows; naidine escoffery; heather jones; carla ribeiro; nick brown; surendra parmar; hongyi zhang; ailsa bowring; geraldine martell; natalie quinnell; stefan gräf; aloka de sa; maddie epping; andrew hinch.
references:
- presymptomatic sars-cov-2 infections and transmission in a skilled nursing facility
- artic network, 2020. artic-ncov2019 primer schemes
- covid-19: the case for health-care worker screening to prevent hospital transmission
- characteristics of health care personnel with covid-19 - united states
- exclusive: deaths of nhs staff from covid-19 analysed
- covid-19: identifying and isolating asymptomatic people helped eliminate virus in italian village
- asymptomatic transmission, the achilles' heel of current strategies to control covid-19
- wuhan seafood market pneumonia virus isolate wuhan-hu-1 complete genome
- self-reported olfactory and taste disorders in sars-cov-2 patients: a cross-sectional study
- temporal dynamics in viral shedding and transmissibility of covid-19
- first experience of covid-19 screening of health-care workers in england
- report 16: role of testing in covid-19 control
- roll-out of sars-cov-2 testing for healthcare workers at a large nhs foundation trust in the united kingdom
- immediate and sustained psychological impact of an emerging infectious disease outbreak on health care workers
- epidemiology of covid-19 in a long-term care facility in king county
- software package for assigning sars-cov-2 genome sequences to global lineages
- to understand the global pandemic, we need global testing - the our world in data covid-19 testing dataset
- surveillance of influenza and other respiratory viruses in the uk
- coronavirus (covid-19) in the uk, 2020
- multiplex pcr method for minion and illumina sequencing of zika and other virus genomes directly from clinical samples
- ncov-2019 sequencing protocol v2
- a dynamic nomenclature proposal for sars-cov-2 to assist genomic epidemiology
- viral mutation rates
- a blueprint for the implementation of a validated approach for the detection of sars-cov2 in clinical samples in academic facilities
- stay at home advice
- covid-19: management of exposed healthcare workers and patients in hospital settings
- clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in
- report of the who-china joint mission on coronavirus disease
- who, 2020b. covid-19 strategy update
- virological assessment of hospitalized patients with covid-2019
- clinical course and risk factors for mortality of adult inpatients with covid-19 in wuhan, china: a retrospective cohort study

additional files: source data 1, asymptomatic sars-cov-2 screening programme source data; transparent reporting form. sequencing data have been deposited in gisaid under accession codes epi_isl_433989-epi_isl_433992, epi_isl_434005, epi_isl_433489-epi_isl_433497. researchers will be prompted to register and log on to the website to access the datasets (https://www.epicov.org/epi3/frontend#1f1442).

key: cord-343340-zi0rfidc authors: aragón-caqueo, diego; fernández-salinas, javier; laroze, david title: optimization of group size in pool testing strategy for sars-cov-2: a simple mathematical model date: 2020-05-03 journal: j med virol doi: 10.1002/jmv.25929 sha: doc_id: 343340 cord_uid: zi0rfidc coronavirus disease (covid-19) has reached unprecedented pandemic levels and is affecting almost every country in the world. ramping up the testing capacity of a country supposes an essential public health response to this new outbreak. a pool testing strategy, where multiple samples are tested in a single reverse transcriptase-polymerase chain reaction (rt-pcr) kit, could potentially increase a country's testing capacity.
the aim of this study is to propose a simple mathematical model to estimate the optimum number of pooled samples according to the relative prevalence of positive tests in a particular healthcare context, assuming that if a group tests negative, no further testing is done, whereas if a group tests positive, all the subjects of the group are retested individually. the model predicts group sizes that range from 11 to 3 subjects. for a prevalence of 10% of positive tests, 40.6% of tests can be saved using testing groups of four subjects. for a 20% prevalence, 17.9% of tests can be saved using groups of three subjects. for higher prevalences, the strategy flattens and loses effectiveness. pool testing individuals for severe acute respiratory syndrome coronavirus 2 is a valuable strategy that could considerably boost a country's testing capacity. however, further studies are needed to address how large these groups can be without losing sensitivity on the rt-pcr. the strategy works best in settings with a low prevalence of positive tests, and is best implemented in subgroups with low clinical suspicion. the model can be adapted to specific prevalences, generating a context-tailored implementation of the pool testing strategy. in late december of 2019, several cases of pneumonia of apparent viral origin were reported in wuhan, china. 1,2 subsequently, a novel coronavirus was identified as the causative pathogen, 3 later named severe acute respiratory syndrome coronavirus 2 (sars-cov-2). the disease (covid-19) rapidly spread to neighboring countries and overseas, reaching pandemic proportions, and was declared by the world health organization (who) a public health emergency of international concern on 30 january, 2020. 4 as of 19 april, 2020, the who has reported 2 241 359 confirmed cases with 152 551 deaths worldwide, 5 a total of 185 countries affected, while 10 still remain with no reported cases.
6 the main diagnostic test implemented worldwide to confirm infection by this novel coronavirus is the real-time reverse transcriptase-polymerase chain reaction (rt-pcr) from respiratory samples, with satisfactory levels of sensitivity and specificity. 7 however, there might be other clinical specimens where the virus could be detected as well, using the same technique. 8-10 the procedure takes about a day to come up with a result 11 ; however, more efficient methods are being developed as the pandemic progresses. a crucial part of the public health response to this new threat is to rapidly diagnose and isolate infected individuals to prevent further spreading. 12,13 therefore, amplifying the testing capacity of a country experiencing a massive outbreak is a key strategy for facing this new public health emergency. 14 the united states is currently the country with the greatest number of confirmed cases worldwide and performs, as of 19 april, 2020, 167 330 tests daily, with a total of 3 865 864 tests performed since the beginning of the outbreak, 15 with all states currently testing. 16 other largely affected countries are also performing thousands of confirmatory tests on a daily basis. 17 however, due to the overwhelming number of rapidly growing cases, a considerably large number of suspected cases cannot be properly tested and isolated, given the logistical constraints of a progressively collapsing healthcare system. therefore, it becomes urgent to optimize the standard operating procedures used to confirm infection by sars-cov-2. 18 since the clinical presentation of the disease is often mild or asymptomatic, 19,20 and since it has been reported that asymptomatic individuals could transmit the virus, 21,22 it becomes crucial to implement an efficient testing strategy to screen that population and properly isolate positive cases to prevent the further spread of the virus.
however, as healthcare systems around the world progressively collapse under the increasing demand from moderate to severe patients presenting every day to the emergency room, the testing of individuals with low clinical suspicion has been left behind in order to prioritize the available resources for patients with moderate to severe symptoms. although it is quite logical to prioritize testing for patients with higher clinical suspicion, a considerable segment of the population is not being screened, and its members become vectors of the virus, contributing even more to the spread of the disease and further collapsing the healthcare system with the new cases yet to come. 23 on the other hand, as proposed by seifried and ciesek, 24 pooling multiple samples into a single rt-pcr reaction could expand testing capacity. 25 however, some studies suggest that the pooling of the sample should be kept as low as possible to reduce dilution and maintain the sensitivity of the test. 26,27 since this strategy could potentially multiply a country's testing capacity, it becomes prudent to explore how to optimize its implementation in the healthcare setting. therefore, the aim of this study is to provide a mathematical model to estimate the optimum number of pooled samples according to the specific prevalence of positive tests in a particular country context, in order to save as many tests as possible and cover as many people as possible, knowing that if a group tests positive, all the individuals of the sample would have to be individually tested. it is important to highlight that this model is based on the prevalence of positive tests and can be adapted to each country's specific prevalence.
however, it is best implemented in countries with a large number of confirmed cases and a relatively large number of tests performed on a daily basis, since more data on the specific prevalence of positive-yielding results are available and more accurate estimations can be made, rather than in countries with a low number of confirmed cases or where population testing has not been adequately implemented. the manuscript is arranged in the following way: in section 2, the materials and methods are introduced. in section 3, the results are given together with the discussion. finally, the final remarks are presented in section 4. this section describes the process and reasoning for obtaining a formula that represents the benefit of performing a pool test of the optimum size, assuming in advance that if a group tests positive, all the subjects in the group have to be individually tested in order to track down the positive case or cases, while if a group tests negative, no further testing in that specific group is needed. all the computations were performed with the software wolfram mathematica. 28 considering that the sample of each suspected individual tested for sars-cov-2 infection with the rt-pcr could yield either a negative or a positive result, that a pool testing strategy could yield a negative result only when all the samples included in the pooled sample are negative, and that it will yield a positive result when at least one of the individual samples is positive, the possible diagnostic scenarios for the pool test can be expressed by the binomial expansion

(x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^(n−k) = 1, (1)

where x represents the probability of subjects with an individual positive test (prevalence of positives), y represents the probability of subjects with an individual negative test (prevalence of negatives), and n is the size of the pool group, such that n > 1, 0 < x < 1, and y = 1 − x. under these assumptions, the expansion sums to 1.
note that the breakdown of this expression covers all the possible events. these are represented by the addends, where the exponents of x and y in each term indicate the number of subjects with a positive or negative sample, respectively. the distribution of the possibilities will depend on the prevalence of the disease, in this case the percentage of positive test results obtained from the recent historical data available. for this reason, the probability of each event is obtained by substituting x and y with the respective prevalences of positive and negative tests. now, let us separate equation (1) in two parts:

1 = y^n + (1 − y^n). (2)

here, the negative pool tests and their probability are represented by y^n, while the pool tests that yield a positive correspond to all the other cases, where there is at least one individual positive sample in the pool, having, therefore, a probability of 1 − y^n. to facilitate its use, equation (2) is expressed as a function of x, which relates to the direct prevalence of positive historical testing for each country:

1 = (1 − x)^n + [1 − (1 − x)^n]. (3)

therefore, considering that every time a pool test yields a negative result no further testing is performed in that group, the fraction of tests saved relative to individually testing the same subjects is

s(n, x) = (1 − x)^n − 1/n, (4)

and the average minimum number of tests per subject needed to diagnose one subject is

z(n, x) = 1/n + 1 − (1 − x)^n. (5)

to obtain the optimum group size given the prevalence of positive tests (x) in a determined setting, the global minimum of equation (5) must be obtained. this minimum is calculated using x as the input, because x is a continuous variable, while n is a discrete one.
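the model above (one pool test per group, with full individual retesting of positive pools, so z(n, x) = 1/n + 1 − (1 − x)^n) can be sketched directly; the function names are illustrative, and the brute-force search over discrete n mirrors the minimisation described in the text:

```python
def expected_tests_per_subject(n: int, x: float) -> float:
    """z(n, x) = 1/n + 1 - (1 - x)**n: each subject shares one pool test
    (the 1/n term), and with probability 1 - (1 - x)**n the pool is positive
    and every member is retested individually (one extra test per subject)."""
    return 1.0 / n + 1.0 - (1.0 - x) ** n

def optimal_group_size(x: float, n_max: int = 100) -> int:
    """group size n >= 2 that minimises the expected tests per subject."""
    return min(range(2, n_max + 1),
               key=lambda n: expected_tests_per_subject(n, x))
```

for x = 0.1 this gives n = 4 with a saving of 1 − z ≈ 40.6%, and for x = 0.2, n = 3 with ≈ 17.9% saved, matching the figures quoted in the abstract.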
let us remark that, knowing the average minimum number of tests per subject needed to diagnose one subject, z, the population covered by one test using a pool testing strategy with the optimal pool size previously calculated (and addressing the fact that when a group yields a positive result, the whole group has to be individually tested) can be expressed as

subjects covered per test = 1/z. (6)

with the model proposed above, different scenarios were tested according to different prevalences of positive tests. this was done to address the fact that each country presents a unique distribution of daily performed tests and positive results. according to this, the optimum size and the average minimum tests per subject to detect a positive were calculated for the diverse chosen prevalence scenarios. this was then compared to the individual testing strategy, showing how many more positive results could be detected using pool testing with the same amount of tests, thus addressing the efficiency of the strategy over individual testing, as shown in table 1. on the other hand, given the optimum group sizes calculated for the chosen prevalence scenarios, the population covered by 100 tests was calculated using the average minimum tests per subject to detect a positive, and was compared with the 100 subjects that an individual testing strategy would cover, as shown in table 2.

[figure 1 legend: contour plot of the average minimum number of tests per subject to diagnose one subject. horizontal axis: prevalence of positive tests, x, ranging from 0 to 0.4. vertical axis: group size, n, ranging from 2 to 100. the average minimum number of tests per subject is represented by the colors, where higher and better values go from green to orange, orange being the closest to the optimum.]
As the results show, the lower the prevalence of positive tests in a particular country, the more tests can be saved and the larger the pool groups become. For prevalences ranging from 0.03 to 0.07, a pool testing strategy increases a country's testing capacity by a factor of two to three relative to individual testing. This could bring unprecedented advances in understanding the disease and how it is distributed in a particular population. For prevalences ranging from 0.08 to 0.2, the net saving of test kits under pool testing is still significant, saving around 46.6% down to 17.9% of the tests that an individual testing strategy would require for the same number of subjects, thus covering a greater portion of the population. However, as prevalence rises, the efficiency of the strategy flattens. At a prevalence above 0.25, the net saving of tests remains positive, but separating the samples, creating pool groups, tracking individuals in the groups that yield a positive result, and retesting all those subjects individually pose logistical challenges that every healthcare center must weigh before adopting this strategy over the individual testing strategy most likely already in place. Finally, at a prevalence near 0.3, pool testing performs about the same as individual testing, losing its effectiveness and becoming a logistical burden rather than an optimization of testing protocols. This is mainly because, in the proposed model, whenever a group tests positive all individuals in the group must be retested to identify the positive subject or subjects in the pooled sample. Therefore, the more positive individuals there are in the population, the more positive pool tests there will be, and the more tests are consumed in retesting those pools.
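The specific savings figures quoted above can be reproduced from the same expected-tests expression (a sketch under the model's assumptions; `saved_fraction` is my own helper name):

```python
def saved_fraction(x: float, n_max: int = 100) -> float:
    """Fraction of tests saved at the optimal pool size, relative to
    testing the same subjects individually (one test per subject)."""
    z = min(1.0 / n + 1.0 - (1.0 - x) ** n for n in range(2, n_max + 1))
    return 1.0 - z

print(f"x=0.08: {saved_fraction(0.08):.1%} of tests saved")   # ~46.6%
print(f"x=0.20: {saved_fraction(0.20):.1%} of tests saved")   # ~17.9%
print(f"x=0.30: {saved_fraction(0.30):.1%} of tests saved")   # near zero
```

The near-zero saving at x = 0.3 is the flattening described above: retesting positive pools consumes almost all of the tests that pooling saved.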
Notice that for large positive groups, further subgrouping and pool testing of those subgroups could be implemented. This could potentially save even more tests; however, this approach would likely pose a logistical challenge that the progressively collapsing healthcare systems worldwide may not be able to cope with for now. As of 19 April 2020, most countries have prevalences of positive tests of roughly 0.1 to 0.2 of all tests performed daily,29 so a pool testing strategy is still plausible to implement at a national level. For the analysis, however, the overall historical prevalence was used as the country scenario; when subjects are further stratified according to clinical suspicion, a lower prevalence of positive tests is expected in the low-suspicion groups, so pool testing could be best implemented in those stratified subgroups rather than in the whole population. As shown above, lower prevalences of positive tests yield greater efficiency in test use, albeit with larger group sizes. One of the main critiques of the pool testing strategy is the dilution that occurs when samples are pooled together, and how this dilution might affect test sensitivity. Previous studies have shown no decrease in the sensitivity of RT-PCR for detecting other viruses when using pooled samples of 10 and 20 subjects,30 and, as far as the available evidence on SARS-CoV-2 shows, pools of five subjects do not affect the sensitivity of RT-PCR for detecting the virus.24 The model proposes optimum group sizes ranging from 11 down to 3 subjects, depending on prevalence. This fits well within the range in which dilution is unlikely to degrade RT-PCR sensitivity, and it has been proposed that, to implement a pool testing strategy effectively, pool sizes should be kept as small as possible.
26,27 Developing this further, the model predicts optimum groups of four and three subjects for prevalences of positive tests between 0.1 and 0.2, which is the range most countries are currently reporting. It therefore adapts to the clinical reality that frontline workers all over the world are experiencing on a daily basis. This article proposed a simple, grounded model to estimate the optimal group size for implementing a pool testing strategy for SARS-CoV-2, according to the specific historical prevalence of positive tests in a given healthcare context. The model is intended for use at different levels of the healthcare facilities fighting the pandemic, given its flexibility in estimating the optimum group size for a specific prevalence. These prevalences may differ from one healthcare facility to another and from one city to another, and may also differ from the country's overall outbreak status; the model therefore supports a context-tailored implementation of the pool testing strategy for individuals with suspected SARS-CoV-2 infection. One of the main limitations of this study is that it assumes RT-PCR for detecting SARS-CoV-2 has 100% sensitivity to viral RNA, whereas the available evidence puts sensitivity at around 70%.31 Substantial work is currently being done to improve test sensitivity, and accounting for this imperfect sensitivity would greatly increase the complexity of the model. Finally, it is worth mentioning the social implications that implementing pool testing might have. As the pandemic grows and more people get tested, this testing strategy might not be well received by the general public: patients will most likely want to know as soon as possible whether their own test was positive or negative, and may not accept having their sample mixed with others.
Therefore, it becomes crucial to develop a strong public health policy to inform the population, secure equal access, and best implement the strategy for the greater good.

Acknowledgments: The authors are thankful to Dr. Ricardo Segovia, MD (Hospital

References:
- A pneumonia outbreak associated with a new coronavirus of probable bat origin.
- The proximal origin of SARS-CoV-2.
- Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding.
- World Health Organization. Statement on the second meeting of the International Health Regulations (2005) Emergency Committee regarding the outbreak of novel coronavirus (2019-nCoV).
- World Health Organization. Situation Report 90. Geneva: WHO.
- Johns Hopkins Coronavirus Resource Center. COVID-19 map.
- Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR.
- Detection of SARS-CoV-2 in different types of clinical specimens.
- Detection of SARS-CoV-2 by RT-PCR in anal from patients who have recovered from coronavirus disease 2019.
- The presence of SARS-CoV-2 RNA in feces of COVID-19 patients.
- Reverse-transcription PCR (RT-PCR).
- Disease control, civil liberties, and mass testing: calibrating restrictions during the COVID-19 pandemic.
- Improved early recognition of coronavirus disease 2019 (COVID-19): single-center data from a Shanghai screening hospital.
- Laboratory testing strategy recommendations for COVID-19.
- Centers for Disease Control and Prevention (CDC). Testing in the U.S.
- To understand the global pandemic, we need global testing. Our World in Data.
- Our World in Data.
- Combination of RT-qPCR testing and clinical features for diagnosis of COVID-19 facilitates management of SARS-CoV-2 outbreak.
- Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72,314 cases from the Chinese Center for Disease Control and Prevention.
- The clinical feature of silent infections of novel coronavirus infection (COVID-19) in Wenzhou.
- Transmission of 2019-nCoV infection from an asymptomatic contact in Germany.
- Epidemiological analysis of COVID-19 and practical experience from China.
- Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand.
- Pool testing of SARS-CoV-2 samples increases worldwide test capacities many times over. Aktuelles aus der Goethe-Universität Frankfurt.
- Pooling RT-PCR or NGS samples has the potential to cost-effectively generate estimates of COVID-19 prevalence in resource-limited environments.
- Optimizing screening for acute human immunodeficiency virus infection with pooled nucleic acid amplification tests.
- Our World in Data. COVID-19: confirmed cases vs. tests conducted.
- Evaluation of saliva pools method for detection of congenital human cytomegalovirus infection.
- Antibody responses to SARS-CoV-2 in patients of novel coronavirus disease 2019.
- Optimization of group size in pool testing strategy for SARS-CoV-2: a simple mathematical model.

The authors declare that there are no conflicts of interest. http://orcid.org/0000-0001-7233-960x

key: cord-289461-bnusv816 authors: Droste, M. C.; Stock, J.; Atkeson, A. title: Economic benefits of COVID-19 screening tests date: 2020-10-27 journal: nan doi: 10.1101/2020.10.22.20217984 sha: doc_id: 289461 cord_uid: 289461-bnusv816

We assess the economic value of screening testing programs as a policy response to the ongoing COVID-19 pandemic.
We find that the fiscal, macroeconomic, and health benefits of rapid SARS-CoV-2 screening testing programs far exceed their costs, with the ratio of economic benefits to costs typically in the range of 4-15 (depending on program details), not counting the monetized value of lives saved. Unless the screening test is highly specific, however, the signal value of the screening test alone is low, leading to concerns about adherence. Confirmatory testing increases the net economic benefits of screening tests by reducing the number of healthy workers in quarantine and by increasing adherence to quarantine measures. The analysis is undertaken using a behavioral SIR model for the United States with 5 age groups, 66 economic sectors, screening and diagnostic testing, and partial adherence to instructions to quarantine or to isolate.

The effectiveness of a screening testing program hinges on whether those who test positive adhere to the instruction to self-isolate. Using survey data from the United Kingdom covering March through August 2020, Smith et al. (2020) found that, of individuals reporting COVID symptoms, only 18% reported self-isolating; among those who were told by the National Health Service that they had been in close contact with a confirmed COVID-19 case, only 11% reported quarantining for the recommended 14 days. These findings suggest that adherence will be low in response to other signals that have low information content, in particular a low positive predictive value (PPV),2 about whether the individual is actually infected. We therefore allow the rate of adherence to depend on the specificity of the screening test. Table 1 presents results for three representative testing programs. The programs are calibrated to existing or proposed tests and are designed to be representative of programs that might be deployable with adequate resources and effort. For cost-benefit purposes, we assume all incremental testing is federally funded.
Panel A considers a $5 screening test with 97.1% sensitivity and 98.5% specificity,3 in which half of those who test positive on the screening test take a $50 confirmatory PCR test with a 48-hour mean turnaround time. We suppose that adherence is high (75%) for those with a positive PCR test and low (25%) for those testing positive on the screening test but not taking a confirmatory PCR test. For random population screening at a weekly frequency, the total incremental cost of the program is $51 billion over the June 1 - December 31, 2020 simulation period. Using our epidemiological-economic model, we project 66,000 deaths averted, an increase in GDP of $248 billion, and an increase in federal tax revenues of $68 billion over that period, relative to a baseline with diagnostic but not screening testing. Panel B of Table 1 modifies this screening program so that everyone testing positive on the screening test receives a confirmatory PCR test. At a weekly testing cadence, this increases demand for PCR tests by approximately 630,000 tests per day, relative to the no-screening baseline. Testing costs are somewhat higher, but because the effective adherence rate is higher under this program than in Panel A, deaths averted rise to 153,000 and the increase in GDP is larger, $544 billion, for weekly testing.

Footnote 2: The positive predictive value is the probability of being infected conditional on testing positive. By Bayes' law, the PPV depends on the specificity and sensitivity of the test and on the population rate of infection.
Footnote 3: These costs and accuracy rates are those of the Abbott Laboratories BinaxNOW antigen test (FDA (2020)). Additional estimates of test performance and costs are available in Table 2 of Silcox et al. (2020).

(medRxiv preprint notice: this preprint, which was not certified by peer review, is made available under a CC-BY 4.0 International license; the author/funder has granted medRxiv a license to display it in perpetuity. This version was posted October 27, 2020; https://doi.org/10.1101/2020.10.22.20217984.)
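The Bayes-law relationship in footnote 2 can be checked directly. A minimal sketch: the 97.1%/98.5% test parameters are from the text, but the 0.05% point prevalence of active infection is an illustrative assumption of mine (the paper's prevalence varies over the simulation), chosen to show how a low prevalence drives the PPV down:

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value via Bayes' law: P(infected | positive)."""
    true_pos = sens * prev
    false_pos = (1.0 - spec) * (1.0 - prev)
    return true_pos / (true_pos + false_pos)

# BinaxNOW-like screening test from the text; the 0.05% prevalence is an
# assumed illustrative value (an epidemic suppressed by the program).
print(f"PPV at 98.5% specificity: {ppv(0.971, 0.985, 0.0005):.1%}")
# A 99.7%-specificity two-step program raises the PPV substantially:
print(f"PPV at 99.7% specificity: {ppv(0.971, 0.997, 0.0005):.1%}")
```

At low prevalence, even a 1.5% false-positive rate swamps the true positives, which is why the simulations report PPVs under 5% and why confirmatory testing adds so much value.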
Notes to Table 1: Counterfactual simulations suppose that the testing program was put in place on June 1, 2020; the simulations end December 31, 2020. Entries are relative to a baseline with diagnostic testing at rates comparable to the summer of 2020 and no screening testing. Deaths are as of January 1, 2021, and monetary values are current dollars for June 1, 2020 - December 31, 2020.

Panel C considers a different screening testing program: two-step testing with a combined two-step specificity of 99.7%. As an illustration, one way to achieve this specificity is with two independent rapid antigen tests. The first step is a low-specificity (80%) $2 test, for example an inexpensive hypothetical paper-strip antigen test; if it is positive, the confirmatory test is the $5, 98.5%-specificity test used in Panels A and B.4 There is no confirmatory PCR testing. We suppose the 99.7% specificity evokes a 50% adherence rate. The incremental testing costs under this program are less than for programs A or B. In part because turnaround is rapid, it averts 118,000 deaths and increases GDP by $397 billion at a weekly testing frequency, despite the assumed lower adherence than to a PCR-based regime.

These results and the additional sensitivity analysis below lead to four main conclusions. First, even with partial compliance, screening testing induces large net economic benefits. For the cases in Table 1, economic benefits exceed costs by a factor of 5-10 for weekly testing. If all the tests were paid for by the federal government, the additional tax revenues generated by the induced GDP growth would more than pay for the testing costs. Net benefits rise further if deaths averted are additionally monetized using a statistical value of life.
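The combined 99.7% specificity quoted for Panel C follows from requiring both independent tests to be positive; a quick check (the sensitivity line illustrates the general cost of series testing, using assumed per-step sensitivities not given in the text):

```python
def series_specificity(spec1: float, spec2: float) -> float:
    """Specificity of two independent tests in series, flagging a subject
    only if BOTH are positive: a healthy subject is a final false
    positive with probability (1 - spec1) * (1 - spec2)."""
    return 1.0 - (1.0 - spec1) * (1.0 - spec2)

print(f"{series_specificity(0.80, 0.985):.1%}")   # 99.7%, as in Panel C

# The flip side: series testing multiplies sensitivities. With assumed
# per-step sensitivities of 0.95 each, combined sensitivity is about 0.90.
print(f"{0.95 * 0.95:.2f}")
```

This is the design trade-off behind Panel C: specificity (and hence signal value and adherence) improves, while some sensitivity is sacrificed, which the paper's third conclusion argues is of secondary importance.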
Second, the signal value of a single positive screening test is low: in our simulations, the PPV is typically less than 5% for a test with 98.5% specificity. This low signal value could lead to low adherence and, among those who do adhere, imposes economic costs because healthy workers isolate. Introducing confirmatory testing into the program increases the signal value, reduces unnecessary isolation, and arguably would lead to greater adherence, increasing net benefits. Third, screening test sensitivity is of secondary importance, a finding consistent with, for example, Larremore et al. (2020) and Paltiel, Zheng, and Walensky (2020). For example, the results in Table 1 are very similar if screening test sensitivity is reduced from 97.1% to 85%: even at 85% sensitivity, the vast majority of tested infected individuals are detected and, if they adhere, isolated. Fourth, we find that targeting testing to younger and middle-aged adults can improve both economic and mortality outcomes, holding constant the number of screening tests. Although targeting those ages retards activity by sending workers into isolation, it breaks the chain of transmission to the elderly. The model is summarized in Section 2. Section 3 presents the results for uniform testing, and age-based testing is examined in Section 4.

Related literature. This paper is related to a growing literature synthesizing epidemiological models of disease transmission with macroeconomic dynamics,5 some of which considers testing and quarantine. Berger, Herkenhoff, and Mongey (2020) consider the effects of testing and quarantine in an SIR model with a single perfect test and imperfect adherence to quarantine; the authors show that testing can reduce the severity of lockdowns required to achieve a given reduction in the spread of disease. Cherif and Hasanov (2020) study the costs and returns of a test-and-quarantine strategy in an SIR model, paying particular attention to "smart" testing strategies that take advantage of spatial heterogeneity in disease prevalence and population density. Brotherhood, Kircher, Santos, and Tertilt (2020) and Eichenbaum, Rebelo, and Trabandt (2020b) consider age-varying diagnostic testing and quarantine; in both models, the role of testing is primarily to resolve individual uncertainty about infection status. Other papers that address testing, contact tracing, and/or quarantine are Acemoglu, Makhdoumi, Malekian, and Ozdaglar (2020), Augenblick, Obermeyer, Kolstad, and Wang (2020), BFMS (2020b), Gans (2020), and Piguillem and Shi (2020).

Footnote 5: An early focus of this literature concerned the macroeconomic and epidemiological effects of lockdown and re-opening policies. Eichenbaum, Rebelo, and Trabandt (2020a) augment a standard New Keynesian macroeconomic model with an SIR-type model of disease transmission, characterize the relationship between consumption/labor-supply decisions and disease transmission, and study the effects of simple lockdown policies. Acemoglu, Chernozhukov, Werning, and Whinston (2020) study a multi-group SIR model where infection, hospitalization, and fatality rates vary between groups and characterize optimal age-varying lockdown policies. This literature has expanded to include other non-pharmaceutical interventions; see Baqaee, Farhi, Mina, and Stock (2020a). See BFMS (2020b) for additional references.
A closely related paper in the epidemiological literature is Paltiel, Zheng, and Walensky (2020), who consider college coronavirus testing and incorporate the costs of tests and of housing the quarantined. See also, among others, Larremore et al. (2020), Taipale, Romer, and Linnarsson (2020), and Peto et al. (2020). Relative to this literature, our main contribution is to provide a carefully calibrated and estimated model for assessing the net economic, fiscal, and total (including mortality) benefits of multi-step imperfect screening testing in conjunction with diagnostic testing. By combining a 66-sector economic model with a five-age behavioral SIR model, we are able to consider age-based strategies and the effect of temporary isolation on employment and output.

Our starting point is the BFMS behavioral SIR model, which connects an SIR model of disease transmission to economic activity. The BFMS model has five age groups (ages 0-19, 20-44, 45-64, 65-74, and 75+) and 66 private economic sectors plus the government. Pre-pandemic contact matrices are estimated from POLYMOD (Mossong et al. (2017)) for work, home, and other activities; work contact matrices vary by sector depending on sectoral worker proximity (Mongey, Pilossoph, and Weinberg (2020)). The behavioral aspect of the model arises from a feedback rule in which activity depends on the current weekly death rate and the slope of the weekly death rate. In addition, the behavioral rule has a lockdown-fatigue component in which high unemployment rates and cumulative past unemployment rates contribute (all else equal) to a desire to resume activity. Epidemiological parameters, including age-based death rates, are taken from the epidemiological literature or from the CDC, or, for the transmission rate and initial infection rate, estimated from US data on daily deaths. For additional details, see Appendix 1 and BFMS.
This paper extends the BFMS model to incorporate explicit screening and diagnostic testing with partial adherence. The key elements of this extension are:
1. Individuals are selected at random, at a daily rate μ, for rapid screening testing.
2. A fraction ν of individuals testing positive on the screening test take a confirmatory diagnostic (PCR) test; the remaining fraction 1 − ν of those who test positive are instructed to self-isolate.
3. Symptomatic individuals can receive a diagnostic test.
4. Those awaiting diagnostic test results are instructed to quarantine.
5. The isolation pool consists of those with a "terminal" positive test result: a positive test among the PCR-tested symptomatic, a positive PCR test among the fraction ν of screening-test positives who take a confirmatory test, or a positive screening test among the fraction 1 − ν who do not.
6. Adherence to instructions to quarantine or to isolate is partial.
The extended SIR model is illustrated in Figure 1 (equations are given in Appendix 1). The horizontal flows represent the disease progression from susceptible to exposed to infected to recently recovered to fully recovered, or from infected to deceased. The distinction between the recently recovered and the fully recovered is that the recently recovered test positive on a PCR test but are not contagious; see, e.g., Larremore et al. (2020). The screening test is assumed to be less sensitive than the PCR test, so it detects the virus among the infected but not among the recently recovered. Those instructed to isolate enter the isolation compartment (Q, to distinguish it from the infected I).
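The daily testing channels in elements 1, 2, and 5 can be sketched as a one-day accounting exercise. This is my own simplification, not the paper's Appendix-1 system: the population size, the 0.5% infected share, and the PCR sensitivity/specificity values are assumed for illustration, while μ = 1/7 (weekly cadence) and ν = 0.5 correspond to program A:

```python
def daily_isolation_flows(pop, p, mu, nu,
                          sens_s=0.971, spec_s=0.985,   # screening test
                          sens_d=0.999, spec_d=0.999):  # assumed PCR values
    """One day of random screening of a population with infected share p:
    screening positives either go straight to isolation (fraction 1-nu)
    or generate confirmatory PCR demand (fraction nu)."""
    screened = pop * mu
    s_pos = screened * (p * sens_s + (1 - p) * (1 - spec_s))
    pcr_demand = s_pos * nu
    isolate_screen_only = s_pos * (1 - nu)       # low-adherence channel
    infected_pos = screened * p * sens_s
    healthy_pos = s_pos - infected_pos
    # terminal positives after PCR confirmation (high-adherence channel)
    isolate_confirmed = nu * (infected_pos * sens_d + healthy_pos * (1 - spec_d))
    return pcr_demand, isolate_screen_only, isolate_confirmed

pcr, iso_s, iso_c = daily_isolation_flows(330e6, 0.005, 1 / 7, 0.5)
print(f"confirmatory PCR demand/day:          {pcr:,.0f}")
print(f"isolating on screening test alone:     {iso_s:,.0f}")
print(f"isolating after PCR confirmation:      {iso_c:,.0f}")
```

Note how confirmation shrinks the isolation pool: most screening positives are healthy at low prevalence, and PCR screens nearly all of them back out, which is the mechanism behind program B's gains.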
The extent to which they adhere to that instruction depends on whether they arrived by testing positive on the diagnostic test, which has high signal value, or on the screening test, which has lower signal value. (Although we describe this as, say, 25% of those arriving from a positive screening test adhering, because of the homogeneous structure of the model this is equivalent to all those arriving through this channel reducing their contacts by 25%.) With full adherence, the susceptible in quarantine and in isolation would not become infected; however, because adherence is partial, some of the isolated susceptibles (SQ) and quarantined susceptibles (SD) can become exposed. Table 2 describes our baseline parameter values. The sensitivity and specificity of the diagnostic test are intended to be in the range of laboratory PCR tests. The sensitivity and specificity of the baseline screening test are calibrated to analytical estimates corresponding to the BinaxNOW rapid test, although we consider alternative values. The rates of uptake of diagnostic testing among the non-infected (ρ0) and among the infected (ρ1) were calibrated to match the total number of tests and the positivity rate in the US during July and August 2020. We allow the frequency of screening tests (governed by μ) to vary between 0 (no screening tests at all) and 0.3 (each person is screened on average every three days). Our baseline isolation adherence rate is 25% for those testing positive on the screening test with 98.5% specificity, is
50% for those testing positive on the two-stage screening test with 99.7% specificity, and is 75% for those testing positive on the diagnostic test (either after qualifying by being symptomatic or as the second stage following a positive screening test). BFMS has two baseline scenarios, both exhibiting a second wave of infections starting mid-summer 2020. In BFMS, the second wave was induced by a relaxation of social distancing, masks, and other protections, combined with a full return to school in the fall; the difference between the two scenarios was the strength of the feedback from deaths and the growth rate of deaths to activity. The baseline here uses feedback parameters that are a mid-point between the two baseline scenarios considered in BFMS. We estimate the model using data through June 12, 2020. The simulation period begins June 1, 2020 and ends on January 1, 2021; the simulations thus reflect alternative, counterfactual paths for the virus and the economy for the final seven months of 2020.

Notes to Figure 2: Quarterly GDP (green step function) is shown in real levels, indexed to 1 in 2019Q4. Total deaths (actual in black dashed, simulated in red) under the baseline simulation are 359,000 by January 1, 2021. Bands denote 67%, 90%, and 95% confidence bands using standard errors for the estimated model parameters.

Figure 2 shows the time path of actual deaths (black dashed), simulated deaths (red), and the level of GDP (green), indexed to its level in February 2020, under our baseline calibration with
the bands for deaths and gdp are standard error bands based on estimation uncertainty for the model parameters. although the baseline scenario was constructed in june, it closely tracks the path of deaths through mid-august. subsequently, simulated deaths exceed actual deaths, in part because the simulation presumes a full return to school whereas many school districts chose remote or hybrid reopenings. under the baseline, there are 359,000 deaths by january 1. under the baseline, there are no screening tests ( = 0), however there are diagnostic tests at rates that match the volume and positivity rates of actual testing in july and august. under the screening testing counterfactuals, diagnostic testing is augmented by screening testing, holding constant all model parameters except for those describing the screening tests. the incremental costs and benefits of testing are computed from the number of tests and the model-implied economic and mortality outcomes under testing and no-testing scenarios. we assume the cost of the 98.5% specific screening test is $5. the 80% specific screening test is assumed to cost $2, but is packaged for use with the $5 test with 98.5% specificity at a 5-to-1 ratio for an average cost of $3/test. the price of a diagnostic pcr test varies considerably in the united states; we use $50 for our baseline. we compute three measures of benefits of tests: incremental gdp, incremental federal government revenues, and the monetized value of deaths avoided. gdp is measured in 2020 dollars. because the simulations start on june 1, gdp is the same under baseline and testing alternative have the same values for gdp for the first five months of the year, so any differences in gdp under the two scenarios occurs only from june through december; these incremental dollars of gdp are dollars for those seven months only, not at an annual rate. 
The effect of an increase in GDP on government revenues is computed using elasticities of income taxes, corporate profits taxes, FICA, and the self-employment contributions tax from the Congressional Budget Office (Russek and Kowalewski (2015, Table 3) and CBO (2019)). Like GDP, these incremental revenues are for June-December only and are not annualized. To compute net fiscal benefits, we assume that all incremental testing is paid for by the federal government. Deaths avoided are the cumulative number of deaths from COVID-19 on January 1, 2021 under the testing scenario, minus the total number of deaths in the baseline scenario. Deaths are monetized using the value of a statistical life from the US EPA (2020), converted to 2020 dollars, which is $9.3 million per life.

This section provides full results for the three programs in Table 1, then provides sensitivity checks and time paths of the virus and GDP for illustrative programs. We begin with program A in Table 1: a single-stage screening test with 98.5% specificity and 50% confirmatory PCR testing. The adherence rates are 25% for those instructed to isolate based on the screening test alone and 75% for those instructed to isolate based on the diagnostic test. Mortality and economic outcomes for program A in Table 1 are shown in Figure 3. In all figures, the outcome of interest is plotted as a function of the screening test intensity μ; the multiple lines in the figures represent different screening test sensitivities, from 80% to 98.5%.
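The cost-benefit accounting for program A can be reproduced from the figures in the text (the dollar amounts and the EPA value of a statistical life are taken from the paper; the arithmetic is mine):

```python
# Program A, weekly random screening, June 1 - Dec 31, 2020 (from the text)
cost_bn        = 51.0      # incremental testing cost, $bn
gdp_gain_bn    = 248.0     # incremental GDP, $bn
tax_gain_bn    = 68.0      # incremental federal revenue, $bn
deaths_averted = 66_000
vsl_mn         = 9.3       # EPA value of a statistical life, $mn (2020 $)

benefit_cost = gdp_gain_bn / cost_bn            # GDP-only benefit/cost ratio
fiscal_net   = tax_gain_bn - cost_bn            # > 0: program pays for itself
lives_bn     = deaths_averted * vsl_mn / 1000   # monetized mortality benefit

print(f"GDP benefit/cost ratio:        {benefit_cost:.1f}")
print(f"Net federal fiscal position:   ${fiscal_net:+.0f} bn")
print(f"Monetized deaths averted:      ${lives_bn:.0f} bn")
```

The GDP-only ratio of roughly 5 sits at the bottom of the 4-15 range quoted in the abstract, the positive fiscal balance illustrates the claim that federally funded testing more than pays for itself, and adding the roughly $614 bn in monetized lives saved dwarfs the testing cost.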
Relative to the no-screening baseline, testing biweekly is estimated to avert approximately 37,000 deaths, and testing weekly averts 66,000 deaths, when screening test sensitivity is 97%. The number of days that individuals are told to isolate (upper right) increases approximately linearly with the amount of screening testing (there is some curvature because symptomatic testing falls as screening intensity increases). For weekly testing, there are approximately 930 million proscribed isolation days, which amounts to 1.3% of the total of 70 billion person-days during the June-December simulation period. (Because of partial adherence, only some of those isolation days are actually observed.) Because only half of the screening-test positives receive confirmatory PCR testing, the preponderance of those instructed to isolate are false positives; the screening-test PPV (middle left) is low, approximately 6% for weekly testing. Because the virus is increasingly suppressed as μ increases, GDP (middle right) increases with μ for low and moderate testing rates, although the increase is held back by the large number of healthy workers who are isolating. Cost-benefit results for screening program A are shown in the bottom panel of Figure 3 and in Figure 4. Additional testing costs rise approximately linearly with the testing rate. For these parameter values, net economic benefits (Figure 3, lower left) are in the range of $75-120 billion for biweekly testing and $150-200 billion for weekly testing, depending on the screening test sensitivity. When the value of life is included as a benefit (Figure 3, lower right), net benefits are in the range of $320-470 billion for biweekly testing and $650-820 billion for weekly testing, depending on the screening test sensitivity. In nearly every case considered, the screening program pays for itself (Figure 4, middle right), under the assumption that all additional testing is paid for by the federal government.
in all these figures, increasing the sensitivity of the screening test improves outcomes, but those improvements are typically small relative to the gains from introducing the screening program in the first place. figure 5 and figure 6 are the counterparts of figure 3 and figure 4 for program b, which has the same screening test but with universal confirmatory pcr testing. expanding confirmatory testing from 50% to 100% substantially reduces deaths and increases gdp considerably, because of the assumed greater adherence rate for highly specific pcr testing than for the screening test alone. in addition, universal confirmatory testing reduces the number of healthy individuals in isolation; avoiding isolating the healthy allows them to work, increasing gdp. in fact, despite the increase in testing, isolation days are lower under program b than under the no-screening baseline, because the prevalence of the virus is substantially reduced, reducing the total number of positive diagnostic tests despite the inflow from positive screening tests. universal confirmatory testing increases testing costs, so it is not obvious a priori whether offering universal confirmatory testing increases or decreases net economic benefits; for the values considered here, the economic benefits of universal testing dominate and net economic benefits increase. results for the two-step screening test of program c are shown in figure 7 and figure 8. this program has a $3 two-step screening test with specificity of 98.5%, adherence of 50%, and no confirmatory pcr testing. mortality gains, prescribed isolation days, employment gains, and gdp gains fall between those in programs a and b, a consequence of the assumed intermediate adherence rate. although net economic benefits are less for program c than for program b, the benefit-cost ratios are greatest for program c because the tests are less expensive.
table 3 summarizes the results of various sensitivity checks. a common critique of widespread screening is that low specificity can lead to a large number of healthy individuals, including healthy workers, needlessly entering isolation (e.g., pettengill and mcadam (2020)). panel d in table 3 considers this case, for the 98.5% specificity screening test. eliminating confirmatory pcr testing entirely from program a increases the number of healthy people, including healthy workers, in isolation. it also reduces overall adherence because the low-ppv screening test has no follow-up diagnostic testing. these two effects substantially reduce the gains from the screening testing program. in fact, without any confirmatory pcr testing the screening testing program does not pay for itself in most of the cases considered. with no confirmatory testing, there are approximately 1.8 billion prescribed isolation-days with weekly testing, approximately 2.5% of all person-days. single-stage screening test with 97% specificity and partial confirmatory testing. panel e modifies program a by considering a screening test with twice the false positive rate of the test in program a. for comparison purposes we hold adherence constant, although plausibly it would be lower for the panel e test. the reduced specificity increases the number of healthy individuals instructed to isolate. testing costs increase because there are more screening false positives that need confirmation, and isolating so many healthy workers provides an additional drag on gdp.
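the ~1.8 billion isolation-day figure above can be roughly checked with a back-of-envelope calculation. every parameter below is an assumption for illustration (population size, isolation length, simulation window); the paper's model is far richer, and this sketch only reproduces the order of magnitude from false positives alone.

```python
# rough back-of-envelope check on the "no confirmatory testing" case.
# all parameters here are assumptions for illustration; the paper's
# model is far richer than this sketch.
population = 330e6    # assumed us population
sim_days = 214        # june-december simulation window
isolation_len = 10    # assumed days of prescribed isolation per positive
spec = 0.985          # screening test specificity (program a)

tests_per_day = population / 7  # weekly testing cadence
false_pos_per_day = tests_per_day * (1 - spec)
isolation_days = false_pos_per_day * isolation_len * sim_days
# ~1.5 billion days from false positives alone, the same order of magnitude
# as the ~1.8 billion prescribed isolation-days reported in the text
```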
as a result, net economic benefits are less than for program a. single-stage screening test with 98.5% specificity, partial confirmatory testing, and increased adherence. this scenario, shown in panel f, modifies program a by increasing adherence from 25% to 50% for those receiving a positive screening test but not taking a confirmatory test. higher adherence substantially increases deaths averted, gdp, and revenues, and slightly decreases total testing costs because the greater suppression of the virus reduces symptomatic testing costs. net economic benefits are large, even for biweekly testing. a further panel of table 3 considers program b (98.5% specificity, universal confirmatory testing) except with a more expensive confirmatory test. the cost of the diagnostic testing is borne by the federal government, so in the model it does not affect private decisions and thus does not affect mortality, employment, or gdp. despite the doubling in the cost of the pcr test, the overall increase in testing cost is modest, rising for example from $56 million for weekly testing in program a (table 1) to $63 million. the reason is that, with universal pcr confirmatory testing, the expected cost of administering the combined test to an uninfected individual increases only slightly, from $5 + .015×$50 = $5.75 for a $50 confirmatory test to $6.50 for a $100 confirmatory test. figure 9 displays the simulated time path of deaths and quarterly gdp, along with standard error bands and actual deaths, for four counterfactual scenarios: panels (a), (b), and (c) show programs a, b, and c, respectively, for a weekly testing rate, and panel (d) shows program c for a four-day testing rate. all cases in figure 9 have a lower path for deaths and a higher path for gdp than the no-screening baseline in figure 2. program a slows the spread of the virus but does not suppress it.
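the per-test expected-cost arithmetic above can be written out directly: an uninfected person always pays for the screening test and, with probability equal to the false positive rate, for the confirmatory pcr test.

```python
# the expected-cost arithmetic from the text, for an uninfected individual
# under universal confirmatory pcr testing.
def expected_cost_uninfected(screen_cost, confirm_cost, spec=0.985):
    """expected combined test cost per uninfected individual."""
    return screen_cost + (1 - spec) * confirm_cost

cost_50 = expected_cost_uninfected(5.0, 50.0)    # $5 + .015 * $50  = $5.75
cost_100 = expected_cost_uninfected(5.0, 100.0)  # $5 + .015 * $100 = $6.50
```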
the other panels, however, approach suppression, and two-step testing (program c) at a 4-day cadence essentially suppresses the virus, supporting a strong economic recovery. at a weekly testing cadence, programs b and c could have avoided the second wave of the summer and fall. it might be more efficient to target screening testing based on individual characteristics than to have population-wide random screening. because contacts and mortality differ by age, this section considers screening that is random within an age category, with testing rates differing across categories. specifically, we calculate the age-based testing rates that maximize net total benefits (economic plus monetized mortality) of the screening test, subject to the constraint that the population-wide screening testing rate equals a given value. the results of the first calculation (optimized age-specific testing rates) for screening program b (screening test with 98.5% specificity and universal confirmatory pcr testing) are shown in figure 10, where each line is the probability of testing for a given age. the optimal age-varying testing rates are highest for young adults (ages 20-44), followed by ages 45-64, followed by ages 65-74. these results indicate that screening testing and isolation are being used to break the chain of transmission from middle-aged adults to the elderly, either through family members or through service workers serving the elderly.
the mortality benefits of this targeting outweigh the economic costs of isolating relatively higher fractions of the working-age population than of other ages. note: dots are optimization estimates for a given overall population testing rate; lines are smoothed through the estimates by age group. figure 11 shows the total net benefits for age-targeted screening testing and, for comparison, for random population screening testing. for small testing rates, there are substantial gains from targeting testing using the unconstrained allocations in figure 10. those gains diminish at higher testing rates as the virus is suppressed; however, net benefits are always higher with the age-targeted strategy. we note, however, that the costs here do not include the developmental and educational costs of children missing school, and including such costs could provide an additional reason to test the young and thus allow schools to reopen and stay open. figure 11. total net benefits from age-specific and age-blind screening testing. there are six main arguments against widespread screening testing (e.g., pettengill and mcadam (2020)). first, low specificity undercuts the program's validity and leads to low adherence with the instruction to isolate if positive. second, low specificity unnecessarily pulls many healthy workers out of the workforce. third, because antigen tests have lower sensitivity than pcr tests, many infected individuals would slip through the cracks and undercut the effectiveness of the program. fourth, if paid for federally, their expense would be massive at a time when the federal deficit is already at a postwar high.
fifth, to be effective they would need to be done at an infeasible scale, such as daily or every other day. sixth, having a screening program could change behavior, in particular making individuals who test negative less cautious, for example reducing their willingness to wear a mask. our analysis addresses the first five of these concerns. our results underscore the importance of the first two: in our analysis, the most important parameter is screening test specificity. a screening testing program must have high specificity to be credible and to evoke high adherence. this high specificity can be achieved by two-step testing if the tests are sufficiently independent. the additional costs of two-step testing, even if the second test is a pcr test, are small compared to the benefits, and screening testing with universal pcr confirmatory testing generates large net benefits. test specificity is typically estimated in a laboratory using a small number of samples, so test specificity in the field could differ substantially from laboratory estimates. because low specificity undercuts the testing program, this uncertainty underscores the importance of confirmatory testing to increase specificity. the third concern, sensitivity, is legitimate in theory, but our modeling (like larremore et al (2020)) finds that even large drops in sensitivity, say to 90%, have a small effect on the epidemiological and economic dynamics. the fourth concern, fiscal sustainability, is also legitimate in theory, but our estimates suggest that the economic gains from suppressing the virus are so large that the testing pays for itself through increased revenue.
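the two-step specificity point above can be sketched under the text's own independence caveat: if isolation requires both tests to come back positive and the tests' false positives are independent, false positive rates multiply.

```python
# sketch of how two-step testing raises specificity when the tests'
# false positives are sufficiently independent (the text's own caveat):
# isolation requires both tests to come back positive.
def combined_specificity(spec_first, spec_second):
    """specificity of a require-both-positive two-step test, assuming independence."""
    combined_false_pos = (1 - spec_first) * (1 - spec_second)
    return 1 - combined_false_pos

two_step = combined_specificity(0.985, 0.985)  # ~0.99978
```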
regarding the fifth concern, scale, we find that weekly testing in a regime with high compliance comes close to suppressing the virus, and moving to a four-day cadence is highly effective. weekly testing with a 98.5% specific screening test and universal confirmatory pcr testing would require increasing the number of pcr tests by roughly three-quarters relative to current levels; a four-day cadence would require more than doubling pcr testing capacity. our analysis does not tackle the final concern, that testing could induce more risky behavior. with that caveat, it is not self-evident that this must be the case. individuals undertake social distancing and masking to self-protect, to protect others, and to conform to local norms and laws. testing negative in the morning does not reduce the incentive to self-protect during the day. the effect on behavior of testing positive is ambiguous: altruism would lead one to reduce contacts even if not isolating, but no longer worrying about one's own health while caring little about the health of others could increase risky behavior. empirical research on this effect is needed. given the ambiguous nature of this effect, we do not consider it a reason to postpone the deployment of inexpensive rapid high-specificity screening testing. finally, our study of the economic benefits of covid-19 screening tests does not consider the public health benefits of the data generated from such a testing program for disease surveillance purposes. (we note that testing for both diagnostic and public health surveillance purposes is already routinely employed for seasonal influenza and for detection of novel strains of influenza a.) although at-home screening test results would not get into a public data system, universal confirmatory pcr testing would increase data coverage by overcoming the current selection into diagnostic testing of the symptomatic.
thus, our proposed testing regimes would allow for much more timely and fine-grained analysis of the response of covid-19 prevalence and transmission to a wide range of public health interventions and disease mitigation strategies than is possible with current diagnostic testing data. presumably, consideration of the utility of the data generated from a widespread screening testing regime in shaping the design of effective and low-cost mitigation measures would add to the economic benefits that we have calculated here. appendix 1. this appendix provides more details on the model, which extends the model developed in baqaee, farhi, mina, and stock (2020). our model departs from bfms in several important respects. first, we extend the model to include both a screening test regime and a diagnostic test regime. second, we assume that individuals are instructed to isolate upon receiving a terminal positive test, either a screening test with no confirmatory test or a positive diagnostic test. individuals awaiting diagnostic test results are instructed to quarantine. third, we distinguish between individuals who have recently recovered and those who have fully recovered from the disease, to capture that individuals may still test positive on a pcr test after they are no longer infectious. finally, we allow for imperfect adherence to quarantine and isolation. there are five age groups indexed by a, representing ages 0-19, 20-44, 45-64, 65-74, and 75+. there are 66 private sectors in the economy indexed by i. individuals are either s (susceptible), e (exposed), i (infected), r (recently recovered), f (fully recovered), or d (dead).
in addition, individuals who are not dead are either actively circulating (a), awaiting diagnostic test results (d), awaiting screening test results (s), or in isolation following a positive test (q). thus, the population is partitioned into 21 states. for example, sa, ss, sd, and sq for the second age group denote the number of persons aged 20-44 that are susceptible and actively circulating, susceptible and awaiting screening test results, susceptible and awaiting diagnostic test results, and susceptible and in isolation, respectively. we assume that the recovered (either recently recovered or fully recovered) are immune through the end of our simulation period. the rates of screening and diagnostic testing are given by the testing-rate parameters described in table 2. we assume that these parameters are equal to zero in the estimation period of our model, which runs through june 1st, and thereafter are calibrated according to the main text of this paper. the state variables (i.e., sa) are all five-dimensional vectors; the a-th element of any state corresponds to the a-th age group. the epidemiological side of the model has 21 transition equations, in which the transmission terms depend on the number of individuals of age a (summing across all 21 states) and on the effective number of infected individuals actively circulating, which sums the infected who are circulating and the non-adherent fractions of the infected who are awaiting test results or have been instructed to isolate. in this final expression we treat the signal value of taking a diagnostic test as being the same as receiving a positive screening test (these would be the same for the screening-test positives taking a confirmatory pcr test), so non-adherence with quarantine is the same as non-adherence with a terminal positive screening test.
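the 21-state partition above follows from crossing the five living health statuses with the four activity statuses and adding the dead. a minimal enumeration, with illustrative state labels following the text's "sa"-style naming:

```python
# illustrative enumeration of the 21-state partition described above:
# five living health statuses crossed with four activity statuses, plus death.
from itertools import product

health = ["s", "e", "i", "r", "f"]   # susceptible ... fully recovered
activity = ["a", "s", "d", "q"]      # circulating, awaiting screen result,
                                     # awaiting diagnostic result, isolated
states = [h + act for h, act in product(health, activity)] + ["dead"]
assert len(states) == 21  # matches the 21 states in the text
```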
the effective adherence parameter is a weighted average of the isolation adherence rates for those who received screening tests and those who received diagnostic tests. the weights are endogenously determined and given by the relative shares of those instructed to isolate who arrived from screening tests versus diagnostic tests. thus, it is the effective adherence rate of those in isolation. if those in isolation arrived mainly through the screening test regime, then the effective rate will be close to the quarantine adherence rate for the screened population. given the parameters appearing in equations (1) through (21) above and a set of initial conditions, the model is straightforward to solve in discrete time by forward iteration. the unit of time is a single day and the model is solved in 12 steps per day. the contact matrix c describes the expected number of contacts between each age group in the population. an actively circulating individual who interacts with an individual of another age group has an instantaneous infection probability equal to the transmission probability times the probability that the contacted individual is infected. the probability that an individual of age a is infected in a given period is therefore given by summing across all their contacts. we distinguish between contacts that are made at home, at work, and elsewhere. the contact matrix is time-varying, and can change due to, for instance, npis put in place by the government or personal behavioral adaptations to avoid contracting the virus. the contact matrix is the sum of home, work, and other components, each weighted by the probability of being in that environment; the component matrices indicate the expected number of contacts in each of the home, work, and other environments, conditional on being at home, at work, or elsewhere.
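the effective adherence rate described at the start of this passage can be sketched directly. the adherence rates below are program a's values (25% for screening-only, 75% for diagnostic); the group shares are illustrative, since in the model they are determined endogenously.

```python
# sketch of the effective adherence rate described above: a weighted average
# of the screening-test and diagnostic-test adherence rates, with weights
# given by each group's share of those instructed to isolate.
def effective_adherence(adh_screen, adh_diag, n_screen, n_diag):
    """endogenously weighted adherence rate of those in isolation."""
    return (adh_screen * n_screen + adh_diag * n_diag) / (n_screen + n_diag)

# when most isolators arrive via unconfirmed screening tests, the effective
# rate sits near the screening-test adherence rate
rate = effective_adherence(0.25, 0.75, n_screen=900, n_diag=100)  # 0.30
```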
the weights in this sum indicate the probability that an age-a individual is at home, at work, or elsewhere. the work component depends on the fraction of each age group employed in the indicated sector, computed from the number of workers of age a employed in sector i. see sections 1.2 and 2 of bfms (2020) for more information on the construction and historical estimation of the contact matrix. the behavioral component of this model endogenously determines the contact matrix in our simulation period (i.e., after june 1). this portion of the model is unchanged from bfms (2020). for completeness, we briefly describe the key elements of this control rule here. in our simulation period (june 1st through december 31st), we assume that the contact matrix responds endogenously to changes in the course of the pandemic. we formalize this by implementing a linear proportional-integral-derivative (pid) control rule, in which feedback depends on current deaths, the 14-day change in deaths, the current unemployment rate, and the integral of the unemployment rate. the inputs to the linear pid control rule, the unemployment rate and the time derivative of the death rate, are generally unknown in real time, available only with time aggregation and/or reporting lags. we therefore use the 14-day average of the unemployment rate, the cumulative daily unemployment rate since march 7th, deaths over the previous two days, and the 14-day change in the two-day death rate for the various terms on the right-hand side of this equation.
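a minimal sketch of the linear pid feedback just described. the inputs are the proxies named in the text (two-day deaths, their 14-day change, the 14-day average unemployment rate, and cumulative unemployment); the gains are illustrative placeholders, not the paper's estimated coefficients.

```python
# minimal sketch of the linear pid feedback described above.
# the gains k_* are illustrative placeholders, not estimated coefficients.
def pid_feedback(deaths, d_deaths_14d, unemp, unemp_integral,
                 k_deaths=1.0, k_d_deaths=1.0, k_unemp=1.0, k_unemp_int=1.0):
    """linear combination of proportional, integral, and derivative terms."""
    return (k_deaths * deaths + k_d_deaths * d_deaths_14d
            + k_unemp * unemp + k_unemp_int * unemp_integral)
```

in the model this linear signal is then passed through a sigmoid (the gaussian cdf) to constrain the resulting sectoral labor supply shifts to lie between 0 and 1.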
the pid controller determines a sequence of sectoral labor supply shocks, shifted by the gdp-to-risk index. in this expression, the labor supply term is the workforce in sector i at date t as a fraction of the workforce prior to the pandemic (i.e., february 2020), the shifts begin at the start of the simulation period (june 1st), and φ is the cumulative gaussian distribution (which plays no role except as a sigmoid to constrain the controller between 0 and 1). the gdp-to-risk index can be interpreted as measuring the ratio of the marginal contribution to output, relative to the marginal contribution to r0, from an additional worker of age a returning to work in sector i. up to scale, the gdp-to-risk index does not depend on epidemiological parameters except the contact matrix. the units of the index are not meaningful, so we standardize it to mean zero and unit variance across sectors (equally weighted). thus, the controller effectively alters the work-contacts component of the contact matrix. similarly, we can think of the controller as generating a sequence of labor supply shocks that can be used to back out gdp using hulten's theorem as a first-order approximation, where the labor supply shocks are summed over ages and weighted by the labor income share of each sector i.
references:
testing, voluntary social distancing and the spread of an infection
immunochromatographic rapid test for influenza a and b virus among adult patients in the emergency department (emergency department study group on respiratory viruses)
epidemiological and economic effects of lockdown
group testing in a pandemic: the role of frequent testing, correlated risk, and machine learning
policies for a second wave (forthcoming)
an seir infectious disease model with testing and conditional quarantine
an economic model of the covid-19 epidemic: the importance of testing and age-specific policies
accuracy of rapid influenza diagnostic tests: a meta-analysis
a tip against the covid-19 pandemic
how did covid-19 and stabilization policies affect spending and employment? a new real-time economic tracker based on private sector data
the accuracy of cbo's baseline estimates for fiscal year 2019
the macroeconomics of epidemics
the macroeconomics of testing and quarantining
test sensitivity for infection versus infectiousness of sars-cov-2 (2020), nber working paper 27780
fear, lockdown, and diversion: comparing drivers of pandemic economic decline 2020
national coronavirus response: a road map to reopening
mandated and voluntary social distancing during the covid-19 epidemic
beat covid without a vaccine
test sensitivity is secondary to frequency and turnaround time for covid-19 surveillance (medrxiv preprint)
which workers bear the burden of social distancing policy? (manuscript)
roadmap to recovery: a public health guide for governors
assessment of sars-cov-2 screening strategies to permit the safe reopening of college campuses in the united states
unnecessary obstacles to covid-19 mass testing
can we test our way out of the covid-19 pandemic?
optimal covid-19 quarantine and testing policies
national covid-19 testing & tracing action plan
roadmap to responsibly reopen america
how cbo estimates automatic stabilizers
a national decision point: effective testing and screening for covid-19 (duke margolis center for health policy)
adherence to the test, trace and isolate system: results from a time series of 21 nationally representative surveys in the uk (the covid-19 rapid survey of adherence to interventions and responses)
population-scale testing can suppress the spread of covid-19
a realistic blueprint for reopening the economy by sector while ramping up testing
binaxnow™ covid-19 ag card instructions for use

key: cord-328320-1f3r80r5
authors: kim, edward
title: drawing on israel's experience organizing volunteers to operationalize drive-through coronavirus testing centers
date: 2020-04-16
journal: disaster medicine and public health preparedness
doi: 10.1017/dmp.2020.104
doc_id: 328320
cord_uid: 1f3r80r5

to increase the country's capacity to test and track suspected coronavirus disease 2019 (covid-19) cases, israel launched drive-through testing centers in key cities, including tel aviv, jerusalem, be'er sheva, and haifa. this article examines the challenges that the national emergency medical services and volunteers faced in the process of implementing drive-through testing centers to offer lessons learned and direction to health-care professionals in other countries. israel's ministry of health confirmed their first case of coronavirus disease 2019 (covid-19) on february 21, 2020, following the repatriation of 1 of the 11 passengers aboard the diamond princess cruise ship. 1, 2 one month later, the first related death was made public along with details of over 700 confirmed cases. 3 by the end of march, the number of cases had increased to over 4500. 4 to respond to the increasing prevalence and spread of coronavirus throughout the country, there was an urgent need to increase testing of suspected individuals and trace community spread. 5 working alongside the ministry of health, magen david adom (mda), israel's emergency medical, disaster, ambulance, and blood bank service, operationalized drive-through coronavirus testing centers, mirroring the models used by china and south korea. 6 drive-through testing centers, referred to in israel as "drive and test" facilities, allow for increased testing throughput while maintaining social distancing guidelines.
7 the process starts after people with symptoms of covid-19 are instructed to contact mda's dispatch center, where they are screened by a physician, given an appointment if necessary, and then instructed to fill out electronic medical forms. following this registration process, patients receive a qr code to their mobile device to be scanned at the facility on the day of testing. those who lack access to a car or are unable to travel receive a home visit, where they are tested by mda staff. after the first "drive and test" facility was piloted on march 23 in tel aviv to test capacity and refine operations, additional facilities were opened in jerusalem, be'er sheva, and haifa. the following week, mobile testing centers were set up in tamra, modi'in-maccabim-re'ut, wadi ara, and rahat, the largest bedouin community in israel. 8 facilities were set up in large open spaces, such as the parking lots of convention centers, stadiums, parks, and markets. the testing centers are staffed with medical teams, police officers, security personnel, and volunteer students, and can process 6 lanes of cars in parallel, taking on average 3-5 min per car. the medical teams consist primarily of paramedics who oversee operations, police officers and security personnel who direct traffic, and volunteers who don personal protective equipment (ppe) while testing individuals. volunteer efforts were coordinated by the federation of israel medical students (fims), which enlisted volunteers from a combined pool of approximately 700 medical students in their clinical studies. 9 while the initial setup and deployment of the "drive and test" facilities were executed swiftly and strategically, there were outstanding issues associated with preparation, implementation, and ongoing operations that needed to be addressed. 
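the lane and per-car timing figures above imply a simple capacity estimate per facility. operating hours per day are an assumption for illustration, not a figure from the article.

```python
# back-of-envelope capacity for a "drive and test" facility as described:
# 6 parallel lanes at 3-5 minutes per car; hours open is an assumed value.
def daily_capacity(lanes, minutes_per_car, hours_open):
    """cars processed per day across all lanes."""
    cars_per_lane_per_hour = 60 / minutes_per_car
    return lanes * cars_per_lane_per_hour * hours_open

slow = daily_capacity(lanes=6, minutes_per_car=5, hours_open=12)  # 864 cars/day
fast = daily_capacity(lanes=6, minutes_per_car=3, hours_open=12)  # 1440 cars/day
```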
anticipating an increased demand on the country's emergency call center, which typically receives 5500 calls per day, other independent call centers were repurposed and additional centers were set up in schools to expand capacity and respond to 25,000 calls per day. concurrently, over 200 volunteers signed up and underwent training to staff the multiple "drive and test" facilities and scale operations. however, to comply with health policies, students were trained in small groups of fewer than 10, which created time lags and constraints on the quality of the training they received. in particular, ppe protocol reviews were limited to demonstrations only, leaving many students to don ppe for the first time on-site. a further layer of confusion arose from the fact that "drive and test" facilities and home visits used different ppe equipment and protocols. with respect to communication, several days after the launch of the "drive and test" facilities, fims changed the platform volunteers use to sign up for and manage their shifts. ultimately, this left many volunteers unable to log into the new system and view available shifts. this problem was compounded by the fact that the team relied on a single whatsapp group, of over 250 participants, to disseminate all information related to volunteering. although it was helpful to access a centralized resource, unless volunteers were checking their messages constantly, information was quickly lost in the steady stream of new messages. volunteers not only faced challenges related to training and staffing during the preparation and early implementation phases, they also confronted operational issues during the first 2 wk of launch. for example, volunteers were not given a script to reference before testing individuals. in the first week, a driver stopped their car during testing using only the brake pedal, with their car still in gear.
startled during the swabbing process, the driver jerked, pressed the accelerator instead, and caused an accident. after the incident, volunteers were given a script that, among other things, instructed drivers to put their vehicle in park and engage the parking brake. more broadly, the rush to set up "drive and test" facilities and coordinate volunteers has had unintended downstream consequences. laboratories have received viral cultures that are unlabeled or carry inaccurate identification stickers; bags and cultures with identification stickers that do not match; and handwritten forms with missing or illegible patient information. coolers have also been left open for extended periods between sample collections. if not quickly addressed, these quality control issues will result in wasted resources, repeat testing, and inaccurate results, putting patients and others at risk. recognizing and responding to shortcomings is critical to successful operations in times of crisis. accordingly, an anonymous survey was disseminated to collect complaints and recommendations from volunteers and gain insight into outstanding issues. fortunately, mda addressed many of the operational issues quickly and decisively, in some cases overnight. for problems related to limited training, mda shared updated internal standards and protocols in person and pushed the same updates to the "mda teams" app to improve the onboarding process for new volunteers. subsequently, new volunteers were paired with experienced ones for at least 1 full shift. this was a feasible solution, considering each station already required 2 volunteers: 1 to prepare and log the sample and the other to take the sample from the patient. the pairing strategy improved oversight between the volunteers without fostering inefficiencies.
it is worth noting that the continued tightening of government-mandated travel restrictions and stricter enforcement of such policies did not reduce mda's and fims' ability to organize volunteers. both organizations were quick to offer guidance on how to travel to testing sites. while there was no mechanism to provide volunteers with official certificates to expedite travel exceptions, a separate hotline was set up for those who faced any difficulty with law enforcement and to dismiss any accrued fines. in times of crisis, communication and organizational challenges occur more frequently. even with ample experience in emergency response and pilot testing, it is not possible to anticipate and prevent every hurdle. mda's handling of the "drive and test" facilities achieved an appropriate balance between speed and thoroughness. to quickly operationalize "drive and test" facilities while allowing for process improvements, mda used the plan-do-check-act methodology. this enabled the steering team to quickly identify and address areas for improvement while maintaining a balance between iterative refinement and consistency. 10 testing center processes evolved over the first 2 wk of implementation and resulted in a largely successful operation. by the end of march, "drive and test" centers were collecting over 2,000 tests daily, accounting for 18,000 of the 53,000 tests completed nationwide.
11

references:
- world health organization
- disaster medicine and public health preparedness
- news-releases/first-confirmed-coronavirus-case-in-israel-at-sheba-medicalcenter-tel-hashomer-301009210.html
- substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov2)
- drive-through screening center for covid-19: a safe and efficient screening system against massive community outbreak
- drive-through medicine: a novel proposal for rapid evaluation of patients during an influenza pandemic
- the fight against corona
- the quality toolbox
- complex arrives to modi'in

the author has no potential conflicts of interest to disclose.

key: cord-292274-upwn9o2m authors: ghaffari, abdi; meurant, robyn; ardakani, ali title: covid-19 serological tests: how well do they actually perform? date: 2020-07-04 journal: diagnostics (basel) doi: 10.3390/diagnostics10070453 sha: doc_id: 292274 cord_uid: upwn9o2m

in only a few months after its initial discovery in wuhan, china, sars-cov-2 and the associated coronavirus disease 2019 (covid-19) have become a global pandemic, causing significant mortality and morbidity and prompting the implementation of strict isolation measures. in the absence of vaccines and effective therapeutics, reliable serological testing must be a key element of public health policy to control further spread of the disease and gradually remove quarantine measures. serological diagnostic tests are increasingly being used to provide a broader understanding of covid-19 incidence and to assess immunity status in the population. however, there are discrepancies between claimed and actual performance data for serological diagnostic tests on the market. in this study, we conducted a review of independent studies evaluating the performance of sars-cov-2 serological tests.
we found significant variability in the accuracy of marketed tests and highlight several lab-based and point-of-care rapid serological tests with high levels of performance. the findings of this review highlight the need for ongoing independent evaluations of commercialized covid-19 diagnostic tests. coronavirus disease 2019 (covid-19) was first discovered in a cluster of patients with severe respiratory symptoms in hubei province, china, in december 2019. early nucleic acid analysis against known pathogen panels produced negative results, suggesting the causative agent was of unknown origin. by early january 2020, analysis of bronchoalveolar lavage (bal) fluid from infected patients revealed a pathogen, later named sars-cov-2, with 50%, 80%, and 96% genetic sequence overlap with the genomes of the middle east respiratory syndrome virus (mers-cov), the severe acute respiratory syndrome virus (sars-cov), and bat coronavirus ratg13, respectively [1, 2]. like sars-cov and mers-cov, sars-cov-2 is a single-stranded rna virus belonging to the beta genus of coronavirus in the coronaviridae family [3]. as sars-cov-2 can be transmitted from human to human, the disease has spread swiftly to over 200 countries, infecting nearly 6 million people and causing at least 350,000 deaths worldwide (as of 27 may 2020) [4]. an unprecedented and rapidly growing global effort is underway to develop covid-19 vaccines and therapeutics, but at the time of this review there are no vaccines, and only one antiviral drug (remdesivir), with modest clinical benefit, has been approved under the u.s. food and drug administration (fda) emergency use authorization (eua) [5, 6]. under these circumstances, countries were forced to implement physical distancing measures to control the outbreak and, in the process, place approximately 3 billion people under lockdown.
in any infectious disease outbreak, accurate and accessible diagnostic testing must be one of the pillars of control-measure policies to understand and minimize the spread of disease. the epidemiological studies of the outbreak in china estimated the proportion of undetected covid-19 cases to be as high as 86% [7] . as asymptomatic or mild cases could play a significant role in the transmission and spread of the sars-cov-2 virus [7, 8] , symptoms alone are not reliable diagnostic markers. there are two major types of diagnostic technologies available to address this: molecular and serological tests. currently, much of the focus is on the sars-cov-2 molecular test, which can detect, with high accuracy, the virus-specific rna molecules circulating in the host body. the gold-standard molecular test is based on reverse transcriptase polymerase chain reaction (rt-pcr) technology. however, the pcr test is not useful in distinguishing between highly infective viruses versus ones that have been neutralized by the host, and it cannot assess immunity status against sars-cov-2 [9] . serologically based antibody tests can complement molecularly based tests by providing a more accurate estimate of sars-cov-2 incidence and by potentially detecting individuals with immunity against the disease, as these tests detect markers of the immune response. in humoral immune response to infection, pathogen-specific antibodies, produced by b cells, neutralize and prevent further spread of the disease. the activation and differentiation of b cells into antibody-secreting plasma b cells are triggered by a cascade of events involving virus digestion by antigen-presenting cells (e.g., dendritic cells, macrophages) and presentation of virus-specific antigens to helper t cells ( figure 1 ). 
antibodies protect the host by binding to specific antigens (proteins) on the virus to neutralize its fusion and entry into the host cell and facilitate recognition and killing by phagocytic immune cells [10]. in humans, three types of antibodies or immunoglobulins have been the target of covid-19 serological tests: igm, igg, and iga. although the dynamics of the immune response in covid-19 are not fully understood, typically igm antibodies are produced by host immune cells during the early stages of a viral infection. igg is often the most abundant antibody in the blood and plays a more prominent role in the later stages of infection and in establishing long-term immune memory [11]. while igm and igg antibodies have been the leading candidates in covid-19 serological test development, recent studies show that iga, predominately present in the mucosal tissue, may also play a critical role in the immune response and disease progression [12].
figure 1. (2,3) following replication and release from the host cells, a subset of viruses will be engulfed and digested by antigen-presenting cells (apcs) like macrophages or dendritic cells. (4) fragmented sars-cov-2 antigen(s) will be presented to t helper cells, which in turn will interact with and activate b cells. (5) activated b cells will proliferate and differentiate into plasma or memory b cells with high-affinity binding receptors for the original sars-cov-2 antigen. plasma cells secrete their sars-cov-2-specific receptors in the form of igm, igg, or iga antibodies. (6) antibody-mediated neutralization occurs when sars-cov-2-specific antibodies bind to viral antigen(s) and prevent virus interaction and entry into host cells.

serological, or antibody, tests detect immunoglobulins produced by the host's plasma b cells following exposure to foreign antigens. the sars-cov-2 genome encodes approximately 25 proteins that are required for infection and replication, including four major structural proteins: spike (s), envelope (e), membrane (m), and nucleocapsid (n) (figure 1). the s protein plays a critical role in fusion and entry into the host cell, and it comprises an n-terminal s1 receptor-binding domain (rbd), an n-terminal domain (ntd), and a c-terminal s2 subunit. the primary function of the sars-cov-2 n protein (np) is the binding and packing of the viral rna genome into a helical nucleocapsid structure during viral replication [13, 14]. studies on the serum of recovered covid-19 patients suggest that host-neutralizing antibodies primarily work against the s and n proteins [15, 16]. consequently, the likelihood of predicting immunity status could increase in serological tests that target various regions of the s or n proteins. therefore, the characterization of specific sars-cov-2 antigen domains targeted by the humoral immune response becomes an integral part of serological test development.
there are four major types of serological diagnostic tests: the rapid diagnostic test (rdt), the enzyme-linked immunosorbent assay (elisa), the chemiluminescence immunoassay (clia), and the neutralization assay. the neutralization assay is a lab-based test that uses live virus and cell culture methods to determine whether patient antibodies can prevent viral infection in vitro. this test must be performed in laboratories with designated biosafety certification to culture sars-cov-2-infected cells and has a time-to-result of 3-5 days. an rdt is a simple and rapid test based on lateral flow immunoassay (lfia) technology, commonly found in pregnancy test kits, for example. an rdt can potentially be administered as a point-of-care (poc) test or self-test. typically, rdt test strips use a drop of blood to detect the presence of patient antibodies (igg, igm, or iga) produced against a specific sars-cov-2 antigen (figure 2). an rdt is simple to use, with a time-to-result anywhere between 10 and 30 min; it therefore has the potential to be deployed in large-scale serological surveys. the elisa, currently the most commonly used serological test format, is a lab-based test with an average time-to-result of 2-5 h. an elisa typically uses a surface coated with specific viral antigen(s) to bind to and detect the corresponding patient antibodies (igg, igm, iga) in blood, plasma, or serum samples. the bound antigen-antibody complex is then detected using a second antibody and a substrate that produces a color- or fluorescence-based signal. elisa assays come in different formats, including direct, competitive, and, the most commonly used, sandwich or double-antigen-bridging assay (daba) (figure 3). clia technology follows a similar concept to elisa, taking advantage of the high binding affinity between viral antigen(s) and host antibodies, but uses chemical probes that yield light emission through a chemical reaction to generate a positive signal.
clia has an average time-to-result of 1-2 h. clia and elisa are both high-throughput laboratory-based assays with a high level of analytical agreement [17, 18].

figure 2. overview of the rapid diagnostic serological test. rapid diagnostic tests (rdts) are typically based on a colorimetric lateral flow immunoassay, in which host antibodies migrate across an adhesive pad (e.g., nitrocellulose) and interact with bound virus-specific antigens and secondary antibodies (anti-human igm/g antibodies). conjugated sars-cov-2-specific antigen(s) (labeled with gold here) will bind with the corresponding host antibodies. as antibody-antigen complexes travel up the membrane, bound anti-sars-cov-2 igm antibodies interact with fixed anti-igm secondary antibodies on the m line, and anti-sars-cov-2 igg antibodies interact with anti-igg antibodies on the g line. if the blood sample does not contain sars-cov-2-specific antibodies, the m or g lines do not appear in the final test results; only the control (c) line will be revealed.

figure 3. overview of the enzyme-linked immunosorbent assay (elisa)-based diagnostic test. elisa can be presented in different formats based on differences in antigen immobilization and antibody labeling. in direct elisa, sars-cov-2 antigen(s) bound to a plastic solid phase is detected by the addition of a conjugated antibody. in sandwich elisa, the capture antibody is attached to the plastic solid phase. antigen(s) in the sample will bind to the capture antibody and then be detected by a second enzyme-labeled antibody. in competitive elisa, sample sars-cov-2 antigen is preincubated with the primary antibody and then added to a well coated with a secondary antibody along with an enzyme-conjugated antigen that competes with the sample antigen for binding with the primary antibody. the more sars-cov-2 antigen in the sample, the less conjugated antigen will be bound and the lower the signal will be.
knowledge of virus and host immune response dynamics is essential in formulating diagnostic testing and treatment strategies. studies of covid-19 suggest that seroconversion, when antibody levels become detectable in the blood, may take place days after the viral load has peaked [19]. therefore, serological tests would be less effective in the early stages of covid-19. wolfel and colleagues further confirmed these findings by reporting igm and igg seroconversion in 50% of patients at 1 week after the onset of symptoms [20]. the median time for the detection of igm and igg in covid-19 patients was reported to be 5 and 14 days, respectively [21]. yu and colleagues detected the seroconversion of iga on day 2 and of igm/igg on day 5 after onset of symptoms. furthermore, the study reported that 100% of cases had detectable levels of iga, igm, and igg on day 32 after onset of symptoms [12]. their findings also revealed igm and igg levels to be significantly higher in severe covid-19 cases than in patients with mild or moderate disease [12], suggesting that serological tests require high sensitivity to detect lower levels of antibodies in mild cases. studies on the persistence of antibodies in blood suggest that high levels of igg are detectable for at least 49 days after the onset of symptoms, while igm levels declined rapidly from day 35 postinfection [22]. the diagram in figure 4 depicts the timelines and peak levels for sars-cov-2 viral load relative to blood igm, igg, and iga antibodies.
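the reported seroconversion milestones can be collected into a small lookup that flags which antibody classes are typically detectable at a given time since symptom onset. the day thresholds below are the medians and endpoints quoted in the text ([12], [21], [22]); treating them as hard cutoffs is a simplification for illustration only.

```python
# median detection day (iga [12], igm/igg [21]) and the last day of
# reported persistence (igm declines rapidly from day 35 [22];
# igg remains high for at least 49 days, so no upper bound is set here)
DETECTION_WINDOWS = {
    "iga": (2, None),
    "igm": (5, 35),
    "igg": (14, None),
}

def likely_detectable(day: int) -> list[str]:
    """Antibody classes typically detectable `day` days after symptom onset."""
    out = []
    for ab, (start, end) in DETECTION_WINDOWS.items():
        if day >= start and (end is None or day <= end):
            out.append(ab)
    return out

print(likely_detectable(3))   # ['iga']
print(likely_detectable(20))  # ['iga', 'igm', 'igg']
```

a query at day 40 would return only iga and igg, matching the reported rapid decline of igm after day 35.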
improved understanding of humoral antibody response time kinetics in covid-19 is crucial to the correct application of serological tests.

the urgent need for the development of serological diagnostic tests in response to the covid-19 outbreak has compelled regulatory bodies to implement emergency use authorization programs to expedite the commercialization process of these tests. in light of this, independent and robust post-market evaluations of covid-19 serological tests are needed to confirm manufacturers' performance claims. the basic measures for quantifying diagnostic test performance are sensitivity and specificity. sensitivity is the ability of a test to detect the disease agent or the host's response to the disease (i.e., antibodies) when it is truly present, whereas specificity is the ability of a test to correctly return a negative result when disease or host response is absent [23]. we conducted a systematic review of independent studies that assessed the performance of currently available sars-cov-2 serological tests. we included studies that reported sensitivity and specificity, stage of disease (early, intermediate, or late), test format (clia, elisa, rdt), and antibody target (iga, igg, igm, or igg + igm) [24, 25]. where available, the sars-cov-2 antigen used for antibody detection was recorded. studies that did not specify the disease stage of test samples were grouped under the "overall" category and assessed separately. in total, we reviewed performance data on 5 serological clia tests, 15 serological elisa tests, and 42 serological rdts currently on the market (see supplementary materials). the distribution plot of the data shows a higher degree of variability in test sensitivity values compared to specificity (figure 5). this level of variability further emphasizes the need for independent evaluations of serological tests on the market. the sensitivity/specificity plots highlight tests at various stages of covid-19 and confirm the expectation that serological tests are more effective in the later stages of the disease, when higher igg and igm levels are present in the blood (figure 6). the heatmap of tests in the "overall" category ranks the highest-performing test in each target antibody category based on sensitivity, followed by specificity (figure 7). top-performing covid-19 serological tests (>95% sensitivity and specificity) from the xy plots and heatmap are summarized in table 1.
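the two performance measures defined above, and the ranking rule used for the heatmap (sensitivity first, then specificity), can be expressed directly in code. the evaluation records below are made-up placeholders, not results from the reviewed studies.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: fraction of truly positive samples detected."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: fraction of truly negative samples correctly returned negative."""
    return tn / (tn + fp)

# hypothetical evaluation records: (name, sensitivity, specificity)
tests = [
    ("test A", 0.90, 0.99),
    ("test B", 0.95, 0.93),
    ("test C", 0.95, 0.98),
]

# rank by sensitivity, then specificity (both descending), as in the heatmap
ranked = sorted(tests, key=lambda t: (t[1], t[2]), reverse=True)
print([name for name, *_ in ranked])  # ['test C', 'test B', 'test A']
```

note that ties on sensitivity (tests B and C) are broken by specificity, which is exactly the heatmap's ordering rule.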
figure 7. independent evaluation of sars-cov-2 serological test overall performance. sensitivity and specificity data from studies that did not specify the covid-19 stage ("overall" group) are represented in a heatmap. the heatmap is ordered according to the antibody target (igg, igm, and igg/igm), followed by sensitivity and specificity values. all analyses were conducted using r version 3.6.3. heatmaps were generated using the gplots and rcolorbrewer packages. elisa: enzyme-linked immunosorbent assay; clia: chemiluminescence immunoassay; rdt: rapid diagnostic test.

in light of policies to ease the lockdown and reopen the economy, large-scale seroprevalence studies to screen for immunity status are being implemented in several jurisdictions. critics point to gaps in our understanding of the immune response to covid-19 infection, including the ability of serological tests to detect neutralizing antibodies and the capacity of the immune system to provide long-term immunity against sars-cov-2.
however, some argue that in the context of a global viral outbreak with a relatively high mortality rate, inaction due to uncertainty can cause greater harm than false-positive and false-negative serological test results [31]. several jurisdictions have initiated seroprevalence studies to provide a more accurate estimate of cases with positive sars-cov-2-specific antibodies, irrespective of disease symptoms. in los angeles county, the prevalence of sars-cov-2 antibodies in the community was estimated at 4.65%, equivalent to 367,000 adults, which was substantially greater than the 8,430 confirmed cases in the same county at the time of the study [32]. in new york city, 19.9% of the population has been estimated to have sars-cov-2 antibodies, compared to 2.1% confirmed cases as of 2 may 2020 [33]. similar studies from germany, the u.k., singapore, and china show significantly higher estimates of positive sars-cov-2 antibody cases compared to symptomatic cases confirmed by molecular tests [34]. as undetected cases with mild or no symptoms can transmit the virus, it is not surprising that countries (e.g., south korea, germany, and singapore) with large-scale and well-organized testing programs, combined with extensive isolation and contact tracing for infected individuals, have had some success in minimizing covid-19-related deaths in their populations [35]. as serological tests are in high demand, in part due to an increase in large-scale seroprevalence studies, it is imperative for national and regional governments to continue coordinated efforts to independently validate serological test performance and to partner with industry to scale up manufacturing and production capacity.
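the gap between serologically estimated and molecularly confirmed cases cited above can be summarized as a simple undercount ratio, using the figures quoted in the text (los angeles county: ~367,000 estimated vs. 8,430 confirmed adults; new york city: 19.9% vs. 2.1% of the population).

```python
def undercount_ratio(estimated: float, confirmed: float) -> float:
    """Estimated infections per molecularly confirmed case (same units for both)."""
    return estimated / confirmed

la = undercount_ratio(367_000, 8_430)  # counts of adults, ~43.5x
nyc = undercount_ratio(19.9, 2.1)      # percentages of the population, ~9.5x
print(f"la county: ~{la:.0f}x, nyc: ~{nyc:.1f}x")
```

the ratio is unit-free as long as numerator and denominator use the same units (counts or percentages), which is why both cited comparisons can be computed the same way.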
existing emergency authorization programs, intended to accelerate the manufacturing of diagnostic tests, must also be accompanied by clear and informed guidelines on the preferred and minimally acceptable profiles of covid-19 serological tests designed for specific indications. despite the unprecedented response to the outbreak, major gaps remain in our understanding of the interaction between sars-cov-2 and the immune system, which can negatively impact the utilization of serological testing. coordinated research efforts are urgently needed to investigate some of the key gaps in our knowledge, including:
1. which serological tests can identify sars-cov-2-neutralizing antibodies?
2. is there cross-reactivity between neutralizing antibodies and other coronaviruses?
3. which sars-cov-2 antigens are optimal for the detection of neutralizing antibodies?
4. what is the correlation between sars-cov-2-specific antibodies and protective immunity status?
5. how long does protective immunity last in recovered patients? are individuals susceptible to reinfection with sars-cov-2?
6. is humoral antibody response the best indicator of protective immunity, or are there other immune-cell-based mechanisms?
in the context of the covid-19 outbreak and the execution of return-to-work policies, failing to take advantage of available diagnostic tools due to uncertainty can have profound consequences. medical professionals frequently rely on imperfect evidence, with the possibility of false positives and false negatives. it is, however, important to clearly understand the limits and potential of serological tests to make informed decisions based on a risk-benefit assessment in each specific situation. in the words of tedros adhanom ghebreyesus, director-general of the world health organization, "countries cannot fight this pandemic blindfolded. countries should know where the cases are."
references:
- a novel coronavirus from patients with pneumonia in china
- a pneumonia outbreak associated with a new coronavirus of probable bat origin
- genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
- world health organization covid-19 situation report 127
- remdesivir in adults with severe covid-19: a randomised, double-blind, placebo-controlled, multicentre trial
- compassionate use of remdesivir for patients with severe covid-19
- substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov-2)
- presumed asymptomatic carrier transmission of covid-19
- the important role of serology for covid-19 control
- the immune system in health and disease
- immune response to sars-cov-2 and mechanisms of immunopathological changes in covid-19
- distinct features of sars-cov-2-specific iga response in covid-19 patients
- how to discover antiviral drugs quickly
- nucleocapsid protein recruitment to replication-transcription complexes plays a crucial role in coronaviral life cycle
- detection of sars-cov-2-specific humoral and cellular immunity in covid-19 convalescent individuals
- neutralizing antibodies against sars-cov-2 and other human coronaviruses
- in vitro diagnostic assays for covid-19: recent advances and emerging trends
- laboratory testing of sars-cov, mers-cov, and sars-cov-2 (2019-ncov): current status, challenges, and countermeasures
- temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by sars-cov-2: an observational cohort study
- virological assessment of hospitalized patients with covid-2019
- profiling early humoral response to diagnose novel coronavirus disease (covid-19)
- viral kinetics and antibody responses in patients with covid-19. medrxiv (preprint) 2020
- simple statistical measures for diagnostic accuracy assessment
- food & drug administration.
key: cord-331617-1ytcd0ax authors: horvath, karl; semlitsch, thomas; jeitler, klaus; krause, robert; siebenhofer, andrea title: antikörpertests bei covid-19 was uns die ergebnisse sagen date: 2020-05-15 journal: z evid fortbild qual gesundhwes doi: 10.1016/j.zefq.2020.05.005 sha: doc_id: 331617 cord_uid: 1ytcd0ax introduction: in the context of the severe acute respiratory syndrome coronavirus 2 (sars-cov-2) pandemic, the detection of virus-specific antibodies (ab) will play an increasing role. the presence or absence of such antibodies can potentially lead to considerations regarding immunity and infection. issue: how reliable are inferences from positive or negative test results regarding the actual presence of sars-cov-2-specific antibodies? methods: calculation of the probability that, depending on the pre-test probability (prevalence of sars-cov-2 infection) and test properties, antibodies are present or absent in the case of positive or negative test results.
results: the sensitivity and specificity of different sars-cov-2 antibody test systems vary between 53% and 94% and between 91% and 99.5%, respectively. when using a test of high quality, the positive predictive value (ppv) is 42% and 79%, respectively, at a pre-test probability of 1% to 5%, as can currently be assumed for the general population in austria or germany. for persons with an increased pre-test probability of 20%, e.g. persons in high-risk professions, the ppv is 95%; at a pre-test probability of 80% the ppv is almost 100%. the negative predictive value (npv) is at least 99.7% for persons with a low pre-test probability of up to 5% and 79.1% for persons with a pre-test probability of 80%. when using test systems with lower sensitivity and specificity, the reliability of the results decreases considerably: the ppv is 5.9% at a pre-test probability of 1%. conclusions: sufficiently high sensitivity and specificity are prerequisites for the application of antibody test systems. positive test results are often false if the pre-test probability is low. depending on the assumed prevalence of a sars-cov-2 infection, there are substantial differences in the significance of a concrete test result for the respective affected persons.
keywords: positive predictive value, negative predictive value, pre-test probability. [2]. potentially, considerations regarding a person's immunity or infection can also be derived from the presence or absence of virus-specific antibodies [3]. antibody tests also play a role in considerations about the implementation of future testing strategies, and the measures derived from them, in the further course of the sars-cov-2 pandemic. measures based on false assumptions about an existing infection or immunity can lead to wrong or careless behavior. this, in turn, can result in an increased spread of the infection and endanger the persons affected.
the aim of this paper is therefore to show how reliably, taking into account test properties and pre-test probability, positive or negative antibody test results allow conclusions about the actual presence or absence of sars-cov-2-specific antibodies. beyond that, we discuss which conclusions regarding a potential immunity or infection can be drawn from the test results, also taking into account further test-independent factors. since virtually all tests used for diagnosis do not work completely error-free, it must also be expected, when testing for sars-cov-2-specific antibodies, that a proportion of persons will be misclassified by the test. that is, there will be persons in whom no antibodies are present but who nevertheless receive a positive test result (false positive result). likewise, there will be a proportion of persons in whom antibodies are present despite a negative test result (false negative result). an overview of the possible test results in relation to the presence of a disease is given in figure 1. how large this misclassified proportion of all tested persons is, i.e. how reliable a positive or negative test result is, depends on the sensitivity and specificity of the respective test as well as on the given pre-test probability. the sensitivity is calculated as the proportion of persons with a positive test result among all persons in whom antibodies are present. it indicates how well the test identifies persons with sars-cov-2-specific antibodies. if the sensitivity were 100%, the test would identify all persons with virus-specific antibodies and would not miss a single one. the specificity is calculated as the proportion of persons with a negative test result among all persons in whom no antibodies are present.
it indicates how well the test identifies persons in whom no sars-cov-2-specific antibodies are present. if the specificity were 100%, only persons who actually have antibodies, and not a single person without antibodies, would receive a positive test result. the pre-test probability is the probability with which, even before the test is performed, a person can be assumed to have virus-specific antibodies. it corresponds to the frequency with which an (active or past) sars-cov-2 infection is to be expected in a particular group of persons. the pre-test probability is, for example, to be assumed higher for persons with covid-19-compatible symptoms, a stay in a region with high infection rates (currently or in recent weeks) and possibly work in a high-risk profession than for symptom-free persons working from home outside a risk area. in other words, the pre-test probability describes the risk that a previously untested person has, or has had, a sars-cov-2 infection. the positive predictive value (ppv) indicates the probability with which, given a positive test result, the disease in question is actually present. it is calculated as the proportion of true positive results among all positive test results (figure 1). the negative predictive value (npv) indicates the probability that a person with a negative test result is actually not diseased. it is calculated as the proportion of true negative results among all negative test results (figure 1). the ppv is higher the higher the sensitivity and specificity and the higher the prevalence; the npv is higher the higher the sensitivity and specificity and the lower the prevalence.
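the definitions above can be turned into a short calculation via bayes' rule. the following sketch (python) uses assumed illustrative values: a sensitivity of 94.4% and a specificity of 98.7% for a "high-quality" test, and 52.8%/91.4% for a low-quality one, chosen from within the elisa and poc ranges reported in this article; they are not the exact parameters behind the published figures.

```python
def ppv(prevalence, sensitivity, specificity):
    # probability that a positive result is a true positive (bayes' rule)
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

def npv(prevalence, sensitivity, specificity):
    # probability that a negative result is a true negative
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tn / (tn + fn)

# illustrative "high-quality" test: sensitivity 94.4%, specificity 98.7% (assumed)
for p in (0.01, 0.05, 0.20, 0.80):
    print(f"pre-test {p:.0%}: ppv {ppv(p, 0.944, 0.987):.1%}, npv {npv(p, 0.944, 0.987):.1%}")

# illustrative low-quality test: sensitivity 52.8%, specificity 91.4% (assumed)
print(f"low-quality test, pre-test 1%: ppv {ppv(0.01, 0.528, 0.914):.1%}")
```

with these assumed inputs, the ppv at a 1% pre-test probability comes out near 42% and rises to roughly 79% at 5% and 95% at 20%, close to the values quoted in the abstract, while the low-quality test yields a ppv of only about 6% at a 1% pre-test probability.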
when interpreting test results, further factors, such as the time of testing in the course of an infection, must be taken into account in addition to the test properties mentioned. manufacturers and distributors themselves report sensitivity values between 80% and 100% and specificity values between 92.5% and 100% [4] [5] [6] [7] [8] [9] [10]. for established antibody tests for other viral infections (e.g. epstein-barr virus, cytomegalovirus, herpes simplex viruses), sensitivity values of about 86% to 100% and specificity values of about 83% to 100% are reported [11]. in a recent investigation of the test properties of nine commercially available sars-cov-2 antibody tests, both enzyme-linked immunosorbent assays (elisa) and point-of-care tests (poc), sensitivity values of 67% to 93% and specificity values of 80% to 100% were found [12]. a current systematic review reports sensitivity values of 72.2% to 94.4% and specificity values of 96.7% to 99.5% for elisa test systems. for poc test systems, values of 52.8% to 82.8% for sensitivity and 91.4% to 99.4% for specificity are reported [13]. the prevalence of sars-cov-2 infections in different groups of persons can only be estimated from currently available data. from the results of the currently available cross-sectional studies in austria, it can be derived that the prevalence of sars-cov-2 infections in the general population at the time of the survey (april 1 to 4, 2020) was probably around 1%, but at least below 5% (apart from exceptions in tyrol) [14]. a second prevalence study, conducted at the end of april 2020, confirmed these low values [15]. a similar prevalence can be assumed for germany.
these assumptions also apply to the prevalence of all infections experienced so far, which is relevant for the considerations on the significance of test results for the detection of igg antibodies. in the last week of march, the proportion of persons testing positive among all tested persons (pcr tests in each case) in austria was around 20% [16]. the significance of a positive test result for a symptom-free person who is not staying in a region with high infection rates (and for whom neither was the case in the past) and who does not belong to a high-risk profession thus differs markedly from the significance of an equally positive test result for a person with covid-19-compatible symptoms in a risk area (currently or in the recent past) and/or working in a high-risk profession. negative test results are largely reliable for persons with a low pre-test probability. however, they provide no substantial additional certainty, since the probability that no sars-cov-2 infection is present is already high before the test is performed. the absence of sars-cov-2-specific antibodies can have different causes. antibodies may be absent because no infection has actually taken place, or because an infection is present but has not yet lasted long enough. the formation of antibodies takes time, which is why they are not yet detectable in the early phase of the infection. according to one study, seroconversion occurs in 50% of infected persons within the first 7 days; only after 14 days can antibody detection be expected in all persons infected with sars-cov-2 [17]. in the sars-cov infection that occurred in 2002/2003, igm antibodies could be detected in infected persons after about 3 to 6 days and igg antibodies after 8 days [18, 19].
since sars-cov-2 belongs to the same virus family and genome studies have shown that sars-cov-2 is about 80% identical to sars-cov, it can be assumed that the process of antibody formation proceeds similarly. a recent study of the antibody profile of 34 persons with covid-19 in china confirms this assumption [20]. a negative antibody test, or the absence of antibodies, thus does not allow a reliable conclusion that no infection is present. the sole use of antibody test systems to detect an infection would therefore amount to misuse [21]. the assessment of the significance of test results for a specific person is also limited by the lack of a sufficient data basis for estimating the pre-test probability. this, however, is essential for estimating the extent of a potential false assumption. it should also be borne in mind that the prevalence of a sars-cov-2 infection, and thus the pre-test probability, will change as the pandemic progresses. the increase to be expected in the future means that the prevalence prevailing at the time of testing must always be the basis of the assessment. whether immunity exists even in the presence of sars-cov-2-specific antibodies is currently considered uncertain [22]. however, results from animal experiments and findings on sars-cov suggest that recovered persons have only a very low risk of reinfection. observations of previous coronavirus infections indicate that immunity could last up to three years after initial infection [23, 24]. to what extent a potential false assumption is acceptable depends on the context of the question. the false assumption of immunity in members of nursing or medical professions can have serious consequences. for epidemiological questions, however, the error may possibly be tolerated or taken into account in models.
targeted testing of selected groups of persons with a presumably very high pre-test probability could be useful if embedded in structured processes with subsequent instructions for action. the available information on test properties cannot be regarded as certain. the currently available test evaluations are predominantly based on case-control studies and thus carry an elevated risk of bias. moreover, independent publications report lower values for sensitivity and specificity than those stated by the manufacturers and distributors. as the calculations of ppv and npv show, however, sufficiently high test quality is a prerequisite for a meaningful use of antibody test systems. accordingly, high-quality evaluation studies of the existing antibody test systems are required, as are further studies on the prevalence of sars-cov-2 infection in different populations. uncontrolled use of antibody tests at the present time can lead to undesirable adverse effects if wrong conclusions, and actions derived from them, result from the test results. persons in whom immunity is wrongly assumed may, for example, resume more contacts out of a false sense of security or pay less attention to hygiene rules. this can subsequently endanger others and themselves and increase the spread of covid-19 again. likewise, the false exclusion of an acute infection in persons in nursing professions, for example, can lead to far-reaching negative consequences. the use of antibody tests is therefore only to be recommended within organized, structured testing programs addressing specific questions, with accompanying evaluation of the data. sufficiently high sensitivity and specificity are an indispensable prerequisite for the use of antibody test systems.
even with nearly ideal test properties, positive test results are often false when the pre-test probability is low (as it is when testing persons without symptoms and without risk factors for a sars-cov-2 infection). depending on the assumed prevalence of a sars-cov-2 infection, there are substantial differences in the significance of a concrete test result for the persons affected. antibody tests alone are not suitable for confirming or ruling out an infection. a positive test result does not allow a reliable conclusion that antibodies are present and thus that a potential immunity exists. references: antibody responses to viral infections: a structural perspective across three different enveloped viruses; igm in microbial infections: taken for granted?; distributor: technomed gmbh, clinical data for sample correlation; manufacturer: intec products inc, distributor: nanorepro ag, sars-cov-2 antibody rapid test (igm/igg), rev. 03; distributor: szabo-scandic handelsgmbh, wantai sars-cov-2 ab rapid test, rapid test for detection of total antibodies to sars-cov-2; distributor: szabo-scandic handelsgmbh, wantai sars-cov-2 ab elisa, diagnostic kit for total antibody to sars-cov-2 (elisa); distributor: szabo-scandic handelsgmbh, wantai sars-cov-2 igm elisa, diagnostic kit for igm antibody to sars-cov-2 (elisa); product catalogue; evaluation of nine commercial sars-cov-2 immunoassays; antibody tests in detecting sars-cov-2 infection: a meta-analysis
coronavirus in austria: data and maps; virological assessment of hospitalized patients with covid-2019; production of specific antibodies against sars-coronavirus nucleocapsid protein without cross-reactivity with human coronaviruses 229e and oc43; ifa in testing specific antibody of sars coronavirus; profile of specific antibodies to sars-cov-2: the first report; the promise and peril of antibody testing for covid-19; "immunity passports" in the context of covid-19; immunity after sars-cov-2 infection, rapid review 2020, oslo: norwegian institute of public health; covid-19 (coronavirus sars-cov-2) key: cord-300520-vxn7uh41 authors: baunez, c.; degoulet, m.; luchini, s.; pintus, p.; teschl, m. title: tracking the dynamics and allocating tests for covid-19 in real-time: an acceleration index with an application to french age groups and departments date: 2020-11-07 journal: nan doi: 10.1101/2020.11.05.20226597 sha: doc_id: 300520 cord_uid: vxn7uh41 an acceleration index is proposed as a novel indicator to track the dynamics of covid-19 in real time. using french data on cases and tests for the period following the first lock-down, from may 13, 2020, onwards, our acceleration index shows that the ongoing pandemic resurgence can be dated to begin around july 7. it uncovers that the pandemic acceleration has been stronger than the national average for the [59-68] and [69-78] age groups since early september, the latter being associated with the strongest acceleration index as of october 25. in contrast, acceleration among the [19-28] age group is the lowest and is about half that of the [69-78] group as of october 25. in addition, we propose an algorithm to allocate tests among french departments, based on both the acceleration index and the feedback effect of testing. our acceleration-based allocation differs from the actual distribution over french territories, which is population-based.
we argue that both our acceleration index and our allocation algorithm are useful tools to guide public health policies as france enters a second lock-down period of indeterminate duration. the current covid-19 pandemic not only caught most of the world by surprise; it also highlighted the crucial uncertainty under which all societies have lived since the first mention of the new viral infection at the end of 2019. this uncertainty, however, does not only concern the novel sars-cov-2 pathogen and how it spreads, its effects on human health, the best possible treatments, and when a vaccine will be available and how effective it will be, to cite just some of the concerns. there are at least two further uncertainties that receive less attention. one is the parameter-uncertainty of the typical sir (susceptible-infected-removed) models that are widely used to predict the evolution and consequences of the pandemic; the other is data-uncertainty. 1 while a continuous effort is put into remedying and improving data collection, because this is, after all, the first and foremost source of empirical knowledge on which all else is built, parameter-uncertainty is intrinsic to modelling strategies and thus much less prone to easy solutions. 2 a key point of this paper is therefore that, given these uncertainties, what is of crucial importance to know, here and now, for public health specialists, decision makers and other people alike, is whether harm is accelerating or whether the measures put in place to control the pandemic are contributing to curbing the spread and thus to decelerating harm (see taleb [11]). in the case of a pandemic, harm can be defined, for example, as the number of confirmed cases. however, simply plotting the number of positive cases over time, as has been done on many governmental sites, is not the right way to understand the dynamics of harm.
for example, knowledge of positive cases depends on testing, and hence cases will very much depend on the underlying testing strategy put in place. but this is exactly our point: to say something meaningful, we need to put cases and tests in relation to each other. that is, to understand whether harm, defined as the number of cases, is accelerating or decelerating, we need to plot cases against tests rather than consider them separately. although this might be seen as just another tool to visualize the data, we show that organizing the data in this way is extremely useful to uncover an index of acceleration, which turns out to be a plain scale elasticity at the end date of the sample. it tells us how many additional cases are detected given additional tests, as a percentage of the totals of both variables attained at the date of the last vintage of data. we argue that updating such an acceleration index in real time provides very valuable information and is an essential tool to apprehend the pandemic uncertainties with which we currently live. it also deals with parameter-uncertainty, simply because we do not use any parameters: instead of making assumptions and attributing probabilities to ex-ante uncertain events, we use the data currently available, which is the best we have, to shed light on the dynamics of the pandemic and the harm it produces. our allocation strategy combines two aspects: information about the circulation of the virus, and the expected feedback that testing implies, most importantly in terms of isolation and contact tracing. we propose a parsimonious algorithm to allocate tests across age groups and space, based on both our acceleration index and the average positivity rate, and on the extent to which tests reduce the virus propagation.
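this excerpt does not spell out the functional form of the allocation rule, so the following is only an illustrative sketch, not the authors' algorithm: tests are split in proportion to a score combining each department's acceleration index and average positivity rate, with a hypothetical exponent gamma standing in for beliefs about the strength of the feedback effect of testing. the department names and numbers are invustrative, invented for the example.

```python
def allocate_tests(total_tests, acceleration, positivity, gamma=1.0):
    """illustrative allocation sketch: each department's share of tests grows
    with its acceleration index and its average positivity rate; gamma is a
    hypothetical belief parameter on the testing feedback effect."""
    scores = {d: (acceleration[d] ** gamma) * positivity[d] for d in acceleration}
    total = sum(scores.values())
    return {d: round(total_tests * s / total) for d, s in scores.items()}

# hypothetical data for three departments (not the french figures)
accel = {"paris": 2.1, "nord": 1.4, "var": 0.8}
posit = {"paris": 0.12, "nord": 0.09, "var": 0.05}
print(allocate_tests(100_000, accel, posit, gamma=1.0))
```

under this sketch, departments where the pandemic accelerates the most receive the largest share of tests, in line with the argument above; a population-based allocation would instead ignore the acceleration term altogether.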
because the latter feature can never be measured with certainty, we argue that any allocation of tests should also reflect the beliefs of (benevolent) public health authorities and experts, and we add a parameter to the analysis to take this dimension into account explicitly. we show that our acceleration-based allocation of tests differs from the actual distribution of tests, which has essentially been determined by population size. but our analysis shows that population size is not necessarily the criterion that affects acceleration, at least not all the time. hence, to the extent that testing is accompanied by contact tracing and efficient isolation, this observation means that allocating tests according to where the virus accelerates the most would be a better way to control the pandemic. accurate and reliable information about how pathogens spread over space and time is of first-order importance to fight epidemics and to properly design efficient public health policies. what is striking in the case of covid-19 is that, at least in north american and european countries, … the problem with the information contained in the graphs of the first row of figure 1 is that they would be a fine summary of the pandemic's dynamics if the number of tests performed over a unit of time were roughly constant. in that case, the evolution of the number of new cases, given a constant number of new tests over time, would capture well how fast the virus spreads. however, as shown
however, as shown 4 all the data used in this paper is publicly available from the web page "données relatives aux résultats des tests virologiques covid-19 si-dep" https://www.data.gouv.fr/fr/datasets/donnees-relatives-aux-resultats-destests-virologiques-covid-19/. 5 santé publique france decided at some point to report also the positivity rate, that is, the ratio of positive cases to tests at weekly frequency. however, as we will argue, even this indicator does not convey enough information and possible early warning signals that public health authorities could rely on. . cc-by-nc-nd 4.0 international license it is made available under a perpetuity. is the author/funder, who has granted medrxiv a license to display the preprint in (which was not certified by peer review) preprint the copyright holder for this this version posted november 7, 2020. ; https://doi.org/10.1101/2020.11.05.20226597 doi: medrxiv preprint in panels (c) and (d) in figure 1 (bottom row), this assumption is far from correct in practice, and this is not only true for france because of several reasons that has concerned most countries. 6 in the period after the end of the generalized lock-down in france, when the who message to "test, test, test" 7 was becoming effectively received in france, one sees from panel (c) in figure 1 that the number of tests performed each day is far from constant and has in fact been trending up most of the time, not surprisingly given the level of unpreparedness of the french government at the beginning of the pandemic. it is obvious that the dynamics of the number of cases is strongly related to the dynamics of performed tests. after all, diagnostics are the only, albeit imperfect, way to confirm that people are positive to covid-19. 8 however, observing each independently (side by side) may give only a blurring view at best, and even a confusing sense, of the extent to which the pandemic accelerates. 
a manifestation of how confusing that coarse piece of information can be, if not properly organized, is the statement often voiced that one should not be worried to observe more positive cases as long as more tests are performed as time passes: after all, more tests imply more cases almost by definition, and this basic fact should not be taken at face value to indicate that the pandemic is worsening. we argue that such reasoning is wrong, and that the correct understanding, in terms of measuring the acceleration/deceleration of the pandemic, is gained if a scatter-plot of the number of positive cases against the number of tests is used in real time, instead of the panels in figure 1. this is the first contribution of this paper, in our view, and it is presented in figure 2. before presenting our preferred way to organize the data, a few remarks are in order. first, we use the cumulated numbers of daily cases and of daily tests as a way to keep track of the size and overall prevalence of the pandemic. second, our proposed scatter-plot has merit only if it is updated in real time, say at daily frequency, so as to add information about additional tests and cases. this immediately raises the question of how the data should be normalized, given that the number of data points obviously increases with time. we choose a simple and rather innocuous re-scaling method known as min-max normalization, 9 which in our case amounts to dividing all historical values by the value of the last data point. as a result, our (all positive) normalized data values are contained within the unit interval (0, 1), so that our scatter-plot lies in the unit square. 6 an important reason behind the observed non-constancy of tests over time in most countries is undoubtedly the fact that the pandemic caught most of the world by surprise in the first quarter of 2020.
as a consequence, the widespread goal of progressively increasing the capacity to perform tests on a large scale has led many countries to increase the number of tests since march. 7 see https://www.who.int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing in figure 2 we present our proposed scatter-plot, which is in our view a very useful way to visualize the dynamics over time of the number of cases in relation to the number of tests. as alluded to above, there is a causality arrow going from the latter to the former, so we plot cases against tests, all in cumulated values. for example, panel (a) of figure 2 graphs all data points from may 13 to june 13, 2020, each point/dot representing a particular date. on the x-axis, all numbers are between zero and one due to min-max normalization, and a particular value gives the fraction of tests performed up to a given day, say t, out of the total cumulated number of tests performed up to the end date, say T. for example, 0.8 means 80% of the total amount of tests cumulated from date 0 (may 13, 2020, in our case) to date T (june 13, 2020). similarly, along the y-axis we report the cumulated number of cases for each day t ≤ T, as a fraction of the total number of cases at date T. all such normalized data points fill in a solid black curve which then represents the actual data over time. starting at the origin and moving along the solid curve in the north-east direction means moving forward through time, since we use values that add up over time.
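the construction described above, cumulate, then divide every historical value by the last data point, can be sketched in a few lines (python; the daily series are hypothetical, not the french data):

```python
from itertools import accumulate

# hypothetical daily counts of new tests and new cases
daily_tests = [100, 150, 200, 250, 300, 400]
daily_cases = [10, 12, 15, 20, 30, 50]

# cumulate the daily counts over time
cum_tests = list(accumulate(daily_tests))
cum_cases = list(accumulate(daily_cases))

# min-max normalization (with minimum 0): divide by the last data point,
# so every value is a fraction of the total at the end date
x = [t / cum_tests[-1] for t in cum_tests]
y = [c / cum_cases[-1] for c in cum_cases]
```

plotting y against x then yields the scatter-plot of figure 2: both axes run over the unit interval, both series are non-decreasing, and the last point is always (1, 1).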
(medrxiv preprint, doi: https://doi.org/10.1101/2020.11.05.20226597.) the dashed black line in panel (a) of figure 2 , on the other hand, is the 45-degree diagonal, and it turns out to be an interesting benchmark if one is interested in capturing the acceleration or deceleration of the pandemic. counter-factually, if the dashed diagonal represented the real data, that would tell us that the virus spreads according to a linear pattern whereby, say, 20% of the total amount of tests accounts for 20% of the total number of positive cases, and so on, and this coincidence would hold at all dates between zero and T. in other words, the pandemic neither accelerates nor decelerates, since a given proportion of tests accounts for the same proportion of detected cases over the whole data period. panel (a) of figure 2 shows that this has in reality not been the case for the period may 13-june 13, 2020, or indeed for any of the period covered by our analysis. more precisely, the solid line, which represents actual data, turns out to lie entirely above the dashed diagonal, which should be interpreted in the following manner. first, the blue line representing the tangent to the solid curve at the end point (1, 1) turns out to have a slope smaller than one (this is easy to see since the slope of the dashed diagonal equals one by definition). this means that between the day right before T and day T, the observed fraction of new tests (out of the total number of tests to this day) has been associated with a smaller fraction of new cases (out of the total of cases to this day). we then say that at date T the pandemic is decelerating. now going backward from date T into the past (that is, moving from the farthest north-east corner toward the origin along the solid curve), one sees the slope of the tangent to the solid curve going up as time moves backward.
in other words, the slope of the tangent to the solid curve has been decreasing from date 0 to date T, which is an equivalent way of saying that the solid curve is concave: for a given fraction of tests, lower and lower fractions of cases have been detected over time. this captures in a visual fashion the fact that the pandemic continuously decelerated between may 13 and june 13. this observation indicates that france somehow benefited from the lock-down period, in the sense that the circulation of the virus, in the aggregate, slowed down. unfortunately, this state of affairs was not to persist. panel (b) in figure 2 shows, using color coding, that a month later, on july 13, the slope of the solid curve at the end date (the green line) was about one: the green line almost coincides with the dashed diagonal. in other words, the pattern of the pandemic over time switched from decelerating to roughly linear between mid-june and mid-july. quite importantly, comparing panel (a) to panel (b) suggests that the slope at the end date had started to increase and crossed unity between june 13 and july 13. while comparing panels (a) and (b) suggests the end of deceleration, panel (c), dated august 13, clearly shows that acceleration was then taking place: the slope at the end date is about two. in addition, the scatter-plot at later dates reveals that acceleration is still underway, to this day (october 25, 2020). one important conclusion is that figure 2 delivers a real-time indicator of the pandemic's dynamics, in the sense that it helps to visualize whether the pandemic accelerates or decelerates at any date.
the property that the pandemic accelerates when the slope of the solid curve at the end date is larger than one, and decelerates when the slope is smaller than one, is in fact no surprise. as shown in appendix a, this slope is essentially an elasticity: its value gives the variation of cases detected for a given variation of tests, in proportion to the levels of (cumulated) cases and tests attained at the end date. 10 for example, a value of about two, as in panel (c) of figure 2 , means that a given proportional increase in the amount of tests accounts for a proportional increase in the number of cases that is twice as large. one can therefore think of this elasticity as an acceleration index: acceleration happens when the elasticity is larger than one, otherwise the pandemic decelerates. in the razor's edge case when the elasticity is exactly one, a linear pattern emerges, so that the pandemic neither accelerates nor decelerates. in each panel of figure 2 , we also report the points corresponding to the dates associated with the other panels. for example, the blue dot in panel (b) corresponds to the situation at june 13. similarly, the red dot in panel (d) corresponds to august 13. interestingly, our scatter-plot also helps visualize the pandemic's dynamics in a retrospective fashion, to the extent that one can visually track not only the slope of the tangent to the solid curve, but also the slope of the chord that links the origin to the point corresponding to the date under scrutiny, and we now turn to this subtle point.
more specifically, although figure 2 and the depiction of our acceleration index are very helpful to visualize in real time whether the pandemic accelerates at the end date, those two visualization tools have their own limitations and must be refined if one is to do a retrospective analysis of the pandemic using the information in the scatter-plot and not only the elasticity at the end point. 11 the way we see it is that the scatter-plot should be updated in real time to visualize where the pandemic is heading in terms of acceleration/deceleration. we can in particular infer each day, from the most recent data point, the slope of the tangent to the solid curve, the value of which is what we called the acceleration index above. but such an index by definition combines two types of information: variations (between two successive dates in our example) and levels at the final date. to use the notation of appendix a, and denoting by ε_T the elasticity/acceleration index at end date T, the following decomposition holds:

ε_T = [(P_T − P_{T−1}) / (D_T − D_{T−1})] / (P_T / D_T),   (1)

where P_T and D_T denote respectively the cumulated numbers of positives/cases and diagnoses/tests at date T, and similarly for date T − 1. if we define from equation (1) the daily positivity rate to be the ratio between additional positives and additional tests, that is, what we have called earlier the tangent's slope, then equation (1) relates the elasticity, the average positivity rate and the daily positivity rate. 10 more generally, the (local) elasticity of a given function is the ratio between its derivative and its average value, taken at a particular point; see arrow et al. [1] . the empirical counterpart that we use in this paper is conceptually similar. 11 mathematically speaking, the normalization and updating of the data mean that data points are subject to a rotational homothety over time. for instance, while the end point at date T has coordinates (1, 1), it is subject at date T + 1 to a mapping which is a composition of a contraction, due to the new values of cases and tests at the end point, and of a rotation, due to the new value of the ratio between the latter.
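the decomposition in equation (1) is straightforward to compute from two consecutive cumulated observations; the following minimal sketch (our own illustration, with made-up numbers) makes it concrete:

```python
def acceleration_index(P, D):
    """Elasticity at the end date, per equation (1): the daily positivity
    rate (new positives / new tests) divided by the average positivity
    rate (cumulated positives / cumulated tests). P and D are sequences
    of cumulated positives and tests; only the last two entries are used."""
    daily_rate = (P[-1] - P[-2]) / (D[-1] - D[-2])
    average_rate = P[-1] / D[-1]
    return daily_rate / average_rate

# illustrative values: 20 new positives out of 400 new tests (5%),
# against an average positivity rate of 10% -> index 0.5, deceleration
print(acceleration_index([80, 100], [600, 1000]))  # 0.5
```

a value above one would instead signal acceleration: the same relative testing effort is detecting a more than proportional share of new cases.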
this is why the colored points move over time in figure 2 . more precisely, the elasticity is defined as the ratio of the daily positivity rate to the average positivity rate (much as the elasticity of a function is the ratio of its derivative to its average value). this fact suggests that looking at the average positivity rate alone is not informative enough, in the sense that this indicator does not say to what extent additional tests translate into additional cases, even though it is of course a fine measure of the average detection rate: it is a proxy for the prevalence of the virus in the population, a key parameter for herd immunity (see britton et al. [3] , fontanet and cauchemez [5] ). that said, there might be situations in which any pair of those indicators summarizes all there is to know, the third indicator being somehow redundant. for instance, the case depicted in panel (a) of figure 2 is simple in the sense that, at all dates, both the tangent's slope and the average positivity rate go down over time, and the former is, on close inspection, smaller than the latter. from this property, one concludes that the elasticity is smaller than one. this is of course another way to express the property that the solid line is concave (hence above the dashed diagonal, since the solid line goes through the origin and the upper-right corner of the unit square). in contrast, panel (c) of figure 2 reveals that there has been a change in curvature, that is, a reversal from deceleration to acceleration, between june 13 and august 13.
this can be seen from the fact that the least recent part of the solid curve is above the dashed line, while the most recent part is below it. in other words, the convex part indicates acceleration of the pandemic, while the concave part indicates deceleration at earlier dates. one difficulty that arises from such a change in curvature is that the daily and average positivity rates may vary in different directions, with ambiguous implications about whether the elasticity stays smaller than one or, on the contrary, becomes persistently larger than one. in panel (c) of figure 2 , one notices an inflexion point, where the solid curve intersects the dashed diagonal in its interior. at the date when this happens, the tangent's slope, i.e. the daily positivity rate, starts to rise, whereas it was declining over time before that date. the average positivity rate, however, continues falling right after the inflexion point is attained, and starts to rise only later. eventually, the daily and the average positivity rates coincide, and when this happens the elasticity equals one; after that date, the elasticity stays larger than one. more specifically, imagine that the elasticity is smaller than unity and that the daily positivity rate starts increasing while the average positivity rate is still declining over time. it then becomes possible that the elasticity starts rising, eventually crosses the unit value, and possibly stays persistently larger than one. the latter situation occurs whenever the daily positivity rate rises faster (or declines more slowly) than the average positivity rate. all this is of course reminiscent of what happens when the graph of a continuous and increasing function changes curvature from concave to convex.
we therefore conclude that while the colored line's slope is an estimate of the elasticity at the end point, looking back in time at past events from the perspective of date T through the lens of our scatter-plot requires examining how both the daily and the average positivity rates vary over time, as shown in figure 3 . in figure 3 we report over time the magnitudes of the three terms that can be inferred from the decomposition in equation (1) . remember that our objective is to detect whether the pandemic decelerates, accelerates, or switches between those two types of motion. a striking feature in figure 3 is that the acceleration index (in panel (b)) leads the average positivity rate (in panel (a)), in the sense that the former starts rising about two weeks before the latter does. in other words, the evolution of the acceleration index over time appears to be an early warning of a future rise in the average positivity rate. eventually, in view of the reasoning above, one of course expects the rising acceleration index to cross the unit value when the (rising) average positivity rate equals the (rising) daily positivity rate. the bottom line is that the decomposition that follows from the scatter-plot in figure 2 may be useful to detect the early reversal from deceleration to acceleration. in the case of france, the elasticity starts rising about july 7, while the average positivity rate starts rising about two weeks later, on july 23, as indicated by the orange bars in figure 3 . 12 to sum up, the important piece of information that one can derive from figure 3 is that a reliable measure of the pandemic's dynamics is the elasticity, which we can think of as an acceleration index.
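to make the retrospective reading concrete, the three indicators of figure 3 can be traced over time, and the first date at which the index reaches unity flagged, with a sketch along the following lines (our own illustration; function names and sample data are made up):

```python
def indicator_series(P, D):
    """For cumulated positives P and tests D, return per-date tuples
    (daily positivity rate, average positivity rate, acceleration index),
    following the decomposition in equation (1)."""
    series = []
    for t in range(1, len(P)):
        daily = (P[t] - P[t - 1]) / (D[t] - D[t - 1])
        average = P[t] / D[t]
        series.append((daily, average, daily / average))
    return series

def first_crossing(series):
    """Position of the first date where the acceleration index reaches one,
    or None if the pandemic decelerates throughout the sample."""
    for t, (_, _, eps) in enumerate(series):
        if eps >= 1.0:
            return t
    return None

# made-up cumulated series: deceleration first, then acceleration
P = [10, 18, 24, 40, 70]
D = [100, 200, 300, 400, 500]
series = indicator_series(P, D)
print(first_crossing(series))  # 2: the reversal is flagged at the third step
```

this is precisely the kind of early-warning reading performed above for france, where the index starts rising about two weeks before the average positivity rate does.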
this elasticity leads the average positivity rate and is more dynamic than the daily positivity rate, which are other key statistics to follow. 12 our acceleration index has, to our knowledge, not been studied before, but many health agencies around the world report the positivity rate over time. in particular, santé publique france started, during the second quarter of 2020, to report the weekly positivity rate on top of the number of cases. the first time a rise of such an indicator is noted is july 23, in the "point épidémiologique hebdomadaire du 23 juillet 2020", whereas our acceleration index starts rising on july 7 as shown above. this two-week gap may suggest that using daily data rather than weekly data is more appropriate. in practice, therefore, panels (a) and (b) should be used as real-time indicators of the pandemic's dynamics. from the acceleration index and the average positivity rate then follows the daily positivity rate, which is still useful to translate the transmission dynamics back into levels (as opposed to normalized ratios). from panel (b) in figure 3 , one infers that the period since the end of the first lock-down in france can roughly be split into four sub-periods. from may 13 to july 7, our acceleration index fluctuates around a value slightly lower than one, indicating that the pandemic is in a deceleration regime. our indicator suggests that july 7 is the start of an ongoing acceleration regime in which the acceleration index increases, crosses the unit value on july 23, and continues to increase until about mid-august, when it equals about 2.
in that sense, this second sub-period can be seen as one in which the pandemic worsens more and more over time. then a new plateau occurs until around october 1st, after which the acceleration index rises again to reach a value of about 3 on october 25. of course, the challenge that the country faces as the second lock-down period starts, on october 30, is to reverse that pattern so that the acceleration index starts decreasing and eventually reaches values below one, which would indicate deceleration of the pandemic. the simple case whereby both cases and tests grow exponentially, as detailed in section b of the appendix, is perhaps useful to interpret the four sub-periods just described. the first plateau, associated with deceleration, is compatible with a situation where cases have been growing at a slower rate than tests for a while; in that situation, the acceleration index is roughly constant over time, below unity. if the growth rate of positives then becomes larger than the growth rate of tests, the acceleration index starts rising and eventually crosses unity, to converge to a new and higher plateau associated with a value larger than one. finally, the most recent rise, starting early october, can be understood as a new jump in the growth rate of positives. although the above interpretation stresses changes in the growth rate of positives, what matters more generally is how the ratio of growth rates (that of positives divided by that of tests) varies over time. in addition, the scatter-plot also directly offers a measure of "global acceleration/convexity": in panels (d)-(e) of figure 2 , the solid curve is located entirely below the dashed diagonal, which indicates that the persistent acceleration somehow "dominates" the earlier and short-lived deceleration.
based on the analysis in the previous section, we propose to use the acceleration index, calculated from the end point of our scatter-plot, as a parsimonious way to track the pandemic's dynamics. in this section, we show how useful such an index is to decompose the evolution of acceleration over age groups, first, and space, in a second step. 13 we conjecture that such a global measure of convexity relates to δ-convexity as defined by hyers and ulam, 1952. in figure 4 , we report the evolution over time of our acceleration index for nine age groups, all aggregated over space. in each panel, the blue line is a replica of panel (b) in figure 3 . for comparison against such an average benchmark, the orange line represents the acceleration index corresponding to the age group of each panel. one striking feature is that acceleration is underway for all age groups except people older than 79 years, as of october 25, 2020. in addition, the acceleration for each age group tracks rather closely that of the average for some groups, but much less so, or not at all, for others. to start with the former groups, the middle row in figure 4 shows that people aged 29 to 58 experience acceleration in much the same way as the average over all age groups. this is not the case for younger and older groups. children aged 10 to 18 and, quite spectacularly, young adults aged 19 to 28 seem to face a noticeable but slower acceleration since the summer than the general one. the latter group, in particular, is associated with an acceleration index maintained at intermediate levels, around 2, in the last quarter of the data sample.
the fact that the share of confirmed cases is the second highest (22.9%) for the [20-29] age group (see table 1 in appendix c for incidence rates among age groups) may contribute to the view that youngsters, such as college students, have contributed a lot to the resurgence of the virus since last summer. this view does not agree with the data in figure 4 . the age group [0-9] has experienced ups and downs since last summer. in particular, its acceleration index started to rise again quickly in early september across france, after going down all the way to unity at the end of the summer. the question here, of course, is whether this rapid increase has to do with the beginning of the academic year and the fact that young pupils did not wear face masks at school. another reason, however, may be that younger kids have again been in more contact with their grand-parents, who take care of them some of the time while parents are working, since the beginning of the school year. indeed, it is striking in figure 4 that the acceleration index for the age group [69-78] has risen markedly since the beginning of august, to attain the highest level at the end of the data sample, around 4! a similar, though slightly weaker, rise can be seen for people aged [59-68]. hence maybe it is not the young kids who infect their grand-parents, but the other way around? studies suggest that the "engines of the sars-cov-2 spread" are household transmissions, which may support this hypothesis (see lee et al. [9] ).
equally remarkable is the group of people older than 79, which has experienced a reduction in acceleration since the end of the first lock-down, in sharp contrast with the dynamics of all other age groups. this suggests that, in spite of the large number of deaths among the very elderly at the beginning of the pandemic, the continued presence of covid-19 has been handled quite effectively in specialized nursing homes (the "ehpad"). our analysis can also be applied to shed light on the local evolution of the pandemic. in figure 5 , we report on a map of france the acceleration index at the level of the administrative division that is intermediate between the city and the region, called the "département". red entries represent places where the pandemic accelerates (the elasticity is larger than one), while green entries represent decelerating départements. our map does not necessarily coincide with the map very recently introduced by the european centre for disease prevention and control (ecdc) for europe. 14 in that map, green, orange and red lights depend on the incidence rate and the test positivity rate of a country, but their combination of factors is not dynamic and lacks a clear conceptual foundation such as the one our acceleration-based index has. their demarcation criteria between the different colours are set at certain levels, but it is not clear whether those levels are sound from the perspective of our acceleration index. the upper left panel in figure 5 depicts the level of the acceleration index for each département on may 15, that is, on the third day after the end of the lock-down period in france. while at that date the pandemic is decelerating over almost all the country, the situation starts to gradually but dramatically change within the next two months and to reverse course over the summer break. 14 https://www.ecdc.europa.eu/en/covid-19/situation-updates/weekly-maps-coordinated-restriction-freemovement
between july 15 and august 15 (panels in the second row), acceleration takes over the country. as of october 25, most départements face an acceleration index larger than two, with no sign of reversal. at the time when france enters a second lock-down period, lessons should be drawn about what went wrong in the period following the end of the first lock-down, and we argue that it is important to analyse closely the reasons for local and group-specific increased accelerations. knowing about local and group-specific accelerations can be an important guide when investigating further possible transmission channels and networks, which would then allow intervening specifically at a network and structural level if needed. to give an example of such a fine-grained analysis, we dig a little deeper into the heterogeneity that is visible in figure 5 and into the accelerations we spotted in the different age groups, reporting results for four selected départements. in all panels, the blue line stands for the evolution of the acceleration index for all the other départements, while the second colored line in each row represents the dynamics of acceleration for all age groups in panel (a) and for the specific age group in panels (b)-(c).
looking first at panel (a), one sees that the dynamics in all four départements resemble those of the others, with perhaps more volatility in the case of bouches du rhône and rhône. note, however, that the latter two have experienced a period when the acceleration index decreased, roughly from mid-august to mid-september; this is not really the case for paris and nord. in panel (b), we see considerable volatility, largely due to the fact that the absolute numbers of tests and positives are rather low for this age group at the level of the département. the acceleration index for the [0-9] age group hovers around a plateau below one from mid-may to early july. what is striking, however, is that the acceleration index drops a lot before the end of august, to levels close to unity, only to start rising sharply again in the first half of september. as we asked before, why is there this rapid acceleration in this age group? does it have to do with the dynamics of acceleration for the age group of grand-parents, depicted in panel (c)? the rapid acceleration in the latter group is especially visible in paris and nord, where the acceleration index for the elderly rises continuously from the beginning of august, and less so in bouches du rhône and rhône. in contrast, deceleration was pronounced both for kids and for grand-parents before the beginning of the summer. although the above comments in relation to figure 5 are merely descriptive, and deliberately so, they illustrate how our acceleration index can be used at any level of granularity allowed by the available data, and how it may also guide further analysis about transmission channels to
complement contact tracing. an interesting avenue for research would obviously be to perform a panel data analysis, but this is beyond the scope of this paper. test strategies have many dimensions, including where to test, how to allocate staff, and how to locate groups at risk. we show in this section how to design an algorithm to allocate covid-19 test resources. we focus, for the sake of illustration, on the spatial distribution of tests based on our acceleration index. although testing is acknowledged as an efficient tool to curb epidemics, 15 little is known, in the context of covid-19, about how to geographically allocate a limited quantity of tests. imagine now that public health authorities would like to decide how tests should be allocated to region i at date T. according to the decomposition stated in equation (1), a small increment of the per-period fraction of tests ΔD^i would be expected to lead to a proportional increase of positives ΔP^i = (P^i_T / D^i_T) ε^i_T ΔD^i. the logic of our criterion to allocate tests across regions has two steps. first, we attribute to each region i a weight w_i = (P^i_T / D^i_T) ε^i_T. note that this weight is the product of two terms: (i) the first term is the average rate of cumulated positives as a fraction of cumulated tests at end date T, that is, P^i_T / D^i_T; (ii) the second term is the region's acceleration index ε^i_T, which measures whether the pandemic accelerates there (i.e. ε^i_T > 1) or decelerates (i.e. ε^i_T < 1).
this means that the regions that are allocated more tests relative to others are those where the overall positivity rate is larger at the end date, where the pandemic accelerates more (or decelerates less), or both. second, we propose to allocate to region i a share of the national test capacity given by s_i = w_i^β / Σ_j w_j^β, where β is a parameter which we assume takes positive real values: s_i therefore goes up with w_i, so that a region with a larger weight is allocated a larger share. our premise here is that any test strategy combines two aspects: information about the virus circulation and the expected feedback that testing implies, most importantly in terms of isolation and contact tracing. we therefore propose a parsimonious criterion to allocate tests across age groups and space, based on both our acceleration index and the positivity rate, and depending on the extent to which tests reduce the virus propagation. because the latter feature cannot be measured with certainty, we argue that any allocation of tests should reflect the beliefs that public health authorities have about it, and we therefore add a parameter to the analysis to take this key dimension explicitly into account. obviously, we are assuming here a benevolent public health authority, which acts on objective grounds and is not trying to manipulate the indicator for political purposes. 15 the importance of testing is most clearly spelled out in an official document that has been produced by the south korean government for the international community; see "flattening the curve on covid-19 - how korea responded to a pandemic using ict", dated april 15, 2020.
therefore, the allocation scheme that we propose builds upon the following logic. when β = 0, each and every region is allocated an equal fraction 1/n of the total amount of tests available at the national level. the other extreme configuration occurs when β tends to ∞, and it is easy to check that, in that case, the region with the highest w_i is allocated all national tests. one way to interpret our allocation rule for intermediate values of β is as follows. parameter β measures the extent to which testing is believed to exert a negative feedback effect on the pandemic, for example because detecting positives entails isolating them and contact tracing, thereby reducing the number of contacts that produce additional contaminations. in the limit case when β = 0, there is no such feedback and a natural benchmark arises, which consists in allocating tests equally across regions. this corresponds to a situation where information about the spatial distribution of the pandemic is maximized, since each and every region has the same input in terms of testing; in this case, the objective is to gather information about where the pathogen is currently circulating. on the other hand, when β tends to ∞, all tests are allocated to the region with the highest w_i, and this amounts to maximizing the number of positives so as to curb the pandemic, since in this case the negative feedback is believed to be the strongest. 16 intermediate cases occur when β is between zero and infinity, reflecting the belief held by the public authorities deciding over the allocation of tests across regions about the strength of the feedback effect. 17 we use figure 7 to visually compare the allocation of tests, as computed from our algorithm based on the acceleration index, to the observed allocation and to the population distribution, as of october 25, 2020.
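the allocation rule, including both limit cases, can be sketched as follows (our own illustration; the weights are hypothetical):

```python
def allocate_shares(weights, beta):
    """Share of national tests for each region: s_i = w_i**beta / sum_j w_j**beta.
    beta = 0 yields an equal split across regions; a large beta concentrates
    the tests on the region with the largest weight w_i."""
    powered = [w ** beta for w in weights]
    total = sum(powered)
    return [p / total for p in powered]

w = [0.05, 0.10, 0.20]          # hypothetical regional weights w_i
print(allocate_shares(w, 0))    # equal split: each region gets 1/3
print(allocate_shares(w, 1))    # shares proportional to the weights
```

raising β progressively shifts the allocation from pure information gathering (equal split) toward concentrating tests where the weighted acceleration is highest.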
In panels (a) and (b), we report the observed shares of tests and of total population in each département, respectively. Comparing the two upper panels reveals that the observed allocation of tests at that date largely reflects the geographical distribution of the population: in short, more tests have been done in more populated areas. The lower panels of Figure 7 depict the allocation of tests over French départements for different weights β put on the acceleration index. Comparing the upper and lower panels makes clear that the spatial distribution of tests observed on October 25, 2020, has little connection to where the pandemic accelerates, which is somewhat paradoxical to the extent that detecting positives should help curb the virus circulation. More precisely, while panel (a) shows that tests have been concentrated in roughly three sub-regions (the Mediterranean south-east, the north, and the Rhône), panels (c) and (d) show that the allocation proposed by our algorithm based on the acceleration index is somewhat less concentrated, or concentrated in different areas. This is particularly clear when β = 1. When β is higher, our algorithm allocates more tests where the acceleration index is larger than unity (and fewer tests where it is smaller than one), so that the allocation of tests becomes more concentrated; panel (d) shows how more tests should go to the south-east quarter of France. In contrast, panel (a) shows that this is not what has actually happened.

Footnote 16: In reinforcement learning terminology, that would lead the algorithm to stop exploring other areas and only to exploit the region with the highest w_i, much like a greedy algorithm would do; see Sutton and Barto [10]. This is, however, not a desirable property in the case of a pandemic, and acceleration weights should always lead to some exploration to gather information about the virus circulation in other areas, as in ε-greedy algorithms.

Footnote 17: Note that, in theory, β could be negative. This would be the case, for instance, if positives had even more contacts with susceptible persons, generating even more contaminations, compared with a situation with no test. In this case, the share of tests s_i would go down when w_i goes up. Although such a situation is hypothetically possible (think of bio-terrorism), we rule it out for the sake of realism.

The difference between the actual distribution of tests over départements, which is very much population-driven, and our proposed allocation, which is acceleration-centered, can be established a bit more formally by looking at the following measures. In Appendix D, we report in Tables 2-4 the numbers behind Figure 7, and we use those to compute two indicators that shed light on the differences between the actual and proposed spatial distributions of COVID-19 tests. We first compare the different distributions using Jensen-Shannon (JS) divergence, which is based on Kullback-Leibler divergence and is normalized between 0 and 1, to measure similarities between the actual distribution of tests, population shares, and our acceleration-based allocation of tests. From Tables 2-4 in Appendix D, we find that the JS divergence between the observed distribution of tests and the distribution of population equals 0.016, hence very small.
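Jensen-Shannon divergence between two discrete share distributions can be computed directly from its definition as the average Kullback-Leibler divergence to the midpoint distribution. A minimal sketch; the four-region shares are made-up illustrative values, not the paper's data, and base-2 logarithms give the [0, 1] normalization used here:

```python
from math import log2

def kl(p, q):
    """Kullback-Leibler divergence (base 2) between discrete distributions."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded in [0, 1] with base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Illustrative shares over four hypothetical regions (not the paper's data).
observed_tests = [0.30, 0.30, 0.25, 0.15]
population     = [0.28, 0.32, 0.24, 0.16]
proposed_beta1 = [0.10, 0.20, 0.25, 0.45]

print(js_divergence(observed_tests, population))      # small: distributions are close
print(js_divergence(observed_tests, proposed_beta1))  # larger: distributions differ
```

Identical distributions give 0, and fully disjoint ones give 1, which is what makes the 0.016 versus 0.115 comparison in the text interpretable on a common scale.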
This confirms our suggestion that relative population size has been an important determinant of the distribution of tests across French départements. In contrast, the JS divergence between the observed distribution of tests and our proposed distribution based on acceleration is higher: it equals 0.115 when β = 1 and 0.194 when β = 3.(18) Not surprisingly, in view of the above discussion in relation to Figure 7, we conclude that the observed and proposed distributions differ in a quantitative sense, and also that putting more weight on acceleration, that is, increasing β, further enlarges this discrepancy.

Obviously, the allocation of tests and the pandemic's dynamics are both endogenous, and the interactions between the two form a complex system. Our analysis is so far limited to tracking such dynamics and proposing an algorithm to allocate tests across space according to the acceleration index. As such, it has little to say about the extent to which testing negatively impacts the virus circulation. For instance, it is reasonable to think that the weights our algorithm puts on acceleration should not necessarily be the same in all départements: perhaps isolation and contact tracing are more efficient in some regions than in others. In addition, further constraints should arguably be introduced, such as the requirement that the number of tests allocated to a département not exceed its population.(19) Despite all these limitations, we still argue that our algorithm to allocate tests according to where the pandemic accelerates could be a valuable input to public health policies, not only in France of course but also in all countries facing COVID-19. Of particular importance, at this point, is how to reallocate national testing resources and effort within départements to specific age groups, and of course when the pandemic will start to decelerate, which one hopes is not too far ahead in time.

Footnote 19: While such a constraint is unlikely to bind at the département level, it could at more granular levels, at which our algorithm could operate as well. For example, some sparsely populated cities could be allocated too many tests in view of their populations. In that case, our algorithm could easily be amended to account for such constraints, by capping tests and reallocating the surplus to the next cities where acceleration is highest.

The purpose of this paper is to show that plotting the number of cases, of tests, and even the positivity rate against time is far from the best way to measure the acceleration or deceleration of the virus. We propose instead a simple yet novel way to look at the data in real time. Our premise is that looking solely at the number of cases over time to measure the acceleration of the pandemic is not accurate, and hence problematic as a foundation for public health policies, because the number of tests is far from constant per unit of time, and this for a number of reasons. We thus argue that a much better understanding is gained by plotting the number of cases against the number of tests, that is, by a simple and plain scatter-plot. From such a scatter-plot we derive an acceleration index, which we propose as a useful indicator to track the dynamics of pandemics like COVID-19 in real time. Using French data on confirmed cases and tests for the period following the first lockdown, from May 13, 2020, onwards, our acceleration index shows that the ongoing pandemic resurgence can be dated to begin around July 7.
Rather surprisingly, it helps to underline the fact that the pandemic acceleration has been stronger than the national average for the [59-68] and [69-78] age groups since early September, the latter being associated with the strongest acceleration index as of October 25. In contrast, acceleration among the [19-28] age group is the lowest, and is about half that of the [69-78] group. We also propose an algorithm to allocate tests among French départements, based on both the acceleration of the pandemic and the feedback effect of testing. Our acceleration-based allocation differs significantly from the actual distribution over French territories, which is population-based.

We would like to stress that our approach admittedly has limitations, mostly due to the fact that it relies on actual data only and abstracts away from any theoretical model, like those that have been stressed and used by some governments from early on. Yet we do so deliberately, and for two reasons. First and foremost, COVID-19 has clearly put at the forefront the importance of admitting that the novelty of a new pathogen has implications for modelling. Any estimates obtained through, say, SIR or other models are subject to considerable and often unknown parameter uncertainty. This problem is particularly acute in the face of a novel pathogen, when public health authorities must act swiftly to prevent the virus from spreading in an uncontrolled way. Our analysis offers a real-time analysis that builds only on raw data and is, as such, subject to data measurement issues like any other approach, but not to parameter uncertainty. Second, our real-time analysis is scale-free and can therefore be used at very granular levels, say cities, counties, or age groups, where any model rapidly becomes either intractable or not transparent enough. In that sense, we think that our acceleration index carries important information, for instance compared with the ubiquitous reproduction numbers R0 and Re. In fact, the results in this paper could be used to advocate more diversity in the tools used in an emergency crisis, that is, a type of "method averaging" (in the same sense that model averaging is used) that seems to us necessary to take the best policy actions given the available information. The index that we have exposed in this paper could also be used in the following weeks to monitor the pandemic, to detect positive signs of deceleration, and to respond rapidly to new accelerations. The point is that our acceleration index might better track any reversal than looking solely at the average or daily positivity rate. When deceleration is under way, our index gives a very clear target for an all-clear: the acceleration index has to be lower than one independently of where we look through our "acceleration lenses", that is, in a spatial and group-based analysis, for example. It is also in this sense that our acceleration index can help in assessing and modulating health policies to respond to those areas or groups where an acceleration takes place. Arguably, this would help avoid generalized lockdowns, which are extremely costly, or other policies that affect the whole of society independently of whether it faces an acceleration or not.
All of the above implies that, given the estimate of the first-order derivative, ε_T ≡ f′(1), equation (3) can be rewritten in terms of the numbers of positive and tested persons. In other words, equation (4) can be used to decompose the effect of tests on positives in levels, that is, how many additional positives are detected given additional tests between t and t + dt. Equation (5) is identical to equation (1) in Section 2.1, where ε_t is estimated as the ratio of the variations of cases and tests between t and t − 1. Similarly, the above decomposition in levels holds for any date t < T, as in equation (6), from which the effect of tests on positives in percentage terms from the perspective of date t is written as equation (7). The elasticity of the number of positives with respect to the number of tests is now, because it is evaluated at date t as opposed to the end date T, the product of the derivative at the relevant point times the ratio of average positive rates, that of date T over that of date t.

This section explores what the decomposition stated in Section A reveals when time is assumed to be continuous and the number of cases grows exponentially over time, as usually assumed in epidemiological models of SIR type and related. Although tests are typically absent from that strand of the literature, we have to introduce them here, and we assume that they also grow exponentially.
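This exponential-growth setting is easy to explore numerically. A minimal sketch, assuming cases p(t) = α·exp(βt) and tests d(t) = γ·exp(νt), with the acceleration index computed as the daily positivity rate p/d divided by the average positivity rate P/D; all parameter values are illustrative:

```python
from math import exp

def acceleration_index(t, alpha, beta, gamma, nu):
    """epsilon(t) = (p/d) / (P/D) for p(t) = alpha*e^(beta*t), d(t) = gamma*e^(nu*t)."""
    p = alpha * exp(beta * t)                  # new cases per unit of time
    d = gamma * exp(nu * t)                    # new tests per unit of time
    P = alpha / beta * (exp(beta * t) - 1.0)   # cumulated cases
    D = gamma / nu * (exp(nu * t) - 1.0)       # cumulated tests
    return (p / d) / (P / D)

# Equal growth rates: the index equals 1 at all dates.
print(acceleration_index(10.0, alpha=2.0, beta=0.1, gamma=5.0, nu=0.1))

# beta > nu: the index rises from 1 and converges to beta/nu,
# independently of the scale parameters alpha and gamma.
for t in (1.0, 10.0, 100.0):
    print(t, acceleration_index(t, alpha=2.0, beta=0.12, gamma=5.0, nu=0.08))
```

With β = 0.12 and ν = 0.08, the printed values approach β/ν = 1.5 as t grows, which is the two-regime pattern described in the text.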
More formally, using the notation of the previous section, suppose that the number of cases per unit of time is p(t) = α·e^(βt), while the number of tests per unit of time is d(t) = γ·e^(νt), where the growth rates β and ν are assumed to be positive for the sake of illustration. Cumulated cases and tests are then denoted P(t) = ∫_0^t p(τ) dτ and D(t) = ∫_0^t d(τ) dτ, respectively. It is easy to derive, by straight integration, the expressions:

P(t) = (α/β)(e^(βt) − 1) and D(t) = (γ/ν)(e^(νt) − 1).

It follows that our acceleration index is given, as a function of time, by:

ε(t) = [p(t)/d(t)] / [P(t)/D(t)] = (β/ν)·e^((β−ν)t)·(e^(νt) − 1)/(e^(βt) − 1).   (9)

From equation (9), one infers that two cases occur. When β = ν, that is, when cases and tests grow at exactly the same rate, our acceleration index equals 1 at all dates. When the two growth rates differ, however, ε(t) converges, as t goes to infinity, to the ratio of growth rates β/ν, independently of the scale parameters α and γ. As an illustrative example, suppose that β > ν, so that positives grow faster than tests. The pattern of our acceleration index ε(t) over time then has two regimes: it first grows almost linearly and eventually reaches the upper bound β/ν > 1. Obviously, in that case both the daily positivity rate p(t)/d(t) and the average positivity rate P(t)/D(t) grow over time, and the former exceeds the latter at all times, so that acceleration prevails. This closely resembles the pattern from early August to early October in Figure 3, as underlined in the main text.

In Table 1 we report a few statistics for all age groups, as of October 25, 2020. The second and third columns report the numbers of cumulated cases and tests, respectively.
The fourth column depicts the average positivity rate, defined as the ratio of cumulated cases to cumulated tests, while the actual test shares appear in the fifth column. Finally, the last column shows the share of cases by age group, defined as the ratio of cumulated cases in each age group to total cumulated cases.

References:
- Capital-labor substitution and economic efficiency.
- Towards controlling of a pandemic. The Lancet, S0140-6736(20)30673-5.
- A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2.
- Sensitivity to rare and extreme events in rats: the black-swan-avoidance bias. Forthcoming, bioRxiv and AMSE.
- COVID-19 herd immunity: where are we?
- On the fallibility of simulation models in informing pandemic responses.
- Data mining: concepts and techniques.
- On information and sufficiency.
- The engines of SARS-CoV-2 spread.
- Reinforcement learning: an introduction.
- Antifragile: things that gain from disorder.

key: cord-320970-ru2iw0py
authors: Peeling, Rosanna W; Wedderburn, Catherine J; Garcia, Patricia J; Boeras, Debrah; Fongwen, Noah; Nkengasong, John; Sall, Amadou; Tanuri, Amilcar; Heymann, David L
title: Serology testing in the COVID-19 pandemic response
date: 2020-07-17
journal: Lancet Infect Dis
doi: 10.1016/s1473-3099(20)30517-x
doc_id: 320970; cord_uid: ru2iw0py

The collapse of global cooperation and a failure of international solidarity have led to many low-income and middle-income countries being denied access to molecular diagnostics in the COVID-19 pandemic response. Yet the scarcity of knowledge on the dynamics of the immune response to infection has led to hesitation in recommending the use of rapid immunodiagnostic tests, even though rapid serology tests are commercially available and scalable. On the basis of our knowledge and understanding of viral infectivity and host response, we urge countries without the capacity to do molecular testing at scale to research the use of serology tests to triage symptomatic patients in community settings, to test contacts of confirmed cases, and in situational analysis and surveillance. The WHO R&D Blueprint expert group identified eight priorities for research and development, of which the highest is to mobilise research on rapid point-of-care diagnostics for use at the community level. This research should inform control programmes of the required performance and utility of rapid serology tests, which, when applied specifically so that appropriate public health measures can then be put in place, can make a huge difference.

The COVID-19 pandemic, now only a few months old,(1,2) has brought into sharp focus inequalities within and among countries.
John Nkengasong, director of the Africa Centres for Disease Control and Prevention, reported that "the collapse of global cooperation and a failure of international solidarity have shoved Africa out of the diagnostics market".(3) Sadly, the same is true of many other low-income and middle-income countries (LMICs) outside Africa.

Why are diagnostics important? In any epidemic response, diagnostic testing plays a crucial role, and this pandemic is no exception. Because the early clinical presentations of infected patients are non-specific, testing is needed to confirm the diagnosis of COVID-19 in symptomatic patients as soon as possible, so that these patients can be appropriately isolated and clinically managed.(1,4,5) Diagnostic testing is also needed for individuals who have come into contact with someone with confirmed COVID-19. Some testing strategies examine only contacts who have symptoms or develop illness of any kind during the 14-day period after contact; other strategies examine all contacts when identified, regardless of whether they have any symptoms. Studies have shown that a large number of infected individuals might have no symptoms at all, and there is concern that these individuals are still able to shed the virus and transmit infection through saliva droplets as they speak.(4-9) Tracking all contacts of confirmed cases and testing them for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is key to successful pandemic control.

Diagnostics are also needed to support rapid serosurveys that establish whether, and to what extent, SARS-CoV-2 has circulated in a community, and surveillance systems, such as that for influenza-like illness, that monitor disease trends over time. Diagnostics can also be used to identify at-risk populations and assess the effectiveness of control strategies.
Tedros Adhanom Ghebreyesus, Director-General of WHO, urged countries to implement a comprehensive package of measures to find, isolate, test, and treat every case, and trace every contact. Goodwill between countries has already been shown through the publishing of the SARS-CoV-2 genetic sequence and shared laboratory protocols to detect the virus.(10) However, as these molecular assays require sophisticated laboratory facilities, countries with insufficient infrastructure quickly accumulate a backlog of testing. The rapid spread of COVID-19 around the world has led to a global shortage of the reagents and supplies needed for testing. Point-of-care molecular assays for SARS-CoV-2 detection are now available to enable community-based testing for COVID-19 in LMICs. Unfortunately, the production of these test cartridges takes time and, again, global demand has outstripped supply, leaving LMICs struggling for access.

In March, 2020, WHO urged member states to "test, test, test".(11) Widespread testing can help countries to map the true extent of the outbreak, including identifying hotspots and at-risk populations, and to monitor the rate at which the epidemic is spreading. However, most LMICs find that molecular testing, including point-of-care testing, is neither scalable nor affordable on a large scale, and relying solely on centralised testing puts countries at risk of having nothing to use. What diagnostic alternatives are available to support decentralised testing that would allow countries to mount an adequate response to the pandemic? Rapid antigen detection tests that are simple to do at the point of care and can give results in less than 30 min would be viable alternatives to molecular testing for confirming COVID-19 cases, enabling appropriate case management, and guiding public health measures such as quarantine or self-isolation.
However, although scaling up rapid antigen testing offers an effective means of triaging symptomatic individuals in community settings, early evaluations of rapid antigen detection tests show sensitivity too low for these tests to be recommended for clinical diagnosis or triage.(12)

Rapid antibody detection lateral flow tests are also simple to use, generally requiring a few drops of whole blood from a finger prick placed onto the test strip, with no processing needed. These tests take 15-20 min to do with minimal training and can be done at the point of care, as most do not require any equipment. Rapid antibody testing is an attractive option for scaling up testing, but only if these tests show satisfactory performance for a clearly specified use.

The detection of SARS-CoV-2 infection and the immune response has been described in relation to different diagnostic tests.(13) In this section, we summarise the evidence from studies to date. Studies have shown that SARS-CoV-2 RNA can be detected 2-3 days before the onset of symptoms and can remain detectable up to 25-50 days after the onset of symptoms, particularly in patients who remain symptomatic for an extended period.(7,14,15) SARS-CoV-2 RNA can be detected for longer in respiratory samples from patients with severe disease than in samples from patients with mild illness.(16) Viral RNA concentrations peak within the first 5 days after the onset of symptoms and decrease slowly with rising antibody concentrations.(7,17,18) However, RNA clearance is not always associated with rising antibody concentrations, particularly in patients who were critically ill.(6,18) An important question for the potential spread of COVID-19 is whether individuals who are RNA-positive are shedding infectious virus. A small study of nine patients found that viral replication stopped 5-7 days after the onset of symptoms, but patients remained RNA-positive for 1-2 weeks after this point.
(6) Hence, there remains some uncertainty as to whether a patient who is RNA-positive is shedding live virus or not.

Maturation of the immune response typically takes 40 days, with variations in the dynamics of the antibody response depending on disease severity and other factors still to be discovered. In most studies of laboratory-confirmed COVID-19 cases, IgM antibodies start to become detectable around 5-10 days after the onset of symptoms and rise rapidly,(14,18-21) with IgG antibody concentrations following the IgM response closely. Seroconversion typically occurs within the first 3 weeks, with the mean time to seroconversion after the onset of symptoms being 9-11 days for total antibody, 10-12 days for IgM, and 12-14 days for IgG.(14,18,19,22) Antibodies against the receptor-binding domain of the spike protein and the nucleocapsid protein have been associated with neutralising activity.(14,23,24) Neutralising antibodies to these domains can be detected approximately 7 days after the onset of symptoms and rise steeply over the next 2 weeks.(23,24) Several studies showed that patients can remain RNA-positive despite high concentrations of IgM and IgG antibodies against the nucleocapsid protein and the receptor-binding domain of the spike protein.(18)

Whether the presence of neutralising antibodies translates into protective immunity in patients with COVID-19 is unclear. Some researchers speculate that antibodies can enhance infectivity, as higher antibody concentrations have been observed in patients with severe disease than in those with mild disease.(18,25) In one study (n=222), a greater proportion of patients with high IgG concentrations had severe disease than of those with low IgG concentrations (52% vs 32%, p=0.008).(26) The role of the antibody response in the pathogenesis of COVID-19 remains unclear pending further studies.
WHO and the Pan American Health Organization have stated that they do not currently recommend the use of immunodiagnostic tests except in research settings,(27,28) because of scarce information on test performance and appropriate use while immunity to COVID-19 is not well understood. However, many countries are struggling to scale up testing to implement the key strategies of diagnosing all symptomatic patients and tracing all contacts. Delays in confirming COVID-19 cases allow continued transmission within communities and can result in failure to contain the pandemic despite other measures such as physical distancing and travel restrictions. Countries are assessing all available testing options to address their range of needs. In settings where challenges with molecular testing exist or access to laboratories is scarce, rapid serology tests offer a needed additional option. A rapid serology test with good performance characteristics is extremely important to avoid missing true cases of COVID-19 and imposing unnecessary quarantine on people with false-positive results due to cross-reactivity with seasonal coronaviruses. Studies have shown more antibody cross-reactivity between the nucleocapsid proteins of SARS-CoV-2 and common coronaviruses than between their spike proteins.(20,22) Tests that use the spike protein, or fragments of the spike protein, as targets might have the least cross-reactivity with common coronaviruses, on the basis of sequence analysis.(22) Clear articulation of the benefits and limitations of serology tests will hopefully incentivise manufacturers to improve performance.(29)

When is serology testing recommended? Where there is little or no access to molecular testing, rapid serology tests provide a means to quickly triage suspected cases of COVID-19, provided the test is highly specific for the disease.
A positive result for IgM in symptomatic patients fulfilling the COVID-19 case definition is strongly suggestive of SARS-CoV-2 infection. This approach is probably most effective in individuals 5-10 days after symptom onset. In Peru, public health facilities for molecular testing are sparse and there are only 500 intensive-care beds for a population of 32 million. The Ministry of Health has set up a hotline and website through which individuals who have symptoms are interviewed by a health professional for possible follow-up, prioritising visits according to age, risk factors, and severity of symptoms. A testing team visits the individual at home to do the rapid antibody test. Individuals who are IgM and IgG positive and have mild symptoms are quarantined, whereas those who need critical care are referred to hospital. All contacts are also tested with the rapid serology test, and anyone who tests negative on the antibody test has a swab collected for molecular testing. As of May 2, 2020, 355,604 people had been triaged in Peru, with 42,534 testing positive, 26,362 of whom were found positive by use of a rapid test.(30) This approach has allowed a large number of symptomatic individuals and contacts to be rapidly tested in the community, relieving the backlog, reducing waiting times for molecular testing, and preventing the health-care system from being overwhelmed. Experience in China has also shown that, in symptomatic patients, the use of IgM tests or total antibody tests can increase the sensitivity of COVID-19 case detection.(18) Further research should explore the performance and utility of rapid antigen-IgM and antigen-IgG combo tests and the timing of testing. In individuals who test negative for IgG, research should also explore, if resources allow, the value of a follow-up antibody test 10-14 days later to document a definitive diagnosis through seroconversion.
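The Peru home-visit flow described above can be summarised as a small decision function. This is a simplified sketch: the function name is ours, and cases the text leaves unspecified (such as IgM-only positives) are folded into a single "antibody positive" outcome:

```python
def triage_home_visit(antibody_positive, needs_critical_care):
    """Simplified next-action decision for a home-visit rapid-test team,
    modelled on the Peru programme described in the text."""
    if not antibody_positive:
        # antibody-negative individuals have a swab collected for molecular (RT-PCR) testing
        return "collect swab for molecular testing"
    if needs_critical_care:
        return "refer to hospital"
    # antibody-positive with mild symptoms
    return "quarantine at home"

print(triage_home_visit(antibody_positive=True, needs_critical_care=False))
```

The point of writing the flow this way is that the rapid serology result acts as a filter: only antibody-negative individuals consume scarce molecular-testing capacity.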
Studies have shown that a large number of infected individuals could have only mild symptoms or no symptoms at all, yet still transmit infection, with as much as 44% of infections being transmitted by presymptomatic individuals.(6,7,9) Experience from Singapore shows that tracking down all contacts of people with confirmed COVID-19, testing them for evidence of infection regardless of symptoms, and putting those contacts who test positive into isolation is an urgent priority for interrupting the chain of transmission and containing the epidemic.(31-33) This approach is particularly important in the early stages, when there are only sporadic cases or clusters of cases, or for countries coming down from the peak that want to continue to reduce the extent of infection in the community. Only individuals who test negative should have a throat swab collected for molecular testing, which will reduce the strain on the laboratories doing these tests.

In countries that have set up syndromic surveillance, such as surveillance for influenza-like illness or severe acute respiratory infections, and where blood or throat swabs are routinely collected at sentinel sites, the collected samples can be tested for COVID-19 with molecular, antigen, or serology tests, either alone or in combination. If any of these samples are positive, it means COVID-19 has been circulating in the community. Where serial samples are available, it might be possible to date when COVID-19 established itself in a community or country. In general, antibody tests can be used to establish the true extent of an outbreak, map its geographical distribution, and identify hotspots and populations that are particularly at risk. This information can in turn be used to inform public health measures and control strategies. In this case, researchers need to stick to the same serology test and test sentinel populations repeatedly, avoiding the variation in sensitivity between different rapid tests.
the use of serology tests for population surveys is not recommended in low-prevalence settings, as this approach will probably yield more false-positive than true-positive results, even if a test with high specificity is used. for example, if the prevalence of infection is 1% in the general population, a test with 98% specificity will identify two false-positive results for every true-positive result. these results could lead to a false sense of security regarding the extent of immunity in the population and premature easing of public health measures on the basis of misleading disease estimates. patients at an early stage in the disease course, or asymptomatic or paucisymptomatic patients, might have low antibody concentrations that could give false-negative results. patients' disease stage and severity are important points to consider, along with the population being tested. the estimated level of risk should be considered before using a serology test, because the false-positive rate, and hence the positive predictive value, varies across populations. among the groups at highest risk of disease are symptomatic patients with a clinical presentation of covid-19, patients with other respiratory symptoms, contacts of confirmed cases, and health-care workers in settings with little personal protective equipment. we suggest countries consider these risk levels before using serology tests and creating public health guidance. scaling up testing, particularly at the community level, allows for better estimates of risk, which in turn allows more effective public health measures to be put in place than would otherwise be possible. in a pandemic, countries must strive to maintain a robust health-care workforce. key workers who develop symptoms should be prioritised for molecular testing and receive care if infected. on recovery, should a serology test be used to decide when they can safely return to work?
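the arithmetic behind the worked example above can be checked in a few lines. this is an illustrative sketch only; the function names and the assumption of 100% sensitivity (used purely to match the example in the text) are ours, not the authors'.

```python
# Back-of-envelope check of the worked example: at 1% prevalence, a test
# with 98% specificity yields roughly two false positives per true positive.
# Sensitivity is assumed to be 100% here only to mirror the text's example.

def fp_per_tp(prevalence, specificity, sensitivity=1.0):
    """Expected false positives per true positive in a screened population."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return false_pos / true_pos

def ppv(prevalence, specificity, sensitivity=1.0):
    """Positive predictive value: P(infected | test positive)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)

print(fp_per_tp(0.01, 0.98))  # ~1.98, i.e. about 2 false positives per true positive
print(ppv(0.01, 0.98))        # ~0.34: only about a third of positives are real
```

at 1% prevalence, even a 98%-specific test gives a positive predictive value of only about one third, which is why the text cautions against serosurveys in low-prevalence settings.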
this strategy is based on the assumption that antibodies confer protective immunity. although antibodies against the receptor-binding domain of the spike protein and the nucleocapsid protein have been correlated with neutralising activity, 14, 23, 24 the development and duration of immunity have not yet been established. 34, 35 although it is tempting to speculate that serology tests based on the detection of neutralising antibodies can be used as markers of protective immunity, and that people who test positive can get a so-called immunity passport to return to work, studies have shown that a significant proportion of patients remain rna-positive despite high concentrations of antibodies against the receptor-binding domain of the spike protein and the nucleocapsid protein. 6, 15, 18, 20, 21 wang and colleagues 36 found that elevated serum igm concentrations are correlated with poor outcomes in patients with covid-19 pneumonia, and tan and colleagues 21 found that high concentrations of igg antibodies were correlated with severe disease outcomes. hence a substantial igm or igg response is not necessarily a surrogate marker of protective immunity. to date, insufficient evidence exists to recommend the use of serology testing for health-care workers to return to work. a negative molecular test remains the safest option to establish whether health-care workers can work again safely. a policy brief by the world bank suggested that serology testing could potentially have a high net benefit if it allows restrictions to be relaxed for essential workers to return to work and revive essential segments of the economy. 37 the type of tests that can be used for immunity passports remains unclear. a better understanding of the interaction between infection and immune response dynamics is needed before these passports can be considered. hospital beds are often in short supply. the recommended criteria for hospital discharge are two negative molecular tests over several days.
however, molecular testing is often scarce or unavailable. can serology tests be used for discharging recovered patients when molecular testing is not available? as patients can remain positive for viral rna despite rising concentrations of antibodies against the nucleocapsid protein and receptor-binding domain of the spike protein, which are correlated with neutralising activities, antibody tests cannot be used in place of molecular tests to confirm that the patient is virus-free or at least no longer shedding live virus. the events of the past few months have taught us that this pandemic is caused by an extraordinary pathogen that requires extraordinary measures to combat its spread. the latest finding that as much as 44% of covid-19 transmission happens before index cases become symptomatic 7 means that a great deal still needs to be learnt about this novel pathogen and its spread through a population. the paucity of knowledge on the dynamics of the immune response to infection has led to much hesitation in recommending the use of rapid immunodiagnostic tests, particularly serology tests. on the basis of our current knowledge and understanding of viral infectivity and host response, we urge countries with restricted capacity for molecular testing to embark on research into the use of serology tests for triaging symptomatic patients in community settings, testing contacts of confirmed cases, and situational analysis and surveillance. rapid and scalable tests are needed to deal with this pandemic. rapid serology tests, applied in the right situation for appropriate public health measures to be put into place, can make a huge difference. on feb 10, 2020, leading health experts from around the world identified eight research and development priorities at the who r&d blueprint meeting in geneva, switzerland, of which the top priority was to "mobilize research on rapid point of care diagnostics for use at the community level".
38 in line with this decision, research on the use of rapid serology tests to inform control programmes of their required performance and utility is an urgent priority in the covid-19 pandemic response. rwp wrote the first draft of the manuscript. all authors contributed to the manuscript conception and supported manuscript revisions. we declare no competing interests.
references:
who. molecular assays to diagnose covid-19: summary table of available protocols
director-general's opening remarks at the media briefing on covid-19
find evaluation update: sars-cov-2 immunoassays 2020
interpreting diagnostic tests for sars-cov-2
temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by sars-cov-2: an observational cohort study
diagnostic value and dynamic variance of serum antibody in coronavirus disease 2019
viral load dynamics and disease severity in patients infected with sars-cov-2 in zhejiang province, china
evidence summary for covid-19 viral load over course of infection
antibody responses to sars-cov-2 in patients of novel coronavirus disease 2019
serology characteristics of sars-cov-2 infection since exposure and post symptoms onset
profiling early humoral response to diagnose novel coronavirus disease (covid-19)
viral kinetics and antibody responses in patients with covid-19
antibody responses to sars-cov-2 in patients with covid-19
neutralizing antibody responses to sars-cov-2 in a covid-19 recovered patient cohort and their implications
evaluation of nucleocapsid and spike protein-based enzyme-linked immunosorbent assays for detecting antibodies against sars-cov-2
profile of igg and igm antibodies against severe acute respiratory syndrome coronavirus 2 (sars-cov-2)
immune phenotyping based on neutrophil-to-lymphocyte ratio and igg predicts disease severity and outcome for patients with covid-19
advice on the use of point-of-care immunodiagnostic tests for covid-19
developing antibody tests for sars-cov-2
covid-19 in peru
investigation of three clusters of covid-19 in singapore: implications for surveillance and response measures
connecting clusters of covid-19: an epidemiological and serological investigation
interrupting transmission of covid-19: lessons from containment efforts in singapore
covid-19 immunity passports and vaccination certificates: scientific, equitable, and legal challenges
what policy makers need to know about covid-19 protective immunity
elevated serum igm levels indicate poor outcome in patients with coronavirus disease 2019 pneumonia: a retrospective case-control study
how-two-tests-can-help-contain-covid-19-and-revive-the-economy.pdf?sequence=1&isallowed=y
world experts and funders set priorities for covid-19 research
we thank sergio carmona and jilian sacks of the foundation for innovative new diagnostics for helpful discussions.
key: cord-325455-e464idc0
authors: atchison, christina; pristerà, philippa; cooper, emily; papageorgiou, vasiliki; redd, rozlyn; piggin, maria; flower, barnaby; fontana, gianluca; satkunarajah, sutha; ashrafian, hutan; lawrence-jones, anna; naar, lenny; chigwende, jennifer; gibbard, steve; riley, steven; darzi, ara; elliott, paul; ashby, deborah; barclay, wendy; cooke, graham s; ward, helen
title: usability and acceptability of home-based self-testing for sars-cov-2 antibodies for population surveillance
date: 2020-08-12
journal: clin infect dis
doi: 10.1093/cid/ciaa1178
doc_id: 325455
cord_uid: e464idc0
background: this study assesses acceptability and usability of home-based self-testing for sars-cov-2 antibodies using lateral flow immunoassays (lfia). methods: we carried out public involvement and pilot testing in 315 volunteers to improve usability. feedback was obtained through online discussions, questionnaires, observations and interviews of people who tried the test at home.
this informed the design of a nationally representative survey of adults in england using two lfias (lfia1 and lfia2), which were sent to 10,600 and 3,800 participants, respectively, who provided further feedback. results: public involvement and pilot testing showed high levels of acceptability, but limitations with the usability of kits. most people reported completing the test; however, they identified difficulties with practical aspects of the kit, particularly the lancet and pipette, a need for clearer instructions and more guidance on interpretation of results. in the national study, 99.3% (8,693/8,754) of lfia1 and 98.4% (2,911/2,957) of lfia2 respondents attempted the test, and 97.5% and 97.8% of respondents completed it, respectively. most found the instructions easy to understand, but some reported difficulties using the pipette (lfia1: 17.7%) and applying the blood drop to the cassette (lfia2: 31.3%). most respondents obtained a valid result (lfia1: 91.5%; lfia2: 94.4%). overall there was substantial concordance between participant- and clinician-interpreted results (kappa: lfia1 0.72; lfia2 0.89). conclusion: impactful public involvement is feasible in a rapid-response setting. home self-testing with lfias can be used with a high degree of acceptability and usability by adults, making them a good option for use in seroprevalence surveys. lateral flow immunoassays (lfia) offer a rapid point-of-care (poc) approach to novel coronavirus (sars-cov-2) antibody testing. while lfias may not currently be accurate enough for individual-level clinical decisions (1, 2), they are valuable as a public health tool. on a population level, by conducting seroprevalence surveys through widespread random sampling of the general public, and by adjusting for the sensitivity and specificity characteristics of the lfia used, it is possible to estimate the level of past infection with sars-cov-2 in the community (3).
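the passage above notes that seroprevalence estimates must be adjusted for the sensitivity and specificity of the lfia used. the paper does not give its adjustment formula; the standard rogan-gladen estimator, sketched below with made-up numbers, is one common way to make this correction.

```python
# Sketch of the textbook Rogan-Gladen adjustment for estimating true
# seroprevalence from an imperfect test. The study cites adjusting for
# sensitivity and specificity without giving its exact method; this is
# shown for illustration only, with hypothetical input values.

def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """True-prevalence estimate from apparent (test) prevalence, clamped to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must be informative (sensitivity + specificity > 1)")
    est = (apparent_prevalence + specificity - 1.0) / denom
    return min(max(est, 0.0), 1.0)

# hypothetical numbers: 6% of participants test positive on an LFIA with
# 84% sensitivity and 98% specificity
print(rogan_gladen(0.06, 0.84, 0.98))  # ≈ 0.049, i.e. about 4.9% true prevalence
```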
however, testing hundreds of thousands of people would be impractical if it required a blood sample to be drawn and then processed in a laboratory. one solution is to use self-sampling and self-testing in the home, with participants reporting results to the researchers. however, there is limited understanding of the public acceptability and usability of these lfias in the home setting, as most are currently designed as poc tests performed by healthcare professionals. self-sampling and self-testing are widely used in healthcare for monitoring, for example in diabetes management (4), and for diagnostics, for example for hiv (5, 6). there are many advantages in terms of uptake, cost, patient activation and scale (4, 6), but also potential disadvantages in relation to validity, usability and practicality which should be explored (6, 7). usability research on hiv self-testing has generally found good acceptability, devices that are easy to use, and high validity in the interpretation of self-reported test results (7-9). however, these hiv test kits were designed for self-sampling and self-testing and went through several iterations before designs were appropriate for home use; the same levels of acceptability and usability therefore cannot be assumed for home-based self-testing for sars-cov-2 antibody using lfias. as part of the real-time assessment of community transmission (react) programme (10), we evaluated the acceptability and usability of lfias for use in large seroprevalence surveys of sars-cov-2 antibody in the community. we evaluated two lfias with different usability characteristics from the five lfias being validated in parallel in our laboratory-based study (11). both lfias required a blood sample from a finger-prick and produced a self-read test result after 10 or 15 minutes.
lfia1 (guangzhou wondfo biotech co ltd) was a cassette-based system containing a "control" indicator line and a "test" indicator line (for detection of combined igm and igg antibodies). lfia2 (fortress orient gene biotech co ltd) was a cassette-based system containing a "control" indicator line and separate indicator lines for igm and igg (figure 1). in early may 2020 we carried out rapid, iterative public involvement and a pilot usability study, including an online forum with four discussion groups (n=37), a study of lfia1 test use with volunteers (n=44) and a broader public sample (n=234), and a nested observation and interview study (n=25). further details on the methods, including how we recruited participants from our existing involvement networks, are available online (supplement, s1). the test kits dispatched in the pilot study included one test cassette, one button-activated 28-gauge lancet and a 2 ml plastic pipette, alongside an instruction booklet also containing a weblink to an instructional video. based on findings from the pilot study, for the larger population-based usability study the lancet and pipette were replaced with two pressure-activated 23-gauge (larger) lancets and a smaller 1 ml plastic pipette, respectively. the design and language in the instructional booklet and video were changed, and an alcohol wipe was also included in the kit. in late may 2020 we carried out a larger population-based usability study of a representative sample of the adult population (aged 18 years and over) in england. we used addresses from the postal address file to draw a random sample of 30,000 households in england to which study invitation letters were sent. we allowed up to four adults aged 18 and over in the household to register for the study. self-testing lfia kits were then posted to each registered individual.
on completion of the test, participants recorded their interpretation of the result as part of an online survey, with the option of uploading a photograph of the test result. reminder letters were sent to participants who had not completed the online survey or uploaded a photograph within 10 days of test kits being dispatched. metrics to evaluate usability and acceptability were based on the hiv self-testing literature (5, 6, 8) and were measured as the percentage of participants responding to specific closed questions in the online survey. the questionnaire used is available as an online supplement (supplement, s2). the main outcome was usability of the lfia kits, defined as a participant's ability to complete the antibody test, and how easy or difficult it was to understand the instructions and complete each step in the process. acceptability was measured in terms of people consenting to and using the provided self-test, and the proportion who reported they would be willing to repeat a self-administered finger-prick antibody test in the future. analyses were conducted in stata (version 15.0, statacorp, texas, usa). data obtained from the questionnaires on acceptability and usability were summarised by counts and descriptive statistics, and comparisons were made between lfia1 and lfia2 using pearson's chi-squared test. multivariate regression was used to identify sociodemographic factors independently associated with whether participants who conducted the test achieved a valid result. variables that appeared to be associated (p<0.05) in the unadjusted analyses were considered in the adjusted analyses. adjusted odds ratios (aor) and 95% confidence intervals (ci) were estimated. associations with a p-value <0.05 in the adjusted analyses were considered statistically significant.
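to illustrate how the reported adjusted odds ratios relate to the underlying regression output, the sketch below converts a log-odds coefficient and its standard error into an aor with a wald 95% ci. the beta and se are hypothetical values chosen so the output matches the aor of 0.64 (95% ci: 0.53-0.77) reported later in the text; the actual model coefficients are not given in the paper.

```python
import math

# How an adjusted odds ratio (aOR) and its Wald 95% CI are derived from a
# fitted logistic-regression coefficient. beta and se below are invented
# for illustration; the study fit its models in Stata 15.

def odds_ratio_ci(beta, se, z=1.96):
    """aOR with a Wald 95% CI from a log-odds coefficient and its SE."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

aor, lo, hi = odds_ratio_ci(beta=-0.446, se=0.095)  # hypothetical values
print(f"aOR {aor:.2f} (95% CI: {lo:.2f}-{hi:.2f})")  # → aOR 0.64 (95% CI: 0.53-0.77)
```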
agreement between participant-interpreted and clinician-interpreted results for the four test outcomes (negative, igg positive, invalid, unable to read) was assessed using the fleiss kappa statistic. we used the same operational definition of positivity for lfia2 as in our parallel laboratory-based study (11). participants were informed in the instructions to consider igm results as negative. the study obtained research ethics approval from the south central-berkshire b research ethics committee (iras id: 283787). overall, 315 members of the public contributed feedback during the involvement and pilot phases. this led to changes in the design and language used in the instructional video and booklet, the type and number of lancets, and the size of pipette in the lfia kits (further details available online, supplement s1). for the national study, 25,000 household invitation letters were sent, and 17,411 individuals registered for the study from 8,508 households. due to the maximum number of kits we had available for the study, 14,400 participants were selected at random from those who registered. thus, 10,600 lfia1 kits were distributed, with 8,754 individual user surveys completed (82.6% response rate), and 3,800 lfia2 kits were distributed, with 2,957 individual user surveys completed (77.8% response rate). most commonly, two adults participated per household (table 1). baseline characteristics of study participants are shown in table 1. the median age of participants across lfia1 and lfia2 was 51.0 years (range 18 to 95). there were some differences between lfia1 and lfia2 participants by ethnicity, region and household size. acceptability of self-testing was high (table 2).
as with the pilot study, respondents with children showed a high willingness to perform the antibody test on them, and this proportion increased with the age of the children (table 2). in the pilot study, most people (86.5%, 225/260) who attempted the test managed to complete it. however, significant usability issues were identified, including challenges with the lancet to obtain a blood drop and with the pipette to transfer the blood to the sample well. the problems with the lancet led some participants to use alternative objects to draw blood, including pins and sewing needles, while others opened the lancet casing to access the blade. some people reported minor problems putting buffer into the buffer well. this led to the inclusion of two lancets and changes to the instructions for the national study. in the national study, almost all participants who attempted the antibody test reported completing it (97.5% for lfia1 and 97.8% for lfia2) (table 2). reasons for not completing the test are shown in the table. of lfia1 participants who reported damaging the test, the majority reported either accidentally removing the entire lid off the buffer bottle and spilling the solution over the test cassette, or putting the blood and buffer in the wrong well. for lfia2, few participants damaged the test, and those who did all reported putting the blood and buffer in the wrong well. about one in four participants asked someone to help them administer the test. most found the instructions easy to understand (figure 3), but, as in the pilot, participants reported some difficulties in performing the test. for lfia1, difficulties with using the pipette were reported by 17.7% (1,512/8,521) of participants. in addition, 10.6% (908/8,556) had difficulties applying the blood to
the sample well (figure 3). therefore, for lfia2 the instructions were changed to omit the use of the pipette and instead directly transfer blood from the finger-prick site to the sample well. however, participants still found creating a blood drop from the finger-prick site (23.2%; 664/2,862) and then applying the blood to the well (31.3%; 894/2,858) difficult. lfia2 was deployed after lfia1 because there was a delay in the arrival of lfia2 from the supplier. this differential timing in dispatch of the kits had the unexpected benefit of allowing us to make iterative changes to the instructions. overall, 7.4% of lfia1 and 4.8% of lfia2 participants reported an invalid result (table 3). there was some variation in the proportion of participant-reported invalid results using lfia1 by age and gender. the higher the number of participants registered for the study in the same household, the lower the odds of a participant reporting an invalid result. no sociodemographic factors were associated with a participant-reported invalid result using lfia2 (table 4). after adjusting for sociodemographic differences between lfia1 and lfia2 participants, there was no difference between lfia1 and lfia2 in terms of being unable to read the result (1.1% vs. 0.81%; aor 0.76 (95% ci: 0.48-1.2); p = 0.24), but a lower percentage of invalid test results was reported by lfia2 participants (7.9% vs. 4.8%; aor 0.64 (95% ci: 0.53-0.77); p < 0.001). table 5 shows concordance between participant- and clinician-interpreted results in the national study. for lfia1, there was substantial agreement overall (kappa 0.72 (95% ci: 0.71-0.73); p < 0.001); however, there were important differences: while there was 100.0% agreement for results reported as negative, and 98.5% agreement for invalid results, a clinician confirmed only 62.8% of participant-reported positives.
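the concordance analysis above can be sketched for two raters (participant vs clinician) with cohen's kappa, a closely related two-rater statistic (the paper reports fleiss' kappa, computed in stata). the confusion matrix below is invented for illustration and is not the study's data.

```python
# Cohen's kappa from a square confusion matrix: chance-corrected agreement
# between two raters. The counts are hypothetical, chosen only to show the
# shape of the calculation over the four outcome categories.

def cohens_kappa(matrix):
    """Cohen's kappa (rows: rater 1 categories, columns: rater 2 categories)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_marginals = [sum(row) / total for row in matrix]
    col_marginals = [sum(matrix[i][j] for i in range(n)) / total for j in range(n)]
    expected = sum(r * c for r, c in zip(row_marginals, col_marginals))
    return (observed - expected) / (1.0 - expected)

# rows: participant-read result; columns: clinician-read result
# categories: negative, IgG positive, invalid, unable to read (illustrative)
m = [[700,  5,  5, 0],
     [ 10, 50,  5, 0],
     [  5,  2, 60, 0],
     [  2,  0, 10, 1]]
print(round(cohens_kappa(m), 2))  # → 0.82 for this illustrative matrix
```

note that raw percentage agreement (here about 95%) overstates concordance; kappa discounts the agreement expected by chance from the marginal totals, which is why the paper reports kappa rather than simple agreement.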
visible reasons (from the photograph) were insufficient blood volume to cover the bottom of the sample well, or insufficient movement of the blood and buffer solution across the result window. in addition, the clinician was able to interpret the results of all but four of 66 (6.1%) tests from the photographs of results participants reported as "unable to read". the four results unreadable by the clinician were because blood had leaked out of the sample well and obscured the result window. of participant-reported unable-to-read results, 78.8% were clinician-interpreted as invalid (for all these tests, the control and "test" lines were both absent) (table 5). the clinician could not interpret the results from 10 photographs reported as readable results by participants; reasons included blurred photographs and shadowing obscuring the indicator lines. overall, we found that self-testing with the two lfia kit designs used in this study was highly acceptable among adults living in england. high acceptability of in-home self-testing is in keeping with self-sampling and self-testing studies in diabetes management (4) and hiv diagnostics (5, 6). the majority of participants who attempted the test successfully completed it, despite some continued difficulties with using the pipette (lfia1), creating a drop of blood from the finger-prick site (lfia2) and applying the blood to the sample well (lfia1 and lfia2). the lower proportion of invalid tests for lfia2 could reflect the better-designed buffer bottle (a significant issue for lfia1), better performance characteristics of lfia2 over lfia1 (e.g. easier movement of the blood and buffer solution across the result window), or an improved ability to get sufficient blood into the sample well using the direct blood transfer technique rather than a pipette.
participants' ability to obtain a valid test result using lfia1 varied marginally by age and gender and increased with the number of participants registered for the study in the same household. the latter observation is not surprising, as watching another household member perform the test is likely to improve the performance of others in the same household. no sociodemographic differences in obtaining a valid result were found for lfia2. of note, about one in four participants reported that they had help administering the test, irrespective of the lfia used. this could put individuals living alone at a disadvantage in terms of usability. however, comparing those who had help with those who did not, we found no difference in ability to complete the test (97.7% vs. 98.2%; p = 0.08) or in reported invalid results (6.1% vs. 7.0%; p = 0.06). overall, there was good agreement between self-reported results and those reported by a clinician. therefore, our findings broadly support self-reporting of home-based test results using lfias. however, the public and individual health impact of misinterpreting a test result that is negative but read as positive is a concern, as an individual could falsely conclude that they have antibodies to sars-cov-2 and may change their behaviour as a result. our study is original because the acceptability and usability of lfias for self-testing for sars-cov-2 antibody in a home-based setting has not previously been assessed at such scale in the general population. it provides an attractive solution for conducting large seroprevalence surveys. the study has, however, some limitations. study participants may not be representative of the general adult population of england. however, we had data on the england population profile (2011 census data (13)), as well as the study registration profile and survey completion profile of the study participants, which gave us an indication of response bias.
our sample was broadly similar to the england population profile. in addition, the usability study was conducted in parallel with our laboratory-based study of the performance characteristics of lfias. as such, we did not know the accuracy of the lfias chosen for the usability study at the time, or whether either would perform well enough in the laboratory to be considered for the large national seroprevalence study planned as part of the react programme (10). however, given that the majority of commercially available lfias have a similar cassette-based design and test result read-out to lfia1 or lfia2, we were confident that our results would be generalisable and applicable to whichever lfia was finally selected. findings from our laboratory-based study, including the performance characteristics of lfia1 and lfia2, are forthcoming (11). overall, our study has demonstrated that home-based self-testing lfias for use in large community-based seroprevalence surveys of sars-cov-2 antibody are both acceptable and feasible. although this study identified a few usability issues, these have now been addressed. lfia2, fortress orient gene, has been selected for a large national seroprevalence study as part of the react programme. this decision was based on criteria including the usability and acceptability determined in this study. (table footnote: participants were asked whether they would carry out the test on children of that age living in their households; the denominator is the number of participants who reported having children of that age living in their household.)
references:
antibody testing for covid-19: a report from the national covid scientific advisory panel
evaluation of nine commercial sars-cov-2 immunoassays
prevalence of sars-cov-2 in spain (ene-covid): a nationwide, population-based seroepidemiological study
self-monitoring of blood glucose in type 2 diabetes: recent studies
acceptability, feasibility, and individual preferences of blood-based hiv self-testing in a population-based sample of adolescents in kisangani, democratic republic of the congo
reliability of hiv rapid diagnostic tests for self-testing compared with testing by health-care workers: a systematic review and meta-analysis
interferences and limitations in blood glucose self-testing: an overview of the current knowledge
usability assessment of seven hiv self-test devices conducted with lay-users in
we thank key collaborators on this work -- ipsos mori: stephen finlay, john kennedy, duncan
key: cord-324373-mgdtb98z
authors: antonelli, andrea; guilizzoni, dario; angelucci, alessandra; melloni, giulio; mazza, federico; stanzi, alessia; venturino, massimiliano; kuller, david; aliverti, andrea
title: comparison between the airgo™ device and a metabolic cart during rest and exercise
date: 2020-07-15
journal: sensors (basel)
doi: 10.3390/s20143943
doc_id: 324373
cord_uid: mgdtb98z
the aim of this study is to compare the accuracy of airgo™, a non-invasive wearable device that records breath, with respect to a gold standard. in 21 healthy subjects (10 males, 11 females), four parameters were recorded for four min at rest and in different positions simultaneously by airgo™ and a sensormedics 2900 metabolic cart. then, a cardio-pulmonary exercise test was performed using the erg 800s cycle ergometer in order to test airgo™'s accuracy during physical effort.
the results reveal that the median relative error of the respiratory rate was 0% for all positions at rest and for all exercise intensities, with interquartile ranges between 3.5 (standing position) and 22.4 (low-intensity exercise) breaths per minute. during exercise, the relative error medians for normalized amplitude and ventilation highlighted the presence of an error proportional to the volume to be estimated. for increasing intensity levels of exercise, airgo™'s estimate tended to underestimate the values of the gold standard instrument. in conclusion, the airgo™ device provides good accuracy and precision in the estimate of respiratory rate (especially at rest), an acceptable estimate of tidal volume and minute ventilation at rest, and an underestimation for increasing volumes. respiratory rate measurement has been shown to be able to predict adverse clinical events, such as admission to the intensive care unit (icu). under specific circumstances, it is more effective than pulse or blood pressure measurements at discriminating between stable patients and patients at risk [1, 2]. however, the number of trials in which the respiratory rate has been studied remains limited and mostly confined to the use of spot measurements. the work by yañez et al. [3] is based on a direct measurement of flow to assess the respiratory frequency in copd patients. in this study, it was observed that the mean respiratory rate was raised 15 days prior to hospitalization (two days before, it was 15% above baseline). the goal of the presented work is to compare a metabolic cart, considered a gold standard, with airgo™ (myair inc, boston, ma, usa; myairgo italy srl, milan, italy), a resistance-based wearable device able to derive breathing parameters from body surface motion detection acquired at the level of the lower ribcage.
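the summary statistics reported above (median relative error and its interquartile range) can be computed as sketched below. the device and reference values are invented for illustration; they are not the study's measurements.

```python
import statistics

# Sketch of the error metric used in the study: relative error of the
# AirGo estimate against the metabolic cart, summarized as median and
# interquartile range. Sample values below are hypothetical.

def relative_error_pct(device, reference):
    """Per-measurement relative error of device vs reference, in percent."""
    return [100.0 * (d - r) / r for d, r in zip(device, reference)]

device_rr    = [12, 14, 13, 15, 12, 16, 14]   # breaths/min from AirGo (hypothetical)
reference_rr = [12, 14, 14, 15, 13, 16, 14]   # breaths/min from the metabolic cart

errs = relative_error_pct(device_rr, reference_rr)
q1, med, q3 = statistics.quantiles(errs, n=4)   # quartiles of the error distribution
print(f"median {med:.1f}%  IQR {q3 - q1:.1f}%")  # → median 0.0%  IQR 7.1%
```

a median of 0% with a non-zero iqr, as in the study's respiratory-rate results, means the device is unbiased on average while individual measurements still scatter around the reference.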
a study protocol was conducted on a population of twenty-one healthy subjects, comparing the output of the airgo™ system with the sensormedics 2900 metabolic cart (sensormedics inc., yorba linda, ca, usa) [22]. the research was approved by the ethical committee of the azienda ospedaliera s. croce e carle in cuneo (opinion number 3-17, 7 june 2017) and was part of the clinical trial nct03368612 ("comparative study between the airgo™ system and standard tests in the assessment of the respiratory function"). in the first part of the protocol, the healthy subjects were asked to perform quiet breathing maneuvers in five different standardized postures while wearing the airgo™ device: standing, sitting with the back against a chair, supine, right lateral decubitus and left lateral decubitus. in the second part of the test, after completion of the test at rest, a symptom-limited incremental exercise was performed on the electronically braked cycle ergometer ergoline 800s (ergoline gmbh, bitz, germany). the obtained data were processed to allow the extraction of relevant respiratory parameters. the comparison between the two systems considered the following four parameters: normalized tidal volume, respiratory rate, normalized minute ventilation and duty cycle. the airgo™ band measures thoracic circumference changes with a stretchable knitted matrix of nylon and spandex with a knitted-in silver-coated yarn. the system employs this electrically conductive yarn to calculate resistance changes continuously. each variation in resistance values, caused by each expansion and volume reduction of the chest wall during breathing activity, reflects a measurable variation in current in the silver-coated wire. the stretchable band is coupled to a microprocessor, embedded in a sintered nylon shell. the microprocessor includes an sd memory card and monitors the respiratory activity by collecting raw data at a frequency of 10 hz. the microprocessor includes: an analog-to-digital converter (adc) that converts the analog resistance level of the girth band into a 10-bit number ranging from 0 to 1023, corresponding to the amplitude of the torso expressed in arbitrary units; a battery for operating the microprocessor and for charging the band; paths to transfer data to the sd card for on-board data storage; and a bluetooth module to wirelessly communicate with a computational device (laptop computer or smartphone). the device's on-board microprocessor is connected to a nine-axis inertial measurement unit (imu). the motion detection circuit provides movement and postural orientation datapoints, with the final aim of associating breathing information with the patient's posture or activity. an activity recognition algorithm based on raw accelerometer data has been developed and is described by qi and aliverti [23]. the girth band is operably coupled at its first end to the microprocessor, encircles the torso at the level of the lower rib cage, and is coupled at its second end to the microprocessor from the other side. when in use, the band is pre-tensioned to ensure a fit around the torso that does not dislocate when the band is worn and can record data without interruption. the part in direct contact with the skin is made of a silicone rubber with a thickness of about 0.35 mm called gecko® (gottlieb binder gmbh & co. kg, holzgerlingen, germany), or gecko tape, because it emulates the adhesion of gecko feet and can be attached to both wet and dry vertical surfaces. this material can be detached easily and quickly from a surface, while showing exceptional adhesion properties when attached. figure 2 shows a typical stretch metallic knit static resistance qualitative curve. initially, the resistance rapidly changes with elongation (l0 to l1). following this initial phase, there is an approximately linear increase in resistance with respect to length (l1 to l2).
resistance continues to increase with length but with less marginal increment, as shown from l2 to lt, which is the threshold above which resistance decreases with increasing length until the elastic band reaches its maximum length (lm). the operative range of the band is therefore within the range l1-l2 in order to obtain accurate measurements without significant calibration difficulties. figure 2. stretch metallic knit "static" resistance qualitative curve. the graph shows the non-linear resistance/stretch curve of the resistive fabric and the different ranges of resistance (non-linear, approximately linear, flat and reverse, with lt the length threshold above which resistance decreases and lm indicating the maximum length). the section in evidence indicates the operative range. this study is a monocentric, prospective, observational, single arm clinical trial involving 21 healthy volunteers. it was conducted at the azienda sanitaria ospedaliera (aso) s. croce e carle, allergologia e fisiopatologia respiratoria, cuneo (cn), piemonte.
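the piecewise resistance/stretch behavior of figure 2, described above, can be sketched in code. the numeric thresholds l0, l1, l2, lt and lm below are hypothetical placeholders (the text gives only the qualitative shape of the curve), so this is a minimal illustration and not the device's actual calibration:

```python
# Illustrative sketch (not the device firmware): classify where a given band
# elongation falls on the stretch-knit resistance curve described above.
# All threshold lengths (in metres) are hypothetical placeholder values.

def curve_region(length, l0=0.50, l1=0.55, l2=0.80, lt=0.90, lm=1.00):
    """Return the qualitative region of the resistance/stretch curve."""
    if length < l0 or length > lm:
        return "out of range"
    if length < l1:
        return "non-linear"            # resistance changes rapidly with elongation
    if length <= l2:
        return "approximately linear"  # the operative range for measurement
    if length <= lt:
        return "flat"                  # smaller marginal increase in resistance
    return "reverse"                   # resistance decreases with further stretch

def in_operative_range(length, l1=0.55, l2=0.80):
    """The band should be used within l1-l2 to avoid calibration difficulties."""
    return l1 <= length <= l2
```

a band stretched into the flat or reverse region would need a non-trivial calibration, which is why the text restricts operation to l1-l2.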
the 21 healthy subjects comprised 11 females and 10 males with a mean age of 36 (age range from 24 to 51), mean weight of 67 kg (standard deviation ±16 kg) and mean height of 170 cm (standard deviation ±8 cm). in order to be enrolled in the study, the following inclusion criteria have been used: • the acquisition protocol was divided into three main phases: (1) preparation: measurement of the subject's thoracic oblique circumference length (c) at the end of a forced expiration maneuver. knowing this measurement, the initial length of the elastic band (l0) was set 7% shorter than c in order to obtain a pre-tensioned girth band ensuring an effective fit around the torso. after being cut, the band was coupled with the airgo™ electronic device and positioned against the subject's skin; (2) test at rest: recording of respiratory parameters while letting the subject breathe quietly in five different standardized positions (standing, seated, supine, right lateral decubitus, left lateral decubitus) for 4 min. the supine position required an elevation of the subject's head not greater than 10° with a pillow under the subject's head. tidal volume, respiratory rate, minute ventilation, inspiratory time, expiratory time and duty cycle (explained in table 1) were simultaneously recorded by the airgo™ system and the sensormedics 2900 metabolic cart.
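the band-sizing rule of phase (1) above is a one-line calculation; a minimal sketch, with an illustrative function name and units in centimetres:

```python
# Sketch of the band-sizing step described above: the initial band length l0
# is cut 7% shorter than the measured thoracic circumference c, so that the
# band is pre-tensioned when worn.

def pretensioned_band_length(circumference_cm, shortening=0.07):
    """Band length = circumference reduced by 7% (c measured at the end
    of a forced expiration maneuver)."""
    if circumference_cm <= 0:
        raise ValueError("circumference must be positive")
    return circumference_cm * (1.0 - shortening)
```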
in order to facilitate the off-line synchronization between the two systems, subjects were asked to perform a big, deep breath at the beginning of the test for each position; (3) test under physical exercise: execution of a cardiopulmonary exercise test on the ergoline cycle ergometer 800s, followed by a recovery period. a symptom-limited incremental exercise test, designed to achieve maximum load in 10 ± 2 min, was performed in each subject wearing the airgo™ device. the physical exercise test followed a linear incremental protocol (with a slope between 15 and 20 watts per minute) that was identical for men and women; the slope was chosen based on the level of training of each subject so that each test lasted approximately between 6 and 12 min, as recommended in the guidelines. subjects were asked to cycle at a velocity of 60 revolutions/minute for 2 min with no load, then to pedal at incremental workload maintaining the same speed until maximal effort was reached. once the subject reached his/her maximal effort and was not able to ride anymore, the exercise test was interrupted and the recovery phase started. this phase consisted of cycling for 2 min without any resistance followed by 2 min at rest. the same respiratory parameters as in the test at rest were acquired, with the big-breath synchronization maneuver. airgo™'s processing unit acquires a respiratory signal based on the change in girth band resistance over time. many other factors not related to the breath cycle may influence the stretched length of the band, causing girth measurement inflections at a much higher frequency than that of the breath. therefore, raw data have been processed to filter out motion and heartbeat artefacts. the latter are characterized by a smaller amplitude and a higher frequency with respect to the respiratory signal.
the system samples data at a frequency of 10 hz, averages the acquired data over 9-34 readings, blurs the averaged data from 0.3 to 1 s to filter out artefacts, determines the beginning and the end of a breath, and records an adverse event if a predetermined period of time has elapsed without detecting a new breath, as described in the related patent (number us201462007142p). filtering has an effect on knit motion artefacts, too. the total measured breath cycle resembled an upside-down "w" (figure 3): this non-correspondence is due not only to the heartbeat but also to the motion and acceleration artefacts of the signal coming from the knitted metallic band. a portion of the resistance is static and related to the static length of the stretched knit (the overall bell shape), but part of the resistance is also due to the spontaneous motion of the fabric. this accounts for the slight hump at the beginning of the total measured breath cycle and the double hump in the middle.
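the filtering just described can be sketched as follows. this is not the patented algorithm: the moving-average window and the no-breath timeout are placeholder values consistent with the ranges stated in the text (9-34 samples at 10 hz):

```python
# Illustrative sketch of the on-board filtering described above: smooth the
# 10 Hz girth signal with a causal moving average, and flag a "missed breath"
# event if too much time passes without detecting a new breath.

def moving_average(samples, window=9):
    """Causal moving average over `window` readings (shorter at the start)."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)
        chunk = samples[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def missed_breath(breath_times_s, timeout_s=10.0):
    """True if any gap between consecutive breaths exceeds the timeout."""
    return any(t2 - t1 > timeout_s
               for t1, t2 in zip(breath_times_s, breath_times_s[1:]))
```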
once raw data are filtered, they are processed in order to obtain the proper breath signal. the main steps of this process, shown in figure 4, can be summarized as follows: • identification of maximum and minimum of each breathing cycle; • representation of each breath by means of a vector that connects the maximum and the minimum; • automatic reconstruction of segmented breaths, e.g., due to obstructions or movement; • automatic removal of fake breaths; • computation of tidal volume, respiratory rate and minute ventilation. breath data can then be transmitted to a computer via bluetooth low energy, stored, visualized and further analyzed using airgo™'s dedicated software. the signal acquired by the girth band is expressed in arbitrary units ranging from 0 to 1023 (10-bit analog-to-digital converter) and expresses the amplitude of the signal recorded in the range of differential potential levels between 0.5 and 3.6 v, which is the microprocessor supply voltage. this signal represents the signal amplitude variation due to the trunk expansion and reduction during breathing.
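the first and last steps of the list above can be sketched in a few lines: find the local maxima and minima of the (already filtered) girth signal, then derive the respiratory rate from the spacing of consecutive minima. the reconstruction of segmented breaths and removal of fake breaths are omitted here, so this is only a minimal illustration:

```python
# Minimal sketch of breath extraction: interior local extrema of the filtered
# signal mark each breathing cycle; respiratory rate follows from the mean
# period between consecutive minima.

def extrema(signal):
    """Return (minima_idx, maxima_idx) of interior local extrema."""
    minima, maxima = [], []
    for i in range(1, len(signal) - 1):
        if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]:
            minima.append(i)
        elif signal[i] > signal[i - 1] and signal[i] > signal[i + 1]:
            maxima.append(i)
    return minima, maxima

def respiratory_rate(minima_idx, fs_hz=10.0):
    """Breaths per minute from the mean period between consecutive minima."""
    if len(minima_idx) < 2:
        return None
    periods = [(b - a) / fs_hz for a, b in zip(minima_idx, minima_idx[1:])]
    return 60.0 / (sum(periods) / len(periods))
```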
in figure 5, an explicative diagram of the feature extraction process is reported. figure 5. box scheme of the raw signal processing and extraction process of the parameters of interest: amplitude (amp), which represents airgo™'s signal variation and models sensormedics' tidal volume; respiratory rate (rr); airgo™'s minute ventilation (amp × rr), which models sensormedics' minute ventilation; duty cycle, which is the ratio between inspiratory time and total breath period.
as the breath period (t) is computed as the temporal distance between two consecutive minima, the respiratory rate is the inverse of the breath period expressed in breaths per minute, and the expiratory time is computed as the difference between the breath period and the inspiratory time. among sensormedics' data, outliers, fake breaths and missing data were detected and deleted. these errors may be due to incorrect positioning of the mouthpiece and the consequent air leakage; in these cases, the entire breath was not considered. in order to validate the airgo™ system, a breath-by-breath comparison with the sensormedics outputs was performed. since the amplitude (amp) represents the same feature as the tidal volume read by the metabolic cart and expressed in liters, the values obtained with the two devices had to be normalized in order to be compared. for each position of each subject, the mean of the first twenty values of quiet breathing after the initial big breath was computed and used as a reference. this leads to normalized values for airgo™'s amplitude and minute ventilation and the corresponding sensormedics' volume and minute ventilation in all conditions, allowing four relative parameters to be derived. the alignment of the two signals played a fundamental role. in order to facilitate the identification and extraction of the correct signal window referring to a given position to be compared in the two instruments, subjects were asked to perform a deep, big breath at the beginning of each test in each position.
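the normalization step described above (mean of the first twenty quiet-breath values after the big breath used as reference) and the resulting per-breath relative error can be sketched as follows; function names are illustrative, not taken from the authors' software:

```python
# Sketch of the comparison pipeline: each series is normalized by the mean
# of its first n_ref quiet-breath values, and a per-breath relative error
# (%) is then computed against the reference instrument.

def normalize(values, n_ref=20):
    """Normalize a series by the mean of its first n_ref values."""
    n = min(n_ref, len(values))
    ref = sum(values[:n]) / n
    return [v / ref for v in values]

def relative_error_pct(measured, reference):
    """Per-breath relative error (%) of measured vs. reference values."""
    return [100.0 * (m - r) / r for m, r in zip(measured, reference)]
```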
the comparison between the breath signals recorded by the airgo™ device and the sensormedics metabolic cart is based on the number of breaths in the same period and on the big-breath maneuver correspondence. after this phase, in order to obtain the correct number of breaths in the same time span, all airgo™ values representing heartbeat artefacts or fake breaths were filtered out according to the airgo™ signal processing algorithm explained in section 2.4. with the airgo™ software it was possible to reconstruct two segmented contiguous breaths by linking the minimum of the first breath to the maximum of the second. thanks to the synchronization with the big breaths, the same number of breaths as in the sensormedics file in the same period was determined. however, when performing the alignment, a progressive alignment shift between the two measurements after the big breath was noticed. this was due to a temporal delay in the airgo™ system's sampling process that led to a delay of about 3-4 s at the end of each position test. to overcome this problem and visualize the two sequences of values correctly, it was necessary to represent the outputs of airgo™ and sensormedics with respect to the number of breaths of each position test instead of the breath period. an example of synchronization of the amplitude outputs in a static position is shown in figure 6, while an example of synchronization of the respiratory rate outputs is shown in figure 7. in exercise test recordings, because of movement artefacts, noise coming from changes in posture, air losses through the mouthpiece caused by the increasing level of physical effort, and the greater number of breaths than in the test at rest, the alignment process was based only on the initial big-breath synchronization, so the final shift in alignment was not eliminated.
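the big-breath synchronization described above can be sketched by taking the deep initial breath as the largest amplitude in each series and aligning the two breath sequences from that index onward. this minimal sketch ignores the progressive sampling drift mentioned in the text:

```python
# Illustrative sketch of big-breath alignment: locate the big breath as the
# maximum amplitude in each series, then trim both series to start there and
# to contain the same number of breaths.

def align_on_big_breath(amps_a, amps_b):
    """Return the two amplitude series aligned breath-by-breath from their
    big breaths, truncated to equal length."""
    ia = amps_a.index(max(amps_a))
    ib = amps_b.index(max(amps_b))
    a, b = amps_a[ia:], amps_b[ib:]
    n = min(len(a), len(b))
    return a[:n], b[:n]
```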
an example is reported in figure 8, where another common problem is shown: for volumes higher than the tidal volume, airgo™'s normalized amplitudes tend to underestimate sensormedics' normalized volume. in order to improve the effectiveness of the statistical analysis and to allow comparisons between subjects with different levels of maximum workload, the entire exercise test was divided into four different regions according to the increasing physical stress intensity: low intensity ("l"), medium intensity ("m"), high/maximum intensity ("h" and "imax") and recovery phase ("rp"). to identify the maximum of exercise intensity for each subject, sensormedics' maximum normalized value of minute ventilation was considered as reference.
to equally divide the four regions, two points of reference were identified, one at one third and the second at two thirds of the subject's maximum normalized minute ventilation. within the first three regions, characterized by an increasing ventilation, four smaller sections constituted by at least twenty breaths were identified: the first referring to the low intensity region ("l"), the second to the medium intensity region ("m"), and the third and fourth to the high and maximum intensity region ("h" and "imax"). because each subject reached a different maximum load according to his/her physical capacity, the four sections were centered differently from subject to subject. the section "l" was constituted by the central breaths of the first region, the section "m" by the central breaths of the second region, the section "h" by the central breaths of the third region, and the section "imax" by the last twenty breaths of the exercise test track before the section "rp". in this last region, two other smaller sections were identified: the first one ("rp1") containing at least ten breaths centered around the closest normalized minute ventilation central value within the "h" section, and the second one ("rp2") containing at least ten breaths centered around the closest ventilation central value within the "m" section. an example of this division into sections in the case of the minute ventilation can be seen in figure 9.
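the region boundaries above (one third and two thirds of the maximum normalized minute ventilation) amount to a simple threshold rule; a minimal sketch, with region labels taken from the text:

```python
# Sketch of the intensity-region labelling described above: a breath is
# assigned to the low / medium / high-maximum region according to its
# normalized minute ventilation relative to the subject's maximum.

def intensity_region(ve_norm, ve_max):
    """Label a breath's exercise-intensity region ('L', 'M' or 'H/Imax')."""
    if ve_max <= 0:
        raise ValueError("ve_max must be positive")
    if ve_norm <= ve_max / 3.0:
        return "L"        # low intensity
    if ve_norm <= 2.0 * ve_max / 3.0:
        return "M"        # medium intensity
    return "H/Imax"       # high to maximum intensity
```

the per-subject centering of the twenty-breath sections within each region is omitted here, since it depends on each subject's maximum load.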
in order to verify the goodness-of-fit of the value distributions, the kolmogorov-smirnov test of normality was run both on the single distributions and on the differences between the two methods. since the test indicated that the data did not come from a normal distribution, a non-parametric statistical analysis was conducted. to have a general overview of airgo™'s estimate error in each position, both during quiet breathing and during the exercise test, for each subject, relative error medians, interquartile ranges (iqrs) and limits of agreement (loas) were calculated for each parameter.
then, to find significant differences between positions and between intensity levels (within the exercise test), the one-way kruskal-wallis test was conducted with a level of significance of p < 0.05. in the next paragraphs, box and whiskers plots of the differences between the two methods are reported for each position and for the two conditions (static postures and exercise test). an example of a bland-altman plot for a subject is reported to graphically show the agreement between the two instruments. the relative error median percentage, the interquartile range and the limits of agreement of the parameters during the test at rest and the exercise test are reported in tables 2 and 3, respectively. negative signs in the percentages stand for an underestimation of airgo™'s computed parameters with respect to those of the sensormedics 2900, and negative values indicate the amount of the underestimation. in the case of the respiratory rate, the overall relative error median percentage in all positions was 0%, with the highest interquartile range value of 15.0 in the supine position. the second best result was the overall relative error median percentage for the normalized amplitude parameter: in fact, in the supine position the percentage of 0.8% was the lowest relative error for the test at rest after the results of the respiratory rate. moreover, by looking at changes in the relative error median between positions for the normalized amplitude parameter, it was possible to notice a difference in values between the standing and horizontal positions (supine, right lateral decubitus, left lateral decubitus).
in particular, for the horizontal positions the overall percentage values were lower (0.8%, 4.4% and 1.1%) compared to the seated position (9.0%), with the standing position percentage value (6.8%) standing between the two. the normalized minute ventilation parameter, calculated as the product of tidal volume and respiratory rate, gave results similar to the normalized amplitude parameter and was affected by similar errors in the same positions. to obtain representative error values for each respiratory parameter independently of the position, considering the mean error in all positions, the relative errors were 4.4% for normalized amplitude, 0% for respiratory rate, 4.2% for normalized ventilation and −0.3 for duty cycle. table 3. exercise test and recovery, all intensities, all parameters. aggregated data are expressed as relative error median percentage, interquartile range (iqr) and limits of agreement (loas), computed as ±1.96 sd. iqr and loas are presented in brackets (iqr/loas). consistently with the resistance curve of figure 2, airgo™'s estimates tend to underestimate with increasing volumes and consequently with increasing ventilation during exercise, as was graphically shown in figure 8. in the normalized amplitude values reported in table 3, the magnitude of the relative error increased drastically from −11.1% at low intensity to −39.7% at maximum intensity. in the second recovery phase, the overall median relative error was −24.8%, a value that stands between the low-intensity and medium-intensity values. this result was even more evident when looking at the overall normalized ventilation relative error medians.
in this case, the respiratory rate was confirmed as the best estimate, with an overall relative error median percentage of 0% in all physical effort conditions, while the duty cycle parameter had the highest error of −0.27 at the medium intensity level of the exercise. the results of the one-way kruskal-wallis analysis are reported in table 4 for the test at rest in different positions and for the exercise test at different intensity levels. for the test at rest, the normalized amplitude, normalized ventilation and respiratory rate parameters were independent of position (p > 0.05). on the other hand, duty cycle was influenced by posture, with a statistically significant difference between the standing and supine positions (p < 0.05). for the exercise test, the respiratory rate was the only parameter that was independent of the different levels of exercise intensity (p > 0.05). in the cases of the normalized amplitude (p < 0.01), normalized ventilation (p = 0.001) and duty cycle (p < 0.01) parameters, there was a significant effect of the intensity level of exercise on the error. in particular, the post hoc pairwise comparison highlighted that the first two parameters presented a statistically significant difference between conditions l and h (p < 0.05 for normalized amplitude and p < 0.01 for normalized ventilation), and between conditions l and imax (p = 0.014 for normalized amplitude and p < 0.01 for normalized ventilation). the duty cycle presented, instead, a significant difference between the m and rp1 conditions (p < 0.01), and the m and rp2 conditions (p < 0.05). in figure 10, the four parameters in the different positions for all subjects are reported in boxplots, where the different positions of the test at rest are indicated with roman numerals. the same analysis has been performed on the parameters during the exercise test and the result is shown in figure 11.
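the non-parametric comparison above relies on the kruskal-wallis test; in practice one would call scipy.stats.kruskal, which also returns the p-value. as a stdlib-only sketch, the h statistic can be computed directly from pooled ranks (with average ranks for ties, and without the tie-correction factor scipy applies):

```python
# Sketch of the Kruskal-Wallis H statistic for k independent samples, as used
# above to compare error distributions across positions / intensity levels.

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction, no p-value)."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    # assign average ranks to tied values
    rank_of = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_of[k] = avg
        i = j
    # rank sums per group
    sums = [0.0] * len(groups)
    counts = [0] * len(groups)
    for (v, gi), r in zip(pooled, rank_of):
        sums[gi] += r
        counts[gi] += 1
    return (12.0 / (n * (n + 1))
            * sum(s * s / c for s, c in zip(sums, counts))
            - 3.0 * (n + 1))
```

identical groups give h ≈ 0, while well-separated groups give a large h, which is then compared against a chi-square distribution to obtain the p-value.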
the bland-altman plot allows us to appreciate how much the system being validated differs from the reference system. the results obtained for a healthy female subject (38 years old, 166 cm, 52 kg; the same subject shown in figures 6, 7 and 9) are reported in figure 12 for the exercise and recovery tests, for the normalized ventilation and respiratory rate parameters. the obtained results are not significantly different from those obtained with the previously illustrated representation methods. focusing on normalized ventilation (figure 12, left), the points tended to decrease with increasing ventilation, thus displaying the behavior of a proportional error. respiratory rate (figure 12, right) showed a constant variability of the error.

bland-altman plots of the aggregated results of respiratory rate at rest. each dot represents the value obtained from a single subject. the yellow line represents the upper limit of agreement (mean error +1.96× standard deviation); the red line represents the mean error; the purple line represents the lower limit of agreement (mean error −1.96× standard deviation).
the comparison of the airgo™ device with the metabolic cart showed that the respiratory rate was the most accurate parameter in all positions, for both the rest and exercise tests, under all conditions of physical effort: the medians were always positioned on the zero-error line, with low dispersion around this value. normalized amplitude and normalized ventilation showed similar trends and similar error changes, both across positions and during exercise. at rest, the overall error was small and remained almost constant across positions, with the supine position characterized by better results than the others, even though this difference was not statistically significant. on the other hand, airgo™'s underestimation of increasing breath volumes during exercise was confirmed, and it was even more evident in normalized minute ventilation. additionally, the recovery phase was characterized by a progressive reduction in absolute errors, which tended to return to the initial median values.

figure 16. bland-altman plots of the aggregated results of respiratory rate during exercise. each dot represents the value obtained from a single subject. the yellow line represents the upper limit of agreement (mean error +1.96× standard deviation); the red line represents the mean error; the purple line represents the lower limit of agreement (mean error −1.96× standard deviation).
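a bland-altman plot places each paired measurement at x = mean of the two methods and y = their difference, with the bias (mean difference) and the limits of agreement at bias ±1.96 sd drawn as horizontal lines. a minimal sketch of the computation (illustrative names, not the authors' code):

```python
def bland_altman(method_a, method_b):
    """Bland-Altman coordinates for two paired measurement series:
    x = pairwise mean, y = pairwise difference, plus the bias (mean
    difference) and the limits of agreement (bias +/- 1.96 SD)."""
    means = [(a + b) / 2.0 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = sum(diffs) / len(diffs)
    sd = (sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1)) ** 0.5
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, loa
```

a proportional error, as seen for normalized ventilation, shows up as a trend in the (means, diffs) points rather than a constant band around the bias line.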
in the results of the exercise tests, it was observed that in the second recovery phase the overall median relative error was −24.8%, a value that stands between the low-intensity and medium-intensity values. the reason for this error is still a subject of research. several possible explanations might have contributed, separately or in combination, to this result; however, it is not possible to quantify the contribution of each single explanation. the first concerns the pre-tensioned length of the girth elastic band set during the preparation phase of the test protocol: because the band is cut slightly shorter than the actual thoracic circumference (−7%) to ensure an effective fit, it is likely that, at high volumes, the band reached the saturation region of the stretch metallic knit resistance curve shown qualitatively in figure 2. in that region, the change in resistance with elongation loses the correct dynamics, exiting the linear range. the second explanation is related to the different relative contributions of the abdomen and the chest wall to the breathing pattern according to the posture and trunk position assumed during exercise.
in fact, it is known from the literature that a progressively increased inclination of the trunk determines a progressive reduction in the chest wall contribution and a progressive increase in the abdominal contribution to the tidal volume [24], while the airgo™ band only detects changes in circumference at the level of the abdominal rib cage, thus losing information related to the abdominal contribution. finally, as can be seen in the box-and-whisker plots, there was a constant underestimation of the duty cycle in all positions at rest and at all intensity levels during exercise. in particular, the sensormedics values were always within an interval between 0.4 and 0.6, while airgo™'s estimates were between 0.2 and 0.4. this means that, in the first case, inspiratory and expiratory times have a proportion of about 1:1, while in airgo™'s duty cycle estimation this proportion is about 1:2. this explains the underestimation, and it could be due to imperfections in the airgo™ processing algorithm. specifically, the end of a breath is determined computationally when the girth is either smaller than it was when the breath began, or smaller than the maximum girth in a given breath cycle by a certain amount, as specified in the related patent. the fact that the end of expiration is detected by means of a threshold might cause errors in the precise identification of the minimum. in fact, the duty cycle is computed as described in table 1, and an underestimation of inspiratory time with respect to expiratory (and total) time causes an underestimation of the duty cycle as well. the overall error remains constant between the test at rest and the exercise test, suggesting that the duty cycle could be very accurate in both conditions once this issue is solved.
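the effect described above can be made concrete: with the duty cycle defined as inspiratory time over total breath time (ti/ttot), any detection offset that shifts time from the inspiratory to the expiratory phase lowers the estimate. a toy sketch (the fixed-offset model is an illustrative assumption, not the patented algorithm):

```python
def duty_cycle(t_insp, t_exp):
    """Duty cycle: inspiratory time over total breath time (Ti/Ttot)."""
    return t_insp / (t_insp + t_exp)

def duty_cycle_with_offset(t_insp, t_exp, dt):
    """Model a threshold-based detector that ends inspiration dt
    seconds early: that time is counted as expiration instead, so the
    total breath time is unchanged but Ti is underestimated."""
    return duty_cycle(t_insp - dt, t_exp + dt)

# a 1:1 Ti:Te ratio gives 0.5 (the SensorMedics range), while ending
# inspiration early pushes the estimate toward the 1:2 ratio (~0.33)
reference = duty_cycle(2.0, 2.0)
biased = duty_cycle_with_offset(2.0, 2.0, 0.5)
```

because the offset enters both the numerator and the total symmetrically, the error is roughly constant across breathing rates, matching the constant underestimation seen at rest and during exercise.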
the overall performance of airgo™ in comparison with the sensormedics metabolic cart was satisfactory, and it must be noted that research on this device, both from the point of view of the hardware and of the signal-processing software, is still ongoing. respiratory rate is a vital sign used to monitor the progression of several illnesses, to predict adverse clinical events and to discriminate between at-risk and stable patients. given that airgo™'s respiratory rate estimates corresponded well with the analyses performed with an instrument that can be considered the gold standard, it can be concluded that the airgo™ device could be useful for monitoring respiratory rate in a non-invasive and non-intrusive way during everyday activities and sleep. applications of this device can be found in chronic respiratory diseases as well as in acute pathologies, such as the novel covid-19, where the respiratory rate is predictive of worsening disease. further research is needed on the estimation of tidal volume, minute ventilation and duty cycle; however, the results obtained are encouraging, and research combining activity recognition with the estimation of respiratory parameters is needed to assess the validity of the system in a non-controlled environment. the airgo™ device is partially described in us patent number us201462007142p, "breath volume monitoring system and method", by david kuller for myair llc.
references:
1. respiratory rate predicts cardiopulmonary arrest for internal medicine inpatients.
2. respiratory rate: the neglected vital sign.
3. monitoring breathing rate at home allows early identification of copd exacerbations.
4. exacerbations in chronic obstructive pulmonary disease: identification and prediction using a digital health system.
5. measurement of heart rate and respiratory rate using a textile-based wearable device in heart failure patients.
6. lower mortality of covid-19 by early recognition and intervention: experience from jiangsu province.
7. monitoring of physiological parameters to predict exacerbations of chronic obstructive pulmonary disease (copd): a systematic review.
8. remote respiratory monitoring in the time of covid-19.
9. validation of the hexoskin wearable vest during lying, sitting, standing, and walking activities.
10. qualitative and quantitative evaluation of a new wearable device for ecg and respiratory holter monitoring.
11. respiration rate and volume measurements using wearable strain sensors.
12. smart vest for respiratory rate monitoring of copd patients based on non-contact capacitive sensing.
13. assessment of breathing parameters using an inertial measurement unit (imu)-based system.
14. estimation of respiration rate from three-dimensional acceleration data based on body sensor network.
15. smart textile for respiratory monitoring and thoraco-abdominal motion pattern evaluation.
16. reliability of a wearable wireless patch for continuous remote monitoring of vital signs in patients recovering from major surgery: a clinical validation study from the tracing trial.
17. towards continuous respiration monitoring.
18. research on non-contact monitoring system for human physiological signal and body movement.
19. wearable technology: role in respiratory health and disease.
20. telemonitoring systems for respiratory patients: technological aspects.
21. respiratory frequency during exercise: the neglected physiological measure.
22. validation study of the airgo™ device for continuous monitoring of respiratory function.
23. a multimodal wearable system for continuous and real-time breathing pattern monitoring during daily activity.
24. effects of gender and posture on thoraco-abdominal kinematics during quiet breathing in healthy adults.

this article is an open access article distributed under the terms and conditions of the creative commons attribution (cc by) license.

key: cord-315077-i1xjcuae authors: branda, john a.; lewandrowski, kent title: utilization management in microbiology date: 2014-01-01 journal: clin chim acta doi: 10.1016/j.cca.2013.09.031 sha: doc_id: 315077 cord_uid: i1xjcuae

the available literature concerning utilization management in the clinical microbiology laboratory is relatively limited compared with that for high-volume, automated testing in the central core laboratory. however, the same strategies employed elsewhere in the clinical laboratory operation can be applied to utilization management challenges in microbiology, including decision support systems, application of evidence-based medicine, screening algorithms and gatekeeper functions. the results of testing in the microbiology laboratory have significant effects on the cost of clinical care, especially costs related to antimicrobial agents and infection control practices. consequently, many of the successful utilization management interventions described in clinical microbiology have targeted not just the volume of tests performed in the laboratory, but also the downstream costs of care. this article will review utilization management strategies in clinical microbiology, including specific examples from our institution and other healthcare organizations.

clinical microbiology includes bacteriology and antimicrobial susceptibility testing, virology, parasitology, mycobacteriology, mycology, serology and molecular microbiology.
unlike the core laboratory (chemistry/hematology), where the majority of testing is performed on highly automated analyzers, most testing in microbiology is performed manually or on semi-automated platforms. many microbiology tests also require interpretation by a skilled microbiology technologist, including visual interpretation of culture results and microscopic examinations. for these reasons, the unit cost of microbiology testing is usually greater than that of routine automated testing. the results of microbiology tests have a significant impact on the overall cost of clinical care, most notably in the use and selection of antimicrobial therapy. therefore, when approaching utilization management in microbiology, it is important to consider not only the cost of testing within the microbiology laboratory but also the downstream costs resulting from clinical decisions based on the test results. the published literature on utilization management in microbiology is relatively limited when compared to reports on managing utilization of routine automated testing in the chemistry and hematology laboratories. this article will outline a number of utilization management interventions in microbiology that have been reported in the literature. we will also describe several unpublished initiatives that have proven successful in our institution. the specific interventions to be discussed are outlined in table 1 and described in more detail in the text that follows. in a number of cases, an initiative's success arose not only from a reduction in laboratory testing per se, but also from its impact in the clinical care arena (for example, a reduction in antibiotic use or hospital length-of-stay). this observation highlights the importance of the clinical microbiology director in forming collaborative, interdepartmental teams to improve quality and reduce the cost of medical care.
tests for cytomegalovirus (cmv) include antigenemia testing, viral load testing by quantitative polymerase chain reaction (qpcr), viral genotyping, shell vial culture, and serologic tests for the detection of a host immunologic response (cmv igm and igg antibody tests and antibody avidity tests). for an individual patient, the most appropriate test depends on the clinical indication. it is difficult for clinicians to keep up-to-date with esoteric tests in rapidly evolving specialties, especially when there are numerous tests that can be ordered. in these situations, the use of a decision support tool is an effective mechanism to assist physicians in proper test selection, potentially avoiding inappropriate orders. as one example, fig. 1 shows a screen display from the on-line laboratory handbook at the massachusetts general hospital. when the clinician types "cytomegalovirus" or "cmv" into the handbook's search function, the available tests and their appropriate indications are presented. in addition, the same decision-support information is provided in the electronic physician order entry (poe) system when a clinician views cmv-related test options. an advantageous feature of this approach is that when new tests become available, or outdated ones are removed from the test menu, the decision-support function can be updated accordingly. for example, the mgh microbiology laboratory recently discontinued the cmv antigenemia assay in favor of the cmv qpcr test; the information provided in the on-line handbook makes it clear that the preferred test has changed. this approach can be applied to many other tests in microbiology, particularly in areas such as molecular microbiology, where new assays are supplanting more traditional assays at a rapid rate. the topic of decision support is covered in more detail in another chapter of this special edition. the problem of contamination of blood cultures resulting from improper or poor technique is well known.
it has been estimated that up to 5% of positive blood cultures may represent contaminants [1], resulting in significant increases in resource utilization. consequently, many hospitals have engaged in ongoing efforts to reduce blood culture contamination by improving staff training or by designating specific types of employees to collect blood culture specimens. for example, blood cultures collected by medical house officers are more likely to be contaminated than those collected by phlebotomists [2]. in a retrospective study, bates et al. studied the impact of contaminated blood cultures on hospital length-of-stay and hospital charges [3]. in patients with false-positive blood cultures, there was a 4.5-day increase in the median length-of-stay and a 33.4% increase in hospital charges. false-positive episodes were associated with increased pharmacy charges for intravenous antibiotics (a 39% increase) and laboratory charges (a 20% increase). in another study, segal and chamberlain assessed the impact of false-positive blood cultures in a pediatric emergency department [4]. the authors reported an increase in phone calls, return visits to the emergency department, unnecessary laboratory tests, inappropriate antibiotic administration and hospital admissions. finally, tabriz et al. evaluated the practice of repeating blood cultures serially [5]. blood cultures were repeated in 31.6% of cases, amounting to approximately one-third of all blood cultures handled in the laboratory. the repeated cultures showed no growth in 83.4% of cases, the same pathogen in 9.1% of cases, and a new pathogen or contaminant in 2.5% and 5.0% of cases, respectively. the authors concluded that repeating blood cultures provides little additional yield and that guidelines for when to repeat blood cultures might decrease utilization.
laboratory reports that are not optimally designed can lead to confusion among clinicians, with the potential for misdiagnosis or unnecessary requests for additional testing. ackerman et al. evaluated the interpretation of 5 typical microbiology reports by physicians in a teaching hospital [6]. the investigators found that reports were often misinterpreted. for example, one report of "isolation of a gram-negative rod from sputum" was misinterpreted by 4 out of 5 physicians. the reasons for misinterpretation were reported to be the use of jargon, unfamiliar names of bacterial species, ill-defined reporting conventions, and the omission of a clear-cut conclusion in many reports. the misunderstandings resulted in both inappropriate use of antibiotics and orders for unnecessary testing in the laboratory. this study highlights the importance of developing clear, concise, standardized reporting formats in microbiology, and the need for the laboratory to work closely with physicians in designing and communicating microbiology reports.

2.4. rapid identification of bacteria using matrix-assisted laser desorption-ionization time-of-flight mass spectrometry (maldi-tof ms) to improve clinical decision making and guide antibiotic use

it has long been known that rapid bacterial identification and susceptibility testing lead to more appropriate use of antibiotics and a reduction in antimicrobial utilization [7]. in the past, rapid identification and susceptibility testing were mainly accomplished using automated instrumentation to perform conventional tests. more recently, manufacturers have developed maldi-tof ms for rapid organism identification, a method that has been demonstrated to reduce the turnaround time for the identification of bacteria and yeasts by 1.45 days compared with conventional methods [8]. another advantage of maldi-tof ms is its simplicity and the relatively low cost of its consumables [9].
table 1. examples of utilization management initiatives in clinical microbiology (see text for details).
1. decision support: test selection for cytomegalovirus testing
2. reducing blood culture contamination
3. proper formatting of microbiology reports to avoid misinterpretation
4. use of maldi-tof mass spectrometry for rapid identification of pathogens
5. antimicrobial stewardship of carbapenems and other expensive antimicrobial agents
6. rapid point-of-care testing for influenza a and group a streptococcus: impact on test ordering and antibiotic utilization
7. rapid molecular diagnostic testing for patients previously colonized with methicillin-resistant staphylococcus aureus (mrsa)
8. use of screening methods to reduce low-yield urine cultures
9. restricting stool examinations in hospital-acquired diarrhea
10. rapid testing for respiratory viruses: impact on inpatient bed management
11. application of evidence-based medicine: discontinuation of fungal blood cultures
12. selection and oversight of molecular diagnostics in microbiology

for example, one study performed at a large academic medical center demonstrated potential cost savings to the laboratory (for reagents and labor) of $100,000 per year [8]. although maldi-tof ms does not provide antimicrobial susceptibility data, rapid organism identification may help clinicians select an effective empirical antimicrobial strategy earlier [10]. carbapenems, including imipenem, meropenem, ertapenem and doripenem, are broad-spectrum antimicrobial agents active against many gram-positive, gram-negative and anaerobic organisms. they are highly effective when used appropriately but are also very expensive relative to potential alternative agents. indiscriminate use of carbapenems also contributes to antimicrobial resistance, decreasing the effectiveness of the drug class. many hospitals have established antimicrobial formularies in the pharmacy to assist in the management of expensive antibiotic drugs.
in our institution, some antibiotics are restricted and can only be obtained after approval from the division of infectious diseases. however, once a drug was approved, there was no requirement or formal mechanism in place to re-evaluate the ongoing need for the restricted antibiotic once the results of microbiology culture and antimicrobial susceptibility testing became available. the clinical pathology staff in our microbiology laboratory recently began an antimicrobial stewardship program related to carbapenems. each day, a microbiology fellow and a laboratory director review the clinical history, culture results and susceptibility test results for all patients newly started on a carbapenem, to determine appropriate versus inappropriate use of the drugs. if objective data indicate that a patient's infection can be treated using a non-restricted agent, an email is sent to the clinician, as shown in the example below.

"dear dr. ________: your patient ________ is currently receiving a restricted antibiotic: ertapenem. the use of this restricted drug is being monitored by the mgh antimicrobial stewardship program. recent culture and antimicrobial susceptibility data from your patient reveal that the organism(s) is/are susceptible to other, non-restricted antibiotics (see sensitivity report below). given these data, if clinically appropriate, please consider discontinuing the restricted carbapenem and/or changing to a non-restricted antimicrobial option. this may help reduce both the development of future resistance to these broad-spectrum drugs and the costs of therapy. if you have not already done so, you may request an infectious disease consult in order to obtain assistance with the choice of antimicrobial agents."

after implementation of this stewardship effort, a decrease in carbapenem use was observed, and carbapenems have been removed from the "top 10" list of money spent on antimicrobials. the microbiology group is now extending the program to include other high-cost antimicrobials.
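the core of the daily review — checking whether a patient on a restricted carbapenem could be switched to a non-restricted agent to which the isolate tested susceptible — can be sketched as a filter over the susceptibility report. the drug names and the report format below are illustrative assumptions, not our laboratory's actual data structures:

```python
# restricted drug class named in the text; other names are examples
RESTRICTED = {"imipenem", "meropenem", "ertapenem", "doripenem"}

def deescalation_candidates(current_drug, susceptibilities):
    """Return the non-restricted agents to which the isolate tested
    susceptible ("S"), i.e. grounds for suggesting discontinuation of
    a restricted carbapenem. Returns [] if the current drug is not
    restricted or no alternative is susceptible."""
    if current_drug.lower() not in RESTRICTED:
        return []
    return [drug for drug, result in susceptibilities.items()
            if result == "S" and drug.lower() not in RESTRICTED]
```

a non-empty result would trigger the stewardship email; the clinical decision, of course, stays with the treating physician.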
point-of-care testing (poct) performed at the patient's bedside provides rapid, real-time results. in some cases this facilitates clinical decision making and improves the efficiency of clinical operations [11]. a number of rapid point-of-care tests are available for the diagnosis of infectious diseases [12], among them single-use, visually read, lateral-flow tests for influenza a and b. in a randomized, prospective, controlled study by bonner et al., the use of a rapid poc influenza test in a pediatric emergency department was associated with a significant reduction in laboratory tests ordered (complete blood count, blood cultures, urinalysis and urine cultures), a decrease in chest radiographs performed and a reduction in emergency department length-of-stay [13]. in another study of pediatric patients presenting to the emergency department with acute pharyngitis, the authors compared antibiotic use in patients who received a rapid group a streptococcus test with that in patients managed by conventional throat culture alone. they reported an approximately 50% decrease in antibiotic use when the rapid test was employed (22.45% versus 41.38%) and concluded that the rapid test significantly reduced unnecessary antibiotic prescribing [14]. patients who are colonized with mrsa require contact precautions when admitted to the hospital. this entails placing them in a private room or cohorting them in a semi-private room with another colonized patient. in addition, hospital staff must wear gloves and gowns, and use dedicated equipment when interacting with the patient. some of these patients also have a longer hospital length-of-stay due to delays in discharge to other healthcare facilities. collectively, these features of mrsa colonization result in greater use of hospital resources compared with non-mrsa-colonized patients.
because mrsa colonization can be transient, protocols have been put in place to identify previously colonized patients who are no longer colonized and can be managed without contact precautions. in our institution, patients with a history of mrsa are eligible for discontinuation of contact precautions if they screen negative; for either screening method, the patient must not have received antibiotics active against mrsa during the 48 h prior to screening. discontinuation of contact precautions on the basis of a single negative mrsa pcr is faster than screening using the culture-based method, and could result in more frequent discontinuation of contact precautions because of the reduction in the number of samples needed [15]. uncomplicated urinary tract infection (uti) is very common, especially among women. various strategies have been employed to make (or rule out) a diagnosis of community-acquired uti in adult women without the need for urine culture. some sources suggest that uncomplicated uti in outpatients can be diagnosed and managed without culture, unless the patient fails treatment or has had recurrent utis [16,17]. others have even suggested that suspected uti can be managed over the telephone in women with typical symptoms of cystitis and without vaginal symptoms or major co-morbidities [18]; this approach eliminates both the office visit and any subsequent laboratory testing. another approach is to limit urine culture utilization by pre-screening urine using various methods. for example, dipstick urinalysis for leukocyte esterase and nitrites has a high negative predictive value, and may be used to exclude bacteriuria without a culture step when the results are negative. studies have also demonstrated that urine can be screened for significant bacteriuria prior to culture using automated urine sediment examination [19] or flow cytometry [20]. diarrhea is a common complaint among hospitalized patients.
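the dipstick pre-screening strategy for urine cultures described above amounts to a simple gatekeeper: when both leukocyte esterase and nitrites are negative, the high negative predictive value justifies withholding the culture. a sketch of the rule (an illustration of the screening strategy, not a clinical protocol):

```python
def needs_urine_culture(leukocyte_esterase_pos, nitrites_pos):
    """Gatekeeper: reflex to culture only if the dipstick is positive
    for leukocyte esterase or nitrites; a double-negative dipstick is
    taken to exclude significant bacteriuria without a culture step."""
    return bool(leukocyte_esterase_pos or nitrites_pos)
```

the same gatekeeper shape applies to the automated sediment examination and flow cytometry screens: a negative screen suppresses the downstream culture order.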
although it is common for clinicians to request a routine stool culture and an ova and parasites (o&p) examination for patients with diarrhea, these tests are designed to detect agents of community-acquired rather than hospital-acquired infection. a number of studies have indicated that routine stool culture or stool o&p examination is usually not warranted in adult patients who develop diarrhea more than three days after admission to the hospital [21–23]. in contrast, testing for clostridium difficile should be considered, as this is a major cause of nosocomial diarrheal illness. new molecular diagnostic tests offer the promise of rapid and reliable testing for c. difficile; highly sensitive molecular testing could permit rapid rule-out of c. difficile, obviating the need for unnecessary antibiotics and contact precautions in many patients. upper respiratory viral infections in the united states are frequently caused by rhinoviruses, coronaviruses, influenza a and b, parainfluenza, respiratory syncytial virus, adenovirus or metapneumovirus. while many of these illnesses can be managed on an outpatient basis, patients with severe illness or major comorbidities are often admitted to the hospital. usually they present first to the emergency department, where the initial challenge is to confirm the presence of a respiratory viral infection and then to identify the specific offending virus, in order to direct specific therapy (if available) and aid in hospital bed assignment. respiratory viral infections often occur in seasonal epidemics (especially influenza a and b), resulting in hospital overcrowding and a shortage of hospital beds. when this occurs, managing the availability of hospital beds becomes a priority. in cases where the offending virus has been identified, contact and/or droplet precautions must be instituted to prevent transmission between patients, as shown in table 2.
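the restriction on stool examinations in hospital-acquired diarrhea described above is often summarized as a "three-day rule": routine culture and o&p target community-acquired pathogens, so after the third hospital day only c. difficile testing is usually warranted. a sketch of the gatekeeper (the test names returned are illustrative):

```python
def stool_tests_indicated(days_since_admission):
    """Gatekeeper for stool testing in new-onset inpatient diarrhea:
    routine culture and O&P only within the first three hospital days;
    C. difficile testing remains appropriate at any point."""
    if days_since_admission <= 3:
        return ["routine stool culture", "ova and parasites exam",
                "C. difficile test"]
    return ["C. difficile test"]
```

in an order-entry system this rule would typically surface as an alert or a blocked order rather than a hard refusal, leaving room for clinical override.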
patients on contact or droplet precautions must be placed in a private room. alternatively, if two patients are infected with the same respiratory virus, they can be cohorted together in one hospital room. in our hospital, we offer rapid molecular diagnostic testing for influenza a and b for patients who require hospital admission for a flu-like illness. testing is performed 24 h per day, 7 days per week, in order to facilitate bed management and the timely initiation of anti-viral therapy. we also offer a respiratory viral panel by direct immunofluorescence for the many viruses listed above, 7 days per week during peak respiratory virus season. the ability to identify the specific virus causing the infection greatly assists in managing our inpatient beds and maintaining effective infection control measures. the savings to the hospital from improved bed management during epidemics greatly exceed the cost of the testing in the laboratory. many clinical laboratories offer "fungal blood cultures" that employ specialized media designed to enhance the detection of yeast or mold fungemia. however, it is well established that specialized fungal blood culture media are not superior to routine blood culture media for the detection of candida fungemia (candidemia) [24,25]. thus, the main rationale for the use of fungal blood cultures is to improve the detection of cryptococcal yeast, endemic fungi (histoplasma capsulatum, coccidioides spp., etc.) and filamentous fungi (e.g., aspergillus spp.) in the blood. until recently, our institution offered fungal blood cultures using myco/f lytic bottles (becton dickinson) designed for use with the bactec automated culture monitoring system. this approach was costly, not only in terms of reagent costs but also in terms of technical labor.
the highly enriched and non-selective culture medium needed to be incubated for up to 30 days before a negative result could be obtained, and this frequently promoted the growth and eventual detection of skin contaminants that would not have been detected using routine blood culture bottles and a 5- to 7-day incubation period. thus, we reviewed our experience with this approach to understand whether or not it provided a significant clinical benefit. we reviewed the results of all fungal blood cultures over a 44-month period, during which 5544 myco/f lytic fungal blood cultures were performed. our review revealed the following:
• no dimorphic fungi were recovered by fungal blood culture.
• mold (fusarium sp.) was recovered twice from a single patient using fungal blood culture. however, in this case fusarium was also recovered from 2 sets of routine blood cultures (2 out of 4 bottles) several days prior to recovery from fungal blood culture.
• cryptococcus neoformans was recovered from 3 patients by fungal blood culture. two of the patients had cryptococcal meningitis, and in both cases the organism was detected in numerous other ways (csf and blood cryptococcal antigen tests, csf gram stain, routine and fungal csf cultures, and routine blood cultures). the third patient had cryptococcal fungemia (but not meningitis); a cryptococcal antigen test on blood was positive, and the organism was recovered twice from routine blood cultures.
based on these findings, we concluded that fungal blood cultures had failed to detect any dimorphic fungi, molds or cryptococcal yeast that were not otherwise detected by routine blood culture. furthermore, blood culture is not a useful method for the detection of invasive infection caused by molds or dimorphic fungi. therefore, blood culture specifically designed for the detection of fungi was discontinued at our institution.

many microbiology tests suffer from drawbacks that limit their clinical utility.
for example, results from traditional culture methods may take several days to become available, and antibiotic treatment prior to specimen collection may degrade sensitivity. serologic studies often cannot distinguish current from past infection unless acute and convalescent specimens are available. techniques for specialized cultures, such as viral cultures, are beyond the capability of most hospital laboratories. finally, many microbiology tests do not yield quantitative results, which may be desirable in some situations. recent advances in molecular diagnostic techniques for microbiology offer promise to resolve some of these issues. some molecular tests, such as those for influenza a and b, hepatitis b virus dna and hiv viral load, are requested in sufficient volume that many hospital laboratories are able to offer them in-house. in these examples, the molecular diagnostic test is either superior to alternative testing methods or provides unique information of clinical importance (e.g. viral load). however, in many cases the volume of requests for molecular microbiology tests is too low to warrant performing the test in the hospital laboratory. typically these tests are sent to outside reference laboratories for analysis, which, over time, can prove very expensive. intuitively, a molecular diagnostic test is expected to be highly sensitive and specific, but this is not always the case. in some situations, conventional microbiology tests are more appropriate [26]. for this reason, it is important for the clinical microbiology director to work with infectious disease specialists to scrutinize the send-out budget for potentially inappropriate test ordering. further, such analysis may reveal opportunities for in-sourcing the testing at significant savings. for example, our microbiology laboratory recently insourced nucleic-acid testing for epstein-barr virus (ebv), saving over $100,000 per year and providing superior turnaround time.
tests performed in the clinical microbiology laboratory are ripe for utilization management efforts. clinicians are frequently confused about which test to order, and appropriate test selection can be guided through decision support mechanisms and gatekeeper functions. the diagnostic yield of routine cultures can be improved by facilitating proper specimen collection and transport, or by screening specimens prior to culture. clinical decision making, particularly in selecting appropriate antibiotics and implementing (or discontinuing) infection control measures, can be aided by efforts to reduce turnaround time and by antimicrobial stewardship efforts directed by microbiologists. finally, errors can be prevented by carefully formatting reports to avoid miscommunication.

references:
1. analysis of strategies to improve cost effectiveness of blood cultures
2. pathology tests: is the time for demand management ripe at last?
3. contaminant blood cultures and resource utilization: the true consequences of false positive results
4. resource utilization and contaminated blood cultures in children at risk for occult bacteremia
5. repeating blood cultures during hospital stay: practice pattern at a teaching hospital and a proposal for guidelines
6. consumer survey on microbiology reports
7. rapid identification and antimicrobial susceptibility testing reduce antibiotic use and accelerate pathogen-directed antibiotic use
8. prospective evaluation of a matrix-assisted laser desorption ionization-time of flight mass spectrometry system in a hospital clinical microbiology laboratory for identification of bacteria and yeasts: a bench-by-bench study for assessing the impact on time to identification and cost-effectiveness
9. maldi-tof mass spectroscopy: transformative proteomics for clinical microbiology
10. impact of matrix-assisted laser desorption ionization time-of-flight mass spectrometry on the clinical management of patients with gram-negative bacteremia: a prospective observational study
11. implementing point-of-care testing to improve outcomes
12. infectious disease testing at the point-of-care
13. impact of the rapid diagnosis of influenza on physician decision making and patient management in the pediatric emergency: results of a randomized, prospective controlled study
14. impact of rapid streptococcal test on antibiotic use in a pediatric emergency department
15. discontinuation of contact precautions for methicillin-resistant staphylococcus aureus: a randomized controlled trial comparing passive and active screening with culture and polymerase chain reaction
16. management of urinary tract infections in adults
17. laboratory diagnosis of urinary tract infections in adult patients
18. urine dipstick for diagnosing urinary tract infection
19. bacteriuria screening by automated whole-field-image-based microscopy reduces the number of necessary urine cultures
20. evaluation of the sysmex uf-100 urine cell analyzer as a screening test to reduce the need for cultures for community-acquired urinary tract infection
21. role of the microbiology laboratory in the diagnosis of nosocomial diarrhea
22. rational testing for faeces in the investigation of sporadic hospital-acquired diarrhoea
23. when should a stool culture be done in adults with nosocomial infections?
24. optimal use of myco/f lytic and standard bactec blood culture bottles for detection of yeast and mycobacteria
25. principles and procedures for blood cultures; approved guideline. clsi document m47-a. wayne, pa: clinical and laboratory standards institute
26. introducing a molecular test into the clinical laboratory: development, evaluation, and validation

key: cord-332481-y0rd70ry authors: ljubic, t.; banovac, a.; buljan, i.; jerkovic, i.; basic, z.; kruzic, i.; kolic, a.; kolombatovic, r. r.; marusic, a.; andjelinovic, s.
title: the effect of serological screening for sars-cov-2 antibodies to participants' attitudes and risk behaviour: a study on a tested population sample of industry workers in split-dalmatia county, croatia date: 2020-06-17 journal: nan doi: 10.1101/2020.06.15.20131482 sha: doc_id: 332481 cord_uid: y0rd70ry rapid serological tests for sars-cov-2 antibodies have been questioned by scientists and the public because of the unexplored effects of negative test results on behaviour and attitudes, which could lower adherence to protective measures. therefore, our study aimed to investigate changes in personal attitudes and behaviour before and after negative serological test results for sars-cov-2 antibodies. we surveyed 200 industry workers (69% males and 31% females) who had previously tested negative. the survey examined participants' self-reported general attitudes towards covid-19, sense of fear, and behaviour related to protective measures before and after the testing. the participants perceived the disease as a severe health threat and acknowledged the protective measures as appropriate. they reported a high level of adherence to measures and a low level of fear both before and after the testing. although those indicators were statistically significantly reduced after the test (p < 0.004), this did not result in risk behaviour. therefore, serological tests are not an additional threat regarding risk behaviour in an environment where protective measures are efficient. in contrast, they might contribute to reducing fear in society and the working environment. since november 2019, the coronavirus (sars-cov-2) has been spreading around the globe, leading to a pandemic with more than 4.5 million recorded cases and more than 300 000 deaths worldwide as of 15 may 2020 (worldometer, 2020).
countries worldwide are testing their populations to estimate the number of people with active virus infection and the number of those who have recovered from it. it is highly recommended to prioritise testing hospitalised patients, healthcare facility workers, workers in congregate living settings, first responders, residents in long-term care facilities and, generally, persons with symptoms of (potential) covid-19 infection using rt-pcr tests (centers for disease control and prevention, 2020). to estimate the number of people previously exposed to and/or infected with the virus, serological immunoassay tests are currently the best option, especially due to their lower cost and the shorter amount of time needed to obtain results (johns hopkins center for health security, 2020). the first case of covid-19 in the republic of croatia was reported in late february. to prevent the spread and exponential growth of the disease, restrictive measures were introduced by the croatian government on 19 march 2020 (koronavirus.hr, 2020a, 2020b). from 23 march 2020, leaving the place of residence was also prohibited (koronavirus.hr, 2020c). with such restrictive measures, croatia earned first place on the stringency scale provided by the oxford covid-19 government response tracker on 26 march (hale & webster, 2020). among the mandatory protective measures, citizens were continuously provided with recommendations for personal protection, including social distancing (1 meter in open areas, 2 meters in closed areas), wearing face masks, maintaining personal hygiene, etc. along with the national measures (koronavirus.hr, 2020a, 2020b, 2020c), many companies introduced additional measures to further protect the health of employees and to maintain manufacturing.
such is the case for div group, specialised in shipbuilding and in the production and trade of screws and mechanical parts, which introduced serological testing for employees using a rapid serological immunoassay as a health protection element within its corporate security system (div group, 2020; jerković et al., 2020). although the findings of serological testing for covid-19 can be an essential part of investigating the disease, the tests vary in sensitivity and specificity and also produce false negative and false positive results (long et al., 2020; west et al., 2020). these issues endanger not only the health of tested individuals and communities but can also reduce the positive effects of national health policies and the protective or restrictive measures necessary for containment of the disease (lippi et al., 2020). (preprint notice: this version was posted on medrxiv on june 17, 2020, under a cc-by-nd 4.0 international license, and was not certified by peer review; https://doi.org/10.1101/2020.06.15.20131482.) this can be especially devastating, as some health experts worry that testing populations and providing them with knowledge of their health and/or immunity status regarding covid-19 could lead to as yet unexplored psychological and behavioural effects (green et al., 2020). the aforementioned psychological and behavioural effects have already been investigated concerning negative test results received in various screenings. the concerns regarding these effects showed that people would, after receiving negative test results in a screening program, perceive that they have a lower risk of developing the disease they were tested for and thus be less likely to take precautions against getting sick in the future (larsen et al., 2007; marteau et al., 1996).
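the concern about false positive and false negative results can be made concrete with bayes' rule: at the roughly 1% seroprevalence later reported for this workforce, even a fairly specific test produces many false positives. the sketch below assumes illustrative performance figures (90% sensitivity, 98% specificity), not the actual characteristics of the rapid immunoassay used in the screening.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Bayes' rule for a diagnostic test: probability of true infection
    given a positive result (PPV) and of no infection given a negative
    result (NPV)."""
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_positive
    npv = specificity * (1 - prevalence) / (1 - p_positive)
    return ppv, npv

# illustrative assumptions: 1% prevalence (as in the screened sample),
# 90% sensitivity, 98% specificity
ppv, npv = predictive_values(prevalence=0.01, sensitivity=0.90, specificity=0.98)
# at low prevalence the NPV is excellent, but most positives are false:
# here ppv is only about 0.31 while npv exceeds 0.998
```

this is why a negative rapid-test result in a low-prevalence setting is far more reliable than a positive one, and why the text stresses that positive effects of policy can be undermined if test limitations are ignored.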
a systematic review included eight studies on screening programs for diseases linked to lifestyle behaviours (type 2 diabetes; breast, bowel, lung and cervical cancer; and abdominal aortic aneurysm) to determine post-screening changes in behaviours, attitudes, and emotions. the review showed that negative screening results are unlikely to cause changes in the observed characteristics or to have a negative impact on behaviour (cooper et al., 2017). nevertheless, since covid-19 is a novel disease whose spread is most effectively prevented by maintaining social distancing, community consciousness, and personal protection and hygiene practices (lakshmi priyadarsini & suresh, 2020), behaviours that depend most on the conscientiousness and self-control of all individuals, it is of utmost importance to examine the behaviours and attitudes of people who received negative test results. furthermore, these factors are vital in a specific working environment where interpersonal contact cannot be avoided entirely due to production characteristics. in these settings, changes in the behaviour and attitudes of workers could affect the general psychological environment and, most importantly, the health of company workers and their families. thus, this study aims at investigating the changes in personal attitudes and behaviour of div group industry workers before and after receiving negative serological test results for sars-cov-2 antibodies. from may 10 to may 15, 2020, we conducted a survey of div group industry workers in split-dalmatia county, croatia, who had previously been tested for sars-cov-2 antibodies by rapid immunoassays. the serological screening had been conducted from april 23 to april 28, 2020, in collaboration with the clinical department for pathology, forensic medicine and cytology, university hospital centre split, and the university department of forensic sciences, university of split (split, croatia).
the screening comprised 1316 participants; it was the first mass testing in the republic of croatia and, to the authors' knowledge, one of the first and largest corporate-level studies in the world at that time (jerković et al., 2020). only 0.99% of the 1316 participants (95% ci 0.53-1.68) were positive for sars-cov-2 antibodies (jerković et al., 2020). the div group facility in split employs about 2200 people, making it the second-largest employer in the county. the split facility's employee structure includes those working in production as well as management and administration (div group, 2020; jerković et al., 2020). to examine whether the test results affected participants' attitudes and behaviour, we constructed a short questionnaire and surveyed the employees with the permission of the company's management. the company's occupational safety officers distributed the questionnaire to the different company departments, including management, administrative, and production staff. they visited the departments separately and offered the employees that had participated in the screening voluntary participation in the study. as only a small proportion (0.99%) of employees tested positive for antibodies, we included only those with negative test results. the questionnaire had six parts: (1) information about the study and informed consent; (2) general demographic data and test results; (3) participants' general attitudes towards covid-19; (4) participants' protective behaviour and fear of the disease prior to testing; (5) participants' protective behaviour and fear of the disease after the testing; and (6) the factors related to compliance with personal protection measures. the general and demographic questions included gender, age, test results (negative / igg positive / igm positive / igm + igg positive), and the participants' level of education.
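the reported seroprevalence of 0.99% (95% ci 0.53-1.68) corresponds to roughly 13 positives among the 1316 tested. a stdlib-only sketch using the wilson score interval reproduces bounds close to these; the published interval was presumably an exact (clopper-pearson) interval, so the lower bound differs slightly.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion;
    z = 1.96 gives an approximate 95% interval."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

lo, hi = wilson_ci(13, 1316)   # ~13 of 1316 seropositive
print(f"{13/1316:.2%}  ({lo:.2%}-{hi:.2%})")   # → 0.99%  (0.58%-1.68%)
```

the wilson interval is preferable to the naive normal ("wald") interval here because the proportion is close to zero, where the wald interval is badly behaved.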
other personal data were not included, to ensure the participants' anonymity. the third part of the questionnaire included questions on the participants' perception of the disease and its severity, as well as their attitudes towards the protective and restrictive measures imposed at the national and company level. this section provided seven statements that participants rated on a five-level likert scale for agreement (1 = strongly disagree; to 5 = strongly agree). in the fourth and fifth parts of the questionnaire, the participants were asked about their anxiety and fear of covid-19, compliance with restrictive measures, and application of protective equipment before and after the testing. this section was composed of two sub-sections. the first included nine statements regarding the participants' fear and perception of their environment, which participants were asked to rate on a five-level likert scale for agreement (1 = strongly disagree; to 5 = strongly agree). in the second sub-section, participants were asked to rate how frequently they obeyed the restrictive measures and applied personal protective equipment. it included four statements with responses on a five-level likert scale for frequency (1 = never; to 5 = very often). in the final section, the participants were provided with four statements about factors that influence their adherence to the restrictive and protective measures, including the serological test results and the level of current restrictive measures and recommendations. they were asked to select the one of the four statements that best suited their views.
categorical variables, including gender, education level, and factors affecting adherence to the protective measures, are given as frequencies and percentages. for the remaining variables, we provide mean values with 95% confidence intervals. differences in categorical variables were examined using the chi-squared test, while differences in participants' responses before and after the testing were examined using a paired-samples t-test. due to the increased number of multiple comparisons (n = 14), we set statistical significance at p ≤ 0.004 (bonferroni correction). all analyses were performed with jasp 0.12.1 (jasp team, 2020). the sample comprised 200 participants (68% men; median age = 43, interquartile range = 21). most of them had undergraduate or graduate education (47.7%) or completed secondary education (32.7%), while fewer participants completed non-university college or professional studies (18.6%). there were only two participants with primary education (1%), and one answer was missing. most participants perceived covid-19 as a dangerous disease and reported that the restrictive measures and protective guidelines conducted on the national and company level were efficient and appropriate (table 1). *responses to the statements ranged from 1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; to 5 = strongly agree. on average, low levels of fear related to being infected or infecting others with covid-19 were observed both before and after the testing (table 2, statements 1-6). adherence to protective measures was also high prior to and after testing (table 2, statements 7-9).
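the analysis choices above (paired-samples t-test, bonferroni correction over 14 comparisons) can be sketched with the standard library alone. the ratings below are synthetic stand-ins, not the study data; converting the t statistic to a p-value would additionally require a t-distribution cdf (e.g. from scipy), which is omitted here.

```python
import statistics

def paired_t(before, after):
    """t statistic of a paired-samples t-test (df = n - 1):
    mean of the pairwise differences divided by its standard error."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / n ** 0.5)

# bonferroni-adjusted threshold for 14 comparisons, as in the study:
alpha = 0.05 / 14          # ≈ 0.0036, reported in the text as p ≤ 0.004

# synthetic likert-style ratings before/after testing (not study data):
before = [4, 3, 4, 5, 3, 4, 4, 3]
after  = [3, 3, 4, 4, 2, 4, 3, 3]
t = paired_t(before, after)
```

the key point of the correction is that with 14 tests at an uncorrected 0.05 level, the chance of at least one spurious "significant" result would be substantial; dividing alpha by the number of comparisons controls the family-wise error rate.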
nonetheless, changes in participants' behaviour and attitudes before and after the testing were statistically significant for most variables. suspicions and fear that a person or people in their physical vicinity were infected were significantly reduced. however, participants' perception of other people's adherence to measures did not change significantly (table 2, statements 7-8). responses to the statements ranged from 1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; to 5 = strongly agree. *paired-samples t-test. †statistically significant values are in bold. the participants on average showed a high frequency of adherence to protective measures and restrictions (table 3). when asked about their pre-test and post-test adherence frequencies, they reported maintaining the application of personal protective equipment at almost the same level, but lower adherence to social distancing (table 3, statements 2-4). although the participants reported changes in behaviour and attitudes before and after receiving the test results, most of them did not attribute their behaviour to the test itself but rather to the level of company and national protective measures (table 4). the results of the present study showed that a negative serological test result is associated with changes in the behaviour and attitudes of participants, but not to the extent that would lead to irresponsible or dangerous behaviour.
to the best of our knowledge, this is the first study that investigates changes in behaviours and attitudes connected to this disease before and after receiving negative serological covid-19 test results. the results of this study indicate that the levels of fear of being infected or infecting others with covid-19, as well as behaviours regarding adherence to protective measures, changed significantly in the period after receiving negative test results. however, the subjects' fear of infection and/or infecting others was initially at a low to lower-moderate level and dropped to an even lower level after the testing. although the disease had a pandemic character and was at that time relatively unexplored, the situation at the company was under control, as the company introduced protective measures at the end of february. this was influenced by the experience of their partners in china and italy, which were at the time global pandemic hotspots. these measures, along with the national protective measures introduced by the second half of march (koronavirus.hr, 2020a, 2020b, 2020c), probably contributed to the participants' lower level of fear. the frequency of positive behaviour related to social distancing also decreased after the testing, but still remained high. in contrast, the results indicate no significant change in behaviours related to wearing protective equipment, masks, and gloves, to which adherence was high. both findings could be attributed to the stimulating climate in the company and society that raised awareness of protective and restrictive measures.
we also did not find changes in the perception of colleagues' compliance with protective measures pre- and post-testing. these findings might additionally support the participants' responsibility and conscientiousness, regardless of their test results. this is also evident from the fact that most of them attributed their behaviour less to the test results and more to the current level of restrictive measures. studies on screening for various diseases such as different types of cancer, sexually transmitted diseases (stds), and diabetes have been conducted to determine their psychological and behavioural effects, as well as the perception of one's health and future risk of getting sick (ashraf et al., 2009; berstad et al., 2015; collins et al., 2011; eborall et al., 2007; sznitman et al., 2010). a recent review of these types of studies shows a small decrease in the perceived risk of the disease screened for, slightly lower levels of anxiety or worry in the screen-negative group, and highlights that out of 28 studies only five showed an unfavourable change in the health-related behaviours of the negatively screened group (cooper et al., 2017). although our study findings indicate changes of similar direction and extent, it is difficult to compare its results to the abovementioned studies. this is due to the very nature of covid-19, an infectious disease spread primarily by human contact and interaction. the other diseases for which populations are usually screened (ashraf et al., 2009; berstad et al., 2015; collins et al., 2011; eborall et al., 2007; sznitman et al., 2010) are, with the exception of stds, not transmissible. but the comparison is not possible even with stds, since their transmission is usually restricted to the most intimate of human interactions and thus limited. stopping the spread of an infectious disease such as covid-19 is impossible without necessary changes in human interactions and behaviour, which must be applied by all members of society.
a limitation of this study is that the compared pre- and post-testing self-ratings were all collected after the testing, thus potentially introducing reporting bias. to obtain pre-testing measurements, it would only have been possible to survey the participants on the day of voluntary serological testing. from the organisational and protective standpoint, it was of utmost importance to minimise the time participants spent at the testing station to the time required for serological testing and for completing a mandatory accompanying questionnaire on disease-related factors (jerković et al., 2020), as a longer stay would have resulted not only in the prolonged absence of participants from their workplace but also in their potentially increased exposure to the virus. since the period between testing and completing the survey questionnaire lasted a maximum of 21 days for each participant, by surveying at a single point in time we relied on the participants' ability to recall recent behaviours and attitudes. while other studies on the impact of negative screening results repeated measurements after several months or years (cooper et al., 2017), this was not possible for this study due to the very nature of covid-19 as well as the differing levels of national restrictive measures. an additional limitation of this study was the lack of a control group. the covid-19 screening in div group in split (jerković et al., 2020) resulted in an insufficient number of positive participants to represent a separate group of subjects in the research. therefore, due to the extremely low seroprevalence in the tested sample (about 1%), including positive participants would not have provided relevant information for the scope of the study.
also, having an adequate control group of non-tested participants was not possible, since almost all div group industry workers in split were screened. surveying the general population for that purpose would not be appropriate, as div employees were immersed in an all-encompassing working atmosphere with special and more severe protection measures prescribed by the employer, which were introduced considerably earlier than the national measures. however, even if we had detected that a general-population control group adhered less to the protective measures, due to a social climate influenced by the smaller number of newly infected or the current level of national restrictions, it could only have implied that test results had even fewer negative consequences on behaviour related to protective measures. in conclusion, our study results indicate that covid-19 serological testing does not impose an additional threat regarding potentially irresponsible or risk behaviour in an environment where protective measures are efficient; it might even contribute to reducing fear in society and the working environment.

in front of you is a 4-page questionnaire which aims to examine the impact of the results of serological testing for sars-cov-2 antibodies on the personal attitudes and behaviour of participants. this survey is conducted by the university department of forensic sciences in cooperation with the medical faculty of the university of split and the clinical department of pathology, forensic medicine and cytology of the clinical hospital centre split. the research was approved by the ethics committee of the university department of forensic sciences (2181-227-05-12-19-0003; 024-04 / 19-03 / 00007).
the survey is conducted anonymously, and your personal data will not be available to researchers or your employer; the results will be used exclusively for research purposes. it takes a maximum of 10 minutes to complete the questionnaire.

references:
• effect of ct screening on smoking habits at 1-year follow-up in the danish lung cancer screening trial (dlcst)
• long-term lifestyle changes after colorectal cancer screening: randomised controlled trial
• evaluating and testing persons for coronavirus disease
• emotional impact of screening: a systematic review and meta-analysis
• do negative screening test results cause false reassurance? a systematic review
• psychological impact of screening for type 2 diabetes: controlled trial and comparative study embedded in the addition (cambridge) randomised controlled trial
• molecular and antibody point-of-care tests to support the screening, diagnosis and monitoring of covid-19
• oxford covid-19 government response tracker
• sars-cov-2 antibody seroprevalence in industry workers in split-dalmatia
• serology-based tests for covid-19
• odluka o mjerama ograničavanja društvenih okupljanja, rada u trgovini, uslužnih djelatnosti i održavanja sportskih i kulturnih događanja [decision on measures restricting public gatherings, retail and service activities, and sports and cultural events]
• odluka o privremenoj zabrani prelaska graničnih prijelaza republike hrvatske [decision on a temporary ban on crossing the border crossings of the republic of croatia]
• odluka o zabrani napuštanja mjesta prebivališta i stalnog boravka u rh [decision on the ban on leaving the place of residence and permanent abode in croatia]
• factors influencing the epidemiological characteristics of pandemic covid 19: a tism approach
• impact of colorectal cancer screening on future lifestyle choices: a three-year randomized controlled trial
• potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (covid-19)
• diagnosis of the coronavirus disease (covid-19): rrt-pcr or ct?
• the psychological impact of cardiovascular screening and intervention in primary care: a problem of false reassurance? british family heart study group
• the impact of community-based sexually transmitted infection screening results on sexual risk behaviors of african american adolescents
• covid-19 testing: the threat of false-negative results
• covid-19 coronavirus pandemic

the authors would like to thank div group and tomislav debeljak, along with all study participants. we are especially thankful to boško ramljak, marija čečuk and ivica sinovčić for their assistance in organisation and data collection. this article does not represent, in whole or in part, the views of the authors' institutions; it expresses only those of the authors.

key: cord-334274-4jee19hx authors: waelde, k. title: how to remove the testing bias in cov-2 statistics date: 2020-10-16 journal: nan doi: 10.1101/2020.10.14.20212431 sha: doc_id: 334274 cord_uid: 4jee19hx background. public health measures and private behaviour are based on reported numbers of sars-cov-2 infections. some argue that testing influences the confirmed number of infections. objectives/methods. do time series on reported infections and the number of tests allow one to draw conclusions about actual infection numbers? a sir model is presented where the true numbers of susceptible, infectious and removed individuals are unobserved. testing is also modelled. results. official confirmed infection numbers are likely to be biased and cannot be compared over time. the bias occurs because of different reasons for testing (e.g. by symptoms, representative or testing travellers). the paper illustrates the bias and works out the effect of the number of tests on the number of reported cases. the paper also shows that the positive rate (the ratio of positive tests to the total number of tests) is uninformative in the presence of non-representative testing. conclusions.
a severity index for epidemics is proposed that is comparable over time. this index is based on covid-19 cases and can be obtained if the reason for testing is known. background. statistics have gained a lot in reputation during the covid-19 pandemic. almost everybody on this globe follows numbers and studies "the curve" on recorded cases, on daily increases or on incidences of cov-2 infections. the open question. what do these numbers mean? what does it mean that we talk about "a second wave"? intuitive interpretations of "the curve" suggest that the higher the number of new infections, say in a country, the more severe the epidemic is in this country. is this interpretation correct? when the number of infections increases, decision makers start to discuss additional or tougher public health measures. is this policy approach appropriate? our message. reported numbers of cov-2 infections are probably not comparable over time. when public health authorities report x new cases on some day in october 2020, these x new cases do not have the same meaning as x new cases in april, may or june 2020. the bias results from different testing rules that are applied simultaneously. private and public decision making should not be based on time series of cov-2 infections, as the latter do not provide information about the true epidemic dynamics in a country. if the reason for testing was known, an unbiased measure of the severity of an epidemic could be computed easily. our framework. we present a theoretical framework that allows one to understand the link between testing and the number of reported infections. we extend the classic sir model (kermack and mckendrick, 1927, hethcote, 2000) to allow for asymptomatic cases and for testing. our fundamental assumption states that the true numbers of susceptible, infectious and removed individuals are not observed. results. the reason for the intertemporal bias consists in relative changes of test regimes. 
if a society always employed only one rule when tests are taken, e.g. "test for sars-cov-2 in the presence of a certain set of symptoms", then infection numbers would be comparable over time. if tests are undertaken simultaneously for several reasons, e.g. "test in the presence of symptoms" but also "test travellers without symptoms", and the relative frequency of tests changes, a comparison of the number of reported infections over time bears no meaning. the paper illustrates the bias by a "second wave" in reported cases which, by true epidemiological dynamics, is not a second wave. understanding this bias also provides an answer to one of the most frequently asked questions when it comes to understanding reported infection numbers: what is the role of testing? do we observe a lot of reported infections only because we test a lot? should we believe claims such as "if we test half as much, we have half as many cases"? this paper will provide a precise answer to what extent the reported number of infections is determined by the number of tests in a causal sense. the answer in a nutshell: if tests are undertaken because of symptoms, there is no causal effect from the number of tests on the number of reported infections. if tests are undertaken for other reasons (travellers, representative testing), the number of reported infections goes up simply because there is more testing. we show that time series on the number of tests and time series on reported infections do not allow one to obtain information about the true state of an epidemic. we also study the positive rate as the ratio of the number of positive tests to the total number of tests. the positive rate is informative if we undertook representative testing only. the positive rate is not informative about true epidemiological dynamics when there are several reasons for testing. understanding the biases also allows us to understand how to correct for them. the paper presents a severity index for an epidemic that is unbiased. 
one can obtain this index in two ways: record the reason why a test was undertaken, or count only the covid-19 cases. such an index should be used when thinking about relaxing or reimposing public health measures. testing is important for detecting infectious individuals; counting covid-19 cases is important for private and public decision making. structure of paper. the next section presents the model. section 3 shows biased and unbiased measures of the true but unobserved dynamics of an epidemic. it also studies the (lack of) informational content of time series on reported infections and time series on the number of tests, and the properties of the positive rate. it finally presents an unbiased severity index. the conclusion summarizes. the basic assumption of our extension of the susceptible-infectious-removed (sir) model consists of the belief that true infection dynamics are not observable. simultaneous testing of an entire population or weekly representative testing is not feasible, at least given current technological, administrative and political constraints. this section therefore first describes the true but unobserved infection dynamics, then introduces tests into this framework and finally computes the number of reported infections within this framework. the classic sir model. we study a population of fixed size p. individuals can be in three states, as in a standard sir model. the number of individuals that are susceptible to infection is denoted by s̃(t). this number is unobservable to the public and to health authorities. the numbers of infectious and removed (i.e. recovered or deceased) individuals are denoted by ĩ(t) and r̃(t), respectively. we assume that individuals are immune and non-infectious after being removed. 
let the (expected) number of individuals in the state of being susceptible at a point in time t be denoted by s̃(t). (we write "expected number" as ordinary differential equations in sir models could or should be understood as kolmogorov backward equations describing means of continuous-time markov chains; see karlin and taylor (1998) or ross (1996) for an introduction.) the number of susceptible individuals falls according to ds̃(t)/dt = −c(t) s̃(t), (1) where r is a constant and c(t) ≡ r ĩ(t) can be called the individual infection rate. it captures the idea that the risk of becoming infected is the greater, the higher the number of infectious individuals. (in the tradition of diamond-mortensen-pissarides search and matching models in economics (diamond, 1982, mortensen, 1982, and pissarides, 1985), this individual infection rate can be expressed, capturing similar ideas as in a matching function: it should not only increase in the number of infectious individuals but also fall in the number of susceptible individuals; the latter reduces the probability that a random contact is infectious. see donsimoni et al. (2020a) for an implementation.) merging the individual recovery rate and death into one constant ρ, the number of infectious individuals changes according to dĩ(t)/dt = c(t) s̃(t) − ρ ĩ(t). (2) finally, as a residual, the number of removed individuals rises over time according to dr̃(t)/dt = ρ ĩ(t). (we focus here on the bias due to tests but discuss perceptions briefly below.) we illustrate the dynamics in figure 1 (the true infection dynamics, a simple sir model), employed also later on. it is made available under a cc-by-nc-nd 4.0 international license. the copyright holder for this preprint is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity (which was not certified by peer review). this version posted october 16, 2020. we can easily integrate asymptomatic cases into this framework. 
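the dynamics in (1)-(2) can be sketched numerically. the following is a minimal euler-step simulation of the classic sir model; the parameter values (population size, contact parameter r, removal rate ρ) are invented for illustration and are not taken from the paper.

```python
def simulate_sir(pop=10_000.0, r=3e-5, rho=0.1, i0=2.0, days=300, dt=1.0):
    """euler integration of ds/dt = -c(t)*s, di/dt = c(t)*s - rho*i,
    dr/dt = rho*i, with individual infection rate c(t) = r * i(t)."""
    s, i, rem = pop - i0, i0, 0.0
    s_path, i_path, r_path = [s], [i], [rem]
    for _ in range(days):
        c = r * i                 # individual infection rate c(t)
        new_inf = c * s * dt      # flow out of the susceptible state
        new_rem = rho * i * dt    # flow into the removed state
        s -= new_inf
        i += new_inf - new_rem
        rem += new_rem
        s_path.append(s)
        i_path.append(i)
        r_path.append(rem)
    return s_path, i_path, r_path

s_path, i_path, r_path = simulate_sir()
```

with these illustrative values the epidemic rises, peaks and dies out, and the three states always sum to the population size.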
we split the true number of infectious individuals described in (2) into symptomatic and asymptomatic cases, ĩ(t) = ĩ_symp(t) + ĩ_asymp(t). (3) this allows us to capture the infection process in (2) by two distinct differential equations, dĩ_symp(t)/dt = c_symp(t) s̃(t) − ρ ĩ_symp(t), (4) dĩ_asymp(t)/dt = c_asymp(t) s̃(t) − ρ ĩ_asymp(t). (5) when (4) and (5) hold, (2) holds as well. individual infection rates are now defined as c_symp(t) ≡ s r ĩ(t) (6a) and c_asymp(t) ≡ (1 − s) r ĩ(t). (6b) the epidemiological idea behind these equations is simple. the rate with which one individual becomes infected is the same for everybody and given by r ĩ(t): the higher the number of infectious individuals in society, ĩ(t), the higher the rate with which one individual gets infected. it then depends on various, at this point partially unknown, physiological conditions of the infected individual whether they develop symptoms or not. we denote the share of individuals that develop symptoms by s; we assume this share is constant. (the model neglects the effect of quarantine. if infectious individuals know about their status and therefore stay in quarantine, they should be removed from ĩ(t) or at least get a lower weight in (6).) epidemiological dynamics. this completes the description of the model. let us now describe how we can understand (unobserved) epidemiological dynamics. we start with some initial condition for s̃(t); a good candidate would be s̃(0) = p, i.e. the entire population of size p is susceptible to being infected and becoming infectious. initially, there are very few infectious individuals; say there are two, ĩ_symp(0) = ĩ_asymp(0) = 1. given infection rates (6a) and (6b) and parameters, the number of infectious symptomatic and asymptomatic cases evolves according to (4) and (5). infectious individuals are removed from being infectious at a rate ρ. the number of susceptible individuals follows (1). the epidemic is over with herd immunity, i.e. s̃ = 0 at some point (far in the future), or when recovery is sufficiently fast relative to inflows such that ĩ = 0. the epidemic is heading towards an end when dĩ_symp(t)/dt < 0 and dĩ_asymp(t)/dt < 0, i.e. 
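the split into symptomatic and asymptomatic cases can be added to the same kind of sketch: new infections arrive at rate r·ĩ(t)·s̃(t) and a constant share s of them develops symptoms, as in (6a)-(6b). all numbers are again illustrative, not the paper's.

```python
def simulate_split_sir(pop=10_000.0, r=3e-5, rho=0.1, share_symp=0.6,
                       days=300, dt=1.0):
    """euler integration of (4)-(5); returns paths of symptomatic and
    total infectious individuals."""
    s, i_symp, i_asymp = pop - 2.0, 1.0, 1.0
    symp_path, total_path = [i_symp], [i_symp + i_asymp]
    for _ in range(days):
        i_total = i_symp + i_asymp          # i(t) as in equation (3)
        new_inf = r * i_total * s * dt      # total inflow of new infections
        s -= new_inf
        i_symp += share_symp * new_inf - rho * i_symp * dt
        i_asymp += (1.0 - share_symp) * new_inf - rho * i_asymp * dt
        symp_path.append(i_symp)
        total_path.append(i_symp + i_asymp)
    return symp_path, total_path

symp_path, total_path = simulate_split_sir()
```

over the course of the epidemic the symptomatic share of infectious individuals converges to the inflow share s.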
the number of infectious individuals falls. (this condition is related to the widely discussed reproduction number.) we abstract from public health measures and their effects (as studied e.g. by dehning et al., 2020, or donsimoni et al., 2020a). if we wanted to include them, we could allow public health measures to affect r in the individual infection rate in (2). to understand the effects of tests, we now introduce testing into our sir model. figure 2 (the sir model with testing) displays all unobserved quantities in the model by dashed lines. the red circles represent the standard sir model illustrated in figure 1. testing can take place for a variety of reasons described in test strategies adopted by various countries. the reasons for tests we take into account at this point are testing due to the presence of typical symptoms, representative testing and testing travellers. while testing by symptoms and representative testing is well-defined, testing travellers is really only an example for a larger type of test. this example covers all tests that are applied to a group defined by certain characteristics which, however, are not representative of the population as a whole. other examples of this non-representative testing include testing of soccer players, testing in retirement homes or of their visitors, testing in hot spots or testing contact persons of infected individuals. (it would be straightforward to assume, e.g., ρ_asymp > ρ_symp. this would capture the idea that asymptomatic cases recover faster than symptomatic cases. we ignore this extension as this distinction would not affect our main argument. for analytical solutions of the classic sir model, showing this aspect most clearly, see harko et al. (2014) or toda (2020).) current applications of the sir model also badly neglect the non-exponential distribution of time spent in various states. it is well-known (e.g. 
linton et al., 2020, or lauer et al., 2020) that incubation time is (approximately) lognormally distributed. it is now also understood that the reporting delay per se, and added to incubation time, is also non-exponentially distributed (mitze et al., 2020, app. a.3). the "chain trick" (hurtado and kirosingh, 2019) would allow one to implement this numerically. meyer-herrmann (2018) employed a related structure but did not focus on densities of duration explicitly. testing by symptoms. individuals can catch many diseases (or maybe better, sets of symptoms) indexed by i = 1, ..., n. for simplicity, figure 2 displays only two diseases (1 and 2) and covid-19. the number of individuals that have a disease i and go to a doctor on day t is d_i(t). an individual becomes sick with disease i at a disease-specific arrival rate and recovers from this specific sickness at a disease-specific rate. for clarity, we add a symptomatic sars-cov-2 infection to this list of diseases. the individual is infected and develops symptoms with rate c_symp(t), which we know from (6a), and is removed with rate ρ. the number of symptomatic sars-cov-2 individuals is ĩ_symp(t) from (4). there is a certain probability p_i that a doctor performs a test, given a set of symptoms i. this probability reflects the subjective evaluation of the general practitioner (gp) whether certain symptoms are likely to be related to sars-cov-2. 
the probability to get tested with a symptomatic sars-cov-2 infection (which the gp of course cannot diagnose without a test) is denoted by p_c. hence, the (average or expected) number of tests that are performed at time t due to consulting a doctor is given by t_d(t) = Σ_{i=1..n} t_d_i(t) + t_d_c(t) = Σ_{i=1..n} p_i d_i(t) + p_c ĩ_symp(t), (7) where the second equality replaces the number of tests by the number of sick individuals per disease times the probability that this individual is tested. note that, apart from population size p, the number of tests taken because of the presence of symptoms, t_d(t), is the first variable that is observed. if health authorities collected information on why a test was performed (the set of symptoms that can be observed by a gp), we would observe t_d_i(t) and t_d_c(t); if not, we observe t_d(t) only. tests can be performed for a variety of reasons. one consists in testing travellers, another consists in tests for scientific reasons, and so on. these tests are not related to symptoms. taking the example of representative tests, the tests are applied to the population as a whole. the number of tests is chosen by public authorities, scientists, available funds, capacity considerations and other factors. in any case, it is independent of infection characteristics of the population. concerning representative testing, we denote the number of tests of this type undertaken at t by t_r(t); when it comes to travellers, we denote the number of tests by t_t(t). summarizing, the total number of tests being undertaken in our model is given by the sum of tests due to symptoms, t_d(t), representative tests t_r(t) and tests of travellers, t_t(t), t(t) = t_d(t) + t_r(t) + t_t(t) = Σ_{i=1..n} p_i d_i(t) + p_c ĩ_symp(t) + t_r(t) + t_t(t), (8) where the second equality employs the number of tests by symptoms from (7). the equation thereby reemphasizes the endogeneity of the number of tests by symptoms, t_d(t), which is determined by the number of symptoms occurring in a country or region, and the exogeneity of the other reasons for testing, t_r(t) and t_t(t); the latter are not determined by symptoms. 
the number of reported infections at time t is given by the sum of reported infections split by the reasons for testing introduced above, i(t) = i_d(t) + i_r(t) + i_t(t). (9) (in a broader interpretation, one could understand p_i and p_c as the probabilities that an individual gets tested and that they go to the doctor; no test is ever performed if individuals with symptoms stay at home.) testing by symptoms. as we are perfectly informed in our theoretical world about the (expected) number of cov-2 infections and other diseases, we know that the number of positive cov-2 tests is zero for all other diseases, i_i(t) = 0 for i = 1, ..., n: individuals have covid-19-related symptoms because they caught a cold, have the flu, or other. the probability that a cov-2-infected individual has a positive test is set equal to one (ignoring false negative tests). the number of positive tests for individuals that are infected with cov-2 is therefore identical to the number of tests, i_d(t) = t_d_c(t). (10) testing for other reasons. the probability that a representative test is positive is denoted by p_r(t). this probability is a function of the true underlying and unobserved infection dynamics. if the sample chosen is truly representative, then the probability for a positive test is given by p_r(t) = ĩ(t)/p. (12) hence, representative tests make the true number ĩ(t) of infectious individuals visible for the moment at which the tests are undertaken. this true number includes symptomatic and asymptomatic cases as in (3). the probability that a test of travellers is positive depends on a multitude of determinants, among them the region travelled to and the behaviour of the traveller. 
we denote the probability that such a test is positive by p_t(t). we consider this probability to be exogenous to our analysis. a first step towards the total number of reported infections starts from (9) and takes the positive tests per testing reason into account, i(t) = t_d_c(t) + p_r(t) t_r(t) + p_t(t) t_t(t). this is also the expression displayed in figure 2 between 'cov-2 tests' and 'confirmed infections'. reported infections come from testing cov-2 individuals with symptoms, from representative testing and from other sources such as travellers. employing t_d_c(t) = p_c ĩ_symp(t) from (7) and p_r(t) from (12), the number of reported infections can be written as i(t) = p_c ĩ_symp(t) + (ĩ(t)/p) t_r(t) + p_t(t) t_t(t). (13) 3 unbiased and biased reporting. our central equation is (13). the reported number of infections would be unbiased if only tests by symptoms were undertaken. reported infections (with t_r(t) = t_t(t) = 0) would by (13) amount to i_d(t) = p_c ĩ_symp(t). (14) when the reported number of infections i(t) goes up, one would be certain that the unobserved number of symptomatic cov-2 infections ĩ_symp(t) would go up as well. the more infections are reported, the more severe the epidemic is. this equation also shows under which circumstances the number of tests does not have a causal effect on the number of reported infections. if tests are undertaken according to a rule that makes testing dependent on something else (e.g. the presence of symptoms), the number of tests itself is determined by the number of symptoms. hence, while the number of tests and reported infections are correlated, the causal underlying factor is the number of patients visiting a physician with cov-2-related symptoms. a second example of unbiased testing is (exclusive) representative testing. 
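the central equation (13) is a one-liner. the sketch below uses invented magnitudes; note that only the first term reacts to the true number of symptomatic cases, while the other two scale with the exogenously chosen test counts.

```python
def reported_infections(p_c, i_symp, i_total, pop, t_r, p_t, t_t):
    """equation (13): i(t) = p_c*i_symp(t) + (i_total(t)/pop)*t_r(t)
    + p_t(t)*t_t(t)."""
    return p_c * i_symp + (i_total / pop) * t_r + p_t * t_t

# 50 symptomatic and 100 total infectious in a population of 10,000,
# 500 representative tests and 200 traveller tests (all hypothetical)
i_reported = reported_infections(p_c=0.8, i_symp=50.0, i_total=100.0,
                                 pop=10_000.0, t_r=500.0, p_t=0.02, t_t=200.0)
```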
when only representative testing is undertaken, the number of reported infections (with t_d_c(t) = t_t(t) = 0) from (13) amounts to i(t) = (ĩ(t)/p) t_r(t). (15) here, the number of reported infections, i(t), does rise in the number of tests, t_r(t): the more we test, the higher the number of cases. yet, representative testing is (of course) the gold standard of testing. the ratio of positive cases to the number of tests yields the share of infections in the population, i(t)/t_r(t) = ĩ(t)/p. (this ratio is an example of the 'positive rate'; we will study it in more detail below.) this share is driven by ĩ(t), which shows that (i) representative testing provides a snapshot at this point in time t of the current epidemic dynamics and that (ii) representative testing provides a measure of overall infections, i.e. symptomatic and asymptomatic ones. we have seen two examples of unbiased reporting, one for symptomatic infections, one for all infections. they show that the question whether the number of reported cases rises in the number of tests is not as important as the question whether the type of testing provides useful information. in the first example, the claim that more tests increase the number of reported infections is meaningless, as the number of tests is not chosen. in the second example, the number of positive cases rises in the number of tests, but the ratio of these two quantities is highly informative. illustrating a bias. now imagine several types of testing are undertaken simultaneously. the number of reported infections at t is then given by the full expression in (13). consider first the case of symptomatic and representative testing. the number of reported cases (with t_t(t) = 0) is then i(t) = p_c ĩ_symp(t) + (ĩ(t)/p) t_r(t). (16) 
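why (15) is unbiased can be seen in two lines of code: under purely representative testing the positive rate equals ĩ(t)/p whatever the number of tests. the numbers below are made up for illustration.

```python
def representative_positive_rate(i_total, pop, t_r):
    """under equation (15), reported cases are (i_total/pop)*t_r, so the
    positive rate reported/t_r recovers the true infection share."""
    reported = (i_total / pop) * t_r
    return reported / t_r

rate_small = representative_positive_rate(250.0, 10_000.0, 100.0)
rate_large = representative_positive_rate(250.0, 10_000.0, 5_000.0)
```

both calls recover the true share 250/10,000; scaling up testing changes the case count but not the rate.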
imagine someone (the government, researchers, others) decides to undertake more representative testing, i.e. t_r(t) goes up. this means that i(t) increases even though there is no change in the true number ĩ_symp(t) of symptomatic cases. there is also no change in the true number ĩ(t) of symptomatic and asymptomatic cases. whoever perceives the reported number i(t) is led to believe that something fundamental has changed within the epidemiological dynamics. but this is of course not true. the reported number goes up simply because more tests were undertaken. can we gain some information out of this expression if we divide it by the number of tests t_r(t), as it had turned out to be very useful in the case of exclusive representative testing in (15)? we would obtain i(t)/t_r(t) = p_c ĩ_symp(t)/t_r(t) + ĩ(t)/p, which does contain the informative infection share ĩ(t)/p as the second term on the right-hand side. but the first term does not have a meaningful interpretation, and neither does the entire term. let us illustrate the potential bias by looking at the third type of testing considered here, testing travellers. the number of reported infections according to (14) in the case of testing by symptoms and testing travellers reads i(t) = p_c ĩ_symp(t) + p_t(t) t_t(t). (17) we assume that no testing of travellers took place at the beginning of the pandemic. at some later point (as of t = 60 in our figure below), the number of tests per day, t_t(t), increases linearly in time. to make this example as close as possible to public and common displays of infection dynamics, let us look at "the curve", represented in figure 3 by numbers of infected individuals taking recovery into account. looking at (16) shows that a further source of bias, briefly mentioned earlier, can easily be identified. imagine the general perception of gps changes over time concerning cov-2. then a gp might be initially sceptical, i.e. p_c is low, then become more aware of health risks implied by cov-2, so p_c goes up, and then maybe during some other period become more reluctant again. 
if these changes in individual perceptions are not entirely idiosyncratic but driven by the overall attention in society to an epidemic, the number of reported infections would change independently of the true number of infections, ĩ_symp(t) or ĩ(t). (one could draw similar figures with new infections per day or the number of individuals ever infected; the basic argument would remain the same.) looking at figure 3, we first focus on the blue dashed curve for ĩ_symp(t), the true number of symptomatic sars-cov-2 infections. we chose parameters such that the epidemic is coming to a halt after around 120 units of time (plotted on the horizontal axis). the green curve plots the number of reported infections i_d(t) from (14), where testing takes place only in the presence of symptoms. finally, the red curve is an example of a bias in the reported number of infections. it occurs as positive tests from testing travellers are added to tests by symptoms as in (17). we see that this example displays what looks like a "second wave": reported numbers of infections go up again as of t = 100. by construction, however, this second wave is caused by misinterpretation of the reported number of infections. let us stress that we do not claim that the second wave is a statistical artefact due to testing strategies. it could be a statistical artefact, however. the conclusion shows how to obtain a severity index for an epidemic that is not prone to causing artificial results, and which data is needed to compute such an index. a non-application to germany. consider the case of germany. figure 4 shows the number of tests per week and the number of reported infections. 
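the "artificial second wave" of figure 3 can be reproduced in a few lines: true infections follow a single wave, traveller testing ramps up linearly from t = 60 as in the text, and reported cases from (17) rise again although the epidemic is over. every parameter value here is invented for illustration.

```python
def second_wave_artifact(days=200):
    """simulate one true epidemic wave and the reported series
    i(t) = p_c*i_symp(t) + p_t*t_t(t) from equation (17)."""
    pop, r, rho, share_symp = 10_000.0, 3e-5, 0.1, 0.6
    p_c, p_t = 0.8, 0.02
    s, i_symp, i_asymp = pop - 2.0, 1.0, 1.0
    reported = []
    for t in range(days):
        t_t = max(0, t - 60) * 50.0      # traveller tests ramp up at t = 60
        reported.append(p_c * i_symp + p_t * t_t)
        i_total = i_symp + i_asymp
        new_inf = r * i_total * s        # euler step of the true dynamics
        s -= new_inf
        i_symp += share_symp * new_inf - rho * i_symp
        i_asymp += (1.0 - share_symp) * new_inf - rho * i_asymp
    return reported

reported = second_wave_artifact()
```

the reported series peaks with the true wave, falls, and then climbs again purely because traveller testing expands; nothing epidemiological happens in the second rise.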
when we look at the time series for all tests in this figure, it corresponds to t(t) from (8). when we consider the reported number of infections per week in germany, it looks as displayed in the right panel of the above figure. this time series corresponds to i(t) from (13). can we conclude anything from these two time series about the true dynamics of the epidemic, i.e. can we draw conclusions about ĩ_symp(t) or ĩ(t)? technically speaking, we have two equations, (8) and (13), reproduced here for convenience, t(t) = Σ_{i=1..n} p_i d_i(t) + p_c ĩ_symp(t) + t_r(t) + t_t(t), i(t) = p_c ĩ_symp(t) + (ĩ(t)/p) t_r(t) + p_t(t) t_t(t), about which the public has access to two variables, i(t) and t(t). it seems obvious that, unless we want to make a lot of untested assumptions, official statistics do not allow one to draw any conclusion about the severity of the epidemic. (one might be tempted to argue that data on the positive rate in (19) should also be useful; as the positive rate is simply i(t) divided by t(t), it does not provide additional information.) the right-hand sides contain at least three unknowns (e.g. tests classified by reason of testing, t_d(t), t_r(t), t_t(t)), and two equations in three unknowns do not pin down a unique solution. hence, from currently available data, true epidemic dynamics, ĩ_symp(t) or ĩ(t), cannot be understood. the positive rate. the positive rate is the ratio of confirmed infections to the number of tests, s_pos(t) ≡ i(t)/t(t). this statistic is often discussed in the media and elsewhere (see e.g. our world in data, 2020). in our model, (13) and (8) imply s_pos(t) = [p_c ĩ_symp(t) + (ĩ(t)/p) t_r(t) + p_t(t) t_t(t)] / [Σ_{i=1..n} p_i d_i(t) + p_c ĩ_symp(t) + t_r(t) + t_t(t)]. (19) what does this positive rate tell us? some argue that a rising positive rate is a sign of the epidemic 'getting worse'. 
if we understand the latter as a rise in the number of unobserved infections, ĩ(t), or the number of infections with symptoms, ĩ_symp(t), this statement is true if we undertake representative testing only, t_d_c(t) = t_t(t) = 0, as in (15). in this case, when the observed positive rate s_pos(t) rises, this clearly indicates that the number of unobserved infections ĩ(t) is higher. and so would be ĩ_symp(t), whether individuals with symptoms go to a doctor or not, given the constant share s of symptomatic cases in (6a). does this conclusion hold more generally, i.e. for the full expression (19), when tests are undertaken for many reasons? let us assume we only undertake tests due to symptoms and due to travelling, t_r(t) = 0. then the positive rate (19) reads s_pos(t) = [p_c ĩ_symp(t) + p_t(t) t_t(t)] / [Σ_{i=1..n} p_i d_i(t) + p_c ĩ_symp(t) + t_t(t)]. (20) when we increase the number of tests for travellers, we find (see appendix) that ∂s_pos(t)/∂t_t(t) > 0 if and only if Σ_{i=1..n} p_i d_i(t) > p_c ĩ_symp(t) (1 − p_t(t))/p_t(t). (21) this result is easy to understand technically and has the usual structure: when we increase a summand (t_t(t) here) that appears in numerator and denominator of a fraction, the sign of the derivative depends on the other summands (Σ_{i=1..n} p_i d_i(t) and p_c ĩ_symp(t) in this case). as the summand is multiplied by p_t(t) in the numerator, this probability appears in the condition as well. in terms of epidemiological content, the derivative says that the positive rate can rise or fall when we increase the number of tests for travellers (or for related reasons mentioned below figure 2). (a caveat: this paper is about conceptual issues related to finding an unbiased estimator for an unobserved time series. we ignore practical data problems; these include the fact that the number of tests displayed in fig. 4 does not come from the same sample of tests that yields the number of infections in that figure, which would have to be taken into account in any application.) testing increases the positive rate if the number of tests undertaken due to symptoms 
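condition (21) is easy to verify numerically. the helper below implements the positive rate (20); the two scenarios (all values invented) show that raising traveller tests t_t can move the positive rate in either direction, which is the sense in which the rate is uninformative.

```python
def positive_rate(other_tests, p_c, i_symp, p_t, t_t):
    """equation (20): s_pos = (p_c*i_symp + p_t*t_t)
    / (other_tests + p_c*i_symp + t_t), with other_tests = sum_i p_i*d_i."""
    return (p_c * i_symp + p_t * t_t) / (other_tests + p_c * i_symp + t_t)

# scenario 1: many non-covid tests, condition (21) holds -> rate rises in t_t
rate_a = positive_rate(other_tests=1_000.0, p_c=0.8, i_symp=10.0, p_t=0.05, t_t=100.0)
rate_b = positive_rate(other_tests=1_000.0, p_c=0.8, i_symp=10.0, p_t=0.05, t_t=500.0)

# scenario 2: many symptomatic cov-2 tests, condition fails -> rate falls in t_t
rate_c = positive_rate(other_tests=10.0, p_c=0.8, i_symp=500.0, p_t=0.05, t_t=100.0)
rate_d = positive_rate(other_tests=10.0, p_c=0.8, i_symp=500.0, p_t=0.05, t_t=500.0)
```

in both scenarios the true epidemic state is held fixed; only the testing volume changes.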
that are not cov-2 related, Σ_{i=1..n} p_i d_i(t), exceeds the number of tests undertaken because of symptoms related to cov-2, p_c ĩ_symp(t), corrected for the probability that a traveller test is positive. (the appendix shows that the positive rate is also informative, and identical to ĩ(t)/p, if travellers (or visitors of retirement homes, or contact persons of a positively tested individual, or visitors of public events etc.) are representative; this assumption is questionable, however. quantitatively speaking, representative testing is probably very small relative to other reasons for testing.) while an intuitive interpretation of this condition seems to be a challenge, the condition nevertheless conveys a clear message: it contradicts the claim that a rising positive rate implies a 'worse' epidemic state. we see that when t_t(t) goes up and the positive rate goes up, this does not mean anything regarding the dynamics of ĩ_symp(t) or ĩ(t). the same is true, of course, when t_t(t) goes up and the positive rate goes down. the positive rate is not informative. this finding also applies to a somewhat more precise statement of the above conjecture. some claim that a rising positive rate in the presence of more tests does show that infections must go up. when we increase t_t(t), the number of tests goes up. when (21) holds, the positive rate goes up. however, we do not learn anything about infections with or without symptoms. tests go up and the positive rate goes up simply because we test more. we now propose an index for the severity of an epidemic which is comparable over time. the model illustrated in figure 2 tells us what is needed: the index should be closely related to the number of symptoms in society. as the tests that capture these symptoms are those undertaken because of symptoms, the index is simply i_d(t) as in (14). an alternative would consist in representative testing. 
while the number of reported cases depends causally on the number of tests, the ratio of reported cases to the number of tests is an unbiased estimator of the true epidemic dynamics, as shown in (15). as regular representative testing, say with a weekly frequency, is not feasible, the only realistic severity index is i_d(t) from (14). very simply speaking: if a severity index for an epidemic is desired that is comparable over time, we should test for cov-2 but count covid-19 cases. this should be done at all levels, starting from the gp, through hospital admissions and patients in intensive care, and, finally, counting deaths associated with covid-19. what do these findings mean in practice? data currently available to the public (see e.g. rki, 2020, for germany or our world in data, 2020, for many other countries in the world) does break down the total number of tests by origin (gp, hospital and other) and by region. unfortunately, this classification does not relate to the reason for testing, and the latter is required to infer the true infection dynamics. what should be done to quantify the relevance of the bias? local health authorities in germany collect the names of individuals with confirmed cov-2 infections. if additional information on symptoms, which is already being collected (the reporting form [20] allows for ticks on fever, coughing and the like), were made available to the public or to scientists, the bias could be computed easily. [21] we currently know covid-19 cases for intensive care in hospitals, but this data is not yet easily accessible (see https://www.intensivregister.de). while only a fraction of covid-19 cases ends up in intensive care, this number might be more informative than cov-2 infections. the number of deaths associated with covid-19 is a further measure, as would be excess mortality.
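the core point, that pooling confirmed infections across all reasons for testing is not comparable over time while the symptom-based index i_d(t) is, can be sketched with made-up numbers (the function and figures below are ours, not the paper's):

```python
# minimal sketch: summing confirmed infections across all test reasons is not
# comparable over time, while counting only positives from symptom-based tests
# (the index i_d(t)) is. numbers are made-up assumptions.

def confirmed_cases(symptom_positives, traveller_tests, traveller_positive_rate):
    """total reported cov-2 infections, pooling all reasons for testing."""
    return symptom_positives + traveller_tests * traveller_positive_rate

# two weeks with identical true epidemic state (same symptom-based positives) ...
week1 = {"i_d": 50, "traveller_tests": 0}
week2 = {"i_d": 50, "traveller_tests": 2000}

total1 = confirmed_cases(week1["i_d"], week1["traveller_tests"], 0.02)
total2 = confirmed_cases(week2["i_d"], week2["traveller_tests"], 0.02)

# ... but the pooled case count suggests a worsening epidemic:
assert total1 == 50 and total2 == 90
# the severity index i_d(t) is unchanged, correctly signalling no change:
assert week1["i_d"] == week2["i_d"]
```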
while these are only partial measures of covid-19 dynamics, covid-19 measures (positive cov-2 tests with symptoms, the number of all covid-19 patients in hospitals, not just intensive care) would provide a better basis for regional and local decision makers than cov-2 infection measures. until we one day know how strong the quantitative bias is, we can only hope that it is small. if it is, the guidance given to society by the focus on cov-2 infections will have been correct. but even with a small bias, the focus on cov-2 infections should stop. we know that it is not the perfect measure. it rather biases the expectation building (and emotional reactions) of individuals. hence, as soon as better covid-19 measures are available, the cov-2 measure should be replaced. the candidates are estimates of informative positive rates and (regional) time series on covid-19 cases (and not cov-2 infections). this would allow local politicians to base their decisions on intertemporally informative data, i.e. on local covid-19 cases.

true epidemic dynamics are unobserved. no country, no health authority and no scientist knows the true number of cov-2 infections with or without symptoms for a given country. this is why testing is undertaken. testing is a means to measure true but unobserved epidemic dynamics. the counted number of cov-2 infections is not relevant for decision making; what matters is the true number of cov-2 infections. the infection and the corresponding disease spread when the true number of infections is high, not when the counted number of infections is high. we extend the classic sir model to take symptomatic and asymptomatic cases into account. more importantly, we treat cov-2 infections as unobserved in the sir model and model testing. we allow for various reasons for testing and focus on testing due to symptoms, representative testing and testing travellers.
testing travellers is an example of non-representative and non-symptom-related testing and includes the testing of sports professionals, in retirement homes or of their visitors, in hot spots, or of contact persons of infected individuals. we show that the presence of various reasons for testing biases the number of confirmed cov-2 infections over time. the number of cov-2 infections cannot be compared intertemporally. we might observe more cov-2 infections today simply because we test more. however, the true number of infections might stay constant or even fall. we do not claim, in any sense, that our findings have empirical relevance. we simply do not know, at least given the data that are easily accessible to the public and given the data everybody observes (the number of tests and the number of reported infections) and on which all public health decisions are based, what the true epidemic dynamic is. we all look at a watch and we know that it is wrong. but we do not know by how much it is wrong. it may be seconds, but it can also be hours. what are the positive lessons from this analysis? we propose an index which is unbiased over time. it is deceptively simple: count the number of covid-19 cases, not the number of cov-2 infections. if we knew the number of covid-19 cases, i.e. cov-2 infections with severe acute respiratory symptoms (sars), then we would know at least one part of the epidemic dynamics (ĩ_symp(t) in our model). let us stress that our findings are not an argument against testing. testing is important for identifying infectious individuals. they need to stay in quarantine in order to prevent the further spread of cov-2 infections. this helps to reduce covid-19 cases. testing is important, but adding up confirmed infections from all sorts of tests is misleading. as long as the public focuses on all sources of positive cases, decisions by private individuals, firms, journalists, scientists and politicians are badly informed. emotions, decisions and behaviour are misguided.
this cannot be good for public health. decisions must be based on the number of covid-19 cases.

the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd 4.0 international license. this version posted october 16, 2020. https://doi.org/10.1101/2020.10.14.20212431

this section contains derivations for the main text. when we ignore testing by symptoms, the positive rate (19) reads

s_pos(t) = [(ĩ(t)/p) t_r(t) + p_t(t) t_t(t)] / [t_r(t) + t_t(t)].

imagine travellers were representative; then p_t(t) = ĩ(t)/p and the positive rate would read s_pos(t) = ĩ(t)/p, as in (20) for representative testing. under the assumption that travellers (or visitors of retirement homes, or contact persons of a positively tested individual, or visitors of public events) are representative, the positive rate would reflect the true epidemic dynamics as measured by ĩ(t)/p.

the derivative of the positive rate in (21): we only take tests due to symptoms and due to travelling into account, t_r(t) = 0. then the positive rate (19) reads

s_pos(t) = [p_c ĩ_symp(t) + p_t(t) t_t(t)] / [Σ_{i=1}^n p_i d_i(t) + p_c ĩ_symp(t) + t_t(t)] = (a + p_t t_t) / (b + t_t),

where the second equality defines a and b (for this appendix only) and suppresses time arguments to simplify notation. we compute

d s_pos / d t_t = [p_t (b + t_t) − (a + p_t t_t)] / (b + t_t)² > 0 ⟺ p_t (b + t_t) > a + p_t t_t ⟺ p_t b > a.

when we employ the definitions of a and b, we obtain p_t [Σ_{i=1}^n p_i d_i + p_c ĩ_symp] > p_c ĩ_symp. adding time arguments gives the condition in the main text.
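the appendix's first claim, that the positive rate equals ĩ(t)/p whenever traveller tests are representative (p_t = ĩ/p), can be verified exactly with rational arithmetic; the population and infection figures below are illustrative assumptions:

```python
# sketch of the appendix's claim: if traveller tests were representative,
# i.e. p_t = ĩ/p, the positive rate equals ĩ/p whatever the mix of tests.
# values are illustrative assumptions.
from fractions import Fraction

population = Fraction(1_000_000)      # p
infections = Fraction(5_000)          # ĩ(t), unobserved in reality
prevalence = infections / population  # ĩ(t)/p

def positive_rate(t_r, t_t, p_t):
    """s_pos ignoring symptom-based tests: representative tests t_r find
    infections at rate ĩ/p, traveller tests t_t at rate p_t."""
    return (prevalence * t_r + p_t * t_t) / (t_r + t_t)

# with representative travellers (p_t = ĩ/p), the mix of tests is irrelevant:
for t_r, t_t in [(100, 0), (100, 900), (10, 5000)]:
    assert positive_rate(t_r, t_t, p_t=prevalence) == prevalence

# with non-representative travellers the rate departs from prevalence:
assert positive_rate(100, 900, p_t=Fraction(1, 10)) != prevalence
```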
references:
• effects of nonpharmaceutical interventions on covid-19 cases, deaths, and demand for hospital services in the uk: a modelling study
• inferring covid-19 spreading rates and potential change points for case number forecasts
• aggregate demand management in search equilibrium
• projecting the spread of covid19 for germany
• should contact bans have been lifted more in germany? a quantitative prediction of its effects
• impact of non-pharmaceutical interventions (npis) to reduce covid19 mortality and healthcare demand
• exact analytical solutions of the susceptible-infected-recovered (sir) epidemic model and of the sir model with equal death and birth rates
• the mathematics of infectious diseases
• generalizations of the linear chain trick: incorporating more flexible dwell time distributions into mean field ode models
• an introduction to stochastic modeling
• proceedings of the royal society of london series a, containing papers of a mathematical and physical character
• the incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application
• incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data
• uses and abuses of mathematics in biology
• coronavirus (covid-19) testing
• mathematical models to guide pandemic response
• estimation of the cancer risk induced by therapies targeting stem cell replication and treatment recommendations
• face masks considerably reduce covid-19 cases in germany - a synthetic control method approach
• property rights and efficiency in mating, racing, and related games
• estimating unobserved sars-cov-2 infections in the united states
• short-run equilibrium dynamics of unemployment, vacancies, and real wages
• laborbasierte surveillance sars-cov-2
• stochastic processes
• susceptible-infected-recovered (sir) dynamics of covid-19 and economic impact

competing interests: there are no competing interests. data and materials availability: all data and software code

specific testing of textiles for transportation. richaud, m., vermeersch, o., dolez, p.i. (2017-09-29). advanced characterization and testing of textiles. doi: 10.1016/b978-0-08-100453-1.00015-5

abstract: the use of textiles in transportation may be associated with the need to combine comfort and functionality. the evolution of each means of transportation, along with the increased awareness about the importance of passengers' safety, pushed research and development forward towards a larger use of lightweight, fire-resistant, nontoxic, and durable textiles in transportation. this in turn led to the development of test methods assessing the different aspects of textiles in transportation. this chapter starts with a presentation of the transportation textile market. then, different aspects of textile testing relevant to the transportation industry are discussed: safety, flammability, hygiene, performance of composite parts, and durability. the chapter ends with considerations regarding future trends for textiles in transportation.

the use of textiles in transportation may be associated with the need to combine comfort and functionality. sails were used in ships transporting goods; cushions and curtains were used in carriages or trains for the comfort of passengers; and airplane wings were made out of woven canvas, for instance.
the evolution of each means of transportation along with the increased awareness about the importance of passengers' safety pushed research and development forward towards a larger use of lightweight, fire-resistant, nontoxic, and durable textiles in transportation. this chapter starts with a presentation of the transportation textile market. then, different aspects of textile testing relevant to the transportation industry are discussed. the chapter ends with considerations regarding future trends for textiles in transportation. the international trade fair for technical textiles, techtextil, has set up a classification for technical textiles according to their area of application (techtextil, 2016) . one of these twelve areas-which also include agrotech, buildtech, medtech, and oekotech-is mobiltech symbolized by a tire symbol, which covers textiles for ship and aircraft construction as well as all aspects of automobile, railway, and space travel. the mobiltech sector is in full growth due to the rapid development of all means of transportation and increased safety requirements for passengers (bertrand, 2005) . for instance, the need for fire-retardant materials in the case of a plane or train crash or a car accident is critical to limit the rate of ignition of fabrics and flame propagation; the extra time it will give passengers to exit the wreck will definitely increase their chances of survival. reinforcements, filters, fiber-reinforced composites, thermal and acoustic insulators, etc. they are used in cars; in recreational vehicles such as motorcycles, snowmobiles, campers, and quads; in public transportation such as buses and streetcars; and in commercial vehicles such as trucks, fire trucks, ambulances, army vehicles, and construction trucks. textile exports, when considered separately from clothing, accounted for $314 billion in 2014 (wto, 2015) . 
rising concerns for road traffic safety have motivated governments to impose stricter legislation and apply new standards in automotive transportation, such as the compulsory use of 3-point seatbelts and the presence of airbags for drivers as well as front and rear passengers. a study performed by markets and markets analysis predicts that the increase in the demand for premium cars in regions such as europe, north america, and asia pacific should drive the airbag and seatbelt markets upward to $23.6 and $9.0 billion respectively by 2019 (m&m, 2014a). in the area of composites, the value of the automotive market is expected to reach more than $7 billion by 2019, with a compound annual growth rate of 22% for carbon fiber-reinforced polymer composites (m&m, 2014b). indeed, textile-reinforced composites offer an interesting opportunity to help solve the weight challenge (wilson, 2015). in 2014, the automotive industry produced 58,000 tons of carbon fiber-reinforced composites, 1.6 million tons of natural fiber-reinforced composites, and 3 million tons of glass fiber-reinforced composites. the use of technical textiles in aircraft has long contributed to the reduction in weight and the associated reduction in fuel expenses. but even though the introduction of mandatory seatbelts and the presence of life vests under the seats have contributed to saving many lives, the presence of synthetic fabrics in airplanes has also been the cause of many deaths. according to a study led by the national transportation safety board on 26 plane crashes that occurred between 1983 and 2000, among the 45% of passengers who did not survive the accidents, 26% lost their lives because of the violence of the impact while almost 5% died of fire-related causes (ntsb, 2001). in addition, it was observed that, in crashes with a higher survivability rate after impact, the rate of fire-related casualties was higher.
as a matter of fact, many polymers used in seats, curtains, cushions, carpets, etc. do not display sufficiently high flame retardancy and/or they generate toxic fumes while burning (bertrand, 2005). furthermore, in the last 20 years, the proportion of fiber-reinforced composites used in the aerospace industry has increased from 10% to 50%, as in the airbus a350 xwb or the boeing 787 dreamliner (airbus, 2016). the objective is to reduce the overall weight of the structure and cut fuel consumption. these materials also require less maintenance thanks to their greater resistance to corrosion. however, the cost of the carbon fiber raw material is still higher than that of steel or aluminum, for instance, despite the twofold drop over the last decade made possible by the increased production volume and the development of new manufacturing processes. carbon fiber-reinforced composites are gaining ground and becoming competitive thanks to the reduction in defects and cycle time as well as the development of new resin systems and new fiber placement and composite manufacturing processes (rao, simha, rao, & ravi kumar, 2015); this provides interesting perspectives for aircraft manufacturers willing to push innovation forward in this sector of industry. textiles also hold a large place in marine transport (singha & singha, 2012): sails, inflatable crafts, life jackets, personal flotation devices, hovercraft skirts, reinforcement for composite parts, securing and lifting webbings, upholstery, etc. in fact, some even wonder whether, with a compound annual growth rate expected to exceed 6% by 2020, marine composites might not be one of the key components of the future of the textile industry (technavio, 2016). indeed, marine vessels made of textile-reinforced composites are about 50% lighter than those made of steel and more than 30% lighter than those made of aluminum.
in terms of performance, the technical textiles used in the marine industry are subjected to strict requirements to ensure the safety of the passengers and merchandise being transported. flame resistance may thus be a critical criterion for some applications. in addition, requirements may include resistance to puncture and tearing as well as to ultraviolet rays, sea salt, and humidity exposure. these highly demanding requirements have pushed innovation forward within this sector; the use of new materials has been investigated in particular to ensure the durability of the structures. as a matter of fact, boat hulls are now made of glass fiber-reinforced composite with a vinylester matrix, which provides increased resistance to uv exposure and salt corrosion (miyano & nakada, 2009). other high-performance materials used in marine transportation include spectra®, a high-strength/high-modulus extended-chain polyethylene fiber whose chemical and abrasion resistance is superior to that of aramids; trevira, a heat-treated polyester fiber fabric; and carbon fibers (singha & singha, 2012). the railway industry is another important market for technical textiles. it comprises trains and subways. the growth is fueled by the sustained demand in asia and western europe as well as rising markets in africa/middle east and eastern europe (statista, 2016). textiles are used in flooring (carpets), seats, ceilings, and curtains, as well as reinforcement for composite parts. the main driver for textile use in rail transport is, once again, weight. requirements for textiles in trains include nontoxicity of smoke, flame resistance, low abrasion, and durability. in particular, the toxicity of smoke in the case of a fire and emissions of volatile organic compounds (voc) as a result of use are of large concern.
examples of requirements for fabrics and composites used in passenger rail transportation may be found in the standard nfpa 130 (2017) relative to fixed guideway transit and passenger rail systems. it covers seat upholstery, covers, window shades, woven seat cushion suspensions, and other fiber reinforced composite body parts. mass transportation is subject to very high standards in terms of passengers' safety. these translate into high requirements set by national and international regulations on fabric and composite properties and performance. on the other hand, recreational motor vehicles such as snowmobiles, motorcycles, jet skis, and sailing ships are not bound to such strict specifications, for instance, in terms of safety or durability. requirements on constitutive materials are thus less demanding. the aerospace industry has also benefited from the significant improvements in terms of mechanical properties of technical textiles. for instance, the thermal protection system of space rockets/shuttles may include fibrous silica batting, graphite rayon fabric-reinforced phenolic resin composite, aramid felts, and/or alumina fiber cloth (milgrom, 2013) . on the other hand, solar sails, made of aluminized carbon fiber fabrics, for instance, may open up new regions of the solar system to exploration by allowing the development of propellant-free interplanetary space crafts (johnson, 2012) . for most applications, the performance of technical textiles used in mass transportation is highly regulated for the sake of the passengers' safety. for that purpose, standards and specifications have been developed and are enforced by national regulations. tight control of raw materials and components by manufacturers of finished products is also observed. 
over the years, the technical textile market has experienced steady growth, increasing from less than 15 million tons and about $80 billion in 1995 to more than 23.5 million tons and about $130 billion in 2010 (david rigby associates, 2003). the trend is not slowing down: a technavio analysis has forecasted that the global technical textile market will reach a compound annual growth rate of 3.71% over the period 2014-19 (technavio, 2014). given the broad use of technical textiles on all five continents, this large growth rate has led to a need for manufacturers of technical textiles worldwide to standardize their testing methods for transportation applications. in particular, the requirements of manufacturers of transportation equipment have increased over time as regulations towards passengers' safety have become more stringent (pamuk & çeken, 2009). part and component suppliers are also experiencing an increase in competition and a request for improvement in their product quality in order to keep their original partnerships and comply with market expectations. being able to rely on suppliers' constant quality is a concern for manufacturers of transportation equipment. some original equipment manufacturers (oem) and their clients have built strong partnerships based on the sharing of common values regarding quality and safety. the key is to offer clients the best products available on the market at the desired price.
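the growth figures quoted above can be turned into a compound annual growth rate with the usual formula cagr = (end/start)^(1/years) − 1; a small sketch using the 1995 and 2010 market values from the text (the function name is ours):

```python
# compound annual growth rate from two market snapshots; the 1995 ($80b) and
# 2010 ($130b) values are the ones quoted in the text.

def cagr(start_value, end_value, years):
    """cagr = (end/start)**(1/years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

growth = cagr(80e9, 130e9, 2010 - 1995)
print(f"{growth:.2%}")  # roughly 3.3% per year, comparable to the 3.71% forecast cited
```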
since the creation of the international organization for standardization (iso) in 1947, a series of technical committees have been formed to support the development of standards for the various transportation means, including the following (iso, 2016):

• iso/tc 22 for road vehicles, which was formed in 1947 and has published 839 standards;
• iso/tc 20 for aircraft and space vehicles, which was also formed in 1947 and has published 649 standards;
• iso/tc 8 for ships and marine technology, formed in 1947 as well, which has published 288 standards;
• iso/tc 269 for railway applications, which was formed in 2012 and has published 2 standards.

the establishment of these reference testing methods has helped engineers develop products that can easily be exported worldwide. at the state level, standardization activities in the field of transportation equipment have also been conducted within the american society for testing and materials (astm), the canadian standards association (csa), the association française de normalisation (afnor), the deutsches institut für normung (din), the british standards institution (bsi), the italian organization for standardization (uni), the japanese engineering standards committee (jesc), and the standardization administration of china (sac), for instance. in addition, the society of automotive engineers (sae) is a us-based association whose mission includes the development of standards. initially limited to the automobile industry, its scope was broadened in 2016 to include engineers in all types of mobility-related professions. passengers' safety is the major concern for mobiltech textile manufacturers and has to be ensured by a tight control of the fire performance of the textile components and assemblies. great efforts have been devoted over the last years to developing fire-resistant and nontoxic-smoke-generating textiles for airplanes, trains, ships, and road vehicles.
in parallel, standards were established to confirm the level of performance of these materials. for instance, the national fire protection association (nfpa) in the united states has published a series of standards for textiles used in transportation. these documents include good practices in terms of testing methods of the fire behavior on fibers, fabrics, and composites, and the measurement of the level of toxicity of the fumes generated upon burning. the use of these standards is not limited to the united states and has found a large echo among manufacturers of transportation equipment. a certain hierarchy may be observed regarding fire protection and smoke toxicity requirements. in the case of aircrafts, the us federal aviation administration is the leading authority (horrocks, 2013) . at a second level, aircraft manufacturers (boeing, bombardier, airbus, etc.), as well as part and component suppliers set the performance thresholds along with the certification procedures. for trains, the nfpa is a leader for urban communities (for subways and commuter trains) and states (for other rolling stocks) in north america. in europe, the standard series en 45545 relative to fire protection on railway vehicles, which was published in 2010, is gradually being adopted and should prevail over state authorities as well as rolling stock manufacturers. some textiles used for mobiltech applications are also subject to various mechanical constraints. fiber reinforced composites used for structural applications such as in airplane fuselages or wings, car doors, roofs and bodywork parts have to maintain their mechanical properties for the entire lifetime of the structure. it is therefore critical, especially when passengers' safety is at stake, to assess the durability of structural parts and observe the damages induced by mechanical stressors, including impact, fatigue, and abrasion. 
in addition, technical textiles used for carpets, curtains, upholstery, and headliners may also have to resist use and abuse, including acts of vandalism. in europe, fabrics are increasingly used for railway seat manufacturing in order to reach passengers' expectations in terms of comfort but also reduce the overall weight of the train. these fabrics have to be particularly resistant to tearing, cutting, marker and paint, chewing gum stains, etc. resistance to these aggressors has to be assessed by laboratory tests prior to introducing the product on the market. many textiles used inside private or public means of transportation also serve an aesthetic function; thus, this aspect has to be maintained as much as possible over the lifetime of the product. for that purpose, color fastness to ultraviolet or perspiration exposure along with crocking tests are conducted on dyed fabrics, and resistance to pilling is assessed on carpets, curtains, and upholstery. all of these tests are required by car, airplane, train, and ship manufacturers and must be conducted by mobiltech textile suppliers. these tests may also be stipulated by competent authorities such as the national highway traffic safety administration (nhtsa) in the united states, motor vehicle safety (mvs) in canada, or the european council for automotive r&d (eucar) in europe. the testing of automotive equipment safety has been implemented for a long time by the car manufacturers themselves. some major groups such as ford motor co. and general motors corp. in the united states have invested millions of dollars to evaluate safety accessories in high-tech research centers (johnson, 2005; white, 2015) . for instance, ford invested $65 million in 2005 to improve its safety testing facilities. efforts include engineering adaptive airbags and seatbelts. 
similarly, the suppliers of safety components and materials are expected to design and manufacture very high quality products with no room for failure, especially when it comes to tires, airbags, seatbelts, emergency handles, oxygen masks in airplanes, and life vests in airplanes and boats. this implies a high safety factor in the design and no manufacturing defects. the first airbag system for cars was developed in 1951 (hetrick, 1953) . it consisted originally of bladders inflated with compressed air. nowadays, they are made of plain weave super high tenacity nylon 6,6 or polyester (pet) fabrics (orme, walsh, & westoby, 2014) . some are coated with silicone and equipped with vents in the back as a way to allow a controlled deflation after impact. in addition, a large increase in the airbag inflation rate upon impact has been obtained. they now inflate through a chemical reaction generating nitrogen gas in <50 ms (tan & yu, 2012) . during a car crash, the kinetic energy of the car is released into the bodies of the passengers who are propelled toward the windshield. their acceleration relative to the car depends on the speed of the vehicle prior to the impact and the accumulation of energy during the crash (sobhani, young, logan, & bahrololoom, 2011) . as a complement to seatbelts, the presence of airbags is now considered as a key factor for survival during a crash. airbags are controlled by a central unit which monitors a series of sensors, including accelerometers that can detect variations in the vehicle speed indicative of a crash. due to the forces involved and the speed of the airbag deployment and inflation, it has to be constructed out of very high strength fabrics so that it does not explode, distort, or tear during the two most critical phases of its action: inflation and impact of the body. the airbag has thus to be designed so that it does not fail catastrophically, doesn't allow tear propagation, and includes very high safety factors. 
a first step in assessing airbag requirements consists of evaluating the different speeds at which bodies will be propelled toward the airbags depending on the weight of the body and the initial car velocity. in the case of a crash involving two moving vehicles, the kinetic energy of the crash may double if the collision is frontal. to help car manufacturers determine the requirements of their airbags, sobhani et al. (2011) developed a model for estimating injury severity in the case of a two-vehicle crash based on the calculation of the collision kinetic energy using the car speed, total mass, angle of collision, etc. tests assessing the performance of airbags in the united states may be conducted on the entire system using unbelted 50th-percentile size and weight male instrumented crash dummies seated in the front of a vehicle impacting a fixed rigid barrier perpendicular to its axis of travel at speeds of up to 48 km/h (fmvss 208). injury criteria are set in terms of head acceleration, thoracic acceleration, chest deflection, force transmitted axially through the upper leg, and level of neck injury. in addition, all portions of the dummy shall remain inside the passenger compartment. by comparison, the corresponding european test method involves belted crash test dummies (ece 94, 2003). the airbag module may also be evaluated using a test stand that simulates deployment conditions in a vehicle (astm d5428, 2008). the performance is assessed in terms of the variation in the airbag pressure and geometry over time as well as the final condition of the airbag module components. requirements may also be set on the airbag fabric, yarn, and sewing thread (fung & hardcastle, 2001). efforts to provide standardized test methods to that extent have been carried out by the astm subcommittee d13.20 on inflatable restraints, the society of automotive engineers (sae), and car manufacturers, for instance.
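the kinetic-energy reasoning above can be sketched numerically. this is not sobhani et al.'s model, only the elementary e = ½mv² arithmetic it builds on, with an assumed vehicle mass:

```python
import math

def kinetic_energy_kj(mass_kg, speed_kmh):
    """e = 1/2 * m * v^2, with speed converted from km/h to m/s; result in kj."""
    v = speed_kmh / 3.6
    return 0.5 * mass_kg * v * v / 1000.0

# an assumed 1500 kg car at the fmvss 208 barrier-test speed of 48 km/h:
barrier = kinetic_energy_kj(1500, 48)

# frontal crash of two such cars: the energy to be dissipated doubles
frontal = 2 * kinetic_energy_kj(1500, 48)
assert math.isclose(frontal, 2 * barrier)
print(round(barrier, 1), round(frontal, 1))  # -> 133.3 266.7
```

this is why a frontal two-vehicle collision is treated as roughly twice as demanding as the single-vehicle barrier test at the same speed.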
table 14.1 provides a list of fabric properties relevant to airbags with examples of associated test methods. in addition, the fabric may be inspected for imperfections using astm d5426 (2012). the presence of seat belts in cars dates back to 1930, after two american doctors found that crash-related injuries in cars were largely related to the motion of passengers under the effect of their kinetic energy and could be prevented by an appropriate retention system (imre & cotetiu, 2014). lap or 2-point seat belts became compulsory in europe for front car seats in 1965. a further improvement was the introduction of 3-point seat belts that restrain the motion of the chest toward the steering wheel or the front panel, a motion which proved fatal in many cases: they are now mandatory in front and back seats of cars in many countries and are also used in a series of other road vehicles such as trucks and in the front seats of other commercial vehicles such as buses and ambulances. lap seat belts are still found in buses and aircraft for passenger seats. in the case of child safety seats, as well as for aircraft and racing car pilots for instance, 4-, 5-, or 6-point seat belts are used with additional buckling belts over the other shoulder and between the legs. current 3-point seat belt systems comprise a piece of webbing, a retractor, a pillar-loop, a tongue, a buckle, and a 3rd-point anchor. the webbing is generally a multiple-layer woven fabric made of high-tenacity continuous-filament polyester yarns (fung & hardcastle, 2001). it has to have very high strength and good abrasion resistance, be soft and flexible in the longitudinal direction and rigid in the transverse direction, and resist uv degradation. according to the current process in the car industry, the information about seat belt components is provided to the component manufacturer in a design goal document (dgd) (imre & cotetiu, 2014).
this includes the positioning of each seat belt component, which depends on the inside layout of the car, as well as the vehicle destination market, which will dictate which standards the seat belt system has to comply with. for instance, europe refers to ece r16 (restraints and safety belts) and ece r44 (child restraints in power driven vehicles), while requirements for seat belts used in the united states are provided in fmvss 208 (occupant crash protection), fmvss 209 (seat belt assemblies), fmvss 210 (anchorages for seat belt assemblies), and fmvss 213 (child restraint systems). if a car manufacturer decides to develop a new component for its seat belts, it will go through several steps of validation before having its component accepted by the competent committees and organizations. computer aided design (cad) will help model the component and choose the raw materials. it is also used for running virtual failure analysis through a failure mode and effects analysis (fmea) and stress accumulation calculation by finite element analysis (fea). then tests will be run on prototypes, including abrasion, puncture resistance, stiffness, elongation at break, ultimate tensile strength, accelerated fatigue testing, and uv, heat, and humidity aging. finally, a serial process control (spc) will determine the stability and predictability of the process. as in the case of airbags, seat belts may be tested as a system using a 50th percentile adult male dummy in the driver and passenger front seats of a vehicle subjected to a frontal barrier crash test, a lateral moving barrier crash test, and a rollover test (fmvss 208). the frontal barrier crash test involves the vehicle impacting a fixed rigid barrier perpendicular to its axis of travel at speeds up to 48 km/h. in the lateral moving barrier crash test, the vehicle is impacted laterally on either side by a barrier moving at 32 km/h. the rollover test is conducted at 48 km/h over a concrete surface.
depending on the test performed, injury criteria may be set in terms of head acceleration, thoracic acceleration, chest deflection, force transmitted axially through the upper leg, and/or level of neck injury. in addition, all portions of the dummy shall remain inside the passenger compartment. requirements may also apply to the webbing materials. the performance factors to be assessed include the following (tc tsd 209, 2013):
• width after conditioning for at least 24 h in an atmosphere having a temperature of 23°c and a relative humidity between 48% and 67%;
• breaking strength measured with a rate of grip separation between 51 and 102 mm/min;
• elongation when subjected to a force of 11,120 n;
• resistance to abrasion, measured in terms of residual breaking strength;
• resistance to light, measured in terms of residual breaking strength and color retention after 100 h exposure to type e carbon-arc at 60°c; and
• resistance to microorganisms, measured in terms of residual breaking strength.
while they cannot be considered a safety accessory per se, tires play a very large role in ensuring the safety of the vehicle. at the interface with the ground, they are a key component that determines the vehicle behavior while it accelerates and brakes. developed and patented in the mid-19th century by a scottish engineer, r. w. thomson, pneumatic tires take advantage of the exceptional performance of vulcanized rubber (fung & hardcastle, 2001). a scottish-born veterinary surgeon, john boyd dunlop, rediscovered pneumatic tires at the end of the nineteenth century while trying to improve the rolling comfort of his son's tricycle. his design involved the use of a fabric as reinforcement for the rubber. another large step was made in 1946 by michelin, who invented the radial tire method of construction. the more stable structure it produces allows a large increase in tire longevity, driving safety, and fuel efficiency compared to the original cross-ply construction.
radial tires have now taken over most of the passenger road vehicle market, and the use of cross-ply tires is limited to some applications for trucks, trailers, farm equipment, and emerging markets (lindenmuth, 2006). car and light truck radial tires contain about 5% polymer textiles for the carcass and 10%-12% steel for the belt, which provide strength to the tire (mcdonel, 2006). the amount of textile in cross-ply tires is larger, at about 21% (fung & hardcastle, 2001). textiles in radial carcasses are primarily composed of polyester, while they are mostly nylon for cross-ply structures (mcdonel, 2006). aramids are also used when a high strength-to-weight ratio and temperature resistance are required, for instance, in aircraft and racing cars (fung & hardcastle, 2001). a small amount of aramid and nylon may be found as well in belt overlays (mcdonel, 2006). other tire components include natural and synthetic rubber, reinforcing fillers such as carbon black and silica, and various additives including vulcanization agents, plasticizers, stabilizers, antioxidants, and antiozonants (lindenmuth, 2006). some tests are performed on the whole tire structure as described for instance in the us federal motor vehicle safety standard (fmvss 139) or in its european counterpart (ec 661, 2009). in the case of pneumatic tires, they cover the tire dimensions, high speed performance, endurance, low inflation pressure performance, and strength. in addition, the tire textile components may also be tested at the yarn, cord, and fabric scale (astm d885, 2010). when goods are transported by road, boat, train, and air, they have to be strongly secured to keep them from tilting, sliding, or moving around, which would be a source of danger and/or might damage them (dvsa, 2017). straps and slings are generally used to ensure this function. they may also be employed as a way to lift objects to load them on and unload them from the transportation means.
they are generally made with high-tenacity polyamide, polyester, or polypropylene multifilament. a series of european standards cover the safety aspects of textile slings: flat woven webbing slings made of manmade fibers (en 1492-1, 2008), roundslings made of manmade fibers (en 1492-2, 2008), and lifting slings made from natural and manmade fiber ropes (en 1492-4, 2008). a fourth standard for disposable flat woven slings is in preparation. properties and performance of slings assessed in these standards comprise the type of polymer and yarn; the webbing construction, width, thickness, and tenacity; the change in webbing width under load; the working load limit and minimum failure force; and the interaction of the sling with fittings. a description of requirements and test methods may also be found in asme b30.9 (2010) for slings used in conjunction with cranes and ansi b77.1a (2012) for passenger ropeways (aerial tramways, aerial lifts, surface lifts, tows, and conveyors), for instance.

14.3 testing related to flammability, smoke generation, and toxicity

tests concerning flammability, smoke generation, and toxicity are very common for a large number of transportation applications because reduced flammability is one of the main safety requirements of almost all textiles used for passenger transportation. however, test methods may vary depending on the conditions the textile will be exposed to while in service. in addition, performance thresholds may also depend on the type of transportation means. for instance, the same textile flammability test may be associated with a more constraining threshold for aerospace applications than for automotive ones. for automotive applications, standards concerning flame resistance of materials have been prepared by international and national organizations as well as car manufacturers.
for instance, volvo has broadened the scope of the nhtsa flammability test for motor vehicle interior materials, fmvss 302, by conducting the test on specimens aged for 14 days at 38°c and 95% rh and at 70°c in addition to conditioned specimens (vcs 5031,19, 2004). some countries have also adopted modified versions of standards issued by organizations such as nfpa, the eu council, and nhtsa in the united states. the most common flammability test for road vehicle interior materials measures the horizontal burning rate of specimens exposed to a low-energy flame for 15 s in a combustion chamber (fmvss 302; iso 3795, 1989; astm d5132, 2011; vcs 5031,19, 2004). the test also determines if and when the flame self-extinguishes. it applies to textiles situated within 13 mm of the occupant compartment air space in passenger cars, multipurpose passenger vehicles, trucks, and buses, as well as in tractors and agriculture and forestry machinery. other flammability test methods may involve a test chamber replicating the section of a school bus to assess the burning behavior of upholstered seating used in school buses (astm e2574, 2012). the chamber is equipped with two ventilation openings at each end and holds three rows of seats. the ignition source consists of a propane gas burner installed either on top of or under one of the seats. the flammability performance is assessed in terms of the time elapsed between ignition and flame extinguishment, mass loss of the seat assembly, occurrence of flame spreading to other seats, and seat material melting or dripping. in europe, the european aviation safety agency (easa) has regulatory authority and executive tasks in the field of civil aviation safety. it works jointly with the national aviation authorities (naas) of the different european countries that are members of easa in order to ensure that airplane manufacturers and oems from these different countries fulfill the various standardization requirements and regulations.
easa also works on technical agreements with its counterparts in the world such as the federal aviation administration (faa) in the united states or the canadian aviation regulation (car). the faa publishes federal aviation regulations (far) and advisory circulars (ac), which provide guidance for compliance with airworthiness standards for airplane manufacturers. for instance, the faa standard on airworthiness of airplanes (far/cs 25.853) includes several flammability test methods that apply to textile components. easa for its part is currently reviewing its legislation on civil aircraft to take into account the increased air traffic and the arrival of new technologies such as drones (juul, 2016). in particular, it intends to better standardize the way flammability testing is conducted (easa cmcs-004, 2013). the nfpa standard for fixed guideway transit and passenger rail systems (nfpa 130, 2014) provides guidelines on tests to be performed on materials used in trains. their behavior in the case of a fire is characterized by their flame resistance and smoke emission. tests may be carried out on component materials or on a complete seat or mattress assembly. table 14.3 lists test methods recommended in the nfpa 130 standard (2014) as well as in other us standards such as astm e2061 (2015) and fra 216 (1999) of the federal railroad administration for the different types of textiles or textile-based items. the permanence of the surface flammability and smoke emission characteristics of the materials is also verified after dynamic fatigue testing using roller shear or constant force pounding (astm d3574, 2016), after washing according to the manufacturer's recommended procedure or astm e2061 (2015), and, if relevant, after dry cleaning according to astm d2724 (2007). in europe, a common strategy in terms of fire requirements and test methods for materials used in railway vehicles has been set with the recent publication of the standard en 45545-2 (2013).
requirements depend on the amount of material, its location, its use, and whether it is in contact with another material; three main categories of performance are measured. the international maritime organization (imo) has the worldwide authority to set new standards for safety, minimum official requirements, and environmental performance of ships. for instance, it has published fire test procedures (ftp) for testing the flammability of materials, including textiles, used onboard. table 14.4 provides a list of these different tests. the flammability test for vertically suspended textiles and films is also performed on specimens subjected to accelerated aging by dry-cleaning, laundering, water leaching, and weathering (imo ftp code, 2010). indications about fire test methods for textiles used in marine ships may also be found in the us code of federal regulations 46 cfr part 72.05-55 subsection relating to structural fire protection for furniture and furnishings for shipping. in addition, ship manufacturers may have to assess the fire characteristics of mattresses and bedding assemblies according to nfpa 267 (1998). the test method uses an open calorimeter environment to determine the heat release, smoke density, weight loss, and generation of carbon monoxide of mattresses and bedding assemblies exposed to a flaming ignition source. as closed environments, transportation vehicles may facilitate the transmission of infectious diseases, in particular airborne ones (santos o'connor, 2012). for instance, cases of influenza, severe acute respiratory syndrome (sars), meningococcal disease, tuberculosis, and measles transmission onboard planes are reported relatively frequently. cruise ships, trains, and school buses have also been documented as potential vehicles for the spreading of various diseases.
in an effort to improve air quality, cabin interior air filter systems equipped with nonwoven filters have now become a requirement for many transportation vehicles. textile filters are also used in other parts of vehicles for air, oil, and fuel filtration. a complementary strategy aimed at limiting the transmission of diseases is based on the use of antimicrobial textiles. transportation was the leading segment of nonwoven filter media applications in 2014 with 21.2% of the total market revenue (james, 2015). this trend has been maintained since then and may be attributed in part to tougher regulations towards the reduction in carbon emissions from automobiles. further, with respect to cars, large concerns have also been raised about the air quality inside the passenger compartment (fung & hardcastle, 2001). indeed, research has shown that, because of the tunnel effect, the exhaust gas concentration inside a car may be six times higher than outside. this phenomenon is amplified when a car is driven at a close distance from the preceding one. the question of inside air filtration is also critical in public transportation, especially for long distance travel segments (santos o'connor, 2012). cabin air filters may combine three mechanisms (fung & hardcastle, 2001): mechanical filtration of the solid particles through the pores of the nonwoven, electrostatic attraction of the solid particles on the charged nonwoven fibers, and adsorption of gases and removal of odors by activated carbon granules distributed across the nonwoven filter. the performance of air filters may be tested for particulate and gas filtration. for instance, standard iso/ts 11155-1 (2001) allows assessing the pressure loss, fractional filtration efficiency, and accelerated particulate holding capability of filter elements for road vehicle passenger compartments using standardized laboratory particulate challenges larger than 0.3 μm.
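the fractional filtration efficiency figure of merit can be sketched as a per-size-channel count ratio. this is only a minimal illustration of the quantity assessed, not the iso/ts 11155-1 procedure, and the particle counts below are hypothetical:

```python
def fractional_efficiency(upstream_counts, downstream_counts):
    """Fractional efficiency per particle-size channel:
    E_i = 1 - downstream_i / upstream_i (dimensionless, 0 to 1)."""
    return [1.0 - d / u for u, d in zip(upstream_counts, downstream_counts)]

# hypothetical counts for three size channels (e.g., 0.3-0.5, 0.5-1, 1-3 um)
upstream = [10000, 5000, 2000]
downstream = [3000, 500, 40]
eff = fractional_efficiency(upstream, downstream)
# eff -> [0.70, 0.90, 0.98]: coarser particles are captured more efficiently
```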
the dynamic gas adsorption of the passenger compartment air filters of road vehicles may be characterized according to the test methods described in iso/ts 11155-2 (2009). the air pressure loss as well as the gas and vapor removal characteristics are measured for a series of relevant contaminants. textiles used for seats, handles, carpets, etc. may also be treated to provide them with an antimicrobial function. the antimicrobial agent may be applied as a finishing treatment on the yarn or the fabric, or it may be incorporated into the polymer extrusion solution or the spinning bath (zhao & chen, 2016). in the latter case, it has to slowly migrate towards the fiber surface to provide its function during use. finishing treatments may use conventional exhaust and pad-dry-cure methods as well as newly developed padding, spraying, coating, foam finishing, and microencapsulation techniques. for instance, silver nanoparticles have recently generated interest because they appear to be a more health- and environmentally friendly antimicrobial alternative to halogenated phenols such as triclosan. in addition, bacteria are less prone to develop resistance to silver nanoparticles than to conventional antibiotics (chernousova & epple, 2013). other strategies involve the use of natural compounds such as chitosan to limit the occurrence of side effects on health and the environment (lim & hudson, 2004). the antibacterial activity of textile products may be quantified by directly inoculating fabric swatches with gram-positive staphylococcus aureus and/or gram-negative klebsiella pneumoniae organism cultures (aatcc tm100, 2012). the antibacterial activity value of the tested textile is computed using bacteria counts on the samples immediately after inoculation and after the desired contact period. other standard test methods provide alternative transfer and printing techniques for inoculation (iso 20743, 2013).
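the computation from the two bacteria counts can be sketched as a percent and a log reduction. this is a hedged illustration of the form of the calculation, not the full standard procedure, and the plate counts below are hypothetical:

```python
import math

def percent_reduction(count_after_inoculation, count_after_contact):
    """Percent reduction R = 100 * (B - A) / B, with B the count recovered
    immediately after inoculation and A the count after the contact period."""
    b, a = count_after_inoculation, count_after_contact
    return 100.0 * (b - a) / b

def log_reduction(count_after_inoculation, count_after_contact):
    """Log10 reduction of viable bacteria over the contact period."""
    return math.log10(count_after_inoculation / count_after_contact)

# hypothetical counts (cfu per sample): 1e5 at time zero, 1e2 after 24 h
r = percent_reduction(1.0e5, 1.0e2)   # -> 99.9 percent
lr = log_reduction(1.0e5, 1.0e2)      # -> 3.0 log reduction
```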
more details about antibacterial efficiency test methods are available in chapter 6. fiber reinforced composites have taken an increasing share of transportation applications. glass reinforced plastics were developed in the 1920s. one of their first uses in transportation can be dated back to the late 1940s, with boat hulls made of glass fiber reinforced polyester resin (bunsell & renard, 2005). composites were introduced in military aircraft in the 1960s. the renault 5 was the first car, in 1972, to have a bumper made of glass fiber reinforced polyester. since then, the development of new fibers, for instance carbon and aramid fibers, has allowed an increase in the performance of composites and in the ratio of composite parts in all types of vehicles. for instance, the new airbus a350 xwb contains 53% fiber reinforced composites, which can be found in the wings, fuselage, empennage, and belly fairing as well as in the skin panels, doublers, joints, and stringers (airbus, 2016). this has allowed airbus to save 20% in mass compared to aluminum and 25% in fuel consumption. it has also brought a strong improvement in the resistance of parts to corrosion and fatigue. the aston martin one-77 structural core is made of a carbon fiber composite monocoque, making the car weigh only 1500 kg (aston martin, 2008). in railway applications, alstom teamed up with france's national railways (sncf) to develop train noses made of carbon fiber reinforced composite in order to reduce the overall weight and improve the aerodynamics of high speed trains (mason, 2004). however, all of these composite parts have to go through intense testing in order to fulfill the requirements on which the passengers' lives depend. these tests may be conducted at the preform and/or composite stage and may involve destructive and nondestructive test methods. textile reinforcements for composites include yarns, strands, tows, or rovings (strong, 2008).
they may be used directly, for instance, in filament winding or fiber placement processes. they may also be transformed into more complex textile structures such as wovens, knits, and braids with anisotropic properties in all three directions. on the other hand, nonwovens or mats are manufactured directly using fibers or chopped strands that are more or less randomly distributed in the plane of the structure. recently, noncrimp fabrics (ncf) have been developed to combine the strength and perfect fiber placement of wovens with the flexibility, ease of manufacture, and absence of fiber crimp of nonwovens (schnabel & gries, 2011). the 2-d sheet materials may then be assembled by stitching, z-pinning, or tufting, for instance, to create complex shaped, 3-d structures that will be cut to shape to be fitted in the composite part mold (mouritz, 2011). near net-shape preforms may also be prepared directly using 3-d textile manufacturing techniques, which include 3-d interlock and orthogonal noncrimp weaving as well as 3-d braiding. this allows improving the pace of the process and the quality of the composite. the 2-d reinforcements and 3-d preforms may also be delivered as prepregs, i.e., with the textile coated with a resin that is only partially cured (strong, 2008). prepregs need to be stored in cold conditions to prevent premature complete polymerization. the properties of the textile reinforcement as well as its compatibility with the matrix play a major role in the performance of the composite material (strong, 2008). indeed, the reinforcement gives the composite its strength and stiffness because it bears the stress applied on the composite part, which is transferred by the matrix.
properties measured on the reinforcement include the following:
• density of the 1-d raw material (fiber/filament/yarn/strand/tow/roving), e.g., using archimedes' method (astm d3800, 2016) or a pycnometer (astm d70, 2009; astm d5550, 2014);
• elastic modulus, strength, and elongation at break of the 1-d raw material, e.g., using astm d4018 (2017) for continuous filament carbon and graphite fiber tows;
• dry uniaxial bending of the reinforcement, e.g., based on standard test astm d1388 (2014) or using the apparatus developed by jldain (2015). buckling of inner yarns may be observed using a translucent bending surface;
• draping behavior of the reinforcement, e.g., using the double curvature technique developed by harrabi et al. (2008);
• in-plane shear of the reinforcement using a bias extension or picture frame technique (long, boisse, & robitaille, 2005). in-plane shear is considered to be the main deformation mechanism taking place when the reinforcement is formed to a 3-d geometry. this test also provides a measurement of the maximum deformation, with the yarn locking angle;
• biaxial in-plane tension of the reinforcement, e.g., using the biaxial tensile device with cruciform specimens described in long et al. (2005). it allows characterizing the warp and weft yarn interaction in woven fabrics;
• dry compaction of the reinforcement, e.g., adapted from standard test iso 5084 (1996). the test involves the application of compression cycles and provides a measurement of the preform thickness under a certain level of normal stress as well as the through-thickness rigidity of the material. it also gives an estimate of the maximum fiber volume fraction that can be achieved (robitaille & gauvin, 1998);
• pore size distribution in the reinforcement, e.g., using microscopy, x-ray microtomography, and capillary flow porometry (bonnard, causse, & trochu, 2017).
this last technique allows accessing the dual scale structure of fibrous reinforcements;
• resin permeability of the reinforcement using unidirectional injection or bidirectional flow measurement from a pointwise injection gate (demaría, ruiz, & trochu, 2007). the test may be conducted with 100 cp silicon oil, which behaves as a newtonian fluid.
in addition, the type of reinforcement also has a large impact on the manufacturing process, which in turn controls the performance of the final composite part. defects in the textile reinforcement may also induce reduced performance and/or premature failure in the composite part. these defects include fiber misorientation (saboktakin, dolez, & vu-khanh, 2011), broken fibers (mouritz, 2011), wrinkling (zhu, yu, zhang, & tao, 2011), and local strain concentration due to stitching (chen, endruweit, harper, & warrior, 2015). preform inspection may be conducted using x-ray microtomography (desplentere et al., 2005) or by reconstructing geometries from sections obtained by electronic microscopy (blanc, germain, da costa, baylou, & cataldi, 2006). depending on its functions, the composite part will have to meet different requirements (mallick, 2007). various destructive test methods exist to characterize the properties as well as the short- and long-term performance of composite parts. some tests are conducted on composite specimens (table 14.5). oems have also developed some application-specific test procedures that they may perform on full-scale parts (kia, 2012; perret, mistou, fazzini, & brault, 2012). as an alternative to destructive testing, nondestructive test (ndt) methods allow looking for defects and damage inside composite parts without cutting them apart or even decreasing their performance. these techniques are critical for in-service inspection but may also be useful for production quality control as well as rapid sample testing.
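the estimate of the maximum fiber volume fraction obtained from the compaction test mentioned above can be sketched with the standard areal-weight relation vf = n·aw / (ρf·t). the fabric values below are hypothetical:

```python
def fiber_volume_fraction(areal_weight, n_plies, fiber_density, thickness):
    """Fiber volume fraction of a compacted preform:
    Vf = (n_plies * areal_weight) / (fiber_density * thickness).
    Units: areal_weight in kg/m^2, fiber_density in kg/m^3, thickness in m."""
    return (n_plies * areal_weight) / (fiber_density * thickness)

# hypothetical example: 4 plies of a 0.450 kg/m^2 glass fabric
# (fiber density 2560 kg/m^3) compacted to a 1.4 mm cavity
vf = fiber_volume_fraction(0.450, 4, 2560.0, 1.4e-3)
# vf -> roughly 0.5; a thinner cavity (harder compaction) raises vf
```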
they are categorized as contact and noncontact methods in table 14.6. for instance, x-ray computed tomography and ultrasound-based techniques were successfully used to detect two significant process-induced defects, namely fiber breakage and ply misorientation, in woven-reinforced composites manufactured by vacuum assisted resin transfer molding (saboktakin rizi, 2013). in addition, new ndt techniques are continuously developed. for instance, a micro-vibrothermography device was designed to detect deep submillimeter flaws in stitched t-joint carbon fiber reinforced polymer (cfrp) composites (zhang et al., 2016). a burst of ultrasound waves is delivered to the specimen with a 200 pa ultrasound excitation transducer. the temperature profile is captured with an infrared camera equipped with a microlens. the same team also developed a microlaser line thermography technique, which was successfully used to detect internal microporosities in the same stitched t-joint cfrp composites. if materials and parts used in transportation have to display a certain level of performance when they come out of the production line, they also have to maintain this performance as much as possible over the entire life of the vehicle. for instance, cars today are built to last about 250,000 miles (gorzelany, 2013). on the other hand, aircraft lifespan is determined by the number of takeoff and landing cycles they experience (maksel, 2008); aircraft that only make long flights may last more than 20 years because of the lower number of pressurization cycles. however, textiles used in floorings, upholstery, and draperies in the interior cabin will not last as long and may need to be cleaned, repaired, and/or replaced at regular intervals.
the performance of textile and textile-based materials related to durability for transportation applications covers several aspects (national research council, 1995). table 14.6 lists nondestructive test methods for textile-reinforced composites (gholizadeh, 2016): pulse echo ultrasonic testing, through transmission ultrasonic testing, acoustic emission, radiography (e.g., x-ray tomography), electromagnetic testing (e.g., eddy current), thermography (e.g., infrared testing), liquid penetrant testing, holography, magnetic particle testing, shearography, and visual inspection. a first set of aging agents are environmental. indeed, temperature, humidity, and uv radiation may induce changes in material properties and performance over time. for instance, some polymers like polyolefins are prone to thermo- and photo-oxidation degradation, while polyester and polyamide are especially sensitive to hydrolysis, and polyvinylchloride may discolor and become brittle at high temperatures (verdu, 1984). polymers and natural fibers may also be degraded by microorganisms. the resistance of textiles and textile-based materials to heat, moisture, water spray, and uv aging is usually tested using conditions that accelerate the aging process to reduce the duration of the test. weathering programs simulating climatic conditions to which the material is likely to be subjected in service will combine cycles with variations in temperature, water, and/or radiation conditions. resistance to microorganisms may also be assessed to evaluate the effect of bacteria, mildew, and rot. the durability is generally characterized in terms of color change as well as the effect on mechanical and other physical properties. table 14.7 provides a list of test methods used in the transportation industry to characterize the resistance of materials to environmental aging. some of them have been developed by private companies, e.g., peugeot citroën (psa test methods in table 14.7) and general motors (gm test methods).
aging may also result from normal wear as well as accidental or intentional damage during service. damage may be generated by a mechanical action, for instance crocking, pilling, abrasion, snagging, tear, laceration, etc. (kern, 2014). it may also be of a chemical nature, for example, resulting from a water leak, fluid spill, perspiration, soiling by a sick passenger, etc. if the item is designed to be laundered, colorfastness and resistance to laundering cycles may also have to be assessed. the effect may be visual, with a change in color or a stain. a loss in performance and/or physical integrity of the material may also be observed. table 14.8 provides a list of test methods used in the transportation industry to characterize the resistance of materials to aging due to service conditions. some of them have been developed by private companies like volkswagen (pv test methods in table 14.8) (fung & hardcastle, 2001; ul, 2016). fatigue is a specific aspect of material aging where the part is subjected to loading/unloading cycles. this mechanism is critical for the durability of rigid materials such as textile-reinforced composites, but it is also relevant for textiles and coated textiles, which are more flexible. first of all, fibers themselves exhibit fatigue failure (miraftab, 2009). loading may be applied in tension, lateral compression, flexion, and torsion. the damage may occur during the manufacture of the woven or braided structure, for instance, as well as during use. one typical example is the cyclic fatigue experienced by a tire cord. in addition to the mechanical stress mode and conditions of application (frequency, amplitude, offset, etc.), other parameters may affect the fatigue failure of a fiber: its composition and manufacturing process/conditions, its dimensions, the presence of impurities, its environment (temperature, humidity, uv, ph, bacteria), etc. the measurement is conducted by applying cyclic loading conditions in a specified environment.
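fatigue data of this kind are commonly summarized by fitting basquin's relation s = a·n^b, which is a straight line in log-log coordinates. the sketch below fits synthetic data only; a real fatigue analysis must also handle runouts and scatter, and the numbers are hypothetical:

```python
import math

def fit_basquin(stresses, cycles_to_failure):
    """Least-squares fit of Basquin's relation S = A * N**b in log-log
    space; returns (A, b), with b < 0 (strength drops as cycles grow)."""
    xs = [math.log10(n) for n in cycles_to_failure]
    ys = [math.log10(s) for s in stresses]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    log_a = my - b * mx
    return 10.0 ** log_a, b

# synthetic s-n data generated from S = 500 * N**-0.1 (hypothetical units)
cycles = [1e3, 1e4, 1e5, 1e6]
stresses = [500.0 * n ** -0.1 for n in cycles]
A, b = fit_basquin(stresses, cycles)
# the fit recovers A close to 500 and b close to -0.1
```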
the result is expressed in terms of stress vs. number of cycles to failure (s-n) curves and survival diagrams. fatigue may also be experienced at the textile structure scale. yet, not much research appears to have been done in that field. in the case of flexural fatigue of woven fabrics, it was shown that the performance depends on the structure of the fabric, the position and structure of the yarn, and the yarn material (schiefer & boyland, 1942). surface fatigue also contributes to wear behavior upon abrasion (özdil, kayseri, & mengüç, 2012). fatigue cracks form at the surface of the material due to alternating compression-tension stresses and propagate to subsurface regions where they may rejoin. fatigue failure performance is a major component in composite part design and has been the focus of several studies (carvelli & lomov, 2015). a series of test methods for textile-reinforced composites has also been developed by various standardization organizations (table 14.9). in addition, it was shown in a study that the s-n curve should be combined with the time-temperature superposition principle to take into account the effect of the environmental conditions, in that case temperature and water absorption (miyano & nakada, 2009).

since the beginning of the 21st century, the rising concern about global warming has led the scientific community to integrate sustainable development in its research projects and look for more eco-friendly solutions in the design of new products: improvement in process manufacturing (lower energy consumption and reduction of waste), product lifetime (lightweight fabrics, low voc emission, maintenance-free systems), and product afterlife (reuse or recycle).
as a result, green fabrics and composites are becoming an interesting alternative to many synthetic products, with the use of natural fibers, biosourced resins, and nontoxic and biodegradable dyes and finishes (chard, creech, jesson, & smith, 2013). it also pushes research toward the reduction of greenhouse gas emissions through the development of new lightweight and durable materials for transportation. natural fibers have been used for some time as a replacement for glass and carbon fibers in composites for nonstructural applications. indeed, in addition to their low cost, they display a low density, which can lead to reduced energy consumption, as well as competitive specific mechanical properties. they are also biodegradable. in the automotive industry, for instance, jute was used by mercedes-benz in 1996 for its e-class vehicle door panels (koronis, silva, & fontul, 2013). a blend of flax and sisal later found its way as reinforcement into audi's 2000 a2 midrange car door trim panels; kenaf into toyota's 2003 spare tire cover; bamboo fibers into mitsubishi motors' interior components; and wheat straw into the storage bin and inner lid of ford's 2010 flex crossover vehicle. natural fibers are now considered for more high-performance applications thanks to improvements in their compatibility with polymer matrices (pickering, efendy, & le, 2016). for instance, a green hydrophobic treatment based on zinc oxide nanorods and stearic acid was developed for recycled jute fibers (arfaoui, dolez, dubé, & david, 2017). however, some issues remain, including the inherent variability in their physical properties as well as their poor moisture resistance and limited thermal stability (koronis et al., 2013). this requires some adjustments in testing programs to ensure that they perform as required by the application.
it may be noted that, to the authors' knowledge, natural fiber-based textiles have not found an application in transportation by themselves, i.e., without being combined with a polymer matrix. this may be attributed to their short service life resulting from their biodegradability. biosourced resins provide an interesting alternative to the recycling dilemma of composites. for instance, toyota has been using polylactic acid (pla), a biodegradable thermoplastic polyester derived from renewable resources, in the spare tire cover of its 2003 raum (koronis et al., 2013); in that case, it was derived from sugar cane and sweet potato. other bioderived resins foreseen as a matrix for green composites include poly-l-lactide (plla), polyhydroxybutyrate (phb), poly(3-hydroxybutyrate-co-3-hydroxyvalerate) (phbv), and thermoplastic starch. current issues that limit the use of biosourced resins in the transportation industry include their propensity to biodegrade and their high price (koronis et al., 2013). a debate also exists on whether or not they represent a real sustainable alternative to conventional plastics: they should not reduce the amount of cultivated food available to humans and animals, for instance by decreasing the amount of fertile land devoted to edible crops or by increasing land clearing. many dyes and finishes currently used in the textile industry are highly pollutant and require very large amounts of water and energy for processing. for instance, brominated flame retardants still constitute the most common solution used for seating, curtains, carpets, etc. in transportation vehicles (icl, 2012). yet, their effects on health and the environment have been clearly demonstrated (see section 7.6 in chapter 7 on toxicity testing of textiles). other toxic chemicals that may be found in textiles include heavy metals, toxic dyes, pesticides, phthalates, nonylphenol ethoxylates, dioxins, and furans.
in addition, some finishes like polybrominated diphenyl ether (pbde) used in airplane carpets are responsible for the emission of vocs in confined environments (allen et al., 2013). large efforts are currently deployed to develop eco-friendly additives and finishes for textiles. for instance, phosphorus-based compounds (salmeia, gaan, & malucelli, 2016) as well as nanoclay and carbon nanotube composites (arao, 2015) are considered as alternatives to halogenated fire retardants. natural dyes may also ultimately replace the toxic synthetic dyes currently used in the textile industry (bechtold, turcanu, ganglberger, & geissler, 2003); in addition, their application does not require the use of solvents or other chemicals, and they lead to a reduction in the chemical load released with waste waters. tougher regulations are being set to better control the amount of chemicals used in textile processes and limit those that are the most toxic. for instance, nonylphenol ethoxylates (npe), which have been banned from use within the european union for 20 years, have recently been voted by all eu member states to be excluded from textile imports as well (flynn, 2015). in response to the growing interest of consumers in green products and sustainable development, a majority of textile companies now include environmental management systems (ems) such as iso 14001 and/or adherence to voluntary eco-labels as part of their business model (see chapter 7 on toxicity testing of textiles). conferring multifunctionality to fabrics is one of the main contemporary goals of technical textile manufacturers. a fabric that can be made fire-resistant with integrated nanotechnologies such as carbon nanotubes (cnt) or nanoclays while being hydrophobic, self-cleaning, and antibacterial at the same time is gold for public transportation manufacturers (alongi, carocio, & malucelli, 2013).
cnt may also be added to composites to provide them with electrical conduction capabilities, thus reducing the use of cables for carrying information or electricity. in addition, they can be used for the in-situ detection of defects, cracks, or delamination (nofar, hoa, & pugh, 2009); they have even displayed an increased sensitivity compared to strain gauges. finally, smart textiles are opening new paths in the transportation industry. for instance, smart seat belts for airplanes have been developed by ctt group in partnership with belt-tech (decaens & vermeersch, 2016); this smart belt sends a signal if it is not buckled. these technologies imply the need to develop new testing methods, or adapt existing ones, in order to take into account the new material or functionality. in that case, the durability of the connective wire inside the seat belt should be assessed against environmental and service aging, among others. the remarkable evolution of technical textiles for transportation over the last century has been driven by a constant concern for passengers' safety. ensuring a comfortable and secure journey is also a key marketing strategy for aerospace, railway, marine, and automotive manufacturers, as well as a path towards bigger market shares. these companies are thus pushing research forward into developing state-of-the-art textile products respecting specifications but also featuring unique performance in order to stay a step ahead of their competitors. this has led to the development of test methods assessing the different aspects of textiles in transportation. this includes performance related to safety for airbags, seat belts, tires, and slings; flammability, smoke generation, and toxicity, with test methods and specific requirements for each type of application; hygiene for filters and antimicrobial textiles; destructive and nondestructive tests conducted on textile-reinforced composites at the textile and composite level; and durability.
technical textiles are promised a bright future in transportation, with new developments involving natural fibers, biosourced resins, eco-friendly additives and finishes, and multifunctional and smart materials, among others. more information about test methods may be obtained from dedicated committees in various organizations, including the following:

- antibacterial finishes on textile materials: assessment of (american association of textile chemists and colorists)
- flammability lab
- a350 xwb-cost-effectiveness
- exposure to flame retardant chemicals on commercial airplanes
- smart (nano) coatings
- update on flame retardant textiles: state of the art, environmental issues and innovative solutions
- passenger ropeways-aerial tramways, aerial lifts, surface lifts, tows and conveyors-safety requirements
- flame retardancy of polymer nanocomposite
- development and characterization of a hydrophobic treatment for jute fibres based on zinc oxide nanoparticles and a fatty acid
- safety standard for cableways, cranes, derricks, hoists, hooks, jacks, and slings-slings
- standard test method for density of semi-solid bituminous materials (pycnometer method)
- standard test methods for tire cords, tire cord fabrics, and industrial filament yarns made from manufactured organic-base fibers
- standard test method for stiffness of fabrics
- standard test methods for bonded, fused, and laminated apparel fabrics
- standard test methods for flexible cellular materials-slab, bonded, and molded urethane foams
- standard test method for density of high-modulus fibers
- standard test methods for properties of continuous filament carbon and graphite fiber tows
- standard guide for testing polymer matrix composite materials
- test method for horizontal burning rate of polymeric materials used in occupant compartments of motor vehicles
- standard practices for visual inspection and grading of fabrics used for inflatable restraints
- standard practice for accelerated aging of inflatable restraint fabrics
- standard practice for evaluating the performance of inflatable restraint modules
- standard practice for determining physical properties of fabrics, yarns, and sewing thread used in inflatable restraints
- standard test method for specific gravity of soil solids by gas pycnometer
- standard guide for testing fabric-reinforced "textile" composite materials
- standard guide for fire hazard assessment of rail transportation vehicles
- standard test method for fire testing of school bus seat assemblies
- aston martin-one-77
- natural dyes in modern textile dyehouses-how to combine experiences of two centuries to meet the demands of the future
- moving fabric
- fiber orientation measurements in composite materials
- experimental characterization of the pore size distribution in fibrous reinforcements of composite materials
- fundamentals of fibre reinforced composite materials
- 2016 top markets report-technical textiles-a market assessment tool for
- fatigue of textile composites
- green composites: sustainability and mechanical performance
- inter-ply stitching optimisation of highly drapeable multi-ply preforms
- silver as antibacterial agent: ion, nanoparticle, and metal
- technical textiles and nonwovens: world market forecasts to
- wearable technologies for ppe: embedded textile monitoring sensors, power and data transmission, end-life indicators
- in-plane anisotropic permeability characterization of deformed woven fabrics by unidirectional injection. part i: experimental results
- micro-ct characterization of variability in 3d textile architecture
- load securing: vehicle operator guidance
- notification of a proposal to issue a certificate
- memorandum on flammability testing of interior materials
- type-approval requirements for the general safety of motor vehicles, their trailers and systems, components and separate technical units intended therefor
- textile slings-safety-part 1: flat woven webbing slings made of man-made fibres for general purpose use
- textile slings-safety-part 4: lifting slings for general service made from natural and man-made fibre ropes
- railway applications. fire protection on railway vehicles. requirements for fire behaviour of materials and components
- airworthiness standards: transport category airplanes. 14 cfr part. dc: united states department of transportation, federal aviation administration
- eu countries agree textile chemical ban. the guardian
- occupant crash protection. 49 cfr 571.208-federal motor vehicle safety standard no. 208
- flammability of interior materials. 49 cfr part 571.302-federal motor vehicle safety standard no. 302
- passenger equipment safety standards. 49 cfr part 216
- textiles in automotive engineering
- a review of non-destructive testing methods of composite materials
- cars that can last for 250,000 miles (or more)
- characterization of protective gloves stiffness: development of a multidirectional deformation test method
- safety cushion assembly for automotive vehicles. united states patent and trademark office
- flame retardant textiles for transport applications
- fire protection for automotive and transportation
- international code for application of fire test procedures
- contribution to validation and testing of seatbelt components
- about iso-what are standards?
- road vehicles, and tractors and machinery for agriculture and forestry-determination of burning behaviour of interior materials
- textiles-determination of thickness of textiles and textile products
- reaction to fire tests-spread of flame-part 2: lateral spread on building and transport products in vertical configuration
- plastics-smoke generation-part 2: determination of optical density by a single-chamber test
- reaction-to-fire tests-heat release, smoke production and mass loss rate-part 1: heat release rate (cone calorimeter method) and smoke production rate (dynamic measurement)
- reaction to fire tests for floorings-part 1: determination of the burning behaviour using a radiant heat source
- reaction to fire tests-ignitability of products subjected to direct impingement of flame-part 2: single-flame source test
- textiles-assessment of the ignitability of bedding items-part 2: ignition source: match-flame equivalent
- textiles-determination of antibacterial activity of textile products
- reaction-to-fire tests-full-scale room tests for surface products-part 2: technical background and guidance
- road vehicles-air filters for passenger compartments-part 1: test for particulate filtration
- road vehicles-air filters for passenger compartments-part 2: test for gaseous filtration
- nonwoven filter media market to reach $7.18 billion by 2022
- behaviour and inspection of novel non-crimp dry thick reinforcement fabrics
- ford beefs up its safety testing facilities
- solar sail propulsion
- new civil aviation safety rules
- interior care and maintenance. aviationpros. retrieved from www.aviationpros.com (accessed
- focal project 4: structural automotive components from composite materials. automotive composites consortium
- green composites: a review of adequate materials for automotive applications
- application of a fiber-reactive chitosan derivative to cotton fabric as an antimicrobial textile finish
- an overview of tire technology. the pneumatic tire
- design and manufacture of textile composites
- automotive airbag and seat belt market for passenger cars-a global analysis: by geography-trends and forecasts
- automotive composites market by type (glass fiber composites, carbon fiber composites, natural fiber composites, metal matrix composites, and ceramic matrix composites) and by application (interior components, exterior components, chassis & powertrain components and others)-global trends & forecast to
- what determines an airplane's lifespan?
- fiber-reinforced composites: materials, manufacturing, and design
- composites aboard high-speed trains
- tire cord and cord-to-rubber bonding. the pneumatic tire
- space shuttle thermal protection system. ibook (18 p.)
- basic principles of fatigue
- accelerated testing for long-term durability of various frp laminates for marine use
- three-dimensional (3d) fiber reinforcements for composites
- fire- and smoke-resistant interior materials for commercial transport aircraft
- standard for fixed guideway transit and passenger rail systems
- standard method of test for fire characteristics of mattresses and bedding assemblies exposed to flaming ignition source
- self-sensing glass/epoxy composites using carbon nanotubes
- survivability of accidents involving part 121 u.s. air carrier operations
- equivalency or compromise? a comparative study of the use of nylon 6,6 and polyester fiber in automotive airbag cushions
- analysis of abrasion characteristics in textiles
- research on the breaking and tearing strengths and elongation of automobile seat cover fabrics
- global behaviour of a composite stiffened panel in buckling
- a review of recent developments in natural fibre composites and their mechanical performance
- carbon composites are becoming competitive and cost effective
- compaction of textile reinforcements for composites manufacturing. i: review of experimental results
- integrity assessment of preforms and thick textile reinforced composites for aerospace applications
- damage identification of thin textile-reinforced composite plates using vibration-based testing methods
- recent advances for flame retardancy of textiles based on phosphorus chemistry
- emerging and re-emerging infectious diseases
- note on flexural fatigue of textiles
- production of non-crimp fabrics for composites
- applications of textiles in marine products
- a kinetic energy model of two-vehicle crash injury severity
- rail industry-statistics & facts
- fundamentals of composites manufacturing. materials, methods and applications
- new device design and performance test on gas generator in automobile airbag
- technical standards document no. 209, revision 2r-seat belt assemblies
- global marine composites market
- global technical textiles market application areas
- automotive testing and engineering services
- flammability of interior materials
- le vieillissement des plastiques
- gm paving way to smarter and safer driving at all-new active safety test area
- vehicle weight is the key driver for automotive composites. reinforced plastics
- international trade statistics
- comparative study of microlaser excitation thermography and microultrasonic excitation thermography on submillimeter porosity in carbon fiber reinforced polymer composites
- halogenated phenols and polybiguanides as antimicrobial textile finishes
- experimental investigation of formability of commingled woven composite preform in stamping operation

the authors wish to thank mr. valério izquierdo, amir fanaei, and jonathan levesque for their assistance in preparing the manuscript.
key: cord-290251-ihq8gdwj authors: hasell, joe; mathieu, edouard; beltekian, diana; macdonald, bobbie; giattino, charlie; ortiz-ospina, esteban; roser, max; ritchie, hannah title: a cross-country database of covid-19 testing date: 2020-10-08 journal: sci data doi: 10.1038/s41597-020-00688-8 sha: doc_id: 290251 cord_uid: ihq8gdwj our understanding of the evolution of the covid-19 pandemic is built upon data concerning confirmed cases and deaths. this data, however, can only be meaningfully interpreted alongside an accurate understanding of the extent of virus testing in different countries. this new database brings together official data on the extent of pcr testing over time for 94 countries. we provide a time series for the daily number of tests performed, or people tested, together with metadata describing data quality and comparability issues needed for the interpretation of the time series. the database is updated regularly through a combination of automated scraping and manual collection and verification, and is entirely replicable, with sources provided for each observation. in providing accessible cross-country data on testing output, it aims to facilitate the incorporation of this crucial information into epidemiological studies, as well as track a key component of countries’ responses to covid-19. across the world, researchers and policymakers look to confirmed counts of cases and deaths to understand and compare the spread of the covid-19 pandemic. however, data on cases and deaths can only be meaningfully interpreted alongside an accurate understanding of the extent and allocation of virus testing 1 . two countries reporting similar numbers of confirmed cases may in fact have very different underlying outbreaks: other things being equal, a country that tests less extensively will find fewer cases. 
many countries now publish official covid-19 testing statistics, but the insights offered by these numbers remain relatively unexplored both in public discourse and scientific research. this may be because of barriers limiting access to this data: the statistics are scattered across many websites and policy documents, in a range of different formats. no international authority has taken on the responsibility for collecting and reporting testing data. we developed a new global database to address this lack of access to reliable testing data, thereby complementing the available international datasets on death and case counts 2 . the database consists of official data on the number of covid-19 diagnostic tests performed over time across 94 countries (as of 31 august 2020). we rely on figures published in official sources, including press releases, government websites, dedicated dashboards, and social media accounts of national authorities. we do not include in our database figures that explicitly relate to only partial geographic coverage of a country (such as a particular region or city). the resulting database is (i) updated regularly through a combination of automated scraping and manual collection and verification, and (ii) entirely replicable, with sources provided for each observation. in addition, the database includes extensive metadata providing detailed descriptions of the data collected for each country. such information is essential due to heterogeneity in reporting practices, most notably regarding the units of measurement (people tested, cases tested, tests performed, samples tested, etc). series also vary in terms of whether tests pending results are included, the time period covered, and the extent to which figures are affected by aggregation across laboratories (private and public) and subnational regions. 
the comprehensiveness of our database enables comparisons of the extent of testing between countries and over time - in absolute terms, but also relative to countries' population, and to death or confirmed case counts (fig. 1). such variation offers crucial insights into the pandemic. at the most basic level, it is clear that a country that tests very few people - such as the democratic republic of congo, or nigeria (fig. 1a) - can only have very few confirmed cases. the number of performed tests should be seen as an upper limit for the number of confirmed cases. further, high positive test rates (fig. 1 - see reference lines) may help identify severe underreporting of cases. the relationship between test positivity rate and case underreporting has been explored in the context of other infectious diseases 3. in terms of covid-19, this link is discussed by ashish jha and colleagues at the harvard global health institute, who provide a sketch of the relationship between cases, deaths and the positivity rate in the united states (see https://globalepidemics.org/2020/04/18/why-we-need-500000-tests-per-day-to-open-the-economy-and-stay-open). in a more formal analysis, golding et al. 4 find that their modelling estimates of the case ascertainment rate are weakly correlated (kendall's correlation coefficient of 0.16) with the number of tests per case - the inverse of the test positivity rate - derived from our database, with a positive relationship evident in the range of 10-35 tests per case 4. the institute for health metrics and evaluation (ihme) includes testing data sourced from our database in their covid-19 models (www.healthdata.org/covid/faqs#differences%20in%20modeling). in bringing this data together, our hope is that we will facilitate future research in this direction.
more generally, our aim is to provide an essential complement to counts of confirmed cases and deaths. these are the figures that guide public policy, both in the initiation of control measures and as they start to be relaxed. but without the context provided by data on testing, reported cases and deaths may offer a very distorted view of the true scale and spread of the covid-19 pandemic. the database consists of two parts, provided for each included country: (1) a time series for the cumulative and daily number of tests performed, or people tested, plus derived variables (discussed below); (2) metadata including a detailed description of the source and any available information on data quality or comparability issues needed for the interpretation of the time series. for most countries, a single time series is provided: either for the number of people tested, or the number of tests performed. for a few countries for which both are made available, both series are provided. in such cases, metadata is provided for each separate series. data collection methods. the time series data is collected by a combination of manual and automated means. the collection process differs by country and can be categorized into three broad categories. firstly, for a number of countries, figures reported in official sources - including press releases, government websites, dedicated dashboards, and social media accounts of national authorities - are recorded manually as they are released. secondly, where such publications are released in a regular, machine-readable format, or where structured data is published at a stable location, we have automated the data collection via r and python scripts that we execute every day. these are regularly audited for technical bugs by checking their output against the original official sources (see 'technical validation', below).
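the automated step described above can be pictured as parsing each day's published figure and appending it to a running log. the json field names and payload format below are hypothetical, since every country's source differs; this is a sketch of the idea, not the project's actual scripts:

```python
import json

def record_snapshot(payload, log):
    """Parse one daily 'snapshot' published by a (hypothetical) official
    JSON endpoint and append a (date, cumulative total) pair to the log.
    Real endpoints differ by country and outputs are audited against
    the original official sources."""
    data = json.loads(payload)
    log.append((data["date"], int(data["total_tests"])))
    return log

log = []
record_snapshot('{"date": "2020-08-30", "total_tests": 123456}', log)
record_snapshot('{"date": "2020-08-31", "total_tests": 125000}', log)
```

the sequence of such snapshots is what the cumulative and daily series are later constructed from.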
lastly, in some instances where manual collection has proven prohibitively difficult, we source data from non-official groups collecting the official data, most often on github. these are also regularly audited for accuracy against the original official sources (see 'technical validation', below). any available information on data quality or comparability issues needed for the interpretation of the time series is gathered and summarized manually into detailed metadata for each series, guided by a checklist of data quality questions (see data records, below). pcr-based tests have in general been the basis for covid-19 case confirmation, in line with who recommendations 5. since the primary purpose of the database is to provide information on testing volumes specifically to aid the interpretation of data on confirmed cases, it is exclusively this category of testing technologies that the database aims to include. in order to be included, a data point for a given country must report an aggregate figure that includes both negative tests (or negatively tested individuals) plus positive tests (or confirmed cases). the units (whether the number of tests or individuals is being counted) must be consistent across positive and negative outcomes. the aggregate figure must refer to a known time period - for instance, the number of tests performed in the last day or week. however, where a cumulative total is provided, it is not a requirement that the specific start date to which the cumulative count relates be specified, provided that it is clear that the figure aims to capture the whole of the relevant outbreak period. figures relating to testing 'capacity', to rough indications of average testing output, or to the number of tests that have been distributed (rather than actually performed) are not included in the database. where figures for pending tests are provided separately by a source, these are excluded from our counts.
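the inclusion rules just listed can be expressed as a simple check. the flag names below are illustrative, not actual fields in the database:

```python
def meets_inclusion_criteria(obs):
    """Return True if an observation satisfies the inclusion rules
    described above: it aggregates negative and positive results, in
    consistent units, over a known time period, and is not a mere
    capacity or test-distribution figure. (Illustrative flags only.)"""
    required = (
        "includes_negatives",
        "includes_positives",
        "consistent_units",
        "known_time_period",
    )
    if not all(obs.get(k, False) for k in required):
        return False
    # capacity figures and counts of distributed tests are excluded
    return not obs.get("capacity_or_distribution_only", False)
```

in practice these judgments are made manually per source, but the check makes the decision rule explicit.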
where they cannot be separated, the figures including pending tests are reported. details concerning pending tests for individual countries can be found in the metadata. the database provides a time series for both the cumulative number of tests (or people tested) and for daily new tests. exactly how these series are derived depends on the way the raw data is reported by the source. where a source provides a complete time series for daily tests, we derive an additional cumulative series as the simple running total of the raw daily data. where a source provides cumulative figures, we derive an additional daily series as the day-to-day change observed in consecutive observations. in many cases the source data is not available at a daily frequency (fig. 2). in order to facilitate cross-country comparisons over time, we derive an additional 'smoothed' daily testing series calculated as the seven-day moving average over a complete, linearly interpolated daily series (described in more detail in the data records section below). retrospective revisions in the source data. due to the efforts to produce timely data, official testing figures are subject to frequent retrospective revisions. this can occur, for instance, where some laboratories have longer reporting delays than others, and previously uncounted tests are then subsequently included. this issue presents no difficulties where sources provide an updated time series within which such revisions are appropriately incorporated; for instance, by backdating the additional tests to the date they were performed. however, a number of the sources we rely on provide only a 'snapshot' of the current cumulative figure, with no time series. we construct our cumulative and daily testing time series from the sequence of these 'snapshots'. for these cases, retrospective revisions do impact our data, since revisions are included on the day they are made, not when the revised tests occurred.
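the day-to-day differencing of cumulative 'snapshots' described above can be sketched as follows; note how a retrospective downward revision surfaces as a negative daily value. the figures are hypothetical:

```python
def daily_from_cumulative(totals):
    """Derive daily changes from consecutive cumulative 'snapshots'.
    The first observation has no computable change (None). A negative
    value signals a retrospective downward revision in the source data,
    not a true negative testing volume."""
    return [None] + [curr - prev for prev, curr in zip(totals, totals[1:])]

# hypothetical snapshots; the third reflects a downward revision
changes = daily_from_cumulative([1000, 1500, 1400])
```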
typically, this results in only small deviations in the cumulative figure in proportional terms, but the derived daily testing series can be impacted more meaningfully. at the extreme, in a few cases, such revisions result in a fall in the cumulative total from one day to the next, implying a negative number of tests for that day. this issue is mitigated in two ways. firstly, given that much of the retrospective revision relates to testing conducted over the last few days, the 'smoothed' daily time series we derive reduces some of the artificial volatility introduced. secondly, we alert the user as to which data is subject to such concerns as part of the information included in the metadata (see below). a copy of the database has been uploaded to figshare 6. this provides a version of the database as it stood at the time of submission, on 31 august 2020. a live version of the database, which continues to be updated, can be downloaded from a public github repository (https://github.com/owid/covid-19-data/tree/master/public/data/testing) in csv, xlsx, and json formats, which may be imported into a variety of software programs. structure. the database consists of two components: a time series file including observations of cumulative and daily testing (covid-testing-all-observations.csv), and metadata (covid-testing-source-details.csv). each row in the metadata table provides source details (discussed below) corresponding to a given country-series (i.e., the combination of the country and series fields makes up a unique id within covid-testing-source-details.csv). the time series for cumulative and daily testing for each country-series is then provided in the covid-testing-all-observations.csv file. in addition, we provide the raw data (raw-collected-data.csv), as collected from the source, in order to make it plain how our time series data is constructed from the original observations.
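since the files are plain csv, they can be read with standard tools. the rows below mimic the kind of fields described in this section; the column headers, figures, and the population value are illustrative, not the exact headers or data of the published files:

```python
import csv
import io

# illustrative rows loosely modeled on the observations file;
# the published CSV's actual column headers differ
sample = io.StringIO(
    "country,date,units,cumulative_total,daily_change\n"
    "Iceland,2020-05-01,tests performed,50000,800\n"
    "Iceland,2020-05-02,tests performed,50900,900\n"
)
rows = list(csv.DictReader(sample))

# per capita measures divide by the UN 2020 population estimate;
# the population figure here is an assumed round number
population_2020 = 364_000
per_thousand = float(rows[-1]["cumulative_total"]) * 1000 / population_2020
```

the same pattern applies to the metadata file, joining on the country and series fields.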
we also provide the united nations population data for 2020 (un-2020-population.csv) used to derive the per capita measures included in the time series. raw-collected-data.csv. country. each observation relates to testing conducted within the indicated country. we do not include in our database figures that explicitly relate to only partial geographic coverage of a country (such as a particular region or city). the country's 3-letter iso 3166-1 code is also provided as a separate field. units. a short description of the unit of observation of the collected testing figures, selected out of three possible categories: "people tested", "tests performed", "samples tested". series for which it was not possible to discern the category are labelled as "units unclear". series. multiple series (e.g. people tested and samples tested) are included for some countries, and are demarcated by this field. common to covid-testing-all-observations.csv and raw-collected-data.csv. date. depending on the source, this may relate to the date on which samples were taken, analyzed, or registered, or simply the date they were included in official figures (see 'retrospective revisions in the source data', above). in general, sources try to provide testing data relating to a given, stable cut-off time each day. where significant changes in reporting windows have been found, these have been noted in the notes field (see below). cumulative total. the reported cumulative amount of testing as of date. the specific date to which the cumulative figures date back, if known, is provided in the metadata (see below). in many cases this is not explicitly stated by a source, but only figures that appear to intend to capture the entire period of the testing response to the covid-19 outbreak within the country are included in the database.
in covid-testing-all-observations.csv, for those sources only providing daily testing figures, this field is derived as the running total of the raw daily data, and is also provided per thousand people of the country's 2020 population.

daily change in cumulative total. broadly, this field may be interpreted as the number of new tests (or people tested) per day. for sources that report new tests per day directly, this field in covid-testing-all-observations.csv is identical to the raw data presented in raw-collected-data.csv. for sources that report only cumulative testing figures, the field is derived as the day-to-day change observed in consecutive observations of the raw cumulative total data. this may fail to correspond to the true number of new tests for that date where the source has included retrospective revisions in the cumulative totals (see 'retrospective revisions in the source data', above). in covid-testing-all-observations.csv, this series is also provided per thousand people of the country's 2020 population.

source url. a url at which the specific observation of the corresponding raw data can be found.

source label. the name of the source for the observation.

notes. contains any notes to aid the interpretation of this specific observation (above and beyond details that apply to the whole series, which are provided in covid-testing-source-details.csv).

specific to covid-testing-all-observations.csv.

7-day smoothed daily change. as an outbreak progresses, flows of new tests per day, rather than cumulative figures, become more relevant for understanding trends. daily testing figures, however, suffer from volatility created by reporting cycles. moreover, since many sources do not provide data at daily intervals, figures for new tests per day are available with more limited coverage. to aid the cross-country analysis of testing volumes over time, we provide this short-term measure of testing output that aims to mitigate these two problems.
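the derivations described — a running total from daily figures, a day-to-day change from cumulative figures (which can go negative where a source revises its totals downwards), and per-thousand scaling — can be sketched as follows (a minimal python illustration; the population figure is invented):

```python
# Sketch: the two derivations between cumulative and daily series, plus
# per-thousand scaling against a (hypothetical) 2020 population figure.

from itertools import accumulate

def daily_from_cumulative(cumulative):
    """Day-to-day change; may be negative where the source revised totals down."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

def cumulative_from_daily(daily):
    """Running total, for sources that only report new tests per day."""
    return list(accumulate(daily))

def per_thousand(series, population):
    return [x / population * 1000 for x in series]

cumulative = [100, 250, 400, 380, 500]       # day 4 revised downwards
daily = daily_from_cumulative(cumulative)    # note the negative entry
rebuilt = cumulative_from_daily([500, 700, 600, 800])
scaled = per_thousand(rebuilt, 1_000_000)    # hypothetical population
```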
it is calculated as the right-aligned rolling seven-day average of a complete series of daily changes. for countries for which no complete series of daily changes is available because of the reporting frequency of our source, we derive it by linearly interpolating the daily cumulative totals not available in the raw data, up to a maximum interval of 21 days. the exact code used to derive the 7-day smoothed daily change is available online (see 'code availability', below).

specific to covid-testing-source-details.csv.

number of observations. the number of days for which raw observations are available.

detailed description. a written summary of available information concerning the nature and quality of the source data needed for proper interpretation and cross-country comparison. the collation of this information is guided by a 'checklist' of data quality questions regarding: the unit of observation; which testing technologies figures relate to; whether tests pending results are included; the time period covered; and the extent to which figures are affected by aggregation across laboratories (private and public) and subnational regions. in practice the documentation we are able to provide is limited by that made available by the official source. we aim to include any information provided by the original source needed for interpretation and comparison with other countries.

coverage. the database includes observations for 94 countries, covering 69% of the world's population. because of differences in the frequency at which countries publish testing data, coverage is somewhat lower for more recent periods: 62% of the world's population is covered with figures relating to 30-31 august 2020; 45% is covered with figures relating to 24-31 august 2020 (fig. 2). the database represents a collation of publicly available data published by official sources.
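the smoothing step described above can be sketched as: linearly interpolate missing daily cumulative totals (up to a maximum gap, here 21 days), difference them, then take a right-aligned seven-day rolling mean. this is a simplified reimplementation for illustration, not the project's actual code (gaps longer than the maximum are simply assumed absent here):

```python
# Sketch: right-aligned 7-day rolling average of daily changes, with
# linear interpolation of cumulative totals where a source reports less
# often than daily (gaps longer than max_gap days are left missing).

def interpolate(cumulative, max_gap=21):
    """cumulative: list with None for unreported days; fill gaps linearly."""
    out = list(cumulative)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                    # find the next reported value
            gap = j - i + 1               # interval between known endpoints
            if i > 0 and j < len(out) and gap <= max_gap:
                step = (out[j] - out[i - 1]) / gap
                for k in range(i, j):
                    out[k] = out[i - 1] + step * (k - i + 1)
            i = j
        else:
            i += 1
    return out

def smoothed_daily(cumulative, window=7):
    filled = interpolate(cumulative)
    daily = [b - a for a, b in zip(filled, filled[1:])]
    # right-aligned rolling mean over complete windows only
    return [sum(daily[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(daily))]

series = [0, None, None, 30, None, 50, 60, 70, 80, 90]
smooth = smoothed_daily(series)
```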
as such, the key quality concern for the database itself is whether it represents an accurate record of the official data. we employ four main strategies for ensuring this.

firstly, all automated collection of data, whether obtained from official channels or from third-party repositories of official data, is subject to initial manual verification when it is added to our database for the first time.

secondly, we employ a range of data validation processes, both for our manual and automated time series. we continually check for invalid figures such as negative daily test figures, out-of-sequence dates, or test positivity rates above 100% (by comparing testing data to confirmed case data), and we monitor each country for abrupt changes in daily testing rates. abrupt positive or negative daily changes are sometimes the result of data corrections in the official data, in which case our database includes them without alteration. these changes can be due, for example, to the deduplication of double-counted tests, or the addition of testing data that was previously not captured by the national system (see table 1). in order to mitigate large impacts due to reporting lags, we automatically exclude the most recent observation for a country if its daily number of new tests is less than half that of the previous observation. this is only applied to the most recent day in each time series: as soon as data for subsequent days become available, the data point is reinstated if the sharp fall is still present.

thirdly, to monitor the ongoing reliability of third-party repositories of official data, we apply a continuous audit process, which will remain active as long as this dataset is updated. each day, three observations are randomly drawn out of all observations in the database that have been obtained via third-party sources.
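the validation rules and the daily audit draw described above can be sketched as follows (a simplified python illustration with invented record shapes, not the project's production code):

```python
# Sketch: basic validation checks over one country's daily series, the
# "drop the most recent day if it halves" lag rule, and the daily random
# draw of three third-party observations for manual auditing.

import random

def validate(daily_tests, daily_cases):
    """Return (index, problem) pairs for suspicious observations."""
    problems = []
    for i, (tests, cases) in enumerate(zip(daily_tests, daily_cases)):
        if tests < 0:
            problems.append((i, "negative daily tests"))
        elif tests and cases / tests > 1.0:
            problems.append((i, "positivity above 100%"))
    return problems

def drop_lagged_tail(daily_tests):
    """Exclude the latest observation if it is less than half the previous
    one (a likely reporting lag); it is reinstated once later data arrive."""
    if len(daily_tests) >= 2 and daily_tests[-1] < 0.5 * daily_tests[-2]:
        return daily_tests[:-1]
    return daily_tests

def draw_audit_sample(observations, k=3, seed=None):
    """Randomly pick k observations collected via third-party repositories."""
    third_party = [o for o in observations if o["via_third_party"]]
    return random.Random(seed).sample(third_party, min(k, len(third_party)))

issues = validate([100, -5, 200], [10, 2, 250])
trimmed = drop_lagged_tail([900, 1000, 400])
sample = draw_audit_sample(
    [{"id": n, "via_third_party": n % 2 == 0} for n in range(20)], seed=0)
```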
for each selected observation, the recorded figure is manually checked against the direct official channel from which the repository purports to obtain the data. the sampling rate means that each third-party source we make use of is checked around once a week. given that any discrepancies with official channels are likely to be clustered within particular sources, this provides a high degree of quality control on these sources on a timely basis. where any discrepancies are noticed, we switch sources (for the entire time series) to either a different repository or to manual data collection directly from the official channel.

finally, the testing data included in the database is viewed by tens of thousands of people every day, including many health researchers, policymakers and journalists, from whom we receive a large amount of feedback concerning the data. this serves as a final, 'crowd-sourced' method of verification that has proven very effective, enabling any discrepancies between our data and that published in official channels to be flagged and resolved quickly.

code used for the creation of this database is not included in the files uploaded to figshare. our scripts for data collection, processing, and transformation are available for inspection in the public github repository that hosts our data (https://github.com/owid/covid-19-data/tree/master/scripts/scripts/testing).

table 1. changes observed in the source data as of 31 august 2020.

references (titles as extracted):
case-fatality rate and characteristics of patients dying in relation to covid-19 in italy
an interactive web-based dashboard to track covid-19 in real time
malaria burden through routine reporting: relationship between incidence and test positivity rates
reconstructing the global dynamics of under-ascertained covid-19 cases and infections
advice on the use of point-of-care immunodiagnostic tests for covid-19: scientific brief

this project was funded from multiple sources, including general grants and donations to our world in data.
the following is a list of funding sources and affiliations.

grants: our world in data has received grants from the bill and melinda gates foundation, the department of health and social care in the united kingdom, and the german philanthropist susanne klatten.

sponsors: in addition to grants, our world in data has also received donations from several individuals and organizations: center for effective altruism (effective altruism meta fund), templeton world charity foundation, effective giving, the camp foundation, the rodel foundation, and the pritzker innovation fund.

diana beltekian: data recording, verification and analysis. charlie giattino: data recording, verification and analysis. bobbie macdonald: data recording, verification and analysis. esteban ortiz-ospina: data recording, verification and analysis.

the authors declare no competing interests. correspondence and requests for materials should be addressed to j.h. reprints and permissions information is available at www.nature.com/reprints. publisher's note: springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

this article is licensed under a creative commons attribution 4.0 international license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons license, and indicate if changes were made. the images or other third party material in this article are included in the article's creative commons license, unless indicated otherwise in a credit line to the material. if material is not included in the article's creative commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
to view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. the creative commons public domain dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the metadata files associated with this article.

key: cord-318570-wj7r6953
authors: xiao, yinzong; thompson, alexander j.; howell, jessica
title: point-of-care tests for hepatitis b: an overview
date: 2020-10-02
journal: cells
doi: 10.3390/cells9102233
doc_id: 318570
cord_uid: wj7r6953

despite the heavy disease burden posed by hepatitis b, around 90% of people living with hepatitis b are not diagnosed globally. many of the affected populations still have limited or no access to essential blood tests for hepatitis b. compared to conventional blood tests, which rely heavily on centralised laboratory facilities, point-of-care testing for hepatitis b has the potential to broaden testing access in low-resource settings and to engage hard-to-reach populations. few hepatitis b point-of-care tests have been ratified for clinical use by international and regional regulatory bodies, and countries have been slow to adopt point-of-care testing into hepatitis b programs. this review presents currently available point-of-care tests for hepatitis b and their roles in the care cascade, reviewing evidence for testing performance, utility, acceptability, costs and cost-effectiveness when integrated into hepatitis b diagnosis and monitoring programs. we further discuss challenges and future directions in aspects of technology, implementation, and regulation when adopting point-of-care testing in hepatitis b programs.

more than 257 million people, or 3.2% of the world's population, are estimated to be living with chronic hepatitis b virus infection, with the greatest disease burden in low-resource countries in the asia-pacific and sub-saharan africa [1].
without treatment, one in every four persons infected with chronic hepatitis b will develop liver cirrhosis over 20-30 years, and 2-5% of people with cirrhosis will develop liver cancer annually [2]. globally, over 800,000 deaths annually are attributable to hepatitis b infection [1, 3]. most of this disease burden is preventable by appropriate guideline-based treatment [4-8].

given the magnitude of the global public health burden from hepatitis b, the world health organization (who) has outlined ambitious hepatitis b elimination targets of a 65% reduction in mortality and a 90% reduction in incidence from baseline (2015) by 2030 [9]. however, current estimates suggest that we are a long way from achieving these goals unless investment and the care cascade are scaled up [1, 10]. hepatitis b vaccination has greatly contributed to preventing transmission and reducing hepatitis b incidence globally; however, vaccination coverage is still suboptimal in resource-limited regions [11], and most countries in africa have been unable to implement the hepatitis b birth-dose vaccine due to multiple logistical and cost barriers [12, 13]. meanwhile, for people who are already living with hepatitis b, receiving early diagnosis and clinical care is the key to reducing morbidity and mortality. however, in 2016, the who estimated that only 11% of people living with hepatitis b were diagnosed, among whom only 17% of those eligible were on treatment [1].

the hepatitis b cascade of care involves multiple steps: screening, diagnosis, linkage to care, assessment of liver disease stage and treatment eligibility, then treatment and/or monitoring, including surveillance for hepatocellular carcinoma (hcc) (figure 1). laboratory blood tests are required at every step of the care cascade, including blood tests for hepatitis b serology, quantitative hepatitis b virus (hbv) dna level by polymerase chain reaction (pcr) and liver function tests (figure 1).
these tests require laboratory resourcing, technology and expertise beyond existing peripheral laboratory capabilities in many low-resource and geographically isolated regions [14-16]. in many countries, laboratory services are centralised due to high costs and limited skilled technician capacity; however, transport of blood samples from regional to centralised laboratories presents its own challenges in geographically isolated or insecure regions, particularly if a cold chain must be preserved [14]. cost is another major limitation: prices for diagnostics have fallen only slowly over time compared with medication costs, and hepatitis b diagnostic tests cost more than therapy in many low-income countries [16, 17]. moreover, the requirement for lifelong monitoring for most people living with hepatitis b, involving regular blood tests [4, 5], combined with barriers to timely healthcare access such as hepatitis b-related stigma [18, 19], healthcare costs for users and providers [20] and the logistics of accessing consistent, high-quality, affordable healthcare services in a timely manner, is a major obstacle to receiving guideline-based care [16]. these barriers lead to significant attrition at every step of the hepatitis b care cascade over time, and those lost from care represent missed opportunities for treatment and liver cancer prevention [16, 21].

laboratory-based blood tests are required at every stage of the hepatitis b cascade of care for diagnosis, assessment of liver disease stage, treatment eligibility and long-term monitoring of disease progression. diagnostic testing for hepatitis b involves detection of hepatitis b surface antigen (hbsag) in blood, which indicates active infection with the virus. standard laboratory electro-chemiluminescence immunoassay-based hbsag testing is performed on serum or plasma samples derived from whole blood.
if active infection is confirmed, subsequent blood tests are performed to determine the stage of disease and the need for treatment, including a hepatitis b virus (hbv) polymerase chain reaction (pcr)-based quantitative dna level or viral load, hepatitis b e antigen (hbeag) and e antibody (anti-hbe) assays, and liver function tests to determine whether an elevated alanine aminotransferase (alt) indicative of liver inflammation or other signs of impaired liver function are present. further assessment for the presence of liver fibrosis and cirrhosis is also required, most commonly by transient elastography and/or liver biopsy. all patients, irrespective of treatment, require ongoing disease monitoring, including at minimum an hbv dna level, hbeag and hbeab (if not already seroconverted from hbeag positive to hbeab positive) and alt levels every 3-6 months.

point-of-care tests (pocs; also known as rapid diagnostic tests, rdts) are simplified versions of laboratory-based tests that have the potential to circumvent major barriers people face in accessing hepatitis b blood-based testing in various settings. pocs usually require small amounts of body fluid (for example, a finger-prick blood sample or oral swab) and have short turn-around times, and they are generally easy to use with minimal training; they can therefore be provided to people in a variety of community and outreach settings by a broad range of trained workers [22], and are scalable to rapidly reach large populations, as seen with the highly successful egyptian national hepatitis c screening program [23]. the simple collection process (finger-prick or mouth swab) is also highly acceptable, feasible and attractive to people undergoing testing [22, 24, 25]. a key benefit of pocs in the field of hepatitis b is to engage hard-to-reach communities for testing, such as using hbsag poc tests for hepatitis b screening in remote areas, or in harm reduction programs [24-27].
pocs also have great potential for retaining patients in care when used in the community for chronic hepatitis b stage evaluation and disease monitoring [26, 27]. figure 2 outlines the key phases of disease in chronic hepatitis b infection and the indicators for blood testing in each stage.

the who recommends that an ideal poc test meet the assured criteria of being "affordable, sensitive, specific, user-friendly, rapid and robust, equipment-free and deliverable to end-users" [28]. since 1998, the who has implemented an evaluation and performance assessment program for all pocs in viral hepatitis to report on accepted quality parameters for widespread clinical use [29]. many pocs have been developed in the field of hepatitis b, particularly for screening and diagnosis; however, only three pocs for detecting hbsag have been prequalified by the who [29]. there is currently a lack of pocs for hepatitis b stage assessment or monitoring that have been endorsed for use by the who; however, several novel tests now in clinical trials may fill this important care delivery gap.

typically, poc tests have lower accuracy than traditional laboratory-based tests, but they facilitate the triage of people who require more complex and expensive laboratory assays to confirm a positive poc test result, and thereby reduce costs. regulatory and economic constraints are additional barriers to transferring pocs to field use. pocs therefore require a comprehensive appraisal, in each setting, of factors including testing performance, feasibility (such as storage requirements and power supply), acceptability and cost-effectiveness when used to scale up access to hepatitis b diagnosis and management under real-life conditions. in this review, we outline the accuracy of available poc tests for hepatitis b and explore the evidence for their utility and cost-effectiveness when integrated into hepatitis b diagnosis and monitoring programs.
we also describe future technologies and explore how poc tests might best be used to achieve the who 2030 hepatitis b elimination goals.

practically, the three key clinical requirements for poc hepatitis b assays in the field are diagnosis of current infection, determination of treatment eligibility and monitoring, as well as diagnosis of hepatitis b immunity and the need for hepatitis b vaccination in the uninfected (figure 2).

detection of hepatitis b surface antigen (hbsag) is the primary step to diagnose current hepatitis b infection, and multiple hbsag pocs are commercially available. most are qualitative lateral-flow chromatographic immunoassays which are one-step, easy to use, can be used with a variety of different specimens (whole blood, serum and plasma) and provide rapid semiquantitative visible results (usually within 15-30 min). to date, three hbsag rapid tests (determine hbsag 2, alere medical co. ltd, chiba-ken, japan; vikia hbsag, biomérieux sa, marcy-l'étoile, france; and sd bioline wb, abbott diagnostics korea inc., giheung-gu, republic of korea) have met who prequalification criteria [29], with multiple studies showing their high accuracy for determining hbsag positivity in various populations, particularly moderate-to-high-prevalence populations (table s1).

the determine hbsag poc test is one of the most widely used hbsag pocs [30], with the most published data on clinical performance. a 2017 meta-analysis [31] including 9 studies with 7730 samples showed a pooled sensitivity of 90.8% and specificity of 99.1% using determine. though most studies [32-38] showed high clinical sensitivity of 89-100% in the general population, the reported sensitivity varied widely in hiv-infected populations (56-100%) [39-44].
the cause of the reported lower sensitivity in hiv-coinfected populations [39, 40, 43, 44] is unclear, but potential reasons may include cross-reaction between hiv reverse transcriptase inhibitors and hepatitis b virus, a higher rate of occult hepatitis b infection in early hiv cohorts, a higher reported rate of hbsag loss in both untreated and treated hiv-infected populations, and the use of tenofovir-based hiv regimens that effectively suppress hepatitis b virus dna levels and produce a large decline in hbsag titres [31, 45, 46]. the sd bioline hbsag [38, 47-49] and vikia hbsag [32, 33, 38, 50, 51] poc tests have also been shown to have good sensitivity (above 90%) and excellent specificity (above 99%) in general populations; however, lower sensitivity was also reported in hiv-infected populations [40].

a common application of these hbsag pocs is to measure seroprevalence in general or specific subpopulations in low-resource settings [52-56]. they have also been used in mass screening programs for hepatitis b in both community outreach [24, 57] and health-facility-based screening [52, 53, 58-62] in low-resource settings, and have shown great public health benefit. for example, in a community-based outreach screening program conducted in 75 camps in southern india [63], the "screen and vaccinate/linkage to care" strategy led to over 7700 vaccinations in the camps and 162 people with high viral loads commencing treatment. the program increased the accessibility of hepatitis b diagnostic testing in a low-resource setting, and the timely results of pocs contributed to people's engagement in post-screening interventions [63]. hbsag pocs have also been used in programs to engage hard-to-reach populations such as people who inject drugs, sex workers [64], disadvantaged groups or some ethnic groups [65, 66] by providing self-testing and community- or health-facility-based testing services.
in a randomised controlled study conducted in a clinic engaging mostly african immigrants in france [65], people without health cover attending the clinic were provided free testing for hepatitis b, hepatitis c and hiv using either pocs or prescriptions for testing at a pathologist; a higher rate of testing and linkage to care was observed among people allocated to receive pocs. however, another multicentre randomised controlled study in france [66] found no difference in the effectiveness of linkage to care between an hbsag rapid test plus a standard lab-based confirmatory serology test and lab-based standard serology alone across five clinics. in this study, participants were described as receiving testing results via mail or phone call, but it was unclear whether participants in the poc testing group received results at the same visit [66].

other than the who-prequalified hbsag pocs, new brands of hbsag poc have been reported in field studies, including the drw-hbsag assay, diagnostics for the real world ltd. (ce-marked) [35, 67]; first response hbsag card test, premier medical corporation (ce-marked) [68]; naosign® hbs poc strips, bioland [69]; and one step hbsag test, general biologicals corporation [70, 71] (non-exhaustive list). a study in mongolia [72] which tested 19 commercially available hbsag pocs on serum samples showed average sensitivity and specificity of 100% and 99%, respectively. whilst most hbsag pocs of various brands have shown promising clinical performance [35, 38, 67, 70, 73], available validation data are limited, and further studies with large sample sizes in diverse populations (including different ethnicities, different hepatitis b prevalence settings and people living with hiv) are needed.

multiplex diagnostic pocs, with the capacity to detect multiple pathogens using a single testing strip, can be highly attractive for low-to-middle-resource settings.
some multiplex pocs which detect hbsag are commercially available, and some are ce-marked (hbsag/hcv/hiv/syphilis combo test, euro genomas; hbsag and hcv combo test, euro genomas; artron detect 3 hiv/hcv/hbv combo, artron laboratories; hiv, hbsag and hcv rapid test, maternova inc., providence, ri, usa), but none has been listed by who prequalification [16, 29, 74]. accuracy of hbsag detection using multiplex assays has been shown to be high [75], but limited clinical validation data are available.

innovations in sampling technique have provided more convenient specimen collection methods, such as using oral fluid collected by an oral swab as the specimen [25, 48, 51]. the simplified process was highly acceptable to individuals [25, 48], but testing accuracy remains a challenge to overcome [48], and it may additionally require trained technicians, lab-based enzyme immunoassays or equipment for sample preparation, such as a centrifuge for target analyte separation [51]. future development needs to consider combining the sample preparation steps together with detection and readout in a single device, without sacrificing testing accuracy.

although many studies of different hbsag pocs have shown very good sensitivity [31, 38, 73], false negativity is still among the biggest concerns: with a pooled sensitivity of around 90%, roughly one in ten truly hbsag-positive individuals could test negative [31]. most cases with false negative results were reported to have low titres of hbsag; for example, in studies using the determine/vikia hbsag pocs, most false negative cases had hbsag titres lower than 30 iu/ml [32, 33]. as hbsag level does not correlate with severity of liver damage, there is a chance that people with advanced liver disease may be missed. other potential factors affecting the accuracy of hbsag pocs may include hbv dna level, different genotypes, co-infection with hepatitis c or hiv, and hepatitis b variants with s gene mutations that are not detected by the poc hbsag test [31, 33, 76, 77].
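the practical meaning of a roughly 90% sensitivity can be made concrete with a quick calculation of how many infected people a screening round would miss, and of the negative predictive value, at a given prevalence. a python sketch, using the pooled sensitivity and specificity from the 2017 meta-analysis cited above and an invented cohort size and prevalence:

```python
# Sketch: expected missed infections and negative predictive value (NPV)
# for a screening test, given sensitivity, specificity and prevalence.
# Cohort size and prevalence below are illustrative, not from any study.

def screening_outcomes(n, prevalence, sensitivity, specificity):
    infected = n * prevalence
    uninfected = n - infected
    true_pos = infected * sensitivity
    false_neg = infected - true_pos           # infections missed by the test
    true_neg = uninfected * specificity
    npv = true_neg / (true_neg + false_neg)   # share of negatives truly negative
    return false_neg, npv

# 10,000 people screened, 8% hbsag prevalence,
# 90.8% sensitivity and 99.1% specificity (pooled estimates for determine)
missed, npv = screening_outcomes(10_000, 0.08, 0.908, 0.991)
```

at this prevalence the negative predictive value remains high even though dozens of infections are missed, which is why the text above stresses that false negativity, not overall accuracy, is the main concern.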
as only a few studies have obtained comprehensive serological and genetic profiles of false negative cases, more data are needed to explore these associations and determine the implications for clinical practice. specimen type is unlikely to affect the efficacy of hbsag poc tests: a meta-analysis showed similar pooled sensitivity in studies using whole blood samples compared to plasma or serum [31], and studies evaluating hbsag pocs using capillary whole blood collected by finger-prick all showed reasonably high sensitivity (88-90%) [32, 44, 78]. in practice, there is no absolute cut-off for testing performance when choosing pocs for hepatitis b programs, and the increased access to testing might mitigate the harm caused by reduced accuracy; however, sensitive pocs that have been validated in contexts similar to their planned use should be prioritised [79].

hepatitis b surface antibody (anti-hbs) is the key marker to determine an individual's immunity to hepatitis b virus and triage the need for vaccination. a few anti-hbs pocs are commercially available, but most have poor reported sensitivity, ranging from 20% to 70% [33, 80-82]. one study reported a sensitivity of 91.8% using an anti-hbs rapid test card among 1272 samples [70]; however, these findings require further validation. a study [66] showed that using hbsag/anti-hbs pocs was not effective in increasing the vaccination rate, due to the poor sensitivity of the anti-hbs poc and high reliance on a confirmatory enzyme immunoassay. though a poc test for anti-hbs can help triage vaccination need, hepatitis b vaccination is relatively cheap, so context-specific cost-effectiveness analyses would be needed to determine the settings in which the use of anti-hbs pocs would be cost-effective.
treatment decisions in hepatitis b are guided by patient age, hepatitis b dna viral load and the degree of liver inflammation and fibrosis, as measured by alanine aminotransferase (alt) levels and either transient elastography or liver biopsy, respectively [4, 5, 83]. however, there are few poc tests currently available for these parameters, and none has been widely validated and who prequalification approved.

hepatitis b virus (hbv) dna level is the critical indicator when deciding an individual's management plan as per clinical guidelines. polymerase chain reaction (pcr) platforms for nucleic acid detection are still the main technique for quantitative assessment of hbv dna levels; conventional pcr platforms are usually housed in laboratories and require high manual input, posing barriers to accessibility in remote and other resource-limited areas [12]. a rapid molecular test, xpert® hbv viral load (cepheid inc., sunnyvale, ca, usa; ce-marked, approved by the us fda and by the tga in australia), is commercially available for hbv dna quantification and provides test results in less than one hour [84, 85]. the test is a cartridge-based, real-time pcr assay which is run on the genexpert instrument, a molecular diagnostic platform. the processing unit of the system is around the size of a coffee machine, and it also runs a range of other rapid molecular tests, such as the who-prequalified xpert hcv viral load, hiv-1 qual and hiv-1 viral load tests [29], which presents an opportunity for the hepatitis b viral load test to be adopted in areas with existing platforms at low additional cost. so far, limited data are available on the analytical performance of the assay [84, 85].
two recent studies [84] using serum samples showed a good correlation between hbv dna quantification using the xpert hbv viral load assay and the results of the laboratory reference assay; the assay also has a low limit of detection (lod) of 7.5 iu/ml, which is similar to the most commonly used hbv dna platforms (usually with an lod of 10 iu/ml) [84]. in practice, xpert testing for hbv dna led to a faster workflow, with a mean time to result of 6-8 h, which provides a near-poc solution [84, 86]. however, as a new unit, genexpert facilities are still expensive; operation requires an uninterrupted power supply, as well as technician training and skills for running the system and maintaining services and reagents.

hepatitis b e antigen (hbeag) is a key indicator to determine the phase of chronic hepatitis b infection (figure 2) and treatment initiation, and it is used as a surrogate for hbv dna measurement when evaluating risks of maternal-to-child transmission [2, 4, 83, 87]. several hbeag pocs are commercially available; however, published data show that the accuracy of hbeag pocs has a wide range, with sensitivity of 30-82% and specificity of 67-100% [33, 80, 88]. similarly, anti-hbe pocs are reported to have poor sensitivity but excellent specificity [33, 81]. given the high costs and challenges in accessing hbv dna testing in low-resource settings, the who recommends hbeag to triage treatment [83, 89, 90]; the low testing accuracy of hbeag pocs is therefore an urgent issue to be addressed.

novel serum biomarkers such as hepatitis b core-related antigen (hbcrag) have been shown to correlate with serum hbv dna levels and intrahepatic cccdna levels, a marker of hepatitis b-related hcc risk, and have therefore been explored as potential indicators for treatment determination, off-therapy virologic suppression and hcc risk evaluation [91, 92]. however, there is no rapid lateral flow assay for hbcrag so far.
Another novel biomarker, serum HBV RNA [93], has been shown to correlate positively with HBV DNA level; its levels remain higher than those of HBV DNA in patients on nucleos(t)ide analogues, and it may thus be a potential marker of off-therapy HBV suppression. However, its clinical predictive utility is not yet well defined, and the measurement of serum HBV RNA presents its own challenges even in routine lab-based testing; further studies are therefore needed to define its clinical role and to guide the development of HBV RNA POCs. ALT is part of the liver function test panel and a key marker of liver inflammation, used to determine hepatitis B treatment eligibility [4, 5, 83] (Figure 2). ALT has been proposed as an indicator for treatment in people with positive HBsAg in low-resource settings where HBV DNA testing is unavailable. A semiquantitative POC using ALT 40 U/L as the cut-off (Biopoint® ALT-1) has been developed, and manufacturer data suggest a high sensitivity of 94% and specificity of 85% [94]. Several serological biomarkers have been combined into algorithms that offer indirect, non-invasive assessment of liver fibrosis and have been validated to varying degrees in hepatitis B populations, such as the AST-to-platelet ratio index (APRI) and the Fibrosis-4 index (FIB-4) [95, 96]. These indices require quantitative results for AST and platelets, with or without ALT, none of which is currently available in a POC test format. Dried blood spot (DBS) testing, while not a POC test, is a sampling method that offers a viable solution for mass screening or testing in low-resource settings where testing capacity or access is limited.
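The APRI and FIB-4 indices mentioned above are simple arithmetic combinations of routine laboratory values; a sketch of the standard formulas, in which the AST upper limit of normal (commonly taken as 40 U/L) and the patient values are illustrative assumptions:

```python
import math

def apri(ast, platelets, ast_uln=40.0):
    """AST-to-platelet ratio index.
    ast in U/L, platelets in 10^9/L, ast_uln = upper limit of normal for AST."""
    return (ast / ast_uln) / platelets * 100

def fib4(age, ast, alt, platelets):
    """Fibrosis-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    return (age * ast) / (platelets * math.sqrt(alt))

# Hypothetical patient: age 45, AST 80 U/L, ALT 64 U/L, platelets 150 x 10^9/L
print(round(apri(80, 150), 2))          # 1.33
print(round(fib4(45, 80, 64, 150), 2))  # 3.0
```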
In practice, a single finger-prick blood sample is applied to a chemically modified paper card which collects and stores serological markers and nucleic acid; specimens obtained in the field can be transported to a laboratory at ambient temperature, where the blood sample is processed following a DBS protocol and tested using immunoassays or molecular techniques [83, 97]. DBS samples have a relatively long shelf-life at ambient temperature without sample degradation [98], which is attractive for regions that are geographically isolated or have security situations precluding rapid transport to a central laboratory. This method is now recommended by the 2017 WHO hepatitis B testing guidelines [83] in settings with no access to venous blood sampling or quality-assured testing assays. DBS testing has been used to detect HBsAg, HBeAg, anti-HBc and HBV DNA, and even for viral genotyping [99, 100]. A meta-analysis [101] evaluating DBS for HBV DNA quantification showed a pooled sensitivity of 95% (83-99%) and specificity of 99% (53-100%); however, most of the included studies used a cold chain to store samples, which may limit the generalisability of these accuracy estimates to field conditions. Although DBS testing increases testing access in low-resource and geographically isolated settings, it still requires high technical expertise and standard laboratory assays that may not be routinely available. Cost-effectiveness and affordability are key considerations when adopting POC tests in hepatitis B programs. Quoted costs of lateral flow-based HBsAg POCs are generally lower than those of laboratory-based immunoassays, with estimated procurement costs of US$0.2-0.95 and US$0.4-2.8 per test, respectively [72, 83]. Conventional lab-based testing usually incurs additional costs such as a reading machine, professional laboratory staff and technical training; the total cost of testing is therefore often much higher than using POCs in low-resource settings.
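The point about total testing cost can be made concrete: a lab immunoassay with a low per-test reagent price can still cost more per test than a POC once fixed costs (reading machine, staff, training) are amortised over a modest testing volume. A sketch in which the per-test prices are taken from the ranges above, while the fixed cost and the testing volume are assumptions for illustration:

```python
def cost_per_test(variable_cost, fixed_cost, n_tests):
    """Average cost per test when fixed costs are amortised over n_tests."""
    return variable_cost + fixed_cost / n_tests

# Assumed: 1,000 tests per year; US$5,000 of fixed lab costs (reader, staff, training)
poc_cost = cost_per_test(variable_cost=0.95, fixed_cost=0.0, n_tests=1000)
lab_cost = cost_per_test(variable_cost=2.8, fixed_cost=5000.0, n_tests=1000)
print(f"POC: US${poc_cost:.2f}/test, lab: US${lab_cost:.2f}/test")  # POC: US$0.95/test, lab: US$7.80/test
```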
Multiplex POC testing is expected to be cheaper than multiple single POC tests. For example, the manufacturing cost of an HIV/HCV/HBsAg POC is around US$1 [102]; using multiplex tests in high-risk populations who require broad-spectrum pathogen screening is therefore expected to save resources. Costs for conventional hepatitis nucleic acid testing are estimated to range from US$30 to 120 [15], and can be up to US$400 per assay in resource-limited countries and regions [12]. In 2018, a viral load testing program was introduced in sub-Saharan African countries to access an integrated molecular diagnostics instrument (the Hologic Panther system) at an all-inclusive ceiling price of US$12 per patient sample [103]. The Foundation for Innovative New Diagnostics (FIND) has negotiated the price of the Xpert HBV Viral Load assay for 145 developing countries at US$14.9 per cartridge excluding shipment; however, the testing instrument costs between US$11,530 and US$64,350 depending on the throughput capacity of the processing unit [104, 105]. These costs may still be higher than what programs can afford in some settings. In addition, owing to reduced accuracy compared with standard assays, diagnostic POC testing is often used as a screening tool to triage those requiring more expensive laboratory-based confirmation [15], which means the costs of centralised laboratory services are only partially offset by POC test use. While novel POC tests may offer improved performance, their costs usually fall slowly because of patent protection laws [16]. Even in countries that can afford these POCs, they may cost more than lab-based testing where well-established laboratory services are available; the main demand for POCs is therefore limited to self-testing or outreach programs to improve testing uptake. The cost-effectiveness of using POCs for hepatitis B can therefore differ across settings.
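The triage arithmetic is straightforward: if only POC-positives proceed to confirmatory nucleic acid testing, the expected cost per person screened depends on the POC-positive rate. A sketch using prices from the ranges quoted above (HBsAg POC around US$1, conventional NAT around US$30) and an assumed 8% POC-positive rate:

```python
def triage_cost(poc_price, confirm_price, positive_rate):
    """Expected cost per person screened: everyone gets the POC,
    only POC-positives get the confirmatory lab assay."""
    return poc_price + positive_rate * confirm_price

everyone_nat = 30.0                    # confirmatory NAT for everyone
triage = triage_cost(1.0, 30.0, 0.08)  # POC screen, then NAT for the 8% positive
print(f"NAT for all: US${everyone_nat:.2f}, triage: US${triage:.2f}")  # NAT for all: US$30.00, triage: US$3.40
```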
Using HBsAg POCs as a screening tool was found to be cost-effective in a community-based approach in HBV-endemic but low-resource settings. Nayagam et al. [106] assessed a community-based HBV screening and treatment program in The Gambia, where HBsAg POCs were offered door-to-door to adult participants at a total screening cost of US$7.4 per person. The program was found to be highly cost-effective, with an incremental cost-effectiveness ratio (ICER) of US$540 per DALY averted compared with the status quo, in which no publicly provided HBV screening or treatment was available. Integrating low-cost HBV POCs into existing healthcare services such as antenatal screening [52, 58, 107], blood donor screening [62] and HIV clinics [59] [60] [61] can be another way to scale up HBV testing [108, 109]. Zhang et al. [109] showed that integrating HBV screening within existing antenatal care in Cambodia was highly cost-effective. In their model, the unit cost of the HBsAg and DNA tests (estimated at US$1 and US$30) was one of the key parameters driving cost-effectiveness; cheap POCs could therefore potentially improve the cost-effectiveness of such an integration program even further. Studies in countries of low HBV endemicity showed that programs offering hepatitis B screening followed by vaccination or linkage to clinical care among people at increased risk are likely to be cost-effective [110]; however, there is a lack of programs adopting POCs in hepatitis B screening strategies, and thus a lack of evidence on the economic impact of using POCs for hepatitis B in populations with regular access to healthcare services. A few studies have nonetheless shown that rapid hepatitis C or HIV testing nested in harm reduction programs or among priority populations can be cost-effective [111] [112] [113].
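The ICER reported for the Gambian program is the incremental cost divided by the incremental health gain (here, DALYs averted) relative to the status quo. A worked sketch in which the cost and DALY figures are hypothetical, chosen only to reproduce a US$540-per-DALY ratio:

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per unit of
    health benefit gained (here, per DALY averted)."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical program: US$270,000 of extra spending averts 500 DALYs vs status quo
print(icer(cost_new=270_000, cost_old=0, effect_new=500, effect_old=0))  # 540.0
```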
More evidence on using POCs in HBV screening or monitoring programs in the field is needed, especially covering implementation costs and the effects of broader testing access compared with standard testing services or no testing services (where no access to testing is the current practice). Whilst POC testing theoretically circumvents many barriers to test access, acceptability to the target population remains a key determinant of the successful implementation of hepatitis B programs. However, limited data are available on satisfaction among users and stakeholders. In general, POCs are highly acceptable to clients because of their ease of use, short turnaround time, minimal biosample requirement and provision of testing capacity by familiar staff in contexts where people want to be tested [114]. In a survey conducted among implementers and users of hepatitis B and C testing services in 43 countries, almost half of respondents from low- and middle-income countries preferred a POC test method using capillary whole blood [83]. While there is no agreement on what accuracy would be considered acceptable, half of the respondents would accept an assay with a minimum sensitivity of 95% [83]. Acceptability of rapid testing for hepatitis B or other blood-borne viruses and sexually transmitted diseases varies across populations. A survey in a prison setting showed that HCV POCs were highly accepted [115]. Another study showed that people may find point-of-care testing for HIV, HCV and syphilis stressful [116]. When POCs were used for blood-borne virus screening at public events or in community outreach programs, acceptance rates varied widely among clients of different socioeconomic status and ethnic or geographic backgrounds [63, 117, 118].
In health facility settings, healthcare providers find that POCs generally speed up decision making and improve patients' compliance with chronic management plans requiring repeat testing over time [30]; however, there are also general concerns such as suboptimal testing accuracy and increased workload for healthcare workers [119]. Beyond testing delivered by healthcare providers or trained personnel, POCs also have the potential to become self-testing tools with universal access. In some countries, rapid tests for hepatitis B and/or HIV and hepatitis C can be purchased online or over the counter. While self-testing offers a confidential testing solution, a standard approach will be needed to ensure that people have access to pre- and post-test counselling, as well as pathways of linkage to care. A general limitation of POCs for hepatitis B is reduced accuracy compared with standard lab-based testing. There are also specific limitations of individual POC tests, highlighted above (Section 2), such as HBsAg POC tests, which were shown to have varied sensitivity in HIV-infected populations. In addition, there is still a lack of POC tests for the liver cirrhosis and HBV DNA levels required to determine treatment eligibility for patients with hepatitis B. There are also limitations in the regulatory process, procurement and storage management for POC tests, as well as in costs when implementing POCs for hepatitis B in different settings. The WHO prequalification process for in vitro diagnostic tests for diseases with a high individual or public health risk, including hepatitis B, assesses both a test's performance and its manufacturing quality. For countries without regulatory procedures in place, this provides a thorough review of potential diagnostic tests they could select based on specific needs; however, the process of obtaining WHO prequalification can be slow.
For low-resource settings, stock-outs and supply issues can be a barrier to the use of POC tests; lack of scale may also mean they are more expensive than high-throughput assays in some settings; testing accuracy and instrument maintenance can be affected by extreme weather conditions (heat, humidity) in the field; and novel testing platforms such as GeneXpert can still be expensive, with the use of the instruments subject to field conditions such as power supply. On the other hand, for high-income countries, a main challenge in the introduction and implementation of POC tests is the regulatory and reimbursement approval process for new diagnostics, which requires demonstration of analytic and clinical validity, as well as clinical usefulness and cost-effectiveness data. As an example, the FDA regulatory process can be long and expensive [120], and the return on investment in high-income countries, where POC tests compete for market share with standard diagnostic pathways, can be challenging. Increasing testing accuracy is the major challenge for POC tests that are already compact and easy to use. When developing POC diagnostics, features targeting resource-limited settings without basic infrastructure or a cold chain need to be included, and high-quality tests need to be validated across populations and specimen types. Rapid, affordable, highly accurate serology tests for novel biomarkers that could serve as alternatives to molecular testing are a major need. Technology is needed to integrate convenient sampling and specimen preparation into a one-step testing assay. Inter-user variability is another challenge to address if POC tests require technical training or multiple steps; a standard protocol or mobile apps can be used to overcome this problem where suitable. Miniaturisation of testing instruments is the trend, especially for instruments that can perform molecular analysis, such as portable hand-held devices, without sacrificing testing accuracy.
Dried blood spot kits for HIV and hepatitis C can already be ordered online and sent to a home address as a private way to test for infection [121]. Faecal occult blood tests are mailed to all older adults in some regions as a public health initiative to screen for bowel cancer [122]. A similar approach could be evaluated for screening for hepatitis B among populations disengaged from traditional health services. In resource-limited settings, adding hepatitis B testing to existing platforms or programs can be more cost-effective than starting a new initiative [109]. Mobile phone technology has the potential to be used for screening and monitoring health conditions [123]. Mobile phones are now being used around the world for contact tracing for SARS-CoV-2, an approach that is immediately applicable to hepatitis B. Recently, Google searches for anosmia have been linked to the epidemiology of SARS-CoV-2 [124]. There is a need to streamline regulatory and reimbursement approval processes in high-income countries, where the traditional approval process is expensive and slow, particularly for POC diagnostics suitable for use as public health tools to promote the engagement of marginalised individuals, including people who inject drugs, migrants and culturally and linguistically diverse communities affected by hepatitis B. In low- and middle-income countries, where regulatory processes can be less demanding, the key is to ensure the quality and performance of tests as they come to market. More than 60 products have been prequalified since the WHO prequalification process started in 2010 [29]. It has been proposed that a model list of essential diagnostics be developed, comparable with the Model List of Essential Medicines maintained by the WHO. Such a list would help in the selection of diagnostic methods and would facilitate improvements in the regulation and affordability of in vitro diagnostic tests and in training in their use.
The WHO has set ambitious goals for the elimination of hepatitis B as a public health threat by 2030. Birth-dose vaccination is the most important public health intervention to reduce incidence and will also reduce mortality in the long term. For the individual already infected with hepatitis B, the key to preventing liver-related harm is the maintenance of sustained viral suppression. This requires diagnosis and linkage to care; in some people, antiviral therapy will be necessary. Hepatitis B is typically asymptomatic until advanced disease has developed; screening is therefore required. The risk factors and epidemiology of hepatitis B are well described, but screening rates are suboptimal, and screening often occurs in the context of opportunistic doctor-patient consultations following presentation with an unrelated problem. Testing typically involves venesection followed by centralised testing in a laboratory, with batch processing and automation to improve efficiency. This system works well for the engaged individual cared for by a motivated healthcare practitioner. However, even in high-income countries, up to 80% of infected patients remain unaware of their infection [125]. Thus, there is a need to scale up screening for hepatitis B in high-risk populations, and a need to reconsider current models of care for screening. Now that hepatitis B treatment is cheap, safe, highly effective and durable, there is an urgent need to reconsider a public health approach to the management of hepatitis B. Point-of-care tests provide a tool for mass screening in community settings. They also provide the opportunity to reduce the care cascade to one of same-day "test and treat". The effective employment of such strategies will be necessary to achieve the WHO elimination goals.
Natural history of chronic hepatitis B: special emphasis on disease progression and prognostic factors
Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980-2015: a systematic analysis for the Global Burden of Disease Study
Clinical practice guidelines on the management of hepatitis B virus infection
Update on prevention, diagnosis, and treatment of chronic hepatitis B: AASLD 2018 hepatitis B guidance
Long-term suppression of hepatitis B e antigen-negative chronic hepatitis B by 24-month interferon therapy
The long-term benefits of nucleos(t)ide analogs in compensated HBV cirrhotic patients with no or small esophageal varices: a 12-year prospective cohort study
The risk of hepatocellular carcinoma decreases after the first 5 years of entecavir or tenofovir in Caucasians with chronic hepatitis B
Requirements for global elimination of hepatitis B: a modelling study
Third dose of hepatitis B vaccine: reported estimates of HepB3 coverage
Hepatitis B in sub-Saharan Africa: strategies to achieve the 2030 elimination targets
HepB birth dose: reported estimates of HepB_BD coverage
Aiming for the elimination of viral hepatitis in Australia, New Zealand, and the Pacific Islands and Territories: where are we now and barriers to meeting World Health Organization targets by 2030
The future of viral hepatitis testing: innovations in testing technologies and approaches
Accelerating the elimination of viral hepatitis: a Lancet Gastroenterology & Hepatology Commission
Diagnosis of viral hepatitis
More than a virus: a qualitative study of the social implications of hepatitis B infection in China
Managing chronic hepatitis B: a qualitative study exploring the perspectives of people living with chronic hepatitis B in Australia
Innovative strategies for the elimination of viral hepatitis at a national level: a country case series
The RAPID-EC study: a feasibility study of point-of-care testing in community clinics targeted to people who inject drugs
Screening and treatment program to eliminate hepatitis C in Egypt
Acceptability and feasibility of a screen-and-treat programme for hepatitis B virus infection in The Gambia: the Prevention of Liver Fibrosis and Cancer in Africa (PROLIFICA) study
Implementation of rapid HIV and HCV testing within harm reduction programmes for people who inject drugs: a pilot study
Hepatitis B screening in an Argentine emergency department: a pilot study to increase vaccination in a resource-limited setting
Point-of-care hepatitis C testing from needle and syringe programs: an Australian feasibility study
Rapid tests for sexually transmitted infections (STIs): the way forward
World Health Organization list of prequalified in vitro diagnostic products, 2020
A practical guide to global point-of-care testing
Diagnostic accuracy of tests to detect hepatitis B surface antigen: a systematic review of the literature and meta-analysis
Validation of rapid point-of-care (POC) tests for detection of hepatitis B surface antigen in field and laboratory settings in the Gambia, Western Africa
Performance of rapid tests for detection of HBsAg and anti-HBsAb in a large cohort
Evaluation of rapid diagnostic tests for the detection of human immunodeficiency virus types 1 and 2, hepatitis B surface antigen, and syphilis in Ho Chi Minh City, Vietnam
Evaluation of a new hepatitis B virus surface antigen rapid test with improved sensitivity
Evaluation of the performance of four rapid tests for detection of hepatitis B surface antigen in Antananarivo, Madagascar
Hepatitis B surface antigen seroprevalence among prevaccine and vaccine era children in Bangladesh
Evaluation of four rapid tests for detection of hepatitis B surface antigen in Ivory Coast
Prevalence of infection with hepatitis B and C virus and coinfection with HIV in medical inpatients in Malawi
Detection of highly prevalent hepatitis B virus coinfection among HIV-seropositive persons in Ghana
Reliability of rapid testing for hepatitis B in a region of high HIV endemicity
Viral hepatitis and rapid diagnostic test based screening for HBsAg in HIV-infected patients in rural Tanzania
Prevalence and associations with hepatitis B and hepatitis C infection among HIV-infected adults in South Africa
Field performance of the Determine HBsAg point-of-care test for diagnosis of hepatitis B virus co-infection among HIV patients in Zambia
Occult hepatitis B and HIV infection
HIV-hepatitis B virus coinfection
Prevalence of chronic hepatitis B virus infection before and after implementation of a hepatitis B vaccination program among children in Nepal
Point of care and oral fluid hepatitis B testing in remote Indigenous communities of northern Australia
Evaluation of the analytical performance of six rapid diagnostic tests for the detection of viral hepatitis B and C in Lubumbashi, Democratic Republic of Congo
Performance of point of care assays for hepatitis B and C viruses in chronic kidney disease patients
Evaluating HBsAg rapid test performance for different biological samples from low and high infection rate settings and populations
Hepatitis B virus sero-prevalence amongst pregnant women in the Gambia
Seroprevalence of HBV among people living with HIV in Anyigba
Assessment of hepatitis B immunization programme among school students in Qatar
Seroprevalence of hepatitis B virus infection and associated factors among healthcare workers in northern Tanzania
Seroprevalence of hepatitis B and C virus infections among diabetic patients in Kisangani (north-eastern Democratic Republic of Congo)
Eligibility for hepatitis B antiviral therapy among adults in the general population in Zambia
Maternal hepatitis B infection burden, comorbidity and pregnancy outcome in a low-income population on the Myanmar-Thailand border: a retrospective cohort study
The prevalence of hepatitis B virus among HIV-positive patients at Kilimanjaro Christian Medical Centre Referral Hospital
Sero-prevalence of HBV and associated risk factors among HIV positive individuals attending ART clinic at Mekelle Hospital
Hepatitis B infection, viral load and resistance in HIV-infected patients in Mozambique and Zambia
pallidum infections among blood donors and transfusion-related complications among recipients at the Laquintinie Hospital in Douala
Prevalence of hepatitis B and hepatitis C infection from a population-based study in southern India
Prevalence estimates of HIV, syphilis, hepatitis B and C among female sex workers (FSW) in Brazil
Simultaneous human immunodeficiency virus-hepatitis B-hepatitis C point-of-care tests improve outcomes in linkage-to-care: results of a randomized control trial in persons without healthcare coverage
Effectiveness of hepatitis B rapid tests toward linkage-to-care
Performance of a new rapid test for the detection of hepatitis B surface antigen in various patient populations
Hepatitis B vaccination in Burkina Faso: prevalence of HBsAg carriage and immune response in children in the western region
A simple and inexpensive point-of-care test for hepatitis B surface antigen detection: serological and molecular evaluation
A simple and rapid test-card method to detect hepatitis B surface antigen and antibody: potential application in young children and infants
Evaluation of the performance of two rapid immunochromatographic tests for detection of hepatitis B surface antigen and anti-HCV antibodies using ELISA tested samples
Sensitivity and specificity of commercially available rapid diagnostic tests for viral hepatitis B and C screening in serum samples
Evaluation of performance testing of different rapid diagnostic kits in comparison with EIAs to validate detection of hepatitis B virus among high risk group in Nigeria
Multi-disease diagnostics landscape for integrated management of HIV, HCV, TB and other coinfections
Analytical performances of simultaneous detection of HIV-1, HIV-2 and hepatitis C-specific antibodies and hepatitis B surface antigen (HBsAg) by multiplex immunochromatographic rapid test with serum samples: a cross-sectional study
Performance evaluation of 70 hepatitis B virus (HBV) surface antigen (HBsAg) assays from around the world by a geographically diverse panel with an array of HBV genotypes and HBsAg subtypes
Comparative performance of three rapid HBsAg assays for detection of HBs diagnostic escape mutants in clinical samples
Point-of-care screening for hepatitis B virus infection in pregnant women at an antenatal clinic: a South African experience
A guide to aid the selection of diagnostic tests
Evaluation of rapid diagnostic tests for assessment of hepatitis B in resource-limited settings
Poor sensitivity of rapid tests for the detection of antibodies to the hepatitis B virus: implications for field studies
Performance of rapid diagnostic tests for the detection of anti-HBs in various patient populations
Performance of the Xpert HBV Viral Load assay versus the Aptima Quant assay for quantifying hepatitis B virus DNA
Evaluation of the Xpert HBV Viral Load for hepatitis B virus molecular testing
Rapid, random-access quantification of hepatitis B virus using the Cepheid Xpert® HBV Viral Load assay
Prevention of materno-foetal transmission of hepatitis B in sub-Saharan Africa: the evidence, current practice and future challenges
Poor sensitivity of commercial rapid diagnostic tests for hepatitis B e antigen in Senegal, West Africa
Validation of the TREAT-B score for hepatitis B treatment eligibility in a large Asian cohort: TREAT-B improves with age
Development of a simple score based on HBeAg and ALT for selecting patients for HBV treatment in Africa
Hepatitis B core-related antigen (HBcrAg): an alternative to HBV DNA to assess treatment eligibility in Africa
Serum hepatitis B virus RNA: a new potential biomarker for chronic hepatitis B virus infection
Validation of a novel rapid point-of-care ALT test in patients with viral hepatitis
Non-invasive assessment of liver fibrosis and prognosis: an update on serum and elastography markers
Comparison of diagnostic accuracy of aspartate aminotransferase to platelet ratio index and fibrosis-4 index for detecting liver fibrosis in adult patients with chronic hepatitis B virus infection: a systemic review and meta-analysis
Dried blood spots: preparing and processing for use in immunoassays and in molecular techniques
Assessment of dried blood spot samples as a simple method for detection of hepatitis B virus markers
Diagnostic accuracy of serological diagnosis of hepatitis C and B using dried blood spot samples (DBS): two systematic reviews and meta-analyses
Evaluation of the efficiency of dried blood spot-based measurement of hepatitis B and hepatitis C virus seromarkers
Diagnostic accuracy of detection and quantification of HBV-DNA and HCV-RNA using dried blood spot (DBS) samples: a systematic review and meta-analysis
Usefulness of simultaneous screening for HIV- and hepatitis C-specific antibodies and hepatitis B surface antigen by capillary-based multiplex immunochromatographic rapid test to strengthen prevention strategies and linkage to care in childbearing-aged women living in resource-limited settings
Breakthrough agreement will reduce costs and increase access to diagnostic technology for millions in low- and middle-income countries
Technologies to support diagnosis and linkage to care: knowing your status and getting care 2.0
GeneXpert® negotiated prices
Cost-effectiveness of community-based screening and treatment for chronic hepatitis B in the Gambia: an economic modelling analysis
Seroprevalence of hepatitis B virus surface antigen and factors associated among pregnant women in Dawuro zone, SNNPR, southwest Ethiopia: a cross sectional study
The cost-effectiveness of predonation screening for transfusion transmissible infections using rapid test kits in a hospital-based blood transfusion centre
Integrated approach for triple elimination of mother-to-child transmission of HIV, hepatitis B and syphilis is highly effective and cost-effective: an economic evaluation
The effectiveness and cost-effectiveness of screening for and vaccination against hepatitis B virus among migrants in the EU/EEA: a systematic review
Cost-effectiveness of rapid hepatitis C virus (HCV) testing and simultaneous rapid HCV and HIV testing in substance abuse treatment programs
Cost-effectiveness of one-time hepatitis C screening strategies among adolescents and young adults in primary care settings
Cost-effectiveness of strategies for testing current hepatitis C virus infection
Community-based, point-of-care hepatitis C testing: perspectives and preferences of people who inject drugs
Time matters: point of care screening and streamlined linkage to care dramatically improves hepatitis C treatment uptake in prisoners in England
Stressful point-of-care rapid testing for human immunodeficiency virus, hepatitis C virus, and syphilis
Implementing rapid HIV testing in outreach and community settings: results from an Advancing HIV Prevention demonstration project conducted in seven U
Acceptability of HIV self-testing: a systematic literature review
Exploring the barriers and facilitators to use of point of care tests in family medicine clinics in the United States
Innovation under regulatory uncertainty: evidence from medical technology
Do you need a DBS test?
National Bowel Cancer Screening Program
Will an innovative connected AideSmart! app-based multiplex, point-of-care screening strategy for HIV and related coinfections affect timely quality antenatal screening of rural Indian women? Results from a cross-sectional study in India
Use of Google Trends to investigate loss-of-smell-related searches during the COVID-19 outbreak
New virological tools for screening, diagnosis and monitoring of hepatitis B and C in resource-limited settings
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

A note on early epidemiological analysis of coronavirus disease 2019 outbreak using crowdsourced data
Giuseppe Arbia
Date: 2020-03-13

Crowdsourcing data can prove of paramount importance in monitoring and controlling the spread of infectious diseases. The recent paper by Sun, Chen and Viboud (2020) is important because it contributes to the understanding of the epidemiology and of the spreading of COVID-19 in a period when most of the epidemic characteristics are still unknown. However, the use of crowdsourced data raises a number of problems from the statistical point of view, which run the risk of invalidating the results and of biasing estimation and hypothesis testing. While the work by Sun, Chen and Viboud (2020) has to be commended, given the importance of the topic for worldwide health security, in this paper we deem it important to remark on the possible sources of statistical bias and to point out possible solutions to them.

The paper by Sun, Chen and Viboud (2020) (henceforth SCV) is an important example of the use of crowdsourced data in monitoring the spread of COVID-19. Indeed, crowdsourcing data can prove of paramount importance in monitoring and controlling the spread of infectious diseases, as is also remarked, e.g., by Leung and Leung (2020), among many others.
the paper relies on an innovative source (potentially obtainable in real time) derived from social media and news reports collected in china from the 13th to the 31st of january 2020, during the first outbreak of the coronavirus epidemic. the data collected referred to 507 cases. in the paper the crowdsourced data, coming from different sources, are used to estimate several epidemiological parameters of tremendous importance in the process of surveillance and control of the diffusion of the disease, such as: the relative risk by age group, the mean age and skewness of infected people, the delay between symptoms and seeking care at hospital, and the mean incubation period. scv also use the crowdsourced data to test theoretical hypotheses using the wilcoxon test and the kruskal-wallis test. 2 these tests lead them to conclude that the delay between symptom onset and seeking care at a hospital or clinic decreased significantly after january 18th, and that the delay was significantly longer in hubei than in tianjin and yunnan, and among international travelers than in the local population. there are two main statistical problems connected with the use of crowdsourced data in general, and with those presenting a spatial configuration in particular (such as those employed by scv), namely: 1. the lack of a precise sample design; 2. the presence of spatial/network correlation among the individuals in the sample. we will briefly discuss the two problems in more detail in the following two sections. 1 catholic university of the sacred heart, milan (italy) 2 as is well known, the wilcoxon test (or the mann-whitney u test) is a nonparametric test used to test the hypothesis of equality between two independent samples. the kruskal-wallis test is also non-parametric; it extends the wilcoxon test to comparing two or more independent samples of equal or different sizes and can be seen as the non-parametric version of the one-way analysis of variance (anova). 
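as an illustration, the two nonparametric tests described in footnote 2 can be run in a few lines; the delay values below (days from symptom onset to seeking care) are invented for the sketch, not the scv dataset:

```python
# hedged sketch: scipy implementations of the two tests the footnote describes.
# the delay values are made up purely for illustration.
from scipy import stats

delays_before = [5, 7, 6, 9, 8, 10, 7]   # hypothetical delays before jan 18
delays_after = [2, 3, 4, 2, 5, 3, 4]     # hypothetical delays after jan 18

# wilcoxon / mann-whitney u: equality of two independent samples
u, p = stats.mannwhitneyu(delays_before, delays_after, alternative="two-sided")

# kruskal-wallis: extends the comparison to three (or more) independent groups
delays_third = [4, 5, 6, 5]              # a third hypothetical group
h, p_kw = stats.kruskal(delays_before, delays_after, delays_third)
```

note that both tests assume the observations are independent of each other, which is exactly the assumption questioned later in this note.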
see wilcoxon (1945) and kruskal and wallis (1952). a general characteristic of crowdsourced data, like other unconventionally collected big data 3 , is the lack of any precise sample design (arbia, 2020). this situation is described in statistics as "convenience sampling", in which case it is known that no probabilistic inference is possible (hansen et al, 1953). as fisher (1935) says, "if we aim at a satisfactory generalization of the sample results, the sample experiment needs to be rigorously programmed". indeed, while in a formal sample design the choice of sample observations is guided by a precise mechanism which allows the calculation of the probability of inclusion of each unit (and, hence, probabilistic inference), with a convenience collection no probability of inclusion can be calculated, giving rise to over- or under-representation of the sample units. the advantages of using convenience sampling are the obvious ease of data collection and cost-effectiveness. however, the disadvantage is that the results cannot be generalized to a larger population, because the under- (or over-) representation of units produces a bias. furthermore, convenience-sampling-based estimates are characterized by larger standard errors and, as a consequence, by insufficient power in hypothesis testing. in this situation the estimation of parameters (like the mean, median or proportions) based on the principle of analogy, and the calculation of p-values to take decisions in hypothesis testing, are not theoretically motivated. scv, indeed, acknowledge the fact that the collection criterion used could have generated possible biases in their sample. 
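the bias mechanism can be illustrated with a toy simulation; all numbers are invented, with "severe" cases assumed easier to capture, mirroring the health-system bias that scv acknowledge:

```python
# toy illustration of convenience-sampling bias: severe cases are more likely
# to be observed, so the convenience sample over-represents them and the
# estimated mean delay is biased upward. all numbers are invented.
import random
random.seed(1)

# population: 90% mild cases (mean delay ~2 days), 10% severe (mean delay ~6 days)
population = ([("mild", random.gauss(2, 0.5)) for _ in range(9000)]
              + [("severe", random.gauss(6, 0.5)) for _ in range(1000)])
true_mean = sum(d for _, d in population) / len(population)   # close to 2.4

# convenience collection: severe cases are ten times as likely to be captured
convenience = [d for kind, d in population
               if random.random() < (0.5 if kind == "severe" else 0.05)]
biased_mean = sum(convenience) / len(convenience)             # noticeably higher
```

since no inclusion probabilities are recorded in a real convenience collection, the analyst cannot undo this distortion without external information such as the post-sampling weights discussed next.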
they list problems like the fact that a substantial proportion of cases are travelers (who are predominantly adults), that the data are captured by the health system and so are biased toward more severe cases, and that geographical coverage is heterogeneous, with an under-representation of provinces with a weaker health infrastructure. however, they do not take any action to reduce such biases. the problem raised by the convenience collection of data emerges dramatically in the big data era, when we increasingly avail ourselves of data which, almost invariably, do not satisfy the necessary conditions for probabilistic inference. in recent years researchers have become aware of this problem and have tried to suggest solutions to reduce the distorting effects inherent in non-probabilistic designs (fricker and schonlau, 2002). one possible strategy consists of transforming crowdsourced datasets in such a way that they resemble a formal sample design. this procedure has been termed post-sampling (arbia et al., 2018) and represents a particular form of post-stratification (holt and smith, 1979; little, 1993). to implement a post-sampling analysis, we need to calculate, in each geographical location (e.g. the chinese provinces), a post-sampling ratio (ps), defined as the ratio between the number of observations required by a reference formal sample design (e.g. stratified random sampling with probability of inclusion proportional to size) and those collected through crowdsourcing. more reliable estimates of population parameters can then be obtained by considering a weighted version of the dataset, using the post-sampling ratios as weights. thus, crowdsourced observations have to be over-weighted if ps > 1 and, on the contrary, down-weighted when ps < 1. this strategy was adopted in arbia et al. (2018) to estimate the food price index in nigeria using data crowdsourced through smartphones, and in arbia and nardelli (2020) to estimate spatial regression models. 
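a minimal sketch of the post-sampling reweighting, with invented province counts and delay values (the target counts stand in for what a formal stratified design would require):

```python
# hedged sketch of post-sampling: each crowdsourced observation is weighted by
# ps = (observations a formal design would require) / (observations collected).
# all counts and delay values below are invented for illustration.
target = {"hubei": 50, "tianjin": 30, "yunnan": 20}    # formal-design counts
observed = {"hubei": 100, "tianjin": 15, "yunnan": 10} # crowdsourced counts
ps = {k: target[k] / observed[k] for k in target}      # hubei: ps < 1 (over-sampled)

# invented delay data: hubei over-represented, tianjin and yunnan under-represented
delays = {"hubei": [4.0] * 100, "tianjin": [2.0] * 15, "yunnan": [3.0] * 10}

unweighted_mean = (sum(sum(v) for v in delays.values())
                   / sum(len(v) for v in delays.values()))
weighted_mean = (sum(ps[k] * sum(delays[k]) for k in delays)
                 / sum(ps[k] * len(delays[k]) for k in delays))
# the weighted estimate down-weights the over-represented province,
# pulling the mean toward the design-balanced value
```

here the unweighted mean is dominated by the over-sampled province, while the ps-weighted mean restores the balance the reference design would have imposed.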
after post-sampling, estimates display less bias and lower standard errors, and the reduction in the power of the tests is moderated. the dataset used by scv refers to 507 individuals (364 from china and the rest from abroad). both in the estimation of the epidemiological parameters reported in section 1 and in hypothesis testing, the authors treat their crowdsourced data as if they were independent. indeed, both the wilcoxon and the kruskal-wallis test are based on the assumption that all the observations are independent of each other. however, even if data were collected obeying a formal sample design, a further potential source of bias is the fact that the observational units could display a certain degree of spatial/network correlation (cliff and ord, 1973; arbia, 2006). observed units that are close in space or interact in a network may display similar values in the observed variables (e.g. age, incubation period, delay between symptoms and seeking care at hospital) due to interaction between individuals and/or to the presence of some unobserved latent variable with a geographical component. the effects of spatial correlation in the geography of epidemics are well documented in the book by cliff et al. (1981). when observed data are not independent and display a positive spatial/network correlation, the standard errors are underestimated, leading to inefficient estimation of the various parameters (mean, median, proportion etc.). but the consequences for hypothesis testing can be even worse. due to the underestimation of the standard errors, the test statistics become artificially inflated, leading to lower p-values and, as a consequence, to rejection of the null hypothesis more frequently than we should. as a result, the actual significance level of the tests can become very poor, with an artificially high probability of type i error. 
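a small monte carlo (purely synthetic, not the scv data) makes the variance-underestimation point concrete: positively correlated draws cause a test that assumes independence to reject a true null far more often than its nominal 5% level:

```python
# synthetic monte carlo: under positive correlation, a naive test that
# assumes independence rejects a true null well above the nominal level.
import random
random.seed(0)

def correlated_sample(n, rho):
    # simple ar(1)-style chain with correlation rho, stationary mean 0, variance 1
    x = [random.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(rho * x[-1] + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1))
    return x

def naive_t_rejects(x):
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / (n - 1)
    t = m / (s2 / n) ** 0.5          # t statistic computed as if data were iid
    return abs(t) > 1.96             # ~5% nominal two-sided level for large n

trials = 2000
rate_indep = sum(naive_t_rejects(correlated_sample(100, 0.0)) for _ in range(trials)) / trials
rate_corr = sum(naive_t_rejects(correlated_sample(100, 0.7)) for _ in range(trials)) / trials
# rate_indep stays near 0.05, while rate_corr is several times larger
```

the correlated case's rejection rate is the artificially high type i error probability described above: the naive standard error is too small by a factor of roughly sqrt((1+rho)/(1-rho)) for this ar(1)-style process.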
this could explain the very low p-values (of the order of 10^-4) reported in the scv paper despite the relatively small dataset used. standard statistical textbooks like shabenberger and gotway (2004) and cressie and wikle (2011) document how to calculate the level of spatial/network correlation between geographically located individuals and how this parameter can be used in estimation and hypothesis testing in order to obtain more reliable inferential conclusions. the work by sun, chen and viboud (2020) is to be commended, given the absolute relevance of the topic for health security and the timeliness with which the results are presented in a period of great uncertainty related to the worldwide diffusion of the new coronavirus. their results are of invaluable help in the process of surveillance, monitoring and control of the disease. in this comment we draw the attention of the authors, and of all researchers and health operators, to the fact that the crowdsourced dataset they use can lead to biases in the estimation of the epidemiological parameters and in hypothesis testing procedures. we hope that this note may help future studies in the area to obtain even more reliable estimates and more grounded tests of theoretical hypotheses, thus progressing rapidly and rigorously in the knowledge about covid-19 and any possible future epidemics. 
a primer for spatial econometrics
(2020) statistics, new empiricism and society in the era of big data, forthcoming, springerbrief
on spatial lag models estimated using crowdsourcing, web-scraping or other unconventionally collected data, under revision for spatial economic analysis
post-sampling crowdsourced data to allow reliable statistical inference: the case of food price indices in nigeria, paper presented at the lx conference of the italian statistical society
advantages and disadvantages of internet research surveys: evidence from the literature
post stratification, series a
use of ranks in one-criterion variance analysis
post-stratification: a modeler's perspective
crowdsourcing data to mitigate epidemics, the lancet digital health
early epidemiological analysis of coronavirus disease 2019 outbreak using crowdsourced data: a population level observational study
statistical methods for spatial data analysis
individual comparisons by ranking methods
key: cord-307187-5blsjicu authors: missel, malene; bernild, camilla; dagyaran, ilkay; christensen, signe westh; berg, selina kikkenborg title: a stoic and altruistic orientation towards their work: a qualitative study of healthcare professionals' experiences of awaiting a covid-19 test result date: 2020-11-11 journal: bmc health serv res doi: 10.1186/s12913-020-05904-0 sha: doc_id: 307187 cord_uid: 5blsjicu background: extensive measures to reduce person-to-person transmission of covid-19 are required to control the current outbreak. special attention is directed at healthcare professionals, as reducing the risk of infection in healthcare is essential. the purpose of this study was to explore healthcare professionals' experiences of awaiting a test result for a potential covid-19 infection. methods: qualitative interviews with 15 healthcare professionals were performed, underpinned by a phenomenological hermeneutical analytical framework. 
results: the participating healthcare professionals' experiences of awaiting a covid-19 test result were found to be associated with a stoic and altruistic orientation towards their work. these healthcare professionals presented a strong professional identity, overriding most concerns about their own health. the result of the coronavirus test was a decisive parameter for whether healthcare professionals could return to work. the healthcare professionals were aware that their family and friends were having a hard time knowing that the covid-19 infection risk was part of their jobs. this concern did not, however, cause the healthcare professionals to falter in their belief that they were doing the right thing by focusing on their core area. the threat to their own health ran through the minds of the healthcare professionals occasionally, which makes access to testing particularly important. conclusion: the participating healthcare professionals had a strong professional identity. however, a discrepancy between an altruistic role as a healthcare professional and the expectations that come from the community was illuminated. a mental health coronavirus hotline for healthcare professionals is suggested. the covid-19 pandemic puts healthcare professionals (hcps) under both physical and psychological pressure [1]. the challenges include the increased workload created by the outbreak, but also fears of contagion for themselves, their families and patients. psychological health outcomes and distress in particular are highlighted in current research regarding the initial stage of the covid-19 outbreak, in terms of anxiety, depression and post-traumatic symptoms [1] [2] [3] [4] [5]. across these studies, hcps working during the epidemic report frequent concerns regarding their own health. 
based on our knowledge, little information is available regarding the impact on hcps of awaiting a test result for potential covid-19 infection, or interventions for supporting them during this waiting time. therefore, this study aims to shed light on hcps' experiences of awaiting a test result for a potential covid-19 infection through individual interviews. this qualitative investigation will thus highlight what is at stake for hcps while in quarantine and awaiting a response as to whether they are infected with the coronavirus. the study offers an in-depth understanding of the meaning of waiting for the covid-19 test result from the hcps' perspective; it should be of interest to a broad readership and add knowledge to the growing covid-19 evidence base and to developing supportive interventions targeting hcps in such a pandemic. while hcps, e.g. nurses, physicians, porters and healthcare workers, are caring for some of the most vulnerable groups of people, both in hospital and in primary care, they are currently also facing an unprecedented disease caused by the outbreak of a previously unknown virus [6]. this new coronavirus that can cause covid-19 disease [7] puts hcps in a position where they must avoid exposing themselves to infection but also avoid transmitting the infection to the vulnerable patients and citizens to whom they have a caring responsibility. because an infected hcp is a potential vehicle for virus dissemination, research suggests that reducing the risk of infection amongst hcps is essential [8]. spread of virus was reported during the ebola outbreak, resulting in a compromised healthcare system [9], as well as during the severe acute respiratory syndrome (sars) [10] and the middle east respiratory syndrome (mers) epidemics [11]. experiences from these previous outbreaks highlight fear among hcps of transmitting the disease and the importance of screening for the virus. 
on 30th january 2020, the world health organization declared the chinese outbreak of covid-19 to be a public health emergency of international concern. the emergency committee stated that the spread of covid-19 may, among other preventive efforts, be interrupted by early detection and isolation [7, 12]. general hygiene precautions are crucial in order to minimize the risk of contamination [8]. hcps have always played an important role in infection prevention, infection control, isolation, containment and public health, which, for nurses, was initially advocated by florence nightingale [13]. there are studies that define the pathophysiological characteristics of covid-19; however, the mechanism of spread is uncertain. current knowledge is derived from similar coronaviruses, which are transmitted from human to human through respiratory infection [7]. typically, respiratory viruses are most contagious when a patient is symptomatic. however, increasing evidence suggests that human-to-human transmission may occur during the asymptomatic incubation period of covid-19 [14, 15]. the disease is reported to be very contagious, and measures to reduce person-to-person transmission of covid-19 are therefore required to control the outbreak [14] [15] [16]. special attention and efforts to prevent or reduce transmission are applied to susceptible populations, including hcps, in order to reduce transmission to patients or other vulnerable groups in the community [17] [18] [19]. hcps are thus among those groups of people who are being rapidly tested for coronavirus in denmark. considering the severity of infection and illness [20], the test result might be of great importance for the healthcare system but also for the individual hcp. 
a sudden decrease in the number of hcps because of quarantining or isolation due to covid-19 infection would potentially overload the healthcare system, and the capacity to treat either patients with coronavirus or patients with other serious conditions would be challenged [8]. for the individual hcp, infection might furthermore be a threat to their own health. as far as we are aware, no research has so far focused on how hcps might perceive this test situation. therefore, the purpose of this study is to explore hcps' experiences of awaiting a test result for a potential covid-19 infection. such knowledge from the hcps' perspective is expected to increase awareness of the support potentially needed while awaiting a crucial test result for a contagious and rare virus. furthermore, the study will help hospital managers to establish strategies to ensure the best possible working conditions for hcps during the pandemic. this study used a phenomenological hermeneutical methodology inspired by ricoeur's narrative philosophy [21]. in this study phenomenology was applied as an epistemological stance for exploring first-person accounts of what it is like to wait for a test result for potential covid-19 infection. pre-reflexive experiences from the participants' lifeworld are the starting point, while hermeneutics focuses on interpreting the surplus meaning contained in this lifeworld. as human beings we leave traces when we express ourselves, and these traces are formed by the meanings and traditions to which we belong. often, it is impossible to directly understand an individual's experiences because the sense in the traces is hidden. therefore, reflection on an individual's lived experiences takes place via the narratives expressed by the individual [21, 22]. 
the threefold mimesis is central in ricoeur's narrative philosophy and can be seen as an epistemological approach for understanding the participants' lived experiences [23], which, in this study, has inspired the research process as a three-fold process [22]: mimesis i (prefiguration): the life lived before it is formulated as spoken or written narrative (data collection); mimesis ii (configuration): the language stage, formulating a narrative (from speech to text); and mimesis iii (refiguration): the comprehension stage, when the text is interpreted (analysis and interpretation) [21] [22] [23]. participants in this study were recruited from a population of hcps who had been tested for coronavirus but who did not necessarily care for covid-19 patients. if they had symptoms of covid-19 infection, hcps in denmark were offered testing for the virus. we used a convenience sampling strategy [24] by encouraging tested hcps to approach the research team by e-mail if they were willing to attend an interview. the interviews were conducted by telephone, out of ethical accountability for not contributing to the spread of the virus, and they were scheduled in the gap between the test and its result. during the study period, the result of the test was given to a tested person within 24 h. danish society was put on lockdown due to the threat of coronavirus on march 11th 2020. coronavirus was in this period still relatively new in denmark; 300-500 patients were hospitalized and 77 patients died due to covid-19 during week three of the epidemic. fifteen hcps agreed to participate in the study and were interviewed in march and april 2020. at that point data saturation had been achieved, making further interviewing unnecessary [24]. we included hcps with different professional backgrounds and different responsibilities from both primary care and hospitals. the characteristics of the participants are shown in table 1. data were collected through individual interviews. 
human events are characterized by unreflecting pre-understanding, which ricoeur calls prefiguration (mimesis 1) [21, 22]. with the aim of gathering the participants' in-depth narrative accounts of their experiences of awaiting a covid-19 test result, open questions were used. each interview began with a broad opening question, such as: "could you please tell me what led you to being tested for a potential covid-19 infection and your experiences while awaiting the test result?" table 2 lists the interview questions. the interviews lasted on average 30 min (range 9-55 min). the interviews were conducted separately by three experienced qualitative researchers, who all had a professional background as registered nurses; the interviews were audio-recorded and transcribed into 217 pages. the participants' stories were thus transcribed into a textual configuration of their unarticulated experiences (from prefiguration to configuration) [21, 22]. according to ricoeur, people's narratives contain surplus meaning, and hermeneutics is concerned with interpreting this surplus meaning (from configuration to refiguration). the study was undertaken in accordance with the guidelines of the danish ethical research committee and was approved by the danish data protection agency (p-2020-276). the investigation conforms with the principles outlined in the declaration of helsinki [25]. the participants received written information about the purpose of the study and their right to withdraw at any time. written informed consent was obtained from each of the participants before the interview. data were anonymized by means of identification codes. the participants were informed that interview data would be treated confidentially. according to ricoeur, interpretation is the central methodology in phenomenological research. interpretation involves a process consisting of naive interpretation, structural analysis, and comprehensive understanding [26]. 
naive interpretation is a superficial interpretation, whereby the narratives are read and re-read to see what the texts mean to the researchers, giving an overall view of the narratives. structural analysis deals with patterns in the text that can explain what it is saying. explaining what the text expresses means moving from what the text says to what it is talking about. during the structural process, we analyzed and structured the narratives based on units of meaning, extracting meanings or themes that recurred in the narratives. the units of meaning were condensed such that the essential meaning was expressed. these units of meaning were then further condensed and gathered into themes [22, 26]. the comprehensive understanding continues with a discussion of the themes that were identified in the structural analysis, the purpose being to reach a new understanding of the possible dimensions of the participants' experiences while awaiting a covid-19 test result. the deeper interpretation of the narratives is a process of understanding in which theoretical perspectives are drawn on to help clarify and comprehend phenomena in the participants' experiences [22, 26]. see fig. 1. throughout the study, methodological rigor was attained by using the qualitative concepts of relevance, validity, and reflexivity, as described by malterud [27]. this study is one of only a few qualitative studies exploring the lived experiences of hcps during the covid-19 pandemic and, to our knowledge, the first qualitative study exploring hcps' experiences of awaiting a test result for a potential covid-19 infection. the qualitative interview method was selected in order to gain insight into these individuals' perspectives and to understand the meaning of the investigated phenomena, i.e. the transition from experience to meaning [26]. the relevance of the study and the chosen methodology thus seem appropriate. 
several strategies were employed to demonstrate internal validity, including collecting in-depth data, prolonged involvement with the data, and use of the participants' own words to formulate and illustrate themes. the participants are quoted in order to ensure transparency and substantiate the findings of the study. ricoeur's steps in the analytical process are clearly set out and have been stringently followed. the process from prefiguration through configuration to refiguration reflects the shift from lived life, to narrative accounts of lived life, to the final interpretation, which provides an insight into the individual hcps' concrete experiences and into universal phenomena of life for hcps awaiting a test result. other researchers are thus able to judge and validate the extracted themes. reflexivity was ensured by discussions between the authors, both during the data collection phase and in the analysis. the fact that all interviewers were registered nurses meant that a certain common ground, but also equality, between participant and interviewer was present. this meant that the conversation was relatively easy and straightforward. in order, however, to prevent blind spots in relation to the research purpose, the interviewers were particularly aware of their role as researchers and qualitative interviewers and tried to bridle pre-understandings from their background as hcps and to adopt a curious stance. the comprehensive understanding illuminated the meaning of the participants' experiences of awaiting a covid-19 test result as a stoic and altruistic orientation towards their work. these hcps presented a strong professional identity, overriding most concerns about their own health. the result of the coronavirus test was a decisive parameter for whether healthcare professionals could return to work. experiences related to the test situation, as well as the strong sense of professional identity, will be described in more detail in the following. 
what led the participants to the test for coronavirus were their experiences of mild to moderate symptoms, which aroused suspicion of possible infection. they described the importance of protecting patients, vulnerable citizens and colleagues from the risk of infection, and they therefore stayed away from work until they were certain that they were not contributing to the spread of the virus. this distance from work, however, had an impact on the participants, who described a dilemma in terms of feeling both responsible and like a hypochondriac at the same time. as hcps they already knew the usual workload and therefore described feelings of failing their colleagues by not taking part in the work: "we are busy in healthcare, so if there is one who is sick, then the others just have to run faster" (participant k). thus, the test result was extremely important in terms of whether one could return to work and help one's colleagues. the participants, furthermore, talked of particular responsibilities in being prepared to care for and treat patients with covid-19. they watched what was going on in the rest of the world, in other healthcare settings where the covid-19 epidemic exceeded the healthcare systems' resources. they were very concerned about their colleagues in other countries, but at the same time had an altruistic view that they themselves must also be prepared. in this context, coronavirus tests were also particularly important for the participating hcps. they did, however, describe an ambivalence around the test response: if you test positive, then hopefully you will develop some kind of immunity and thus be able to go to work after a period of quarantine without being infected again. if, on the other hand, you test negative, you can return to your job immediately: "i hope i don't have corona, but on the other hand, then you have had it …" (participant c). 
participants described concerns and fears that many hcps would be infected at the same time, and that there would be no one to take care of ill patients or vulnerable citizens. therefore, it was necessary to have hcps tested so that an overview of the workforce could be maintained, as hcps cannot easily be replaced. the route to being tested could, however, be quite obscure for some of the participants. for participating hcps working in the hospital, access to testing was easy and straightforward: they noticed symptoms, they discussed it with their boss, and they got tested. however, working in primary care posed major problems in figuring out access to being tested. those hcps narrated experiences of not being taken seriously, which produced a kind of powerlessness: "all of us who work in healthcare, we are there to make a difference, but you just feel that we sometimes are banging our head against the wall [experiencing lack of understanding] … it gives a sense of powerlessness" (participant e). they furthermore described frustrations at wasting precious time waiting to get to the test; time that could have been spent usefully in continuing their work. the particular commitment to caring for vulnerable and ill people was evident while participating hcps were just waiting to be tested. even though being tested for coronavirus when experiencing symptoms was strongly preferred by the participants in this study, the test situation nevertheless reminded and confronted them with the seriousness of the pandemic. they described their experiences of coming into the makeshift tents outside the hospital and meeting test staff in protective equipment. the participants, being hcps, were prepared for this scenario but were nevertheless confronted with feelings of being part of a surreal experience or a science fiction movie, but also with the fact that this new virus was real: "it is a peculiar experience to meet another person who is covered from head to toe. 
you suddenly feel very dangerous" (participant f). they also, however, told of a professional set-up and said that being tested provided certainty, tranquility and direction. the participating hcps in this study presented a strong sense of professional identity and were highly oriented towards their work. they talked about how they were preparing for battle against the coronavirus despite the risk of being infected themselves. the frontline hcps with the critical task of caring for covid-19 patients told how, for a long time and with no evidence of even having the disease, they had isolated themselves at home: "i already decided 14 days ago that we should stop sleeping in the same room and avoid physical contact completely. i have also written on my wife's and my behalf to family and friends that we will not be able to see anybody for a while" (participant b). they were tremendously aware of their specific role and duty, and that nobody could stand in for them, and they explained it as just being a part of their job, with a fatalistic attitude. these participants expressed a paramount need to know whether they were contagious. common to the participants was that, by virtue of their profession, they had important professional knowledge about droplet infection, hygiene, symptoms and pathways of infection, all of which gave them a readiness to act. they narrated how they were extremely aware of not transmitting the infection to others, as well as how to take distancing and hygiene measures when they noticed symptoms of potential covid-19. these measures seemed to be integrated as an almost natural act in the participants' lives, with them not questioning the necessity of doing so: "i've locked myself inside a room now and told the others in the family to stay away. and if i'm going to the toilet ..., our apartment is quite small ... but then i just shout that now i go to the toilet. and then i have hand sanitizer and cleansers and wipe it all off afterwards" (participant c). 
the situation thus appears to have been tackled with stoic calm by the participants as they awaited answers as to whether their possible symptoms were related to covid-19. despite their professional knowledge, participants also told of chaotic and conflicting information from the healthcare system, expressed as an information flow that had become incomprehensible and overwhelming. this resulted in uncertainty and difficulty in keeping up with guidelines. the participants' social networks were marked by the possible threat of covid-19 from the hcps, who were just doing their job in healthcare. the participating hcps were highly aware that their family and friends were having a hard time knowing that the covid-19 infection risk was a necessary condition of their job, while they at the same time were forced to keep a distance. this concern did not, however, cause participants to falter in their belief that they were doing the right thing by focusing on their core area, which was caring for ill and vulnerable people. the threat to their own health ran through the minds of the participants once in a while: "that people who take care of their work and do what they can to make others survive can end up getting infected with covid-19 themselves, i think that's a little hard, but that's just how it is" (participant g). the participating hcps expressed a need to share such thoughts with somebody and asked for some kind of follow-up or an hcp corona hotline, e.g. after being tested for the virus: "when you are nervous and scared, it would be helpful if you could go to one specific place where knowledge and expertise about corona was gathered - a mental health corona hotline" (participant d). being oriented towards their job was described as a natural part of the participating hcps' approach to life. they had a strong passion for and pride in their work, and in this epidemic context showed solidarity across professional boundaries. 
they did question whether they might be too uncritical, but explained this by the fact that they were living through a time when it was necessary to do as one was told. the participants, however, described how they experienced community tributes, e.g. public applause for them, as bordering on hypocrisy. they rejected further applause from society and expressed that genuine societal recognition would mean more resources in hospitals to solve problems and to give hcps a tolerable everyday life and a decent salary. awaiting a covid-19 test result was, for the participating hcps, associated with a stoic and altruistic orientation towards their work in which the result of the test was crucial. this study illuminated how hcps prepared and got ready for battle against covid-19 in a devoted and solidarity-based way. this war metaphor as a response to the pandemic might illuminate the hcps' stoic and altruistic work identity. seeing the coronavirus as an enemy that should be defeated, and as a part of one's job, requires hcps who approach their work with a stoic calm and an altruistic attitude. a similar commitment to supporting their health system and communities has been reported during the ebola epidemic [28]. the participants in our study presented a strong professional identity, and their attention was directed to caring for and protecting patients and vulnerable citizens while also preventing the spread of infection among colleagues. being stoic in their approach to work does not mean that hcps are cold and distant; rather, it is an attitude of remaining calm and carrying on, and may also involve having a certain degree of self-control and maintaining a sense of conscious self-awareness [29]. the altruistic attitude or behavior of the participants was characterized by the fact that the individual sought to promote the well-being of others without thought for their own interests and needs.
according to hume, altruism is a character trait of humans that normally extends to strangers only in a weakened form, and it is rare to meet with one in whom the affections of altruism do not over-balance the selfish [30]. altruism was, however, a strong moral part of the participants' professional identity, which seems to be based on the inner logic of the hcp discipline. the roles altruism might play in the social and medical response to an epidemic, and the narratives about the nature of hcps' moral obligations, have been discussed and imply a willingness to take personal risks in the line of duty [31]. a professional identity can be defined as a social identity that relates to people's understanding and presentation of themselves as professionals [32]. it is seen as the identity a person develops through learning and practicing a given profession, enabling them to fulfill a particular employment function integrated into a given work and professional culture. according to goffman, identities are not created individually; rather, the individual gains his or her professional identity through the attribution of certain characteristics that have the character of normative expectations [33]. in addition to performing the expected functions associated with a specific field, the individual thus supports and supplements his or her position by simultaneously playing the normatively expected role associated with that group [33]. following goffman [33], the stoic and altruistic orientation towards their work presented by the hcps in the present study might also point to these hcps acting in accordance with a specific role within a given social context, such as healthcare. society's normative expectations of hcps may influence their perception of their own professional identity.
our study, however, illuminates a discrepancy between the hcps' altruistic role and the normative expectations of a community that pays tribute to them, on the one hand, and an experience of working conditions and salaries that do not signal recognition, on the other. altruism has been reported to be declining in the face of economic and pragmatic motivation [34], which might threaten healthcare practice during an epidemic such as covid-19. another threat to our study participants' stoic and altruistic orientation towards their work was the experience of a chaotic, conflicting and overwhelming information flow, resulting in difficulties in keeping up with best practice guidelines. research from the a/h1n1 influenza pandemic has demonstrated how perceived sufficiency of information was associated with a reduced degree of worry and how hcps less frequently felt unprotected [35, 36]. these points highlight that hospital managers should try to provide and direct information for hcps according to what is needed during the different and specific phases of a pandemic, based on the affected hcps' perspectives, in order to offer favourable working conditions in times of extreme distress. being tested for coronavirus was significant for the hcps in our study in order to maintain their professional identity and continue working. they did, however, also describe experiences of uncertainty and fear for their own health and expressed a need to share such thoughts with somebody. a threat to the mental health of hcps during epidemics has been reported [4, 5, 37], and it has been suggested that interventions to promote mental well-being in hcps exposed to covid-19 be implemented immediately [37]. a hotline for patients during the current covid-19 outbreak has been established in some places, e.g. in new york, where citizens are guided to assess their own symptoms at home and can discuss any psychological impact of the disease [38]. similar initiatives directed at hcps are needed.
recommendations from a recent systematic review also suggest establishing a forum for medical personnel to voice their concerns, as well as a psychological assistance hotline staffed by volunteers who have received relevant psychological training and can provide telephone guidance to help personnel tackle mental health problems effectively [39]. telephone interviews in this study were unavoidable due to the risk of virus transmission between participants and interviewers. such interviews do, however, have some disadvantages. they are more impersonal in that it is not possible to have eye contact, and as an interviewer, it is difficult to show that you are interested and engaged in what is being said. in addition, breaks are generally less acceptable [24]. despite this, we found that participants were willing to participate in the study and appreciated talking about their experiences. the sample included in this study consisted mostly of female hcps (n = 11), and most were nurses (n = 8), which might represent an uneven distribution of participants. women, however, dominate the nursing profession, and nurses are the largest professional group in healthcare [40, 41]; the sample thus represents the general healthcare workforce. what is worth noting is that this study was conducted during the first phase of the pandemic. this means that the stoic and altruistic orientation, as well as the war metaphor, that we have found and described may change over time as the pandemic progresses and hcps may experience burnout. the perspectives of hcps awaiting a test result for coronavirus provide an important contribution to the growing body of literature about covid-19. these hcps had a strong professional identity, with their attention directed towards caring for and protecting patients and vulnerable citizens while also preventing the spread of infection among colleagues.
a discrepancy between the altruistic role of hcps and the normative expectations that come from the community was also illuminated. the clinical implications of this study are thus that, as a stoic and altruistic attitude dominated hcps' identity, access to covid-19 testing for these professionals is crucial. furthermore, a mental health corona hotline for hcps should be established.
abbreviations: hcp: healthcare professional
references:
- mental health care for medical staff and affiliated healthcare workers during the covid-19 pandemic
- timely mental health care for the 2019 novel coronavirus outbreak is urgently needed
- prevalence of depression, anxiety, and insomnia among healthcare workers during the covid-19 pandemic: a systematic review and meta-analysis
- the psychological impact of epidemic and pandemic outbreaks on healthcare workers: rapid review of the evidence
- the psychosocial impact of flu influenza pandemics on healthcare workers and lessons learnt for the covid-19 emergency: a rapid review
- updated understanding of the outbreak of 2019 novel coronavirus (2019-ncov) in wuhan
- world health organization declares global emergency: a review of the 2019 novel coronavirus (covid-19)
- covid-19 diagnosis and management: a comprehensive review
- the health impact of the 2014-15 ebola outbreak
- uniformed service nurses' experiences with the severe acute respiratory syndrome outbreak and response in taiwan
- working experiences of nurses during the middle east respiratory syndrome outbreak
- world health organization. novel coronavirus (2019-ncov), situation report-3.
2020
- covid-19: emerging compassion, courage and resilience in the face of misinformation and adversity
- transmission of 2019-n-cov infection from an asymptomatic contact in germany
- early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
- the epidemiology and pathogenesis of coronavirus disease (covid-19) outbreak
- a rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-ncov) infected pneumonia (standard version)
- retningslinjer for håndtering af covid-19 i sundhedsvaesenet (guidelines for managing covid-19 in the healthcare system). copenhagen; 2020
- severe acute respiratory syndrome coronavirus 2 (sars-cov-2) and coronavirus disease-2019 (covid-19): the epidemic and the challenges
- ricoeur's narrative philosophy: a source of inspiration in critical hermeneutic health research
- en hermeneutisk brobygger. tekster af paul ricoeur (a hermeneutic bridge-builder: texts by paul ricoeur)
- nursing research: generating and assessing evidence for nursing practice
- declaration of helsinki
- discourse and the surplus of meaning
- qualitative methods in medical research
- "we and the nurses are now working with one voice": how community leaders and health committee members describe their role in sierra leone's ebola response
- the therapy of desire. in: theory and practice in hellenistic ethics
- altruism in hume's treatise
- diminishing returns? risk and the duty to care in the sars epidemic
- bankmedarbejderen -splittet mellem varnaes og scrooge (bank employees -split between varnaes and scrooge)
- two studies in the sociology of interaction. united states: martino fine books
- professional nursing values: a concept analysis
- general hospital staff worries, perceived sufficiency of information and associated psychological distress during the a/h1n1 influenza pandemic
- psychological impact of the pandemic (h1n1) 2009 on general hospital workers in kobe
- health professionals facing the coronavirus disease 2019 (covid-19) pandemic: what are the mental health risks?
- a phone call away: new york's hotline and public health in the rapidly changing covid-19 pandemic
- factors affecting the psychological well-being of health care workers during an epidemic: a thematic review
- sundhedsvaesen og sundhedspolitik (healthcare and healthcare politics)
- closing the gap in indigenous health inequity -is it making a difference?
publisher's note: springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
acknowledgements: the research team wishes to thank all those people who collaborated and participated in this study by sharing their experiences. without them, this study would not have been possible. we also thank anne alexandrine øhlers, camilla rotvig jensen, christina jensen, mette skriver and miriam bianca besser biyai for their help in relation to the transcription of the interviews.
authors' contributions: all authors conceived and contributed to the design and conduct of the study. skb, cb and mm conducted the collection of data material and led the analysis together with swc and id. all authors were involved in the analysis and the writing of the manuscript. all authors contributed to the preparation of this manuscript and read and approved the manuscript.
funding: this work was supported by the novo nordisk foundation (grant number nnf20sa0062831), and centre for cardiac, vascular, pulmonary and infectious diseases, rigshospitalet, copenhagen university hospital, denmark.
availability of data and materials: all authors have full control of all primary raw data (interview transcripts) and allow the journal to review our data if requested. all raw data are written in danish. data are stored in a locked file cabinet in a locked room at the copenhagen university hospital as requested by the danish data protection agency.
the data material used in this study is available from the corresponding author on reasonable request, which will not conflict with the anonymity and confidentiality of the data.
ethics approval and consent to participate: registration and permission were received from the authorities in the danish data protection agency under the capital region of denmark (p-2020-276), and the study was undertaken in accordance with the guidelines of the danish ethics research committee. the participants received verbal and written information about the study prior to the study. written consent was obtained from the participants. given the qualitative nature of the study, the local ethics committee in the capital region of denmark ruled that no formal ethical approval was required in this particular case. not applicable.
competing interests: the authors have no competing interests to declare.
author details: 1 clinical nurse specialist at the department of cardiothoracic surgery, centre for cardiac, vascular, pulmonary and infectious diseases, rigshospitalet,
key: cord-342181-x14iywtr authors: taipale, j.; romer, p.; linnarsson, s. title: population-scale testing can suppress the spread of covid-19 date: 2020-05-01 journal: nan doi: 10.1101/2020.04.27.20078329 sha: doc_id: 342181 cord_uid: x14iywtr
we propose an additional intervention that would contribute to the control of the covid-19 pandemic, offer more protection for people working in essential jobs, and help guide an eventual reopening of society. the intervention is based on: (1) testing every individual, (2) repeatedly, and (3) self-quarantine of infected individuals. using a standard epidemiological model (sir), we show here that by identification and isolation of the majority of infectious individuals, including those who may be asymptomatic, the reproduction number r0 of sars-cov-2 would be reduced well below 1.0, and the epidemic would collapse.
we replicate these observations in a more complex stochastic dynamic model on a social network graph. we also find that the testing regime would be additive to other interventions, and be effective at any level of prevalence. if adopted as a policy, any industrial society could sustain the regime for as long as it takes to find a safe and effective cure or vaccine. our model also indicates that unlike sampling-based tests, population-scale testing does not need to be very accurate: false negative rates up to 15% could be tolerated if 80% comply with testing every ten days, and false positives can be almost arbitrarily high when a high fraction of the population is already effectively quarantined. testing at the required scale would be feasible if existing qpcr-based methods are scaled up and multiplexed. a mass-produced, low-throughput field test could also be carried out at home. economic analysis also supports the feasibility of the approach: current reagent costs for tests are in the range of a dollar or less, and the estimated benefits of population-scale testing are so large that the policy would be cost-effective even if the costs were larger by more than two orders of magnitude. to identify both active and previous infections, both viral rna and antibodies could be tested. all technologies to build such test kits, and to produce them at the scale required to test the entire world's population, exist already. integrating them, scaling up production, and implementing the testing regime will require resources and planning, but at a scale that is very small compared to the effort that every nation would devote to defending itself against a more traditional foe.
when the reproduction number r0 remains greater than 1, the virus spreads rapidly until most people have been infected (fig. 1a), creating a temporary surge of infected individuals. if, using pharmacological or social interventions, r0 can be reduced below 1, then the epidemic collapses (fig.
1b), and most people remain uninfected (but still susceptible). because of the exponential nature of epidemics, the outcomes are nearly binary. even when r0 exceeds one by only a small amount, the disease spreads at an accelerating pace, whereas as soon as r0 falls just below one, it rapidly collapses. these two outcomes correspond to two distinct strategies for epidemic control, suppression and mitigation, close variants of which are currently attempted by several asian countries (with different political systems) and western democracies, respectively. in the mitigation model, the goal is to reduce r as much as possible but not below 1.0, hoping to end up with a population that is largely immune, without overwhelming the healthcare system in the process (as in fig. 1a, but attempting to flatten the temporary surge of infected individuals). this could (but is not guaranteed to) lead to "herd immunity" (see, for example, refs. 3, 4), which would limit spread in future epidemics caused by variants of the same virus. however, exponential processes are notoriously difficult to control, particularly in the absence of accurate real-time data and when the effect of policy changes is uncertain. the choice is stark: allowing the disease to spread to a large fraction of a population, however slowly, greatly increases the total number of infected people and would cause a loss of life that most societies will not accept. furthermore, given the difficulties in controlling exponential processes using limited information, even a strongly enforced mitigation strategy runs the risk of overwhelming the health care system and significantly increasing the mortality rate due to the failure to treat every patient optimally (primarily due to the lack of intensive care capacity and sufficient numbers of ventilators). if the healthcare system is overwhelmed, patients must be triaged as in wartime, potentially for extended periods of time.
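the near-binary dependence of the final outbreak size on r0 described above can be illustrated with a minimal sir integration. this is a sketch with illustrative parameters, not the paper's calibrated model:

```python
# Minimal SIR integration illustrating the near-binary outcome (sketch).
# Illustrative parameters only; not the paper's calibrated model.

def sir_final_size(r0, infectious_days=14.0, days=1000, dt=0.1, i0=1e-4):
    gamma = 1.0 / infectious_days      # recovery rate per day
    beta = r0 * gamma                  # transmission rate per day
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(int(days / dt)):    # forward Euler integration
        new_inf = beta * s * i * dt
        new_rec = gamma * i * dt
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
    return r                           # fraction ever infected

if __name__ == "__main__":
    print(f"R0 = 2.4: {sir_final_size(2.4):.2f} of the population ever infected")
    print(f"R0 = 0.9: {sir_final_size(0.9):.4f} of the population ever infected")
```

with r0 = 2.4 the vast majority of the population is eventually infected, while at r0 = 0.9 the outbreak fizzles out at a tiny fraction, mirroring the two regimes of fig. 1a and 1b.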
notably, both suppression and mitigation are unstable: the mitigation model might first wreck the health-care system and then (as the public demands harsher controls when mortality rises) also wreck the economy. the suppression model might first wreck the economy and then, as public pressure forces a relaxation of control, the virus re-emerges. for many months, both approaches are likely to force a large fraction of the population into quarantine. this is because of the large number of asymptomatic carriers of covid-19; in the absence of population-scale testing, the measures need to be implemented in an indiscriminate manner, affecting the whole population. over time, this will result in severe and unequal economic deprivation. our estimate is that in the united states, gdp per capita is already lower by about 1000 usd per month. redistribution can offer some protection for the most vulnerable families, but if a loss of income of this magnitude persists for six or twelve months, it could generate a backlash against the social distancing measures that are currently our only weapon for fighting the disease. as a result, epidemiologists are giving serious consideration to scenarios that alternate between lockdown and relaxation, which would lead to more loss of life and add even more economic uncertainty. we can use what we know about the dynamics of disease to suppress this pandemic in a way that is far less disruptive than indiscriminate lockdowns and social distancing. because it is less disruptive, a nation can sustain this approach for as long as it takes to find a safe and effective vaccine or a cure. reducing the disruption will thereby save lives. we know that at low levels of prevalence, testing, contact tracing and quarantine (ttq) is a very effective means of suppression 3,5, because it reduces the effective rate of reproduction close to zero.
it is not a feasible strategy for suppressing the virus in the current, higher-prevalence conditions faced by most countries, because it would demand resources that would overwhelm any health department. in addition, the ttq approach suffers from its own instability. unless it identifies every single person who becomes infected, asymptomatic individuals that are not identified will generate clusters that will not be detected until someone develops a severe infection that requires medical care. if only 10% of cases are severe enough to be tested after two weeks, a single missed case will lead to an average cluster of 100 new cases before it is found. as a result, once the rate of new cases exceeds the capacity of tracing, even briefly, the epidemic runs out of control and the exponential dynamics make it almost impossible to catch up without imposing a lock-down. some have suggested that an updated version of ttq that relies on modern surveillance technology could be viable because it would not make the same resource demands on the public health system. to be sure, it would be useful for innovators to work toward a working prototype that members of the public could voluntarily adopt. but because no such system has been deployed, even as a prototype, it would be dangerous for policy makers to count on the availability in the coming months of a system that is both effective in slowing the spread of disease and acceptable to most members of society. here, we propose a radically simpler strategy: just test everyone, repeatedly. when someone tests positive, ask them to self-quarantine and provide them with public assistance that reduces the burden this imposes on them. this approach relies on a key observation that has not been widely appreciated, namely that what matters is the fraction of all individuals that are identified and quarantined.
it follows that testing a small number of individuals with a highly accurate test can be much less effective than testing everyone with a less accurate test. in fact, there is a quantifiable relationship between the reproduction number of a virus and the efficiency of a population-scale testing strategy that brings the effective reproduction number below 1. below we use analytical models to derive both an upper and a lower bound on the effectiveness of testing, and demonstrate their real-world relevance using more realistic stochastic models. the approach has several important advantages. first, it will work no matter how high the prevalence of infection might be. second, it does not suffer from the inherent instability of contact tracing. the offsetting disadvantage is that it is a challenge to test at the required scale, but this is not as difficult as it might at first seem. it could be implemented using mass distribution (e.g. regular mail) without returning samples to a central testing site. in fact, the tests required do not even have to be properly "diagnostic." they will not be the basis for any decision about medical care. they only influence the decision to self-quarantine. in the worst case, they may cause people who are not infectious to be quarantined, but this is already true for most people (including the authors) in the baseline lockdown scenario. this is an important feature, as it relaxes the demands on the quality of the test. the test can tolerate many false positives, because the result of a provisionally positive test is that someone self-quarantines for two weeks when they did not have to. false negatives are also acceptable as long as people are retested frequently. although this strategy should be introduced alongside existing measures, it is a useful exercise to ask what level of testing would be required for this strategy by itself to contain any level of infection.
clearly, if a perfectly accurate test were applied to the entire population at once, and those who tested positive were fully quarantined, the epidemic would immediately collapse with no new infections (fig. 1c). (medrxiv preprint, this version posted may 1, 2020; doi: https://doi.org/10.1101/2020.04.27.20078329; cc-by-nd 4.0 international license.) to examine the effects of false negatives and noncompliance, we first make the best-case assumption about the timing of the tests: every person who is infected is tested before encountering someone who is susceptible. this limit can be approached, for example, by a very effective form of contact-tracing. for coronavirus, it has been estimated 7 that r0 = 2.4, and uncertain data from quarantine in wuhan 8 suggest that rq = 0.3. using the standard (continuous, deterministic) sir model, the equations in fig. 2a and methods show that the optimal population-scale testing strategy will succeed if at least two thirds of all new covid-19 cases are immediately identified and quarantined. if s is the true positive rate of the test and c is the fraction of the public that complies, in the sense that they agree to be tested and follow any instruction to go into quarantine, this bound means that the product sc must be greater than 2/3. next we do the opposite -- assume that the test and compliance are perfect, so s = c = 1, and consider the worst-case assumption on the timing of the tests: each day, a randomly selected fraction of the population is tested. under that strategy, we find that testing at a rate equal to 100 × (r0 − 1) percent of the population per infectious period will ensure that r < 1 (fig. 2b, methods).
using r0 = 2.4 and a two-week infectious period for covid-19, this implies that at least 10% of the population would have to be tested each day. real-world testing strategies could do much better than testing at random, for example by implementing procedures that test individuals concurrently within a region; that run the screen as a sweep across a country; that slice the population into groups that are tested in a cycle; or that use other variables to predict who is more likely to be infected and to test them more frequently (see methods). because herd immunity and other interventions -- including the use of masks or reliance on social distancing -- are additive with respect to the testing, any of these effects can lower the required frequency of the tests. for example, fig. 1d shows the required compliance rate as a function of the strength of other interventions, assuming a fixed false negative testing rate of 15%. the standard but simple and deterministic susceptible-infectious-recovered (sir) models used to calculate these bounds are based on strong assumptions and approximations, such as random mixing of all individuals. to relax those assumptions, we implemented a more realistic numerical simulation using a stochastic model on a social interaction graph (i.e. a stochastic network seir model) to model two realistic scenarios. we focused on the initial exponential growth phase of an epidemic. fig. 3a shows a simulation that starts with 100 infected individuals and assumes that the product of the compliance and true positive rate sc = 0.8. population-scale testing using random weekly tests was started on day 20, and immediately suppressed the epidemic, which was fully stopped by day 100. in contrast, without testing, viral spread caused a surge of infections. death rates were 0.19% with testing and 0.66% without, i.e. a more than three-fold improvement, corresponding to 1.5 million lives saved in a us-sized population.
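both analytical bounds reduce to a few lines of arithmetic. the sketch below uses the estimates quoted in the text (r0 = 2.4, rq = 0.3, a 14-day infectious period); the symbol names, and the simplifying assumption in the worst-case formula that quarantine fully blocks transmission, are ours rather than the paper's:

```python
# The two analytical bounds from the text, as arithmetic (sketch).
# Best case: every infected person is tested before meeting a susceptible.
# Worst case: a randomly selected fraction of the population is tested daily,
# here assuming (for simplicity) that quarantine fully blocks transmission.

R0 = 2.4             # basic reproduction number (estimate quoted in the text)
RQ = 0.3             # reproduction number of quarantined individuals
INFECTIOUS_DAYS = 14

def min_identified_fraction(r0=R0, rq=RQ):
    """Best case: fraction f of new cases that must be caught so that the
    mixed reproduction number f*rq + (1 - f)*r0 falls below 1."""
    return (r0 - 1.0) / (r0 - rq)

def min_daily_testing_fraction(r0=R0, d=INFECTIOUS_DAYS):
    """Worst case: daily random testing at per-capita rate tau shortens the
    effective infectious period, so r_eff = r0 / (1 + tau * d); requiring
    r_eff < 1 gives tau > (r0 - 1) / d, i.e. 100*(r0 - 1) percent per period."""
    return (r0 - 1.0) / d

if __name__ == "__main__":
    print(f"best case: identify at least {min_identified_fraction():.3f} of new cases")
    print(f"worst case: test at least {min_daily_testing_fraction():.0%} of the population per day")
```

the first function reproduces the 2/3 threshold ((2.4 − 1)/(2.4 − 0.3)) and the second the 10%-per-day figure ((2.4 − 1)/14).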
this demonstrates the power of population-scale testing and quarantine for the suppression of novel viruses. the second scenario modeled a country that exits from a lockdown that has suppressed the growth of a pandemic, with a small fraction of individuals who are immune (fig. 3b). in such cases, new outbreaks will inevitably happen. ttq and periodic lockdowns can suppress these outbreaks, but so can an ongoing process of testing and isolating. in the model, a lockdown was applied from day 20, which nearly extinguished the virus by day 100. at that point, the lockdown was lifted and social interactions returned to normal, but population-scale testing and quarantine was applied as above. once again, the epidemic was suppressed indefinitely, with total deaths limited to 0.16% of the population. if the lockdown was lifted without population-scale testing, a powerful second wave was generated, leading to the death of 0.57% of the population overall. the decreased mortality due to testing after lifting lockdown corresponds to more than one million lives saved in a us-sized population. this demonstrates that population-scale testing can be an effective replacement for periodic lockdown as a sustainable way to prevent the resurgence of the virus. finally, systematic simulation of the parameters of the stochastic model (supplemental figure 1) showed that with sc = 2/3 (for example, a compliance of 80% and test sensitivity of 85%), testing at least every 11 days on average was sufficient to suppress the epidemic, whereas testing less frequently was not.
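the qualitative effect of these simulations can be reproduced with a far simpler discrete-time stochastic sir model with random testing. this is a sketch only: unlike the paper's network seir model it assumes random mixing, a fully effective quarantine, and made-up population parameters:

```python
import random

# Discrete-time stochastic SIR with random testing and quarantine (sketch).
# Unlike the paper's stochastic network SEIR model, this assumes random
# mixing and that quarantine fully blocks onward transmission.

def simulate(test_every=None, sc=0.8, n=5000, i0=50,
             r0=2.4, infectious_days=14, days=500, seed=1):
    rng = random.Random(seed)
    gamma = 1.0 / infectious_days                   # daily recovery probability
    beta = r0 * gamma                               # infectious contacts per day
    tau = (sc / test_every) if test_every else 0.0  # daily detection probability
    s, i, cum = n - i0, i0, i0
    for _ in range(days):
        if i == 0:
            break                                   # epidemic extinct
        p_inf = 1.0 - (1.0 - beta / n) ** i         # per-susceptible daily risk
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        p_rem = min(1.0, gamma + tau)               # recover, or test positive and quarantine
        removed = sum(rng.random() < p_rem for _ in range(i))
        s -= new_inf
        i += new_inf - removed
        cum += new_inf
    return cum / n                                  # fraction ever infected

if __name__ == "__main__":
    print("fraction infected, no testing:    ", simulate())
    print("fraction infected, weekly testing:", simulate(test_every=7))
```

without testing, most of the population is eventually infected; with weekly testing at sc = 0.8 the per-day removal rate rises enough to push the effective reproduction number below one and the outbreak dies out.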
the scale of screening required for the approach is not unimaginable, as it is within an order of magnitude of the level of screening that is critical for protection of the segment of the workforce that is employed in essential sectors such as healthcare, elderly care, public order and delivery of foods and medicines. furthermore, building a test that could be applied at population scale is clearly feasible using current technology. it could be based on detection of antibodies to the virus 9. however, as it takes time for antibodies to develop, an antibody test cannot detect cases early. a single test also does not discriminate between current and past infection. making sure that someone is not currently infected requires that an antibody test is performed twice over a period of three weeks, during which the individual must be held in strict quarantine. alternatively, the test can be combined with an rna or antigen test. despite these drawbacks, an antibody test will clearly be part of the solution, as it can detect immune individuals who can continue to work safely in health care or with risk groups. however, in countries where the epidemic is successfully suppressed, the fraction of immune individuals is far too small (e.g. 1-10%) for restoring normal levels of economic activity. furthermore, deploying an antibody test at population scale will be more difficult than using an rna test, as the current approach requires blood samples, which decreases compliance and makes self-testing more difficult. a population-scale test can also be based on viral proteins (technically more difficult but possible 10), or viral rna, like the current state-of-the-art diagnostic tests (for example ref. 11). technically, few measurements are easier and/or cheaper in biochemistry than determining whether a particular rna species is present in a particular sample.
the main technical concern relates to false positives caused by contamination of input samples by amplified dna from previous tests; this can be simply prevented by well-established procedures. despite the technical simplicity, detecting viral rna in the field at population scale is difficult to achieve using the same design and strict regulatory framework that is used for tests designed for medical diagnostic purposes. current diagnostic tests for sars-cov-2 are qrt-pcr assays that require (1) a nasopharyngeal swab collected by a trained nurse, (2) sample collection in viral transport media, (3) rna purification, and (4) reverse transcription and quantitative pcr. the test is highly accurate, and the total cost is in the order of $100. such highly accurate testing is critical for accurate diagnosis of cases in a hospital setting. however, due to the very detailed and specific regulation, specialized staff and equipment, and centralized testing facilities, such tests have proven difficult to rapidly scale above thousands of assays in each location. a distributed system of sample collection and testing could, however, conceivably be used to scale qrt-pcr to population levels, particularly when using a regional sweeping approach to limit the number of simultaneous tests needed. the capacity could also be increased 10- to 100-fold by group testing 24, a method with a long history of use in public health that was originally designed for syphilis tests, and is now commonly also used for optimally efficient detection of defective components in industrial production.
although some components of tests are currently limiting due to suddenly high demand, scaling up their production is not difficult, as the methodology is based on raw materials that are not scarce (e.g. plastic, sand) and biological molecules (enzymes, nucleotide triphosphates and short nucleic acid primers) that are easily produced either industrially or locally using simple biotechnological processes. we also note that qpcr instruments are currently in short supply, but isothermal tests are available that require only a waterbath (see below). a parallel, relatively centralized testing method based on existing dna sequencing technology could also be fielded rapidly. in this approach, viral rna in the samples is used to generate dna sequences containing the virus sequence, a sample dna barcode (to identify each case) and two unique molecular identifiers 23 at both ends of the resulting dna fragment (to count the number of virus rnas per sample and to ensure that patient samples do not get mixed in the reaction), which are then sequenced using a massively parallel sequencer. this approach is very scalable, as in principle a single sequencing instrument that is routinely used in scientific research can report more than a billion results per day. furthermore, in the future, a test based on sequencing 19-21 that covers many acute infections could also be used to suppress or even eradicate a large number of infectious diseases simultaneously. this would have significant benefits to humanity, and would be very difficult to achieve using vaccines or drugs that target each infectious agent separately. alternatively, we envisage supplementing the current testing regime with a mass-produced home test kit that could be used by anyone, result in a simple, easily understood readout, and be performed without specialized equipment. the test should be as easy to use as a pregnancy test, to ensure maximal compliance. boxes of e.g.
50 tests would be mass-mailed to all citizens, and a national information campaign would encourage everyone to test themselves weekly. in an infected individual, viral rna is present at reasonably high levels in nasopharyngeal swabs, throat swabs, sputum, and stool for up to two weeks 12 , with the greatest amounts in sputum and stool. sputum might be the ideal source for a home test kit, given the ease of sampling. compliance with home testing could be increased by both rewards and penalties, and potentially enforced by adding a serial number to each test that needs to be reported together with the test result to collect the rewards. the test result can be open, such that the result is clear to everyone, or it can be designed to maintain privacy; in the latter case, the result (e.g. resulting color, number of bars that are visible) needs to be reported together with the serial number to a central facility to get the answer and/or the cash reward. the open and private approaches can also be combined, to design a test that is open but contains an encoded part that needs to be reported to collect a reward. such designs may complicate the approach, but would allow the healthcare system to obtain data that would facilitate monitoring of the outbreak and large-scale contact tracing. the cash rewards could also be made contingent on being regularly tested. anyone found positive would be compelled to self-quarantine, possibly under monetary or criminal sanctions, or using additional rewards for compliance. provided that the test is sufficiently quick, testing could be performed in workplaces, or even at checkpoints exiting areas with high infection rates that are currently under lockdown. our approach is not something that can only be fielded in the far future. in fact, tests suitable for home use have already been developed.
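one hypothetical way to realize the "encoded part" of the readout is a short keyed code per serial number and result; this is purely illustrative (the key name and scheme are assumptions, not a design from the text):

```python
# illustrative "open + encoded" readout: each test serial number and visible
# result maps to a short keyed code that only the central facility can verify,
# so collecting a reward requires reporting the observed result truthfully.
import hashlib
import hmac

CENTRAL_KEY = b"held-only-by-the-central-facility"  # hypothetical secret

def expected_code(serial: str, result: str) -> str:
    """code revealed by test `serial` for the observed `result`."""
    msg = f"{serial}:{result}".encode()
    return hmac.new(CENTRAL_KEY, msg, hashlib.sha256).hexdigest()[:6]

def verify_report(serial: str, result: str, code: str) -> bool:
    """central facility checks a reported (serial, result, code) triple."""
    return hmac.compare_digest(expected_code(serial, result), code)

code = expected_code("SN-000123", "positive")
print(verify_report("SN-000123", "positive", code))
print(verify_report("SN-000123", "negative", code))
```

a design like this would let the facility collect outbreak-monitoring data while making it costly to report a fabricated result.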
in contrast to the commonly used polymerase chain reaction (pcr), which optimally requires an instrument that repeatedly changes the temperature of the reaction, many other "isothermal" detection methods have been developed that operate at a single set temperature, and do not require special equipment beyond what is available in every kitchen (e.g. hot water). for example, an isothermal and colorimetric test has been described 13,14 , based on reverse transcription loop-mediated amplification (rt-lamp) technology. this test has several desirable properties: unlike pcr, it does not require temperature cycling; the readout is binary and can be achieved by simple observation; and it can start from crude samples 15 . many other technologies also have the potential to detect viral rna rapidly and isothermally 16-18 ; these include recombinase polymerase amplification (rpa), transcription-mediated amplification, nicking enzyme amplification reaction (near), rolling circle replication, and in vitro viral replication assays. although a population-scale test does not need to be as accurate as a clinical-grade qrt-pcr test (see above), apart from a potential increase in errors due to sample collection, there is no theoretical reason why a self-test based on isothermal amplification could not achieve false negative and false positive rates equivalent to the current state-of-the-art methodology. making the necessary reagents at scale is also not difficult: per 10 million people, a 100 µl test requires 1,000 liters of reagents, consisting of primers, nucleotides, ph-sensitive dye and enzyme, all of which are easy to make at the required scale.
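the reagent volume estimate can be checked directly:

```python
# sanity check of the reagent estimate: 10 million tests at 100 µl each
tests = 10_000_000
volume_per_test_ul = 100                 # 100 µl per test
total_ul = tests * volume_per_test_ul    # total volume in µl
total_l = total_ul // 1_000_000          # 1 liter = 1,000,000 µl
print(total_l)   # 1000 liters
```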
a population-scale strategy has the potential to save many lives, and to buy precious time for a vaccine or an effective drug to be developed. it is important to recognize that, in contrast to vaccine and drug development, scaling up testing does not depend on any new scientific discoveries; it is a matter of engineering and logistics only. a field test would synergize with drug treatment, as many antivirals act more effectively when given at an early stage of the infection. furthermore, development of the field-applicable tests needed for rapid population-level screening would have great benefits in combating epidemics in countries with less developed healthcare systems, and would also help in responding to future epidemics, or to variants of the current one. the costs of mobilizing scientific and industrial resources for rapid development of such a test are considerable; however, in our opinion, they are still orders of magnitude lower than the costs of the current suppression and mitigation strategies. although balancing collective goods such as economic activity and public health commonly involves very difficult trade-offs, we believe that such a trade-off is not relevant in this case, and that a population-scale screening policy can be implemented in a way that will both save more lives and cause less economic and social disruption than the current approach. as there is little overlap with other industrial mobilization efforts, such as scaling up the current testing regime, building ventilators or developing drugs or a vaccine, the increased effort for the development of tests would also have a very limited opportunity cost. the development of capacity for population-scale testing would also be an important and relatively inexpensive insurance against other pandemics, or against re-emergence of covid-19 once herd immunity is lost or the virus mutates to evade immunity, vaccines and/or antiviral drugs.
the authors declare no competing interests.

the epidemic was first modelled with a standard (continuous, deterministic) susceptible, infected, removed (sir) model. in addition to the very general assumption that there are a relatively large number of cases, which allows modeling of a partially discrete system using a continuous model, the sir model is based on the following standard assumptions: (1) the population is fixed, (2) it mixes homogeneously, (3) the only way a person can leave the susceptible group is to become infected, (4) the only way a person can leave the infected group is to recover from the disease, (5) recovered persons become immune, (6) age, sex, social status, genetics etc. do not affect the probability of being infected, (7) there is no inherited immunity, and (8) the other mitigation strategies and testing are independent of each other (for fig. 1d ). assumption (2) leads the sir model to overestimate viral spread, as in reality the population has substructure (e.g. families, workplaces) and is geographically separated, so contacts are more likely between subsets of the population; this is not expected to materially affect our analysis, as our conclusions are not based on the absolute rate of the spread, only on its exponential nature. in addition, we modeled the effect of testing in two ways. the first, maximally effective testing strategy assumed that every individual was tested before they infected another person, leading to the upper bound on testing performance in fig. 2a-b . under this model, the requirement for collapsing the epidemic is that the weighted average of the basic reproduction number R0 and the reproduction number in quarantine Rq
must be less than one:

(1 − pc)·R0 + pc·Rq < 1

here, p is the true positive rate of the test and c is the compliance (fraction of all tested individuals who actually self-quarantine). using R0 = 2.4 and Rq = 0.3 for covid-19, the product of the true positive rate and compliance must be greater than two thirds:

pc > (R0 − 1)/(R0 − Rq) = (2.4 − 1)/(2.4 − 0.3) = 2/3

the second, lower-bound testing strategy (fig. 2c-d ) was modelled by adding an additional 'detected' state to the model, with transitions from infected to detected (with rate τ) and from detected to recovered (with rate γ). this corresponds to continuous random testing of the population at a fixed rate τ per person per day. here, the requirement for successful collapse of the epidemic is given by the basic reproduction number (assuming perfect quarantine; fig. 2d ), as follows. first, the rate equations for the sir model with testing are:

dS/dt = −βSI/N
dI/dt = βSI/N − γI − τI
dD/dt = τI − γD
dR/dt = γI + γD

rewriting the second equation above as follows:

dI/dt = (βS/N − γ − τ)·I

makes it clear that dI/dt will be negative (i.e. the epidemic will collapse) only if:

τ > βS/N − γ

note that the ratio β/γ is the basic reproduction number R0, so that the previous inequality can be rewritten as follows:

τ > γ·(R0·S/N − 1)

in other words, the testing rate τ must exceed a threshold given by the recovery rate γ and the fraction of susceptible individuals S/N. at the beginning of an epidemic in a naïve population, when all individuals are susceptible (S/N = 1), this reduces further to

τ > γ·(R0 − 1)

for sars-cov-2, assuming γ = 1/14 (i.e. an infectious interval of two weeks) and R0 = 2.4, the required minimal testing rate would be 10% of the population per day. as the
epidemic progresses, the required testing rate drops, as fewer and fewer individuals remain susceptible and herd immunity kicks in. to understand the difference between (1) testing everyone at the same time, (2) testing everyone in a time-separated manner, and (3) testing the population by random sampling, it is helpful to consider the extreme case of certainly and completely collapsing an epidemic by testing and quarantine, using a perfect test that detects all infected individuals, and complete quarantine. to achieve this optimally, it is necessary to identify everyone who is infected before they have infected anyone else (denoted efficiency, e = 1) and to quarantine every infected individual (c = 1). this requires obtaining a minimum of n bits of information for a population of size n. in case (1), this is achieved by testing everyone at the same time with a perfectly accurate test that returns one bit (positive or negative). in this case, e = 1 and c = 1. however, when tests are separated in time (2), the order of testing becomes important. the optimal strategy discussed in the preceding section, testing everyone at different times but before they have had the chance of infecting anyone else, also works optimally and collapses the epidemic over a single infectious period. however, most strategies for testing n individuals during time t_test_interval before t0 have e < 1, and are not sufficient to completely collapse the epidemic using one testing round, as e depends on the relationship between the order of testing and the order of infections. for example, using a random order of testing allows some individuals that have already tested negative to become infected during the t_test_interval (the mutual information between test results and a person being infected at t0 is less than one bit).
however, some other regimens using a perfectly sensitive test can collapse the epidemic (though not always prevent all future infections): for example, a geographical sweep, where infections (individuals) are prevented from crossing a moving test front, can be used to identify every infected individual in the population by performing a single round of n tests. in case (3), random sampling of n individuals, e is always less than 1. the testing becomes less efficient than testing each of the n individuals at the same time, because some individuals are tested twice and some not at all; some information is thus not obtained, and some tests do not return information that is completely independent of the information returned by other tests (the sum of mutual information between all pairs of tests is not 0 bits). in other words, if individuals are selected randomly during a given time interval, the tests will miss some individuals, and some individuals are tested more than once (this increases the true positive rate for those individuals, but does not make up for failing to catch some individuals entirely). considering the extreme case of immediate collapse, it may appear that testing in a time-separated manner or by random sampling will not work, because non-concurrent testing can permit infections to cross the testing boundary, and random sampling clearly leaves some cases undetected. however, this very intuitive idea is incorrect, as collapsing an epidemic only requires that the rate of generation of new cases per current case is less than one. the limit for random testing can be obtained using the sir model extended with testing (sir+t), which abstracts away individuals and thus can (only) be used to investigate the effect of random, time-separated testing. analytically from this model, as shown above, the R < 1 condition is true when tests are performed at a rate higher than R0 − 1 tests per mean infectious period.
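the sir+t threshold (roughly 10% of the population per day for R0 = 2.4 and a two-week infectious period) can be verified numerically; this is an illustrative euler sketch, not the authors' code:

```python
# euler integration of the sir+t rate equations; with R0 = 2.4 and
# gamma = 1/14, testing must exceed tau = gamma * (R0 - 1) ≈ 0.1/day
# (10% of the population per day) for infections to decline from the start.

def peak_infected(tau, r0=2.4, gamma=1/14, i0=1e-4, days=300, dt=0.1):
    """peak infectious fraction of an sir model with testing rate tau."""
    beta = r0 * gamma
    s, i, d = 1.0 - i0, i0, 0.0     # susceptible, infectious, detected
    peak = i
    for _ in range(int(days / dt)):
        ds = -beta * s * i
        di = beta * s * i - (gamma + tau) * i
        dd = tau * i - gamma * d
        s += ds * dt
        i += di * dt
        d += dd * dt
        peak = max(peak, i)
    return peak

print(peak_infected(tau=0.05))   # below threshold: a large epidemic
print(peak_infected(tau=0.12))   # above threshold: infections only decline
```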
the same limit results from the following consideration: reducing R0 to less than one using the method representing the lower bound, a completely random testing regime, requires that an infected individual has less than an equal probability of (a) infecting another individual versus (b) being tested and quarantined or recovering from the infection (analogously, in sir+t, the combined testing and recovery rate needs to be higher than the rate of new infections). events (a) can recur, but either event (b) terminates the chain. therefore, at R = 1 there will be on average one (a) event, which requires that the infectious and protective events are randomly ordered with respect to each other, with equal density. this yields R0 − 1 tests and one recovery per R0 infections per infectious period, and an upper limit of R0 tests per infectious period at infinity (because as R0 → ∞ the expectation value for the number of (a) events approaches the sum of a geometric series). outside of the theoretical consideration of e = 1, multiple population-scale tests are always required to collapse the epidemic in the absence of other interventions that achieve the same aim. performing multiple tests over time imposes an additional constraint on optimality: the allocation of tests to each transmission interval. as described above, the best performance of continuous testing and quarantine is thus achieved when testing is performed immediately after infection for each individual, or as a requirement for exiting quarantine. testing blood before transfusion to prevent transmission of hiv or hepatitis c, testing at border crossings, conditional opening of lockdown, or some regimes that apply contact tracing may come close to approximating this limit (which for covid-19 is pc = (R0 − 1)/R0 ≈ 0.57 per mean infectious period; fig. 2 ). however, in most scenarios, such testing efficacy is difficult to maintain over time (because contact is lost, and the unknown infectious intervals rapidly become randomly distributed over time).
this level can thus be considered an upper limit on the performance of any scenario applied at population scale. a test with a true positive rate of 1, applied to everyone at the same time, performs as well as the optimal strategy. as test sensitivity decreases, the performance of the concurrent regime becomes lower than optimal. however, concurrent testing still performs well above the lower limit obtained from the random testing model. the required pc to bring R below 1 using concurrent tests has a simple relationship with the exponential growth of infectious cases: over an interval t − t0, pc > 1 − (infectious cases at t0)/(infectious cases at t). however, it is not as simple to relate this to the original R0, because the relationship between R0 and growth rate is a function of the distribution of the generation intervals (ref. 28 ). estimating at R0 = 2.4, using an even probability distribution of infections over time, the infected population becomes approximately eight times larger in a single 14-day infectious period. this means that a testing regime regularly spaced at 14-day intervals should have a pc value of > 7/8 = 0.875 to bring R below 1. this is confirmed using empirical simulations assessing the rate of exponential growth in the complete absence of immunity and of all other types of interventions; the limit R = 1 at R0 = 2.35 with testing every infectious period (14 days) is reached when pc ≈ 0.85 (compared to 0.58 for testing each individual directly after infection). the required testing interval at R0 = 2.35 and pc = 0.8, in the absence of other interventions and immunity, is 11, 8 and 5.5 days for concurrent testing, testing each individual randomly once during each testing period, and continuous random testing, respectively.
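the 7/8 figure follows directly from the growth of the infectious pool over one testing interval:

```python
# concurrent testing every interval must remove enough cases that the
# infectious pool shrinks over each interval: pc > 1 - cases(t0)/cases(t).

def pc_threshold(growth_factor: float) -> float:
    """minimum product of true positive rate and compliance."""
    return 1.0 - 1.0 / growth_factor

# with the text's estimate of ~8-fold growth per 14-day infectious period:
print(pc_threshold(8))   # 0.875
```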
these considerations can be summarized as follows: the order of testing efficacies is: everyone before they have had a chance to infect anyone > everyone at the same time > everyone once during a period > testing by random sampling, with population-scale testing remaining feasible and cost-effective by one or more orders of magnitude across all these regimens. the sir model fails to account for several key properties of real epidemics, such as social and geographical population structure, the discrete and stochastic nature of infection and disease progression, and the fact that testing cannot be instantaneous. to account for these more complex real-world phenomena, we implemented a stochastic network model using the gillespie algorithm for accurate numerical simulation of the stochastic dynamics. we used the seirsplus python package (https://github.com/ryansmcgee/seirsplus), which models an epidemic on a social graph, where each individual transitions between six states: susceptible, exposed, detected-exposed, infectious, detected-infectious, and recovered. the two detected states are used to model the effectiveness of testing and quarantine, and social distancing is modelled by removing edges from the initial social graph. we used a random social graph of mean degree 13 (median 10) with two-sided exponential tails, which was reduced to mean degree 2 for social distancing (lockdown) and quarantine. the population consisted of 10,000 individuals. in both shown scenarios we assumed that the test had 80% sensitivity, and epidemic parameters were modelled loosely after covid-19.
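for readers without the seirsplus package, the gillespie-style event loop can be sketched for a well-mixed sir model with random testing; this is a simplified stand-in for the network seir model described above, with illustrative parameters:

```python
# minimal gillespie simulation of a well-mixed sir model with continuous
# random testing (rate tau moves infectious -> detected/quarantined).
import random

def attack_rate(n=10_000, i0=20, r0=2.4, gamma=1/14, tau=0.0, seed=1):
    """fraction ever infected in one stochastic run of sir + random testing."""
    rng = random.Random(seed)
    beta = r0 * gamma
    s, i, d = n - i0, i0, 0
    t = 0.0
    while i > 0:
        rates = (beta * s * i / n,   # infection
                 gamma * i,          # undetected case recovers
                 tau * i,            # case detected and quarantined
                 gamma * d)          # quarantined case recovers
        total = sum(rates)
        t += rng.expovariate(total)  # exponential waiting time to next event
        x = rng.random() * total     # choose which event fires
        if x < rates[0]:
            s -= 1
            i += 1
        elif x < rates[0] + rates[1]:
            i -= 1
        elif x < rates[0] + rates[1] + rates[2]:
            i -= 1
            d += 1
        elif d > 0:
            d -= 1
    return (n - s) / n

print(f"no testing:      {attack_rate(tau=0.0):.2f}")
print(f"testing 12%/day: {attack_rate(tau=0.12):.2f}")
```

with testing above the ~10%/day threshold the outbreak remains a small cluster, whereas without testing most of the population is eventually infected.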
detailed source code with comments and parameter settings for each model are available in the accompanying jupyter notebook at https://github.com/paulromer149/ubiquitous-testing.

[figure legend, partial] inequality that must be true to suppress transmission: for the epidemic to collapse, the weighted average of the natural reproduction number R0 and the reproduction number in self-quarantine Rq must be less than one; here, p represents the test true positive rate (fraction of all infectious individuals detected), and c the rate of compliance. (c) parameters for a sir model with testing and a detected state. (d) requirements for testing to collapse an epidemic in the sir model with testing, expressed in terms of the testing rate τ required in a population where all individuals are susceptible, with inverse infectious interval γ. (e) parameters for the discrete, stochastic seir model on a social graph; each compartment was modelled for every individual on the social graph. (f) outcomes of ten simulation runs of the stochastic seir model on a social graph, showing the total number of deaths as a function of the fraction tested every day, assuming compliance × true positive rate pc = 2/3.

[figure legend, partial] severe social distancing (lockdown) is applied after 20 days, and then lifted after 100 days. population-scale testing is implemented after day 100 (left) or not implemented (right).
in all panels, shaded colored regions indicate policy regimes, and the total number of dead individuals is indicated in the sub-panel titles. for both panels, the product of compliance and test efficacy was set to pc = 0.8 and the testing rate was set to τ = 1/7.

supplemental figure 1 | successful testing strategies under the stochastic seir model on a social graph. growth curves (daily new cases) showed that, given pc = 2/3, testing at least every 11 days successfully flipped the sign of the exponential growth curve from day 20 (when testing started), whereas testing every 13 days was insufficient. red dashed curves: piecewise exponential fits for days 0-20, 20-100, and 100-250.
references

1. coronavirus disease 2019 (covid-19) situation report - 59
2. impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand
3. pandemics: risks, impacts, and mitigation
4. the basic reproduction number (r0) of measles: a systematic review
5. emerging infectious diseases and pandemic potential: status quo and reducing risk of global spread
6. scientists say mass test in italian town have halted covid-19 there
7. substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov2)
8. evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of coronavirus disease 2019 in wuhan
9. serological assays for emerging coronaviruses: challenges and pitfalls
10. platinum nanocatalyst amplification: redefining the gold standard for lateral flow immunoassays with ultrabroad dynamic range
11. detection of 2019 novel coronavirus (2019-ncov) by real-time rt-pcr. euro surveillance: bulletin europeen sur les maladies transmissibles
12. clinical presentation and virological assessment of hospitalized cases of coronavirus disease 2019 in a travel-associated transmission cluster
13. rapid colorimetric detection of covid-19 coronavirus using a reverse-transcriptional loop-mediated isothermal amplification (rt-lamp) diagnostic platform
14. rapid molecular detection of sars-cov-2 (covid-19) virus rna using colorimetric lamp
15. development and validation of reverse transcription loop-mediated isothermal amplification (rt-lamp) for rapid detection of zikv in mosquito samples from brazil
16. isothermal nucleic acid amplification techniques and their use in bioanalysis
17. isothermal exponential amplification techniques: from basic principles to applications in electrochemical biosensors
18. isothermal amplification of nucleic acids
19. metagenomic sequencing with spiked primer enrichment for viral diagnostics and genomic surveillance
20. a vision for ubiquitous sequencing
21. loeffler 4.0: diagnostic metagenomics
22. contributions to the mathematical theory of epidemics--i. 1927
23. counting absolute numbers of molecules using unique molecular identifiers
24. the detection of defective members of large populations
25. test everyone, repeatedly to defeat covid-19. medium, medium.com/@sten.linnarsson/to-stop-covid-19-test-everyone-373fd80eb03b
26. covid-19 mass testing facilities could end the epidemic rapidly
27. infectious disease model with testing and conditional quarantine
28. how generation intervals shape the relationship between growth rates and reproductive numbers

acknowledgements: we thank many colleagues for comments on the early version of the work. we are especially grateful to drs. minna taipale, mikko taipale and paul pharoah for review of the draft of the manuscript. a draft of this paper was initially released as a public preprint (ref. 25 ), and a supporting, independently developed model was reported by p.r. on www.paulromer.net. we also note that during the writing of this work, we became aware of two independent analyses, one by julian peto (ref. 26 ), and the other by a team consisting of david berger, kyle herkenhoff and simon mongey, that report similar conclusions (ref. 27 ).

key: cord-283517-7gd0f06m
title: right-sizing technology in the era of consumer-driven health care
authors: deak, eszter; marlowe, elizabeth m.
date: 2017-08-01
journal: clinical microbiology newsletter
doi: 10.1016/j.clinmicnews.2017.07.001
doc_id: 283517
cord_uid: 7gd0f06m

abstract: technology for modern clinical and public health microbiology laboratories has evolved at an impressive rate over the last two decades. contemporary diagnostics can rapidly provide powerful data that can impact patient lives and support infectious disease outbreak investigations. at the same time, dramatic changes to health care delivery are putting new pressures on a system that is now focusing on patient-centric, value-driven, convenient care. for laboratories, balancing all these demands in a cost-contained environment remains a challenge.
this article explores the current and future directions of diagnostics in our dynamic health care environment. the affordable care act (aca) in the united states was enacted by president obama in march 2010. the goal of the aca was to improve the quality of and access to health care by transforming insurance coverage and lowering health care costs. we have seen shifts in health care plans (i.e., account-based health plans) that have health care consumers opting for lower monthly premiums with higher deductibles. these deductibles are often paid from personal health savings accounts, thus pushing the costs of health care onto the individual consumer. couple this with an unprecedented boom in technology, which in some cases can offer on-demand diagnostics within the time of an office visit, and the result is consumer-driven health care, particularly for those who can afford it. despite recent administrative changes in washington and the uncertainty of "repeal and replace" in the republican agenda for the current aca, the trend toward consumer-driven health care, with an emphasis on pre-budgeted spending, is likely to continue. for consumers of a product who will continue to pay more of the bill, the bright side of this trend is a movement toward value-based care delivery from the perspective of the affluent consumer. value-based care is defined as safe, appropriate, and effective care at a reasonable cost, predicated on evidence-based medicine and proven outcomes. patients are looking for more pricing transparency and more options for efficient care delivery (i.e., telemedicine, retail care providers, and mobile health solutions). health care providers are trying to better understand consumer wants and needs, measure performance, and improve the patient experience; this is a distinct change from the historical fee-for-service system, which did little to incentivize providers to produce value.
from the laboratory's perspective, there continue to be operational challenges to leading these changes. emerging and re-emerging pathogens demand rapid responses at an unprecedented level. the skilled workforce continues to shrink, while the work demands go up. there are legislative influences on testing. at the same time, reimbursement and budgets are contracting. yet still, at the end of the day, the laboratory is expected to produce quality results for improved patient care. initiatives like antibiotic stewardship are helping to drive better outcomes with laboratory results, but many of these programs depend on post-analytical variables for the optimal impact on patient care to be realized [1, 2] . another key laboratory issue is the breadth and scope of the technology that is now available. today, we have molecular point-of-care (mpoc) devices that can provide a rapid diagnostic answer within 20 minutes in a clinic, multiplex pcr sample-to-answer devices that can screen for >20 analytes in a single specimen in about an hour, high-volume automation that can enhance throughput and efficiency in the clinical microbiology laboratory with digital imaging, and next-generation sequencing (ngs) that can reveal a treasure trove of information in a single test. combined, the changes in health care and technology have left many laboratories asking how to "right-size technology" for routine care while transforming practice. ultimately, change will depend on the goals that are driving the conversion and utilization of the technology in daily laboratory practice. factors may include syndrome-specific diagnostic needs, ease of use, the need for rapid results, improved sensitivity and specificity, operational needs (such as staffing and expertise), laboratory design (such as centralized versus decentralized models), cost, consumer demand, and the potential for improved patient outcomes.
the laboratory must weigh all these factors while trying to make a business case to improve service, despite the fact that there are few or no outcome data available to support the use of new technology. technology comes at a cost that is often shifted to the consumer, the patient. while consumer choice can help push innovation, one also has to wonder to what extent the market will allow the significant increases in testing costs that can come with technology. for example, in the case of acute gastroenteritis, which is typically a self-limiting infection with the majority of specimens coming from the outpatient setting, a stool culture would traditionally be ordered at a cost to the patient of less than $100. the newer multiplex stool pcr panels can result in a charge to the patient of over $1,000. will patients be willing to bear this increased cost for such a diagnostic test long term? while one can agree that a sophisticated multiplex diagnostic assay offers improved turnaround time, sensitivity, and pathogen coverage, it must be used in conjunction with diagnostic algorithms that prevent needless additional downstream testing as well as excess costs. clearly, there is a need for diagnostic stewardship alongside antibiotic stewardship to improve quality and the prudent use of health care dollars. this article explores the impact of technology on the clinical and public health microbiology laboratory in the age of consumer-driven health care.

testing considerations

"right-sizing technology" means that the right test is offered at the right time for the right patient with maximal operational efficiency and cost-effectiveness. the outcome of right-sizing is to provide results with the potential to inform therapeutic and infection control decisions for improved care and, ultimately, reduced downstream costs.
the diagnostic testing needs of a medical institution such as kaiser permanente in northern california, which is comprised of 21 hospitals and over 200 medical offices spread out over a wide geographical area with over 3.5 million members and serviced by a central laboratory, are significantly different than those of a 500-bed county hospital with an on-site laboratory. advances in technology have provided flexibility in diagnostic testing to address the differing needs of health care systems and the laboratories that serve them. for any given analyte, there are a number of highly sensitive and specific tests available from which to choose. considerations that go into the selection of a test or instrument platform for implementation include perceived turnaround time needs for improved patient care, sample volume requirements, number of tests expected, suitability for the intended laboratory based on available expertise and desired workflow, as well as cost. in the past decade, manufacturers have targeted their research toward development of more sensitive and specific mpoc diagnostic infectious disease platforms and tests. such mpoc tests have evolved for more practical use at the bedside. manufacturers have appreciably simplified tests by removing the need for sample manipulation and handling. instrumentation has become more automated and/or involves fully integrated systems that are portable or significantly smaller and more modular. instruments have also incorporated mechanisms for recording and transmitting results. all the while, tests have become faster while demonstrating improved sensitivity and specificity [3] . these modifications in technology have enabled molecular testing to migrate from large central laboratories. most poc tests are still moderately complex, which is defined by the clinical laboratory improvement amendments of 1988 (clia) as one requiring basic laboratory knowledge and training for personnel performing the test. 
users of these tests must adhere to clia regulatory requirements, which include quality assurance, along with appropriate documentation, validation of analytical performance, proficiency testing, and ongoing competency training. clia director oversight is still required [4]. increasingly, diagnostic molecular tests are being designed and submitted for clia-waived status. clia-waived tests are defined by the food and drug administration (fda) as being "so simple and accurate as to render the likelihood of erroneous results negligible; or pose no reasonable risk of harm to the patient if the test is performed incorrectly" [4, 5]. based on this definition, non-laboratorians can perform the test without clia director oversight if they are following the manufacturer's instructions [4]. the first clia-waived mpoc test to receive fda approval was the alere i influenza a&b in 2015. to better ensure that quality results are being reported, some of the new mpoc tests have incorporated internal electronic and reagent quality control (qc) and have a built-in shut-down mechanism in the event of failed qc. since the waived testing program began in 1992, the number of approved clia-waived diagnostics has increased from 9 to over 100, with more than 20 analytes approved for infectious disease testing [6]. there are over 200,000 laboratories in the united states that now hold a certificate of waiver, which enables them to perform any clia-waived test [4]. however, just because anyone can perform the test does not mean they should. there must be an understanding of test limitations by all testing personnel. few non-laboratorians realize that the central laboratory filters out many inappropriate specimens, and non-laboratorians require extensive training to understand the testing complexities of even waived tests. laboratories frequently receive incorrectly collected specimens and are asked to test them because there is a lack of appreciation of why these specimens would not be tested.
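the internal-qc shut-down behavior described above amounts to a simple reporting gate: no patient result is released unless the electronic and reagent controls both pass. the sketch below is illustrative only; the function name, arguments, and result strings are ours, not any vendor's actual firmware logic.

```python
def report_result(patient_signal: bool,
                  electronic_qc_ok: bool,
                  reagent_qc_ok: bool) -> str:
    """Release a reportable result only when internal QC passes.

    Mimics the shut-down mechanism described in the text: a failed
    electronic or reagent control suppresses the patient result
    entirely, forcing a repeat rather than reporting bad data.
    """
    if not (electronic_qc_ok and reagent_qc_ok):
        # shut-down behavior: nothing is reported on a failed control
        return "INVALID - QC FAILURE, REPEAT TEST"
    return "POSITIVE" if patient_signal else "NEGATIVE"
```

the point of the gate is that a non-laboratorian operator never sees a patient result from a run with failed controls, which is one way waived tests compensate for the absence of central-laboratory oversight.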
for example, clostridium difficile testing is not performed on a formed stool specimen or for patients less than 1 year old because of the confounding issue of potential colonization; as in a central laboratory, pre-analytical knowledge and safeguards would need to exist to prevent misuse of testing. one question is whether, in such cases, we would be needlessly treating people if decisions were left to the facility performing the waived testing. also, testing a specimen type that is not included in the intended use of an fda-cleared test results in off-label use of the assay. application of diagnostics in medicine is a balancing act between what we can do, what we need, and what we can afford. diagnostics will continue to evolve. tests will become faster, cheaper, and easier to perform, but technology comes at a price, and implementing new technologies with faster turnaround times nearer the patient requires careful thought about placement within the flow of patients. moving a rapid molecular test closer to the patient has the potential to have an immediate impact on therapeutic decisions. a prospective cohort study examined the potential cost benefit of near-patient mpoc testing for chlamydia trachomatis (ct) and neisseria gonorrhoeae (ng) in a clinic based on reduction of contact attempts [7]. as part of the study, 1,356 patients who had ct/ng nucleic acid amplification tests (naat) also completed a questionnaire to ascertain the maximum time patients were willing to wait after consultation for ct/ng test results and thus the potential for immediate treatment of individuals testing positive while preventing unnecessary treatment of patients who tested negative. the study determined that of the 1,356 patients, 26.2% were unwilling to wait even 20 minutes for the results of an mpoc test.
based on the results from the questionnaire, of 129 patients who tested positive by a naat, use of a 20-minute mpoc test would have resulted in immediate treatment of 71.9% of the individuals, whereas a 90-minute test would have enabled immediate treatment of only 3.1% of these positive patients. of 1,227 patients who tested negative for ct/ng by naat, use of a 20-minute mpoc test would have prevented 3.2% of empirical treatments, while a 90-minute mpoc test would have prevented 0.3% of the empirical treatments. another study looked at the impact of the 90-minute xpert ct/ng test when sample collection was performed on arrival of the patient, with the intention that patients receive their results and treatment as needed during the appointment [8]. actual wait times were evaluated. only 21.4% of the patients received their results before leaving the clinic with the 90-minute xpert test. it was determined that a test turnaround time greater than 30 minutes would likely not be effective, given that it took 48 minutes from the time of sample collection to the clinical consultation. for such mpoc tests to affect patient management, results will need to be available at the time of consultation to maintain patient flow. there are limited studies examining the clinical impact of mpoc tests for other infectious diseases [9]. the infectious diseases society of america's practice guidelines for group a streptococcus (gas) currently recommend two-tiered testing for pediatric patients [10]. it is recommended that rapid antigen detection tests (radts) be performed on throat swabs due to the rapid turnaround time of the test (<10 minutes). however, due to the low sensitivity of radts, bacterial cultures are recommended for confirmatory testing of negative radts. clia-waived mpoc gas diagnostic tests with turnaround times comparable to those of the radts are becoming more readily available. these pcr tests do not detect group c or group g streptococci.
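the two-tiered gas strategy described above can be sketched as a small decision rule: a positive radt is reportable as-is, while a negative radt in a pediatric patient reflexes to backup culture because of the radt's low sensitivity. the function name and return strings below are ours, for illustration only; this is not a clinical decision tool.

```python
def gas_workup(radt_positive: bool, pediatric: bool) -> str:
    """Two-tiered GAS testing as recommended in the guideline text.

    - Positive RADT: reportable (RADT specificity is high).
    - Negative RADT in a pediatric patient: reflex to throat culture,
      because RADT sensitivity is too low to rule out GAS.
    """
    if radt_positive:
        return "report GAS positive"
    if pediatric:
        return "reflex to throat culture"
    return "report GAS negative"
```

a clia-waived mpoc gas pcr with radt-like turnaround could, if its sensitivity holds up, collapse this two-step workflow into a single test, which is the clinical value the text says still needs study.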
however, they have been shown to have improved sensitivity for detection of gas, even compared to culture [3, 11] . additional studies are needed to assess the clinical value of these tests and the potential for detection of low-level colonization. in a retrospective study, blaschke et al. [12] examined visits to u.s. emergency departments (eds) using data from the national hospital ambulatory medical care survey. they found that rapid influenza diagnostic tests (ridt) were performed during 4.2 million visits and that 42% of influenza diagnoses were made in association with ridt. test results did suggest that some influence on physician behavior occurred, as patients diagnosed with influenza had fewer ancillary tests ordered (45% versus 53% of visits), fewer antibiotic prescriptions (11% versus 23%), and increased antiviral use (56% versus 19%) when the diagnosis was made in association with ridt. thus, diagnosis of influenza made in conjunction with ridt resulted in fewer tests and antibiotic prescriptions and more frequent use of antivirals. early influenza virus antigen-based poc tests lacked sensitivity [13, 14] . the newer mpoc tests are significantly more reliable and have the potential for improved outcomes in the poc environment [15] . however, as more mpoc options become available, it will be important for laboratories to continue to assess their performance, as not all mpoc tests may demonstrate the same sensitivity and specificity [16] . a recently published open-label, randomized, controlled trial looking at the routine use of mpoc testing of respiratory viruses in adults presenting to hospital with acute respiratory illness enrolled 720 patients (362 assigned to poc testing and 358 to routine care). the authors found that routine use of mpoc for respiratory viruses did not reduce antibiotic usage. however, many patients in the study were already started on antibiotics before the mpoc results were available. 
mpoc was also associated with a reduced length of stay and improved antiviral use [17]. the clinical laboratory and diagnostic effectiveness (clade) study was a prospective observational cohort study undertaken to assess the impact of a highly sensitive (97%) 20-minute clia-waived mpoc influenza test on patient management in the emergency department (ed) and the associated economic benefit [18]. the study indicated that 57% of the ed physicians changed their management of patients, primarily of patients who tested influenza virus negative. the influenza test results impacted decisions about hospital admissions and discharges, ordering of additional medical procedures and laboratory tests, as well as antimicrobial and antiviral usage. this model, applied to 2,000 ed visits, revealed a cost savings of nearly $800,000 [19]. the study reiterated that getting the right information to the right people at the right time has the ability to impact clinical care. community pharmacies have also become effective players in infectious disease management through provision of vaccinations and are increasingly offering poc tests. over 5% of the laboratories with a certificate of waiver are in pharmacies [20]. a physician-pharmacist collaborative practice agreement (cpa) can be set up to delegate prescriptive authority to pharmacists for treatment of infectious diseases based on clia-waived poc test results. the use of this model has been shown to be effective for influenza virus and gas [21, 22]. in a pilot study conducted at 55 pharmacies in 3 states using the cpa model, pharmacists performed a clia-waived poc influenza test to screen individuals presenting with influenza-like symptoms [22]. pharmacists provided oseltamivir to all individuals who tested positive for influenza virus by the poc test within an hour of the initial encounter. meanwhile, individuals who tested negative for influenza virus did not receive inappropriate antiviral therapy.
in a similar pilot study, pharmacists performed a clia-waived poc gas diagnostic test to screen individuals coming into the pharmacies with symptoms of pharyngitis [22] . about 13 million physician office visits are due to acute pharyngitis every year. rates of antimicrobial use as high as 80% have been reported in the literature to treat pharyngitis, although gas has been shown to be associated with only 10% to 30% of pharyngitis cases. of the individuals screened in the study, about 18% tested positive for gas and were thus treated with an antimicrobial consistent with prevalence studies. this study indicates a significant potential of poc tests in pharmacies to decrease inappropriate antibiotic usage in the outpatient setting, although it must be emphasized that moving testing from a central laboratory to a medical unit or more accessible location does not guarantee improved outcomes without systematic changes in management. additionally, when poc testing is performed by clinical staff, errors can arise from a lack of understanding of the importance of qc and quality assurance [23] . the american academy of microbiology recently convened a colloquium of industry thought leaders and subject matter experts to evaluate the role of "near-patient testing," as well as the impact of this diagnostic "paradigm shift" for microbiology [24] . the report from this colloquium was recently published, with thoughtful recommendations. these recommendations were divided into three categories: (i) implementation, (ii) oversight, and (iii) evaluation. key recommendations included (i) rethinking patient flow in the clinical setting to optimize poc utilization, (ii) retaining proper oversight by the microbiology laboratory, and (iii) the need for better outcome data which includes health economics data [24, 25] . syndromic testing has gained popularity in recent years. 
these multiplex tests detect most common and some uncommon pathogens associated with a syndrome based on similar signs and symptoms. in 2008, the luminex xtag respiratory viral panel was the first multiplex molecular panel to receive fda clearance in the united states. since then, a number of large syndromic multiplex panels have been fda cleared for use in clinical diagnostics. multiplex panels currently exist for gastroenteritis (gastrointestinal [gi]), bloodstream infections, and meningitis/encephalitis. although some instrument platforms still require offline extraction, many platforms have evolved into sample-to-result assays requiring less than 5 minutes of hands-on time with a turnaround time of 1 to 2 hours. for the most part, the sensitivity and specificity of these multiplex tests are comparable; however, sensitivity and specificity of the individual targets can vary by platform. multiplexed molecular panels that can target up to 27 pathogens have the potential to simplify ordering for the physician, as well as workflow in the laboratory, and require less expertise on both ends as a single automated test. as new faster and simpler technologies are introduced for multiplexed platforms, there has been continued growth in adoption of these tests for clinical diagnosis. however, there are limitations to this shotgun approach that are associated with high financial costs as reimbursements continue to decrease, as well as test interpretation dilemmas, especially in the context of low prevalence rates. a point-counterpoint paper was recently published on large multiplex panels as first-line tests for respiratory and gi pathogens [26] . a proposed advantage was the potential to provide timely results for targeted therapy. however, detecting more pathogens might not impact treatment at all, as low sensitivity for certain targets can result in missed diagnoses with additional consequences. 
also, low prevalence rates for many of the targets may lead to false positives followed by unnecessary treatment and potentially delayed diagnosis. diagnostic errors caused by inappropriate ordering can cause delays in care or harm patients [27]. pre-test probability is important with sensitive molecular assays. a tuberculosis meningitis case that was misdiagnosed as herpes simplex virus 1 (hsv-1) infection, presented by gomez et al. [28], underscored the risk of using syndromic multiplex assays without fully understanding their limitations. the patient's true diagnosis was delayed because of an initial hsv-1-positive filmarray meningitis/encephalitis (me) panel result, which ultimately contributed to severe neurological sequelae. on the other end of the spectrum, another recent article reported that the meningitis panel demonstrated reduced sensitivity for hsv detection from pediatric cerebrospinal fluid specimens [29]. positive results due to panel detection of colonization in a gastrointestinal panel with c. difficile and long-term shedding of organisms such as norovirus or rotavirus can also lead to inappropriate therapeutic decision making [26, 30, 31]. the selection of the platforms that laboratories implement is usually based on accuracy, cost, hands-on time, level of complexity, staffing, throughput, and convenience. however, we also need to think about how we are going to use a test once we implement it, whether to restrict ordering of these tests to only the sickest patients or to offer them to everyone as a first-line test. for a high-volume laboratory, using a costly multiplex platform as a first-line test is not feasible. patient outcome data based on large multiplex tests have been slow to evolve [9, 32]. additional data are needed to determine which patients will benefit from this type of testing. for respiratory infections, testing needs may vary by season and geography.
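the low-prevalence pitfall above follows directly from bayes' rule: even a highly specific target yields mostly false positives when the pathogen is rare in the tested population. the sketch below uses illustrative performance figures (not numbers from the text) to show how positive predictive value collapses as prevalence falls.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule.

    PPV = true positives / (true positives + false positives),
    where the false-positive rate is (1 - specificity) applied to
    the uninfected fraction of the population.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative: a 98%-sensitive, 99%-specific target at 0.1% prevalence
# yields a PPV near 9%, i.e., most positives are false. The same assay
# at 20% prevalence has a PPV above 95%.
print(round(ppv(0.98, 0.99, 0.001), 2))  # -> 0.09
print(round(ppv(0.98, 0.99, 0.20), 2))   # -> 0.96
```

this is why pre-test probability matters so much for the rare targets on large syndromic panels: the assay's analytical performance is unchanged, but the interpretation of a positive result is not.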
during flu season, it may be more cost-effective to perform a targeted influenza/rsv panel on patients presenting with respiratory symptoms before testing with a broad panel of organisms. panels may be better suited to critically ill or immunocompromised populations. implementation of a testing algorithm for laboratory utilization of molecular multiplex panels, with decision support built into ordering, may be needed to avoid substituting one set of unintended consequences for another. education and mandated improvements in test utilization will hopefully improve economic outcomes for the laboratory and decrease the financial burden on the patient. clia-waived status is being obtained for multiplex platforms, with a number of implications. in october 2016, biofire diagnostics received fda clearance and clia waiver for the filmarray respiratory panel ez, which requires only 2 minutes of hands-on time and has a run time of 1 hour. the ez panel is the clia-waived version of the fda-cleared respiratory panel, which tests for 14 viral and bacterial pathogens: adenovirus, coronavirus, human metapneumovirus, human rhinovirus/enterovirus, the influenza viruses, parainfluenza virus, respiratory syncytial virus (rsv), bordetella pertussis, chlamydia pneumoniae, and mycoplasma pneumoniae. there are numerous questions that arise from the availability of these expensive multiplex tests for placement outside the central laboratory without required oversight by technical experts. it will be necessary to determine what algorithms will be used by physicians to decide which patients to test and how results will be interpreted, particularly if multiple targets are positive. until recently, multiplexed molecular panels have been one size fits all. panels with fixed prices based on fixed targets may be excessive and may not necessarily include all the pathogens being considered by the physician. multiple platforms may be required in order to address the needs of the physician in such cases.
this scenario becomes a very expensive approach to diagnostic testing. testing needs to fit the medical center and be tailored to the population that the laboratory services. the diagnostic needs of a children's hospital can be very different from those of a medical center that caters to a large elderly population. likewise, a cancer center or a transplant center may have very specific diagnostic needs. nanosphere has fda clearance for its verigene respiratory pathogens flex nucleic acid test (rp flex) on the automated, sample-to-result verigene system, which allows flexibility in testing and is the first multiplex test that is scalable. each rp flex cartridge contains 16 viral and bacterial targets. the physician can order any combination of targets for testing. laboratories pay for only the targets that are ordered. results for other targets not initially ordered on the panel can be reflexed at an additional cost without having to re-run the test. for example, one possible scenario during influenza season is to first order only influenza virus targets or influenza virus plus rsv from the panel. if the result is negative, adenovirus, human metapneumovirus, rhinovirus, and parainfluenza virus can be ordered and the results released. bordetella sp. targets can be ordered separately based on clinical suspicion. medicare recently proposed universal non-coverage for respiratory multiplex panels, which will make it even more challenging for laboratories to utilize the technology. there has been a lot of discussion surrounding multiplex gi panels and whether this is clinically meaningful testing. in may 2017, medicare administrative contractor palmetto gba posted draft local coverage determinations (lcd) for two types of multiplex infectious disease tests [33] . this decision would provide limited coverage for nucleic acid amplification-based gi pathogen panels and a non-coverage decision for multiplex pcr respiratory viral panels. 
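the influenza-season ordering strategy described above for a flexible cartridge like rp flex can be sketched as a reflex rule: release only the first-line targets initially, and unmask the extended targets (at added cost, without re-running the test) only when the first-line results are negative. the target names come from the text; the function and data structures are our illustration, not the vendor's software.

```python
# First-line and reflex target groups for an influenza-season
# ordering algorithm (target names from the text; grouping is an
# illustrative policy choice, not a vendor default).
FIRST_LINE = ["influenza A", "influenza B", "RSV"]
EXTENDED = ["adenovirus", "human metapneumovirus",
            "rhinovirus", "parainfluenza virus"]

def reflex_order(results: dict) -> list:
    """Return the targets whose results should be released.

    `results` maps target name -> detected (bool) for the full
    cartridge run. First-line targets are always released; extended
    targets are released only if every first-line target is negative.
    """
    released = list(FIRST_LINE)
    if not any(results.get(t, False) for t in FIRST_LINE):
        # reflex: first-line negative, so unmask the extended targets
        released += EXTENDED
    return released
```

since the laboratory pays only for the targets it releases, this kind of rule is what makes a scalable cartridge cheaper than a fixed-price panel for the typical flu-season patient.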
the lcd proposed that coverage for molecular panels to detect gi pathogens be limited to 5 targets (salmonella, campylobacter, shigella, cryptosporidium, and shiga toxin-producing escherichia coli), which represent the majority of foodborne pathogens. current infectious diseases society of america guidelines for infectious diarrhea suggest a selective approach to workup based on whether the patient has traveler's diarrhea with fever or blood, hospital-acquired diarrhea, or persistent diarrhea [34]. a flex platform may be more suitable for testing of diarrheal illnesses. regardless of the number of analytes on a gi panel, when reimbursement is limited to a maximum of 5 targets, the economics may be driven only by the actual cost of the panel itself. a different approach is being applied to multiplex pcr testing for respiratory viruses apart from influenza a/b viruses, with or without inclusion of rsv. the reasoning for non-coverage included the fact that the pathogen targets in such panels do not represent a common syndrome and that targets can be very rare. the notice said that a "one size fits all testing approach is screening and not a medicare benefit" and went on to say that "one size fits all panels contribute to test over-utilization, and increased cost to health care without specific benefit to a given patient. testing should be limited to organisms with the greatest likelihood of occurrence in a given patient population, and if results are negative, with a reflexive testing to more exotic organisms." examples include c. pneumoniae or b. pertussis in combination with rhinovirus, influenza viruses, and rsv [33].

telemedicine and remote diagnostics can take on several roles. today, with total laboratory automation (tla) and digital microbiology, laboratories have the capability to read and review slides and plates from facilities that are miles or oceans away.
a recent clinical microbiology newsletter article highlighted the impact of telemedicine on gram stains in a health care system in arizona [35]. telemedicine companies like vsee (www.vsee.com) have set up field kits with multiple devices that enable remote diagnosis. through the use of software like ehealth opinion, rural patients and physician experts in the u.s. and china are connected through the virtual doctor project [36]. such projects are expanding in many parts of the developing world [37]. telemedicine companies like doctor on demand (www.doctorondemand.com) offer virtual doctor's visits through tablet computers or smartphones. other areas of remote diagnostics being explored are internet-based programs and self-collected specimens for mail-in testing. there are currently fda-approved ct/ng naat assays for self-collected specimens in clinical settings. internet-based mail-in programs for sexually transmitted infection (sti) screening have been successfully implemented in a public health system (www.iwantthekit.com), as well as through private companies (mylabbox.com) [38]. public health england in 2015 published a guidance document on commissioning an internet-based chlamydia screening program [39]. such strategies are aimed at diagnostic testing and improving access over the continuum of care. with this new way of delivering care come questions about validating at-home self-collected specimens and the stability of a specimen mailed through the post. while sti programs may be a starting point for specimen self-collection, this begins a conversation that would expand the realm of consumerism in health care to a new level. one could argue that no one is better able to properly collect a specimen than the person with the greatest interest in the results, the patient.
clinical studies comparing clinician-collected and self-collected specimens in a clinical setting for ct/ng have demonstrated that self-collected vaginal specimens have equivalent performance with acceptable patient satisfaction [40, 41]. for a recent review of self-collected specimens for infectious disease testing, see tenover et al. [42]. internet-based programs have the potential to triage non-critical medical needs while reducing visits to traditional brick-and-mortar clinics. at kaiser permanente, virtual visits have been used for several years through secure e-mails, telephone calls, and some video encounters. in the northern california kaiser permanente region, which has over 8,000 physicians and over 3.5 million members, virtual visits grew from 4.1 million in 2008 to 10.5 million in 2013, with projections that virtual visits will soon exceed physical visits [37]. near-patient sti testing programs, such as the dean street clinic in london, which offers walk-in sti testing and treatment with results delivered by short message service (sms) text on a cell phone (www.dean.st/testing), are also available. investment in such programs is evolving, yet what is lacking is the return on investment (roi) analysis, which is needed to further policies that could help provide financial support for the evolution of technology in daily practice. ngs has had one of the most significant impacts on microbial sciences since the advent of pcr. through initiatives like the cdc's advanced molecular detection (amd) program and its response to infectious disease outbreaks, public health microbiology has started to transform into the next generation of thinking for the investigation, prevention, and control of infectious diseases. ngs has provided insights that simply were not possible with previous technology. for a review of ngs technologies and amd, see maccannell [43].
ngs platforms like the minion (oxford nanopore) can provide portable real-time ngs analysis. the system is miniature in size, plugs into the usb port of a laptop, and offers minimal sample preparation at a low cost (https://nanoporetech.com/products/minion). the potential of such technology is just beginning to be realized. applications such as the direct detection of mycobacterium tuberculosis from sputum, with identification and antimicrobial susceptibility prediction available the same day, have been described [44]. barriers include bioinformatics, interoperability of results, and building a workforce with a new skill set and the infrastructure to support it. given the debate around reimbursement and multiplex panels, it will be interesting to see where the conversation leads with ngs. ngs will provide much more information than a 20-plex respiratory or stool panel. the technology is already changing practice in the public health laboratory, and the clinical microbiology laboratory is following [45, 46]. twenty years ago, the cdc created pulsenet, a molecular-subtyping network of federal, state, and local public health laboratories designed to facilitate the identification of and response to outbreaks caused by bacterial foodborne pathogens. the specific objectives of pulsenet are to detect foodborne disease case clusters through comparison of pulsed-field gel electrophoresis (pfge) "fingerprint" patterns, to facilitate early identification of common-source outbreaks, and to help food regulatory agencies identify areas where implementation of new measures is likely to improve the safety of the food supply. at the time, pfge was considered cutting-edge technology. to celebrate the 20th anniversary of pulsenet, the economic impact of the program was recently published [47]. pulsenet costs roughly $7.3 million to operate but saves more than $500 million annually in medical and productivity costs avoided [47].
this roi is impressive, but the fact that it was 20 years before this economic analysis was published is surprising given the program's success. with federal budget cuts to critical programs that support national infrastructure, evaluating and communicating the value of technology to the health economics of the nation should be part of the national strategy. looking ahead, it is clear that it is only a matter of time before pfge will be replaced by ngs for such foodborne outbreak investigations in the pulsenet system. the economic impact of ngs should be analyzed in a timely manner so that the roi of this powerful technology can be communicated to the appropriate funding agencies. the same is true for the clinical microbiology laboratory. while it sounds very appealing to place an mpoc influenza virus platform nearer to the patient in the ed or in urgent care, the initial investment to place and maintain that testing may be a daunting sell to administrators. the question around this roi is where the cost avoidance is over the continuum of care. for patients coming through the ed during influenza season the greatest impact to patient management would be avoiding a hospital admission. the average cost of a hospital admission due to pneumonia is $14,143, according to the agency for healthcare research and quality (rockville, md) [48] . compared to the cost of a tamiflu prescription, which is roughly $100, the avoidance of hospital admission would clearly have the greatest financial impact. when one adds the implementation costs of the mpoc instrument and reagents over the course of a flu season, the roi can become a more comprehensive sell to the c-suite or the corporation's senior executives. tables 1 to 3 demonstrate the estimated cost of implementing an mpoc influenza assay in a hospital system with 14 medical centers, each with an ed. 
the total cost of implementation for instrument and reagents over 3 months of the respiratory disease season adds up to $420,000 (table 1). the number of admissions that would need to be avoided to break even on the cost of implementation is 30, or roughly 2 per ed (table 2). this number is equal to 0.5% of the estimated 6,000 tested patients over the course of the respiratory disease season, which would equate to an roi of <3 months (table 3). another factor to consider when thinking about a rapid mpoc test is that placing testing correctly in the system may actually be cost neutral, because testing is being shifted without the need to bring on any additional testing. outcome studies looking at technology placement are forthcoming [9]. in 1987, the original patent for pcr was issued, with kary mullis listed as the inventor. in 1993, he was awarded the nobel prize for pcr. these accolades came only after the article reporting the invention was rejected by both nature and science. it was finally published in methods in enzymology [49]. as one reflects on the impact of technology like pcr, it is easy to lose sight of how far technology has come. the conversation is now focused on where we need to go. in the world of instant gratification that we have become so accustomed to living in, it is important to remember that changes in medicine take time and require data built on evidence. shifting practice also requires buy-in by stakeholders that the laboratory may not be considering. during the lifespan of technology, the stakeholders will range from biotechnology companies to laboratories, physicians, regulators, policymakers, guideline committees (steered by industry thought leaders), payers, and patients. these stakeholders will influence the maturity of technology application over time. thus, the laboratory community must assess and factor in key drivers that address need, satisfy administration, and influence health economics.
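the break-even arithmetic above can be sketched directly. the dollar figures and patient counts come from the text; the function name is illustrative, not from any cited source:

```python
import math

# Back-of-the-envelope break-even check for the mPOC influenza example.
# Dollar figures and counts are taken from the text above.

def admissions_to_break_even(implementation_cost: float,
                             cost_per_admission: float) -> int:
    """Smallest whole number of avoided admissions covering the cost."""
    return math.ceil(implementation_cost / cost_per_admission)

IMPLEMENTATION_COST = 420_000   # instruments + reagents, 14 EDs, 3 months
COST_PER_ADMISSION = 14_143     # AHRQ average pneumonia admission cost
TESTED_PATIENTS = 6_000         # estimated patients tested in a season

needed = admissions_to_break_even(IMPLEMENTATION_COST, COST_PER_ADMISSION)
rate = needed / TESTED_PATIENTS
print(needed)          # 30 avoided admissions system-wide (~2 per ED)
print(f"{rate:.1%}")   # 0.5% of tested patients
```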
properly designed pilot studies remain an important step in assessment. publishing these results is key to moving the field forward. sharing information that may initially be considered only for internal quality improvement projects is essential. laboratories could benefit from more coordinated collaboration from stakeholders, who have a vested interest in such data and the impact on patient care and health economics. at the end of the day, we are all consumers of health care. we should all be looking around asking, "what are our expectations?" with the aca, we have more patients to take care of and fewer health care dollars to do it with in an imperfect health care system. we have improved and expanded technology that allows us to ask how we improve access, overcome barriers, and provide smarter care. technology can help get us there, but thoughtful approaches to technology placement and health care delivery will make it a reality.

references:
- impact of an antimicrobial stewardship program on antimicrobial utilization, bacterial susceptibilities, and financial expenditures at an academic medical center
- the confounding role of antimicrobial stewardship programs in understanding the impact of technology on patient care
- accurate detection of streptococcus pyogenes at the point of care using the cobas liat strep a nucleic acid test
- clia waived testing in infectious diseases
- 'the waiting game': are current chlamydia and gonorrhoea near-patient/point-of-care tests acceptable to service users and will they impact on treatment?
- impact of deploying multiple point-of-care tests with a 'sample first' approach on a sexual health clinical care pathway: a service evaluation
- implementation of non-batched respiratory virus assay significantly impacts patient outcomes in the icu
- clinical practice guideline for the diagnosis and management of group a streptococcal pharyngitis: 2012 update by the infectious diseases society of america
- point-counterpoint: a nucleic acid amplification test for streptococcus pyogenes should replace antigen detection and culture for detection of bacterial pharyngitis
- a national study of the impact of rapid influenza testing on clinical care in the emergency department
- evaluation of multiple test methods for the detection of the novel 2009 influenza a (h1n1) during the new york city outbreak
- role of rapid immunochromatographic antigen testing in diagnosis of influenza a virus 2009 h1n1 infection
- performance of the molecular alere i influenza a&b test compared to that of the xpert flu a/b assay
- direct comparison of alere i and cobas liat influenza a and b tests for rapid detection of influenza virus infection
- routine molecular point-of-care testing for respiratory viruses in adults presenting to hospital with acute respiratory illness (respoc): a pragmatic, open-label, randomised controlled trial
- experience with the roche cobas liat rapid influenza a/b assay during influenza season: analysis of test performance and qualification on the impact of patient management in the emergency department setting
- clinical microbiology director, personal communication
- pharmacists in the laboratory space: friends or foes?
- use of clia-waived point-of-care tests for infectious diseases in community pharmacies in the united states
- antimicrobial stewardship in outpatient settings: leveraging innovative physician-pharmacist collaborations to reduce antibiotic resistance
- practical challenges related to point of care testing
- changing diagnostic paradigms for microbiology
- advances afoot in microbiology
- point-counterpoint: large multiplex pcr panels should be first-line tests for detection of respiratory and intestinal pathogens
- improving diagnosis in health care
- delayed diagnosis of tuberculous meningitis misdiagnosed as herpes simplex virus-1 encephalitis with the filmarray syndromic polymerase chain reaction panel
- comparative evaluation of the filmarray meningitis/encephalitis molecular panel in a pediatric population
- norovirus and medically attended gastroenteritis in u.s. children
- optimum diagnostic assay and clinical specimen for routine rotavirus surveillance
- impact of a rapid respiratory panel test on patient outcomes
- practice guidelines for the management of infectious diarrhea
- telemicrobiology: focusing on quality in an era of laboratory consolidation
- telemedicine in primary health: the virtual doctor project zambia
- the patient will see you now: the future of medicine in your hands
- internet-based screening for sexually transmitted infections to reach nonclinic populations in the community: risk factors for infection in men
- internet-based chlamydia screening guidance for commissioning. london: phe publications
- internet-based screening for chlamydia trachomatis to reach non-clinic populations with mailed self-administered vaginal swabs
- self-collected versus clinician-collected sampling for chlamydia and gonorrhea screening: a systematic review and meta-analysis
- self-collected specimens for infectious disease testing
- next generation sequencing in clinical and public health microbiology
- same-day diagnostic and surveillance data for tuberculosis via whole-genome sequencing of direct respiratory samples
- detection of cytomegalovirus drug resistance mutations by next-generation sequencing
- the role of whole genome sequencing in antimicrobial susceptibility testing of bacteria: report from the eucast subcommittee
- an economic evaluation of pulsenet: a network for foodborne disease surveillance
- health cost and utilization project
- dancing naked in the mind field: vintage books

key: cord-300930-47a4pu27 authors: beigel, r.; kasif, s. title: rate estimation and identification of covid-19 infections: towards rational policy making during early and late stages of epidemics date: 2020-05-24 journal: nan doi: 10.1101/2020.05.22.20110585 sha: doc_id: 300930 cord_uid: 47a4pu27

pandemics have a profound impact on our world, causing loss of life, affecting our culture and historically shaping our genetics. the response to a pandemic requires both resilience and imagination. it has been clearly documented that obtaining accurate estimates and trends of the actual infection rate and mortality risk is very important for policy makers and medical professionals. one cannot estimate mortality rates without an accurate assessment of the number of infected individuals in the population. this need is also aligned with identifying the infected individuals so they can be properly treated, monitored and tracked. however, accurate estimation of the infection rate, locally, geographically and nationally, is important independently.
these infection rate estimates can guide policy makers at the state, national, or world level to achieve better management of risk to society. the decisions facing policy makers are very different during early stages of an emerging epidemic where the infection rate is low, middle stages where the rate is rapidly climbing, and later stages where the epidemic curve has flattened to a low and relatively sustainable rate. in this paper we provide relatively efficient pooling methods to both estimate infection rates and identify infected individuals for populations with low infection rates. these estimates may provide significant cost reductions for testing in rural communities, third world countries and other situations where the cost of testing is expensive or testing is not widely available. as we prepare for the second wave of the pandemic, this line of work may provide new solutions for both the biomedical community and policy makers at all levels. covid-19 is a deadly disease caused by the sars-cov-2 rna virus. this novel coronavirus created an epidemic of global proportions, killing over 200,000 people worldwide (as of april 28th, 2020) and infecting millions. it also caused a medical crisis and unprecedented disruptions that, long-term, are likely to increase the risk of multiple socio-economic downturns that are associated with both mortality and chronic disease. the pandemic created many urgent problems such as the development of antiviral medications, vaccines, and inexpensive and widely available testing capability, expanding er and icu capabilities, and much more. however, proper response to the pandemic requires estimates of the rate of infection via testing. the challenge of rapid testing has created an outstanding community response from industry and academic centers producing tests to diagnose and identify infected patients. the majority of these tests rely on pcr-based techniques that are very well established.
newer tests are based on isothermal amplification and, most recently, crispr-based methods using cas-13. these tests can deliver a result in minutes to hours. high throughput sequencing is also an option for large scale testing. in addition to the biomedical crisis, the virus has also created major challenges for policy makers who need to make life and death decisions based on ethical, clinical and economic factors of unprecedented importance. a full shutdown of the economy and forced social distancing create a colossal stress on the economy, unemployment, and collapse of many business sectors. opening the economy would increase risk for the aging population and people with pre-existing conditions such as diabetes, cancer, cardiovascular disease, respiratory disease, and immunodeficiency. the decisions facing policy makers are very different at the beginning of an emerging epidemic where the infection rate is low vs. middle stages (when the rates are rapidly climbing) and at the later stages where the epidemic curve has flattened to a low and relatively sustainable rate. during the early stages it is relatively inefficient to test millions of patients who might be suffering from symptoms caused by rsv or influenza. at the same time it is important to monitor any unexpected turns in the progression. in many rural communities the rates usually remain low except for local bursts that need to be contained. similarly, in third world countries it is economically and practically impossible to perform many tests early on. mathematically, the problems of identifying infected individuals (identification) and estimating the total number of infected individuals in a given population (infection rate) are related but in fact can be addressed by subtly different algorithms to reduce the number of tests needed and thereby the total cost of doing testing. these methods generally rely on a well-studied area called combinatorial group testing.
however, as we will demonstrate in this brief communication, estimating the number of infected individuals can be solved by a novel adaptation of methods developed in theoretical computer science aimed at approximate counting. here we refer to these methods as aca (approximate counting algorithms). more specifically, we describe comparatively efficient methods enabling estimation of the total number of infected individuals. intuitively, we pool samples from multiple individuals and repeatedly test these pools for the virus with a single test (or perhaps a small constant number of tests to achieve better sensitivity). we provide a detailed analysis of the accuracy of the approximate counting procedure with both theoretical analysis and simulation. as a simple example, if the infection rate in a population of 1,000,000 is around 1%, we can produce an unbiased estimate of the infection rate with variance approximately 7·10^−5 by making 20 group tests. in contrast, testing a sample of 20 individuals to estimate the infection rate would produce an estimate whose variance is 4.95·10^−4. our variance is approximately 7 times smaller. in addition to rate estimation, we provide a review and analysis of several identification algorithms that can be deployed in communities with low infection rates and that achieve reasonable improvement over the standard algorithms for group testing that have been previously explored. we first abstract the key computational problems as follows:
• estimate the rate of the infection in the population, or approximately count how many people test positive in a population of a given size, with as few partially pooled tests as possible.
• identify the people who are positive with as few partially pooled tests as possible.
we focus on these problems in the context of the population that is actively suffering from the disease, rather than post-infection and recovery. we assume the testing is done using established genomic testing procedures.
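a quick numeric check of the variance comparison above (a sketch; the pooled-estimator variance of 7·10^−5 is taken from the text rather than derived here):

```python
# Numeric check of the variance comparison in the text. The pooled-estimator
# variance (~7e-5 for 20 group tests at p = 1%) is the value reported above;
# the individual-sampling variance is the standard binomial formula.

p = 0.01            # infection rate in the example
m = 20              # number of individually tested people
pooled_var = 7e-5   # variance reported in the text for 20 group tests

individual_var = p * (1 - p) / m    # sample-mean variance p(1-p)/m
print(individual_var)               # 0.000495, i.e. 4.95e-4
print(individual_var / pooled_var)  # ~7.07: pooled estimate ~7x tighter
```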
while similar methods are feasible for detecting individuals who have or have already had the disease (e.g., via antibody testing), we expect this number to be significantly higher, and our methods are comparatively more effective for lower fractions of individuals testing positive. batch testing assumption: given a set of samples from a set s of people (infected or not), it is technically feasible to form a single batch consisting of all samples and test that batch with a single test. the batch will test positive if and only if at least one person in s is positive. this assumption is reasonable for small batch sizes (e.g., 100 ≤ |s| ≤ 1000). we will provide methods to alleviate this technical restriction on batch size in the discussion. henceforth, n will refer to the (known) number of people (total size of the population) and k will stand for the (possibly unknown) number of people who will test positive. thus k/n is the infection rate. we study two problems: identification of infected individuals and estimating k by approximate counting. the formulas used throughout the methods section are provided in the supplement for convenience. typically, one estimates the infection rate by sampling individuals. this estimate of the rate, using the sample mean, is known to be highly inaccurate when the probability of infection p is small, because the variance of the estimator is much larger than p^2. using group testing, we will produce estimates whose variance is asymptotically proportional to p^2. therefore our proposed methods are superior to sampling individuals when p is small. we will define a random variable y such that:
• y can be calculated by making ⌈log n⌉ batch tests
• e(2^y) = θ(k), i.e., e(2^y) is provably asymptotically proportional to k
• e(2^y) can be computed exactly given n and k
• using linear regression we can find constants a and b such that p ≈ e((a·2^y + b)/n)
• the variance of the estimator (a·2^y + b)/n is o(p^2).
in contrast, the variance of the sample mean estimator is p(1−p)/⌈log n⌉. when p is small, our estimate is much more accurate than sampling individuals. we will also define a random variable w such that:
• w can be calculated by making m⌈log n⌉ batch tests, where m is a parameter
• w is the arithmetic mean of m independent copies of y
• we will use w similarly to obtain a nearly unbiased estimator for p whose variance is o(p^2/m)
• in practice this is better than using the arithmetic mean of m independent copies of 2^y.
some identification algorithms based on batch testing are already known. we design two new highly parallel algorithms that are efficient for small p. the batch size for these algorithms is not fixed, but instead can be chosen optimally by making a calculation based on the estimated infection rate. we will describe how to choose an algorithm and its batch size given n and an estimate for k. one particularly favorable aspect of these algorithms is the fact that they use very few rounds of group testing, which makes them easier to implement in practice than competing methods. to provide an intuitive example illustrating the principle of group testing methodologies, we begin with a magic trick which is folklore in popular mathematics (figure 1). we ask the reader to think of a number x between 0 and 31, e.g., x = 5. we now perform five binary tests on groups we carefully design. each test returns 1 if the number x is in the group and 0 otherwise. in this case the result of the tests would be 00101 = 5, thereby identifying the hidden number. the tests and their results are listed below in figure 1 for completeness.
_________________________________________________________________________
is your number in this set: {16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}? _0_
is your number in this set: {8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31}? _0_
is your number in this set: {4,5,6,7,12,13,14,15,20,21,22,23,28,29,30,31}?
_1_
is your number in this set: {2,3,6,7,10,11,14,15,18,19,22,23,26,27,30,31}? _0_
is your number in this set: {1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31}? _1_
_________________________________________________________________________
figure 1: the magic trick: the answer is 00101 = 5.

it is made available under a cc-by-nc-nd 4.0 international license. the copyright holder for this preprint is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. this version posted may 24, 2020. https://doi.org/10.1101/2020.05.22.20110585

the magic trick algorithm is described in full below. number the people 0 through n−1. write those numbers in binary. number the bits right to left starting from 0. (the i-th bit is in position 2^i in the binary representation. alternatively, bit_i(x) = (x div 2^i) % 2.)
• let m = ⌈log n⌉. all logarithms are base 2 unless otherwise specified.
• for i = 0, ..., m−1, pool the samples of all people whose number has bit i equal to 1, and test that pool; let b_i be 1 if it tests positive and 0 otherwise.
• let x be the number whose binary representation is b_{m−1} ··· b_1 b_0. person number x is the one who is testing positive, as revealed by the magic trick above.
_________________________________________________________________________
complexity analysis: each person's sample is divided into ⌈log n⌉ samples. exactly ⌈log n⌉ tests are performed, and they can all be performed in parallel. the logistics of the process of producing the pools (batches) is not considered in this paper and can be performed by robotics or microfluidic platforms. the size of the batches is n/2, which may potentially pose sensitivity and engineering challenges. however, when k = n/2, information theory tells us that at least n − 0.5 log n − 1 tests are required, so we cannot do significantly better than just testing every individual. we now describe approximate counting algorithms that use pools of samples to estimate accurate infection rates. we also provide sketches of the complexity analysis that provide bounds on the number of tests needed to estimate these rates.
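the magic trick identification above can be sketched in code, assuming exactly one positive person; the function name and the callable used to stand in for a pooled test are our illustration of the bit-indexed batches:

```python
import math

def identify_single_positive(n, is_positive):
    """Identify the unique positive person among people 0..n-1 using
    ceil(log2 n) pooled tests: pool i contains everyone whose number has
    bit i set, so the pool results spell out the person's number in binary."""
    m = math.ceil(math.log2(n))
    x = 0
    for i in range(m):
        pool = [person for person in range(n) if (person >> i) & 1]
        if any(is_positive(person) for person in pool):  # one pooled test
            x |= 1 << i
    return x

# the magic trick from figure 1: n = 32 and person 5 is the positive one
print(identify_single_positive(32, lambda person: person == 5))  # 5
```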
as alluded to before, we are focusing on testing populations with a relatively low disease rate.

approximate counting algorithm (aca1): number the samples randomly 1 through n. choose independently subsets of size ⌈n/2⌉, ⌈n/4⌉, ⌈n/8⌉, ⌈n/16⌉, …, 1. let y = the number of subsets that test positive. then e[y] ≈ log(k), where k is the number of infected individuals.

complexity analysis: each person provides a single sample. ⌈log₂ n⌉ tests are performed, and they can all be performed in parallel. the largest batch size is ⌈n/2⌉, which may pose some challenges as mentioned above. if k is not very small it is still possible to deal with the large batch size issue. for example, if the maximum allowed batch size is b, then we could assume that all batches larger than b individuals would give the same test result as the largest batch. if all batches test negative, then we would estimate that the infection rate is less than 1/b. we can run aca1 several times to produce a more accurate estimate w, which is discussed in the results section.

in this section we will follow up on the rate estimation algorithms from the previous section and develop exact identification algorithms (both probabilistic and deterministic) of infected patients in a population with a low infection disease rate. we will present three algorithms, analyze their comparative performance, and use each judiciously for the appropriate infection rate we estimated in the previous section. we note that algorithm 0 is the natural approach that has been recently used in nebraska with pool size s=5 without optimizing the choice of the pool size to reduce the number of tests.
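a minimal analytic sketch of aca1's behavior, assuming each subset is drawn uniformly without replacement (hypergeometric), illustrating e[y] ≈ log k on a small example; the function name is ours:

```python
import math

def expected_y(n, k):
    """E[y] for ACA1, computed analytically: y counts how many of the
    subsets of size ceil(n/2), ceil(n/4), ..., 1 test positive. A random
    subset of size s misses all k positives with probability
    C(n-k, s) / C(n, s) (sampling without replacement)."""
    ey, s = 0.0, n
    while s > 1:
        s = math.ceil(s / 2)
        miss = math.comb(n - k, s) / math.comb(n, s) if s <= n - k else 0.0
        ey += 1.0 - miss
    return ey

ey = expected_y(1024, 10)
print(round(ey, 2))   # ~3.7, close to log2(k) = log2(10) ~ 3.32
print(round(2 ** ey))  # same order of magnitude as k = 10
```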
to be fair to this algorithm, sensitivity of testing in very large pools may be reduced without proper optimization, and therefore we present algorithm 0 (small pool size) below for completeness (fig 2).

our first new identification algorithm (pia1) is given in fig 3. analysis: the analysis of algorithm pia1 is given in the short summary below.
• the probability that a pool of size s contains 0 positives is (n−k choose s)/(n choose s).
as an example, consider this analysis with n = 10000, k = 100, and s = 50. this is roughly a population of a small town with infection rate 0.01. we test 200 pools to start. on average 120.85 pools will contain 0 positives, 61.34 will contain exactly 1 positive, and 17.81 will contain more than 1 positive. we apply the magic trick algorithm on 79.15 pools of size 50, which takes 79.15 · ⌈log 50⌉ = 79.15 · 6 = 474.9 tests. we check the results of the magic trick algorithm with 79.15 · 2 = 158.3 tests. then we test the remaining 890.5 people. on average, the number of tests performed is 1723.7. on average, 120.85 · 50 = 6042.5 people are classified in the 1st round, 61.34 · 50 = 3067 people are classified in the 3rd round, and the remaining 890.5 people are classified in the 4th round. on average, we obtain an individual's test result in 1.88 rounds. we might test the remaining people recursively, but that is harder to analyze because the expected number of remaining tests is not linear in the number of remaining people. recursion also increases the average and worst-case time to obtain an individual's test result. we would therefore use smaller pools than 50. for example, if we use pool size 25, we make only 1278.54 tests on average. in fact, pool size 24 is optimal. we calculated optimal pool size for several (n,k) pairs.
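the worked example above can be reproduced numerically. this is a sketch under the stated assumptions (hypergeometric pool composition over random disjoint pools, ⌈log₂ s⌉ magic-trick tests plus 2 checking tests per positive pool, individual retesting of multi-positive pools); the function name is ours:

```python
import math

def pia1_expected_tests(n, k, s):
    """Expected tests for the PIA1 worked example: n/s pools of size s are
    tested; each positive pool gets the magic-trick subroutine
    (ceil(log2 s) tests) plus 2 checking tests; everyone in a pool with
    more than one positive is retested individually."""
    pools = n // s
    p0 = math.comb(n - k, s) / math.comb(n, s)          # 0 positives in pool
    p1 = k * math.comb(n - k, s - 1) / math.comb(n, s)  # exactly 1 positive
    zero, one = pools * p0, pools * p1
    multi = pools - zero - one                          # more than 1 positive
    positive = pools - zero
    return (pools                                 # round 1: test every pool
            + positive * math.ceil(math.log2(s))  # magic-trick tests
            + positive * 2                        # verification tests
            + multi * s)                          # retest ambiguous pools

print(round(pia1_expected_tests(10_000, 100, 50), 1))  # ~1723.7, vs 10,000 individual tests
print(round(pia1_expected_tests(10_000, 100, 25), 1))  # ~1278.5: smaller pools win here
```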
empirically, optimal pool size seems to depend primarily on k/n. when k/n = 0.01, optimal pool size is 24 or 25 in the examples we tried. we now improve on the previous algorithms described in this section and present an improved probabilistic identification algorithm 2 (pia2) for a specific range of infection rates. the algorithm and its analysis are given below in figure 4; its final steps run algorithm 2.5 on a pool s to identify two people y and z, then test y and z individually. the analysis of the pia2 algorithm is provided in the list below.
• the probability that a pool of size s contains 0 positives is (n−k choose s)/(n choose s).
to summarize, on average, the number of tests performed by algorithm 2 is:
example 1: consider n = 10,000 and k ≤ 100. pia2 with s = 32 is the best choice. the expected number of tests is 1107.27. for comparison, algorithm 0 performs 1956.59 tests on average with its optimal pool size.
example 2: n = 100,000 and k ≤ 100. pia2 with s = 88 is the best choice. the expected number of tests is 2074.99. for comparison, pia0 performs 6276.37 tests on average.
we now provide a deterministic identification algorithm applicable to populations with small infection rate (fig 5). a brief outline of the efficiency of the methods is provided in the list below.
• identification given k = 1: ⌈log n⌉ tests in parallel (magic trick algorithm)
• identification given k ≤ 1: ⌈log(n+1)⌉ tests in parallel (left to the reader)
• identification given k ≤ 2: 2.5 log n − 1 tests in log log n − 1 rounds (described below)
analysis: let t(n) denote the number of tests made by da2.5. assume log n is a power of 2.
• t(n) ≤ 2·t(√n) + 1
• t(4) = 4
by solving the recurrence we find t(n) ≤ 2.5 log n − 1 when log n is a power of 2.
example. what is t(36)?
• t(9) = t(3) + t(3) + 1 = 7
• t(12) = t(3) + t(4) + 1 = 8
• t(36) = t(3) + t(12) + 1 = t(4) + t(9) + 1 = 12
note that t(36) < 2·t(6) + 1.
additional examples:
• t(16) = t(4) + t(4) + 1 = 9
• t(32) = t(2) + t(16) + 1 = t(3) + t(12) + 1 = t(4) + t(9) + 1 = 12

we first present a few empirical findings of approximate counting algorithms for infection rate estimation that generally support our ability to produce relatively accurate estimates using the methods provided in the previous section. figure: this graph displays the nearly linear relationship between k and e(2^y). in order to exploit this relationship in practice, we simply run a program that calculates e(2^y) for a known value of n and various values of k, and then perform a linear regression. figures 3-4 display the comparative accuracy of our pooling algorithms for estimating infection rates. in particular, figure 4 demonstrates the reduction in variance obtained by running aca1 multiple times. in order to estimate the infection rate we run aca1 m times. let w be the average of those results. the number of infected individuals k is linear in 2^w. we compute a linear regression to determine constants such that k ≈ a·2^w + b. then the infection rate, p, is approximately (a·2^w + b)/n. we refer to this algorithm as maca1. (in practice it is better to perform linear interpolation using two adjacent data points. the linear formula, however, is needed in order to estimate variance.)
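the da2.5 recurrence and worked examples above can be checked with a short memoized computation. the base cases t(n) = n for n ≤ 4 and the fallback for primes are inferred from the examples, not stated explicitly in the text:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def t(n: int) -> int:
    """Tests used by DA2.5 per the recurrence above: t(n) = n for n <= 4,
    and t(ab) = t(a) + t(b) + 1 otherwise, minimized over factorizations
    ab = n (base cases inferred from the worked examples)."""
    if n <= 4:
        return n
    divisors = [a for a in range(2, math.isqrt(n) + 1) if n % a == 0]
    if not divisors:   # prime n > 4 is not covered by the text's examples;
        return n       # fall back to individual testing (our assumption)
    return min(t(a) + t(n // a) + 1 for a in divisors)

print(t(9), t(12), t(16), t(32), t(36))  # 7 8 9 12 12, matching the examples
# closed-form bound: t(n) <= 2.5*log2(n) - 1 when log2(n) is a power of 2
print(t(256), 2.5 * math.log2(256) - 1)  # 19 19.0
```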
we now summarize the results of our probabilistic identification algorithms, which we refer to as pia#. we start by providing a sample of our empirical findings using a few selected examples in table 1. the full graph is provided in figure 9 for a relatively large population size.

table 1: best algorithm by infection rate and population size.
infection rate | population size | best algorithm
p = 0.061 | 1000 ≤ n ≤ 10^6 | pia2
p = 0.062 | 5500 ≤ n ≤ 10^6 | pia1
p = 0.069 | 1000 ≤ n ≤ 10^6 | pia1
p = 0.07 | 1200 ≤ n ≤ 10^6 | pia0
p = 0.3 | 40 ≤ n ≤ 10^6 | pia0
p = 0.31 | 300 ≤ n ≤ 10^6 | individual testing (n tests)

in figure 9, we graphed the behavior (expected number of tests) of pia0, pia1, and pia2 when n = 10^6 and 0 < p < 0.07. we also observe that the optimal pool size seems to depend mostly on p. an information-theoretic lower bound on the number of tests required is log(n choose k). when p ≥ 0.005, we found that the number of tests performed is usually less than 1.4·log(n choose k).

figure 9: the expected number of tests performed by different algorithms for different infection rates p. pia2 is graphed in yellow, pia1 in red, and pia0 in blue. pia2 generally outperforms the other algorithms when p ≤ 0.06. in this graph n = 1,000,000.

the covid-19 pandemic produced a very high toll worldwide and created technological, biomedical and clinical challenges.
in this paper we described a cost-effective application of approximate counting methods to estimate infection rates. these methods can help make crucial and rapid policy decisions with reduced investment, especially in early or late stages of the epidemic or in underserved communities, e.g., rural counties or third world countries. we also reviewed several identification algorithms and introduced new ones that provide good to modest improvements in the number of tests for different infection rates. in some cases our methods reduce the number of tests by twofold, which is significant. estimating the infection rate without identification has many applications. these estimates can help policy makers determine appropriate guidelines for opening or closing the economy or implementing different types of social distancing procedures. the new method lends itself naturally to performing a very small number of tests on autopsy samples and is potentially useful in counting the number of individuals who were infected prior to death. we also note that our proposed counting method can speculatively be extended to large scale vaccine trials. a large population can be vaccinated and another population can be used for placebo. we can use our cost-efficient estimates to compare the rates in these two cohorts. in fact, we expect our variance estimates to be even better than we report above. our specific identification procedures of infected individuals improved on the number of tests that must be conducted, saving in costs, especially when the infection rates are low and most tests are negative. notably, we use a very small number of rounds as compared to the best competing algorithms, enabling quicker turnaround. in this paper we primarily focused on using established methods such as pcr or isothermal amplification that have been shown to produce the most reliable diagnostics in the past. we have not considered multiplex-pcr as an alternative.
there are newer diagnostic procedures based on crispr (cas-13) under development. these methods vary in sensitivity, availability and cost. in theory, it is also possible to barcode every sample with a genomic tag, pool the tagged sequences into a very large multiplexed dna sample and submit the sample for high throughput sequencing. standard procedures would allow us to read the sequences and allocate any detected viral dna to the appropriate individual using tags. error correcting procedures can be deployed to produce an efficient library of tags. however, high throughput sequencing presents its own challenges and therefore we focused on pooling samples and testing using more traditional approaches. these established methods might also be easier to deploy in third world countries or rural communities that do not have access to high throughput solutions. this work can be extended in multiple useful directions both mathematically and technologically. it is highly timely for careful consideration and possible deployment given the expected flattening of the infection curve as we approach the summer of 2020 and the potential need to detect the disease in the fall of 2020.

the formulas in this section were used in:
1. proving our theoretical results about the expected value and variance of 2^y and 2^w;
2. creating tables of e[2^y] for a given n (and all k) and of e[2^w] for given n and m (and all k);
3. creating tables of e[4^y] and e[4^w].

the tables of e[2^y] and e[2^w] make it possible to estimate k via linear interpolation. the tables are also used with linear regression to obtain our empirical findings about expectation and variance.
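the interpolation step can be sketched as follows. the table values below are illustrative placeholders, not values produced by the paper's formulas:

```python
def interpolate_k(table: list[tuple[int, float]], observed: float) -> float:
    # estimate k by linear interpolation in a table of (k, E[2^Y]) pairs;
    # assumes the expectation column increases monotonically with k
    if observed <= table[0][1]:
        return float(table[0][0])
    if observed >= table[-1][1]:
        return float(table[-1][0])
    for (k0, e0), (k1, e1) in zip(table, table[1:]):
        if e0 <= observed <= e1:
            t = (observed - e0) / (e1 - e0)
            return k0 + t * (k1 - k0)

# illustrative (k, E[2^Y]) table only -- real values come from the formulas
table = [(0, 1.0), (10, 11.0), (20, 21.0), (40, 41.0)]
print(interpolate_k(table, 16.0))
```

given an observed value of 2^y from an experiment, looking it up against the precomputed expectations yields the estimate of k.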
formulas used with aca1:

formulas used with maca1:
• for any c > 0, e[c^x] = (1 + (c − 1)p)^n

key: cord-328294-gii1b7s7 authors: doty, richard l.; mishra, anupam title: olfaction and its alteration by nasal obstruction, rhinitis, and rhinosinusitis date: 2009-01-02 journal: laryngoscope doi: 10.1097/00005537-200103000-00008 sha: doc_id: 328294 cord_uid: gii1b7s7 the sense of smell has been largely ignored by otorhinolaryngologists, even though 1) its medical stewardship falls within their specialty's purview, 2) olfactory dysfunction is not uncommon in the general population, and 3) disorders of olfaction have significant quality-of-life, nutritional, and safety consequences. this report provides a succinct overview of the major intranasal neural systems present in humans (namely, cranial nerves o, i, and v, and the nonfunctional accessory [vomeronasal] organ system), along with a summary of notable findings resulting from the application of modern olfactory tests to patient populations, emphasizing diseases of the nose.
such tests have led to the discovery of significant influences of age, gender, smoking, toxic exposure, and genetics on the ability to smell. within the field of otorhinolaryngology, they have revealed that 1) surgical and medical interventions in patients with rhinosinusitis do not, on average, lead to complete recovery of olfactory function, despite common beliefs to the contrary, and 2) associations are generally lacking between measures of airway patency and olfactory function in such cases. these findings have thrown into question the dogma that olfactory loss in rhinosinusitis is attributable primarily to blockage of airflow to the receptors and have led to histopathological studies demonstrating significant olfactory epithelial compromise in sinonasal syndromes. the sense of smell largely determines the flavor of foods and beverages and serves as an early warning system for the detection of environmental hazards, including spoiled foods, leaking natural gas, smoke, and various airborne pollutants. this primary sensory system contributes significantly to the quality of life, allowing for the full appreciation of flowers, perfumes, spices, and a vast array of foods and beverages, as well as the seashore, the mountains, and the seasons of the year. thus, it is no wonder that losses or distortions of smell sensation are of considerable significance to patients, particularly those dependent on this sense for their livelihood or safety (e.g., cooks, homemakers, plumbers, firefighters, perfumers, fragrance sales persons, wine merchants, food and beverage distributors, and employees of numerous chemical, gas, and public works industries). indeed, altered smell function can adversely influence food preferences, food intake, and appetite. in this report, we review the influences of nasal obstruction, rhinitis, and rhinosinusitis (as well as their medical and surgical treatments) on the ability to smell.
because this neglected sensory system receives so little attention in most medical textbooks, including those of clinical allergy, otolaryngology, neurology, and immunology, an overview of olfactory anatomy, physiology, and measurement is also presented. in humans, three specialized neural systems are present within the left and right nasal chambers: 1) the main olfactory system (cranial nerve i [cn i]), 2) the trigeminal somatosensory system (cranial nerve v [cn v]), and 3) the nervus terminalis or terminal nerve (cranial nerve o [cn o]). cn i mediates odor sensations (e.g., chocolate, strawberry and apple), whereas cn v mediates, through both chemical and nonchemical stimuli, somatosensory sensations, including those of burning, cooling, irritation, and tickling. the coolness of menthol and peppermint are mediated by cn v, as, for example, are the sharp sensations induced by ammonia vapors and various acids. the function of cn o, a ganglionated neural plexus that spans much of the nasal mucosa before traversing the cribriform plate to enter the forebrain medial to the olfactory tract, is unknown in humans. this nerve, whose disruption in some rodents alters reproductive behavior, 1 was discovered after the other cranial nerves had been named and is highly conserved among the vertebrates, including humans. 2, 3 despite the fact that nearly all adult humans possess, in the lower recesses of each nasal chamber, a rudimentary vomeronasal (jacobson's) organ (vno) and a vno duct approximately 15 to 20 mm from the posterior aspect of the external naris, they lack an accessory olfactory bulb, a structure necessary for its function. thus, in adult humans this system is nonfunctional, and no neural connection from the vno to the brain has been established. 4 nonetheless, local electrophysiological responses have been recorded within the human vno lumen. 
5 the olfactory neuroepithelium, which harbors the sensory receptors of the main olfactory system and some cn v free nerve endings, lines the upper recesses of the nasal chambers, including the cribriform plate, superior turbinate, superior septum, and sectors of the middle turbinate. this epithelium loses its general homogeneity postnatally, and as early as the first few weeks of life metaplastic islands of respiratory-like epithelia begin to appear, presumably as a result of insults from environmental viruses, bacteria, and toxins. such islands increase in extent and number throughout life. surprisingly, the exact size of the olfactory neuroepithelium in humans is still not well established, and there is recent suggestion that it may extend further onto the middle turbinate than previously believed. on the basis of morphological and biochemical criteria, the mature olfactory epithelium comprises at least six distinct cell types (fig. 1). 6 the first, the bipolar sensory receptor neuron, is estimated to number approximately 6,000,000 cells in the adult, exceeding the number of receptor cells in any other sensory system except vision. the olfactory receptors are located on the ciliated dendritic ends of these cells, whose surface area probably exceeds 22 cm² in the human. the receptor cell axons coalesce into approximately 40 bundles (termed the olfactory fila), which are ensheathed by schwann-like cells. the fila traverse the cribriform plate of the ethmoid bone to enter the anterior cranial fossa and collectively constitute cn i. the second cell type, positioned near the surface of the epithelium, is the microvillar cell. these cells are said to number approximately 600,000 in the adult. each microvillar cell, whose function is unknown, contains microvilli. the third cell type, the supporting or sustentacular cell, also projects microvilli into the mucus.
these cells are believed to 1) insulate the receptor cells from one another, 2) regulate the local ionic composition of the mucus, 3) deactivate odorants, and 4) help protect the epithelium from damage from foreign agents. the supporting cells contain xenobiotic-metabolizing enzymes (e.g., cytochrome p-450), a feature shared with the fourth cell type, the cell that lines the bowman glands and ducts. the bowman glands are a major source of mucus within the region of the olfactory epithelium. the fifth and sixth cell types are the globose (light) basal cell and horizontal (dark) basal cell, cells that are located near the basement membrane from which the other cell types arise. the same type of basal cell, probably a globose cell, can give rise to both neurons and nonneural cells when the olfactory epithelium is damaged, expressing a multiple potency rarely observed in stem cells. it is noteworthy that the olfactory ensheathing cells, which form the bundles of axons that make up the olfactory fila, enhance remyelination and axonal conduction in demyelinated spinal tract nerves, as well as in severed rat sciatic nerves, 7 exhibiting both schwann cell-like and astrocyte-like properties. the cilia of the olfactory receptor cells lack dynein arms (hence, intrinsic motility). odorant transport through the mucus to the cilia is aided by "odorant binding proteins." approximately 1000 classes of odorant receptors are currently believed to exist, reflecting the expression of the largest known vertebrate gene family, a family accounting for approximately 1% of all expressed genes. in general, the olfactory receptors are linked to the stimulatory guanine nucleotide-binding protein g(olf). when stimulated, they activate the enzyme adenylate cyclase to produce the second messenger cyclic adenosine monophosphate (camp) and subsequent events related to depolarization of the cell membrane and signal propagation.
although a given receptor cell seems to express only one type of receptor derived from a single allele, each cell is electrophysiologically responsive to a wide but circumscribed range of stimuli. this implies that a single receptor accepts a range of molecular entities and that coding occurs via a complex cross-fiber patterning of responses. the olfactory bulb is a complex processing center, receiving both afferent and efferent input. this ovoid structure has clear concentric layers discernible using light microscopy. the layers are, in succession, the outermost olfactory nerve layer, the glomerular layer, the external plexiform layer, the mitral cell layer, the internal plexiform layer, and the innermost granule cell layer. in the human, the receptor cell axons of the olfactory fila, after traversing the cribriform plate, form the olfactory nerve layer and synapse in the second bulbar layer within the spherical glomeruli. in general, receptor neurons expressing a given receptor type project to one or, at most, two glomeruli, making the glomeruli in effect functional units. thus, a given odorant activates a spatially defined or restricted set of glomeruli. hence, the olfactory code is reflected, at this early stage, not only as different patterns across the mucosa, but across the glomeruli as well. the major second-order neurons of the olfactory bulb (the mitral and tufted cells) project their axons centrally to elements of the olfactory cortex. the olfactory cortex comprises 1) the anterior olfactory nucleus ([aon], which in the human has a large segment in the posterior olfactory bulb), 2) the olfactory tubercle (poorly developed in humans), 3) the prepiriform cortex, 4) the lateral entorhinal cortex, 5) the periamygdaloid cortex, and 6) the cortical nucleus of the amygdala. the afferent olfactory signal is modulated at all levels of the system, from the olfactory bulb to the olfactory cortex.
olfaction is unique in that information from the olfactory bulb goes directly to cortical structures without passing through the thalamus. however, thalamic connections are present for relays between various elements of the primary and secondary olfactory cortices. the most widely used tests for assessing the ability to smell are those of odor threshold and odor identification. because these are the only tests routinely used in clinical settings, and because such tests are available commercially, the current discussion focuses on these measures. the reader is referred elsewhere for discussions of the comparative reliability, sensitivity, and validity of various types of modern olfactory tests. 8, 9 olfactory threshold tests the lowest concentration of an odorant that can be reliably detected is termed the detection or absolute threshold. usually, at lower perithreshold odorant concentrations, no odor quality can be discerned, only something different from air or the comparison diluent blank or blanks. in modern olfactory detection threshold testing, the subject is asked to report which of two or more stimuli (i.e., an odorant and one or more blanks) smells strongest, rather than to simply report whether or not an odor is perceived. such "forced-choice" procedures are less susceptible to contamination by response biases (e.g., the conservatism or liberalism in reporting the presence of an odor under uncertain conditions) than non-forced-choice procedures. in addition, they are more reliable and produce lower threshold values. 8 the instructions provided to a subject are critical in measuring a detection threshold because, if the subject is instructed to report which stimulus produces an odor rather than which stimulus is stronger, a spuriously high threshold value may result because the subject's attention is diverted away from subtle differences in the presented stimuli (odor quality is present only at higher perithreshold concentrations). 
the recognition threshold is the lowest concentration at which odor quality is reliably discerned. however, it is nearly impossible to control criterion biases in recognition threshold measurement. thus, in a forced-choice situation, guesses are not randomly distributed among alternatives, potentially leading to a spuriously low recognition threshold for the preferred alternative. a classic example of this problem comes from taste psychophysics, in which some subjects report "sour" much more frequently than the other primary qualities in the absence of a clearly discernible stimulus, resulting in an erroneously low sour taste recognition threshold measure. two types of threshold stimulus presentation procedures have received the most use in modern times: the ascending method of limits procedure (aml) and the single staircase procedure (ss). in the aml procedure, an odorant (and comparison blanks) is sequentially presented from low to high concentrations and the point of transition between detection and no detection is estimated. in the ss method, the concentration of the stimulus is increased following trials on which a subject fails to detect the stimulus and decreased following trials in which correct detection occurs. an average of the up-down transitions ("reversals") is used to estimate the threshold value. in both the aml and ss procedures, the direction of initial stimulus presentation is made from weak to strong in an effort to reduce potential adaptation effects of prior stimulation. in general, the ss procedure is preferred to the aml procedure because it is more reliable, since most investigators who employ the aml technique present only a single ascending stimulus series. unfortunately, widespread use of the single-series aml procedure has led to the erroneous conclusion that threshold measures exhibit a high degree of intrasubject variability, a conclusion not borne out when thresholds are determined using the ss procedure.
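the ss procedure lends itself to a short simulation. the sketch below is a generic illustration with a hypothetical simulated observer, not a clinical protocol: concentration is raised one step after a miss, lowered one step after a hit, and the threshold is taken as the mean of the up-down reversal points:

```python
def staircase_threshold(detects, levels, start=0, n_reversals=7):
    # single-staircase (ss) estimate: step up after a miss, down after
    # a hit, and average the concentrations at which direction reverses
    i, last_dir, reversals = start, None, []
    while len(reversals) < n_reversals:
        direction = -1 if detects(levels[i]) else +1
        if last_dir is not None and direction != last_dir:
            reversals.append(levels[i])
        last_dir = direction
        i = max(0, min(len(levels) - 1, i + direction))
    return sum(reversals) / len(reversals)

# hypothetical observer: detects every concentration step of 5 or above
levels = list(range(10))            # e.g., half-log concentration steps
print(staircase_threshold(lambda lvl: lvl >= 5, levels))
```

because the estimate averages several reversals rather than relying on a single ascending series, it is less sensitive to a single lucky or unlucky trial, which is the reliability advantage described above.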
8 a modern, commercially available threshold test kit that employs an ss procedure is shown in figure 2. this kit uses squeeze bottles containing various half-log step concentrations of an odorant known to stimulate primarily cn i, namely, the rose-like smelling odorant phenyl ethyl alcohol (pea). norms based on hundreds of subjects spanning the entire age range allow for the practical application of this test in medical and industrial settings. the development and proliferation of easy-to-use, self-administered "scratch and sniff" tests of odor identification have significantly increased our understanding of smell function in humans, including the influences of such factors as age, gender, exposure to toxic agents, smoking behavior, and various disease states. such quantitative tests, derived from test measurement theory, focus on the comparative ability of individuals to identify a number of odorants at the suprathreshold level. the most popular of these tests are the 40-odorant university of pennsylvania smell identification test ([upsit], known commercially as the smell identification test™ [or sit]) 10, 11 ; the 12-odor brief smell identification test ([b-sit], also known as the cross-cultural smell identification test™), 12 and the 3-odor pocket smell test™ (pst) (sensonics, inc., haddon heights, nj). 8 the upsit has been used most widely, having been administered to approximately 200,000 people in europe and north america since 1985. this test, shown in figure 3, employs norms based on nearly 4000 persons and is available in english, french, german, and spanish language versions. for a given item, the patient simply scratches open a microencapsulated label with a pencil tip, smells the label, and signifies the odor quality from four alternatives provided. even if no smell is perceived, a response is required (i.e., the test is forced-choice).
in addition to indicating the level of absolute smell function (i.e., normosmia, mild hyposmia, moderate hyposmia, severe hyposmia, total anosmia), this test provides a percentile rank for each age and gender group. malingering is detected on the basis of improbable responses. in general, when equated for test length, tests of odor identification are more reliable than tests of odor detection threshold and require less administration time. furthermore, most identification tests can be self-administered and tend to correlate better with a patient's complaint than measures of detection threshold. nonetheless, tests of odor identification and detection are typically correlated with one another. 8 among the major nonclinical findings derived from modern sensory tests, primarily the upsit, are the following: first, the ability to identify odors has a strong genetic basis, as determined from twin studies 13, 14 ; second, women, on average, are better able than men to identify odors, and this superiority is noticeable as early as 4 years of age and is culture independent 15-18 ; third, significant loss of olfactory function occurs after the age of 65 years, with more than half of persons between 65 and 80 years of age and more than three-quarters of those 80 years of age and older having such loss 16, 18, 19 ; fourth, women, on average, retain the ability to smell longer than men 16 ; fifth, the decreased smell ability associated with smoking is present in prior cigarette smokers, and recovery to presmoking levels, while possible, can take years, depending on the amount and duration of prior smoking 20 ; and sixth, olfactory function is compromised in urban residents and in workers in some industries, including the paper and chemical manufacturing industries. 21-25 clinical studies employing such methodology during this period have found decreased smell function relative to matched controls in dozens of diseases and disorders (see table i).
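the forced-choice design is also what makes malingering detectable: a truly anosmic patient guessing on a 40-item, four-alternative test averages about 10 correct by chance, so scores far below chance are statistically improbable. a small binomial calculation illustrates this; the cutoff of 4 used here is illustrative, not the upsit's actual criterion:

```python
from math import comb

def prob_score_at_most(s: int, n: int = 40, p: float = 0.25) -> float:
    # binomial tail: probability of s or fewer correct answers when
    # guessing at random on an n-item test with chance rate p
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1))

# guessing averages 10/40 correct; 4 or fewer is improbable by chance
print(f"P(score <= 4 | guessing) = {prob_score_at_most(4):.4f}")
```

a score this far below chance suggests the respondent could perceive the odors well enough to avoid the correct answers deliberately.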
the straightforward ability to quantify olfactory function, along with recent advances in in vivo medical imaging, has made it possible to better understand the physiological basis of a number of chemosensory deficits. for example, it is apparent today that congenital anosmia is nearly always associated with markedly deformed or absent olfactory bulbs and stalks. furthermore, head trauma-related smell loss is typically accompanied by decreased bulb and tract size that presumably reflects mitigation of trophic factors from the olfactory receptor neurons, which are often sheared off or otherwise altered in head trauma. the smell loss associated with chronic alcoholism has been found to be correlated with magnetic resonance imaging (mri)-determined 1) increased cortical and ventricular cerebral spinal fluid volumes and 2) reduced volumes of the thalamus and of cortical and subcortical gray matter. 26 the smell loss of multiple sclerosis is directly associated with the number of active plaques in central brain regions, 27, 28 and that of schizophrenia with diminished olfactory bulbs and tracts. 29 the presence of hypertrophied adenoid tissue can significantly block the nasal airflow of children whose airways are otherwise patent. crysdale et al. 30 noted a 43% reduction in nasal resistance following adenoidectomy in a group of 67 children ranging in age from 4 to 17 years before surgery, and fielder 31 reported a 19% postoperative reduction in such resistance in a group of 19 children admitted for adenoidectomy and myringotomies (with or without tonsillectomy) who had at least 1 g of adenoid tissue removed. in 1983, ghorbanian et al. 32 evaluated the degree to which nasal obstruction influences the olfactory sensitivity of children. this work, which has been subsequently replicated by others, 33 determined phenyl ethyl alcohol detection thresholds in 65 children with varying degrees of nasal obstruction and in 13 children without such obstruction. 
the threshold values were directly related to clinical ratings of the degree of nasal obstruction. these findings, shown in figure 4, suggest that in this subject population the degree of nasal obstruction is associated with commensurate impairment in the ability to smell and that reduction in the degree of nasal obstruction results in commensurate recovery of smell function. it is well documented that the common cold can result in permanent loss of smell function. however, such loss usually occurs in later life, after the olfactory membrane has presumably undergone considerable cumulative damage. for this reason, temporary smell loss following an upper respiratory infection is much more common. in general, virus-related acute rhinitis or rhinosinusitis follows three predictable phases, namely, a prodromal phase, a cathartic phase, and a viscous phase. 34 the prodromal phase is characterized by sweating, shivering, headaches, loss of appetite, and other nonspecific feelings of being ill. during this phase, tickling, burning, or dryness within the nose is common and the mucosa typically appears pale. the cathartic phase follows a few hours after the prodromal phase and is characterized by increased mucosal redness and swelling, nasal obstruction, and secretion of watery mucus. a few days later, during the viscous phase, the nasal secretions thicken and the intensity and frequency of the aforementioned symptoms decline, disappearing after about a week. two studies have quantitatively assessed olfaction following the onset of the common cold, with an attempt to establish whether changes in smell function are coincident with changes in nasal congestion and secretion. in the first of these studies, akerland et al. 35 measured 1-butanol odor detection thresholds in a group of student volunteers before and 4 days after nasal inoculation with the coronavirus 229e.
the nine individuals who developed a cold had impaired olfactory thresholds on the postinoculation test relative to the controls. whereas the change in smell function correlated with the degree of nasal congestion, it did not correlate with the amount of nasal discharge. the second study on this topic led to the conclusion that the common cold may, in fact, affect olfactory function independent of nasal congestion. 34, 36 in this experiment, whose main purpose was to evaluate the potential dose-related effects of oxymetazoline (administered unilaterally) on olfactory function, both psychophysical (intensity ratings, odor discrimination, butanol detection threshold) and electrophysiological (event-related potentials to h2s and co2) data were obtained. nasal volume was assessed by acoustic rhinometry. thirty-six subjects (18 women, 18 men) were evaluated soon after they experienced the natural onset of a cold. after rhinitis onset (day 0), sensory and airway measurements were obtained on days 2, 4, 6, and 35. the cold produced a decrease in the volume of the anterior nasal cavity, an increase in mucus secretion, an increase in olfactory thresholds, a decrease in intensity ratings, and a decrease in n1 evoked potential amplitudes to both olfactory and trigeminal stimuli. when mucus secretion of the contralateral nasal cavity was controlled with oxymetazoline, n1 amplitudes to olfactory stimuli were still affected by the cold, as indicated by the significant increase of amplitudes as subjects recovered; however, this phenomenon was not found for any of the other test measures or for the responses to the trigeminal stimuli. overall, the olfactory test scores tended to improve during the viscous phase. a number of studies have sought to determine the influences of acute or chronic rhinitis on olfactory function.
some such studies have differentiated between rhinitis and secondary nasal and sinus disease (i.e., sinusitis and/or polyposis), although this distinction is often difficult to maintain. currently, the term rhinosinusitis is preferable to the term sinusitis because invariably inflammation of the sinuses coexists with inflammation of the nose. rhinosinusitis can be further divided into acute, subacute, recurrent acute, and chronic types, during which acute exacerbation of chronic symptoms can occur. however, no studies have comparatively evaluated olfactory function in these various forms or stages of nasal disease. nonetheless, as described below, there is strong suggestion from numerous quarters that the degree of olfactory loss is correlated with disease severity. among the first nonquantitative studies in the english literature on olfaction in rhinosinusitis and/or polyposis were those of hotchkiss 37 in the mid-1950s and fein et al. 38 in the mid-1960s. hotchkiss evaluated self-reported olfactory function in 30 patients with nasal obstruction secondary to polyposis who reported smell loss. all were treated with a total dose of 70 mg of prednisone over a 6-day period. restoration of smell was said to follow the systemic steroid therapy, with the magnitude of the restoration being proportional to the amount of polyp shrinkage. the restoration was reportedly unrelated to the duration of the loss of olfactory function. however, the self-rated improvement lasted only (on average) 10 days after the discontinuation of therapy. fein et al. examined 18 patients who reported loss of smell associated with allergic rhinitis. of these patients, 14 had other diseases, including sinusitis, polyposis, and bronchial asthma. again, no objective sensory testing was performed. on the basis of self-report, the severity of the smell dysfunction was classified as mild, moderate, or severe.
of the four patients who had only allergic rhinitis, two were said to have had mild smell loss, and two, moderate smell loss. of the 14 patients with other diseases, two reportedly had mild loss, six moderate loss, and six severe loss. in the latter group, severe loss was said to be associated with the presence of both polyposis and sinusitis. although improvement in some of the subjects from hyposensitization, antibiotics, polypectomy, or various combinations of treatments was noted, the lack of a well-defined experimental protocol employing quantitative olfactory tests and the introduction of the treatments in various combinations without control for their order or timing precludes a determination of the relative efficacy of the interventions. more recently, tos et al. 39 had 91 patients with polyposis rate their olfactory function on a 0-3 scale (0 = normal, 1 = slight impairment, 2 = moderate impairment, and 3 = anosmia) before and after 6 weeks of twice-daily nasal corticosteroid treatment. before treatment, the mean rating of the 44 patients who were to receive the corticosteroid sprays was 2.09, whereas that of the 47 patients who were to receive the placebo was 2.19. following treatment, the self-ratings were 1.86 for the corticosteroid-treated subjects and 2.19 for the placebo-treated subjects. even though a statistically significant change occurred in the ratings after the administration of the corticosteroid, the degree of average self-rated improvement was not marked and, for all practical purposes, moderate loss of smell function was still reported. in contrast to these largely subjective reports are a number of studies, most appearing since 1990, that have quantitatively assessed smell dysfunction in patients with rhinosinusitis. among the first of these studies was that of goodspeed et al.
40 in this work, systemic prednisone 50 mg was administered each day for 7 days to 20 anosmic or severely hyposmic patients of several types whose olfactory function was monitored using butanol thresholds and odor identification tests. loss of smell function following the cessation of prednisone treatment was variable and was not quantitatively tested after the discontinuance of the therapy. another early empirical study 41 sought to determine the efficacy of flunisolide nasal spray in restoring olfactory function in a selected set of patients with perennial rhinitis and nasal polyposis. in this report, flunisolide and nasal decongestant sprays were introduced, with the decongestant being discontinued a week later. the olfactory testing was performed at home, and the self-administration of the nasal sprays was performed in the moffett position to enhance delivery. daily subjective ratings and a self-administered smell test revealed a return of smell function to the mid-hyposmic range in five of the seven patients after approximately 2 weeks of the flunisolide treatment. the first large-scale empirical study of olfaction in allergic rhinitis was that by cowart et al. 42 in 1993. quantitative detection threshold measures for the rose-like smelling odorant phenyl ethyl alcohol were obtained in this well-designed and carefully executed study from 91 patients with symptoms of allergic rhinitis and from 80 nonatopic control subjects. the allergy patients exhibited significantly higher detection thresholds than did the controls, with 23.1% of the patients demonstrating a clinically significant loss (i.e., a threshold at or above the 2.5 percentile of control values). clinical or radiographic evidence of rhinosinusitis or nasal polyps or both in allergy patients was found to be associated with hyposmia: 14.3% of the allergy patients with no associated rhinosinusitis exhibited hyposmia, whereas 42.9% of the allergy patients with associated rhinosinusitis did so.
no association between the smell test scores and nasal resistance was seen in either the patient or control groups, although nasal resistance was higher among patients than among control subjects. two years later, apter et al. 43 reported that 28 patients with chronic rhinitis and no associated polyposis or rhinosinusitis had an average olfactory test score (based on a composite of odor identification and detection tests) indicative of moderate hyposmia. thirty-four such patients with polyps and/or chronic sinusitis were found to be generally anosmic. these results were interpreted to mean that chronic rhinitis without associated sinusitis could result in some degree of olfactory loss, but that severe loss was usually associated with the presence of polyposis and/or rhinosinusitis. in the first study on this topic to employ the upsit, golding-wood et al. 44 evaluated olfactory function once before and once after 6 weeks of betamethasone treatment in 25 well-documented patients with perennial rhinitis. the patient group was initially divided into two groups: those who answered the question "is your sense of smell impaired?" affirmatively (n = 15) and those who did not (n = 10). the upsit scores of each of the 15 members of the former group were higher after the betamethasone treatment (respective group means [sd] = 18.93 [9.4] and 33.4 [4.01]). this was not the case for those who initially thought that they had no problems smelling (respective pretreatment/post-treatment means [sd] = 33.40 [4.01] and 32.8 [4.94]). as in earlier studies, however, the average post-treatment upsit score was still indicative of a mild hyposmic condition. in general, the upsit scores of the patients retained a similar rank order before and after treatment (spearman's correlation coefficient [r] = 0.75). moderate correlations were found between the upsit scores and the self-ratings of olfactory function both before (r = −0.52) and after (r = −0.58) treatment. 
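several of the studies above summarize agreement between test occasions or measures with spearman's rank correlation (e.g., the pre-/post-treatment rank order of upsit scores, r = 0.75). as a reference for how that statistic is computed, here is a minimal python sketch; the score lists are invented for illustration and are not data from any of the cited studies:

```python
from statistics import mean

def rank(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# hypothetical pre-/post-treatment scores (perfectly monotone, so rho = 1.0):
pre = [18, 25, 12, 30, 22, 15, 27, 20]
post = [30, 34, 26, 38, 33, 28, 36, 31]
print(round(spearman(pre, post), 2))  # prints 1.0
```

because the statistic depends only on rank order, a treatment that raised every patient's score by a different amount but preserved their ordering would still yield rho = 1.0, which is why the review reports it as a measure of retained rank order rather than of score change.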
a year after this study, mott et al. 45 sought to determine the efficacy of 8 weeks of topical corticosteroid nasal spray treatment for severe olfactory loss associated with severe nasal and sinus disease. on average, the objective measures of olfaction improved significantly, and a decrease in the signs of nasal and sinus disease was noted on rhinoscopic evaluation. two-thirds of the patients noted a subjective improvement in smell function. these data, along with those of golding-wood et al., 44 imply that in many patients topical corticosteroid nasal spray, when administered in a head-down-forward position, mitigates, at least to some degree, the olfactory loss associated with severe nasal and sinus disease. in perhaps the most extensive study of olfaction in rhinitis and rhinosinusitis to date, simola and malmberg 46 compared odor detection thresholds obtained from 105 rhinitis patients to those of 104 healthy control subjects. age and rhinitis were found to be associated with smell dysfunction. both the proportion of hyposmic persons and the degree of the impairment of the sense of smell were significantly higher in the rhinitis group than in the control group. the nonallergic rhinitis patients' sense of smell was found to be poorer than that of the patients with seasonal or perennial allergic rhinitis. a history of operations for nasal polyposis was associated with hyposmia, but operations for chronic maxillary sinusitis were not. two other studies appeared more or less contemporaneously with the study by simola and malmberg. in the first, kondo et al. 47 administered the upsit to 36 japanese patients with a history of sinusitis/polyposis and to 131 control subjects. the mean upsit score of the patients was significantly (p < .001) lower (23.80, sd = 7.12) than that of the controls (32.08, sd = 3.57), despite some culture-related attenuation in the test scores of both groups. detection and recognition thresholds showed a similar association. 
as in the case of the study by golding-wood et al., 44 moderate correlations emerged between the odor test scores and the scores on a smell ability questionnaire (spearman's r ranging from 0.58 to 0.69). in the second study, apter et al. 48 assessed odor detection and identification in 1) 60 patients who presented to a smell and taste clinic with self-described olfactory loss and were found to have allergic rhinitis and 2) 30 patients with allergic rhinitis from an allergy clinic who had no chronic rhinosinusitis or polyposis. as might be expected, the patients presenting to the smell and taste clinic with olfactory dysfunction had significantly lower olfactory test scores than those who came from the allergy clinic and who were not specifically presenting with olfactory loss. in accord with the findings of several of the prior studies, olfactory function was inversely associated with the severity of the disease. however, no meaningful relationship was apparent between the visibility of the olfactory clefts (determined from endoscopic rhinoscopy) and smell function, regardless of disease status. self-reported fluctuations in function were less frequent in the groups without chronic rhinosinusitis than in those with chronic rhinosinusitis and/or polyposis. interestingly, self-reported distortions in smell function were generally associated with a history of upper respiratory tract infections and were more apparent in individuals with less severe disease. duration of nasal symptoms alone was not meaningfully correlated with the degree of olfactory loss. recently, rydzewski et al. 49 assessed olfactory function using the elsberg blast-injection procedure in 240 patients with perennial rhinitis, seasonal rhinitis, and bronchial asthma. of their patients, 13.8% were hyposmic and 7.6% anosmic. surprisingly, using electrogustometry, these authors found taste disorders in an even larger percentage of these patients (30.7%). 
the olfactory component of this work, however, must be viewed with caution in light of the methodological problems with elsberg olfactometry. this procedure has been criticized on numerous grounds, including 1) the lack of a forced-choice response, 2) the confounding of pressure with the number of molecules in the stimulus, 3) the introduction of a very unnatural stimulus pulse into the nose, and 4) the production of highly unreliable sensitivity measures. 50, 51 in aggregate, the studies reviewed above suggest that the degree of olfactory loss is usually associated with the severity of nasal sinus disease, with the greatest loss occurring in patients who have rhinosinusitis and polyposis. employing quantitative tests, smell function has been shown to improve in some patients following systemic administration of corticosteroids, as well as topical administration of corticosteroid sprays when administered in a head-down-forward position. nonetheless, no study has compared the latter mode of delivery to a standard mode, and no one has administered such drugs in a blind, placebo-controlled study. importantly, the limited data available suggest that only rarely has corticosteroid treatment restored function to normal levels, implying either that 1) some chronic permanent loss of olfactory function is present or 2) such treatments are not 100% effective in reversing the disease processes responsible for the olfactory loss. interestingly, no study has been able to document in rhinitis patients an association between olfactory test scores and intranasal airway access factors, save total or near-total blockage, whether measured by rhinoscopy, rhinomanometry, or acoustic rhinometry. there is currently considerable support for the hypothesis that factors other than, or in addition to, nasal airflow are responsible for many instances of smell loss in patients with rhinosinusitis, consistent with the notion that chronic inflammation may be toxic to olfactory neurons. 
for example, kern 52 presented data, albeit preliminary, indicating that the severity of histopathological changes within the olfactory mucosa of patients with chronic rhinosinusitis is positively related to the magnitude of olfactory loss, as measured by the upsit. in addition, authors have shown that olfactory secretions are probably regulated by both mineralocorticoids and glucocorticoids. 53, 54 feron et al. 55 reported, in a study group of 33 subjects, that nasal biopsy specimens from the posterior superior turbinate, posterior medial turbinate, and posterodorsal septum of patients with nasal disease were less likely to contain olfactory neuroepithelium than analogous biopsy specimens from patients with no such disease. lee et al. 56 have demonstrated that biopsy specimens from the region of the olfactory epithelium of anosmic patients with rhinosinusitis were less likely to contain olfactory epithelial tissue than those from rhinosinusitis patients who were not anosmic (27% vs. 61% positive biopsy results, respectively). although detailed examination by lee et al. of the epithelium from rhinosinusitis patients with normal smell function did reveal islands of respiratory-like epithelium interspersed throughout the biopsy samples, such islands were much less prevalent than in the anosmic patients for whom olfactory epithelium could be found. abnormalities in the arrangement of the epithelial cell types were common in the anosmic biopsy specimens, and in cases where olfactory epithelium was identified, it was typically atrophic and thin, often comprising mainly sustentacular and basal cells. hilberg 57 evaluated the effect of the oral antihistamine terfenadine (a histamine type 1 [h1] blocker) on an allergen challenge in subjects with nasal allergy uncomplicated by polyposis and compared these results with those obtained using the topical steroid budesonide. 
although both drugs had an effect on the hay fever symptoms during the nasal pollen challenge, only the budesonide improved the challenge-related decrement in olfactory sensitivity. this steroid also was more effective in increasing nasal volume. however, the improvement in olfactory function occurred in less than half of the patients (7/17 [41%]). lane et al. 58 employed an abbreviated 20-item version of the upsit and acoustic rhinometry to assess olfactory and nasal function, respectively, in the immediate response to a nasal allergen challenge in eight pollen-sensitive subjects. a significantly greater decrease in the cross-sectional nasal airway measure occurred following allergen challenge relative to a control challenge (70% vs. 22%). as in the case of other allergic rhinitis and rhinosinusitis studies, no association was found between the olfactory and acoustic rhinometric test measures. despite the small sample size and the use of an abbreviated upsit, a modest decrease in odor identification performance was seen following the allergen challenge (16%, p = .08). in 1997, hinriksdottir et al. 59 evaluated odor detection thresholds in 20 patients with known allergic rhinitis to birch pollen before and after a topically applied birch pollen challenge during a nonsymptomatic period. following the provocation, olfactory function decreased. the change in threshold was related to the measured amount of nasal secretion but not to the patients' report of nasal obstruction or measures of nasal resistance. analogous findings were subsequently noted in a study by klimek and eggers 60 in which measures of odor identification, discrimination, and detection threshold were obtained in 17 patients with allergic rhinitis to grass pollen. in this work, testing was performed preseasonally and 3, 7, 14, and 21 days into the grass pollen season. after 2 weeks of pollen exposure, most subjects were hyposmic; by 3 weeks, all patients, without exception, had mild to severe hyposmia. 
in another study examining patients with grass-related allergic rhinitis, moll et al. 61 examined the same olfactory measures as those used by klimek and eggers 60 in 28 patients with allergic rhinitis to grass pollen preseasonally and 3 weeks into the grass pollen season. in addition, 47 patients with allergic rhinitis to mites and 66 healthy control subjects were evaluated on a single test occasion. the mite-sensitive patients performed more poorly than the controls on all three olfactory tests. however, they outperformed the grass-sensitive patients (tested preseasonally) on the odor detection threshold test, but not on the other two measures. the intraseasonal test results of the grass-sensitive patients were decreased for all measures relative to the preseasonal tests. nevertheless, the grass-sensitive patients in the preseason period performed more poorly than the controls only on the odor detection threshold test. the intraseasonal grass-sensitive patients outperformed the mite-sensitive patients on the identification and detection threshold tests, but underperformed the mite-sensitive patients on the odor discrimination test. this finding is paradoxical, however, because these three types of olfactory measures are typically positively correlated in a wide range of test situations. the authors of the study concluded, "therefore, the different kind of allergen exposure seems to result in a different pattern of allergic olfactory dysfunction." to our knowledge, only two studies have sought to determine empirically whether septoplasty improves olfactory function, 62,63 despite the widespread use of this procedure by otolaryngologists attempting to correct smell deficits. in the first of these studies, stevens and stevens 62 measured the olfactory thresholds of 100 patients before and after surgery. 
of the 100 patients examined, the primary surgical procedure was nasal septoplasty in 63 patients, septorhinoplasty in 24, turbinate resection in 3, and polypectomy in 10. although the authors concluded that the surgical procedures, including septoplasty, improved olfactory function, the data for each type of operation were not provided separately, and their general conclusion is weakened by methodological considerations. in addition to not having a control group to examine the influences of repeated testing on the dependent measure, the questionable elsberg blast-injection procedure 64 was used to determine olfactory sensitivity. in the second study to provide data on this topic, kimmelman 63 administered the upsit before and after septoplasty to 34 patients, 31 of whom had septal deformity and 3, nasal septal perforations. again, no control group for repeated-testing effects was provided, although it is known that upsit scores on average change little on repeated testing. the mean (sem) upsit scores of these largely normally functioning patients were essentially equivalent before and after the operation (36.0 [0.4] versus 35.8 [0.4]). in perhaps the first published study to specifically address the effects of rhinoplasty on olfactory function, champion 65 questioned 200 patients who had undergone rhinoplasty about their ability to smell. ten percent of patients reported temporary anosmia lasting from 6 to 18 months after the operation, and all apparently reported regaining normal smell function. because no empirical olfactory testing was performed, the accuracy of these observations is unknown. two years later, using ground coffee, oil of peppermint, and oil of clove as test stimuli, goldwyn and shore 66 performed both preoperative and postoperative olfactory tests on 64 patients who had undergone rhinoplasty alone, 22 who had undergone rhinoplasty in combination with submucous resection, and 11 who had undergone submucous resection alone. 
in addition, 57 control subjects were tested. the subjects were simply asked to identify the odors that were presented. apparently, no clear benefits of the operations on smell function were found, and the findings of this study were interpreted as supporting champion's conclusion that none of these types of operations have any long-term deleterious influences on smell function. however, this work is severely limited by its failure to differentiate between patient types and by the use of a brief non-forced-choice identification test. in 1994, kimmelman 63 administered the upsit before and 2 to 4 weeks after surgery to 15 rhinoplasty patients. a small but statistically significant increase in performance was noted postoperatively (respective preoperative and postoperative means [sem] = 33.9 [0.5] and 35.7 [0.6]). however, again, no control group was tested to determine to what degree repeated testing, per se, may have accounted for this improvement. in 1989, gross-isseroff et al. 67 obtained threshold and upsit measures in children with choanal atresia before and after surgical repair at relatively advanced ages (8-31 y). the three patients who had bilateral atresia had permanent olfactory deficits, whereas the one patient who had unilateral atresia appeared to have normal function. these findings suggest that early sensory exposure may be needed for the normal development of olfactory function, although, as the authors pointed out, the small number of cases involved necessitates additional research on this point. the most common operative procedures impacting the ability to smell are performed in patients with chronic rhinosinusitis and/or polyposis after more conservative treatments (e.g., allergen avoidance, nasal corticosteroids) have failed. most recent studies have administered corticosteroids both preoperatively and postoperatively, although some have used such medication only after surgery, confounding the interpretation of the findings. 
given the variation in olfactory measurement techniques used in such studies, this section is divided into 1) studies that have employed the standardized upsit (and in some cases additional tests); 2) studies that have employed a standardized combination of identification, discrimination, and detection threshold procedures 68 ; and 3) studies that have used other types of olfactory tests. studies using the university of pennsylvania smell identification test. in perhaps the first report of the influences of nasal surgery on smell function of the modern era, jafek et al. 69 in 1987 noted dramatic improvement in upsit and butanol detection threshold scores in one patient 4 months after an intranasal sphenoethmoidectomy (and intranasal antrostomies) and a continued 5-mg-daily regimen of prednisone (upsit scores increased from 10 to 31, the latter still indicative of mild microsmia; threshold values decreased by 4%). in another patient, even greater improvement was evidenced on the apparently sole post-treatment test performed a year after bilateral intranasal sphenoethmoidectomy and a regimen of triamcinolone acetonide (upsit scores increased from 9 to 38, the latter being normal; threshold values decreased by 45%). the authors concluded that these patients had received no benefit from either prior surgery or corticosteroid treatment alone, noting that "the results of this report raise an interesting question: why was the combined treatment with corticosteroids and surgery effective in long-term reversal of anosmia, whereas individual treatment with either modality had proved ineffective?" quantitative testing had not been performed after the earlier surgeries, which were not as extensive as those subsequently performed by jafek et al., and the duration and dosage of prior steroid treatment were not noted. in 1988, seiden and smith 70 examined olfactory function in five patients before and after endoscopic intranasal surgery within the osteomeatal complex. 
specifically, endoscopic intranasal ethmoidectomy and antrostomy were performed. on average, the degree of smell loss before surgery was indicative of total anosmia (mean upsit score = 15.8, sd = 8.73), although apparently some individuals had moderate hyposmia. four weeks to 8 weeks after surgery, all five patients exhibited marked improvement in their olfactory function, which fell, on average, within the microsmic range (mean upsit score = 33.4, sd = 4.02). in 1994, eichel 71 administered the upsit before and after intranasal surgery to 10 patients complaining of anosmia who had advanced obstructive bilateral nasal polyposis and pansinusitis. the surgery included bilateral nasal polypectomies, bilateral sphenoethmoidectomies, and bilateral nasal antral windows. all patients received testing 6 and 12 months postoperatively; four received an additional test at 18 months. postoperatively, all were treated with a topical corticosteroid nasal spray. the surgery was associated with improved upsit scores in 7 of the 10 patients (respective median preoperative and 6- and 12-mo postoperative upsit scores: 10.5, 28, and 25.5), although average postoperative function was in the severe microsmic range. kimmelman 63 administered the upsit to nine patients undergoing ethmoidectomy and nine patients undergoing polypectomy before and 2 to 4 weeks after their surgeries. a small nonsignificant increase in upsit scores was noted postoperatively in the ethmoidectomy group, although, as in the study of eichel, average postoperative performance was in the moderate (nearly severe) microsmic range (respective mean [sem] scores = 25.56 [3.47] and 27.89 [3.13]; p = .07). although a statistically significant improvement in upsit scores occurred in the polypectomy group (p = . ). el nagger et al. 72 assessed olfactory function in 29 patients with bilateral nasal polyps before and after a polypectomy. 
following the operation, the patients received a 6-week course of beclomethasone nasal spray (beconase) to one nostril only, with the other acting as a control. although the upsit scores were higher for most individuals on the postoperative than on the preoperative tests, the changes in the observed upsit scores were modest and essentially of the same order of postoperative severity as seen in the study of kimmelman. one arrives at the following preoperative and postoperative mean upsit scores, respectively, from the data presented by these authors in the first two of their figures: 17.08 and 19.84 for the beconase nostrils and 16.44 and 21.42 for the control nostrils. from this perspective, neither the operative procedure nor the beclomethasone spray had much of an effect on overall smell function, which, on average, fell within the anosmic or severe microsmic range. lund and scadding 73 evaluated the olfactory function of 50 hyposmic (upsit scores < 31) patients with chronic rhinosinusitis for 3 months before and after endoscopic nasal surgery. the postsurgical evaluations were performed at 1 year. the endoscopic procedure included uncinectomy, anterior ethmoidectomy, and perforation of the ground lamella of the middle turbinate in all cases, with posterior ethmoidectomy, sphenoidectomy, clearance of the frontal recess, and enlargement of the maxillary ostium in some cases. intranasal steroids were used up to the time of the surgery and for at least 3 months afterward. significant preoperative/postoperative improvement in upsit scores and in threshold values was observed in this group of patients (respective mean upsit scores = 19.5 and 25.0), although, again, on average, the postoperative upsit scores were indicative of marked microsmia. in 1996, downey et al. 74 administered the upsit before and after endoscopic sinus surgery to 50 patients with subjective anosmia and symptoms of progressive sinusitis. 
after surgery, 52% of patients self-reported significant improvement in smell and had higher upsit scores. of the remaining patients, some had intermittent improvement, but most remained hyposmic or anosmic despite clinically well-healed ethmoid surgical beds. a relationship was observed between upsit scores and the severity of the disease, as defined using the kennedy staging system. thus, the mean upsit scores were 35, 31, 26, and 23 for stages i to iv of the disease, respectively. disease extending beyond the ethmoids, as determined by preoperative computed tomography, was typically associated with persistent anosmia. recently, friedman et al. 75 administered the upsit to 50 patients before and 5 weeks after endoscopic sinus surgery with middle turbinate medialization and preservation. iatrogenic synechia formation was produced by initially abrading the caudal end of the middle turbinate and the opposing septal mucosa using a microdebrider. no statistically significant differences in preoperative/postoperative upsit scores were found (respective mean upsit scores = 35.18 and 35.57), leading the authors to conclude that middle turbinate medialization has no discernible adverse effect on olfaction. test. in 1988, leonard et al. 76 administered odor detection and threshold tests to 25 patients known to have olfactory dysfunction. the tests were administered before and after unilateral or bilateral transantral ethmoidectomy. smell function reportedly returned to normal in nine of the patients in one or both nasal chambers after surgery (36%), whereas four evidenced mild hyposmia (16%), five moderate to severe hyposmia (20%), and the remainder no improvement (28%). surgery on one side of the nose appeared in some cases to improve smell function on the contralateral side. five years later, hoseman et al. 
77 administered a "qualitative and a semiquantitative olfactory function test" to each side of the nose of 111 patients before and after intranasal surgery for chronic polypoid ethmoiditis. eighty-seven of these patients received a complete sphenoethmoidectomy, and 24 a partial resection of the ethmoidal cell system. the olfactory test comprised a non-forced-choice odor identification component, in which the subject was required to report the quality of vanillin (or cinnamon oil), peppermint oil, menthol, and acetic acid with "corrective feedback, when needed," and an odor detection threshold component. in the threshold component, non-forced-choice detection thresholds to three stimuli (phenylethanol, benzylacetate, and formic acid) were established. before surgery, 65% of the patients were reportedly hyposmic or anosmic, whereas after surgery only 8% were similarly smell deficient. no association was noted postoperatively between the size of the middle turbinate and smell ability. however, the authors concluded that their results largely reflected improvement of airflow to the receptors and that "an inflammatory affection of the sense organ itself could not be responsible [for the loss]." a year after the study of hoseman et al., 77 delank and stoll 78 evaluated odor detection thresholds to 2-phenylethanol and dimethyldisulfide before and after nasal endoscopic sinus surgery in 78 patients with chronic sinusitis with nasal polyposis. employing an ascending threshold procedure, they noted preoperative hyposmia or odor discrimination problems in 40% of their sample, and anosmia in another 36%. however, only 22% of their patients complained spontaneously of smell dysfunction. following endoscopic surgery, 71% of the smell-deficient patients reportedly improved. postoperative thresholds for 2-phenylethanol and dimethyldisulfide worsened in 9% of the patients. postoperative olfactory discrimination deteriorated in 11%. 
preoperative and postoperative olfactory function was not predictable in individual cases when nasal polyposis was limited. in a similar subsequent study, these authors extended the patient group to 115 patients with chronic sinusitis. 79 preoperatively, only about half of the patients (58%) were aware of or complained of an olfactory deficit. however, the threshold testing found 83% to be either hyposmic (52%) or anosmic (31%). after surgery, 70% of the patients reportedly exhibited some improvement in olfactory function; normosmia, however, was relatively rare, being achieved in only 25% of the hyposmic patients and 5% of the anosmic patients. the olfactory function of 8% of the patients was worse after surgery than before surgery. therefore, the authors concluded that "the prevalence of olfactory dysfunction in chronic sinusitis is preoperatively higher, and the rate of [postoperative] improvement is lower, than generally assumed." the authors also noted that "resections of the middle turbinate may have a negative effect on olfaction, due to damage to the olfactory fila or alteration of the normal aerodynamic pattern within the olfactory cleft." as noted by earlier investigators, the degree of olfactory dysfunction was associated with the degree of intranasal polyposis. min et al. 80 determined, in 1995, butanol thresholds before and after functional endoscopic nasal surgery in 80 patients with chronic sinusitis. patients with prior surgery, asthma, aspirin intolerance, nasal allergy, or cystic fibrosis were screened from the study group. in accord with other studies (e.g., downey et al., 74 ) preoperative dysfunction was associated, in general, with the severity of sinusitis (determined in this case from computed tomography scans of the ostiomeatal-unit complex). the percentages of persons with normosmia, hyposmia and anosmia before surgery were reported as 22%, 45%, and 33%, respectively. after surgery, these percentages were 36%, 48%, and 16%. 
although a postoperative average reduction in threshold values was noted, the postoperative mean threshold value remained within the range considered indicative of hyposmia. no association was present between the degree of postoperative olfactory improvement and either the severity or duration of the sinusitis. more recently, klimek et al. 81 tested the odor identification, discrimination, and detection threshold ability of 31 patients with nasal polyps 1 to 3 days before endonasal polyposis surgery and at six postoperative times thereafter (approximately 1 wk and approximately 1, 2, 3, and 6 months). on average, olfactory function, as measured by all three tests, fluctuated postoperatively, with the best recovery (i.e., mild hyposmia) occurring approximately 3 months after surgery. six months after surgery, moderate hyposmia was noted to about the same degree as was observed before the surgery. the authors concluded: "this study demonstrates that olfactory function is impaired in patients with nasal polyps. endonasal sinus surgery might improve olfactory function with best results within 3 months after surgery." studies using ratings of self-perceived olfactory function. in 1997, jankowski et al. 82 asked patients to remember what their sense of smell was like before and after either 1) a radical ethmoidectomy in which all the bony lamellae and mucosa within the labyrinth were removed, including a large antrostomy, sphenoidotomy, frontal sinusotomy, and middle turbinectomy (n = 39) or 2) a less systematic ethmoidectomy adapted to the extent of the disease (n = 37). they were also asked to remember what their sense of smell seemed like at 6-month intervals after the operation, up to the time of filling out the questionnaire. the patients were required to mark their remembrances on a 10-point scale ranging from no functional improvement to complete recovery. 
in general, the ratings suggested similar improvement in olfactory function in both groups 6 months after surgery, and maintenance of the same level 36 months after nasalization. some decrement was noted in reported smell function 24 months after the less extensive ethmoidectomy. however, this study suffers from many problems, not the least of which are the lack of an actual sensory measure, the requirement that patients remember function retrospectively over long periods, and the demand characteristics attendant on being asked to report the effectiveness of an operative procedure to which they had subjected themselves. remarkable progress has been made in the last decade in understanding the function of the olfactory system. at the transduction level, the discovery of the gene family that controls the expression of olfactory receptors has been a monumental event. at the measurement level, the development and proliferation of practical and reliable olfactory tests has spawned hundreds of studies that otherwise would not have been conducted, demonstrating olfactory dysfunction in a wide range of clinical disorders and leading to the discovery that olfactory loss is a very early clinical sign of several major neurodegenerative diseases. the comparatively few studies that have applied modern psychophysical tests to patients with rhinitis or rhinosinusitis have generally found an association between the degree of smell loss and the severity of nasal disease, although, except in cases of marked obstruction, no relationship is apparent between airway patency and olfactory dysfunction. this observation, along with recent histopathological studies of the olfactory mucosa in these disorders and the fact that even after nasal surgery and corticosteroid treatments smell function rarely returns, on average, to normal levels, suggests that airflow access is not the only factor determining smell loss in such patients. 
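throughout this review, mean upsit scores are translated into descriptive severity bands (normosmia, mild/moderate/severe microsmia, anosmia). as a rough guide to that mapping, here is a small python sketch; the cutoffs are approximate illustrations chosen to be consistent with the usage in the studies above, not the published upsit norms, which are adjusted for age and sex:

```python
def upsit_category(score):
    """Classify a 40-item UPSIT score into a descriptive severity band.
    The cutoffs below are illustrative approximations, not the official
    age- and sex-adjusted normative tables."""
    if not 0 <= score <= 40:
        raise ValueError("UPSIT scores range from 0 to 40")
    if score >= 34:
        return "normosmia"
    if score >= 30:
        return "mild microsmia"
    if score >= 26:
        return "moderate microsmia"
    if score >= 19:
        return "severe microsmia"
    return "anosmia"

# e.g., the pre-/postoperative means reported by seiden and smith (15.8 and 33.4):
print(upsit_category(15.8), "->", upsit_category(33.4))
```

applied to the seiden and smith means, this yields "anosmia" preoperatively and "mild microsmia" postoperatively, matching the review's description of marked improvement that nonetheless falls short of normosmia.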
Although the weight of the evidence suggests that nasal steroid sprays, when appropriately administered, can improve olfactory function in some patients, not a single double-blind, placebo-controlled study has evaluated the efficacy of such procedures in restoring smell function. It is hoped that the widespread availability of easy-to-use tests of olfactory function will lead to such controlled studies in the not-too-distant future.

References (titles):
Nervus terminalis lesions, II: enhancement of lordosis induced by tactile stimulation in the hamster
Structure and function of the nervus terminalis
Nervus terminalis (cranial nerve zero) in the adult human
Vomeronasal organ in bats and primates: extremes of structural variability and its phylogenetic implications
The human vomeronasal system: a review
Adult olfactory epithelium contains multipotent progenitors that give rise to neurons and non-neural cells
Olfactory bulb ensheathing cells enhance peripheral nerve regeneration
A study of the test-retest reliability of ten olfactory tests
Tests of human olfactory function: principal components analysis suggests that most measure a common source of variance
Development of the University of Pennsylvania Smell Identification Test: a standardized microencapsulated test of olfactory function
The Smell Identification Test administration manual
Development of the 12-item Cross-Cultural Smell Identification Test (CC-SIT)
A twin study of odor identification and olfactory sensitivity
Twin analysis of odor identification and perception
Gender and endocrine-related influences upon olfactory sensitivity
Smell identification ability: changes with age
Sex differences in odor identification ability: a cross-cultural analysis
Performance on a smell screening test (the MODSIT): a study of 510 predominantly illiterate Chinese subjects
Age, gender, medical treatment, and medication effects on smell identification
Dose-related effects of cigarette smoking on olfactory function
Solvent-associated olfactory dysfunction: not a predictor of deficits in learning and memory
Solvent-associated decrements in olfactory function in paint manufacturing workers
Olfactory function in chemical workers exposed to acrylate and methacrylate vapors
Work-related impairment of nasal function in Swedish woodwork teachers
Long-term effects on the olfactory system of exposure to hydrogen sulphide
Olfactory loss in alcoholics: correlations with cortical and subcortical MRI indices
Olfactory dysfunction in multiple sclerosis
Olfactory dysfunction in multiple sclerosis: relation to plaque load in inferior frontal and temporal lobes
Olfactory bulb volume is reduced in patients with schizophrenia
Cephalometric radiographs, nasal airway resistance, and the effect of adenoidectomy
The effect of adenoidectomy on nasal resistance to airflow
Odor perception in children in relation to nasal obstruction
Die olfaktorische Sensitivität bei der Rachenmandelhyperplasie [Olfactory sensitivity in adenoid hyperplasia]
Olfactory function in acute rhinitis
Olfactory threshold and nasal mucosal changes in experimentally induced common cold
Effects of the nasal decongestant oxymetazoline on human olfactory and intranasal trigeminal function in acute rhinitis
Influence of prednisone on nasal polyposis with anosmia
The loss of smell in nasal allergy
Efficacy of an aqueous and a powder formulation of nasal budesonide compared in patients with nasal polyps
Corticosteroids in olfactory dysfunction
Topical corticosteroids can alleviate olfactory dysfunction
Hyposmia in allergic rhinitis
Allergic rhinitis and olfactory loss
The treatment of hyposmia with intranasal steroids
Topical corticosteroid treatment of anosmia associated with nasal and sinus disease
Sense of smell in allergic and nonallergic rhinitis
A study of the relationship between the T&T olfactometer and the University of Pennsylvania Smell Identification Test in a Japanese population
Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis
Assessment of smell and taste in patients with allergic rhinitis
A test of the validity of the Elsberg method of olfactometry
Techniques in olfactometry: a critical review of the last one hundred years
Chronic sinusitis and anosmia: pathologic changes in the olfactory mucosa
Mineralocorticoid receptors in the mammalian olfactory mucosa
Expression of glucocorticoid receptor mRNA and protein in the olfactory mucosa: physiologic and pathophysiologic implications
New techniques for biopsy and culture of human olfactory epithelial neurons
Olfactory mucosal findings in patients with persistent anosmia after endoscopic sinus surgery
Effect of terfenadine and budesonide on nasal symptoms, olfaction, and nasal airway patency following allergen challenge
Acoustic rhinometry in the study of the acute nasal allergic response
Olfactory threshold after nasal allergen challenge
Olfactory dysfunction in allergic rhinitis is related to nasal eosinophilic inflammation
Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis
Quantitative effects of nasal surgery on olfaction
The risk to olfaction from nasal surgery
The sense of smell, I: a new and simple method of quantitative olfactometry
Anosmia associated with corrective rhinoplasty
The effect of submucous resection and rhinoplasty on the sense of smell
Olfactory function following late repair of choanal atresia
Clinical evaluation of olfaction
Steroid-dependent anosmia
Endoscopic intranasal surgery as an approach to restoring olfactory function
Improvement of olfaction following pansinus surgery
Effect of Beconase nasal spray on olfactory function in post-nasal polypectomy patients: a prospective controlled trial
Objective assessment of endoscopic sinus surgery in the management of chronic rhinosinusitis: an update
Anosmia and chronic sinus disease
Effects of middle turbinate medialization on olfaction
Surgical correction of olfactory disorders
Olfaction after endoscopic endonasal ethmoidectomy
Die Riechfunktion vor und nach endonasaler Operation der chronisch-polypösen [Olfactory function before and after endonasal surgery for chronic nasal polyposis]
Olfactory function after functional endoscopic sinus surgery for chronic sinusitis
Recovery of nasal physiology after functional endoscopic sinus surgery: olfaction and mucociliary transport
Olfactory function after microscopic endonasal surgery in patients with nasal polyps
Comparison of functional results after ethmoidectomy and nasalization for diffuse and severe nasal polyposis
Olfactory function in chronic alcoholics
Assessment of olfactory deficits in detoxified alcoholics
Olfactory dysfunction in amyotrophic lateral sclerosis
Olfactory disorder in motor neuron disease
Are there cognitive subtypes in adult attention deficit/hyperactivity disorder?
Presence of both odor identification and detection deficits in Alzheimer's disease
Olfaction in neurodegenerative disease: a meta-analysis of olfactory functioning in Alzheimer's and Parkinson's diseases
Impaired olfaction as a marker for cognitive decline: interaction with apolipoprotein E epsilon 4 status
Olfactory dysfunction in anorexia and bulimia nervosa
Abnormally diminished sense of smell in women with oestrogen receptor positive breast cancer
Olfactory function in painters exposed to organic solvents
Smell and taste function in subjects with chronic obstructive pulmonary disease: effect of long-term oxygen via nasal cannulas
Olfactory deficits in cystic fibrosis: distribution and severity
Olfactory function in young adolescents with Down's syndrome
Olfactory dysfunction in Down's syndrome
Olfactory deficits and Down's syndrome
Pre- and post-operative studies of olfactory function in patients with anterior temporal lobectomy
Olfactory functioning before and after temporal lobe resection for intractable seizures
Contribution of medial versus lateral temporal-lobe structures to human odour identification
Odor identification deficit of the parkinsonism-dementia complex of Guam: equivalence to that of Alzheimer's and idiopathic Parkinson's disease
Olfactory dysfunction in Guamanian ALS, parkinsonism and dementia
Long-term follow-up of olfactory loss secondary to head trauma and upper respiratory tract infection
Olfactory dysfunction in patients with head trauma
Olfactory identification deficits in HIV infection
Odor identification in Huntington's disease patients and asymptomatic gene carriers
Olfactory function in Huntington's disease patients and at-risk offspring
Kallmann syndrome: MR evaluation of olfactory system
Multimodal sensory discrimination deficits in Korsakoff's psychosis
Olfactory disturbances as the initial or most prominent symptom of multiple sclerosis
Olfactory dysfunction in multiple sclerosis: relation to longitudinal changes in plaque numbers in central olfactory structures
Olfactory function in atypical parkinsonian syndromes
Olfactory function in patients with nasopharyngeal carcinoma following radiotherapy
Olfactory dysfunction in parkinsonism: a general deficit unrelated to neurologic signs, disease stage, or disease duration
The olfactory and cognitive deficits of Parkinson's disease: evidence for independence
Bilateral olfactory dysfunction in early stage treated and untreated idiopathic Parkinson's disease
Olfactory testing as an aid in the diagnosis of Parkinson's disease: development of optimal discrimination criteria
Olfactory function in Parkinson's disease subtypes
Is Parkinson's disease a primary olfactory disorder? [review]
Olfactory dysfunction in type I pseudohypoparathyroidism: dissociation from Gs alpha protein deficiency
Ventral frontal deficits in psychopathy: neuropsychological test findings
Olfactory function in restless legs syndrome
Olfactory identification deficits in schizophrenia: correlation with duration of illness
Olfactory identification and Stroop interference converge in schizophrenia
Olfactory identification and psychosis
Olfactory deficits in schizophrenia
Regional metabolism in microsmic patients with schizophrenia
Olfactory deficits in schizophrenia are not a function of task complexity
Olfactory identification deficit in relation to schizotypy
Monorhinal odor identification and depression scores in patients with seasonal affective disorder
Odor identification ability among patients with Sjögren's syndrome
Touch and taste in the mouth: presence and character of sapid solutions
The fine structure of the olfactory mucosa in man

key: cord-307123-h48uwj93
authors: Kiechle, Frederick L.; Arcenas, Rodney C.; Rogers, Linda C.
title: Establishing benchmarks and metrics for disruptive technologies, inappropriate and obsolete tests in the clinical laboratory
date: 2014-01-01
journal: Clin Chim Acta
doi: 10.1016/j.cca.2013.05.024
sha:
doc_id: 307123
cord_uid: h48uwj93

Benchmarks and metrics related to laboratory test utilization are based on evidence-based medical literature that may suffer from a positive publication bias. Guidelines are only as good as the data reviewed to create them. Disruptive technologies require time for appropriate use to be established before utilization review will be meaningful. Metrics include monitoring the use of obsolete tests and the inappropriate use of laboratory tests. Test utilization by clients in a hospital outreach program can be used to monitor the impact of new clients on laboratory workload.
A multi-disciplinary laboratory utilization committee is the most effective tool for modifying bad habits and for reviewing and approving new tests for the laboratory formulary or for referral to a reference laboratory. Laboratory test overutilization is estimated to represent 2.9% to 56% of all laboratory tests internationally. Efforts have been made to reduce the demand for, or utilization of, these overutilized tests [1-6]. The most efficient outcomes have involved the formation of a laboratory utilization committee [2,6] or a laboratory formulary committee [5] modeled on the organizational structure of the hospital pharmacy and therapeutics committee. This committee evaluates the clinical value of laboratory tests using an evidence-based review of the appropriate medical literature. The same literature is reviewed by numerous professional specialty medical organizations, as well as by healthcare insurance carriers, to determine which tests or procedures should be performed and reimbursed. The conclusions based on these reviews need to be updated on a regular basis. "The quality of guidelines is only as good as the published studies on which they are based" [7]. Often, relevant studies evaluating laboratory tests demonstrate negative findings and are not published [7,8]. This phenomenon is referred to as positive publication bias, or simply publication bias. Tzoulaki et al. [8] demonstrated publication bias during a review of reports evaluating emerging cardiovascular biomarkers. Therefore, misinterpretation is a potential consequence of failing to publish studies with negative results during a review of evidence-based literature; readers beware. Some tests are obsolete and should be retired from clinical use, while others may be used inappropriately for specific disease categories. The playing field is not level.
There are at least six newer game-changing disruptive technologies being evaluated [9-11] that will result in modifications of clinical practice and laboratory testing modalities. These newer disruptive technologies may replace obsolete or inappropriate tests. Laboratory utilization benchmarks and metrics are in continuous flux as a consequence. In the case of evolving newer technology, it is imperative to explore its impact early in development in order to anticipate and monitor its effect on laboratory testing. The three authors have reviewed the current literature related to laboratory test utilization, with an emphasis on where the definitions of obsolete or inappropriate test utilization originate. We evaluated whole genome sequencing, next generation sequencing, and proteomics as examples of high-impact disruptive technologies that generate large quantities of data requiring software to reduce them to clinically useful results. Practical examples of obsolete and inappropriate tests are reviewed as potential metrics to monitor improvement in test utilization. Another useful metric is test utilization by clients in a hospital outreach program, which can be used to monitor the impact of new clients on laboratory workload. Finally, published data from the work of laboratory utilization committees are summarized. Benchmarks and metrics for laboratory utilization will be reviewed for three disruptive technologies as well as for obsolete and inappropriately used tests. Medical practice, as well as pathology, is in the midst of the rapid development of at least six major game-changing disruptive technologies. They include genetics, proteomics, digital pathology, informatics, therapeutic pathology, and in vivo diagnostics [9-11].
All six of these disruptive technologies share similar issues, such as resolution of the best applications for routine clinical use, a paucity of evidence-based outcome literature for review, education of practitioners and physician users of the clinical information generated, and software to convert the big databases the methods generate into clinically useful information [9-11]. The utilization of these techniques will increase as these barriers or obstacles to clinical use are overcome. An example of a disruptive technology is next generation sequencing, or massively parallel sequencing [12-14]. This technique is currently not cleared by the U.S. Food and Drug Administration [13]. It has been used to generate genome-wide sequences, and one of the authors (FLK) has had his genome sequenced at the CLIA-approved laboratory at Illumina (San Diego, CA). The results revealed 3.23 million variants compared with the reference method, and 20,426 of these variants were in the exome, or coding elements. The study interrogated 344 genes causally associated with 140 conditions, as recommended by the American College of Medical Genetics. In that limited number of genes, 1,254 variants were detected and classified as clinically significant (0), carrier status (1), variants of unknown significance (255), likely benign variants (356), and benign variants (642). The definition of these variant calls, and the failure of this technique to detect deletions, insertions, interspersed repeats, and tandem repeats (repeats adjacent to each other, like triplet repeats [15]), may lead to inappropriate interpretation of the results and expensive follow-up clinical and laboratory evaluation.
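The variant-classification breakdown quoted above can be checked with a few lines of arithmetic. The sketch below (Python, written for this article; the category labels are paraphrased from the text) tallies the reported counts and confirms they sum to the 1,254 classified variants:

```python
# Counts of the 1,254 variants reported for the 344 interrogated genes,
# grouped by the classification categories listed in the text.
variant_counts = {
    "clinically significant": 0,
    "carrier status": 1,
    "unknown significance": 255,
    "likely benign": 356,
    "benign": 642,
}

total_classified = sum(variant_counts.values())
print(total_classified)  # 1254, matching the total quoted in the text
```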
For example, a clinically significant pathogenic variant reported in at least 3 unrelated cases with control data may be found in additional genome studies of other populations [16] to be a benign variant that co-occurs with a newly identified variant containing the mutation that actually exerts the most deleterious effect on gene function. Software applications for variant significance assignment, like DataGenno [17], will need to be kept up to date with the latest genotype/phenotype associations to prevent false positive findings and inappropriate follow-up testing. In 2009, the cancers with the highest reported rates in the U.S. were prostate, lung and bronchus, and colon and rectum for men, with female breast replacing prostate for women [18]. The annual incidence rate was 459 cases per 100,000 individuals. Comprehensive sequencing of numerous human cancers has revealed driver genes, 2 to 8 per tumor, which alter intracellular signal transduction pathways related to the cell's future death or survival and/or genome maintenance [19,20]. There are at least 10 FDA-approved cancer therapies based on the inhibition of these tumor-activated intracellular pathways [19]. For example, the BRAF kinase inhibitor vemurafenib has shown a response rate of 50% in patients with metastatic melanoma who have the BRAF valine-to-glutamic-acid mutation at codon 600 (V600E) [21]. This V600E mutation is associated with an aggressive clinical course in patients with thyroid papillary microcarcinoma [22]. In one study, a hybrid score composed of one molecular diagnostic marker (V600E) and 3 histopathologic parameters was used to predict this tumor's clinical course with a sensitivity of 96% and a specificity of 80% [22]. The selection of the correct molecular diagnostic tests for specific tumors is aided by published guidelines.
Examples include immunohistochemical detection of estrogen and progesterone receptors in breast cancer, from the American Society of Clinical Oncology and the College of American Pathologists [23], and selection of lung cancer patients for EGFR and ALK tyrosine kinase inhibitors, from the International Association for the Study of Lung Cancer, the Association for Molecular Pathology, and the College of American Pathologists [24]. Whole genome sequencing of a tumor will provide access to all known and unknown variants related to the tumor's survival skills [25]. The development of software [26] that converts the patient's raw genome sequence into a medically relevant assessment of therapeutic targets and drug metabolism, based on the tumor's body site, will be very useful. From this genome analysis, the clinician wants to know which anticancer drug or drugs the patient will respond to, as well as the dose. MALDI-TOF (matrix-assisted laser desorption ionization-time of flight) mass spectrometry is a relatively new technology in the clinical microbiology laboratory. Pathogen identification has traditionally relied on visual and biochemical interrogation, where the summary of results may point to a specific identification (genus and species), or sometimes only to the genus level. Visual and biochemical testing can yield variable results, meaning that in some cases the identification may change depending on the result. The use of MALDI-TOF allows the clinical microbiology laboratory to identify bacteria once an isolate has been cultured, potentially without performing any biochemical testing [11,27,28]. The implications are quicker pathogen identifications for clinicians and the potential to affect antibiotic treatment before susceptibility results are available. The ability to obtain a quicker answer will disrupt the testing workflow and require a re-evaluation of that workflow to optimize the use of MALDI-TOF and antibiotic susceptibility testing [11,27,28].
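The workflow change that MALDI-TOF introduces can be pictured as a reordering of steps. The sketch below is a simplification written for this article, not a laboratory SOP; it only shows that the identification step moves earlier in the sequence, ahead of susceptibility testing:

```python
# Sketch of the workflow change described above: with MALDI-TOF, a
# preliminary identification is reported before susceptibility results.
traditional = ["culture isolate", "biochemical panels", "identification",
               "susceptibility testing", "report"]
maldi_tof = ["culture isolate", "MALDI-TOF identification",
             "report preliminary ID", "susceptibility testing", "report"]

# Identification is available one step after culture instead of two
print(traditional.index("identification"))          # 2
print(maldi_tof.index("MALDI-TOF identification"))  # 1
```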
Benchmarks and subsequent metrics for monitoring laboratory test utilization have been developed by professional subspecialty medical organizations in the form of recommendations and guidelines [29]. Examples include guidelines for hypothyroidism in adults from the American Association of Clinical Endocrinologists and the American Thyroid Association [30], the definition of myocardial infarction from the American College of Cardiology Foundation and the American Heart Association [31], the definition of diabetes mellitus from the American Diabetes Association [32], pharmacogenetics as well as follow-up testing for metabolic diseases identified by expanded newborn screening using tandem mass spectrometry from the National Academy of Clinical Biochemistry [33,34], and the use of bone metabolic markers from the Japan Osteoporosis Society [35]. Thirty-five of these specialty societies have joined the Choosing Wisely project organized by the American Board of Internal Medicine. Societies are asked to provide five specific, evidence-based recommendations on when tests and procedures may be appropriate or inappropriate for patient care (www.choosingwisely.org). A review of the lists from 26 specialty societies revealed 135 recommendations. Laboratory tests were referenced in 25 items, or 18.5% of the total. Only one organization, the American Society for Clinical Pathology, had a list of 5 laboratory test-related recommendations [36]. Kale et al. [37] estimated the national annual savings if outpatient visits to primary care physicians did not include unnecessary or inappropriate laboratory tests, including the CBC ($32.7 million), urinalysis ($3.3 million), and the basic metabolic panel ($10.1 million). Those three procedures yield an annual cost savings of $46.1 million, compared to the elimination of inappropriate Pap tests at an annual savings of $47.8 million.
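The two headline numbers in this passage are simple arithmetic on the quoted figures, and can be verified directly (Python sketch written for this article; all inputs are taken from the text):

```python
# Annual savings estimates quoted from Kale et al. (millions of USD)
savings_millions = {"CBC": 32.7, "urinalysis": 3.3, "basic metabolic panel": 10.1}
lab_total = round(sum(savings_millions.values()), 1)
print(lab_total)  # 46.1, the combined figure given in the text

# Choosing Wisely: 25 of 135 recommendations referenced laboratory tests
print(round(25 / 135 * 100, 1))  # 18.5, the percentage given in the text
```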
These figures illustrate the magnitude of healthcare savings achievable by implementing simple laboratory test ordering practices that reduce duplicate and/or inappropriate testing. Collaboration by subspecialty medical societies in disruptive technology development and in improvements to routine clinical laboratory test utilization will be a fertile area for the development of benchmarks and metrics for future laboratory test utilization. The appropriateness of laboratory tests and the appropriate utilization of laboratory tests are always important for patient care, but they require increased scrutiny in the era of healthcare cost containment. Objective criteria for judging the appropriateness of tests and their utilization have not been universally developed or applied, so it is not always easy to define these terms [38]. Insurance companies are recognizing the medical and financial burden of unnecessary testing and are taking action. Many companies have posted information on their websites defining obsolete and unreliable laboratory tests, readily accessible on the internet, including Aetna [39], UnitedHealthcare [40], and AmeriHealth [41]. One criterion for judging the appropriateness of a test is to determine whether it is obsolete. The definition of "obsolete" as noted in the Merriam-Webster online dictionary at http://www.merriam-webster.com/dictionary/ is "no longer in use or no longer useful". Synonyms include: antiquated, archaic, dated, démodé, demoded, fossilized, and kaput. Over time, with advances in medical technology, laboratory tests become outdated. Although it is difficult to remove a test from a laboratory's formulary, there are good reasons to do so. These include the availability of a more sensitive, specific, or accurate test, or new guidelines recommending the elimination of a test and its replacement with another. There are tests in clinical pathology that must be considered for obsolescence.
These include T3 uptake and lactate dehydrogenase (LDH) isoenzymes in the clinical chemistry laboratory, and bleeding time in hematology. Obsolete tests in the microbiology laboratory include bacterial antigen detection tests, group B Streptococcus (GBS) antigen testing, and the HIV-1 Western blot. T3 uptake and the free thyroxine index (FTI) are still ordered by physicians, despite the fact that alternative tests have been available for many years. T3 uptake is an old test designed to measure free thyroxine (T4) indirectly. It was developed before the availability of direct assays able to accurately measure free T4 levels [42]. Standardization of free T4 assays has been reported using time-consuming equilibrium dialysis in combination with isotope dilution-liquid chromatography/tandem mass spectrometry [43]. T3 uptake is an assessment of the unsaturated (unbound to thyroxine) thyroid-binding proteins in serum and is used with total T4 to calculate the FTI. The FTI is obtained by multiplying total T4 by T3 uptake. There is no longer a need to estimate free T4 when assays for the direct measurement of free T4 are available in every laboratory. Supporting evidence for the obsolescence of T3 uptake and the FTI has been available for decades [44]. Current guidelines for the diagnosis and management of hypothyroidism [30], hyperthyroidism [45], and thyroid disease in pregnancy [46] no longer include assessment of T3 uptake or the FTI. The analysis of lactate dehydrogenase (LDH) isoenzymes by electrophoresis has been utilized as an aid in the diagnosis of myocardial infarction. With the development of a more specific marker of myocardial damage, troponin, there is little use for this insensitive and time-consuming electrophoretic assay. Current guidelines clearly establish that the preferred marker for cardiac injury is troponin [31]. Bleeding time is a crude test of hemostasis (the arrest or stoppage of bleeding).
It is an indication of how well platelets interact with blood vessel walls to form blood clots, and it indirectly assesses platelet function. It is performed by making a small incision in the skin and measuring, in seconds, the time taken for bleeding to stop. The test was designed to assess platelet function or to exclude von Willebrand disease. It is labor intensive, invasive, poorly reproducible, and insensitive [36]. Historically, it was performed because screening tests with a higher sensitivity for platelet dysfunction and von Willebrand disease (vWD) were unavailable. Bleeding time has been replaced by instrumentation that can assess platelet function in whole blood by aggregation studies [47,48]. Available instrumentation includes the PFA-100 (Platelet Function Analyzer, Siemens USA), the VerifyNow (Accumetrics), the Plateletworks (Helena), the Impact (DiaMed), and the thromboelastograph (TEG) (Haemonetics). Initial tests for a bleeding disorder rule out more common causes of bleeding. These tests include complete blood and platelet counts, PTT, PT, and possibly a fibrinogen level or thrombin time. Additional tests for von Willebrand disease (vWF antigen, ristocetin cofactor activity, factor VIII clotting activity) can confirm the disease [48]. Bacterial antigen detection tests should also be considered obsolete. They have historically been used as an adjunct to other laboratory tests for the diagnosis of bacterial meningitis. The tests' purported advantage was the rapid detection of H. influenzae, N. meningitidis, S. pneumoniae, and S. agalactiae. Overall, their sensitivity is essentially the same as that of a Gram-stained smear of a cytocentrifuged CSF specimen [49,50]. With the advent of vaccines against H. influenzae type b and N. meningitidis (A, C, Y, and W-135), antigen testing is even less useful. The literature confirms that direct antigen testing of CSF is neither sensitive nor specific [49,50].
More importantly, the Gram stain and cultures still need to be performed regardless of the initial antigen test result. Based on the data reviewed, our laboratory has discontinued this testing in-house. GBS antigen testing is an example of a test that was removed from the market based on recommendations from the Centers for Disease Control and Prevention (CDC), which stated that rapid antigen detection tests for GBS are not sensitive enough to replace culture-based prenatal screening or to use in place of the risk-based approach when culture results are unknown at the time of labor [51]. Because of the poor performance of rapid antigen testing, the CDC has recommended that intrapartum chemoprophylactic antibiotics be administered to women who have certain risk factors [51]. Our laboratory examined internal data for our patient population and found that the sensitivity of the rapid antigen test in use was 28%. Forty-one patients missed by the rapid antigen test were detected only by culture. Data from the literature show an average sensitivity of 25.7% among the various laboratories surveyed [52]. This is an example in which CDC recommendations and assessment of testing performance within one's own laboratory support moving a test into obsolescence. The HIV-1 Western blot (WB) has long been one of the confirmatory tests for HIV antibody testing. However, the WB is moving toward obsolescence as newer-generation antigen/antibody and molecular assays become part of the new HIV testing algorithms. HIV-1 WBs have always had issues with indeterminate results due to a myriad of factors (false positives and/or lack of specificity, kit design, etc.), requiring either re-testing at a later time and/or molecular testing for HIV-1 nucleic acid [53]. The CDC/APHL, WHO, and France each have different interpretation criteria for defining a positive confirmatory result, which indicates a lack of standardization in defining a patient as positive for HIV-1 [54].
The immune response to HIV-2 is well known to produce antibodies that cross-react on the HIV-1 WB, which could lead to a false positive HIV-1 result [54]. The lack of improvement or advancement in the WB compared to other antibody-based assays has been nicely shown by Masciotra et al. [55], who demonstrated that rapid HIV tests actually detect HIV-1 antibodies several days before the WB becomes positive [55]. The introduction of 4th-generation antigen/antibody testing has allowed clinicians to detect reactive patients earlier in their disease course compared to 3rd-generation testing, effectively narrowing the window of serological detection by approximately 4-8 days [56]. Because of these new developments, CLSI and APHL have recommended new algorithms that incorporate antigen/antibody combination tests, rapid tests, and molecular testing [57,58]. The WB should become obsolete as laboratories adopt the newer algorithms for HIV testing. In addition to obsolete tests, tests may be inappropriately used. Inappropriate utilization includes the failure to follow current practice guidelines for the diagnosis and management of disease (thus ordering the incorrect test, panel of tests, or algorithm of tests), ordering tests too frequently, or the lack of a medical rationale for the test. The problem and its resolution need to be reframed as a way to "improve patient outcomes and lower costs" [59]. The laboratory test utilization problem has been acknowledged and published for more than two decades [1-6,60], and studies have been done to estimate and document the percentage of unnecessary tests [37,61]. A typical scenario for inappropriate test ordering occurs with thyroid function testing in the diagnosis and management of hypothyroidism. The guidelines are clear on the appropriate tests [30]: thyroid stimulating hormone (TSH) is regarded as the best screening test, followed by free thyroxine (free T4) if the TSH is abnormal.
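The guideline-based thyroid ordering pattern just described is a simple reflex rule: screen with TSH, and add free T4 only when TSH is abnormal. A minimal sketch follows; the TSH reference interval used here is a hypothetical adult range chosen for illustration, not a value taken from the cited guideline:

```python
def thyroid_screen_orders(tsh, reference_range=(0.4, 4.0)):
    """Reflex-ordering sketch: TSH first, free T4 only if TSH is abnormal.
    The reference range is a hypothetical adult interval (mIU/L)."""
    low, high = reference_range
    orders = ["TSH"]
    if not (low <= tsh <= high):
        orders.append("free T4")  # reflex test; total T4 adds nothing here
    return orders

print(thyroid_screen_orders(2.1))  # normal TSH: ['TSH']
print(thyroid_screen_orders(9.8))  # abnormal TSH: ['TSH', 'free T4']
```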
additional tests are often ordered, including total t4. there is no need for a total t4 measurement if a free t4 is provided. furthermore, adding a total t4 level may confuse the diagnosis if changes in binding proteins via disease or drug therapy result in a total t4 that is inconsistent with other test results. tests may be ordered inappropriately, or at other times the wrong test is ordered. vitamin d testing is known to result in both inappropriate and erroneous ordering [62]. the correct test for the routine assessment of vitamin d status or deficiency is 25-hydroxy vitamin d. the test 1,25-dihydroxy vitamin d is often mistakenly ordered. the dihydroxy form of vitamin d is occasionally ordered in patients with kidney disease (decreased levels are one of the earliest changes to occur in persons with early kidney failure). however, most of the orders for 1,25-dihydroxy vitamin d are simply erroneous. tests for both vitamin d2 and d3 are unnecessary for the assessment of vitamin d status. there is no need to differentiate between the d2 and d3 forms other than in the research setting. viral cultures have traditionally been the gold standard for virological detection in the clinical microbiology laboratory. direct viral detection from patient samples has also been utilized, with monoclonal antibodies used to detect viral antigen(s) with the intent of obtaining a result in a timelier manner than viral cultures. molecular technologies have now become established in the clinical microbiology laboratory. with their increased sensitivity and specificity [63] and almost always a shorter turn-around time, it is reasonable to ask "why do viral cultures?" [63, 64]. one example of molecular testing that helps to answer this question is the panels developed for respiratory viruses. there are several commercial companies that offer their version of an rvp (respiratory virus panel). our laboratory offers an rvp assay that detects 20 viral targets.
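the vitamin d ordering rules above reduce to a simple order-entry check; the sketch below (python, with names of our own choosing) captures them. whether a renal indication is flagged at order entry is an assumption for illustration.

```python
def review_vitamin_d_order(test_name, suspected_renal_disease=False):
    """Apply the ordering rules described in the text: 25-hydroxy vitamin D
    is the routine status test; 1,25-dihydroxy is reserved for renal indications."""
    if test_name == "25-hydroxy vitamin D":
        return "accept"
    if test_name == "1,25-dihydroxy vitamin D":
        # appropriate in kidney disease, otherwise usually an erroneous order
        return "accept" if suspected_renal_disease else "query ordering clinician"
    return "not a routine vitamin D status test"
```

a routine status request for 25-hydroxy vitamin d passes, while a 1,25-dihydroxy order without a renal indication is routed back to the clinician.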
not only are results obtained sooner than with traditional virology testing, but our molecular rvp offers greater sensitivity and specificity than our prior direct fluorescence assay (dfa) that was sent out to a reference laboratory. because of the increased sensitivity, the rvps have allowed us to document co-viral infections, which had not been fully appreciated before with dfa or culture-based testing [64]. there have been reports of increased severity of disease, as well as reports of decreased severity of disease, in the setting of co-viral infections [65]. much more research must be done to understand the potential interactions of different respiratory viruses in the setting of a respiratory infection. our laboratory publishes a "virogram" during the respiratory virus season, along with related viral coinfections (figs. 1 and 2 and table 1). the "virogram" charts the respiratory virus prevalence among the patient populations tested to give an idea of what is circulating in the community. another example of molecular testing replacing viral cultures is the detection of enterovirus (ev) from csf. ev pcr has been shown to have greater sensitivity than culture [66]. a study (unpublished) performed within our institution looked at the utility of an in-house ev pcr and its effect on length of stay (los) and cost in 20 ev pcr-negative and 20 ev pcr-positive patients. those with an ev pcr-negative result had an average los 2.1 days greater than those who were ev pcr positive, with an estimated $187,992 of additional cost related to in-patient care. viral cultures still have importance in growing the virus for the purposes of subtyping, identifying new strains, or antiviral testing. however, these should remain in specialty or research labs and not be routine for clinical diagnostic testing. pcr testing for the causative agent of lyme disease may initially make sense, but the life cycle of b.
burgdorferi is such that pcr detects borrelia dna in the blood in less than half of patients who are in the early acute stage of disease, when the characteristic erythema migrans is present [67]. therefore, pcr is not recommended as a first-line test for making the initial diagnosis of lyme disease. a review article showed that the median percent sensitivities of pcr testing from blood, skin biopsy, csf, and synovial fluid are 10-48%, 64-76%, 23-73%, and 66-83%, respectively [67]. the current recommendation for diagnostic testing involves a two-tiered algorithmic approach involving antibody testing [68]. there may be clinical utility for pcr, but only if the serology testing is negative or inconclusive and the clinical history and symptoms strongly suggest lyme disease. hospitals have introduced outreach programs that market laboratory services to physician offices, nursing homes, and other hospitals [69, 70] to increase test volumes and reduce unit costs per test. this effort generates increased laboratory test utilization, which should be monitored by the average number of specific tests ordered per subspecialty physician per month ([69], table 40.10). the data are collected from patient requisitions that are processed each day in the accessioning area of the outreach specimen receiving area. the sales force will introduce outreach administration to the projected number of new client accounts to be opened each month. the impact on the laboratory can be estimated by multiplying the volume of a specific test per physician for an office practice of multiple physicians. these volumes will estimate the utilization of tests per laboratory section and predict the future need for additional analyzers and/or personnel to perform the laboratory test procedures as the outreach client numbers increase. utilization data vary by medical subspecialty.
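the volume estimate just described multiplies requisitions per physician by procedures per requisition, then scales by practice size; a minimal sketch (python, function name ours). using the urology figures from the cited table ([69]), the per-physician product comes out at about 132 procedures/month.

```python
def projected_monthly_procedures(reqs_per_physician, procs_per_req, n_physicians=1):
    """Estimate the monthly procedure volume a client practice adds to the lab:
    requisitions/physician x procedures/requisition x number of physicians."""
    return reqs_per_physician * procs_per_req * n_physicians

# Urology figures from the cited outreach table: 94 requisitions/physician
# and 1.4 procedures/requisition give roughly 132 procedures per physician.
per_urologist = projected_monthly_procedures(94, 1.4)
```

the same function scales an office of, say, five physicians to five times the per-physician load, which is the kind of number used to plan analyzer and staffing capacity.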
for example, urology had 94 requisitions/physician, 1.4 procedures/requisition and 132 procedures/physician, with psa being the highest test volume. compare that to the nursing homes, with 189 requisitions/physician, 3.6 procedures/requisition and 688 procedures/physician, with electrolytes being the highest test volume [69]. these data are used primarily for planning the best strategies to absorb the increased workload new clients will bring to the outreach business. they can also be used to monitor the use of obsolete and inappropriate lab tests and to develop educational efforts to improve test utilization practices by subspecialty among outreach clients. an excellent review of laboratory test utilization, understanding the many factors involved, and steps to implement changes is provided in a recent publication [4]. the author provides insight and guidelines to assist the laboratory in initiating improvements in laboratory utilization. ultimate goals include: developing and adopting more-effective testing algorithms, reducing testing costs, using new technologies cost-effectively, and shortening the time to diagnosis. interventions for hospitals and laboratories focus on changing physicians' test ordering behavior and include:
1. eliminating obsolete tests and modifying requisition forms. the laboratory can alter test-requisition forms to steer clinicians in the right direction. one such option is an "out of sight, out of mind" approach in which certain tests simply don't appear on the menu.
2. assisting in education to promote appropriate lab testing, which can be part of a hospital's continuing medical education (cme) program for clinicians through grand rounds, newsletters, and cme lectures.
3. reinforcing positive changes by auditing clinicians' use of new protocols and offering feedback.
4. a two-tier review process for molecular send-out tests, a useful tool for pathologists to learn from and advise on the wisdom of molecular assays [71].
finally, all hospitals should implement a laboratory utilization or formulary committee to help oversee testing and promote good testing practices, similar to pharmacy and therapeutics committees. this approach has been met with great success [2, 5, 6]. the use of molecular methods for respiratory virus testing has allowed us to detect patients with more than one virus present. the presence of multiple viruses within a patient sample is currently underappreciated, and the effect on the overall disease presentation is currently unknown.
managing utilization of new diagnostic tests
clinical laboratory tests not performed in a central hospital laboratory
managing the demand for laboratory testing: options and opportunities
demand management and test request rationalization
laboratory test utilization program. structure and impact in a large academic medical center
changing physician behavior in ordering diagnostic tests
biomarkers in cardiovascular medicine. the shame of publication bias
bias in association of emerging biomarkers with cardiovascular disease
pathology resident and fellow education in a time of disruptive technologies
a national agenda for the future of pathology in personalized medicine
current status of matrix-assisted laser desorption ionisation-time-of-flight mass spectrometry in the clinical microbiology laboratory
national human genome research institute, charting a course for genomic medicine from base pairs to bedside
assuring the quality of next-generation sequencing in clinical laboratory practice
developing genome and exome sequencing for candidate gene identification in inherited disorders. an integrated technical and bioinformatics approach
mechanisms of trinucleotide repeat instability during human development
the 1000 genomes project consortium
datagenno: building a new tool to bridge molecular and clinical genetics
the role of high-throughput technologies in clinical cancer genomics
cancer genome landscapes
improved survival with vemurafenib in melanoma with braf v600e mutation
a combined molecular-pathologic score improves risk stratification of thyroid papillary microcarcinoma
american society of clinical oncology/college of american pathologists guideline recommendations for immunohistochemical testing of estrogen and progesterone receptors in breast cancer (unabridged version)
molecular testing guideline for selection of lung cancer patients for egfr and alk tyrosine kinase inhibitors
personalized medicine in a phase 1 clinical trials program: the md anderson cancer center initiative
snver gui: a desktop tool for variant analysis of next-generation sequencing data
mass spectrometry and tandem mass spectrometry characterization of protein patterns, protein markers and whole proteomes for pathogenic bacteria
maldi-tof-ms-based species identification and typing approaches in medical mycology
implementing clinical practice guidelines
american association of clinical endocrinologists and american thyroid association taskforce on hypothyroidism in adults, clinical practice guidelines for hypothyroidism in adults
accf 2012 expert consensus document on practical considerations in the interpretation of troponin elevations. a report of the american college of cardiology foundation task force on clinical expert consensus documents
diagnosis and classification of diabetes mellitus
laboratory analysis and application of pharmacogenetics to clinical practice
follow up testing for metabolic diseases identified by expanded newborn screening using tandem mass spectrometry
guidelines for the use of bone metabolic markers in the diagnosis and treatment of osteoporosis
when less is more for patients in laboratory testing
"top 5" lists top $5 billion
do we know what inappropriate laboratory utilization is? a systematic review of laboratory clinical audit
aetna clinical policy bulletin: obsolete and unreliable tests and procedures
united healthcare's policy number 300.1. obsolete or unreliable tests
amerihealth's medical policy bulletin #00.01.24d. obsolete or unreliable diagnostic tests and medical services
comparison of three free t4 (ft4) and free t3 (ft3) immunoassays in healthy subjects and patients with thyroid diseases and severe non-thyroid illnesses
use of frozen sera for ft4 standardization: investigation by equilibrium dialysis combined with isotope dilution-mass spectrometry and immunoassay
a reappraisal of the free thyroxine index
hyperthyroidism and other causes of thyrotoxicosis: management guidelines of the american thyroid association
guidelines of the american thyroid association for the diagnosis and management of thyroid disease during pregnancy and postpartum
standardization of platelet function testing by aggregometry through new clsi guideline
diagnosis and management of von willebrand disease: guidelines for primary care
comparison of bacterial antigen test and gram stain for detecting classic meningitis bacteria in cerebrospinal fluid
rapid bacterial antigen detection is not clinically useful
prevention of perinatal group b streptococcal disease
infections in international pregnancy study: performance of the optical immunoassay test for detection of group b streptococcus
frequency, causes, and new challenges of indeterminate results in western blot confirmatory testing for antibodies to human immunodeficiency virus
comparison of alternative interpretive criteria for the hiv-1 and hiv-2 infections
evaluation of an alternative hiv diagnostic algorithm using specimens from seroconversion panels and persons with established hiv infections
evaluation of the performance of the abbott architect hiv ag/ab combo assay
criteria for laboratory testing and diagnosis of human immunodeficiency virus infection; approved guideline, clsi document m53-a
silver spring, maryland: the association of public health laboratories
order patrol - how to rein in test requests
professional review of laboratory utilization
ready, fire! aim! an enquiry into laboratory test ordering
vitamin d testing - what's the right answer?
role of cell culture for virus detection in the age of technology
detection of respiratory viruses by molecular methods
an analytical comparison of four commercial respiratory virus panels. clinical virology symposium poster
diagnosis of enteroviral meningitis by use of polymerase chain reaction of cerebrospinal fluid, stool and serum specimens
diagnosis of lyme borreliosis
notice to readers: recommendations for test performance and interpretation from the second national conference on serological diagnosis of lyme disease
outreach implementation requirements: a case study
hospital laboratory outreach: benefits and planning
clinical requests for molecular tests. the 3-step evidence check
clinical labs of hillview, 3375 hillview avenue; university of alberta hospital, 481.24 walter mackenzie centre, 8440 112th street, edmonton ab t6g 2b7, canada
key: cord-310714-kqzlwka0 authors: braz, lucia maria almeida; tahmasebi, roozbeh; hefford, philip michael; lindoso, josé angelo lauletta title: visceral leishmaniasis diagnosis: a rapid test is a must at the hospital bedside date: 2020-06-16 journal: clinics (sao paulo) doi: 10.6061/clinics/2020/e2036 sha: doc_id: 310714 cord_uid: kqzlwka0
at the time of the widespread availability of rapid diagnostic tests for sars-cov-2 (the causative virus of the covid-19 pandemic) from drugstores throughout brazil, there is a distinct lack of use of rapid diagnostic tests for visceral leishmaniasis (vl) at the bedsides of hospitalized patients. these tests are mainly distributed by the ministério da saúde do brasil (ms, the ministry of health) only to the laboratórios centrais de saúde pública (lacens, the central public health laboratories) and are predominantly provided to public hospitals. vl is the most severe form of leishmaniasis, and it causes high morbidity and mortality in developing countries (1, 2). leishmania infantum chagasi is responsible for vl in the new world, with typical clinical signs and symptoms of splenomegaly, hepatomegaly, and fever (3). vl can be life-threatening, and because 90% of vl cases occur in brazil, reliable and rapid diagnosis of vl is required (4). as stated by the ms, vl case confirmation is based on clinical suspicion and positive laboratory diagnosis via either parasitological tests (pts), which are dependent on invasive procedures such as bone marrow aspiration or biopsy, or serological tests such as indirect immunofluorescence (ifi) or immunochromatographic tests (its) using rk39 recombinant antigens (5). the serological tests ifi and it-rk39 have the advantage of being minimally invasive, and they can be performed in large numbers (6).
however, ifi requires a fluorescence microscope and is time-consuming. the procedure of it-rk39 takes only 10-15 minutes and requires only 10-20 μl of peripheral blood. it is a rapid and low-cost bedside test. the rk39 dipstick used for its is the product of a gene cloned from the leishmania genus containing a 39-amino acid repeat conserved among viscerotropic leishmania species (7). the main brands of it-rk39 that were previously provided by the brazilian public health system consisted of kalazar detect™ (inbios international, seattle, wa, usa), it leish™ (bio-rad laboratories inc., france), and the onsite™ leishmania igg/igm combo test (ctk biotech, usa), which have now been replaced with the lsh ab eco test (eco diagnóstica, nova lima, mg, brasil). kalazar detect™ was the first rapid test for vl diagnosis adopted by the brazilian public health system, in 2009. it has a sensitivity and specificity of 88.1% and 90.6%, respectively. in 2015, it leish™ replaced kalazar detect™ and showed an improved sensitivity and specificity of 93% and 97%, respectively (8). however, these it-rk39 tests would usually present a lower accuracy when tested in patients coinfected with hiv (9,10). in 2017, the onsite™ leishmania igg/igm combo test replaced it leish™ (8). today, the ms recommends using a new brand, the lsh ab eco test, a qualitative immunoassay for the detection of antibodies (rk39) against vl in human serum (11). the specificity of this test is equal to 100% (95% ci 0.93-1), indicating that it has high specificity for the rk39 protein. the sensitivity presented by the lsh ab eco is 92% (95% ci 0.82-0.97) (11). the lsh ab eco test was declared by the agência nacional de vigilância sanitária (anvisa, the national health surveillance agency) as a criterion for the laboratory confirmation of suspected cases of the disease.
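sensitivity and specificity alone do not tell a clinician how likely a given result is to be correct; predictive values also depend on pretest prevalence. the sketch below (python) applies bayes' rule to the lsh ab eco point estimates quoted above; the 10% prevalence is purely a hypothetical working value, and the function names are ours.

```python
def ppv(sens, spec, prev):
    """Post-test probability of disease given a positive result (Bayes' rule)."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)

def npv(sens, spec, prev):
    """Post-test probability of no disease given a negative result."""
    tn = spec * (1 - prev)
    fn = (1 - sens) * prev
    return tn / (tn + fn)

# lsh ab eco point estimates from the text; 10% prevalence is hypothetical.
sens, spec, prev = 0.92, 1.00, 0.10
```

with a specificity point estimate of exactly 100%, the ppv is 1.0 by construction, and the npv here is about 0.991; using the lower bound of the specificity ci (0.93) instead would pull the ppv down substantially at this prevalence.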
therefore, patients with suspected disease, including those presenting with clinical signs compatible with the disease and those coming from a region with known occurrence of transmission, alongside a positive rapid test, can be considered confirmed cases of vl based on clinical laboratory criteria. the lsh ab eco test has technical specifications and an execution methodology similar to those of the brands used before. according to the manufacturer, lsh ab eco, a lateral flow chromatographic immunoassay used to detect class g immunoglobulin against leishmania donovani, uses recombinant antigens in the test line and chicken anti-protein a in the control line. it is easy to use and interpret. in accordance with the manufacturer's instructions and technical orientation from sdp/iom/funed nº 001/2019 (12), the procedure of the test is as follows: add 20 μl of serum/plasma or 1 drop of blood (10 μl) to the test strip pad, below the arrows. if serum, plasma, or blood is applied to the test strip horizontally on a flat surface, take the strip by the green label and place it vertically, with the arrow pointing downwards, in a test tube or microwell containing 2-3 drops (150 μl) of the diluent buffer. if serum, plasma, or blood is applied to the test strip vertically, add 2-3 drops (150 μl) of the diluent buffer to the base of the microwell or test tube, and read the test result after 10 minutes. doi: 10.6061/clinics/2020/e2036. copyright 2020 clinics. this is an open access article distributed under the terms of the creative commons license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium or format, provided the original work is properly cited. no potential conflict of interest was reported. received for publication on may 27, 2020; accepted for publication on may 29, 2020.
it is important to highlight that this it for rk39 is produced by a brazilian biotechnology company located in the state of minas gerais, brazil, whereas the previously used brands were produced by companies situated outside of brazil. this is an important achievement for the brazilian health system with regard to vl diagnosis. in summary, the test is suitable for use at the bedside, requires a minimal amount (10 μl) of peripheral blood with no need for special equipment, and is simple to perform and read, with the results being available in 10 minutes. however, this simple dipstick test for rk39, distributed by the ministério da saúde do brasil to public laboratories, is not available in some public hospitals, including the hospital das clinicas da faculdade de medicina da universidade de são paulo (hcfmusp). the rapid test-rk39 is not even offered by the majority of private laboratories and private hospitals for vl diagnosis. even when it is provided, the turnaround time between sending the sample and receiving results is typically a minimum of 24 hours; thus, it can hardly be deemed a 'rapid test' with any conviction. it is time to change the narrative and alter the distributive flowchart of this test. it is necessary to use the rk39 it at the bedside of suspected vl patients across hospitals to the greatest effect. why not employ the technical skills of a team who usually attend to patients' needs, such as nurses and nursing technicians, thereby ensuring that the rk39 it truly becomes a rapid diagnostic bedside test? vl can be lethal, and patients simply cannot afford to wait for diagnoses/treatments. braz lma and lindoso jal designed the study and drafted and reviewed the manuscript. tahmasebi r and hefford pm reviewed the manuscript, also for english language. all of the authors have read and approved the content of the manuscript.
visceral leishmaniasis in brazil: revisiting paradigms of epidemiology and control
diagnosis of leishmaniasis
unusual clinical manifestations of leishmania (l.) infantum chagasi in an hiv-coinfected patient and the relevance of its1-pcr-rflp: a case report
a pcr and rflp-based molecular diagnostic algorithm for visceral leishmaniasis
guia de vigilância em saúde. leishmaniose visceral
diagnosing visceral leishmaniasis with the recombinant k39 strip test: experience from the sudan
diagnostic and prognostic value of k39 recombinant antigen in indian leishmaniasis
evaluation of a new brand of immunochromatographic test for visceral leishmaniasis in brazil made available from 2018
field evaluation of rk39 tests and direct agglutination test for diagnosis of visceral leishmaniasis in a population with high prevalence of human immunodeficiency virus in ethiopia
comparison of parasitological, serological, and molecular tests for visceral leishmaniasis in hiv-infected patients: a cross-sectional delayed-type study
controle de qualidade de kits para diagnóstico da leishmaniose visceral humana. avaliação do teste rápido imunocromatográfico lsh ab eco da eco diagnóstica para o imunodiagnóstico da leishmaniose visceral humana (lvh). relatório técnico. belo horizonte
key: cord-225183-6rusimb5 authors: boukai, ben; wang, jiayue title: bayesian modeling of covid-19 positivity rate - the indiana experience date: 2020-07-09 journal: nan doi: nan sha: doc_id: 225183 cord_uid: 6rusimb5
in this short technical report we model, within the bayesian framework, the rate of positive tests reported by the state of indiana, accounting also for the substantial variability (and overdispersion) in the daily count of the tests performed. the approach we take results in a simple procedure for prediction, a posteriori, of this rate of 'positivity' and allows for an easy and straightforward adaptation by any agency tracking daily results of covid-19 tests.
the numerical results provided herein were obtained via an updatable r markdown document. the indiana state department of health (isdh), like any other similar entity across the nation and worldwide, is nowadays closely monitoring the pandemic of the 2019 novel coronavirus, aka covid-19. according to the world health organization (who), this highly contagious respiratory virus was first identified and reported in the city of wuhan in china in early january 2020. since then, this virus continues to spread and infect people around the world, including the united states. on march 11, 2020, the who published an assessment that covid-19 can be characterized as a pandemic. as of the date of this report (july 7, 2020), the johns hopkins university's coronavirus resource center reported over 11,500,000 infected people and 538,000 deaths due to the coronavirus globally, of which over 2,938,000 infections and 130,000 deaths are in the usa alone. the state of indiana was not spared the contagious impact of the virus. on march 6, isdh confirmed the first case of covid-19 in a hoosier with recent travel. on march 16, the isdh reported the first death in indiana due to covid-19. it subsequently worked with federal and local partners, including the centers for disease control and prevention (cdc), to respond to this pandemic and the grave public health situation. among other responses, the isdh created a dashboard and a data repository for tracking and reporting the daily number of covid-19 deaths in the state, as well as the daily number of tests performed and the daily number of positive cases. while there is substantial (and urgent) effort being made worldwide on modeling the death rate (or the infection fatality rate, ifr) of covid-19 (see for example basu (2020)), there has been very little attention given to modeling the rate of reported positive tests (often referred to as the rate of 'positivity').
in this report we model, within the bayesian framework, the rate of positive tests reported by the state of indiana, accounting also for the substantial variability of the daily number of tests performed. the bayesian approach has been used successfully in other covid-19 related studies, e.g., dana et al. (2020) or bayes et al. (2020). however, to the best of our knowledge, none of the available studies (to date) have direct relevance to the modeling of the rate of positivity. the approach we take provides a method for a valid prediction of this rate and allows for an easy and straightforward adaptation by any agency tracking daily results of covid-19 tests. the indiana covid-19 data are available for retrieval through the isdh data hub [10] and are reported for the state in the file covid report date.xlsx. it includes the daily records of the number of covid-19 death cases, the total (daily) number of tests performed, and the count of the positive cases (see appendix b for additional information). for the purpose of this report, we focus attention only on two of the quantities reported: the daily reported number of tests performed, covid test, and the daily reported number of positive tests, covid count, as reported for the (successive) 127 daily records to date. their ratio, the daily percentage of positive tests (ppt), is used for tracking and monitoring the pandemic's progression by the state and localities. the ppt is also an important indicator of the scope and extent of the state's testing enterprise; a high value suggests that testing is conducted primarily on the sickest patients and less so on the mild or asymptomatic cases. a lower ppt may suggest that testing has been extended to cover patients with milder or no symptoms at all.
according to the cdc guidelines [7], among other stipulations, a ppt ≤ 15% serves as a threshold for entering phase ii of the reopening plans for the state, whereas achieving a ppt ≤ 10% serves as a threshold for entering phase iii of the reopening plans. at present, the 7-day average of this rate of positive tests for indiana is 7.61%, and cumulatively, the indiana ppt stands at 9.18% (i.e., the total number of positive tests relative to the total number of tests performed to date; see also the isdh dashboard), which enabled the state to enter phase iii of the reopening plans. thus, the importance of appropriately modeling and tracking this percentage of reported positive tests cannot be overstated, as it has tremendous economic implications for the state and its people. in particular, constructing a valid predictive model, one which predicts reasonably well the 'next-day' ppt, would provide the state with an early sign of a looming surge in positive cases and would allow it to implement corrective measures and policies. since the available records for the latest few days are still updating, we included records only up to the last 3 days in the series. thus, the data series includes m = 109 data points out of the available 127 records. also marked in the figure are two horizontal lines indicating the 10% and 5% thresholds. in appendix a below, we provide basic descriptive statistics concerning the daily reported number of tests and the daily reported number of positive tests, as well as their ratio, the ppt. in figure 2 above, we present the cumulative ppt rate for indiana over these days. we also super-imposed on the daily chart the 0.95 (or 95%) posterior predictive interval of [0.07981, 0.1001] for the current indiana (cumulative) ppt rate, as resulted from the bayesian predictive modeling we developed and present in this report (see illustrations 1 and 2 below).
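the daily, cumulative, and 7-day summaries used above are simple ratios of counts; a minimal sketch (python) with toy counts follows (the isdh file itself is not parsed here). whether the reported 7-day figure averages the daily rates or pools the last 7 days' counts is not stated in the text; this sketch pools counts.

```python
def daily_ppt(positives, tests):
    """Per-day percentage of positive tests, as fractions."""
    return [p / t for p, t in zip(positives, tests)]

def cumulative_ppt(positives, tests):
    """Total positives over total tests to date."""
    return sum(positives) / sum(tests)

def pooled_7day_ppt(positives, tests):
    """Positivity over the most recent 7 days, pooling the raw counts."""
    return sum(positives[-7:]) / sum(tests[-7:])

# toy daily counts, for illustration only
pos = [30, 45, 50, 40, 35, 25, 20, 15]
tst = [400, 500, 550, 500, 450, 400, 400, 300]
```

for instance, `cumulative_ppt([10, 20], [100, 100])` is 0.15, while the average of the two daily rates would be 0.15 as well only because the test counts are equal; with unequal daily volumes the pooled and averaged figures diverge.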
as can be seen, the calculated posterior prediction interval contains the cdc's 10% threshold, which could serve as an early warning flag to the state and may prompt it to initiate some mitigating measures (such as mandating masks in public) in an effort to maintain the rate of positivity below it. having retrieved the data as described above, we labeled them as y i ≡covid count and k i ≡covid tests for i = 1, . . . , m where m = 109 is the total number of observations included in the analysis. thus, we denote the given data as ordinarily, when modeling the count of positive test results y i recorded out of the k i tests performed (assuming independence, risk homogeneity of the tested population and no lagging information), the binomial model would be appropriate. so that conditional on the number of tests performed, and a given p ∈ (0, 1) accordingly, given k 1 , k 2 , . . . , k m , the daily counts of the positive tests, y 1 , y 2 , . . . , y m are conditionally independent binomial random variables with p = pr(t est = +). in a similar fashion, we model the reported number of tests performed k 1 , k 2 , . . . , k m as (independent) negative binomial random variables. that is, for given (fixed) integer r > 0, and θ ∈ (0, 1), as we can see from appendix a, the observed marginal distribution of the reported daily number of test performed is exhibiting over-dispersion features that are characteristic to mixed-poisson or to negative binomial counts. thus, combining (1)(2), we obtain that given (p, θ), the joint probability (mass) function of the m pairs, (y i , k i ), i = 1, . . . , m, (i.e. the likelihood function), is where, ∝ indicates proportionality of terms, up to a constant, and where n m = m j=1 k j and x m = m j=1 y j are the cumulative reported number of tests performed and the cumulative reported positive tests. we note in passing that the cumulative ppt rate mentioned in the introduction is merely the ratio,p m := x m /n m . 
the standard bayesian predictive model in the case of binomial-negative binomial counts (as in (1)-(2)) is that with the conjugate beta-beta joint prior distribution on (p, θ) (see for example gelman et al. (2014)). that is, in the case of the likelihood function f(d_m | p, θ) given in (3) above, we consider the bayesian model that assumes

p ∼ beta(a, b),    (4)

for some a > 0 and b > 0, and, independently of p, for a fixed integer r > 0 and some c > 0 and d > 0,

θ ∼ beta(c, d).    (5)

thus, the joint prior pdf of (p, θ) is, accordingly,

π(p, θ) ∝ p^{a−1} (1 − p)^{b−1} × θ^{c−1} (1 − θ)^{d−1},    (6)

and since the joint posterior pdf for (p, θ) given the data d_m is

π(p, θ | d_m) ∝ f(d_m | p, θ) × π(p, θ),

we immediately obtain from (3)-(6) (due to the conjugacy) that given the data, x_m and n_m, the (marginal) posterior distribution of p, denoted as π(p | x_m, n_m), is also a beta distribution, and the (marginal) posterior distribution of θ, denoted as π(θ | n_m), is also a beta distribution. specifically, it follows that given x_m and n_m,

p | x_m, n_m ∼ beta(a_m, b_m) and θ | n_m ∼ beta(c_m, d_m),

where these four updated parameters are given by:

a_m = a + x_m, b_m = b + n_m − x_m, c_m = c + mr, d_m = d + n_m.    (7)

hence, the bayes estimates of p and θ given the data (x_m, n_m) are the respective posterior means:

p̂_m = a_m / (a_m + b_m) and θ̂_m = c_m / (c_m + d_m).

remark: the choice in (2) of the negative binomial distribution to model the reported daily number of tests, k_i, could be seen as specific to the indiana covid-19 testing data, which might reflect testing capacity limitations and daily variability unique to that state. other models that account for the observed overdispersion characteristics of the data (see appendix a) could be used instead. for instance, one may alternatively consider the related mixed-poisson distribution in (2). having observed (x_m, n_m), the posterior predictive distribution, under the bayesian model (3)-(6) and given (x_m, n_m), of a 'new' (or 'future') observation on the number of positive cases, y*, out of a given k* = k* new tests is the beta-binomial distribution

pr(y* = y | x_m, n_m, k*) = C(k*, y) · B(y + a_m, k* − y + b_m) / B(a_m, b_m),    (8)

for y* = 0, 1, . . . , k*.
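the conjugate updating just described is easy to carry out numerically. the sketch below (python) applies the standard conjugate beta updates for this binomial/negative-binomial model to the flat priors and data totals quoted in the paper's illustrations (a = b = c = d = 1, r = 3, m = 109, x_m = 46907, n_m = 522946); the function name is ours.

```python
def posterior_params(a, b, c, d, r, m, x_m, n_m):
    """Standard conjugate beta updates for the binomial/negative-binomial model:
    p | data ~ Beta(a + x_m, b + n_m - x_m), theta | data ~ Beta(c + m*r, d + n_m)."""
    a_m = a + x_m
    b_m = b + n_m - x_m
    c_m = c + m * r
    d_m = d + n_m
    return a_m, b_m, c_m, d_m

# flat priors and the data totals quoted in the paper's illustrations
a_m, b_m, c_m, d_m = posterior_params(1, 1, 1, 1, r=3, m=109,
                                      x_m=46907, n_m=522946)
posterior_mean_p = a_m / (a_m + b_m)   # essentially x_m / n_m under flat priors
```

with these nearly flat priors the posterior mean of p is about 0.0897, i.e. practically indistinguishable from the observed cumulative ratio x_m / n_m.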
We denote this (predictive) distribution for $y^*$, given $k^* = k^*$, as $y^* \mid k^*, \mathcal{D}_m \sim \mathrm{BetaBinom}(k^*, a_m, b_m)$, with $a_m$ and $b_m$ as in (7). The corresponding posterior predictive mean and variance of $y^*$ are given by
$$E(y^* \mid k^*, \mathcal{D}_m) = k^* \frac{a_m}{a_m + b_m}, \qquad Var(y^* \mid k^*, \mathcal{D}_m) = \frac{k^*\, a_m b_m (a_m + b_m + k^*)}{(a_m + b_m)^2 (a_m + b_m + 1)}.$$
In a similar manner we obtain that the posterior predictive distribution, under this Bayesian model and given $(x_m, n_m)$, of a 'new' (or 'future') number of tests $k^*$ is the beta-negative binomial distribution. In fact, with $k^* \mid \theta \sim \mathrm{NegBinom}(r, \theta)$, we have, for $k^* = 0, 1, 2, \ldots$,
$$f(k^* \mid \mathcal{D}_m) = \binom{k^* + r - 1}{k^*} \frac{B(r + c_m,\, k^* + d_m)}{B(c_m, d_m)}.$$
We denote this (posterior predictive) distribution for $k^*$ as $k^* \mid \mathcal{D}_m \sim \mathrm{BetaNegBinom}(r, c_m, d_m)$, with $c_m$ and $d_m$ as in (7). The corresponding posterior predictive mean and variance of $k^*$ are given (for $c_m > 2$) by
$$E(k^* \mid \mathcal{D}_m) = \frac{r\, d_m}{c_m - 1}, \qquad Var(k^* \mid \mathcal{D}_m) = \frac{r\, d_m (r + c_m - 1)(d_m + c_m - 1)}{(c_m - 2)(c_m - 1)^2}.$$
It is straightforward to see that the joint posterior predictive probability (mass) function of $(y^*, k^*)$, which can easily be obtained from expressions (7) and (8), is clearly
$$f(y^*, k^* \mid \mathcal{D}_m) = f(y^* \mid k^*, \mathcal{D}_m)\, f(k^* \mid \mathcal{D}_m). \tag{9}$$
Along with the value of $y^*$ as the number of positive tests predicted out of the predicted number of tests $k^*$, one may obtain their ratio, $p^* := y^*/k^*$, as the rate of positive tests predicted next, given the data. While an explicit expression for the posterior predictive distribution of $p^*$ is not readily available, it may be estimated, quite accurately, through Monte-Carlo simulation. Towards that end, we denote by $Q^*_m(\cdot)$ the posterior predictive cdf of $p^*$ given the data $\mathcal{D}_m$; that is, for any $t \in \mathbb{R}$, $Q^*_m(t) = \Pr(p^* \le t \mid \mathcal{D}_m)$, and recall that the $\alpha$th percentile ($\alpha \in (0, 1)$) of this distribution is defined as
$$t^*_{\alpha} = \inf\{t : Q^*_m(t) \ge \alpha\}. \tag{10}$$
Thus, when available, the interval $[t^*_{\alpha},\, t^*_{1-\alpha}]$ (for $\alpha < 1/2$) serves as a $1 - 2\alpha$ posterior prediction interval for $p^*$ given the data $\mathcal{D}_m$. As was mentioned in the previous section, while an explicit expression for $Q^*_m(\cdot)$, the posterior predictive cdf of $p^*$ given the data $\mathcal{D}_m$, is not available, it may be estimated via Monte-Carlo simulations which exploit the explicit expression in (9) for the joint posterior predictive distribution of $(y^*, k^*)$.
Given the data $\mathcal{D}_m$, with parameters $a_m, b_m, c_m$ and $d_m$ and with $r$ as above, generate a random sample of a large size $B$ ($B = 5{,}000$, say) from $Q^*_m(\cdot)$ as follows:
1) given $n_m$, draw $k^* \sim \mathrm{BetaNegBinom}(r, c_m, d_m)$;
2) given $x_m$, $n_m$ and $k^* = k^*$, draw $y^* \sim \mathrm{BetaBinom}(k^*, a_m, b_m)$ to obtain the pair $(y^*, k^*)$;
3) calculate the simulated predicted PPT as $p^* = y^*/k^*$.
Repeat these steps $B$ times so as to form $p^*_1, p^*_2, \ldots, p^*_B$ as a random sample from $Q^*_m(\cdot)$. Having obtained the random sample $p^*_1, p^*_2, \ldots, p^*_B$, we estimate $Q^*_m(\cdot)$ by its empirical version $\hat{Q}^*_m(\cdot)$. Accordingly, and in similarity to (10), we estimate the $\alpha$th percentile of $Q^*_m(\cdot)$ by $\hat{t}^*_{\alpha} = \inf\{t : \hat{Q}^*_m(t) \ge \alpha\}$. A simple R script (see Appendix C), which utilizes the built-in functions rbnbinom() (for the BetaNegBinom distribution) and rbbinom() (for the BetaBinom distribution) of the extraDistr package, produces the simulated sample from the (respective) posterior predictive distributions of $(y^*, k^*)$ and the corresponding predicted PPT, $p^* = y^*/k^*$. We continue with the same prior parameterization used in Illustration 1, of $a = b = 1$, $c = d = 1$ and $r = 3$. Recall that the given data $\mathcal{D}_m$ yields $m = 109$, $n_m = 5.22946 \times 10^5$ and $x_m = 4.6907 \times 10^4$. We first simulated $B = 5{,}000$ sample values from the respective predictive distributions of $(y^*, k^*)$ and of $p^*$, and used these simulated values to estimate the posterior predictive distribution of $p^*$, which in turn was used to obtain the 95% posterior prediction interval, $[0.07981, 0.1001]$, for the 'next-day' Indiana PPT, as was displayed in Figure 2.
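The three simulation steps above can be sketched in Python (the paper's Appendix C uses R with the extraDistr functions rbnbinom()/rbbinom()). This version samples the two beta compounds by composition — first the beta parameter, then the conditional count — which is distributionally equivalent; the seed is arbitrary, and the prior and data values are those of the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Priors and data summaries from the illustration.
a, b, c, d, r = 1, 1, 1, 1, 3
m, n_m, x_m = 109, 522946, 46907
a_m, b_m = a + x_m, b + n_m - x_m    # posterior Beta(a_m, b_m) for p
c_m, d_m = c + m * r, d + n_m        # posterior Beta(c_m, d_m) for theta

B = 5000
# Step 1: k* ~ BetaNegBinom(r, c_m, d_m), via theta ~ Beta, k* | theta ~ NegBinom.
theta = rng.beta(c_m, d_m, size=B)
k_star = rng.negative_binomial(r, theta)
# Step 2: y* ~ BetaBinom(k*, a_m, b_m), via p ~ Beta, y* | k*, p ~ Binomial.
p = rng.beta(a_m, b_m, size=B)
y_star = rng.binomial(k_star, p)
# Step 3: simulated predicted PPT (guarding against the rare k* = 0 draw).
p_star = y_star / np.maximum(k_star, 1)

# Empirical 95% posterior prediction interval for p*.
lo, hi = np.percentile(p_star, [2.5, 97.5])
print(round(float(lo), 4), round(float(hi), 4))
```

With these parameters the simulated interval comes out close to the paper's reported $[0.07981, 0.1001]$, up to Monte-Carlo noise.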
The means and standard deviations of the (estimated) posterior predictive distributions of $y^*$, $k^*$ and $p^*$, along with the corresponding 95% prediction intervals, are presented in Table 1.

Figure 3: the estimated (Monte-Carlo) posterior predictive distribution of the Indiana PPT, $p^*$, along with the 95% prediction interval marked (in blue).

Figure 3 above displays the Monte-Carlo histogram of that predictive distribution, along with a nonparametric and a normal density approximation (in red). Also marked are the corresponding bounds of the 95% posterior prediction interval for $p^*$. The Monte-Carlo marginal (posterior) distribution of $k^*$ is displayed in Figure 4 and that of $y^*$ in Figure 5. We conclude this illustration with Figure 6, where we display the posterior prediction intervals for the PPT as calculated for each report day in the series. That is, based on the given data on the $n$th day, $\mathcal{D}_n$, we calculated the 95% posterior prediction interval for the PPT on the $(n+1)$th day (the 'next' day), for each $n = 2, 3, \ldots, m$ of the $m = 109$ days available in the data set. As can be seen, each daily calculated PPT (as in Figure 2) fell well within the corresponding prediction interval (marked in red in Figure 6) as calculated from the previous day's data, thus also providing a partial validation of the applicability of this Bayesian approach (with its underlying assumptions) to these COVID-19 count data.

The Indiana COVID-19 data are available for retrieval through the ISDH data hub [10], as reported for the state in the file covid report date.xlsx. It includes the daily records (as columns) on:
• date: date of the event; it is equal to the investigation starting date for positive cases, to the date of death for deaths, and to the coalesce of the specimen date and report date for testings (if the specimen collection date is unknown, the report date is used)
• covid test: total number of testings (i.e., number of new people tested on the date; Indiana residents only)
• daily delta tests: the number of most recent (i.e., latest report) new testings reported into the testing pool; the date of specimen collection is typically earlier than the report date
• daily base tests: the number of tests from the last report; records might be removed due to information correctness
• covid count: total number of positive cases (i.e., number of patients who started investigation for their positive report) on the date
• daily delta cases: the number of most recent (i.e., latest report) new positive cases reported into the positive-case pool; the investigation starting date could be earlier than the report date due to necessary processing
• daily base cases: the number of positive cases from the last report; records might be removed due to information correctness
• covid deaths total: number of deaths on the date
• daily delta deaths: the number of most recent (i.e., latest report) new death cases reported into the death-case pool; the date of death could be earlier than the report date due to necessary processing and confirmation
• daily base deaths: the number of deaths from the last report; records might be removed due to information correctness
• covid count cumsum: the cumulative number of positive cases as of the report date
• covid deaths cumsum: the cumulative number of deaths as of the report date
• covid test cumsum: the cumulative number of tests as of the report date

A simple R script to obtain the Monte-Carlo sample from $Q^*_m(\cdot)$.

References:
• Gelman, A., et al. Bayesian Data Analysis. Chapman & Hall/CRC Texts in Statistical Science.
• Estimating the infection fatality rate among symptomatic COVID-19 cases in the United States. Health Affairs 39, no. 7. Project HOPE, the People-to-People Health Foundation.
• Coronavirus COVID-19 global cases.
• COVID-19 positive cases: evidence on the time evolution of the epidemic or an indicator of local testing capabilities?
A case study in the United States. Available online.
• Modelling death rates due to COVID-19: a Bayesian approach. Available online.
• Brazilian modeling of COVID-19 (BRAM-COD): a Bayesian Monte Carlo approach for COVID-19 spread in a limited data set context. Available online at medRxiv (preprint).
• CDC guidelines: CDC activities and initiatives supporting the COVID-19 response and the President's plan for opening America up again. Centers for Disease Control and Prevention. Available online.
• Indiana COVID-19 data report. Indiana State Department of Health. Available online.

key: cord-319436-mlitd45q
authors: brinati, d.; campagner, a.; ferrari, d.; locatelli, m.; banfi, g.; cabitza, f.
title: detection of covid-19 infection from routine blood exams with machine learning: a feasibility study
date: 2020-04-25
journal: nan
doi: 10.1101/2020.04.22.20075143
sha:
doc_id: 319436
cord_uid: mlitd45q

Background. The COVID-19 pandemic due to the SARS-CoV-2 coronavirus has, in the first 4 months since its outbreak, reached more than 200 countries worldwide, with more than 2 million confirmed cases (probably a much higher number of infected) and almost 200,000 deaths. Amplification of viral RNA by (real-time) reverse transcription polymerase chain reaction (rRT-PCR) is the current gold-standard test for confirmation of infection, although it presents known shortcomings: long turnaround times (3-4 hours to generate results), potential shortage of reagents, false-negative rates as large as 15-20%, and the need for certified laboratories, expensive equipment and trained personnel. Thus there is a need for alternative, faster, less expensive and more accessible tests.
Material and methods. We developed two machine learning classification models using hematochemical values from routine blood exams (namely: white blood cell counts, and the platelets, CRP, AST, ALT, GGT, ALP and LDH plasma levels) drawn from 279 patients who, after being admitted to the San Raffaele Hospital (Milan, Italy) emergency room with COVID-19 symptoms, were screened with the rRT-PCR test performed on respiratory-tract specimens. Of these patients, 177 resulted positive, whereas 102 received a negative response. Results. We have developed two machine learning models to discriminate between patients who are either positive or negative to SARS-CoV-2: their accuracy ranges between 82% and 86%, and their sensitivity between 92% and 95%, so comparably well with respect to the gold standard. We also developed an interpretable decision tree model as a simple decision aid for clinicians interpreting blood tests (even off-line) for COVID-19 suspect cases. Discussion. This study demonstrated the feasibility and clinical soundness of using blood test analysis and machine learning as an alternative to rRT-PCR for identifying COVID-19-positive patients. This is especially useful in those countries, like developing ones, suffering from shortages of rRT-PCR reagents and specialized laboratories. We made available a web-based tool for clinical reference and evaluation; this tool is available at https://covid19-blood-ml.herokuapp.com.

The pandemic disease caused by the SARS-CoV-2 virus, named COVID-19, is requiring unprecedented responses of exceptional intensity and scope from more than 200 states around the world, after having infected, in the first 4 months since its outbreak, a number of people between 2 and 20 million, with at least 200,000 deaths. To cope with the spread of the COVID-19 infection, governments all over the world have taken drastic measures, like the quarantine of hundreds of millions of residents worldwide.
However, because of the COVID-19 symptomatology, which includes a large number of asymptomatic individuals [12], these efforts are limited by the problem of differentiating between COVID-19-positive and -negative individuals. Thus, tests to identify the SARS-CoV-2 virus are believed to be crucial to identify positive cases and thus curb the pandemic. To this aim, the current test of choice is the reverse transcriptase polymerase chain reaction (RT-PCR)-based assay performed in the laboratory on respiratory specimens. Taking this as a gold standard, machine learning techniques have been employed to detect COVID-19 from lung CT scans with 90% sensitivity and high AUROC (about 0.95) [25, 18]. Although chest CTs have been found to be associated with high sensitivity for the diagnosis of COVID-19 [1], this kind of exam can hardly be employed for screening tasks, because of the radiation doses, the relatively low number of devices available, and the related operation costs. A similar attempt was recently performed on chest X-rays [4], a low-dose and less expensive test, with promising statistical performance (e.g., sensitivity 97%). However, since almost 60% of chest X-rays taken in patients with confirmed and symptomatic COVID-19 have been found to be normal [40], systems based on this exam need to be thoroughly validated in real-world settings [6]. The public health emergency requires an unprecedented global effort to increase testing capacity [29]. The large demand for rRT-PCR tests (also commonly known as nasopharyngeal swab tests) due to the worldwide extension of the virus is highlighting the limitations of this type of diagnosis on a large scale, such as: the long turnaround times (on average over 2 to 3 hours to generate results); the need for certified laboratories, trained personnel, and expensive equipment and reagents, for which demand can easily overcome supply [26].
For instance, in Italy the scarcity of reagents and specialized laboratories forced the government to limit swab testing to those people who clearly showed symptoms of severe respiratory syndrome, thus leading to a number of infected people and a contagion rate that were largely underestimated [34]. For this reason, and also in light of the predictable wide adoption of mobile apps for contact tracing [14], which will likely increase the demand for population screening, there is an urgent need for alternative (or complementary) testing methods by which to quickly identify infected COVID-19 patients, mitigate virus transmission, and guarantee prompt patient treatment. In a previous work published in the laboratory medicine literature [13], we showed how simple blood tests might help identify false positive/negative rRT-PCR tests. That work, and the considerations made above, strongly motivated us to apply machine learning methods to routine, low-cost blood exams, and to evaluate the feasibility of predictive models in this important task for the mass screening of potential COVID-19-infected individuals. In what follows we report this feasibility study in detail. The aim of this work is to develop a predictive model, based on machine learning techniques, to predict positivity or negativity for COVID-19. In the rest of this section we report on the dataset used for model training and on the data analysis pipeline adopted. The dataset used for this study was made available by the IRCCS Ospedale San Raffaele, and it consisted of 279 cases randomly extracted from patients admitted to that hospital from the end of February 2020 to mid-March 2020. Each case included the patient's age, gender, and values from routine blood tests, as well as the result of the RT-PCR test for COVID-19, performed by nasopharyngeal swab. The parameters collected by the blood tests are reported in Table 1. All rights reserved. No reuse allowed without permission.
(The copyright holder for this preprint, which was not certified by peer review, is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version was posted April 25, 2020; https://doi.org/10.1101/2020.04.22.20075143.) The dependent variable "swab" is binary: it is equal to 0 in the absence of COVID-19 infection (negative swab test), and it is equal to 1 in the case of COVID-19 infection (positive swab test). The number of occurrences of the negative and positive class was respectively 102 (37%) and 177 (63%); thus the dataset was slightly imbalanced towards positive cases. Figure 1 shows the pairwise correlation of the features used for this study, while Figure 2 focuses on the variables "age", "wbc", "crp", "ast" and "lymphocytes". First of all, the categorical feature gender was transformed into two binary features by one-hot encoding. Further, we notice that the dataset was affected by missing values in most of its features (see Table 2). To address data incompleteness, we performed missing-data imputation by means of the multivariate imputation by chained equations (MICE) [5] method. MICE is a multiple imputation method that works in an iterative fashion: in each imputation round, one feature with missing values is selected and is modeled as a function of all the other features; the estimated values are then used to impute the missing values and are re-used in the subsequent imputation rounds. We chose this method because multiple imputation techniques are known to be more robust and better able to account for uncertainty than single imputation ones [33] (as they employ the joint distribution of the available features), and MICE in particular can also handle different data types.
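The chained-equations idea can be illustrated with a deliberately simplified toy version (not the MICE implementation the authors used): a single chain with plain least-squares regression imputation and no posterior draws, on hypothetical synthetic data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: three correlated features, ~10% missing at random.
X = rng.normal(size=(200, 3))
X[:, 2] = 0.5 * X[:, 0] - 0.8 * X[:, 1] + 0.1 * rng.normal(size=200)
mask = rng.random(X.shape) < 0.1
X_miss = X.copy()
X_miss[mask] = np.nan

# Initialize missing entries with column means.
imp = np.where(np.isnan(X_miss), np.nanmean(X_miss, axis=0), X_miss)

for _ in range(10):                       # imputation rounds
    for j in range(imp.shape[1]):         # one feature at a time
        miss_j = np.isnan(X_miss[:, j])
        if not miss_j.any():
            continue
        # Regress feature j on the other (currently imputed) features...
        others = np.delete(imp, j, axis=1)
        A = np.column_stack([np.ones(len(imp)), others])
        beta, *_ = np.linalg.lstsq(A[~miss_j], X_miss[~miss_j, j], rcond=None)
        # ...and refill its missing entries with the regression predictions.
        imp[miss_j, j] = A[miss_j] @ beta

rmse = float(np.sqrt(np.mean((imp[mask] - X[mask]) ** 2)))
print(round(rmse, 3))
```

Real MICE additionally draws imputations from a posterior predictive distribution and produces multiple completed datasets, which is what gives it its uncertainty-accounting advantage over single imputation.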
We developed and compared different classes of machine learning classifiers. In particular, we considered the following classifier models:
- decision tree [35] (DT);
- extremely randomized trees [16] (ET);
- k-nearest neighbors [2] (KNN);
- logistic regression [20] (LR);
- naïve Bayes [23] (NB);
- random forest [21] (RF);
- support vector machines [36] (SVM).
We also considered a modification of the random forest algorithm, called the three-way random forest classifier [7] (TWRF), which allows the model to abstain on instances for which it can express only low confidence; in so doing, a TWRF achieves higher accuracy on the effectively classified instances at the expense of coverage (i.e., the number of instances on which it makes a prediction). We decided to consider also this class of models as it could provide more reliable predictions in a large part of cases, while exposing the uncertainty regarding the other cases so as to suggest further (and more expensive) tests on them. From a technical point of view, since random forest is a class of probability-scoring classifiers (that is, for each instance the model assigns a probability score to every possible class), the abstention is performed on the basis of two thresholds $\alpha, \beta \in [0, 1]$: if we denote the positive class by 1 and the negative class by 0, then each instance is classified as positive if $score(1) > \alpha$ and $score(1) > score(0)$, as negative if $score(0) > \beta$ and $score(0) > score(1)$, and, otherwise, the model abstains. In these models the performance is usually evaluated only on the non-abstained instances [15], and the coverage is a further performance element to be considered.
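The two-threshold abstention rule just described can be sketched as follows (not the authors' code; the threshold values are illustrative, and the scores would come from the random forest's class probabilities):

```python
# Three-way decision rule: positive if score(1) > alpha and beats score(0),
# negative if score(0) > beta and beats score(1), otherwise abstain.
def three_way(score_pos, alpha=0.7, beta=0.7):
    score_neg = 1.0 - score_pos
    if score_pos > alpha and score_pos > score_neg:
        return 1          # confident positive
    if score_neg > beta and score_neg > score_pos:
        return 0          # confident negative
    return None           # abstain: suggest a further, more expensive test

preds = [three_way(s) for s in (0.95, 0.55, 0.10)]
print(preds)  # → [1, None, 0]
```

Raising $\alpha$ and $\beta$ trades coverage for reliability, which is exactly the specificity-versus-coverage trade-off reported for the TWRF below.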
The models mentioned above were trained and evaluated through a nested cross-validation [19, 9] procedure. This procedure allows for an unbiased generalization-error estimate while the hyperparameter search (including feature selection) is performed: an inner cross-validation loop is executed to find the optimal hyperparameters via grid search, and an outer loop evaluates the model performance on five folds. Models were evaluated in terms of accuracy, balanced accuracy (i.e., the average of sensitivity and specificity; if accuracy and balanced accuracy significantly differ, the data could be interpreted as unbalanced with respect to class prevalence), positive predictive value (PPV, the probability that subjects with a positive screening test truly have the disease), sensitivity, specificity and, except for the three-way random forest, the area under the ROC curve (AUC). After discussing this with the clinicians involved in this study, we considered accuracy and sensitivity to be the main quality metrics, since false negatives (that is, patients positive to COVID-19 who are nevertheless classified as negative, and possibly sent home) are more harmful than false positives in this screening task. Tables 3 and 4 show the 95% confidence intervals of, respectively, the average accuracy and average balanced accuracy of the models (on the nested cross-validation) trained on the two best-performing sets of features: the first one, dataset A, includes all the variables, while the second one, dataset B, excludes the "gender" variable, as this was found to be of negligible predictive value.
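The nested cross-validation scheme can be sketched in miniature (not the authors' pipeline, which used the real models and grid): here the "model" is a deliberately trivial one-dimensional threshold classifier on synthetic data, so the inner loop's "grid search" just picks the best threshold on inner validation folds, and the outer loop scores that choice on held-out data.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 1-D data: two classes with shifted means.
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
y = np.array([0] * 100 + [1] * 100)
idx = rng.permutation(len(y))
x, y = x[idx], y[idx]

def accuracy(thr, xs, ys):
    return float(np.mean((xs > thr).astype(int) == ys))

def cv_folds(n, k):
    cuts = np.linspace(0, n, k + 1).astype(int)
    return [(np.r_[0:cuts[i], cuts[i + 1]:n], np.r_[cuts[i]:cuts[i + 1]])
            for i in range(k)]

grid = [0.0, 0.5, 1.0, 1.5]               # hyperparameter grid (thresholds)
outer_scores = []
for tr, te in cv_folds(len(y), 5):        # outer loop: 5-fold evaluation
    # Inner loop: pick the threshold with the best mean validation accuracy.
    inner = {thr: np.mean([accuracy(thr, x[tr][va], y[tr][va])
                           for _, va in cv_folds(len(tr), 3)])
             for thr in grid}
    best = max(inner, key=inner.get)
    # Score the selected hyperparameter on data never seen by the inner loop.
    outer_scores.append(accuracy(best, x[te], y[te]))

print(round(float(np.mean(outer_scores)), 3))
```

The key point the paper relies on is that the outer folds never influence the hyperparameter choice, so the outer mean is an unbiased generalization estimate.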
Figure 3 shows the performance of the traditional models (i.e., with the TWRF model excluded) on the nested cross-validation. To further validate the above findings, the entire dataset was split into training and test/validation sets, respectively 80% and 20% of the total instances. The best-performing model, i.e., the random forest classifier trained on dataset B, achieved the following results on the test/validation set: accuracy = 82%, sensitivity = 92%, PPV = 83%, specificity = 65%, AUC = 84%. Figures 4 and 5 show the performance of this model in the ROC and precision/recall spaces, respectively. The optimal hyperparameters found are shown in Table 5. Similarly, for the best three-way random forest classifier on the validation set we observed: accuracy = 86%, sensitivity = 95%, PPV = 86%, specificity = 75%, coverage = 70% (that is, for 30% of the validation instances the model abstained). The feature importances assessed for the best-performing model (random forest on dataset B) are shown in Figure 6. In order to provide an interpretable overview (in the sense of explainable AI [17]) of the predictive models that we developed, we also developed a decision tree model, which is shown in Figure 7. Although the depicted decision tree is associated with a lower discriminative performance than the two former (inscrutable) models, such a tree can be used as a simple decision aid by clinicians interested in the use of blood values to assess COVID-19 suspect cases. We have developed two machine learning models to discriminate between patients who are either positive or negative to SARS-CoV-2, the coronavirus causing the COVID-19 pandemic.
In this task, patients are represented in terms of a few basic demographic characteristics (gender, age) and a small array of routine blood tests, chosen for their convenience and low cost, and because their results are usually available within 30 minutes of the blood draw in a regular emergency department. The ground truth was established through RT-PCR swab tests. We present the best traditional model, as is common practice, and a three-way model, which guarantees the best sensitivity and positive predictive value: the former is the proportion of infected (and contagious) people who will have a positive result, and is therefore useful to clinicians when deciding which test to use; PPV, on the other hand, is useful to patients, as it tells the odds of one having COVID-19 given a positive result. The performance achieved by these two best models (sensitivity between 92% and 95%, accuracy between 82% and 86%) provides proof that this kind of data, and these computational models, can be used to discriminate among potential COVID-19-infectious patients with sufficient reliability, and with sensitivity similar to the current gold standard. This is the most important contribution of our study. Also from the clinical point of view, the feature selection was considered valid by the clinicians involved. Indeed, the specialist literature has found that COVID-19 positivity is associated with lymphopenia (that is, an abnormally low level of lymphocytes in the blood), damage to liver and muscle tissue [42, 39], and significantly increased C-reactive protein (CRP) levels [10]. In [27], a comprehensive
list of the most frequent abnormalities in COVID-19 patients has been reported: among the 14 conditions considered, they report increased aspartate aminotransferase (AST), decreased lymphocyte count, increased lactate dehydrogenase (LDH), increased C-reactive protein (CRP), increased white blood cell count (WBC) and increased alanine aminotransferase (ALT). These parameters are also the most predictive features identified by the best classifier (random forest), together with the age attribute. Other studies also confirm the relevance of these features and their association with COVID-19 positivity [8, 30, 32, 44], compared to other kinds of pneumonia [43]. This also confirms that our models rest on clinically relevant features, and that most of these values can be extracted from routine blood exams. The interpretable decision tree model provides a further confirmation (see Figure 7) of the soundness of the approach: the clinicians (ML, GB) and the biochemist (DF) involved in this study found it reasonable that AST would be the first parameter to consider (mirrored by the fact that AST is the root of the decision tree) and that it was found to be the most important predictive feature. Indeed, AST values above 25 are good predictors of COVID-19 positivity (accuracy = PPV = 76%), while values below 25 are a good predictor of COVID-19 negativity (accuracy = negative predictive value = 83%). Similar observations can also be made about CRP, lymphocytes and general WBC counts. No statistically significant difference was found between the accuracy and the balanced accuracy of the models (as mirrored by the overlap of the 95% confidence intervals), a sign that the dataset was not significantly unbalanced. Moreover, we can notice that the best-performing ML classifier (random forest) exhibited a very high sensitivity (~90%) but, in comparison, a limited specificity of only 65%.
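The metric definitions discussed above can be checked with a small worked example; the confusion-matrix counts below are hypothetical (chosen only to land near the reported operating point, not taken from the paper's validation set).

```python
# Hypothetical confusion matrix for a screening test.
tp, fn, fp, tn = 33, 3, 7, 13

sensitivity = tp / (tp + fn)                # TP / (TP + FN) = 33/36
ppv = tp / (tp + fp)                        # TP / (TP + FP) = 33/40
specificity = tn / (tn + fp)                # TN / (TN + FP) = 13/20
accuracy = (tp + tn) / (tp + fn + fp + tn)
balanced_accuracy = (sensitivity + specificity) / 2

print(sensitivity, ppv, specificity, accuracy)
```

This makes concrete why the authors privilege sensitivity: with these counts only 3 of 36 truly positive patients are missed, even though 7 negatives are flagged for further testing.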
That gives the main motivation for the three-way classifier: this model offers a trade-off between increased specificity (a 10% increment compared with the best traditional ML model) and reduced coverage, as the three-way approach abstains on uncertain instances (i.e., the cases that cannot be classified with high confidence either as positive or as negative). This means that the model yields more robust and reliable predictions for the classified instances (as mirrored by the increase in all of the performance measures), while for the other ones it is anyway useful in suggesting further tests, e.g., either a PCR-RNA swab test or a chest X-ray.

Fig. 7: an interpretable decision tree, developed in order to support the interpretation of the predictions from the other models. Color gradients denote predictivity for either class (shades of blue correspond to COVID-19 negativity, shades of orange to positivity).
In regard to the specificity exhibited by our models, we can further notice that even though these values are relatively low compared with other tests (which are more specific but slower and less accessible), this may not be too much of a limitation, as there is a significant disparity between the costs of false positives and false negatives, and in fact our models favor sensitivity (thus, they avoid false negatives). Further, the high PPV (> 80%) of our models suggests that the large majority of cases identified as positive by our models would likely be COVID-19-positive cases. That said, the study presents two main limitations. The first, and more obvious one, regards the relatively low number of cases considered. This was tackled by performing nested cross-validation in order to control for bias [38], and by employing models that are known to be effective also with moderately sized samples [3, 31, 37]. Nonetheless, further research should aim at confirming our findings, by integrating hematochemical data from multiple centers and increasing the number of cases considered. The second limitation may be less obvious, as it regards the reliability of the ground truth itself. Although this was built by means of the current gold standard for COVID-19 detection, i.e., the rRT-PCR test, a recent study observed that the accuracy of this test may be highly affected by problems such as inadequate procedures for collection, handling, transport and storage of the swabs, sample contamination, and the presence of interfering substances, among others [28]. As a result, some recent studies have reported up to 20% false-negative results for the rRT-PCR test [41, 24, 22], and a recent systematic review reported an average sensitivity of 92% and cautioned that "up to 29% of patients could have an initial RT-PCR false-negative result".
Thus, contrary to common belief and some preliminary studies (e.g., [11]), the accuracy of this test could be less than optimal, and this could have affected the reliability of the ground truth in this study as well (as in any other study using this test for ground-truthing, unless cases are annotated after multiple tests). However, besides being a limitation, this is also a further motivation to pursue alternative ways to perform the diagnosis of SARS-CoV-2 infection, such as the methods proposed here. Future work will be devoted to the inclusion of more hematochemical parameters, including those from arterial blood gas (ABG) assays, to evaluate their predictiveness with respect to COVID-19 positivity, and to the inclusion of cases whose probability of being COVID-positive is almost 100%, as they resulted positive to two or more swabs or to serologic antibody tests. This would allow us to associate a higher weight with misidentifying those cases, so as, we conjecture, to improve the sensitivity further. Moreover, we want to investigate the interpretability of our models further, both by having more clinicians validate the current decision tree and by possibly constructing a more accurate one, so that clinicians can use it as a convenient decision aid to interpret blood tests in regard to COVID-19 suspect cases (even off-line). Finally, this was conceived as a feasibility study for an alternative COVID-19 test on the basis of hematochemical values. In virtue of this ambitious goal, the success of this study does not exempt us from pursuing a real-world, ecological validation of the models [6].
to this aim, we deployed an online web-based tool 6 by which clinicians can test the model by feeding it clinical values and assessing the sensibleness and usefulness of the indications it provides. after this successful feasibility study, we will conceive proper external validation tasks and undertake an ecological validation to assess the cost-effectiveness and utility of these models for the screening of covid-19 infection in all the real-world settings (e.g., hospitals, workplaces) where routine blood tests are a viable test of choice. not applicable. not applicable. research involving human subjects complied with all relevant national regulations and institutional policies, is in accordance with the tenets of the helsinki declaration (as revised in 2013), and was approved by the authors' institutional review board on the 20th of april. individuals signed an informed consent authorizing the use of their anonymously collected data for retrospective observational studies (article 9.2.j; eu general data protection regulation 2016/679 [gdpr]), according to the irccs san raffaele hospital policy (iog075/2016). the developed web tool is available at the following address: https://covid19-blood-ml.herokuapp.com/ the complete dataset will be made available on the zenodo platform as soon as the work gets accepted for publication. 6 the tool is available at the following address: https://covid19-blood-ml.herokuapp.com/.
correlation of chest ct and rt-pcr testing in coronavirus disease 2019 (covid-19) in china: a report of 1014 cases
an introduction to kernel and nearest-neighbor nonparametric regression
model selection for support vector machines: advantages and disadvantages of the machine learning theory
covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
mice: multivariate imputation by chained equations in r
the proof of the pudding: in praise of a culture of real-world validation for medical artificial intelligence
the three-way-in and three-way-out framework to treat and exploit ambiguity in data
di napoli r (2020) features, evaluation and treatment coronavirus (covid-19)
on over-fitting in model selection and subsequent selection bias in performance evaluation
epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in wuhan, china: a descriptive study
detection of 2019 novel coronavirus (2019-ncov) by real-time rt-pcr
covid-19: identifying and isolating asymptomatic people helped eliminate virus in italian village
routine blood tests as a potential diagnostic tool for covid-19
quantifying sars-cov-2 transmission suggests epidemic control with digital contact tracing. science
extremely randomized trees
explainable ai: the new 42?
coronavirus detection and analysis on chest ct with deep learning
the elements of statistical learning: data mining, inference, and prediction
applied logistic regression
random decision forest
insufficient sensitivity of rna dependent rna polymerase gene of sars-cov-2 viral genome as confirmatory test using korean covid-19 cases
naive (bayes) at forty: the independence assumption in information retrieval
false-negative results of real-time reverse-transcriptase polymerase chain reaction for severe acute respiratory syndrome coronavirus 2: role of deep-learning-based ct diagnosis and insights from two cases
artificial intelligence distinguishes covid-19 from community acquired pneumonia on chest ct
development and clinical application of a rapid igm-igg combined antibody test for sars-cov-2 infection diagnosis
laboratory abnormalities in patients with covid-2019 infection
potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (covid-19)
diagnostic testing for severe acute respiratory syndrome-related coronavirus-2: a narrative review
time course of lung changes on chest ct during recovery from 2019 novel coronavirus (covid-19) pneumonia
random forest for bioinformatics
dysregulation of immune response in patients with covid-19 in wuhan
multiple imputation for nonresponse in surveys
as covid-19 cases, deaths and fatality rates surge in italy, underlying causes require investigation
a survey of decision tree classifier methodology
learning with kernels: support vector machines, regularization, optimization, and beyond
stabilizing classifiers for very small sample sizes
bias in error estimation when using cross-validation for model selection
clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in wuhan, china
chest x-ray findings in 636 ambulatory patients with covid-19 presenting to an urgent care center: a normal chest x-ray is no guarantee
chest ct for typical 2019-ncov pneumonia: relationship to negative rt-pcr testing
liver injury in covid-19: management and challenges
a comparative study on the clinical features of covid-19 pneumonia to other pneumonias
functional exhaustion of antiviral lymphocytes in covid-19 patients
the complete code will be made available on the zenodo platform as soon as the work gets accepted for publication. key: cord-308021-cnf4xljc authors: kohns vasconcelos, malte; renk, hanna; popielska, jolanta; nyirenda nyang’wa, maggie; burokiene, sigita; gkentzi, despoina; gowin, ewelina; donà, daniele; villanueva-medina, sara; riordan, andrew; hufnagel, markus; eisen, sarah; da dalt, liviana; giaquinto, carlo; bielicki, julia a. title: sars-cov-2 testing and infection control strategies in european paediatric emergency departments during the first wave of the pandemic date: 2020-10-13 journal: eur j pediatr doi: 10.1007/s00431-020-03843-w sha: doc_id: 308021 cord_uid: cnf4xljc between february and may 2020, during the first wave of the covid-19 pandemic, paediatric emergency departments in 12 european countries were prospectively surveyed on their implementation of sars-cov-2 disease (covid-19) testing and infection control strategies. all participating departments (23) implemented standardised case definitions, testing guidelines, early triage and infection control strategies early in the outbreak. patient testing criteria initially focused on suspect cases and later began to include screening, mainly for hospital admissions. long turnaround times for test results likely put additional strain on healthcare resources. conclusion: shortening turnaround times for sars-cov-2 tests should be a priority. specific paediatric testing criteria are needed. european reference laboratories established widespread capacities for testing for severe acute respiratory syndrome coronavirus 2 (sars-cov-2) within a matter of days after a diagnostic test was made publicly available [1, 2].
world health organization (who) and public health authorities in europe issued case definitions, testing and infection control recommendations for covid-19 in january 2020. the current understanding of covid-19 in paediatric patients is that children more often have mild disease compared to adults [3, 4] . current knowledge suggests that the peak of infectiousness of sars-cov-2 infection occurs a few days before and after the onset of symptoms, meaning that presymptomatic people are able to transmit the infection [5] . transmission unobserved by public health authorities occurs frequently, with at least one-third of cases having been undetected during the early epidemic [6, 7] . the aim of this study was to describe the implementation of testing and infection control strategies and their evolution in paediatric emergency departments in europe. from mid-february to the first week of may 2020, we surveyed all european paediatric collaboration sites within the penta id paediatric research network weekly for developments in their testing and infection control strategies. portable document format (pdf) survey forms were sent out by email to contact officers at 78 paediatric departments across europe. the survey form is available as an online supplement to this article. completed and signed survey forms were handed in by return email. missing weekly replies were imputed as last observation carried forward. paediatric departments of 23 mostly tertiary care hospitals in 12 european countries (belgium, germany, france, italy, poland, portugal, the uk, the netherlands, greece, spain, lithuania and switzerland) participated in the surveys (response rate 29%). multiple sites participated in the uk (5, 3 tertiary and 2 secondary level), germany (5, 4 tertiary and 1 secondary level), spain (3, 1 network of sites representing the madrid region, 1 tertiary and 1 secondary level) and poland (2, both tertiary level). 
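the last-observation-carried-forward imputation of missing weekly survey replies described above can be sketched as follows (the reply values are invented for illustration):

```python
# Last observation carried forward (LOCF): a missing weekly reply (None)
# is replaced by the most recent available reply from the same site.
def locf(weekly_replies):
    filled, last = [], None
    for reply in weekly_replies:
        if reply is not None:
            last = reply
        filled.append(last)
    return filled

# one site's (invented) weekly testing-strategy replies over five weeks
print(locf(["suspect-cases-only", None, None, "screen-all-admissions", None]))
```

note that weeks before the first observed reply stay missing: LOCF only carries values forward, never backward.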
in each of the remaining countries (belgium, france, italy, portugal, the netherlands, greece, lithuania and switzerland), one site participated. by the end of february 2020, all hospitals had implemented standardised case definitions for suspected covid-19 cases, with the majority (16 out of 21 participating at that point in time, 76%) following national government or public health authority guidelines and three directly following who guidelines. standardised definitions of suspected cases showed high similarity between sites. all definitions consisted of a clinical component of acute respiratory infection and an epidemiological component of possible exposure to the virus. the latter changed between february and april: initially, definitions at all sites required contact within 14 days with a confirmed case or travel to specified geographic areas; in time, this changed to staying in any area with ongoing community transmission. twenty participating sites used suspected case definitions from the beginning that did not exclude patients on detection of an alternative pathogen. two of the spanish sites and one site in poland initially excluded patients with confirmed alternative diagnoses of respiratory infections from being suspect cases for covid-19. this changed by april at all three sites, so that afterwards detection of another pathogen that could explain the respiratory symptoms no longer excluded a patient from being a suspect case and from undergoing sars-cov-2 testing. ten sites (43%) reported that they strictly only tested patients for sars-cov-2 if they matched the definition of a suspected case. another 12 (52%) had a policy to only test patients matching the case definition but reported that exceptions occurred regularly. until april, no site reported that their decision to test patients was based on separate local guidelines. 
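a toy encoding of these evolving suspected-case definitions (function and parameter names are mine, not from the survey instrument) shows the and-structure of the clinical and epidemiological components:

```python
# Suspected-case definition: clinical component AND epidemiological component.
# The epidemiological criteria broadened between february and april 2020.
def is_suspected_case(acute_respiratory_infection, contact_with_case_14d,
                      travel_to_listed_area, lives_in_transmission_area,
                      period="february"):
    if not acute_respiratory_infection:        # clinical component is mandatory
        return False
    exposure = contact_with_case_14d or travel_to_listed_area
    if period == "april":                      # later: any area with ongoing community transmission
        exposure = exposure or lives_in_transmission_area
    return exposure

# same child with ari symptoms, no contact or travel, but local community transmission
print(is_suspected_case(True, False, False, True, period="february"))  # False: not suspect initially
print(is_suspected_case(True, False, False, True, period="april"))     # True: suspect under later rules
```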
in april, several german and uk sites started broader testing, first with testing of patients admitted to oncology or intensive care units and from the end of april with routine screening of all admitted patients. table 1 shows example developments of testing guidelines at four participating children's emergency departments. in terms of sampling site, 52% of the participating hospitals restricted testing to upper respiratory samples, while 48% obtained upper and lower respiratory samples if the latter could be obtained. although discharge and infection control strategies after admission relied heavily on test results, by the end of the survey period only 9 hospitals (39%) received test results multiple times daily, while another 7 (30%) had waiting times of more than 24 h before test results would be back. at all but two sites, where faster turnaround times could be achieved, expected time to test results did not change over the survey period. by the beginning of march, 9 (43%) were discharging patients with pending test results when they were clinically stable. three of the seven hospitals where results took more than 24 h to come back would not discharge patients while test results were pending, regardless of whether the patients were clinically stable. by mid-march all hospitals were discharging patients with pending test results. until the beginning of march, sites saw up to 30 suspected cases per week. while only one site (in germany) had a child positive for sars-cov-2, 67% of sites had already provided care to suspected cases of sars-cov-2. at the different sites, the highest number of patients tested per week differed widely between 7 per week for a secondary care hospital in western germany and 112 for a specialised tertiary care children's hospital in north england. community test centres for covid-19 opened across germany in early to mid-march and in the uk only in april. 
opening of community test centres in proximity to the surveyed sites coincided with varying stages of development of overall case numbers in the areas. therefore, our data allow no firm conclusion on whether opening of community test centres alleviated patient pressure on paediatric emergency departments. most departments used early clinical triage at the emergency department to separate suspected cases from other patients from the beginning of the survey period. only one site (in greece) initially did not triage but changed this in the last week of february. two uk hospitals had plans to refer patients with positive test results to other hospitals for admission, and eight hospitals were planning to place multiple patients tested positive in cohort isolation if limited capacity for individual isolation occurred. at most hospitals, staff used respirators, i.e. filtering face piece (ffp) masks, when treating suspected cases in the emergency department. in contrast, staff at four uk sites, in the netherlands and at one site in poland used surgical masks only. this did not change over the survey period. in the early stages of the covid-19 pandemic, paediatric emergency departments implemented standardised case definitions, testing guidelines and infection control measures rapidly. while this is an important and reassuring finding regarding the preparedness of paediatric emergency care in europe, it may be a limitation of this survey that our sample of hospitals was biased towards tertiary care hospitals with strong international research links. although infection control strategies and even discharge of patients relied heavily on receiving sars-cov-2 test results, most hospitals only received these after considerable delay, often more than 24 h. shortening turnaround times for tests should be a priority. prior to discharge, infection control measures on uninfected patients awaiting test results place a huge burden on emergency care resources. 
most departments rightly responded by discharging patients while test results were pending. this does not, however, mitigate the public health impact of delayed result reporting on efficient contact tracing and subsequent isolation or quarantine of contacts in the community. the guidelines for testing focused on two aims: establishing aetiology in children with symptoms of acute respiratory infection (ari) and excluding infection for inpatient infection control purposes. this was a necessary restriction while numbers of new infections were high and capacities for testing were limited. we believe that in the current situation with vastly expanded laboratory capacities, a broader approach with more testing of mildly symptomatic patients or asymptomatic contacts may be warranted. to allocate testing resources responsibly, we believe that specific testing criteria for the paediatric population are needed, because both children's individual risk of severe disease and their role in sustaining community transmission differ from those of adults [10, 11]. children and adolescents suffer serious consequences from school closures, and allowing schools to re-open has positive social, psychological and economic implications [12]. benefits of broader access to testing may include the ability to detect outbreaks in day care facilities and schools earlier, in order to limit the spread of infections while maintaining as much normality as possible for children and adolescents. authors' contributions mkv, cg and jab designed the study, all authors commented on the design; cg provided resources for the survey; mkv received and analysed the survey forms; mkv, hr, jp, mnn, dd and jab wrote the manuscript; sb, eg and ar revised the manuscript; all authors commented on the manuscript and approved the final version. funding open access funding enabled and organized by projekt deal. conflict of interest the authors declare that they have no conflict of interest.
ethical approval the survey was considered a clinical audit. this article does not contain any studies with human participants or animals performed by any of the authors. open access this article is licensed under a creative commons attribution 4.0 international license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons licence, and indicate if changes were made. the images or other third party material in this article are included in the article's creative commons licence, unless indicated otherwise in a credit line to the material. if material is not included in the article's creative commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. to view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
laboratory readiness and response for novel coronavirus (2019-ncov) in expert laboratories in 30 eu/eea countries
coronavirus infections in children including covid-19: an overview of the epidemiology, clinical features, diagnosis, treatment and prevention options in children
systematic review of covid-19 in children shows milder cases and a better prognosis than adults
temporal dynamics in viral shedding and transmissibility of covid-19
genetic structure of sars-cov-2 reflects clonal superspreading and multiple independent introduction events
modelling the covid-19 epidemic and implementation of population-wide interventions in italy
coronavirus country profiles. www.ourworldindata.org/coronavirus.
accessed
aktueller lage-/situationsbericht des rki zu covid-19 [current rki situation report on covid-19]
transmission of sars-cov-2 by children
covid-19 in children and adolescents in europe: a multinational, multicentre cohort study
coronakrise: kinder haben das recht auf bildung [corona crisis: children have the right to education]
acknowledgements the authors would like to thank the following people for the kind contribution of survey data: key: cord-281495-beb164oy authors: charpentier, charlotte; ichou, houria; damond, florence; bouvet, elisabeth; chaix, marie-laure; ferré, valentine; delaugerre, constance; mahjoub, nadia; larrouy, lucile; le hingrat, quentin; visseaux, benoit; mackiewicz, vincent; descamps, diane; fidouh-houhou, nadhira title: performance evaluation of two sars-cov-2 igg/igm rapid tests (covid-presto and ng-test) and one igg automated immunoassay (abbott) date: 2020-09-03 journal: j clin virol doi: 10.1016/j.jcv.2020.104618 sha: doc_id: 281495 cord_uid: beb164oy the aim of this study was to assess the analytical performances, sensitivity and specificity, of two rapid tests (covid-presto® test rapid covid-19 igg/igm and ng-test® igm-igg covid-19) and one automated immunoassay (abbott sars-cov-2 igg) for detecting anti-sars-cov-2 antibodies. this study was performed with: (i) a positive panel constituted of 88 sars-cov-2 specimens collected from patients with a positive sars-cov-2 rt-pcr, and (ii) a negative panel of 120 serum samples, all collected before november 2019, including 64 samples with a cross-reactivity panel. sensitivity of covid-presto® test for igm and igg was 78.4% and 92.0%, respectively. sensitivity of ng-test® for igm and igg was 96.6% and 94.9%, respectively. sensitivity of abbott igg assay was 96.5%, showing an excellent agreement with the two rapid tests (κ = 0.947 and κ = 0.936 for ng-test® and covid-presto® test, respectively). an excellent agreement was also observed between the two rapid tests (κ = 0.937). specificity for igm was 100% and 86.5% for covid-presto® test and ng-test®, respectively.
specificity for igg was 92.0%, 94.9% and 96.5% for covid-presto®, ng-test®, and abbott, respectively. most of the false positive results observed with ng-test® resulted from samples containing malarial antibodies. in conclusion, the performances of these 2 rapid tests are very good and comparable to those obtained with the automated immunoassay, except for igm specificity with the ng-test®. thus, isolated igm should be cautiously interpreted due to possible false-positive reactions with this test. finally, before their large-scale use, the rapid tests must be reliably evaluated with adequate and large panels including early seroconversion and possible cross-reactive samples. serological tests are of particular interest for patients presenting strong covid-19 suspicion with negative pcr. serological tests also make it possible to catch up later with undiagnosed people at the time of active infection, since antibodies have been found in almost all people who have been in contact with sars-cov-2, within a variable period depending on the severity of the infection [1, 2]. furthermore, studies showed that the kinetics of appearance of igm and igg were relatively close [3]. two types of tests are available to detect anti-sars-cov-2 antibodies: rapid lateral flow tests and automated immunoassays. several studies have assessed the analytical performances of the automated immunoassays [4] [5] [6] [7]. on the other hand, although a very large number of rapid tests have been developed, few of them have been reliably evaluated with a suitable serum panel. however, it is very important to have data on the ability of these rapid tests to reliably detect anti-sars-cov-2 antibodies, given their increasing use around the world.
the aim of this study was to assess the analytical performances (sensitivity and specificity) and agreement of two rapid tests and one automated immunoassay for detecting antibodies against sars-cov-2. in addition, we assessed five samples containing autoantibodies (four rheumatoid factor and one systemic lupus erythematosus). we also assessed the serum of 54 health-care workers who presented clinical symptoms during the epidemic period and for whom sars-cov-2 rt-pcr was negative or not carried out. we evaluated two lateral flow tests: covid-presto® test rapid covid-19 igg/igm (aaz, boulogne-billancourt, france) and ng-test® igm-igg covid-19 (ng biotech, guipry, france), according to the manufacturer's instructions. five and ten microliters of serum for covid-presto® test and ng-test®, respectively, were added, and results were read and interpreted 10 minutes after depositing serum. the abbott sars-cov-2 igg kit (chemiluminescent microparticle immunoassay) (abbott, il, usa) was performed according to the manufacturer's instructions. the assay cut-off is an index of 1.40, and the assigned grey zone ranges from 1.12 to 1.68. all statistical analyses were performed using excel. to assess sensitivity, rt-pcr results were chosen as the gold standard. cohen kappa statistics and absolute agreement were calculated to evaluate the agreement between the different tests. no participant objected to the collection of their data. sensitivity of covid-presto® test was assessed on 88 samples collected between day 4 and day 42 after onset of symptoms, and sensitivity of ng-test® was assessed on a subgroup of 59 samples among the 88 samples tested with covid-presto® test, collected between days 7 and 28 after onset of symptoms (table 1). sensitivity of covid-presto® test for igm was 67% (n=12/18), 88% (n=29/33) and 76% (n=28/37) for samples collected between days 4 and 9, between days 10 and 14, and after 14 days after onset of symptoms, respectively.
sensitivity of covid-presto® test for igg was 72% (n=13/18), 94% (n=31/33) and 100% (n=37/37) for samples collected between days 4 and 9, between days 10 and 14, and after 14 days after onset of symptoms, respectively. when combining igm and igg, sensitivity of covid-presto® test was 83% (n=15/18), 97% (n=32/33) and 100% (n=37/37) for the same intervals. sensitivity of ng-test® for igm was 83% (n=5/6), 100% (n=22/22) and 97% (n=30/31) for samples collected between days 7 and 9, between days 10 and 14, and after 14 days after onset of symptoms, respectively. sensitivity of ng-test® for igg was 83% (n=5/6), 96% (n=21/22) and 97% (n=30/31) for the same intervals. when combining igm and igg, sensitivity of ng-test® was 83% (n=5/6), 100% (n=22/22) and 97% (n=30/31) for the same intervals. among the 59 serum samples of this pcr-positive panel tested by the two rapid tests, 57 were compared with the abbott sars-cov-2 igg automated immunoassay. sensitivity of the abbott igg test was 67% (n=4/6), 100% (n=22/22) and 100% (n=29/29) for samples collected between days 7 and 9, between days 10 and 14, and after 14 days after onset of symptoms, respectively. agreement between the abbott assay and the rapid tests (igm/igg combined) was 96.5% (n=55/57). in one case, the two rapid tests detected igg that were not detected by abbott (index=0.94); this sample was collected between days 7 and 9 after symptoms onset. in the second case, igg were detected in the grey zone of abbott (index=1.45) but not by ng-test®. this latter sample was collected between days 10 and 14 after symptoms onset, and igm were positive with the two rapid tests.
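for binary results, the cohen kappa statistic behind these agreement figures can be computed directly; the two 0/1 result vectors below are synthetic, not the study data:

```python
# Cohen's kappa: agreement between two tests, corrected for chance agreement.
def cohen_kappa(a, b):
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n                  # marginal positive rates
    p_chance = pa * pb + (1 - pa) * (1 - pb)         # agreement expected by chance
    return (p_observed - p_chance) / (1 - p_chance)

test_a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
test_b = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
print(cohen_kappa(test_a, test_b))  # 0.8: strong agreement beyond chance
```

kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which is why it is preferred over raw percent agreement.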
specificity of covid-presto® test was assessed on the 120 samples described in the methods section. specificity of ng-test® and the abbott assay was assessed on a subgroup of 52 samples among the 120 samples tested with covid-presto® test (table 1). npv was 97.5%, 95.9% and 94.3% for covid-presto®, ng-test® and abbott, respectively. in the present study, we evaluated two different lateral flow tests (covid-presto® and ng-test®) and compared their performances to that of the automated abbott immunoassay using the same sample panel. sensitivity was assessed using a panel of 88 serum samples from covid-19-infected patients (confirmed with a positive pcr), collected between day 4 and day 42 after symptoms onset. sensitivity for igm, among the samples collected before day 9 after symptoms onset, was 67% and 83% for covid-presto® test and ng-test®, respectively. in a recent study, nicol et al. found an igm sensitivity of ng-test of 43.8% for samples collected before day 7 after symptoms onset and of 81.8% among all samples [5]. the excellent sensitivity of covid-presto® test observed in our study confirms the findings of prazuck et al., who reported 100% sensitivity in samples collected more than 15 days after symptoms [8]. in some samples collected before day 10 after symptoms onset, igm and igg antibodies were detected simultaneously. these findings are in line with the antibody kinetics described for igm and igg, also using lateral flow rapid tests, as previously described with other techniques [3]. in the present study, combining igm and igg increased the sensitivity of covid-presto® test from 67% (igm alone) to 83%, highlighting the important added value of interpreting the rapid tests with igm and igg antibodies combined.
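ppv and npv, unlike sensitivity and specificity, depend on prevalence, which is one reason panel composition matters when evaluating such tests; a short illustration via bayes' rule (sensitivity and specificity fixed near the values reported here, the prevalences are arbitrary):

```python
# Bayes' rule: predictive values from sensitivity, specificity and prevalence.
def predictive_values(sens, spec, prev):
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

for prev in (0.05, 0.30, 0.60):
    ppv, npv = predictive_values(sens=0.97, spec=0.96, prev=prev)
    print(f"prevalence={prev:.2f}  ppv={ppv:.3f}  npv={npv:.3f}")
```

even a highly specific test yields a modest ppv when prevalence is low, while npv remains high; the study's high observed ppv and npv therefore also reflect the composition of its panels.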
sensitivity for igg in samples collected later than 10 days after symptoms onset was excellent with the different tests, being equal to 97.1%, 96.2% and 100% for covid-presto®, ng-test®, and abbott, respectively. thus, both rapid tests showed an excellent sensitivity for igg, with a very good agreement with abbott. a previous study assessing abbott test performance showed a sensitivity of 100% for igg in samples collected more than 15 days after symptoms onset and of 69% in samples collected between 9 and 14 days after symptoms onset [6]. in that study, igg sensitivity results were similar using ng-test® [6]. in another study, igg sensitivity of the abbott test was 91.8% for hospitalized patients 15 days after symptoms onset and 95.7% for non-hospitalized patients 20 days after symptoms onset. a limitation of our study could be that most patients of the positive panel presented severe infections, since 74% of them were hospitalized in an infectious disease unit or in intensive care. interestingly, among the 14 out-patients, samples were collected for 9 of them 10 days after symptoms onset, showing positive igm and/or igg in seven cases with covid-presto® test. insufficient serum was available for these patients to also test with ng-test® and abbott. previous studies have reported that the kinetics and intensity of the immune response could differ depending on disease severity [1, 2]; thus, rapid tests will also need to be evaluated in mild and pauci-symptomatic patients. another limitation is the difference in the number of tested samples for the early panel (serum samples collected before 9 days after symptoms onset) between the two rapid tests, which can bias the comparison between these tests for this group. a further limitation is that this evaluation was performed on serum samples rather than capillary blood specimens.
regarding specificity evaluation, a crucial point for rapid tests, we used a large panel of 120 pre-pandemic samples, including 64 samples representative of different profiles that can generate cross-reactivity. in our study, we showed an excellent specificity, above 96% in all cases and equal to 100% for igm with covid-presto® test. the excellent specificity of covid-presto® test was also observed in the study of prazuck et al. [8]. in our study, the only issue regarding specificity is for igm with ng-test®, whose specificity was only 86.5%. however, this low specificity is mainly due to cross-reactivity with sera containing malarial antibodies. in the study of nicol et al., igm specificity with ng-test® was 95.3% [5], higher than in our study; however, their negative panel contained no sera with malarial antibodies. regarding the automated immunoassay, we showed a very good specificity of 96.2% for igg with abbott, confirming previous results of 99.3%, 99.6% and 100% [9]. serum samples containing malarial antibodies are absent or underrepresented in the negative panels of other studies, although they are known to generate possible cross-reactivity. it is very important to include such samples in the negative panel, since malaria is a differential diagnosis in patients returning from malaria-endemic regions with flu-like symptoms. overall, in our study, we observed a very good ppv and npv for both rapid tests. in conclusion, the analytical performances for detection of anti-sars-cov-2 igg antibodies by the two lateral flow rapid tests are very good and quite comparable to those obtained with the automated immunoassay. however, serological tests should be used after day 10 following symptoms onset; before this, rt-pcr is the gold standard test for covid-19 diagnosis. interpretation combining igm and igg increased the sensitivity of the rapid tests.
the presence of isolated igm should be interpreted cautiously due to possible false-positive reactions. finally, rapid tests must be reliably evaluated with adequate, large panels including early-seroconversion and possible cross-reactive samples before their wide use, which is of particular interest in low-resource settings.

references:
- antibody responses to sars-cov-2 in patients with covid-19
- different longitudinal patterns of nucleic acid and serology testing results based on disease severity of covid-19 patients
- interpreting diagnostic tests for sars-cov-2
- inmicovid-19 laboratory team, performance evaluation of abbott architect sars-cov-2 igg immunoassay in comparison with indirect immunofluorescence and virus microneutralization test
- assessment of sars-cov-2 serological tests for the diagnosis of covid-19 through the evaluation of three immunoassays: two automated immunoassays (euroimmun and abbott) and one rapid lateral flow immunoassay (ng biotech)
- immunoassays in comparison with microneutralisation
- clinical evaluation of serological igg antibody response on the abbott architect sars-cov-2 infection
- evaluation of performance of two sars-cov-2 rapid whole-blood finger-stick igm-igg combined antibody tests
- saint-louis core (covid research) group, evaluation of covid-19 igg/igm rapid test from orient gene biotech

key: cord-017359-zr0bo9el authors: pfannschmidt, karlson; hüllermeier, eyke; held, susanne; neiger, reto title: evaluating tests in medical diagnosis: combining machine learning with game-theoretical concepts date: 2016-05-10 journal: information processing and management of uncertainty in knowledge-based systems doi: 10.1007/978-3-319-40596-4_38 sha: doc_id: 17359 cord_uid: zr0bo9el in medical diagnosis, information about the health state of a patient can often be obtained through different tests, which may perhaps be combined into an overall decision rule. practically, this leads to several important questions. 
for example, which test or which subset of tests should be selected, taking into account the effectiveness of individual tests, synergies and redundancies between them, as well as their cost? and how should an optimal decision rule be produced on the basis of the given data, which typically consists of test results for patients with or without a confirmed health condition? to address questions of this kind, we develop an approach that combines (semi-supervised) machine learning methodology with concepts from (cooperative) game theory. roughly speaking, while the former is responsible for optimally combining single tests into decision rules, the latter is used to judge the influence and importance of individual tests as well as the interaction between them. our approach is motivated and illustrated by a concrete case study in veterinary medicine, namely the diagnosis of a disease in cats called feline infectious peritonitis. different types of tests, such as measuring serum antibody concentrations, are commonly used in medical diagnostics in order to reveal the health condition of an individual. the effectiveness of a single test is typically determined by correlating the test outcome with the true condition. moreover, classical statistical hypothesis testing can be used to compare different test procedures in terms of their effectiveness. in this paper, we tackle the problem of evaluating or selecting a test procedure from a slightly different perspective, using methods of (semi-)supervised machine learning. roughly speaking, the idea is that, by learning a model in which various candidate tests play the role of predictor variables, information about the usefulness of individual tests as well as their combination is provided by properties of that model. 
an approach of that kind has at least two important advantages:
- first, it not only allows for judging the usefulness of single tests but also of combined tests, i.e., the combination of different tests into one overall (diagnostic) decision rule. thus, it informs about possible synergies (as well as redundancies) between individual tests and the potential to improve diagnostic accuracy thanks to a suitable combination of these tests.
- second, going beyond the standard setting of supervised learning, a machine learning approach suggests various ways of improving the selection of tests by taking advantage of additional sources of information. an important special case is the use of semi-supervised learning to exploit "unlabeled" data coming from individuals for which tests have been made but the true health condition is unknown. this situation is highly relevant in medical practice, because tests can often be conducted quite easily, whereas determining the true health condition is very difficult or expensive.
our approach is motivated by a concrete case study in veterinary medicine, namely the diagnosis of a disease in cats called feline infectious peritonitis (fip). complete certainty about whether or not a cat is fip-positive, and eventually will die from the disease, requires a necropsy [1, 10] ; unfortunately, no test performed on a living cat has 100 % sensitivity or 100 % specificity. consequently, while different tests can be applied to cats quite easily, "labeling" a cat in the sense of supervised learning is expensive, difficult and time-consuming. in addition to the use of (semi-supervised) machine learning methodology in medical diagnosis, we propose a game-theoretical approach for measuring the usefulness of individual tests as well as model-based combinations of such tests. 
roughly speaking, the idea is to consider a combination of tests as a "coalition" in the sense of cooperative game theory, and the "payoff" of the coalition as the diagnostic accuracy achieved by the test combination. this approach will be detailed in the next section, prior to elaborating more closely on our case study in sect. 3, presenting experimental results in sect. 4 and concluding the paper in sect. 5.

suppose a set of tests x_1, ..., x_k to be available. we consider the outcome of each test as a random variable x_k: ω → r, where ω is the population of individuals to which the test can be applied. jointly, the k tests thus define a random vector x = (x_1, ..., x_k) with values in the instance space x = r^k. the health state is a dichotomous variable y ∈ y = {−1, +1}. typically, each test is a positive indicator in the sense that p(y = +1 | x_k) increases with x_k, i.e., the larger x_k, the larger the probability of the positive class. using machine learning terminology, each test corresponds to a feature or predictor variable. moreover, x is the instance space, each x ∈ x is an instance, and y is the (binary) output or response variable. if a diagnostic decision ŷ ∈ {−1, +1} is not necessarily based on a single test x_k alone, but possibly uses a combination of several tests, a first question concerns the way in which such a combination is realized. from a machine learning point of view, this question is related to the choice of an underlying model class (hypothesis space) h_j, where j ≤ k is the number of tests included in the decision rule. formally, we specify a combined test in terms of the subset a ⊆ [k] = {1, ..., k} of tests it comprises. the model class h could be defined, for example, as the class of linear threshold functions of the form

h_a(x) = ⟦ w_1 x_σ(1) + ... + w_j x_σ(j) ≥ t ⟧,    (1)

where w_1, ..., w_j, t ∈ r_+, ⟦·⟧ maps true predicates to +1 and false predicates to −1, and σ(j) is the j-th test included in the combination. the ideal decision rule h*_a in this class is the one that minimizes the loss in expectation. 
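a decision rule of the form (1) is just a thresholded weighted sum of the selected test outcomes. a minimal sketch; the weights and threshold below are toy values for illustration, not fitted coefficients:

```python
def linear_threshold_rule(weights, threshold):
    """Build a decision rule of the form (1): predict +1 if the weighted sum
    of test outcomes reaches the threshold t, else -1."""
    def h(x):
        score = sum(w * xi for w, xi in zip(weights, x))
        return 1 if score >= threshold else -1
    return h

# toy weights for a two-test combination (hypothetical values):
h = linear_threshold_rule([0.5, 2.0], threshold=1.0)
```

for example, h((1.0, 0.5)) scores 0.5*1.0 + 2.0*0.5 = 1.5 ≥ 1.0 and hence predicts +1.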
we denote the expected loss of this model, which corresponds to the bayes predictor in h_|a|, by

e*(a) = ∫ l(y, h*_a(x)) dp(x, y).    (2)

in practice, of course, neither the bayes predictor h*_a nor the ideal generalization performance e*(a) is known. instead, we only assume a data set d = d_l ∪ d_u to be given, which consists of a set of labeled instances d_l = {(x_i, y_i)}_{i=1}^{l} ⊂ x × y and possibly another set of unlabeled instances (test results without ground truth) d_u = {x_j}_{j=1}^{u} ⊂ x. from a machine learning point of view, it is then natural to estimate the generalization performance on the basis of d for each a ⊆ [k]. to this end, models (1) can be fitted and their generalization performance can be estimated, for example, using cross-validation techniques or the bootstrap. more specifically, what can be estimated in this way is the generalization performance of a model that is trained on a combination a and data in the form of l labeled and u unlabeled examples. therefore, we shall denote a corresponding estimate by ê(a, l, u) or simply ê(a) (assuming the underlying data to be given). needless to say, the estimates ê(a) thus obtained are not necessarily monotone in the sense that ê(b) ≤ ê(a) for a ⊆ b. in fact, while e*(a) is the generalization performance of the bayes predictor, i.e., the model that is obtained in the limit of an infinite sample size (provided the underlying learner is consistent), the estimates ê(a) are obtained from models trained on a finite (and possibly small) data set. therefore, practical problems such as overfitting become an issue, i.e., including additional tests may deteriorate instead of improve generalization performance. how can the ideal generalization performances be estimated? starting with the finite-sample estimates ê(a), our proposal is to correct these estimates so as to assure monotonicity. in fact, monotonicity is the main difference between the ideal and finite-sample scores. 
apart from that, the ideal scores e*(a) (3) should not differ too much from the estimates ê(a) (4), i.e., e*(a) ≈ ê(a), at least if the training data is not too small. these considerations suggest the following estimation principle: find a set of values e*(a) that satisfy monotonicity while remaining as close as possible to the corresponding scores ê(a). this principle can be formalized as an optimization problem of the following kind:

minimize Σ_{a ⊆ [k]} ( e*(a) − ê(a) )²  subject to  e*(b) ≤ e*(a) for all a ⊆ b.

the above problem can be tackled by means of methods for isotonic regression. more specifically, since the inclusion relation on subsets induces a partial order on 2^[k], methods for isotonic regression on partially ordered structures are needed [3, 14]. consider the set function ν(a) = 1 − e*(a); obviously, ν is a monotone measure (of the usefulness of combined tests). moreover, this measure can be normalized by setting

ν*(a) = ( ν(a) − ν(∅) ) / ( ν([k]) − ν(∅) ),

where ν(∅) is the performance of the best (default) decision rule that does not use any test, i.e., which either always predicts ŷ = +1 or always ŷ = −1. the measure ν*(·) thus defined is a normalized, monotone (but not necessarily additive) set function, referred to as a fuzzy measure or capacity in the literature [5]. for each combined test a, ν*(a) is a reasonable measure of the usefulness of this test. in a similar way, a measure ν• can be defined on the basis of the finite-sample scores (4), that is, by normalizing ν•(a) = ( 1 − ê(a) − ν_min ) / ( ν_max − ν_min ), where ν_min = 1 − max_{b ⊆ [k]} ê(b) and ν_max = 1 − min_{b ⊆ [k]} ê(b). note, however, that this measure is not necessarily monotone. which of the two measures is more meaningful, ν* or ν•? the answer to this question depends on practical considerations and what the measure is actually supposed to capture. when one is interested in the potential asymptotic usefulness of a test combination, ν* is the right measure. otherwise, if a model induced from a concrete set of training data is supposed to be put into (medical) practice, ν• is arguably more relevant. 
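the least-squares isotonic fit on the subset lattice requires the algorithms of [3, 14]. as a much cruder sketch (an assumption of this illustration, not the paper's method), one can enforce monotonicity by taking the upper envelope ẽ(a) = max over supersets b ⊇ a of ê(b); this guarantees ẽ(b) ≤ ẽ(a) for a ⊆ b, though it is not the l2-optimal projection:

```python
def monotone_envelope(err):
    """err maps frozenset(subset of test indices) -> finite-sample error
    estimate ê(a). Return corrected errors that satisfy e(B) <= e(A) for
    A ⊆ B by taking the maximum error over all supersets of each subset.
    Note: a simple monotone envelope, NOT the least-squares isotonic fit."""
    return {a: max(e for b, e in err.items() if a <= b)  # a <= b: subset test
            for a in err}
```

applying it to toy estimates where a superset accidentally scores worse than its subset pushes the subset's error up to restore monotonicity.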
from the point of view of (cooperative) game theory, each (test) combination a ⊆ [k] can be seen as a coalition and ν ∈ {ν*, ν•} as the characteristic function, i.e., ν(a) is the payoff achieved by the coalition a. thanks to this view, we can take advantage of various established game-theoretical concepts for analyzing the importance of individual players, which correspond to tests in our case, as well as the interaction between them. in particular, the shapley value, also called the importance index, is defined as follows [17] (writing n = k for the number of tests):

ϕ(k) = Σ_{a ⊆ [k]\{k}} ( |a|! (n − |a| − 1)! / n! ) ( ν(a ∪ {k}) − ν(a) ).

the shapley value of ν is the vector ϕ(ν) = (ϕ(1), ..., ϕ(k)). for monotone measures (such as ν = ν*), one can show that 0 ≤ ϕ(k) ≤ 1 and Σ_{k=1}^{n} ϕ(k) = 1; thus, ϕ(k) is a measure of the relative importance of the test x_k. the interaction index, as proposed by [13], is defined as follows:

i_{i,j} = Σ_{a ⊆ [k]\{i,j}} ( |a|! (n − |a| − 2)! / (n − 1)! ) ( ν(a ∪ {i, j}) − ν(a ∪ {i}) − ν(a ∪ {j}) + ν(a) ).

this index ranges between −1 and +1 and indicates a positive (negative) interaction between the tests x_i and x_j if i_{i,j} > 0 (i_{i,j} < 0). it is worth mentioning that the approach put forward in this section is quite in line with the idea of shapley value regression [11], which makes use of the shapley value in order to quantify the contribution of predictor variables in (linear) regression analysis (quantifying the value of a set of variables in terms of the r² measure on the training data).

feline infectious peritonitis (fip) is a disease with an affinity for young cats and a predisposition to involve cats living in larger groups. since it exhibits typical physical examination and clinical laboratory findings, it appears to be easy to diagnose. however, while a presumptive diagnosis is quickly established, a definite diagnosis is difficult to impossible to obtain without gross and histopathological evaluation including immunohistochemistry [1, 10]. the seroprevalence is high, especially in catteries, where up to 90 % of the cats are positive [2], but up to 50 % of cats living in single-cat households also have coronavirus-specific antibodies [4]. 
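with only k = 7 tests (128 coalitions), both indices can be computed by brute force from any set function stored as a dictionary over coalitions. a generic sketch of the standard formulas, not the authors' implementation:

```python
from itertools import chain, combinations
from math import factorial

def subsets(items):
    """All subsets of a list of players, as tuples."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def shapley(v, n):
    """Shapley values of all n players for a set function v, where v maps
    frozensets of player indices to payoffs (here: usefulness of a test combo)."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for a in subsets(others):
            a = frozenset(a)
            w = factorial(len(a)) * factorial(n - len(a) - 1) / factorial(n)
            total += w * (v[a | {i}] - v[a])  # weighted marginal contribution
        phi.append(total)
    return phi

def interaction(v, n, i, j):
    """Pairwise interaction index of players i and j (Murofushi-Soneda form)."""
    others = [m for m in range(n) if m not in (i, j)]
    total = 0.0
    for a in subsets(others):
        a = frozenset(a)
        w = factorial(len(a)) * factorial(n - len(a) - 2) / factorial(n - 1)
        total += w * (v[a | {i, j}] - v[a | {i}] - v[a | {j}] + v[a])
    return total
```

for an additive (interaction-free) two-player game with v({0}) = 0.3 and v({1}) = 0.7, the shapley values are exactly 0.3 and 0.7 and the interaction index is 0.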
of these, 5-10 % will develop the deadly form of fip. a characteristic symptom of fip is body cavity effusion, which also appears in other diseases [8]. several treatment options exist for some of these diseases, while fip is deadly and no reliably effective therapy is known so far [16]. therefore, it is important to diagnose the correct disease early. several diagnostic tests for fip are available, whose sensitivity, specificity, and positive and negative predictive values vary between studies, presumably because different forms of fip (effusive and dry) were investigated and because various clinical signs, geographic locations, years of investigation, prevalences and combinations of tests were used [4, 6, 7, 9, 15, 18]. in studies so far, no cat had all available tests performed. the data underlying our study includes the following diagnostic tests:
- albumin to globulin ratio, plasma (x_1) and effusion (x_2)
- rivalta test (x_3)
- presence of antibodies against feline coronavirus (fcov, x_4)
- reverse transcriptase nested polymerase chain reaction (rt-npcr) to detect fcov-rna in edta-blood (x_5) and in the effusion (x_6)
- immunofluorescence staining (ifa) of fcov antigen in macrophages in the effusion (x_7)
our dataset consists of 100 cats in total. for 29 of these cats, a necropsy was performed to establish the gold-standard diagnosis; 11 of the 29 cats were diagnosed with feline infectious peritonitis (fip). additionally, the above 7 diagnostic tests were performed on all cats (i.e., k = 7, l = 29 and u = 71). to estimate the generalization accuracy (in terms of the simple 0/1 loss function) of each of the 2^7 = 128 combined diagnostic tests, we employ a semi-supervised classification technique called maximum contrastive pessimistic likelihood estimation (mcpl) [12]. logistic regression with l_2 penalization is used as the base learner in mcpl, i.e., individual tests are combined using a linear model of the form (1). 
estimates ê(a) of the (finite-sample) classification errors are obtained as follows: we resample the set of 29 labelled cats and split the resulting sample into 16 training and 13 test examples. the remaining 71 cats without label information are added to the training set. this procedure is repeated 501 times for each of the 128 combinations of tests, and the results are averaged. to obtain estimates e*(a) of the ideal generalization performances, the finite-sample estimates are subsequently corrected using isotonic regression [3, 14] as described in sect. 2.4. to further illustrate the importance of the diagnostic test rt-npcr, fig. 2 shows the mean validated classification accuracy for all 128 test combinations. the 80 % empirical percentiles are indicated by the vertical lines, and the subsets are sorted in decreasing order of their mean validated accuracy. moreover, the results for those subsets including rt-npcr (measured in blood) are highlighted in blue. evidently, the concentration of subsets containing rt-npcr (blood) is systematically higher to the left of the plot, which confirms that the inclusion of this test improves diagnostic accuracy. the effect of isotonic regression on the finite-sample estimates is shown in fig. 3. here, each blue dot corresponds to an estimate ê(a) for a particular subset a of diagnostic tests. since partial monotonicity, which is assured by isotonic regression, cannot be visualized in a two-dimensional plot, the data points are sorted by their corrected classification accuracy (and ties are broken at random). the green line shows the isotonic regression fit. the corrected performance estimates ν*(a) can subsequently be used to calculate the shapley values for each diagnostic test. the results are shown in fig. 4. due to the monotonicity of ν*, all values are now positive. again, the rt-npcr tests achieve the highest shapley values, but fcov antibody titer and ifa (effusion) obtain values > 0.15, too. 
note that the relative order of the rt-npcr tests changed from the one in fig. 1, probably because their accuracies are very similar and the bootstrap validation is random in nature. figure 5 shows the accuracy estimates for all subsets. the dots indicate the corrected accuracies ν*(a) and are used to sort subsets in decreasing order. as in fig. 2, the subsets containing rt-npcr (blood) can mostly be found on the left side of the plot; this trend is now even more pronounced. an important question for a veterinary physician is which combination a of tests to perform, taking into account both diagnostic accuracy and effort. figure 6 shows the corrected accuracies ν*(a) (green dots) of all subsets of tests and their combined monetary cost in euro. the pareto set, consisting of those combinations that are not outperformed by any other combination in terms of both accuracy and cost at the same time, is indicated as a blue line. from a practical point of view, the result suggests using a single diagnostic test, namely rt-npcr (blood or effusion), because the inclusion of more tests yields only minor improvements. this is confirmed by the pairwise interaction indices shown for both measures ν• and ν* in table 1. all these indices are negative, suggesting that the tests are more redundant than complementary. note that, once a decision in favor of using a single test is made, the shapley value, as a measure of the average improvement achieved by adding a test, is no longer the best indicator of the usefulness of a test. instead, a selection should be made based on the tests' individual performance. with a validated accuracy of 87 %, rt-npcr (effusion) appears to be the best choice in this regard. 
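the pareto set over accuracy and cost can be found with a direct dominance check. a sketch; the 0.87 accuracy for rt-npcr (effusion) comes from the text, but the other accuracies and all euro costs below are invented for illustration:

```python
def pareto_front(options):
    """options: list of (name, accuracy, cost). Return names of options not
    dominated by any other option, i.e., no other option is at least as
    accurate AND at least as cheap while strictly better in one criterion."""
    front = []
    for name, acc, cost in options:
        dominated = any(
            (a2 >= acc and c2 <= cost) and (a2 > acc or c2 < cost)
            for _, a2, c2 in options
        )
        if not dominated:
            front.append(name)
    return front

# hypothetical cost/accuracy figures, not the study's table:
combos = [("rt-npcr (effusion)", 0.87, 40.0),
          ("rt-npcr (blood)", 0.85, 40.0),
          ("rivalta", 0.75, 5.0),
          ("all seven tests", 0.88, 150.0)]
```

here rt-npcr (blood) is dominated by rt-npcr (effusion) (same cost, higher accuracy) and drops off the front, mirroring the kind of trade-off shown in fig. 6.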
in this paper, we proposed a method for measuring the importance and usefulness of predictor variables in (semi-)supervised machine learning, which makes use of concepts from cooperative game theory: subsets of variables are considered as coalitions, and their predictive performance plays the role of the payoff. although our approach is motivated by a concrete application in veterinary medicine, namely the diagnosis of feline infectious peritonitis in cats, it is completely general and can obviously be used for other learning problems as well. for the case study just mentioned, our method produces results that appear to be plausible and agree with the medical experts' experience. roughly speaking, there are two strong diagnostic tests that are significantly more accurate than the others; practically, it suffices to use one of them, since a combination with other tests yields only minor improvements. there are several directions for future work. for example, the principle we proposed in sect. 2.4 for inducing ideal generalization performances e*(a) from finite-sample estimates ê(a) is clearly plausible and, moreover, seems to be able to calibrate the original estimates thanks to an ensemble effect. nevertheless, it calls for a more thorough analysis and theoretical justification. 
references:
- recommendations from workshops of the second international feline coronavirus/feline infectious peritonitis symposium
- prevalence of feline coronavirus types i and ii in cats with histopathologically verified feline infectious peritonitis
- structure algorithms for partially ordered isotonic regression
- performances of different diagnostic tests for feline infectious peritonitis in challenging clinical cases
- fundamentals of uncertainty calculi with applications to fuzzy inference
- comparison of different tests to diagnose feline infectious peritonitis
- using direct immunofluorescence to detect coronaviruses in peritoneal and pleural effusions
- sensitivity and specificity of cytologic evaluation in the diagnosis of neoplasia in body fluids from dogs and cats
- positive predictive value of albumin: globulin ratio for feline infectious peritonitis in a mid-western referral hospital population
- a comparison of lymphatic tissues from cats with spontaneous feline infectious peritonitis (fip), cats with fip virus infection but no fip, and cats with no infection
- analysis of regression in game theory approach
- contrastive pessimistic likelihood estimation for semi-supervised classification
- techniques for reading fuzzy measures (iii): interaction index
- algorithms for a class of isotonic regression problems
- using direct immunofluorescence to detect coronaviruses in peritoneal and pleural effusions
- effect of feline interferon-omega on the survival time and quality of life of cats with feline infectious peritonitis
- a value for n-person games
- detection of ascitic feline coronavirus rna from cats with clinically suspected feline infectious peritonitis

key: cord-001253-3jnkki5z authors: mohammad, fahim; theisen-toupal, jesse c.; arnaout, ramy title: advantages and limitations of anticipating laboratory test results from regression- and tree-based rules derived from electronic health-record data date: 2014-04-14 journal: plos one doi: 10.1371/journal.pone.0092199 sha: doc_id: 1253 
cord_uid: 3jnkki5z laboratory testing is the single highest-volume medical activity, making it useful to ask how well one can anticipate whether a given test result will be high, low, or within the reference interval ("normal"). we analyzed 10 years of electronic health records (a total of 69.4 million blood tests) to see how well standard rule-mining techniques can anticipate test results based on patient age and gender, recent diagnoses, and recent laboratory test results. we evaluated rules according to their positive and negative predictive value (ppv and npv) and area under the receiver-operator characteristic curve (roc auc). using a stringent cutoff of ppv and/or npv ≥ 0.95, standard techniques yield few rules for sendout tests but several for in-house tests, mostly for repeat laboratory tests that are part of the complete blood count and basic metabolic panel. most rules were clinically and pathophysiologically plausible, and several seemed clinically useful for informing the pre-test probability of a given result. but overall, rules were unlikely to be able to function as a general substitute for actually ordering a test. improving laboratory utilization will likely require different input data and/or alternative methods. laboratory testing is the single highest-volume medical activity [1]. its main role is to help adjust the level of clinical suspicion of a diagnosis, to help rule it in or out; it is also used for disease monitoring. in practice, the level of clinical suspicion and the probability of a given test result can be correlated: the higher the suspicion, the more likely it is that the result will confirm the diagnosis. information that feeds into the clinical suspicion (including the age and gender of the patient, prior diagnoses, and prior laboratory results) thus may also influence the test result. 
in principle, this relationship can be used to improve laboratory testing by making it possible to estimate the pre-test probability of getting a given test result before ordering the test and, in the limit, to reduce test utilization without adversely affecting patient outcomes. indeed, ordering fewer tests, where warranted, might benefit outcomes by sparing the patient the burden of following up false positives (or negatives) [2] [3] [4]. conceptually, the relationship between clinical suspicion and pre-test probability is used routinely to help set guidelines regarding when and when not to order a given test. for example, the pre-test probability of lyme serology being positive given a targetoid rash is high enough that, given the test's sensitivity and specificity, ordering the test is contraindicated [5]. because of the large number of tests and clinical scenarios that exist, and in light of evidence from across medicine that utilization of laboratory testing can be improved [1, 6], it is of interest to understand whether analyzing large clinical databases through the robust application of standard statistical techniques can turn this relationship into actionable decision-support rules, or whether progress toward better laboratory utilization might instead lie elsewhere. we sought to test the limits of rule-mining for this purpose. to what extent can laboratory results be anticipated computationally based on data available to the clinician, or a clinical decision support system, at the time of the order? we addressed this question using generalized linear modeling (glm), a generalized form of linear regression [7], and, for comparison, classification trees (ct) [8, 9]. 
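the link between clinical suspicion (pre-test probability) and a test's sensitivity and specificity is bayes' rule. a minimal post-test-probability sketch of that textbook relationship (not code from this study):

```python
def post_test_probability(pre, sensitivity, specificity, positive=True):
    """Update a pre-test disease probability given a test result via Bayes'
    rule: P(D|+) = sens*pre / (sens*pre + (1-spec)*(1-pre)), and analogously
    for a negative result."""
    if positive:
        num = sensitivity * pre
        den = num + (1.0 - specificity) * (1.0 - pre)
    else:
        num = (1.0 - sensitivity) * pre
        den = num + specificity * (1.0 - pre)
    return num / den
```

with pre = 0.5 and a 90 %-sensitive, 90 %-specific test, a positive result raises the probability to 0.90 and a negative result lowers it to 0.10, illustrating why a high pre-test probability can make confirmatory testing superfluous.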
we used four types of input (age, gender, diagnoses as three-digit icd-9 codes, and results of laboratory tests on blood samples added to the record in the seven days before a given test was ordered) to build simple, robust models for whether the result of a test would be within the reference interval ("normal") or outside of it in a given direction ("abnormal"), treating high and low results separately. we based our study on 10 years of records from the beth israel deaconess medical center (bidmc), a 585-bed tertiary care center in boston, ma. we first anonymized records and reconciled test names (work approved by the bidmc committee on clinical investigation's institutional review board for research involving human subjects, protocol 2012-p-000229/01). informed consent was not obtained because patient records/information were anonymized prior to analysis. each blood test (the test of interest), over 69.4 million in all, was marked as an in-house test (performed at the hospital) or a sendout (performed off-site). for each test, we compiled a list of all instances in which the test was ordered and performed. for each instance, we recorded the patient's age, gender, and any diagnoses or other blood-test results from the seven days prior to the result of interest. when a test was ordered multiple times within a seven-day period, we considered only the most recent one (i.e., the one closest in time to the sendout order) as input data. for relevance, we considered only those tests that were ordered at least 1,000 times over the entire 10-year period, for an average of at least twice a week. we randomly divided the resulting instances into a training set and a test set (see below for details). all tests had either two (reference vs. abnormal) or three (low, normal, or high) possible response values. for tests with three values, we performed two separate rule searches: one for high vs. not high (i.e., grouping normal and low) and one for low vs. not low. 
we sought to identify simple, robust subsets of our input data to evaluate as linear predictors ("rules") for whether a test result would be normal or abnormal. to do this, we used glm twice: first to find rules based on a particular training set, and a second time to find rules based on just those items that were common to rules found from a number of different training sets (to avoid overfitting any one training set). we did this as follows, for each test of interest (the response variable or "response"). we first excluded those input variables ("features") that appeared with fewer than 5 percent of the response. we then temporarily set aside the most common features (those of the complete blood count and basic metabolic panel) as well as age and gender, and searched the remaining items for frequent featuresets (using the apriori algorithm [10, 11]). we then added back to each resulting featureset the common features, age, and gender (which are frequent items by definition, since they appear in all instances) with a support threshold of 0.60 (i.e., itemsets for which all items were present in at least 60 percent of instances of the response variable). this set-aside/add-back approach sped the search for featuresets without loss of comprehensiveness. we used each featureset to create a model for the test of interest using r's glm function (with the family argument set to "binomial"). we used backward feature elimination to remove non-significant features one at a time from the featureset (using a significance threshold p-value of 1×10^-5; see below) until the only features that remained were all significantly correlated with the response. we also removed features that are used to calculate the result for the test of interest (e.g., cd4 and cd8 count for t-cell count, which is the sum of cd4 and cd8) for all but proof-of-principle runs. 
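a toy version of apriori-style frequent-itemset search can make the support-threshold idea concrete. this exhaustive sketch checks every candidate at each level and stops when a level yields nothing (the apriori property: no superset of an infrequent set can be frequent); real implementations such as the one cited in [10, 11] generate candidates level by level instead:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """transactions: list of sets of items. Return {itemset tuple: support}
    for every itemset whose support (fraction of transactions containing all
    its items) meets min_support. Exhaustive toy sketch, exponential in the
    number of distinct items."""
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    frequent = {}
    for r in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, r):
            support = sum(1 for t in transactions if set(combo) <= t) / n
            if support >= min_support:
                frequent[combo] = support
                found_any = True
        if not found_any:
            break  # apriori property: larger itemsets cannot be frequent
    return frequent
```

for instance, in four transactions where "a" and "b" co-occur twice, the pair ("a", "b") survives a 0.5 support threshold while any itemset containing the rare item "c" does not.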
the significance threshold was corrected for multiple comparisons by dividing the traditional threshold of p = 0.05 by the product of the total number of tests considered and the average number of rules generated per test. the combined total number of features (in-house tests plus sendout tests plus diagnoses) was 170 + 81 + 434 = 685. the average number of rules after the first application of glm to each test was 6. thus our threshold p-value was 0.05/(6 × 685) = 1.2×10^-5, which we rounded to 1×10^-5. we constructed a model for the result by running glm a second time on a training set (see below) based on this reduced featureset. of note, there was no guarantee that any feature would be significantly correlated (p ≤ 1×10^-5) or that there would be enough instances (glm's threshold was 200) of the test appearing with all features of even the reduced featureset for glm to produce a model. when feature elimination resulted in no significant features or too few instances, no model was constructed. we scored models using ppv, npv, and roc auc. we were interested only in models that were robust to the size and choice of training set. therefore we repeated the above process for a range of training set-test set splits (80-20, 70-30, 60-40, 50-50, 40-60, 30-70, and 20-80 percent). for each split, we ran the above process 10 times and counted the number of rules with auc ≥ 0.75. we settled on a 60-40 split for downstream analyses, as this split generated a total number of rules comparable to the 70-30 and 80-20 splits but with less training data (fig. 1). finally, for each test of interest, we selected features that appeared in a strict majority of rules for that test and reran glm using only those features. this made rules both simpler and more robust by removing features whose presence was contingent on a particular choice of training or test set. 
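the backward-elimination loop with this threshold can be sketched generically, with the model-refitting step abstracted into a caller-supplied p-value function (the paper refits r's glm at each step; `pvalues_fn` below is a hypothetical stand-in for that refit):

```python
def backward_eliminate(features, pvalues_fn, alpha=1e-5):
    """Remove the least significant feature one at a time until every
    remaining feature has p <= alpha. pvalues_fn(features) must refit the
    model on the given features and return a dict feature -> p-value."""
    feats = list(features)
    while feats:
        pvals = pvalues_fn(feats)
        worst = max(feats, key=lambda f: pvals[f])
        if pvals[worst] <= alpha:
            break  # all remaining features are significant; stop
        feats.remove(worst)
    return feats
```

with a stand-in p-value source assigning 0.2 to "gender" and <1×10^-5 to the others, only "gender" is eliminated.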
for each of the in-house and sendout tests, we used cart, implemented as rpart in r (rpart v3.1-50; cran.r-project.org/package=rpart), to predict the response from all input features, using 60:40 training:test-set splits. we fixed some of the metrics (see below) used in building the final tree. cart grows a classification tree in two stages. in stage one, a tree is grown by finding the feature that best splits the data into two groups. a split is made only if the overall "impurity," the number of outcomes different from the majority (e.g., a "low" response alongside many "normal" responses), decreases by more than some threshold (the "complexity parameter"; 0.01). then, in top-down fashion, these two subgroups are further divided recursively until the subgroups reach a minimum size (minsplit = 20 records) or until no further improvement can be made. the resulting tree may overfit the training data. to avoid this, cross-validation (xval = 10; 10-fold cross-validation) was used in the second stage to prune the tree. we fixed the maximum depth (maxdepth) of the tree, i.e., the maximum number of branchings from stem to leaf, at 20. the final models were tested on the test data and performance statistics were computed. we repeated model-building 10 times for each test and summarized the statistics. data processing was performed in python (enthought canopy, python version 2.7.3); r (version 2.15.3) was used for statistical analysis and report generation. 
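stage one of cart repeatedly picks the split that most reduces impurity. a single-feature sketch using gini impurity (a common cart criterion; this illustrates the idea and is not rpart's implementation):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class
    proportions (0 for a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """For one numeric feature, return the threshold minimizing the
    size-weighted Gini impurity of the two resulting groups, plus that score."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best
```

on perfectly separable data such as xs = [1, 2, 3, 4] with ys = [0, 0, 1, 1], the split at 2 yields two pure groups with weighted impurity 0.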
to determine how well sendout and in-house test results can be anticipated based on basic information available in the medical record, we used two independent methods, generalized linear modeling (glm) and classification and regression trees (cart), to build simple, robust test-result predictors and then evaluated the performance of these predictors according to the standard clinical metrics of positive predictive value (ppv) and negative predictive value (npv), as well as sensitivity and specificity via the area under the receiver operating characteristic (roc) curve (auc). as proof of principle for glm, we first tested it on the anion gap, a result calculated by subtracting the serum concentrations of the anions chloride and bicarbonate from those of the cations sodium and potassium, and confirmed that our methods found a rule for elevated anion gap based on these four items. we next applied glm to 81 sendout tests ordered regularly at our hospital. glm generated rules for just 11 of these tests. for the remaining tests, either no recent diagnosis or in-house test result (or age or gender) was sufficiently correlated with the sendout test result, or there were not enough instances in which correlated items appeared with the result, to generate a rule. only two tests, for high corticotropin (acth) and for low ceruloplasmin, had npv ≥ 0.95. of these, ceruloplasmin had a ppv ≥ 0.94. the mean auc for all rules was 0.69, with models for only three tests having an average auc ≥ 0.75 over 10 repeat runs. removal of features that did not appear in a majority of rules had essentially no effect on these aucs (difference in mean auc ≤ 0.02). cart generated rules for 60 tests. however, the auc for most of these rules was low, with only five tests having auc ≥ 0.75: free t3, alpha-macroglobulin, ca 27-29, hyaluronic acid, and alpha fetoprotein (auc 0.75-0.79). we next applied glm to in-house tests. a total of 170 in-house tests were analyzed. 
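the anion-gap proof of principle can be made concrete: because the gap is an algebraic combination of four analytes, a rule for an elevated gap should be recoverable from those four items alone. the formula below follows the definition in the text; the cutoff of 16 mmol/l and the example values are illustrative assumptions, not from the paper.

```python
# worked example of the anion-gap sanity check described above.
def anion_gap(na, k, cl, hco3):
    """anion gap (mmol/L) including potassium: (Na + K) - (Cl + HCO3)."""
    return (na + k) - (cl + hco3)

def elevated(gap, cutoff=16.0):
    # cutoff is an assumed illustrative reference limit, not from the paper
    return gap > cutoff

gap = anion_gap(na=140, k=4.0, cl=100, hco3=24)   # 144 - 124 = 20 mmol/L
```

any model that recovers this dependence on sodium, potassium, chloride, and bicarbonate has, in effect, rediscovered the calculation.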
a number of rules exhibited a high ppv (the probability of seeing an abnormal value given a prediction of an abnormal value by the rule) or npv (the probability of seeing a normal value given a prediction of a normal value). these were mostly components of the complete blood count (cbc) and metabolic panels. interestingly, the predictive power of these rules was almost exclusively based on a previous measurement of the test in question: in other words, the best rules were for repeat tests, and the best predictor of a result being normal or abnormal was whether it had been normal or abnormal within the previous seven days. for example, the npv for a low red blood cell count was 0.95 (with ppv = 0.75), with a rule that depended most on the previous red blood cell count also having been low, and the ppv for high total calcium was 0.98 (npv = 0.76) and based exclusively on the previous total calcium having been high. for comparison, we applied cart to in-house tests, again including in the input data the most recent result for that test if performed within a week of the order. again, a number of rules exhibited a high ppv (≥ 0.95), and again these were often tests of the cbc and metabolic panels, with rules based almost exclusively on a previous abnormal value. examples included low white blood cell count (wbc; ppv = 0.97, npv = 0.79), platelet count (0.95, 0.88), and serum sodium (0.96, 0.65), and high total calcium (0.99, 0.67), mean corpuscular volume (0.98, 0.84), and iron (0.97, 0.56), all of which were determined almost exclusively from the previous value being low or high (table 1). overall, there was good agreement in ppv between glm and cart for tests for which both methods found rules, but cart outperformed glm noticeably in npv (fig. 2). the growing availability of large clinical databases has raised interest in the possibility of using systematic rule-mining for clinical decision support [12][13][14][15]. 
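the two metrics used to score these rules reduce to simple ratios over the confusion counts. the functions below follow the definitions given above; the example counts are invented for illustration and are not the paper's data.

```python
# sketch of the ppv/npv calculations used to score rules above.
def ppv(tp, fp):
    """positive predictive value: P(abnormal | predicted abnormal)."""
    return tp / (tp + fp)

def npv(tn, fn):
    """negative predictive value: P(normal | predicted normal)."""
    return tn / (tn + fn)

# e.g. a repeat-test rule that flags 80 results as abnormal, 76 of them correctly,
# and calls 200 results normal, 150 of them correctly:
p = ppv(tp=76, fp=4)     # 0.95
n = npv(tn=150, fn=50)   # 0.75
```

the asymmetry in this invented example mirrors the pattern reported above: abnormal predictions are dependable more often than normal predictions.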
one popular and well-characterized approach has been logistic regression [16][17][18], a special case of generalized linear modeling (glm). researchers have applied these approaches for diverse health-related purposes including prediction of cardiovascular risk [19], mortality in head trauma [18], texture analysis of magnetic resonance images [16], and many other applications [17,20,21]. however, we note that glm does not easily incorporate missing values, as it removes records with missing features; a feature will be ''missing'' for any record in which that test (the feature) was not performed. other methods, such as classification and regression trees (cart) and artificial neural networks (ann) [18], have also been applied. most of these studies were limited in scope to predicting risk of a particular diagnosis. harper [22] compared four classification techniques (regression, cart, artificial neural networks, and discriminant analysis) on four different datasets and concluded that there was no obvious best choice for their data; while cart performed best, regression was fastest and nearly as good. similar comparative studies on coronary artery disease [20] and alzheimer disease [23] indicated that newer algorithms such as ann and random forests [24] have little advantage over simpler, more traditional approaches. also, the utility and limitations of these approaches for predicting laboratory results (as opposed to diagnoses) are unclear. however, while cart is both a top performer and overcomes glm's problem with missing values, it is also more computationally intensive and potentially less sensitive to simple algebraic relationships among features (e.g., among sodium, chloride, bicarbonate, and the anion gap). therefore we chose glm as a well-understood approach with strong performance and excellent speed, and cart as the best-performing complementary approach for purposes of comparison. 
given the importance of laboratory testing, we asked how much information regression- or classification-tree-based rules could provide in assessing the pre-test probability of a test result being abnormal for 251 commonly ordered in-house and sendout tests at our hospital. data-mining can sometimes find spurious correlations, artifacts of the particular partitioning of the data into training and test set. to avoid such artifacts, we repeated our regression on multiple independent partitions of the data and kept only items that appeared in a majority of the resulting rules. this safeguard also had the effect of simplifying rules by making each rule dependent on a smaller number of items. as expected, the effect on performance was negligible, and dependence on the resulting items was more often clinically and pathophysiologically plausible than in rules derived from any single run. when data-mining it is also important to consider the setting. the rules we found do not exist in a vacuum but are ''contingent'' in the sense that they depend on current clinical practice. certain tests and panels are ordered in patterns. in a sense, contingency is a form of selection bias: there may well be other diagnoses or test results that correlate with the result for the test of interest but that are not routinely measured according to current best practices. however, as long as the setting in which such rules would be applied is substantially similar to that in which they were found, selection bias would have little if any effect on finding rules. as long as one is clear that one is looking for relationships in a current practice process, and not among all things that could possibly be measured, any rules that are discovered will by construction be setting-appropriate. but while our rules appear to be plausible and setting-appropriate, the motivating question behind this study is whether the rules we found could be useful clinically. 
one way to approach this question is by considering the positive and negative predictive value (ppv and npv) of each rule. these metrics stand in contrast to sensitivity and specificity, by which rules are often measured but which do not incorporate disease prevalence in spite of its importance to clinical decision-making. a ppv of 0.95 means that when a rule suggests that the test result will be abnormal, the result actually will be abnormal 95 percent of the time. an npv of 0.95 means that when a rule suggests that the test result will be normal, the result actually will be normal 95 percent of the time. we found rules with ppv and/or npv ≥ 0.95 (by glm) for only two tests that are sendouts at our hospital, one of which is ceruloplasmin, which we have previously suggested is overordered via chart review [25]. in contrast, for in-house tests we found over a dozen such rules. interestingly, the main determinant for rules for in-house tests was a normal or abnormal result for the same test within the previous seven days. although in this study we did not set out explicitly to make a statement about repeat laboratory testing, the appropriateness of which has been investigated elsewhere [4], these results suggest that repeat laboratory testing within one week does not always add information that could not have been anticipated from the previous result. refining this observation using the same unbiased approach we have followed here is potentially an area for future investigation. our results should not be taken as a categorical criticism of repeat testing. first, while the ppv was ≥ 0.95 in several cases, the npv was more typically 0.70-0.85. thus, while a prediction that a result will be abnormal may be correct 95 percent of the time, which may be good enough to discourage repeat ordering, a prediction that a result will be normal may not be so dependable. 
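the contrast with sensitivity and specificity can be made concrete with bayes' rule: holding sensitivity and specificity fixed, ppv falls sharply as the abnormal rate drops. the sensitivity and specificity values below are illustrative assumptions, not the paper's numbers.

```python
# worked example of why ppv, unlike sensitivity/specificity, depends on
# prevalence (the base rate of abnormal results).
def ppv_from_prevalence(sens, spec, prev):
    """bayes' rule: P(abnormal | positive prediction)."""
    tp = sens * prev               # true positives per unit population
    fp = (1 - spec) * (1 - prev)   # false positives per unit population
    return tp / (tp + fp)

# the same rule (sens = spec = 0.9) applied where abnormals are common vs rare:
common = ppv_from_prevalence(sens=0.9, spec=0.9, prev=0.5)   # 0.90
rare = ppv_from_prevalence(sens=0.9, spec=0.9, prev=0.05)    # ~0.32
```

this is why a rule with impressive sensitivity and specificity can still be a poor predictor for an infrequently abnormal test, and why ppv and npv are the clinically relevant scores here.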
therefore the use of a rule depends on the subtle distinction of whether the clinical question is ''will the result be abnormal'' vs. ''will the result be normal.'' second, we note that no rules with such strong performance were found for the majority of our sendout or in-house tests by either of our two complementary approaches. thus, while the rules we found can inform clinical decision-makers, the information they provide rarely replaces the information obtained from actually performing these tests. it is interesting to note that, on average, our simple rules yielded a ppv of 0.84 and an npv of 0.75. this means that, on average, rules will correctly predict an abnormal laboratory result 5 times out of 6 (5/6 ≈ 0.84) and correctly predict a normal result 3 times out of 4. while not good enough to replace testing (especially for rules that depend on previous test results), these observations raise the question of how much better prediction can get. integration of information not considered in the present study, including vital signs, chief complaints, and physical findings, may improve prediction by these methods. 
references:
the ulysses syndrome
the dangers of false-positive and false-negative test results: false-positive results as a function of pretest probability
the landscape of inappropriate laboratory testing: a 15-year systematic review and meta-analysis
laboratory evaluation in the diagnosis of lyme disease
elementary, my dear doctor watson
generalized linear models
electronic health record surveillance algorithms facilitate the detection of transfusion-related pulmonary complications
using classification trees to assess low birth weight outcomes
fast algorithms for mining association rules
fast discovery of association rules
data mining and clinical data repositories: insights from a 667,000 patient data set
mining association rules from clinical databases: an intelligent diagnostic process in healthcare
machine learning for personalized medicine: predicting primary myocardial infarction from electronic health records
predictive data mining in clinical medicine: current issues and guidelines
textural analysis of contrast-enhanced mr images of the breast
establishing a clinical decision rule of severe acute respiratory syndrome at the emergency department
comparison of artificial neural network and logistic regression models for prediction of mortality in head trauma based on initial clinical data
improved cardiovascular risk prediction using nonparametric regression and electronic health record data
comparing performances of logistic regression, classification and regression tree, and neural networks for predicting coronary artery disease
predicting improvement in urinary and bowel incontinence for home health patients using electronic health record data
a review and comparison of classification algorithms for medical decision making
application and comparison of classification algorithms for recognition of alzheimer's disease in electrical brain activity (eeg)
random forests
the overuse of serum ceruloplasmin measurement

key: cord-316047-d9cpe9yl authors: gonzalez, t.; de la 
rubia, m. a.; hincz, k. p.; comas-lopez, m.; subirats, laia; fort, santi; sacha, g. m. title: influence of covid-19 confinement on students’ performance in higher education date: 2020-10-09 journal: plos one doi: 10.1371/journal.pone.0239490 sha: doc_id: 316047 cord_uid: d9cpe9yl this study analyzes the effects of covid-19 confinement on the autonomous learning performance of students in higher education. using a field experiment with 458 students from three different subjects at universidad autónoma de madrid (spain), we study the differences in assessments by dividing students into two groups. the first group (control) corresponds to academic years 2017/2018 and 2018/2019. the second group (experimental) corresponds to students from 2019/2020, which is the group of students that had their face-to-face activities interrupted because of the confinement. the results show that there is a significant positive effect of the covid-19 confinement on students’ performance. this effect is also significant in activities that did not change their format when performed after the confinement. we find that this effect is significant both in subjects that increased the number of assessment activities and subjects that did not change the student workload. additionally, an analysis of students’ learning strategies before confinement shows that students did not study on a continuous basis. based on these results, we conclude that covid-19 confinement changed students’ learning strategies to a more continuous habit, improving their efficiency. for these reasons, better scores in students’ assessment are expected due to covid-19 confinement that can be explained by an improvement in their learning performance. the coronavirus covid-19 outbreak disrupted life around the globe in 2020. as in any other sector, the covid-19 pandemic affected education in many ways. 
government actions have followed a common goal of reducing the spread of coronavirus by introducing measures limiting social contact. many countries suspended face-to-face teaching and exams as well as placing restrictions on immigration affecting erasmus students [1]. where possible, traditional classes are being replaced with books and materials taken from school. various e-learning platforms enable interaction between teachers and students, and, in some cases, national television shows or social media platforms are being used for education. some education systems announced exceptional holidays to better prepare for this distance-learning scenario. in terms of the impact of the covid-19 pandemic on different countries' education systems, many differences exist. this lack of homogeneity is caused by such factors as the start and end dates of academic years and the timing of school holidays. while some countries suspended in-person classes from march/april until further notice, others were less restrictive, and universities were only advised to reduce face-to-face teaching and replace it with online solutions wherever practicable. in other cases, depending on the academic calendar, it was possible to postpone the start of the summer semester [2]. fortunately, there is a range of modern tools available to face the challenge of distance learning imposed by the covid-19 pandemic [3]. using these tools, the modification of contents that were previously taught face-to-face is easily conceivable. there are, however, other important tasks in the learning process, such as assessment or autonomous learning, that can still be challenging without the direct supervision of teachers. all these arguments end in a common topic: how to ensure the assessment's adequacy to correctly measure students' progress. thus, how can teachers compare students' results if they differ from previous years? 
on the one hand, if students achieve higher scores than in previous years, this could be linked with cheating in online exams or with changes in the format of the evaluation tools. on the other hand, lower grades could also be caused by the evaluation format change or be attributable to autonomous learning being a less effective teaching method. the objective of this article is to reduce the uncertainty in the assessment process in higher education during the covid-19 pandemic. to achieve this goal, we analyze students' learning strategies before and after confinement. altogether, our data indicate that autonomous learning in this scenario has increased students' performance and that higher scores should be expected. we also discuss the reasons underlying this effect. we present a study that involves more than 450 students enrolled in 3 subjects from different degrees at the universidad autónoma de madrid (spain) during three academic years, including data obtained in the 2019/2020 academic year, when the restrictions due to the covid-19 pandemic have been in force. e-learning has experienced significant change due to the exponential growth of the internet and information technology [4]. new e-learning platforms are being developed for tutors to facilitate assessments and for learners to participate in lectures [4,5]. both assessment processes and self-evaluation have been proven to benefit from technological advancement. even courses that solely offer online contents, such as massive open online courses (moocs) [6,7], have also become popular. the inclusion of e-learning tools in higher education implies that a greater amount of information can be analyzed, improving teaching quality [8][9][10]. in recent years, many studies have been performed analyzing the advantages and challenges of massive data analysis in higher education [11]. for example, a study of gasevic et al. 
[12] indicates that time-management tactics had significant correlations with academic performance. jovanovic et al. also demonstrated that assisting students in their management of learning resources is critical for a correct management of their learning strategies in terms of regularity [13]. within a few days, the covid-19 pandemic enhanced the role of remote working, e-learning, video streaming, etc., on a broad scale [14]. in [15], we can see that the most popular remote collaboration tools are private chat messages, followed by two-participant calls, multi-person meetings, and team chat messages. in addition, several recommendations to help teachers in the process of online instruction have appeared [16]. furthermore, mobile learning has become a suitable alternative for some students with fewer technological resources. regarding the feedback on e-classes given by students, some studies [17] point out that students were satisfied with the teacher's way of delivering the lecture and that the main problem was poor internet connection. related to autonomous learning, many studies have been performed regarding the concept of self-regulated learning (srl), in which students are active and responsible for their own learning process [18,19] as well as being knowledgeable, self-aware, and able to select their own approach to learning [20,21]. some studies indicated that srl significantly affected students' academic achievement and learning performance [22][23][24]. researchers indicated that students with strongly developed srl skills were more likely to be successful both in classrooms [25] and in online learning [26]. these studies and the development of adequate tools for evaluation and self-evaluation of learners have become especially necessary in the covid-19 pandemic in order to guarantee good performance in e-learning environments [27]. 
linear tests, which require all students to take the same assessment in terms of the number and order of items during a test session, are among the most common tools used in computer-based testing. computer adaptive testing (cat), based on item response theory, was formally proposed by lord in 1980 [28][29][30], as is the case with linear testing. some platforms couple the advantages of cat-specific feedback with multistage adaptive testing [38]. the use of cat is also increasingly being promoted in clinical practice to improve patient quality of life. over the decades, different systems and approaches based on cat have been used in the educational space to enhance the learning process [39,40]. considering the usage of cat as a learning tool, establishing the knowledge of the learner is crucial for personalizing subsequent question difficulty. cat does have some negative aspects, such as continued test-item exposure, which allows learners to memorize the test answers and share them with their peers [41,42]. as a solution to limit test-item exposure, a large question bank has been suggested. this solution is unfeasible in most cases, since most cat models already require more items than comparable linear testing [43]. the aim of this study is to identify the effect of covid-19 confinement on students' performance. this main objective leads to the first hypothesis of this study, which can be formulated as h1: covid-19 confinement has a significant effect on students' performance. the confirmation of this hypothesis should be done discarding any potential side effects, such as students cheating in their assessment process related to remote learning. moreover, a further analysis should be done to investigate which factors of covid-19 confinement are responsible for the change. a second hypothesis is h2: covid-19 confinement has a significant effect on the assessment process. the aim of the project was therefore to investigate the following questions: 1. 
is there any effect (positive or negative) of the covid-19 confinement on students' performance? 2. is it possible to be sure that the covid-19 confinement is the origin of the different performance (if any)? 3. what are the reasons for the differences (if any) in students' performance? 4. what are the expected effects of the differences in students' performance (if any) in the assessment process? we have used two online platforms. the first one is e-valuam [44], an online platform that aims to increase the quality of tests by improving the objectivity, robustness, security and relevance of assessment content. e-valuam implements all the cat tests described in the following sections. the second online platform used in this study is the moodle platform provided by the biochemistry department of universidad autónoma de madrid, where all the tests that do not use adaptive questions are implemented. adaptive tests have been used in the subjects "applied computing" and "design of water treatment facilities". traditional tests have been used in the subject "metabolism". 2.1.1 cat theoretical model. let us consider a test composed of $n_q$ items. in the most general form, the normalized grade $s_j$ obtained by a student in the $j$-th attempt will be a function of the weights of all the questions $\alpha$ and the normalized scores $\varphi$, i.e. $s_j = s_j(\alpha, \varphi)$, and can be defined as:

$$s_j(\alpha, \varphi) = \sum_{i=1}^{n_q} \alpha_i \varphi_i \qquad (1)$$

where $\varphi_i$ is defined as

$$\varphi_i = \delta_{a_i r_i} \qquad (2)$$

where $\delta$ is the kronecker delta, $a_i$ the correct answer and $r_i$ the student's answer to the $i$-th question. by using this definition, we limit $\varphi_i$ to only two possible values: 1 and 0; $\varphi_i = 1$ when the student's answer is correct and $\varphi_i = 0$ when the student gives a wrong answer. this definition is valid for both open-answer and multiple-choice tests. in the case of a multiple-choice test with $n_r$ possible answers, $\varphi_i$ can be modified to account for the effect of random guessing. in this case:

$$\varphi_i = \frac{n_r\,\delta_{a_i r_i} - 1}{n_r - 1} \qquad (3)$$

independently of using eqs 2 or 3, to be sure that $s_j(\alpha, \varphi)$ is normalized (i.e. 
$0 \le s_j(\alpha, \varphi) \le 1$), we must impose the following additional condition on $\alpha$:

$$\sum_{i=1}^{n_q} \alpha_i = 1 \qquad (4)$$

in the context of needing a final grade (fg) between 0 and a certain value $m$, which typically takes values such as 10 or 100, we just need to rescale the $s_j(\alpha, \varphi)$ value obtained in our model by a factor $k$, i.e. $fg_j = k\, s_j(\alpha, \varphi)$. we will now include the option of having questions with an additional parameter $l$, which will be related to the level of relevance of the question. $l$ is a number that we will assign to all the questions included in the repository of the test (i.e. the pool of questions from which the questions of a $j$-test are selected). the concept of relevance can take different significances depending on the context and the opinion of the teachers. in our model, the questions with lower $l$ values will be shown initially to the students; when the students answer correctly a certain number of questions with the lower $l$ value, the system starts proposing questions from the next $l$ value. by defining $n_l$ as the number of possible $l$ values, the $l$ value that must be used for the $k$-th question of the $j$-test can be defined as:

$$l_k = 1 + \operatorname{trunc}\!\left(\frac{n_l}{n_q} \sum_{i=1}^{k-1} \varphi_i\right) \qquad (5)$$

where trunc means the truncation of the value between brackets. it is worth noting that $l_k$ is proportional to the sum of the student's answers to all the previous questions in the test. this fact means that, in our model, $l_k$ depends on the full history of answers given by the student. $l_k$ is inversely proportional to $n_q$, which means that it takes a higher number of correct answers to increase $l_k$. once $l_k$ is defined, a randomly selected question is shown to the student. another important fact implied by the use of eq 5 in the adaptive test is that we will never have l k
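the cat scoring model of this section can be sketched as follows. this is an illustrative reconstruction from the verbal description, not the e-valuam implementation: in particular, the exact forms of the guessing correction and of the level formula are assumptions inferred from the text (levels rise with the running sum of correct answers and fall with the number of items).

```python
# sketch of the adaptive-test scoring model described above (assumed forms).
import math

def phi(answer, correct, n_r=None):
    """score of one item: kronecker delta, optionally corrected for guessing."""
    d = 1.0 if answer == correct else 0.0
    if n_r is None:
        return d                       # delta definition: 1 if correct, else 0
    return (n_r * d - 1) / (n_r - 1)   # assumed guessing correction for n_r choices

def grade(weights, phis, m=10):
    """final grade: weighted sum of item scores, rescaled to a 0..m scale."""
    assert abs(sum(weights) - 1.0) < 1e-9   # weights normalized so s_j <= 1
    s = sum(a * p for a, p in zip(weights, phis))
    return m * s

def level(prev_phis, n_q, n_l):
    """assumed level rule: next question's level from the running score."""
    return 1 + math.trunc(n_l * sum(prev_phis) / n_q)

# four equally weighted items, three answered correctly, on a 0-10 scale:
g = grade([0.25, 0.25, 0.25, 0.25], [1, 1, 0, 1])
```

note that with the guessing correction a wrong answer scores negative, so that random answering has an expected score of zero; this matches the usual correction-for-guessing rationale but is an assumption here.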