Biomarker Evaluation and Clinical Development
Melissa Assel, Andrew J. Vickers
Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, United States
Soc Int Urol J. 2020;1(1):16–22. Received on June 30, 2020; accepted on August 7, 2020.
MOLECULAR BIOMARKERS IN UROLOGIC ONCOLOGY: ICUD-WUOF CONSULTATION

Key Words: biomarkers, prediction modeling, prostate cancer, clinical utility, decision analysis, discrimination, calibration, net benefit

Competing Interests: Dr Vickers reports grants from the National Institutes of Health during the conduct of the study and personal fees from Opko outside the submitted work; in addition, Dr Vickers has a patent issued to Arctic Partners. Dr Assel reports grants from the National Institutes of Health during the conduct of the study.

Abstract
Most candidate biomarkers are never adopted into clinical practice. The likelihood that a biomarker with good predictive properties will be incorporated into urologic decision-making and will improve patient care can be enhanced by following established principles of biomarker development. Studies should follow the REMARK guidelines, should have clinically relevant outcomes, and should evaluate the biomarker on the same patients to whom it would be applied in practice. It is also important to recognize that biomarker research is comparative: the question is not whether a biomarker provides information, but whether it provides better information than is already available. Continuous biomarkers should not be categorized as above or below a fixed cutpoint: risk prediction allows for individualization of care. Risk predictions must be calibrated, that is, close to a patient's true risk, and decision analysis is required to determine whether using the biomarker in clinical practice would change decisions and improve outcomes. Finally, impact studies are needed to evaluate how use of the biomarker in the real world affects outcomes.
Introduction
Biomarkers are used either to assess the risk of a current diagnostic state, such as having biopsy-detectable cancer, or to predict the risk of a future event, such as prostate cancer death. In the former case, the biomarker gives the clinician information at less cost, risk, and inconvenience than the diagnostic test; in the latter case, it provides an estimate of the probability that a future outcome will occur in an individual patient. In this paper, we review methodologic considerations for biomarker development, using serum biomarkers in prostate cancer as an example. We do not discuss how biomarkers are discovered or how they can best be measured accurately and reproducibly. We start from early phase studies in humans evaluating the association between the biomarker and the outcome, and move to later phase trials (ie, impact studies) examining the effects of the biomarker when used in the clinic. To be used most effectively, biomarkers need to be integrated with other information available to the clinician, such as a patient's age or the stage of the tumor. This can be done informally by "clinical judgment," by using cutpoints and clinical rules, or by using a prediction model. In the case of PSA for prostate cancer early detection, an early approach was to use the clinical rule of "PSA > 4 or positive digital rectal examination (DRE)." This subsequently evolved to the more informal clinical judgment approach, in which the urologist considers the age of the patient, the recent clinical history (such as symptoms of benign prostate disease), and the DRE, as well as the absolute level of PSA. In the last 10 to 15 years, there has been a move to statistical methods of risk prediction.
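In practice, such a risk prediction typically comes from a logistic regression model. The sketch below uses entirely hypothetical coefficients (not the parameters of any published calculator) to show how clinical inputs are converted into a percentage risk:

```python
import math

# Hypothetical coefficients for illustration only -- these are NOT the
# published PCPT or PBCG model parameters.
INTERCEPT = -4.0
COEFS = {
    "log2_psa": 0.85,            # per doubling of PSA (ng/mL)
    "age": 0.03,                 # per year of age
    "dre_abnormal": 0.9,         # abnormal digital rectal examination
    "family_history": 0.45,      # family history of prostate cancer
    "prior_negative_biopsy": -0.6,
}

def risk_of_high_grade_cancer(psa, age, dre_abnormal, family_history,
                              prior_negative_biopsy):
    """Return the predicted probability of high-grade cancer (0 to 1)."""
    lp = (INTERCEPT
          + COEFS["log2_psa"] * math.log2(psa)
          + COEFS["age"] * age
          + COEFS["dre_abnormal"] * dre_abnormal
          + COEFS["family_history"] * family_history
          + COEFS["prior_negative_biopsy"] * prior_negative_biopsy)
    return 1 / (1 + math.exp(-lp))  # inverse logit

# Two men with the same PSA of 4.5 ng/mL can have very different risks:
low = risk_of_high_grade_cancer(4.5, 55, 0, 0, 1)
high = risk_of_high_grade_cancer(4.5, 72, 1, 1, 0)
print(f"risk: {low:.0%} vs {high:.0%}")
```

Published calculators follow the same principle, with coefficients estimated from large cohorts; the point is that identical PSA values can map to very different individualized risks.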
Using statistical models such as the PCPT risk calculator [1] or the PBCG model [2], the urologist enters clinical data about the patient (age, race, DRE, family history, and history of prior negative biopsy, as well as the level of PSA) and obtains a percentage risk of high-grade cancer. The advantage of prediction models is that they give more accurate predictions than either informal clinical judgment (numerous studies have demonstrated that computer models outperform clinicians [3–5]) or the risk groupings used for clinical prediction rules [6–9]. Moreover, use of prediction models allows greater individualization of care. A man who is older, has comorbidities, or is averse to medical procedures, but has a PSA level just above 4, might reasonably ask whether his PSA warrants a biopsy; conversely, a man anxious about prostate cancer who has a PSA just below 4 might want reassurance that his risk is indeed low. It is only by using predicted probabilities that urologists can have a rational conversation about risk that takes into account patient preferences and characteristics. Statistical methods for building models are described at length in various publications and are not discussed further here [10]. Instead, we focus on approaches to assessing the predictiveness of a biomarker in 2 different scenarios: when the biomarker is used independently and when it is incorporated into a prediction model.

Abbreviations: AUC, area under the curve; DRE, digital rectal examination; EPCA, early prostate cancer antigen; PCPT, Prostate Cancer Prevention Trial; PCPTRC, Prostate Cancer Prevention Trial Risk Calculator; ROC, receiver operating characteristic.

Can the Biomarker Predict the Outcome of Interest?
Choose the right outcome
The appropriate clinical endpoint for a biomarker is sometimes more complex than it appears. Well-known studies such as the PRACTICAL collaboration have developed polygenic risk scores for the endpoint of incident prostate cancer [11]. But incident prostate cancer is not synonymous with cancer-related mortality or morbidity. The central problem of prostate cancer early detection is overdiagnosis: cancers are diagnosed that would never have caused symptoms during the course of the patient's natural life. It is thus not as useful to know a man's risk of a prostate cancer diagnosis as it is to know his risk of prostate cancer metastasis or death: a man at high risk of prostate cancer death might want to consider screening to find a cancer early, before it spreads; it is not at all clear what a man should do if he is merely at higher risk of a prostate cancer diagnosis. Biomarkers or models that predict the risk of any-grade cancer on prostate biopsy are subject to a similar criticism: we want to find cancers that we would consider treating (eg, grade group 2 or higher disease); we do not need to know about the risk of all cancers, including grade group 1 disease, for which the most appropriate management strategy is active surveillance. Naturally, the ideal endpoint for any biomarker to predict is cancer-specific morbidity (ie, metastasis) or mortality. Given that such endpoints may occur 10 or 20 years after diagnosis, this is challenging and has been attempted only for a handful of prostate cancer biomarkers, including the 4Kscore [12,13], the Decipher score [14], and, of course, PSA [15,16].

Does the biomarker distinguish between samples of clearly distinguishable patients?
Investigators can test whether biomarker levels differ in clearly distinguishable groups of people.
These studies can be performed relatively quickly, as samples can be obtained from patients on the basis of an outcome status already achieved, as opposed to following a cohort of patients prospectively until the outcome of interest occurs or waiting to accrue patients undergoing a procedure such as biopsy. For example, in the now-retracted EPCA study, levels of EPCA in men with prostate cancer were compared with those in healthy men, healthy women, and patients with other diseases, such as liver cancer or benign lung disease [17]. Diagnostic accuracy should instead be assessed using a sample representative of the target population, as shown in Table 1. In scenarios A and B, we have a biomarker with high sensitivity and specificity (both 90%) for advanced disease that is unable to distinguish localized disease (sensitivity and specificity of 50%). Scenario A represents a sample with an equal number of patients in each disease group, in which the sensitivity and specificity in the entire population for detecting cancer are both 70%. However, if the distribution of patients were more reflective of the population, as in scenario B, the sensitivity and specificity drop to 58% and 63%, respectively.

Is the biomarker associated with the outcome in the patients to whom it would be applied in practice?
Just as drugs are studied in the patients who would receive the drug were it shown to be effective, biomarkers should be studied in the patients to whom they would be applied in practice. The development of the free-to-total PSA ratio is a good example of a marker that moved from research on convenience samples to the target population of men considering biopsy. First, Christensson et al. demonstrated that the ratio of a PSA isoform, free PSA, to the total amount of PSA in serum (the free-to-total PSA ratio) is significantly lower among men with prostate cancer than in men with benign prostatic hyperplasia [18]. Catalona et al.
then determined that the free-to-total PSA ratio can enhance the specificity of prostate cancer screening by confirming that its association with prostate cancer on biopsy remains significant among men with total PSA values of 4.1 to 10 ng/mL, the range at which prostate biopsy is indicated in clinical practice [19].

How well does the biomarker predict the outcome of interest compared with information already available to the clinician?
A useful biomarker should add information beyond what is already available to the clinician. In the Catalona et al. example above, measurement of the free-to-total PSA ratio added information about prostate cancer risk over and above total PSA and DRE [19]. Assessments of discrimination or clinical utility (explained in detail below) can be used to compare the performance of a new biomarker with that of an existing model or existing biomarker. Alternatively, a biomarker can be combined with other clinical factors by building a prediction model. For example, Klein et al. assessed the added value of the Genomic Prostate Score by demonstrating that it is significantly associated with the risk of adverse pathology on multivariable logistic regression analysis when added to a model containing standard clinical predictors (age, PSA, clinical stage, and biopsy findings) or established prediction models, including the Cancer of the Prostate Risk Assessment score [20] and the National Comprehensive Cancer Network risk groupings [21].

Assessing Predictiveness

Discrimination
The area under the receiver operating characteristic curve (AUC), also referred to as the concordance statistic (or C index), is commonly used to assess discrimination: the probability that a randomly selected patient with the disease will have a higher predicted probability of having the disease than a randomly selected subject without the disease [22-24].
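This concordance interpretation can be computed directly: over all pairs of one diseased and one non-diseased subject, count how often the diseased subject has the higher predicted risk, scoring ties as one-half. A minimal sketch using made-up predicted risks:

```python
# Concordance (AUC) computed from its pairwise definition: the
# proportion of (diseased, non-diseased) pairs in which the diseased
# subject has the higher predicted risk, with ties counted as 0.5.

def auc(risks_diseased, risks_healthy):
    concordant = 0.0
    for rd in risks_diseased:
        for rh in risks_healthy:
            if rd > rh:
                concordant += 1.0
            elif rd == rh:
                concordant += 0.5
    return concordant / (len(risks_diseased) * len(risks_healthy))

# Toy predicted risks, invented for illustration
cancer = [0.9, 0.6, 0.4, 0.8]
no_cancer = [0.2, 0.4, 0.1, 0.5, 0.3]
print(auc(cancer, no_cancer))  # → 0.925
```

An AUC of 0.5 corresponds to a coin flip and 1.0 to perfect discrimination; efficient implementations (eg, rank-based) give the same quantity.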
When comparing the discrimination of different models or biomarkers, investigators are encouraged to report the difference in discrimination along with 95% confidence intervals. Approaches to assessing whether there is a significant difference in discrimination depend on whether the models being compared are "nested." A nested model is created when, for example, a new biomarker is added to an existing model, for instance, when the 2 models are PSA, DRE, and age versus PSA, DRE, age, and free-to-total PSA ratio. A comparison of 2 existing models, such as the PCPT and the PBCG model, would be non-nested. In that case, the DeLong test can be used to test for a difference in discrimination [25]. When models are nested, the P value from the Wald test corresponding to the biomarker should be reported, the DeLong test being invalid [26,27].

Calibration
To be clinically useful, a prediction model must not only be able to discriminate between patients with and without the disease but also provide an accurate risk prediction. The degree to which predictions agree with the observed outcomes is known as calibration [28]. A calibration plot visualizes the agreement between model predictions on the x-axis and the actual outcome on the y-axis. This is typically done by splitting the data into equal-sized groups of increasing predicted probabilities (deciles) and plotting the mean of the observed outcome by decile of prediction [23]. See Figure 1 for an example of a calibration plot. A model with poor calibration in ranges of probabilities in which treatment decisions can reasonably differ is likely to be of limited clinical value, even if discrimination is excellent: it is difficult to make a good decision if information about patient risk is wrong. Some biomarkers, such as the Prostate Health Index (PHI) [29] and the ExoDx Prostate IntelliScore, provide a score, and decisions are made by comparing the score with a proposed cutpoint; these scores do not represent risk of disease. It is therefore not possible to assess calibration in the traditional sense, although investigators can report the probability of the outcome above and below the previously proposed cutpoints to assess clinical value.

TABLE 1. Two hypothetical studies that sample from individuals without disease, with benign disease, with localized disease, or with advanced cancer

Scenario A (250 patients per group):
- Biomarker positive: cancer, (90% x 250) + (50% x 250) = 350 true positives; no cancer, (10% x 250) + (50% x 250) = 150 false positives
- Biomarker negative: cancer, (10% x 250) + (50% x 250) = 150 false negatives; no cancer, (90% x 250) + (50% x 250) = 350 true negatives
- Total: 500 with cancer, 500 without cancer

Scenario B (mix more reflective of the population):
- Biomarker positive: cancer, (90% x 50) + (50% x 200) = 145 true positives; no cancer, (10% x 250) + (50% x 500) = 275 false positives
- Biomarker negative: cancer, (10% x 50) + (50% x 200) = 105 false negatives; no cancer, (90% x 250) + (50% x 500) = 475 true negatives
- Total: 250 with cancer, 750 without cancer

Clinical utility
A new biomarker is of value only if its use leads to improvement in patient outcomes via a change in treatment decision patterns. A full assessment of the prognostic value of a biomarker or model must incorporate the clinical consequences of the resulting decisions. Table 2 shows a hypothetical study of 1000 men with elevated PSA levels.
Risk of cancer on biopsy was calculated on the basis of a prediction model that includes a new biomarker. In this study, 300 men had high-grade cancer; among the 510 men with a predicted risk of 10% or greater (our threshold to indicate biopsy), cancer was detected in 210 (Table 2). The clinical consequences shown in Table 2 indicate that, to determine whether it is better to biopsy all men or to use the statistical model and biopsy only those with at least a 10% risk of high-grade cancer, we need to consider whether it is worth missing 90 cancers to avoid 490 biopsies. In some cases, the results will be fairly obvious: if, for instance, only 10 high-grade cancers were missed for a reduction of 490 biopsies, the value of the biomarker would be apparent. When results are not immediately clear, decision analysis can be of value. One of the simplest approaches, and the most widely used in the urologic literature, is "net benefit," which incorporates into the analysis the consequences of the clinical decisions made on the basis of a prediction model or biomarker [30]. Net benefit incorporates both discrimination (AUC) and calibration, making it an ideal statistic for comparing prognostic value [31]. A key aspect of net benefit is that the level of risk at which a patient opts to undergo a biopsy is informative of how a patient weighs the relative harms of a false-positive (an unnecessary biopsy, with risks of side effects including infectious complications and hospitalization) versus a false-negative (missing or delaying the detection of a high-grade cancer) result. This level of risk is termed the threshold probability [30]. The threshold probability chosen in Table 2 was 10%, corresponding to odds of 10:90 and implying that missing a cancer is 9 times worse than performing an unnecessary biopsy [32]. A threshold of 10% corresponds to a "number-needed-to-test" of 1/10% = 10, meaning that 10 men need to be biopsied to find 1 cancer [32,33]. Applying this 9:1 ratio to the study results gives the findings in Table 2, where it can be seen that the biomarker is actually harmful: even though the marker has reasonable sensitivity and specificity (~70% and ~60%), too many high-grade cancers are missed for the decrease in unnecessary biopsies achieved [34]. One obvious issue is that the threshold can vary between patients or doctors: a patient worried about cancer might opt for a threshold of 6%, whereas one nervous about medical procedures might demand a 15% risk before considering biopsy. In decision curve analysis, the threshold probability is varied over a reasonable range and net benefit is plotted against threshold probability [30].

FIGURE 1. A calibration plot for a model predicting the risk of high-grade prostate cancer on prostate biopsy (x-axis: predicted risk, %; y-axis: observed risk, %). The dots show the average risk (and 95% CI) of patients divided into 10 groups of increasing risk. The dots and dashed regression line fall above the 45-degree line of good calibration, demonstrating that patients had higher risk than predicted by the model. This is particularly a problem for risks around 10%, the sort of risk at which a patient might opt for prostate biopsy. Such a calibration plot would raise questions about whether the model should be used to inform prostate biopsy decision-making.

TABLE 2. Hypothetical results of a biomarker study for prostate biopsy, illustrating clinical consequences and decision analysis (net benefit per 1000 patients at a 10% threshold probability)

- Biopsy all men with elevated PSA: 1000 biopsied; 0 biopsies avoided; 300 high-grade cancers caught; 0 missed; 700 unnecessary biopsies; net benefit = 300 - (700 x 0.10/0.90) = 222
- Biopsy only men with predicted risk of 10% or greater: 510 biopsied; 490 biopsies avoided; 210 high-grade cancers caught; 90 missed; 300 unnecessary biopsies; net benefit = 210 - (300 x 0.10/0.90) = 177
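The net benefit calculations in Table 2 can be reproduced directly: at threshold probability p_t, net benefit per 1000 patients is (true positives) minus (false positives) weighted by the odds p_t/(1 − p_t):

```python
# Net benefit per 1000 patients: true positives minus false positives
# weighted by the odds at the threshold probability [30].

def net_benefit_per_1000(true_pos, false_pos, threshold):
    odds = threshold / (1 - threshold)
    return true_pos - false_pos * odds

pt = 0.10  # threshold probability used in Table 2
# Biopsy all men: 300 cancers caught, 700 unnecessary biopsies
nb_all = net_benefit_per_1000(300, 700, pt)
# Biopsy only men with model risk >= 10%: 210 caught, 300 unnecessary
nb_model = net_benefit_per_1000(210, 300, pt)
print(round(nb_all), round(nb_model))  # → 222 177
# nb_all > nb_model: at this threshold, the marker-based strategy is harmful.
```

A decision curve simply repeats this calculation over a range of threshold probabilities and plots net benefit against threshold.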
By visualizing the decision curve, one can readily ascertain whether one strategy or model is optimal across the full range of threshold probabilities of interest. For more on decision curves, which are very widely used in urology research, a selection of further reading is available at www.decisioncurveanalysis.org.

Impact Studies
Decision analytic techniques provide hypothetical assessments of clinical consequences. Impact studies assess the real-world consequences of a new biomarker- or model-based strategy. For example, an impact study might assess whether the results of the biomarker translated into changes in decisions. For instance, a typical study of the 4Kscore would conclude that, hypothetically, were doctors to use the 4Kscore to make biopsy decisions based on a cut-off of 10%, biopsy rates would fall by about 50%. In a study designed to determine what happens in actual practice, Konety et al. reported a 65% reduction in prostate biopsies in men receiving the 4Kscore [35]. However, not all impact studies are consistent with clinical biomarker studies: White et al. found that use of the PHI in practice led to a very large decrease in the capture of high-grade cancers, with an approximate 30% risk of high-grade cancer among men who avoided biopsy [36]. Impact studies are also undertaken because some endpoints are not entirely predictable from clinical research. Early research on PSA did find that it detected prostate cancer at an early stage, but it was unclear whether prostate cancer screening regimens based on PSA led to reductions in mortality. The European Randomized Study of Screening for Prostate Cancer followed men for 16 years and demonstrated a reduction in mortality with PSA screening, and can therefore be considered an impact study [37,38].

Study Design Issues
The REMARK guidelines discuss study design considerations at length [39].
For instance, one key point is that assessors of the outcome should be blinded to biomarker status. Another key concept is that of internal versus external validation. Internal validation occurs when a multivariable regression model is developed, or a new cutpoint for a biomarker is selected, and performance is evaluated on the same dataset. When a prediction model or biomarker cutpoint is developed and assessed on the same dataset, estimates of performance are over-optimistic, a phenomenon known as overfitting [40,41]. Harrell et al. describe methods for obtaining optimism-corrected internal assessments of performance, including data splitting, cross-validation, and bootstrapping [42]. External validation not only solves the problem of overoptimism but also evaluates genuine differences between cohorts. A model predicting recurrence after radical prostatectomy, for instance, may be affected by surgeon skill (less skilled surgeons having higher recurrence rates) or by differences in pathologic grading. An excellent practical example of external validation was a study showing that the risk of prostate cancer among Chinese men with a given PSA is lower than that for European men, the most likely explanation being that Chinese men have higher rates of benign disease. This true difference between cohorts means that prediction models using PSA will likely have poor properties when applied in China [43].

Recommendations
In this paper, we have outlined the evaluation of prostate cancer biomarkers. Our key "take-aways" can be summarized as follows:
1. Biomarkers should predict risk rather than be categorized as being above or below a fixed cutpoint: risk prediction allows individualization of care.
2. Choose a clinically relevant outcome: many endpoints commonly used in biomarker studies, such as incident prostate cancer or advanced surgical pathology, are problematic.
3. Evaluate the biomarker on the patients to whom the biomarker would be applied in practice.
4. Follow the REMARK guidelines for the conduct and reporting of biomarker studies.
5. Biomarker research is comparative: the question is not whether a biomarker provides us with information, but whether it provides better information than we already have from clinical features or a currently used biomarker.
6. Report discrimination, calibration, and net benefit: a biomarker must be able to discriminate better than existing predictors, but risk predictions must also be close to a patient's true risk; decision analysis is required to determine whether using the biomarker in clinical practice would change decisions and whether doing so would improve outcomes.
7. Conduct impact studies: evaluate how use of the biomarker in the real world affects outcomes.

Conclusions
It has often been noted that biomarker research has a poor track record of getting biomarkers into clinical practice. Following established principles of biomarker development increases the chances that a biomarker with good predictive properties will be incorporated into urologic decision-making and ultimately improve patient care.

References
1. Thompson IM, Ankerst DP, Chi C, et al. Assessing prostate cancer risk: results from the Prostate Cancer Prevention Trial. J Natl Cancer Inst. 2006;98(8):529-34.
2. Ankerst DP, Straubinger J, Selig K, et al. A contemporary prostate biopsy risk calculator based on multiple heterogeneous cohorts. Eur Urol. 2018;74(2):197-203.
3. Kattan MW, Yu C, Stephenson AJ, Sartor O, Tombal B. Clinicians versus nomogram: predicting future technetium-99m bone scan positivity in patients with rising prostate-specific antigen after radical prostatectomy for prostate cancer. Urology. 2013;81(5):956-61.
4.
Jelovsek JE, Chagin K, Brubaker L, et al. A model for predicting the risk of de novo stress urinary incontinence in women undergoing pelvic organ prolapse surgery. Obstet Gynecol. 2014;123(2 Pt 1):279-87.
5. Ross PL, Gerigk C, Gonen M, et al. Comparisons of nomograms and urologists' predictions in prostate cancer. Semin Urol Oncol. 2002;20(2):82-8.
6. Peeters KC, Kattan MW, Hartgrink HH, et al. Validation of a nomogram for predicting disease-specific survival after an R0 resection for gastric carcinoma. Cancer. 2005;103(4):702-7.
7. Novotny AR, Schuhmacher C, Busch R, Kattan MW, Brennan MF, Siewert JR. Predicting individual survival after gastric cancer resection: validation of a U.S.-derived nomogram at a single high-volume center in Europe. Ann Surg. 2006;243(1):74-81.
8. Weiser MR, Landmann RG, Kattan MW, et al. Individualized prediction of colon cancer recurrence using a nomogram. J Clin Oncol. 2008;26(3):380-5.
9. Weiser MR, Gönen M, Chou JF, Kattan MW, Schrag D. Predicting survival after curative colectomy for cancer: individualizing colon cancer staging. J Clin Oncol. 2011;29(36):4796-802.
10. Steyerberg EW. Clinical prediction models: a practical approach to development, validation and updating. New York: Springer; 2019.
11. Schumacher FR, Al Olama AA, Berndt SI, et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat Genet. 2018;50(7):928-36.
12. Vertosick EA, Häggström C, Sjoberg DD, et al. Prespecified 4-kallikrein marker model (4Kscore) at age 50 or 60 for early detection of lethal prostate cancer in a large population-based cohort of asymptomatic men followed for 20 years. J Urol. 2020; doi:10.1097/JU.0000000000001007.
13. Sjoberg DD, Vickers AJ, Assel M, et al. Twenty-year risk of prostate cancer death by midlife prostate-specific antigen and a panel of four kallikrein markers in a large population-based cohort of healthy men. Eur Urol. 2018;73(6):941-8.
14. Marascio J, Spratt DE, Zhang J, et al. Prospective study to define the clinical utility and benefit of Decipher testing in men following prostatectomy. Prostate Cancer Prostatic Dis. 2020;23(2):295-302.
15. Vickers AJ, Ulmert D, Sjoberg DD, et al. Strategy for detection of prostate cancer based on relation between prostate specific antigen at age 40-55 and long term risk of metastasis: case-control study. BMJ. 2013;346:f2023.
16. Vickers AJ, Cronin AM, Björk T, et al. Prostate specific antigen concentration at age 60 and death or metastasis from prostate cancer: case-control study. BMJ. 2010;341:c4521.
17. Leman ES, Cannon GW, Trock BJ, et al. EPCA-2: a highly specific serum marker for prostate cancer. Urology. 2007;69(4):714-20.
18. Christensson A, Björk T, Nilsson O, et al. Serum prostate specific antigen complexed to alpha 1-antichymotrypsin as an indicator of prostate cancer. J Urol. 1993;150(1):100-5.
19. Catalona WJ, Smith DS, Wolfert RL, et al. Evaluation of percentage of free serum prostate-specific antigen to improve specificity of prostate cancer screening. JAMA. 1995;274(15):1214-20.
20. Cooperberg MR, Broering JM, Carroll PR. Risk assessment for prostate cancer metastasis and mortality at the time of diagnosis. J Natl Cancer Inst. 2009;101(12):878-87.
21. National Comprehensive Cancer Network.
22. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol. 1975;12(4):387-415.
23. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128-38.
24. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29-36.
25.
DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837-45.
26. Demler OV, Pencina MJ, D'Agostino RB Sr. Misuse of DeLong test to compare AUCs for nested models. Stat Med. 2012;31(23):2577-87.
27. Vickers AJ, Cronin AM, Begg CB. One statistical test is sufficient for assessing new predictive markers. BMC Med Res Methodol. 2011;11(1):13.
28. Hilden J, Habbema JD, Bjerregaard B. The measurement of performance in probabilistic diagnosis. II. Trustworthiness of the exact values of the diagnostic probabilities. Methods Inf Med. 1978;17(4):227-37.
29. Jansen FH, van Schaik RH, Kurstjens J, et al. Prostate-specific antigen (PSA) isoform p2PSA in combination with total PSA and free PSA improves diagnostic accuracy in prostate cancer detection. Eur Urol. 2010;57(6):921-7.
30. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26(6):565-74.
31. Van Calster B, Vickers AJ. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Making. 2015;35(2):162-9.
32. Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res. 2019;3(1):18.
33. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016;352:i6.
34. Vickers AJ, Cronin AM, Gönen M. A simple decision analytic solution to the comparison of two binary diagnostic tests. Stat Med. 2013;32(11):1865-76.
35. Konety B, Zappala SM, Parekh DJ, et al. The 4Kscore® test reduces prostate biopsy rates in community and academic urology practices. Rev Urol. 2015;17(4):231-40.
36. White J, Tutrone RF, Reynolds MA. Second reply to letter to the editor re: "Clinical utility of the Prostate Health Index (phi) for biopsy decision management in a large group urology practice setting". Prostate Cancer Prostatic Dis. 2019;22(4):639-40.
37. Schröder FH, Hugosson J, Carlsson S, et al. Screening for prostate cancer decreases the risk of developing metastatic disease: findings from the European Randomized Study of Screening for Prostate Cancer (ERSPC). Eur Urol. 2012;62(5):745-52.
38. Hugosson J, Roobol MJ, Månsson M, et al. A 16-yr follow-up of the European Randomized study of Screening for Prostate Cancer. Eur Urol. 2019;76(1):43-51.
39. McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM. Reporting recommendations for tumour MARKer prognostic studies (REMARK). Br J Cancer. 2005;93(4):387-91.
40. Smith GC, Seaman SR, Wood AM, Royston P, White IR. Correcting for optimistic prediction in small data sets. Am J Epidemiol. 2014;180(3):318-24.
41. Steyerberg E. Overfitting and optimism in prediction models. New York: Springer Verlag; 2009:83-100.
42. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15(4):361-87.
43. Chen R, Sjoberg DD, Huang Y, et al. Prostate specific antigen and prostate cancer in Chinese men undergoing initial prostate biopsies compared with Western cohorts. J Urol. 2017;197(1):90-6.