Dermatology: Practical and Conceptual


Research  |  Dermatol Pract Concept 2020;10(1):e2020011 1


Detection of Malignant Melanoma Using Artificial Intelligence: An Observational Study of Diagnostic Accuracy
Michael Phillips,1 Jack Greenhalgh,2 Helen Marsden,2 Ioulios Palamaras3

1 Royal Perth Hospital, Perth, Australia; Harry Perkins Institute for Medical Research, Perth, Australia; and Centre for Medical Research, University of Western Australia, Perth, Australia

2 Skin Analytics Ltd., London, UK

3 Barnet and Chase Farm Hospitals, Royal Free NHS Foundation Trust, London, UK

Key words: melanoma, artificial intelligence, primary care, detection, identification

Citation: Phillips M, Greenhalgh J, Marsden H, Palamaras I. Detection of malignant melanoma using artificial intelligence: an 
observational study of diagnostic accuracy. Dermatol Pract Concept. 2020;10(1):e2020011. DOI: https://doi.org/10.5826/dpc.1001a11

Accepted: July 10, 2019; Published: December 31, 2019

Copyright: ©2019 Phillips et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, 
which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: None.

Competing interests: The authors have no conflicts of interest to disclose.

Authorship: All authors have contributed significantly to this publication.

Corresponding author: Michael Phillips, MMedSci, MRF Biostatistics Unit, 6th Floor MRF Building, Royal Perth Hospital, GPO Box 
X2213, Perth, WA 6847, Australia. Email: Michael.phillips@perkins.uwa.edu.au

Background: Malignant melanoma can most successfully be cured when diagnosed at an early stage 
in the natural history. However, there is controversy over screening programs and many advocate 
screening only for high-risk individuals.

Objectives: This study aimed to evaluate the accuracy of an artificial intelligence neural network (Deep Ensemble for Recognition of Melanoma [DERM]) to identify malignant melanoma from dermoscopic images of pigmented skin lesions and to show how this compared to doctors’ performance assessed by meta-analysis.

Methods: DERM was trained and tested using 7,102 dermoscopic images of both histologically confirmed melanoma (24%) and benign pigmented lesions (76%). A meta-analysis was conducted of studies examining the accuracy of naked-eye examination, with or without dermoscopy, by specialist and general physicians whose clinical diagnosis was compared to histopathology. The meta-analysis was based on evaluation of 32,226 pigmented lesions including 3,277 histopathology-confirmed malignant melanoma cases. The receiver operating characteristic (ROC) curve was used to examine and compare the diagnostic accuracy.

Results: DERM achieved a ROC area under the curve (AUC) of 0.93 (95% confidence interval: 0.92-0.94), and sensitivity and specificity of 85.0% and 85.3%, respectively. Avoidance of false-negative results is essential, so different decision thresholds were examined. At 95% sensitivity DERM achieved a specificity of 64.1% and at 95% specificity the sensitivity was 67%. The meta-analysis showed primary care physicians (10 studies) achieve an AUC of 0.83 (95% confidence interval: 0.79-0.86),

ABSTRACT


such as the recent Cochrane reviews of skin cancer.

Methods

DERM was designed and developed using deep learning techniques that identify and assess features of pigmented lesions that are associated with MM [23-28]. Deep learning differs from earlier machine learning methods by learning features that are associated with MM directly from the data, rather than using features predetermined by a researcher. The algorithm was trained and validated against a dataset of archived dermoscopic images of skin lesions, using 10-fold cross-validation. This approach allows every image to be tested once, while ensuring the same image never appears in both the training and test datasets. Cross-validation is performed by splitting the dataset into several (10) “folds” (datasets). The algorithm is tested against each fold, with the remainder used for training. The results for each fold are then averaged so that the overall performance can be assessed.
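The fold logic described above can be sketched in a few lines of Python. This is only an illustrative outline, not the DERM training code: the function names, the random seed, and the `train_and_score` callback are hypothetical, and the source does not say whether folds were stratified by diagnosis or grouped by lesion.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Split indices 0..n_items-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

def cross_validate(n_items, train_and_score, k=10, seed=0):
    """Average a per-fold score; each item is held out for testing exactly once."""
    folds = k_fold_indices(n_items, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training set = all folds except the one currently held out.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / len(scores)
```

Here `train_and_score` stands for any routine that fits a model on the training indices and returns an accuracy measure (for example, an AUC) on the held-out indices.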

The image dataset was collated from several different sources including the PH2 dataset [29], Interactive Atlas of Dermoscopy [30], and ISIC archive [31]. An additional 672 dermoscopic lesion images were collected from a variety of other sources. The ISIC archive contains a large number of images obtained from children, which are easy to classify as benign. Their inclusion in the dataset was found to optimistically bias results so they were excluded from the development work. The ISIC archive also contains a large number of identical and near-identical images which were removed from the dataset. The

involved smartphone photography and 4 provided an estimate of the probability of malignancy. None of these apps had been assessed for diagnostic accuracy [17]. Understandably there is concern about the possible harm to patients that poorly designed, inaccurate, and/or misleading consumer apps may cause [18-20]. However, with appropriate development and suitable evaluation there is no reason why modern electronic technology could not improve diagnostic accuracy. Recently, an artificial intelligence (AI) algorithm categorizing photographs of pigmented lesions has been shown to be capable of classifying MM with a level of competence comparable to that of dermatologists [21]. As Obermeyer and Emanuel state in a recent review, “Machine learning has become ubiquitous and indispensable for solving complex problems in most sciences. The same methods will open up vast new possibilities in medicine” [22]. However, there are ethical issues associated with the clinical applications of AI in medicine that do not apply to current business applications, astronomy, or chemistry, and these cannot be ignored [23].

The primary aim of this study was to evaluate the diagnostic accuracy of an AI algorithm (Deep Ensemble for Recognition of Melanoma [DERM]) developed by Skin Analytics Limited. The secondary aim was to improve the methodology for evaluating an AI diagnostic tool by comparing DERM’s performance with clinical examination by physicians, stratified by level of expertise and use of dermoscopy, using a meta-analysis of diagnostic studies. It should be noted, however, that this was not designed to be a systematic review

Introduction

Malignant melanoma (MM) is less common than basal and squamous cell skin cancer; however, the incidence of MM is increasing faster than that of other forms of cancer and it is responsible for the majority of skin cancer deaths [1]. MM diagnosed early (stage 1) has a five-year relative survival rate of more than 95%, compared with 8% to 25% for MM diagnosed at later stages [2].

Current practice guidelines in the United Kingdom recommend appropriately trained health care professionals assess all suspect pigmented lesions using dermoscopy [1,3]. Diagnosis is confirmed with biopsy, histological examination, and specialist pathological interpretation. Pressure to diagnose MM early leads to a high proportion of benign pigmented lesions being referred from primary care to specialist care, and a large proportion of biopsied lesions are found to be benign [4,5]. This creates increased demands on overburdened secondary care and pathology service resources [6]. Improved accuracy of pigmented lesion review in primary care would help reduce this pressure. Techniques such as dermoscopy with classification algorithms, reflectance confocal microscopy, and teledermatology have been reported to improve diagnostic accuracy of MM [7-15]. However, the diagnostic accuracy is still dependent on the degree of experience of the examiners and the equipment required is costly [16].

A large number of smartphone applications for MM detection have been released recently. However, there is little evidence of clinical validation. Kassianos et al reviewed 39 apps that addressed skin cancer issues; 19

with sensitivity and specificity of 79.9% and 70.9%; and dermatologists (92 studies) 0.91 (0.88-0.93), 
87.5%, and 81.4%, respectively.

Conclusions: DERM has the potential to be used as a decision support tool in primary care, by providing dermatologist-grade recommendation on the likelihood of malignant melanoma.


number of MM diagnoses confirmed by histology, from which the counts could be derived. The reports were also examined for information concerning physician experience (general vs specialist physician) and context of use (primary care, secondary care). A meta-analysis of these data was conducted. The Stata user-written packages METANDI [42] and MIDAS [43] were used, and a meta-regression was used to examine associations between diagnostic accuracy and year of study report, level of care, and expertise of the practitioner. Many of the dermoscopy studies reported multiple results for each lesion using different dermoscopic algorithms (eg, ABCD, 7-point checklist [44]); all of these results were included in the dataset. Since this produces a clustered dataset, violating the statistical assumption of the independence of observations, we conducted a sensitivity analysis. Multiple datasets were generated in which only 1 estimate was randomly included for each study where there were multiple estimates. The results indicated that the initial estimates were not sensitive to the clustering (details of this analysis are not reported here).
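Two steps of the data preparation described above lend themselves to a short illustration: deriving the 2 × 2 counts when a report gives only sensitivity, specificity, and totals, and drawing one estimate per study for the clustering sensitivity analysis. The sketch below is an assumption about how this could be done in Python; the study's own analysis was run in Stata, and the function names and rounding rule are hypothetical.

```python
import random

def counts_from_summary(n_lesions, n_melanoma, sensitivity, specificity):
    """Recover (TP, FN, FP, TN) from reported totals and accuracy fractions."""
    n_benign = n_lesions - n_melanoma
    tp = round(sensitivity * n_melanoma)  # melanomas correctly called positive
    tn = round(specificity * n_benign)    # benign lesions correctly called negative
    return tp, n_melanoma - tp, n_benign - tn, tn

def one_estimate_per_study(estimates, rng):
    """Given (study_id, estimate) pairs, keep one randomly chosen estimate per study."""
    by_study = {}
    for study_id, estimate in estimates:
        by_study.setdefault(study_id, []).append(estimate)
    return {study_id: rng.choice(options) for study_id, options in by_study.items()}
```

For example, the first row of Table 2 (198 lesions, 96 MM, sensitivity 81.3%, specificity 69.6%) gives `counts_from_summary(198, 96, 0.813, 0.696)`, that is, 78 true positives, 18 false negatives, 31 false positives, and 71 true negatives. Calling `one_estimate_per_study` repeatedly with differently seeded `random.Random` instances would generate the multiple reduced datasets.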

and a meta-analysis enables comparison to a variety of different clinician experiences and evaluation techniques. This analysis was not intended to be a systematic review, but the PRISMA guidelines were followed when appropriate.

A literature search was conducted for studies reporting diagnostic accuracy data of naked-eye clinical examination, with or without dermoscopy, compared with histologically confirmed diagnosis. MEDLINE (413), Web of Science (707), and EMBASE (322) were searched for the period from January 1, 1990, to September 30, 2017, using terms “accuracy pigmented lesions PLUS melanoma pigmented lesions PLUS detection,” “dermoscopy pigmented lesions PLUS melanoma pigmented lesions PLUS accuracy,” and “melanoma pigmented lesions PLUS diagnosis pigmented lesions PLUS primary care.” Studies included in previous systematic reviews were also included [2,15,39-41]. The PRISMA flow diagram is shown in Figure 2. One author (M.P.) conducted the literature search and extracted counts of true negative; true positive; false negative; false positive; or estimates of sensitivity, specificity, number of lesions examined, and

final dataset consists of a total of 7,102 unique pigmented lesion images, 24% being confirmed as MM by histopathology, though subtype information was not available, the rest being made up of benign and nonbenign lesions.

DERM generates a continuous response to an image with limits of 0 and 1, which reflects its “confidence” that the lesion is MM: a value close to 1 indicates MM and near 0 indicates a benign lesion. A nonparametric receiver operating characteristic (ROC) curve analysis was used to examine the overall diagnostic accuracy of the result using Pepe’s nonparametric methods with bootstrapped estimation [32]. The gold standard for MM was histopathology. We examined different cut-points used by DERM to categorize lesions as positive or negative, ie, illustrating alternative diagnostic rules from the diagnostic model [33]. The methods of Youden [34] and Liu [35] were used, as well as the values that maximized the ROC area, that resulted in a sensitivity of 95% or a specificity of 95%, and that generated fewer than 1% false negatives. The area under the curve (AUC) of the ROC curve, specificity/sensitivity, and diagnostic odds ratios were calculated for each of these cut-points.
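The cut-point rules named above can be made concrete with a small Python sketch. It is a simplified illustration rather than the Stata routines used in the study: it takes the Youden rule as maximizing sensitivity + specificity - 1 and the Liu rule as maximizing sensitivity × specificity, and all function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A lesion is called positive when its score is >= the threshold.
    labels: 1 = histology-confirmed MM, 0 = benign.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points

def youden_cut_point(points):
    """Cut-point maximizing Youden's J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)

def liu_cut_point(points):
    """Cut-point maximizing the product sensitivity * specificity."""
    return max(points, key=lambda p: p[1] * p[2])

def cut_point_for_sensitivity(points, target):
    """Highest threshold whose sensitivity still meets the target."""
    eligible = [p for p in points if p[1] >= target]
    return max(eligible, key=lambda p: p[0])

def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = [sens / (1 - sens)] / [(1 - spec) / spec]."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)
```

As a check against the reported figures, `diagnostic_odds_ratio(0.850, 0.853)` evaluates to about 32.9, close to the 33.0 listed for the optimum cut-point in Table 1; the small gap is presumably due to rounding of the published sensitivity and specificity.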

The ROC AUC is not a perfect assessment measure for diagnostic methods when the standard error of the estimator is quite different for the diagnostic alternatives (benign pigmented lesions vs MM), as is the case for DERM (see Figure 1) [36]. This issue was addressed by constructing the Lorenz curve (a mirror image of the ROC curve) with the associated Gini index [37].
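The link between the two summary measures can be illustrated as follows. The sketch assumes the common convention that the Gini index equals 2 × AUC - 1, which follows when the Lorenz curve is a mirror image of the ROC curve; the source does not state the exact formula used, so this is an assumption, and the function names are hypothetical.

```python
def empirical_auc(scores, labels):
    """Nonparametric AUC: probability that a random MM scores higher than
    a random benign lesion, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini_from_auc(auc):
    """Gini index under the assumed convention Gini = 2 * AUC - 1."""
    return 2 * auc - 1
```

Under this convention an AUC of 0.928 corresponds to a Gini index of 0.856, which agrees with the reported value of 0.857 to within rounding of the AUC.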

To compare the accuracy of DERM with that of current diagnostic practices, we decided to conduct a meta-analysis of studies of diagnostic accuracy for MM rather than have a limited panel of dermatologists conduct parallel assessments, as has been done in other studies [21,38]. We chose this approach because biopsy-based histopathology provides the gold standard for MM diagnosis,

Figure 1. Level of confidence of Deep Ensemble for Recognition of Melanoma (DERM) algorithm by lesion type.



The empirical ROC curve analysis showed that DERM has a high level of accuracy with an AUC of 0.928 (95% confidence interval: 0.922-0.935) and an acceptable goodness-of-fit χ² = 6,078 (P = 0.98) (Figure 3). The Lorenz curve analysis gave a Gini index of 0.857. The Gini index has an upper limit of 1 and the high value is indicative of high inequality

estimated the median level of confidence as 0.059 (interquartile range: 0.016-0.171) when the lesion was a benign pigmented lesion and 0.651 (interquartile range: 0.417-0.849) when the lesion was MM. The equality of the 2 medians was tested with the Fisher exact test, and the medians were found to differ significantly (P < 0.0001).

All analysis was conducted by M.P. using the Stata statistical package (StataCorp. 2015. Stata Statistical Software: Release 15. College Station, TX: StataCorp LP).

Most of the data used to create the algorithm were based on anonymous, publicly available images, and an additional 672 anonymized dermoscopic lesion images were generously made available by clinical dermatologists. The meta-analysis data were derived from published papers that did not include individual patient data. There was no requirement for ethics approval, but the Ethics Committee of Royal Perth Hospital was informed of the study as a courtesy.

Results

Histograms showing the distribution of the DERM value for MM and for benign lesions are shown in Figure 1. The histograms show that the value does not follow a normal distribution and there is a different dispersion of data for the 2 types of lesion. DERM

Figure 3. The receiver operating characteristic curve of Deep Ensemble for Recognition of Melanoma (DERM) results. Shaded area shows 95% confidence interval.

Figure 2. PRISMA flow diagram of publications searched for the meta-analysis.



Table 1. Indices of Diagnostic Accuracy (±95% CI) at Different Cut-Points of the DERM Confidence Value

Cut-Point                DERM Value   Sensitivity (%)     Specificity (%)     Diagnostic Odds Ratio
Optimum (maximum AUC)    0.272        85.0 (83.2-86.7)    85.3 (84.4-86.3)    33.0 (28.3-38.4)
Confidence ≥0.50         0.50         67.3 (65.0-69.5)    95.5 (94.9-96.0)    43.7 (37.1-51.5)
80% Sensitivity          0.35         80 (fixed)          90.8 (90.0-91.5)    37.1 (32.1-42.9)
95% Sensitivity          0.11         95.0 (93.8-96.0)    64.1 (62.8-65.4)    33.6 (26.8-42.1)
High sensitivity         0.05         98.6 (98.0-99.1)    46.5 (45.2-47.9)    62.9 (41.7-95.0)
80% Specificity          0.21         88.2 (86.6-89.7)    80 (fixed)          32.7 (27.9-38.4)
95% Specificity          0.795        66.9 (64.3-69.3)    95 (fixed)          38.3 (32.1-45.7)

AUC = area under the curve; CI = confidence interval; DERM = Deep Ensemble for Recognition of Melanoma.

between MM and benign lesions, which supports the ROC analysis.

The Youden, Liu, and maximum AUC methods estimated the same optimum cut-point at a value of 0.272 (95% confidence interval: 0.232-0.313) (Table 1). As the sensitivity increases, the expected loss of specificity occurs, but when the sensitivity is fixed at 95%, specificity is still 64%.

The summary of 82 studies that investigated the diagnostic accuracy of naked-eye examination (n = 29) or dermoscopy (n = 53) for pigmented lesions and MM between 1990 and 2017 is shown in Table 2. A visual guide to the study accuracy is provided in the forest plots in Figures 4 and 5. Table 3 shows

the pooled and weighted values of sen-

Table 2. Studies for Meta-analysis of Diagnostic Accuracy

Author [Ref]   Date   Total Lesions   No. of Malignant Melanomas (%)   Sensitivity (%)   Specificity (%)   Country of Patients

Annessi [47] 2007 198 96 (48.5) 81.3 69.6 Italy

Argenziano [48] 1998 309 106 (34.3) 95.0 75.0 Italy

Argenziano [49] 2006 2,528 12 (0.475) 79.2 71.8 Spain, Italy

Argenziano [50] 2011 283 78 (27.6) 87.8 74.5 Italy

Ascierto [51] 2010 54 12 (22.2) 66.6 76.2 Italy

Barzegari [52] 2005 122 6 (4.92) 100 90.0 Iran

Benelli [53] 1999 401 60 (15.0) 85.0 89.1 Italy

Benelli [54] 2000 600 76 (12.7) 68.8 86.0 Italy

Binder [55] 1995 100 37 (37.0) 73.0 74.0 Austria

Binder [56] 1997 240 58 (24.2) 63.0 91.0 Austria

Blum [57] 2004 269 84 (31.2) 95.2 77.8 Germany

Bono [58] 2002 313 125 (39.9) 88.5 75.5 Italy

Bono [59] 2006 206 76 (36.9) 63.0 80.0 Italy

Carli [60] 1998 15 4 (26.7) 58.5 83.5 Italy

Carli [61] 2003 200 44 (22.0) 91.9 35.2 Italy

Carli [62] 2003 311 28 (9.00) 100 88.5 Italy

Cristofolini [63] 1994 220 33 (15.0) 86.5 77.0 Italy

Dal Pozzo [64] 1999 713 168 (23.6) 94.6 85.5 Italy

Doliantis [65] 2005 40 20 (50.0) 84.6 77.7 Australia

Dreiseitl [66] 2009 458 146 (31.9) 96.0 72.0 Germany

Dummer [67] 1993 824 25 (3.03) 80.5 95.5 Germany

Feldmann [68] 1998 500 30 (6.00) 88.0 64.0 Austria

Fueyo-Casado [69] 2009 303 16 (5.28) 100 97.0 Brazil

Gereli [70] 2010 96 48 (50.0) 89.6 31.2 Turkey


and nonexperts both for naked-eye visual clinical examination (P < 0.001) and dermoscopy (P < 0.001), which is reflected in the estimated values shown in Table 3, where experts have both higher sensitivity and specificity than nonexperts, and is most marked for specificity for both methods and for sensitivity only for dermoscopy (Figure 6). The contrast in accuracy is most obvious for primary vs secondary care (P < 0.0001) with the AUC differing by 8%

specificity = 83%, β = 0.048, P = 0.81; and for dermoscopy the pooled results are as follows: AUC = 0.91, sensitivity = 86%, specificity = 81%, β = 0.397, P = 0.005.

Meta-regression for the year of publication showed no significant association assessed by the combination of sensitivity and specificity for either visual clinical examination (P = 0.25) or dermoscopy (P = 0.18). There was a significant difference between experts

sitivity, specificity, and diagnostic odds 

ratio for the studies. The pooled results for all studies are as follows: AUC = 0.90, sensitivity = 85%, and specificity = 82%. The beta value (an indicator of asymmetry of the summary ROC curve) is statistically significant (β = 0.263, P = 0.022), indicating that the diagnostic odds ratio shows variation across the summary ROC curve. For naked-eye examination the pooled results are as follows: AUC = 0.88, sensitivity = 79%,

Table 2. Studies for Meta-analysis of Diagnostic Accuracy (continued)

Author [Ref]   Date   Total Lesions   No. of Malignant Melanomas (%)   Sensitivity (%)   Specificity (%)   Country of Patients

Glud [71] 2009 83 12 (14.5) 92.0 81.0 Denmark

Haenssle [72] 2010 1,219 127 (10.4) 62.0 97.0 Germany

Har-Shai [73] 2005 400 53 (13.3) 86.0 74.0 Israel

Henning [74] 2008 150 50 (33.3) 92.0 38.0 USA

Keefe [75] 1990 222 11 (4.95) 85.7 66.5 Scotland

Krähn [76] 1998 80 39 (48.8) 90.0 93.0 Germany

Kreusch [77] 1992 317 96 (30.3) 98.9 94.1 Germany

Lorentzen [78] 1999 232 49 (21.1) 59.0 92.0 Denmark

Lorentzen [79] 2000 258 64 (24.8) 70.7 88.0 Denmark

Luttrell [80] 2012 200 25 (12.5) 91.2 94.0 Austria

MacKie [81] 2002 126 69 (54.8) 97.0 55.0 Scotland

McGovern [82] 1992 237 16 (6.75) 44.0 94.0 USA

Menzies [83] 1996 385 107 (27.8) 92.0 71.0 Australia

Menzies [84] 2008 497 105 (21.1) 95.0 80.0 Australia

Menzies [85] 2013 465 217 (46.7) 93.0 70.0 Australia

Nachbar [86] 1994 172 69 (40.1) 92.8 91.2 Germany

Nilles [87] 1994 260 72 (27.7) 90.0 85.0 Germany

Perrinaud [88] 2007 90 78 (86.7) 98.0 37.0 Switzerland

Piccolo [89] 2014 165 33 (20.0) 91.0 52.0 Italy

Rao [90] 1997 72 51 (70.8) 91.5 59.3 USA

Rosendahl [9] 2011 246 79 (32.1) 82.6 80.0 Australia

Skvara [91] 2005 325 63 (19.4) 31.7 87.3 Austria

Soyer [92] 1995 159 65 (40.9) 94.0 82.0 Italy

Soyer [93] 2004 231 68 (29.4) 96.3 32.8 Italy

Stanganelli [94] 2000 3,372 55 (1.63) 80.0 99.5 Italy

Unlu [95] 2014 115 24 (20.9) 91.6 64.8 Turkey

Walter [96] 2013 1,436 36 (2.51) 91.7 33.1 England

Westerhoff [97] 2000 100 50 (50.0) 54.6 56.1 Australia

Zalaudek [98] 2006 150 44 (29.3) 94.0 71.9 Multiple

Youl [99] 2007 11,116 49 (0.441) 60.0 98.0 Australia

All studies (n = 55) 32,226 3,277 (10.2)



Figure 4. Forest plot for naked-eye examination.

Figure 5. Forest plot for dermoscopy.



results confirm that clinician experience and use of dermoscopy improve accuracy. DERM achieves an AUC of 0.93, sensitivity and specificity of 85% and 85%, respectively, when using the estimated optimum value of 0.272. This is higher than naked-eye visual assessment (0.88, 80% and 71%), and similar to findings for dermatologists with dermoscopy (0.91, 85% and 82%). This is illustrated by plotting a ROC curve of the data from studies in the meta-analysis, and superimposing the DERM data from 4 cut-points (Figures 6 and 7).

A recent comprehensive series of Cochrane reviews concluded that visual inspection alone had a specificity of 42% at a fixed sensitivity of 80% and a sensitivity of 76% at a fixed specificity of 80%, whereas dermoscopy plus visual inspection had a specificity of 92% at a fixed sensitivity of 80% and a sensitivity of 82% at a fixed specificity of 80%

(0.83 vs 0.91) (Figure 7). There was no association between the AUC and year of study publication, suggesting that diagnostic accuracy is not improving over time (P = 0.63).

Discussion

Summary

Here we present an extensive evaluation of the ability of DERM to identify MM from dermoscopic images of skin lesions. This preliminary analysis demonstrates the ability of an AI-based system to learn features of a skin lesion that are associated with MM, which can then be applied to the identification of MM. We conducted a meta-analysis of MM diagnostic accuracy to generate comparative values from current primary care and specialist dermatologist practices. These

Table 3. Meta-analysis Results

Group            Subgroup         No. of Estimates(a)   No. of Lesions   No. of Malignant Melanoma   Sensitivity (%) (95% CI)   Specificity (%) (95% CI)   sROC Area (95% CI)
All studies      Naked eye        29                    23,930           2,140                       79 (72-85)                 83 (76-88)                 0.88 (0.85-0.91)
                 Dermoscopy       79                    33,749           5,031                       86 (83-89)                 81 (76-86)                 0.91 (0.88-0.93)
All studies      Nonexperts       20                    22,580           1,630                       82 (73-89)                 73 (60-83)                 0.85 (0.82-0.88)
                 Experts          65                    29,767           3,812                       84 (79-87)                 85 (80-89)                 0.91 (0.88-0.93)
All studies      Primary care     10                    19,152           867                         80 (65-89)                 71 (52-85)                 0.83 (0.79-0.86)
                 Secondary care   87                    36,673           5,480                       85 (82-88)                 82 (77-87)                 0.91 (0.88-0.93)
Nonexperts       Naked eye        9                     16,304           1,045                       78 (60-90)                 74 (54-88)                 0.83 (0.80-0.86)
                 Dermoscopy       11                    6,279            585                         83 (76-89)                 72 (55-84)                 0.86 (0.83-0.89)
Experts          Naked eye        16                    7,115            922                         79 (70-86)                 86 (79-91)                 0.90 (0.87-0.92)
                 Dermoscopy       49                    22,652           2,890                       85 (79-89)                 85 (77-90)                 0.91 (0.89-0.94)
Primary care     Naked eye        6                     14,822           595                         78 (52-92)                 74 (43-91)                 0.83 (0.80-0.86)
                 Dermoscopy       4                     4,330            272                         82 (74-87)                 66 (57-74)                 0.83 (0.79-0.86)
Secondary care   Naked eye        19                    8,597            1,372                       79 (71-86)                 85 (78-90)                 0.89 (0.86-0.91)
                 Dermoscopy       68                    28,076           4,108                       87 (83-90)                 82 (75-87)                 0.91 (0.88-0.93)

(a) The number of estimates exceeds the number of studies because multiple estimates are made using dermoscopy with alternative diagnostic algorithms.
CI = confidence interval; sROC = summary receiver operating characteristic.
 


Strengths and Limitations

We trained our algorithm using archived images that have been published to train clinicians. It is likely that biases exist in the datasets (eg, patient demographics, MM subtypes, image capture methods), but it is very difficult to determine whether such biases exist and thus have been introduced into DERM during its development. In addition, it must be

[45]. Our meta-analysis showed, for visual inspection alone, a specificity of 83% at a sensitivity of 80% and a sensitivity of 78% at a specificity of 80%; and, for dermoscopy, a specificity of 86% at a sensitivity of 80% and a sensitivity of 87% at a specificity of 80%. DERM gave comparable indices: a specificity of 89% at a sensitivity of 80% and a sensitivity of 90% at a specificity of 80%.

Figure 7. Summary receiver operating characteristic curves for primary and secondary care overlaid with the Deep Ensemble for Recognition of Melanoma (DERM) sensitivity and specificity at cut-points from Table 1 (the shaded rectangle shows the summary point from the meta-analysis).

Figure 6. Summary receiver operating characteristic curves for naked eye and dermoscopic diagnosis overlaid with the Deep Ensemble for Recognition of Melanoma (DERM) sensitivity and specificity at cut-points from Table 1 (the shaded rectangle shows the summary point from the meta-analysis). AUC = area under the curve.



false-negative and false-positive results have equal importance. This is not the case when dealing with a life-threatening disease, such as MM, where a cut-point that maximizes sensitivity, thus reducing the number of false-negative cases, should be adopted. However, this results in a higher false-positive rate, which has health care and patient costs associated with further investigations. The most appropriate cut-point for use in a clinical setting will need to be determined by consensus agreement taking into account both clinical and economic factors and is likely to be different for different clinical settings and levels of care.

At high levels of sensitivity, DERM offers comparable specificity to dermatologists with dermatoscopes. DERM could therefore provide dermatologist-grade advice on likelihood of MM to general practitioners without the cost and training requirements of dermoscopy. While diagnostic accuracy plays a pivotal role in the clinical evaluation of diagnostic tests, it does not prove that the test improves outcomes in relevant patient populations or that it enhances health care quality, efficiency, and cost-effectiveness. The only way to truly determine a test’s utility in the real-life decision-making setting of clinics is by conducting prospective clinical trials. We are currently conducting clinical validation studies of DERM. To our knowledge, no other AI-based MM diagnostic test is undergoing such extensive clinical utility testing [23,46,47].

Conclusions

Our study demonstrates the ability of an AI-based system to learn features of a skin lesion photograph that are associated with MM. DERM has the potential to be used in primary care to provide dermatologist-grade decision support. It is too early to say whether deployment of DERM would reduce onward referral, but such clinical validation is ongoing.

References

 1. Marsden JR, Newton-Bishop JA, Burrows L, et al. Revised U.K. guidelines for the management of cutaneous melanoma 2010. Br J Dermatol. 2010;163(2):238-256.

 2. Wernli KJ, Henrikson NB, Morrison CC, Nguyen M, Pocobelli G, Whitlock EP. Screening for Skin Cancer in Adults: An Updated Systematic Evidence Review for the US Preventive Services Task Force [Internet]. U.S. Preventive Services Task Force Evidence Syntheses, formerly Systematic Evidence Reviews. Rockville, MD: Agency for Healthcare Research and Quality (US); 2016.

 3. National Collaborating Centre for Cancer (UK). Melanoma: Assessment and Management. National Institute for Health and Care Excellence: Clinical Guidelines. London: National Institute for Health and Care Excellence (UK); 2015.

 4. Welch HG, Woloshin S, Schwartz LM. Skin biopsy rates and incidence of melanoma: population based ecological study. BMJ. 2005;331(7515):481.

emphasized that the algorithm was trained predominantly using images of images rather than images created in a clinical setting. We are currently collecting such images during a clinical trial and plan to report the results in the near future.

By using postbiopsy histology as the gold standard for both DERM and the inclusion criteria for our meta-analysis, images of nonsuspicious lesions have not been included when training or evaluating DERM. We have therefore not shown the ability of DERM (or clinicians) to accurately classify nonsuspicious lesions, which could lead to verification bias as was observed by a study of cancer registry data during a prospective follow-up [46]. However, this bias will apply to both the evaluation of DERM and the meta-analysis results, so it seems unlikely that the comparison of the 2 would be affected, but it remains a possibility.

A strength of our study is that the use of a meta-analysis of naked-eye examination and dermoscopy, the most common current diagnostic methods for MM used in primary care, is based on evaluation of 32,226 pigmented lesions including 3,277 histopathology-confirmed MM.

Comparison With Existing Literature

Recently, 2 other groups who retooled versions of Google’s Inception network for the identification of melanoma showed accuracy equivalent to or better than that of a panel of dermatologists [22,23]. However, this approach is likely to generate issues such as overfitting (because of the small size of the review panel) and a lack of generalization (because of the selected nature of the voluntary reviewers).

A recent addition to the literature was the publication of an extensive systematic review by the Cochrane Collaboration skin group [45]. Four studies were conducted on melanoma diagnosis in adults by visual inspection, dermoscopy with and without visual inspection, reflectance confocal microscopy, and smartphone applications for triaging suspicious lesions. The dates of publication were slightly different from our study dates (up to August 2016 compared with September 2017), they searched more databases, and they did not limit themselves to histology-confirmed pathology as the diagnostic outcome but also included clinical follow-up of benign-appearing lesions, cancer registry follow-up, and “expert opinion with no histology or follow-up.” Despite these differences, the number of studies is very similar. We identified 108 studies (29 visual and 79 dermoscopy) and they identified 104 (24 visual and 86 dermoscopy).

Implications for Research and Practice

Using different cut-points at which DERM defines a lesion 

as MM, the sensitivity and specificity ranged from 85.0% 

to 98.6% and from 85.3% to 62.9%, respectively. The cut-points 

calculated by the Youden and Liu methods assume that 
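
To make the two cut-point criteria named here concrete, the following is a minimal sketch and not the procedure used in the study. It assumes the usual definitions: the Youden index selects the cut-point that maximizes sensitivity + specificity - 1 [34], and the Liu method selects the one that maximizes the product of sensitivity and specificity [35]. The scores, labels, and function names are hypothetical illustrations and are not DERM output.

```python
from typing import Callable, Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              cut: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cut are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cut)
    fn = sum(1 for s, y in pairs if y == 1 and s < cut)
    tn = sum(1 for s, y in pairs if y == 0 and s < cut)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def youden(sens: float, spec: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sens + spec - 1.0


def liu(sens: float, spec: float) -> float:
    """Liu criterion: product of sensitivity and specificity."""
    return sens * spec


def best_cut(scores: Sequence[float], labels: Sequence[int],
             criterion: Callable[[float, float], float]) -> float:
    """Return the observed score that maximizes the chosen criterion."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda c: criterion(*sens_spec(scores, labels, c)))


# Hypothetical classifier scores (1 = melanoma, 0 = benign); not study data.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.55, 0.40, 0.60, 0.70, 0.85]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

youden_cut = best_cut(scores, labels, youden)
liu_cut = best_cut(scores, labels, liu)
print("Youden cut-point:", youden_cut, sens_spec(scores, labels, youden_cut))
print("Liu cut-point:", liu_cut, sens_spec(scores, labels, liu_cut))
```

On this toy data both criteria select the same threshold. They can disagree on other data. Lowering the threshold below either optimum trades specificity for sensitivity, which is the pattern reported for the DERM cut-points above.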



24. Simonyan K, Zisserman A. Very deep convolutional networks. 

Presented at International Conference on Machine Learning and 

Applications (IEEE ICMLA’15); Miami, FL: IEEE; 2015. https://

arxiv.org/abs/1409.1556.

25. Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, Incep-

tion-ResNet and the impact of residual connections on learning. 

2016. https://arxiv.org/abs/1602.07261.

26. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks 

for biomedical image segmentation. Navab N et al, eds. MICCAI 

2015, Part III, LNCS 9351, pp 234-241, 2015. DOI: 10.1007/978-

3-319-24574-4_28. https://arxiv.org/abs/1505.04597.

27. Codella N, Nguyen Q-B, Pankanti S, et al. Deep learning ensem-

bles for melanoma recognition in dermoscopy images. IBM J Res 

Dev. 2017;61(4/5):5:1-5:15. https://arxiv.org/abs/1610.04662.

28. Clevert D, Unterthiner T, Hochreiter S. Fast and accurate deep 

network learning by exponential linear units (ELUs). Presented at 

International Conference on Learning Representations, San Juan, 

Puerto Rico, 2016. https://arxiv.org/abs/1511.07289.

29. Mendonca T, Ferreira P, Marques J, Marcal A, Rozeira J. PH2—a 

dermoscopic image database for research and benchmarking. 

35th Annual International Conference of the IEEE Engineering 

in Medicine and Biology Society: Osaka, Japan, 2013. DOI: 

10.1109/EMBC.2013.6610779. https://www.ncbi.nlm.nih.gov/

pubmed/24110966.

30. Argenziano G, Soyer P, De Giorgio V, et al. Interactive Atlas of 

Dermoscopy. Milan, Italy: Edra Medical Publishing and New 

Media; 2000: 208.

31. ISIC. ADDI Project 2012. PH2 Database. Available at: https://

www.fc.up.pt/addi/ph2%20database.html. Accessed 15/5/2017. 

32. Pepe MS. The Statistical Evaluation of Medical Tests for Classifi-

cation and Prediction. Oxford: Oxford University Press; 2003.

33. Steyerberg EW, Pencina MJ, Lingsma HF, Kattan MW, Vickers AJ, 

Van Calster B. Assessing the incremental value of diagnostic and 

prognostic markers: a review and illustration. Eur J Clin Invest. 

2012;42(2):216-228.

34. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32-

35.

35. Liu X. Classification accuracy and cut point selection. Stat Med. 

2012;31(23):2676-2686.

36. Lee WC. Probabilistic analysis of global performances of diagnos-

tic tests: interpreting the Lorenz curve-based summary measures. 

Stat Med. 1999;18(4):455-471.

37. Irwin JR, Hautus MJ. Lognormal Lorenz and normal receiver 

operating characteristic curves as mirror images. R Soc Open Sci. 

2015;2(2):140280.

38. Haenssle HA, Fink C, Schneiderbauer R, et al. Man against ma-

chine: diagnostic performance of a deep learning convolutional 

neural network for dermoscopic melanoma recognition in com-

parison to 58 dermatologists. Ann Oncol. 2018;29(8):1836-1842.

39. Rajpara SM, Botello AP, Townend J, Ormerod AD. Systematic re-

view of dermoscopy and digital dermoscopy/artificial intelligence 

for the diagnosis of melanoma. Br J Dermatol. 2009;161(3):591-

604.

40. Harrington E, Clyne B, Wesseling N, et al. Diagnosing malignant 

melanoma in ambulatory care: a systematic review of clinical 

prediction rules. BMJ Open. 2017;7(3):e014096.

41. Bafounta ML, Beauchet A, Aegerter P, Saiag P. Is dermoscopy (epi-

luminescence microscopy) useful for the diagnosis of melanoma? 

Results of a meta-analysis using techniques adapted to the eval-

 5. Chen SC, Pennie ML, Kolm P, et al. Diagnosing and managing 

cutaneous pigmented lesions: primary care physicians versus 

dermatologists. J Gen Intern Med. 2006;21(7):678-682.

 6. Goodson AG, Grossman D. Strategies for early melanoma detec-

tion: approaches to the patient with nevi. J Am Acad Dermatol. 

2009;60(5):719-735.

 7. Vestergaard ME, Macaskill P, Holt PE, Menzies SW. Dermoscopy 

compared with naked eye examination for the diagnosis of prima-

ry melanoma: a meta-analysis of studies performed in a clinical 

setting. Br J Dermatol. 2008;159(3):669-676.

 8. Menzies SW. Evidence-based dermoscopy. Dermatol Clin. 

2013;31(4):521-524, vii.

 9. Rosendahl C, Tschandl P, Cameron A, Kittler H. Diagnostic 

accuracy of dermatoscopy for melanocytic and nonmelanocytic 

pigmented lesions. J Am Acad Dermatol. 2011;64(6):1068-1073.

10. Herschorn A. Dermoscopy for melanoma detection in family 

practice. Can Fam Physician. 2012;58(7):740-745, e372-e378.

11. Creighton-Smith M, Murgia RD 3rd, Konnikov N, Dornelles A, 

Garber C, Nguyen BT. Incidence of melanoma and keratinocytic 

carcinomas in patients evaluated by store-and-forward telederma-

tology vs. dermatology clinic. Int J Dermatol. 2017;56(10):1026-

1031.

12. Borsari S, Pampena R, Lallas A, et al. Clinical indications for 

use of reflectance confocal microscopy for skin cancer diagnosis. 

JAMA Dermatol. 2016;152(10):1093-1098.

13. Xiong YQ, Ma SJ, Mo Y, Huo ST, Wen YQ, Chen Q. Comparison 

of dermoscopy and reflectance confocal microscopy for the diag-

nosis of malignant skin tumours: a meta-analysis. J Cancer Res 

Clin Oncol. 2017;143(9):1627-1635.

14. Kardynal A, Olszewska M. Modern non-invasive diagnostic tech-

niques in the detection of early cutaneous melanoma. J Dermatol 

Case Rep. 2014;8(1):1-8.

15. Kittler H, Pehamberger H, Wolff K, Binder M. Diagnostic accu-

racy of dermoscopy. Lancet Oncol. 2002;3(3):159-165.

16. Brewer AC, Endly DC, Henley J, et al. Mobile applications in 

dermatology. JAMA Dermatol. 2013;149(11):1300-1304.

17. Kassianos AP, Emery JD, Murchie P, Walter FM. Smartphone ap-

plications for melanoma detection by community, patient and gen-

eralist clinician users: a review. Br J Dermatol. 2015;172(6):1507-

1518.

18. Ferrero NA, Morrell DS, Burkhart CN. Skin scan: a demonstra-

tion of the need for FDA regulation of medical apps on iPhone. J 

Am Acad Dermatol. 2013;68(3):515-516.

19. Wolfe JA, Ferris LK. Diagnostic inaccuracy of smartphone 

applications for melanoma detection: reply. JAMA Dermatol. 

2013;149(7):885.

20. Stoecker WV, Rader RK, Halpern A. Diagnostic inaccuracy of 

smartphone applications for melanoma detection: representative 

lesion sets and the role for adjunctive technologies. JAMA Der-

matol. 2013;149(7):884.

21. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level clas-

sification of skin cancer with deep neural networks. Nature. 

2017;542(7639):115-118.

22. Obermeyer Z, Emanuel EJ. Predicting the future—big data, 

machine learning, and clinical medicine. N Engl J Med. 

2016;375(13):1216-1219.

23. Char DS, Shah NH. Implementing machine learning in 

health care—addressing ethical challenges. N Engl J Med. 

2018;378(11):981-983.


60. Carli P, De Giorgi V, Naldi L, Dosi G. Reliability and inter-observ-

er agreement of dermoscopic diagnosis of melanoma and melano-

cytic naevi: Dermoscopy Panel. Eur J Cancer Prev. 1998;7(5):397-

402.

61. Carli P, Quercioli E, Sestini S, et al. Pattern analysis, not simplified 

algorithms, is the most reliable method for teaching dermoscopy 

for melanoma diagnosis to residents in dermatology. Br J Derma-

tol. 2003;148(5):981-984.

62. Carli P, Mannone F, de Giorgi V, Nardini P, Chiarugi A, Giannotti 

B. The problem of false-positive diagnosis in melanoma screening: 

the impact of dermoscopy. Melanoma Res. 2003;13(2):179-182.

63. Cristofolini M, Zumiani G, Bauer P, Cristofolini P, Boi S, Micciolo 

R. Dermatoscopy: usefulness in the differential diagnosis of cuta-

neous pigmentary lesions. Melanoma Res. 1994;4(6):391-394.

64. Dal Pozzo V, Benelli C, Roscetti E. The seven features for melano-

ma: a new dermoscopic algorithm for the diagnosis of malignant 

melanoma. Eur J Dermatol. 1999;9(4):303-308.

65. Dolianitis C, Kelly J, Wolfe R, Simpson P. Comparative perfor-

mance of 4 dermoscopic algorithms by nonexperts for the diag-

nosis of melanocytic lesions. Arch Dermatol. 2005;141(8):1008-

1014.

66. Dreiseitl S, Binder M, Hable K, Kittler H. Computer versus human 

diagnosis of melanoma: evaluation of the feasibility of an auto-

mated diagnostic system in a prospective clinical trial. Melanoma 

Res. 2009;19(3):180-184.

67. Dummer W, Doehnel KA, Remy W. Videomicroscopy in differen-

tial diagnosis of skin tumors and secondary prevention of malig-

nant melanoma [in German]. Hautarzt. 1993;44(12):772-776.

68. Feldmann R, Fellenz C, Gschnait F. The ABCD rule in dermatos-

copy: analysis of 500 melanocytic lesions [in German]. Hautarzt. 

1998;49(6):473-476.

69. Fueyo-Casado A, Vázquez-Lopez F, Sanchez-Martin J, Gar-

cia-Garcia B, Pérez-Oliva N. Evaluation of a program for the 

automatic dermoscopic diagnosis of melanoma in a general 

dermatology setting. Dermatol Surg. 2009;35(2):257-262.

70. Gereli MC, Onsun N, Atilganoglu U, Demirkesen C. Compari-

son of two dermoscopic techniques in the diagnosis of clinically 

atypical pigmented skin lesions and melanoma: seven-point and 

three-point checklists. Int J Dermatol. 2010;49(1):33-38.

71. Glud M, Gniadecki R, Drzewiecki KT. Spectrophotometric 

intracutaneous analysis versus dermoscopy for the diagnosis 

of pigmented skin lesions: prospective, double-blind study in a 

secondary reference centre. Melanoma Res. 2009;19(3):176-179.

72. Haenssle HA, Korpas B, Hansen-Hagge C, et al. Seven-point 

checklist for dermatoscopy: performance during 10 years of 

prospective surveillance of patients at increased melanoma risk. J 

Am Acad Dermatol. 2010;62(5):785-793.

73. Har-Shai Y, Glickman YA, Siller G, et al. Electrical impedance 

scanning for melanoma diagnosis: a validation study. Plast Re-

constr Surg. 2005;116(3):782-790.

74. Henning JS, Stein JA, Yeung J, et al. CASH algorithm for dermos-

copy revisited. Arch Dermatol. 2008;144(4):554-555.

75. Keefe M, Dick DC, Waleel RA. A study of the value of the sev-

en-point checklist in distinguishing benign pigmented lesions from 

melanoma. Clin Exp Dermatol. 1990;15(3):167-171.

76. Krähn G, Gottlöber P, Sander C, Peter RU. Dermatoscopy and 

high frequency sonography: two useful non-invasive methods 

to increase preoperative diagnostic accuracy in pigmented skin 

lesions. Pigment Cell Res. 1998;11(3):151-154.

uation of diagnostic tests. Arch Dermatol. 2001;137(10):1343-

1350.

42. Harbord RM, Whiting P. metandi: meta-analysis of diagnostic 

accuracy using hierarchical logistic regression. Stata Journal. 

2009;9(2):211-229.

43. Dwamena B. Stata module for meta-analytical integration of 

diagnostic test accuracy studies. Statistical Software Components 

S456880, Department of Economics, Boston College. https://

ideas.repec.org/c/boc/bocode/s456880.html.

44. Lee JB, Hirokawa D. Dermatoscopy: facts and controversies. Clin 

Dermatol. 2010;28(3):303-310.

45. Dinnes J, Deeks JJ, Chuchu N, et al. Dermoscopy, with and 

without visual inspection, for diagnosing melanoma in adults. 

Cochrane Database Syst Rev. 2018;12:CD011901.

46. Begg C, Greenes R. Assessment of diagnostic tests when dis-

ease verification is subject to selection bias. Biometrics. 

1983;39(1):207-215.

47. Annessi G, Bono R, Sampogna F, Faraggiana T, Abeni D. Sensi-

tivity, specificity, and diagnostic accuracy of three dermoscopic 

algorithmic methods in the diagnosis of doubtful melanocytic 

lesions: the importance of light brown structureless areas in 

differentiating atypical melanocytic nevi from thin melanomas. J 

Am Acad Dermatol. 2007;56(5):759-767.

48. Argenziano G, Fabbrocini G, Carli P, De Giorgi V, Sammarco 

E, Delfino M. Epiluminescence microscopy for the diagnosis of 

doubtful melanocytic skin lesions: comparison of the ABCD rule 

of dermatoscopy and a new 7-point checklist based on pattern 

analysis. Arch Dermatol. 1998;134(12):1563-1570.

49. Argenziano G, Puig S, Zalaudek I, et al. Dermoscopy improves 

accuracy of primary care physicians to triage lesions suggestive 

of skin cancer. J Clin Oncol. 2006;24(12):1877-1882.

50. Argenziano G, Longo C, Cameron A, et al. Blue-black rule: a sim-

ple dermoscopic clue to recognize pigmented nodular melanoma. 

Br J Dermatol. 2011;165(6):1251-1255.

51. Ascierto PA, Palla M, Ayala F, et al. The role of spectrophotometry 

in the diagnosis of melanoma. BMC Dermatol. 2010;10:5.

52. Barzegari M, Ghaninezhad H, Mansoori P, Taheri A, Naraghi ZS, 

Asgari M. Computer-aided dermoscopy for diagnosis of melano-

ma. BMC Dermatol. 2005;5:8.

53. Benelli C, Roscetti E, Dal Pozzo V, Gasparini G, Cavicchini S. 

The dermoscopic versus the clinical diagnosis of melanoma. Eur 

J Dermatol. 1999;9(6):470-476.

54. Benelli C, Roscetti E, Pozzo VD. The dermoscopic (7FFM) versus 

the clinical (ABCDE) diagnosis of small diameter melanoma. Eur 

J Dermatol. 2000;10(4):282-287.

55. Binder M, Schwarz M, Winkler A, et al. Epiluminescence mi-

croscopy: a useful tool for the diagnosis of pigmented skin 

lesions for formally trained dermatologists. Arch Dermatol. 

1995;131(3):286-291.

56. Binder M, Puespoeck-Schwarz M, Steiner A, et al. Epilumines-

cence microscopy of small pigmented skin lesions: short-term 

formal training improves the diagnostic performance of derma-

tologists. J Am Acad Dermatol. 1997;36:197-202.

57. Blum A, Clemens J, Argenziano G. Three-colour test in dermos-

copy: a re-evaluation. Br J Dermatol. 2004;150(5):1040.

58. Bono A, Bartoli C, Cascinelli N, et al. Melanoma detection. Der-

matology. 2002;205(4):362-366.

59. Bono A, Tolomio E, Trincone S, et al. Micro-melanoma detection: 

a clinical study on 206 consecutive cases of pigmented skin lesions 

with a diameter ≤3 mm. Br J Dermatol. 2006;155(3):570-573.


provide added benefit for the dermatologist? A study comparing 

the results of three systems. Br J Dermatol. 2007;157(5):926-933.

89. Piccolo D, Crisman G, Schoinas S, Altamura D, Peris K. Comput-

er-automated ABCD versus dermatologists with different degrees 

of experience in dermoscopy. Eur J Dermatol. 2014;24(4):477-

481.

90. Rao BK, Marghoob AA, Stolz W, et al. Can early malignant mela-

noma be differentiated from atypical melanocytic nevi by in vivo 

techniques? Skin Res Technol. 1997;3(1):8-14.

91. Skvara H, Teban L, Fiebiger M, Binder M, Kittler H. Limitations 

of dermoscopy in the recognition of melanoma. Arch Dermatol. 

2005;141(2):155-160.

92. Soyer HP, Smolle J, Leitinger G, Rieger E, Kerl H. Diagnostic reli-

ability of dermoscopic criteria for detecting malignant melanoma. 

Dermatology. 1995;190(1):25-30.

93. Soyer HP, Argenziano G, Zalaudek I, et al. Three-point checklist 

of dermoscopy: a new screening method for early detection of 

melanoma. Dermatology. 2004;208(1):27-31.

94. Stanganelli I, Serafini M, Bucchi L. A cancer-registry-assisted 

evaluation of the accuracy of digital epiluminescence microscopy 

associated with clinical examination of pigmented skin lesions. 

Dermatology. 2000;200(1):11-16.

95. Unlu E, Akay BN, Erdem C. Comparison of dermatoscopic diag-

nostic algorithms based on calculation: the ABCD rule of derma-

toscopy, the seven-point checklist, the three-point checklist and 

the CASH algorithm in dermatoscopic evaluation of melanocytic 

lesions. J Dermatol. 2014;41(7):598-603.

96. Walter FM, Prevost AT, Vasconcelos J, et al. Using the 7-point 

checklist as a diagnostic aid for pigmented skin lesions in gen-

eral practice: a diagnostic validation study. Br J Gen Pract. 

2013;63(610):e345-e353.

97. Westerhoff K, McCarthy WH, Menzies SW. Increase in the sensi-

tivity for melanoma diagnosis by primary care physicians using 

skin surface microscopy. Br J Dermatol. 2000;143(5):1016-1020.

98. Zalaudek I, Argenziano G, Soyer HP, et al. Three-point check-

list of dermoscopy: an open internet study. Br J Dermatol. 

2006;154(3):431-437.

99. Youl PH, Baade PD, Janda M, Del Mar CB, Whiteman DC, Aitken 

JF. Diagnosing skin cancer in primary care: how do mainstream 

general practitioners compare with primary care skin cancer clinic 

doctors? Med J Aust. 2007;187(4):215-220.

77. Kreusch J, Rassner G, Trahn C, Pietsch-Breitfeld B, Henke D, 

Selbmann HK. Epiluminescent microscopy: a score of morpholog-

ical features to identify malignant melanoma. Pigment Cell Res. 

1992;Suppl 2:295-298.

78. Lorentzen H, Weismann K, Petersen CS, Larsen FG, Secher L, 

Skødt V. Clinical and dermatoscopic diagnosis of malignant mel-

anoma: assessed by expert and non-expert groups. Acta Derm 

Venereol. 1999;79(4):301-304.

79. Lorentzen H, Weismann K, Kenet RO, Secher L, Larsen FG. 

Comparison of dermatoscopic ABCD rule and risk stratification 

in the diagnosis of malignant melanoma. Acta Derm Venereol. 

2000;80(2):122-126.

80. Luttrell MJ, McClenahan P, Hofmann-Wellenhof R, Fink-Puches 

R, Soyer HP. Laypersons’ sensitivity for melanoma identification 

is higher with dermoscopy images than clinical photographs. Br 

J Dermatol. 2012;167(5):1037-1041.

81. Mackie RM, Fleming C, McMahon AD, Jarrett P. The use of the 

dermatoscope to identify early melanoma using the three-colour 

test. Br J Dermatol. 2002;146(3):481-484.

82. McGovern TWM, Litaker MSM. Clinical predictors of malignant 

pigmented lesions: a comparison of the Glasgow seven-point 

checklist and the American Cancer Society’s ABCDs of pigmented 

lesions. J Dermatol Surg Oncol. 1992;18(1):22-26.

83. Menzies SW, Ingvar C, Crotty KA, McCarthy WH. Frequen-

cy and morphologic characteristics of invasive melanomas 

lacking specific surface microscopic features. Arch Dermatol. 

1996;132(10):1178-1182.

84. Menzies SW, Kreusch J, Byth K, et al. Dermoscopic evaluation 

of amelanotic and hypomelanotic melanoma. Arch Dermatol. 

2008;144(9):1120-1127.

85. Menzies SW, Moloney FJ, Byth K, et al. Dermoscopic evaluation 

of nodular melanoma. JAMA Dermatol. 2013;149(6):699-709.

86. Nachbar F, Stolz W, Merkle T, et al. The ABCD rule of dermatos-

copy: high prospective value in the diagnosis of doubtful mela-

nocytic skin lesions. J Am Acad Dermatol. 1994;30(4):551-559.

87. Nilles M, Boedeker RH, Schill WB. Surface microscopy of 

naevi and melanomas—clues to melanoma. Br J Dermatol. 

1994;130(3):349-355.

88. Perrinaud A, Gaide O, French LE, Saurat J-H, Marghoob AA, 

Braun RP. Can automated dermoscopy image analysis instruments