Abstract
Skin conditions affect 1.9 billion people. Because of a shortage of dermatologists, most cases are instead seen by general practitioners, who have lower diagnostic accuracy. We present a deep learning system (DLS) to provide a differential diagnosis of skin conditions using 16,114 de-identified cases (photographs and clinical data) from a teledermatology practice serving 17 sites. The DLS distinguishes between 26 common skin conditions, representing 80% of cases seen in primary care, while also providing a secondary prediction covering 419 skin conditions. On 963 validation cases, where a rotating panel of three board-certified dermatologists defined the reference standard, the DLS was non-inferior to six other dermatologists and superior to six primary care physicians (PCPs) and six nurse practitioners (NPs) (top-1 accuracy: 0.66 DLS, 0.63 dermatologists, 0.44 PCPs and 0.40 NPs). These results highlight the potential of the DLS to assist general practitioners in diagnosing skin conditions.
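Top-1 accuracy, the headline metric above, asks whether the first-ranked entry of a differential matches the reference diagnosis; top-k relaxes this to the first k entries. The sketch below is illustrative only and assumes each case has a single reference label. The study's reference standard came from a panel of three dermatologists, and how their differentials were aggregated is not reproduced here. The function name `top_k_accuracy` and the toy cases are hypothetical, not study data.

```python
from typing import Sequence


def top_k_accuracy(predictions: Sequence[Sequence[str]],
                   references: Sequence[str], k: int) -> float:
    """Fraction of cases whose reference diagnosis appears among the first k
    entries of the ranked differential (assumes one reference label per case)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one case")
    hits = sum(ref in ranked[:k] for ranked, ref in zip(predictions, references))
    return hits / len(references)


# Hypothetical toy cases, not data from the study.
preds = [
    ["eczema", "psoriasis", "tinea"],
    ["acne", "rosacea", "folliculitis"],
    ["psoriasis", "eczema", "lichen planus"],
    ["seborrheic keratosis", "melanocytic nevus", "melanoma"],
]
refs = ["eczema", "rosacea", "lichen planus", "melanoma"]
print(top_k_accuracy(preds, refs, k=1))  # 0.25
print(top_k_accuracy(preds, refs, k=3))  # 1.0
```

Only the first case is correct at rank 1, but every reference appears somewhere in its top three, which is why top-3 accuracy is always at least as high as top-1.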
Data availability
The de-identified teledermatology data used in this study are not publicly available due to restrictions in the data-sharing agreement.
Code availability
The deep learning framework (TensorFlow) used in this study is available at https://www.tensorflow.org/. The training framework (Estimator) is available at https://www.tensorflow.org/guide/estimators. The deep learning architecture (Inception-v4) is available at https://github.com/tensorflow/models/blob/master/research/slim/nets/inception_v4.py.
Acknowledgements
We thank W. Chen, J. Yoshimi, X. Ji and Q. Duong for software infrastructure support for data collection. Thanks also go to G. Foti, K. Su, T. Saensuksopa, D. Wang, Y. Gao and L. Tran. We also thank C. Chen, M. Howell and A. Paller for their feedback on the manuscript. Last, but not least, this work would not have been possible without the participation of the dermatologists, primary care physicians and nurse practitioners who reviewed cases for this study, and S. Bis, who helped to establish the skin condition mapping.
Ethics declarations
Competing interests
K.K. and S.J.H. were consultants of Google LLC. R.H.-W. is an employee of the Medical University of Graz. G.d.O.M. is an employee of Adecco Staffing supporting Google LLC. This study was funded by Google LLC. The remaining authors are employees of Google LLC and own Alphabet stock as part of the standard compensation package. Yuan Liu, A.J., C.E., D.H.W., K.L., P.B., J.G., V.G., D.A., Yun Liu, R.C.D. and D.C. are inventors on a filed patent related to this work. The authors declare no other competing interests.
Additional information
Peer review information Javier Carmona was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Performance of the deep learning system (DLS) and clinicians, broken down for each of the 26 categories of skin conditions and "other".
a, Top-1 and top-3 sensitivity of the DLS on validation set A (n=3,756). b, Top-1 and top-3 sensitivity of the DLS and three types of clinicians: dermatologists (Derm), primary care physicians (PCP), and nurse practitioners (NP) on validation set B (n=963). Numbers in parentheses on the x axes indicate the number of cases. A detailed breakdown of each clinician's performance, and of the DLS performance on the subset of cases graded by each clinician, is given in Supplementary Table 8. Error bars indicate 95% CI (see Statistical Analysis).
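Per-category sensitivity, as plotted in Extended Data Fig. 1, restricts the top-k check to cases whose reference label is a given condition. The Statistical Analysis section referenced in the caption is not part of this excerpt, so the interval below is an assumption: a percentile bootstrap over the cases of each category. The function `per_category_sensitivity`, its parameters (`n_boot`, `alpha`, `seed`) and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def per_category_sensitivity(
    predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Dict[str, Tuple[float, float, float]]:
    """Return {category: (sensitivity, ci_lower, ci_upper)} for top-k sensitivity.

    Sensitivity for category c is the fraction of cases with reference label c
    whose top-k ranked predictions contain c. The interval is a percentile
    bootstrap over that category's cases (an assumed method, not the paper's).
    """
    rng = random.Random(seed)
    outcomes: Dict[str, List[int]] = defaultdict(list)
    for ranked, ref in zip(predictions, references):
        outcomes[ref].append(1 if ref in ranked[:k] else 0)

    results: Dict[str, Tuple[float, float, float]] = {}
    for category, hits in outcomes.items():
        n = len(hits)
        point = sum(hits) / n
        # Resample this category's cases with replacement n_boot times.
        boot = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_boot))
        lower = boot[round(alpha / 2 * (n_boot - 1))]
        upper = boot[round((1 - alpha / 2) * (n_boot - 1))]
        results[category] = (point, lower, upper)
    return results


# Hypothetical toy cases, not data from the study.
preds = [["eczema", "psoriasis"], ["psoriasis", "eczema"],
         ["eczema", "tinea"], ["tinea", "eczema"]]
refs = ["eczema", "eczema", "tinea", "tinea"]
for cat, (sens, lo, hi) in sorted(per_category_sensitivity(preds, refs, k=1).items()):
    print(f"{cat}: top-1 sensitivity {sens:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With only a handful of cases per category the bootstrap interval is very wide, which mirrors why the rarer categories in the figure carry larger error bars.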
Extended Data Fig. 2 Performance of the deep learning system (DLS) and the clinicians on the 419-way classification: dermatologists (Derm), primary care physicians (PCP), and nurse practitioners (NP) on validation set A (n=3,756) and validation set B (n=963).
a, Top-1 and top-3 accuracy for the DLS and clinicians across all cases and 419 categories of skin conditions. b, Average overlap (to assess the full differential diagnosis) of the DLS and clinicians. Error bars indicate 95% confidence intervals (see Statistical Analysis).
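Average overlap compares two entire ranked lists rather than only their top entries. In the unweighted form described by Webber, Moffat and Zobel (2010), it averages, over depths d = 1..k, the fraction of shared items between the two length-d prefixes. The sketch below assumes that definition and, by default, a depth equal to the longer list; the exact depth and any weighting used in the study are not stated in this excerpt. The function name and example lists are hypothetical.

```python
from typing import Optional, Sequence


def average_overlap(a: Sequence[str], b: Sequence[str],
                    depth: Optional[int] = None) -> float:
    """Unweighted average overlap of two ranked lists:
    (1/depth) * sum over d of |a[:d] & b[:d]| / d."""
    if depth is None:
        depth = max(len(a), len(b))
    if depth < 1:
        raise ValueError("depth must be >= 1")
    seen_a, seen_b = set(), set()
    total = 0.0
    for d in range(1, depth + 1):
        # Grow each prefix by one item (shorter lists simply stop growing).
        if d <= len(a):
            seen_a.add(a[d - 1])
        if d <= len(b):
            seen_b.add(b[d - 1])
        total += len(seen_a & seen_b) / d
    return total / depth


# Hypothetical differentials: same top two items, swapped order.
dls = ["eczema", "psoriasis", "tinea"]
clinician = ["psoriasis", "eczema", "lichen planus"]
print(average_overlap(dls, clinician))  # about 0.556 (= 5/9)
```

Because agreement at shallow depths enters every later term, disagreement on the first-ranked diagnosis is penalized more than disagreement further down the list.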
Supplementary information
Supplementary Information
Supplementary Methods, Figs. 1–10 and Tables 1–13.
About this article
Cite this article
Liu, Y., Jain, A., Eng, C. et al. A deep learning system for differential diagnosis of skin diseases. Nat Med 26, 900–908 (2020). https://doi.org/10.1038/s41591-020-0842-3
DOI: https://doi.org/10.1038/s41591-020-0842-3
